python - Pandas read_html() missing columns

Reposted. Author: 行者123. Updated: 2023-11-28 02:46:21

I'm using the following read_html() call to read a table (it sits behind a paywall):

import pandas as pd
url = ('http://markets.ft.com/data/equities/tearsheet/'
       'financials?s=BAG:LSE&subView=BalanceSheet&periodType=a')
df = pd.read_html(url)[0]

It parses fine, except that the last two columns are missing. I'm using the latest version of Anaconda (Python 3.5, pandas 0.18.1, html5lib, BeautifulSoup4).

The beginning of the output looks like this:

Fiscal data as of Jan 30 2016    2016  2015  2014
ASSETS                            NaN   NaN   NaN
Cash And Short Term Investments  6.80    25    13
Total Receivables, Net             50    49    45
Total Inventory                    16    17    16

(too large to show in full)

The beginning of the HTML looks like this:

<table class="mod-ui-table">
<thead>
<tr>
<th class="mod-ui-table__header--text">Fiscal data as of Jan 30 2016</th>
<th>2016</th>
<th class="mod-ui-hide-xsmall">2015</th>
<th class="mod-ui-hide-xsmall">2014</th>
<th class="mod-ui-hide-xsmall">2013</th>
<th class="mod-ui-hide-xsmall">2012</th>
</tr>
</thead>
<tr class="mod-ui-table__row--section-header">
<th colspan="6">ASSETS</th>
</tr>
<tr class="mod-ui-table__row--striped">
<th class="mod-ui-table__header--row-label">Cash And Short Term Investments</th>
<td>6.80</td>
<td class="mod-ui-hide-xsmall">25</td>
<td class="mod-ui-hide-xsmall">13</td>
<td class="mod-ui-hide-xsmall">0.91</td>
<td class="mod-ui-hide-xsmall">8.29</td>
</tr>
<tr>
<th class="mod-ui-table__header--row-label">Total Receivables, Net</th>
<td>50</td>
<td class="mod-ui-hide-xsmall">49</td>
<td class="mod-ui-hide-xsmall">45</td>
<td class="mod-ui-hide-xsmall">42</td>
<td class="mod-ui-hide-xsmall">37</td>
</tr>

The end of the HTML looks like this:

<tr class="mod-ui-table__row--highlight">
<th class="mod-ui-table__header--row-label">Total liabilities &amp; shareholders&#39; equity</th>
<td>269</td>
<td class="mod-ui-hide-xsmall">255</td>
<td class="mod-ui-hide-xsmall">227</td>
<td class="mod-ui-hide-xsmall">215</td>
<td class="mod-ui-hide-xsmall">196</td>
</tr>
<tr class="mod-ui-table__row--striped">
<th class="mod-ui-table__header--row-label">Total common shares outstanding</th>
<td>117</td>
<td class="mod-ui-hide-xsmall">117</td>
<td class="mod-ui-hide-xsmall">117</td>
<td class="mod-ui-hide-xsmall">117</td>
<td class="mod-ui-hide-xsmall">117</td>
</tr>
<tr>
<th class="mod-ui-table__header--row-label">Treasury shares - common primary issue</th>
<td>0</td>
<td class="mod-ui-hide-xsmall">0</td>
<td class="mod-ui-hide-xsmall">0</td>
<td class="mod-ui-hide-xsmall">0</td>
<td class="mod-ui-hide-xsmall">--</td>
</tr>
</table>
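A quick way to tell whether the missing columns are a parsing problem or a difference in the HTML itself is to count the header cells in the markup independently of pandas. The sketch below uses only the standard library `html.parser`; `HeaderCounter`, `header_labels` and `SNIPPET` are hypothetical names, and `SNIPPET` is just the `<thead>` copied from the HTML above.

```python
from html.parser import HTMLParser


class HeaderCounter(HTMLParser):
    """Collects the text of every <th> cell inside the table's <thead>."""

    def __init__(self):
        super().__init__()
        self.in_thead = False
        self.in_th = False
        self.headers = []

    def handle_starttag(self, tag, attrs):
        if tag == "thead":
            self.in_thead = True
        elif tag == "th" and self.in_thead:
            self.in_th = True
            self.headers.append("")  # start a new header cell

    def handle_endtag(self, tag):
        if tag == "thead":
            self.in_thead = False
        elif tag == "th":
            self.in_th = False

    def handle_data(self, data):
        if self.in_th:
            self.headers[-1] += data.strip()


def header_labels(html):
    parser = HeaderCounter()
    parser.feed(html)
    parser.close()
    return parser.headers


SNIPPET = """<table class="mod-ui-table">
<thead>
<tr>
<th class="mod-ui-table__header--text">Fiscal data as of Jan 30 2016</th>
<th>2016</th>
<th class="mod-ui-hide-xsmall">2015</th>
<th class="mod-ui-hide-xsmall">2014</th>
<th class="mod-ui-hide-xsmall">2013</th>
<th class="mod-ui-hide-xsmall">2012</th>
</tr>
</thead>
</table>"""

labels = header_labels(SNIPPET)
print(len(labels) - 1, labels[1:])  # prints: 5 ['2016', '2015', '2014', '2013', '2012']
```

This markup has five year columns while the parsed output has three, so the next thing to check is whether the HTML the script downloads is the same as what the browser shows.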

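If it isn't obvious what might be going wrong, I'd appreciate some tips on how to start stepping through the read_html() code to find the root of the problem. I'm new to Python and pdb.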
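One place to start, before stepping into pandas with pdb (for example via `python -m pdb script.py`), is to take pandas out of the loop and look at the raw HTML the script itself receives, since a logged-in browser can be served different markup. This is a minimal stdlib sketch; `fetch_raw_html` and `row_widths` are hypothetical helper names, and the URL is the one from the question.

```python
import urllib.request
from html.parser import HTMLParser


def fetch_raw_html(url):
    """Download the page as the server sends it to this script (no browser cookies)."""
    with urllib.request.urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class RowWidths(HTMLParser):
    """Records how many columns each <tr> spans, honouring colspan."""

    def __init__(self):
        super().__init__()
        self.widths = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.widths.append(0)
        elif tag in ("th", "td") and self.widths:
            try:
                span = int(dict(attrs).get("colspan") or 1)
            except ValueError:
                span = 1  # ignore malformed colspan values
            self.widths[-1] += span


def row_widths(html):
    parser = RowWidths()
    parser.feed(html)
    parser.close()
    return parser.widths


# Usage (performs a network request, so it is left commented out):
# html = fetch_raw_html('http://markets.ft.com/data/equities/tearsheet/'
#                       'financials?s=BAG:LSE&subView=BalanceSheet&periodType=a')
# with open('ft_raw.html', 'w', encoding='utf-8') as f:
#     f.write(html)  # diff this against the HTML saved from the browser
# print(row_widths(html)[:5])
```

If the downloaded HTML already has only four columns per row, read_html is parsing correctly and the difference comes from the server.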

Best answer

It turns out that if you are not logged in to the FT website, you only get three years of data.

So I'm now working out how to log in to the FT website (possibly using Twill).
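The general pattern behind logging in programmatically is a cookie-keeping HTTP client: sign in once, then fetch the table page with the same cookies and hand that HTML to read_html. The sketch below swaps in the standard library (`http.cookiejar` with `urllib`) instead of Twill, and it is not a confirmed FT login flow: the login URL and form field names are hypothetical placeholders, and the real sign-in may involve redirects, tokens or JavaScript that a plain form POST cannot handle.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_cookie_opener():
    """Return (opener, jar): an opener that stores cookies and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def log_in(opener, login_url, form_fields):
    """POST form fields (hypothetical names); any cookies set land in the jar."""
    data = urllib.parse.urlencode(form_fields).encode("utf-8")
    with opener.open(login_url, data=data) as resp:
        return resp.status


def fetch_html(opener, url):
    """GET a page with whatever cookies the opener currently holds."""
    with opener.open(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


# Usage (network calls; LOGIN_URL and the field names are hypothetical):
# import io
# import pandas as pd
# opener, jar = make_cookie_opener()
# log_in(opener, LOGIN_URL, {"username": "...", "password": "..."})
# html = fetch_html(opener, TABLE_URL)
# df = pd.read_html(io.StringIO(html))[0]
```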

There is a related question here.

Regarding "python - Pandas read_html() missing columns", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41394409/
