
python - Error when using read_html() in Pandas

Reposted. Author: 行者123. Updated: 2023-11-28 18:34:11

This code gives me an error:

Code:

import pandas as pd

fiddy_states = pd.read_html("https://simple.wikipedia.org/wiki/List_of_U.S._states")

Error:

---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-9-87a39d7446f6> in <module>()
1 import pandas as pd
----> 2 df_states = pd.read_html('http://www.50states.com/abbreviations.htm#.Vmz0ZkorLIU')

C:\Anaconda3\lib\site-packages\pandas\io\html.py in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, tupleize_cols, thousands, encoding)
864 _validate_header_arg(header)
865 return _parse(flavor, io, match, header, index_col, skiprows,
--> 866 parse_dates, tupleize_cols, thousands, attrs, encoding)

C:\Anaconda3\lib\site-packages\pandas\io\html.py in _parse(flavor, io, match, header, index_col, skiprows, parse_dates, tupleize_cols, thousands, attrs, encoding)
716 retained = None
717 for flav in flavor:
--> 718 parser = _parser_dispatch(flav)
719 p = parser(io, compiled_match, attrs, encoding)
720

C:\Anaconda3\lib\site-packages\pandas\io\html.py in _parser_dispatch(flavor)
661 if flavor in ('bs4', 'html5lib'):
662 if not _HAS_HTML5LIB:
--> 663 raise ImportError("html5lib not found, please install it")
664 if not _HAS_BS4:
665 raise ImportError("BeautifulSoup4 (bs4) not found, please install it")

ImportError: html5lib not found, please install it

This happens even though I have installed and updated the html5lib, lxml, and BeautifulSoup4 libraries.
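One possible cause of this symptom is that the packages were installed into a different Python environment than the one running the notebook. The following is a minimal sketch for checking that, using only the standard library; `find_parsers` is a hypothetical helper name, and the module names assume BeautifulSoup4 is imported as `bs4`:

```python
import importlib.util
import sys


def find_parsers(names=("html5lib", "bs4", "lxml")):
    """Map each module name to True if this interpreter can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# The interpreter actually running this code; compare it with the one
# that pip or conda installed the packages into.
print("Interpreter:", sys.executable)
for name, found in find_parsers().items():
    print(f"{name}: {'found' if found else 'NOT found'}")
```

If a name is reported as not found, installing it from the same environment (for example with `python -m pip install html5lib`) and restarting the kernel is one thing to try.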

Best answer

Consider parsing the HTML table with lxml using XPath expressions, then combining the resulting lists into a data frame:

import urllib.request as rq
import lxml.etree as et
import pandas as pd

# DOWNLOAD WEB PAGE CONTENT
rqpage = rq.urlopen('https://simple.wikipedia.org/wiki/List_of_U.S._states')
txtpage = rqpage.read()
dom = et.HTML(txtpage)

# XPATH EXPRESSIONS TO LISTS (SKIPPING HEADER ROW)
abbreviation = dom.xpath("//table[@class='wikitable']/tr[position()>1]/td[1]/b/text()")
state = dom.xpath("//table[@class='wikitable']/tr[position()>1]//td[2]/a/text()")
capital = dom.xpath("//table[@class='wikitable']/tr[position()>1]//td[3]/a/text()")
incorporated = dom.xpath("//table[@class='wikitable']/tr[position()>1]//td[4]/text()")

# CONVERT LISTS TO DATA FRAME
df = pd.DataFrame({'Abbreviation': abbreviation,
                   'State': state,
                   'Capital': capital,
                   'Incorporated': incorporated})

print(df.head())

# Abbreviation Capital Incorporated State
#0 AL Montgomery December 14, 1819 Alabama
#1 AK Juneau January 3, 1959 Alaska
#2 AZ Phoenix February 14, 1912 Arizona
#3 AR Little Rock June 15, 1836 Arkansas
#4 CA Sacramento September 9, 1850 California
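Because each column is extracted by a separate XPath query, the lists can end up with different lengths (for example, if a cell lacks the `<a>` or `<b>` element the expression expects), in which case `pd.DataFrame` raises a `ValueError`. Here is a small sketch of a check to run before building the data frame; `check_equal_lengths` is a hypothetical helper, not part of pandas or lxml, and the sample data is illustrative only:

```python
def check_equal_lengths(columns):
    """Return the shared length of all lists in `columns` (a dict of
    name -> list), or raise ValueError reporting the differing lengths."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")
    return next(iter(lengths.values()), 0)


# Illustrative data only; in the answer these lists come from dom.xpath(...).
columns = {
    "Abbreviation": ["AL", "AK"],
    "State": ["Alabama", "Alaska"],
}
print(check_equal_lengths(columns))  # prints 2
```

When the check fails, the reported lengths show which XPath expression matched fewer or more cells than the others.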

Regarding python - Error when using read_html() in Pandas, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34059323/
