
python - How to get rid of \ufeff in a parsed HTML page

Reposted. Author: 太空宇宙. Updated: 2023-11-03 20:41:40

The code is:

from bs4 import BeautifulSoup

# Jupyter/IPython shell escape: download the page to a local file
!wget -q -O 'boroughs.html' "https://en.wikipedia.org/wiki/List_of_London_boroughs"

with open('boroughs.html', encoding='utf-8-sig') as fp:
    soup = BeautifulSoup(fp, "lxml")

data = []
table = soup.find("table", {"class": "wikitable sortable"})
table_body = table.find('tbody')
rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [col.text.strip() for col in cols]
    data.append(cols)  # one list of cell strings per row (empty cells are kept)
data

After some research, I added encoding='utf-8-sig' to the open() call. But I still see \ufeff characters in the output.

What puzzles me is that I even tried the hacky approach

df = df.replace(u'\ufeff', '')

after loading the data into a pandas DataFrame,

and the characters are still there.
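A likely reason the replace call appeared to do nothing: with its default arguments, pandas DataFrame.replace only replaces cells whose entire value equals the search string, so a \ufeff embedded in a longer string is left alone. Passing regex=True makes it replace matches inside strings as well. The sketch below uses a small made-up DataFrame (the column names and values are assumptions for illustration, not the actual Wikipedia data):

```python
import pandas as pd

# Made-up sample data: one cell ends with U+FEFF, another consists only of U+FEFF.
df = pd.DataFrame({
    "borough": ["Barking\ufeff", "\ufeff"],
    "area": ["13.93", "33.49"],
})

# Default behaviour: only a cell that is exactly "\ufeff" gets replaced.
whole_cell = df.replace("\ufeff", "")

# regex=True: every occurrence inside any string cell gets replaced.
cleaned = df.replace("\ufeff", "", regex=True)

print(repr(whole_cell.loc[0, "borough"]))
print(repr(cleaned.loc[0, "borough"]))
```

If only one column is affected, calling .str.replace("\ufeff", "") on that column should work as well.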

Best answer

Try the following:

with open('boroughs.html', encoding='utf-8-sig') as fp:
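Some background that may explain why this alone is not always enough: the utf-8-sig codec only skips a byte order mark at the very start of the input. Any U+FEFF that appears later in the text (for example inside a table cell) survives decoding, and str.strip() does not remove it because Python does not treat U+FEFF as whitespace. The following is a minimal sketch, not part of the original answer. The helper name clean_cell and the sample strings are made up for illustration:

```python
# Sample bytes with a BOM at the start and a second U+FEFF in the middle.
raw = "\ufeffBarking\ufeff and Dagenham".encode("utf-8")

# utf-8-sig drops only the leading BOM; the embedded U+FEFF is kept.
decoded = raw.decode("utf-8-sig")


def clean_cell(text: str) -> str:
    """Remove every U+FEFF from the text, then trim surrounding whitespace."""
    return text.replace("\ufeff", "").strip()


cells = [decoded, " 13.93\ufeff "]
cleaned = [clean_cell(cell) for cell in cells]

print(repr(decoded))
print(cleaned)
```

In the parsing loop from the question, the same idea would mean using col.text.replace('\ufeff', '').strip() in place of col.text.strip().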

Regarding python - How to get rid of \ufeff in a parsed HTML page, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56825888/
