python - How to change the unicode format when using get_text() in Beautiful Soup

Reposted. Author: 太空宇宙. Updated: 2023-11-03 17:56:55

I am getting unicode-formatted output when using get_text(). How can I change the unicode values in the DataFrame to plain strings?

I need the text in a proper format to keep the data clean. Here is my code:

import requests
from pattern import web  # imported in the original, not used below
from bs4 import BeautifulSoup
from pandas import *

url = 'http://www.mouthshut.com/product-reviews/amazonin-reviews-925670774-srch'
r = requests.get(url)
bs = BeautifulSoup(r.text)
mouthrev = []
Title = []
# Each review sits in an <li> with the classes "reviewdetails openshare".
for revlist in bs.find_all("li", "reviewdetails openshare"):
    title = revlist.find_all('div', 'reviewtitle fl')
    # get_text() returns unicode, so title becomes a list of unicode strings.
    title = [g.get_text(strip=True) for g in title]

    for parent in revlist.find_all("div", itemprop='description'):
        review = parent.find_all('p')
        review = [g.get_text(strip=True) for g in review]
        # Whole lists are appended, so every DataFrame cell holds a list.
        mouthrev.append(review)
        Title.append(title)


mouth1 = DataFrame({'Title': Series(Title), 'Review': Series(mouthrev)})
mouth1.to_csv('D:\\Review.csv')

I get this result:

Title   Review
[u'Wrong product need immediate refund'] [u'I have been shopping with amazon for almost 6 months now and for the 1st time I ordered a Tuxedo. Looking at the item online it seemed perfect. My actual size for the suit is 40 which fits me perfectly. I ordered for the same size. Firstly the delivery didnt happen though I received a text statin ...']
[u'Cheating customers by sending a dummy tracking no.'] [u'Order #171-0709329-6021113( amazon.in)', u'I have placed this order on 15th Jan 2015 and I received a mail from amazon on 15th Jan 2015 itself as my order has shipped. Also I have received a tracking number of Speed Post.', u'Today it is 03rd Feb 2015, till now there is no status/details a...']
[u'BAD in Delivery. Unpredictable delivery date/time.'] [u'If Ordering from Amazon.In, be prepared for Delivery nightmares.', u'The Delivery team does NOT call you up before coming.', u'Amazon does send you Courier persons name and mobile. My experience has been is that this information is not reliable(Happened to me twice that the Delivery person I ...']

Best answer

If I understand you correctly, why not use str()?

review = [str(g.get_text(strip=True)) for g in review]

This will work.
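One thing worth noting: the u'...' markers in the output are the Python 2 repr of a list of unicode strings, because each cell holds a list rather than a single string. Wrapping each element in str() removes the u prefix but leaves the brackets and quotes, and in Python 2 str() raises UnicodeEncodeError on non-ASCII text. A possible alternative, not part of the original answer, is to join each list into one string before building the DataFrame. The sketch below assumes the get_text() results have already been collected into lists; the helper name join_texts and the sample data are hypothetical.

```python
# Hypothetical helper: collapse a list of text fragments into one string,
# so a table cell holds plain text instead of a list.
def join_texts(fragments, sep=" "):
    """Join the non-empty fragments into a single string."""
    return sep.join(f.strip() for f in fragments if f and f.strip())


# Sample data standing in for the per-review lists built from get_text().
titles = [["Wrong product need immediate refund"]]
reviews = [["Order placed on 15th Jan.", "Still no status."]]

# One dict per review, each value a plain string.
rows = [
    {"Title": join_texts(t), "Review": join_texts(r)}
    for t, r in zip(titles, reviews)
]

print(rows[0]["Review"])
```

A list of dicts like rows can be passed straight to DataFrame(rows), which should yield one string per cell in the resulting CSV.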

Regarding "python - How to change the unicode format when using get_text() in Beautiful Soup", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/28337061/
