
python - Removing unicode from text in Pandas

Reposted. Author: 行者123. Updated: 2023-12-03 18:15:53

For a single string, the code below removes the unicode characters and the newline/carriage-return characters:

import sys

# Python 2: a plain string literal is a byte string, so it has .decode()
t = "We've\xe5\xcabeen invited to attend TEDxTeen, an independently organized TED event focused on encouraging youth to find \x89\xdb\xcfsimply irresistible\x89\xdb\x9d solutions to the complex issues we face every day.,"

# Decode to unicode, then drop everything that ASCII cannot represent
t2 = t.decode('unicode_escape').encode('ascii', 'ignore').strip()
sys.stdout.write(t2.strip('\n\r'))
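
The snippet above relies on Python 2, where `str` is a byte string and therefore has a `.decode()` method. A minimal sketch of the same idea in Python 3, assuming the text is already a `str` and that dropping every non-ASCII character is acceptable (the helper name `strip_non_ascii` and the shortened sample string are hypothetical, not from the original post):

```python
def strip_non_ascii(text):
    """Drop every character outside ASCII, then trim surrounding whitespace."""
    # encode(..., 'ignore') silently discards characters ASCII cannot represent;
    # decode('ascii') turns the resulting bytes back into a str.
    return text.encode('ascii', 'ignore').decode('ascii').strip()


# Shortened, made-up variant of the string from the question
sample = "We've\xe5\xcabeen invited to \x89\xdb\xcfsimply irresistible\x89\xdb\x9d solutions.\r\n"
cleaned = strip_non_ascii(sample)
print(cleaned)
```

Note that this removes the characters outright rather than replacing them with a space, which is why "We've" and "been" end up joined, just as in the original output.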

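But when I try to write a function in Pandas that applies this to every cell of a column, it either fails with an AttributeError, or I get a warning that a value is trying to be set on a copy of a slice from a DataFrame: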
def clean_text(row):
    row = row["text"].decode('unicode_escape').encode('ascii', 'ignore')#.strip()
    import sys
    sys.stdout.write(row.strip('\n\r'))
    return row

Applied to my DataFrame:
df["text"] = df.apply(clean_text, axis=1)

How can I apply this code to every element of a Series?

Best answer

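The problem seems to be that you are accessing and modifying row['text'] and returning the row itself inside the function passed to apply. When you call apply on a DataFrame, the function is applied to each Series (each column by default), so changing it to the following should help: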

import pandas as pd

df = pd.DataFrame([t for _ in range(5)], columns=['text'])

df
text
0 We've������been invited to attend TEDxTeen, an ind...
1 We've������been invited to attend TEDxTeen, an ind...
2 We've������been invited to attend TEDxTeen, an ind...
3 We've������been invited to attend TEDxTeen, an ind...
4 We've������been invited to attend TEDxTeen, an ind...
def clean_text(row):
    # return the list of decoded cells in the Series instead
    return [r.decode('unicode_escape').encode('ascii', 'ignore') for r in row]

df['text'] = df.apply(clean_text)

df
text
0 We'vebeen invited to attend TEDxTeen, an indep...
1 We'vebeen invited to attend TEDxTeen, an indep...
2 We'vebeen invited to attend TEDxTeen, an indep...
3 We'vebeen invited to attend TEDxTeen, an indep...
4 We'vebeen invited to attend TEDxTeen, an indep...
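
To see why the row-wise and column-wise versions behave differently, it can help to inspect what `apply` hands to the function. The sketch below uses a small made-up DataFrame (its column `n` and its values are assumptions for illustration): by default `DataFrame.apply` passes each column as a Series, while `axis=1` passes each row as a Series indexed by column name.

```python
import pandas as pd

# Made-up data, only to show what apply() passes to the function
df = pd.DataFrame({'text': ['a', 'b'], 'n': [1, 2]})

# Default axis=0: the function receives one column (a Series) at a time,
# so .name is the column label.
col_names = list(df.apply(lambda col: col.name))

# axis=1: the function receives one row (a Series) at a time,
# so .name is the row's index label.
row_labels = list(df.apply(lambda row: row.name, axis=1))

print(col_names)
print(row_labels)
```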

Alternatively, you can use a lambda as follows and apply it directly to the text column only:
df['text'] = df['text'].apply(lambda x: x.decode('unicode_escape').\
                                          encode('ascii', 'ignore').\
                                          strip())
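
In Python 3 the column would hold `str` values, which have no `.decode()` method, so the calls above would raise an AttributeError there. One possible adaptation, assuming that dropping all non-ASCII characters is what is wanted, uses the vectorized `.str` accessor instead of `apply` (the sample rows are made up):

```python
import pandas as pd

# Made-up sample rows; the first mimics the stray characters from the question
df = pd.DataFrame({'text': ["We've\xe5\xcabeen invited\r\n", "plain text"]})

df['text'] = (
    df['text']
    .str.encode('ascii', errors='ignore')  # str -> bytes, non-ASCII dropped
    .str.decode('ascii')                   # bytes -> str
    .str.strip()                           # trim whitespace, including \r and \n
)

result = df['text'].tolist()
print(result)
```

Assigning the result back to `df['text']` on the original DataFrame, rather than on a slice of it, is also the usual way to avoid the "value is trying to be set on a copy of a slice" warning mentioned in the question.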

Regarding python - removing unicode from text in Pandas, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/30337402/
