
python - Add new columns that collect all uppercase words in a phrase into a list for each row

Reposted. Author: 行者123. Updated: 2023-12-02 16:34:02

I have a dataset similar to the following:

   Date                 Sentence                                 Text                     Verified
0  2020-01-18 00:00:00  LUKE Diamond is a famous                 Updates · BREAKING News  False
1  2020-01-18 00:00:00  Blog - TASTY YUMMIES                     Brush with ...           False
2  2020-01-18 00:00:00  ACNE Alternative Remedies: Manuka HONEY  Learn more from WEBMD    False
3  2020-01-18 00:00:00  Looking back at 10 YEARS                 As the LOCAL community   False

I want to pick out the uppercase words that may appear in Sentence or in Text, and save the results row by row in two separate columns:

   Date                 Sentence                                 Text                     Verified  CS                    CT
0  2020-01-18 00:00:00  LUKE Diamond is a famous                 Updates · BREAKING News  False     ['LUKE']              ['BREAKING']
1  2020-01-18 00:00:00  Blog - TASTY YUMMIES                     Brush with ...           False     ['TASTY', 'YUMMIES']  []
2  2020-01-18 00:00:00  ACNE Alternative Remedies: Manuka HONEY  Learn more from WEBMD    False     ['ACNE', 'HONEY']     ['WEBMD']
3  2020-01-18 00:00:00  Looking back at 10 YEARS                 As the LOCAL community   False     ['YEARS']             ['LOCAL']

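I tried the following, which should build a list of uppercase words row by row, but I get this error on the caps assignment: TypeError: expected string or bytes-like object. Could you tell me what is not working and how to get the expected output? Many thanks.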

import re

def s_l(file):
    s = []

    for c in file['Sentence'].tolist():
        caps = re.findall('([A-Z]+(?:(?!\s?[A-Z][a-z])\s?[A-Z])+)', c)
        s.append(caps)
    for c in file['Text'].tolist():
        caps = re.findall('([A-Z]+(?:(?!\s?[A-Z][a-z])\s?[A-Z])+)', c)
        s.append(caps)

    file['CS'] = pd.Series(s)
    s = [x for x in s if x != [] and len(x)>1]
    file['CT'] = pd.Series(s)
    s = [x for x in s if x != [] and len(x)>1]

    return file, s

s_df, s =s_l(df)

I think my mistake is collecting the words for both columns in a single list, so I should probably add a second list (not just s).
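The error message itself can be reproduced without pandas. The sketch below assumes, since the question does not show the offending row, that at least one cell in Sentence or Text holds a non-string value such as NaN; the helper name find_caps is made up for illustration.

```python
import re

# Same pattern as in the question, written as a raw string.
PATTERN = r'([A-Z]+(?:(?!\s?[A-Z][a-z])\s?[A-Z])+)'

def find_caps(value):
    """Return the regex matches, or the error text if `value` is not a string.

    Hypothetical helper, used only to show when re.findall raises.
    """
    try:
        return re.findall(PATTERN, value)
    except TypeError as exc:
        return f"TypeError: {exc}"

ok = find_caps("Blog - TASTY YUMMIES")  # an ordinary string cell
bad = find_caps(float("nan"))           # assumed: a missing cell, stored as a float NaN

print(ok)
print(bad)
```

If that assumption holds, converting each cell to a string before matching, or skipping non-string cells, would avoid the exception.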

Best answer

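Create a lambda m that wraps a list comprehension. It compares each word with its .upper() form to keep only all-uppercase words, and checks .isalpha() so that tokens which .upper() leaves unchanged, such as numbers and punctuation, are not picked up. Converting each cell with str(x) first also lets the function handle non-string cells such as NaN, which is the likely cause of the TypeError in the question. Then create the new columns by passing m to .apply():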
m = lambda x: [y for y in str(x).split(' ') if y.upper() == y and y.isalpha()]
df['CS'] = df['Sentence'].apply(m)
df['CT'] = df['Text'].apply(m)

Output:

   Date                 Sentence                                 Text                     Verified  CS                    CT
0  2020-01-18 00:00:00  LUKE Diamond is a famous                 Updates · BREAKING News  False     ['LUKE']              ['BREAKING']
1  2020-01-18 00:00:00  Blog - TASTY YUMMIES                     Brush with ...           False     ['TASTY', 'YUMMIES']  []
2  2020-01-18 00:00:00  ACNE Alternative Remedies: Manuka HONEY  Learn more from WEBMD    False     ['ACNE', 'HONEY']     ['WEBMD']
3  2020-01-18 00:00:00  Looking back at 10 YEARS                 As the LOCAL community   False     ['YEARS']             ['LOCAL']
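For readers who want to run the approach end to end, here is a minimal self-contained sketch. The first four rows are transcribed from the question; the fifth row with missing values is hypothetical and is there only to show how str() handles non-string cells. The function name upper_words is my own; it applies the same test as the lambda m above.

```python
import pandas as pd

def upper_words(value):
    """Return the all-uppercase alphabetic words found in `value`."""
    # str() turns non-string cells (NaN, numbers) into text, so nothing raises.
    return [w for w in str(value).split(" ") if w.upper() == w and w.isalpha()]

df = pd.DataFrame({
    "Sentence": [
        "LUKE Diamond is a famous",
        "Blog - TASTY YUMMIES",
        "ACNE Alternative Remedies: Manuka HONEY",
        "Looking back at 10 YEARS",
        float("nan"),  # hypothetical missing cell, not in the original data
    ],
    "Text": [
        "Updates · BREAKING News",
        "Brush with ...",
        "Learn more from WEBMD",
        "As the LOCAL community",
        float("nan"),  # hypothetical missing cell, not in the original data
    ],
})

# One list per row and per source column.
df["CS"] = df["Sentence"].apply(upper_words)
df["CT"] = df["Text"].apply(upper_words)

print(df[["CS", "CT"]])
```

A named function is used here instead of a lambda only so it can carry a docstring. Note that this test also keeps single-letter uppercase words such as "A" or "I"; adding a len(w) > 1 condition would drop them if that is not wanted.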

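Source: this question, "python - Add new columns that collect all uppercase words in a phrase into a list for each row", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/63044804/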
