
python - Pandas advice: slicing data across two columns and outputting new values

Reposted. Author: 行者123. Updated: 2023-12-01 09:26:26

Suppose we have the following DataFrame:

import pandas as pd
df = pd.read_csv('subjects.csv')
Col A, Interest, Col Start, Col Go, Col Learn,
Learn English Lit
Go Mathematics
Start Science
Learn Science
Go English
Start Math
Learn Math
Go Biology
Start English

I wrote some code to extract the interests from a similar dataset, as shown below:

#Map Interests 
Mapper = ['English', 'Math', 'Maths', 'Mathematics', 'Biology', 'Science']
#Join Mapper to Interest Column
pat = '|'.join(r"\b{}\b".format(x) for x in Mapper)
df['interest'] = df['col A'].str.extract('('+ pat + ')', expand=False)


#Align interest names by creating a dict and replacing values
d = {'English Lit' : 'English', 'Biology' : 'Science', 'Mathematics' : 'Maths'}
df['Interests'] = df['interest'].replace(d)

>>> Output:

Col A              Interest  Col Start  Col Go  Col Learn
Learn English Lit  English
Go Mathematics     Maths
Start Science      Science
Learn Science      Science
Go English         English
Start Math         Maths
Learn Math         Maths
Go Biology         Science
Start English      English
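For readers who want to reproduce this step without subjects.csv, here is a minimal sketch that builds the same rows inline. The inline data and the name `aliases` are assumptions made for illustration, and the 'Math' to 'Maths' entry is added (it is not in the dict above) so that the result matches the output shown.

```python
import pandas as pd

# Inline stand-in for subjects.csv (assumption: only the free-text column matters here).
df = pd.DataFrame({"col A": [
    "Learn English Lit", "Go Mathematics", "Start Science",
    "Learn Science", "Go English", "Start Math",
    "Learn Math", "Go Biology", "Start English",
]})

subjects = ["English", "Math", "Maths", "Mathematics", "Biology", "Science"]
# \b word boundaries stop "Math" from matching inside "Mathematics".
pat = "|".join(r"\b{}\b".format(s) for s in subjects)

# str.extract keeps only the FIRST match in each row.
df["interest"] = df["col A"].str.extract("(" + pat + ")", expand=False)

# Normalise spelling variants; "Math" -> "Maths" is an added assumption.
aliases = {"Math": "Maths", "Mathematics": "Maths", "Biology": "Science"}
df["Interests"] = df["interest"].replace(aliases)

print(df)
```

Because `str.extract` returns a single match per row, this works for the one-subject rows above but would not capture every subject in a row that names several.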

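Now I need to measure the engagement level in Col A based on the keyword and the interest.

I have done it as shown below, but I'm sure there is a better way to do this.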

df['Col Start'][df['col A'].str.contains("Learn", na=False) & df['interest'].str.contains("Science")] = 'Learn'

Also, what is the best way to append multiple values into one column? For example, if I have:

Col A                         
Learn Science, Math, Biology.

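I would like the keyword + interest combinations mapped into a new column, with the values separated by commas. This is where my current script breaks down: it overwrites the previous value with the new one, whereas I am trying to capture all engagement levels (if that makes sense...).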

Col A                         Col B
Learn Science, Math, Biology. Learn S, Learn M, Learn B

Any help would be greatly appreciated. (Please be gentle, I've only been coding since February!)

Edit for clarity:

df.loc[df['col A'].str.contains("Learn", na=False) & df['interest'].str.contains("Science"), 'Col Start'] = 'Learn S'
df.loc[df['col A'].str.contains("Learn", na=False) & df['interest'].str.contains("English"), 'Col Start'] = 'Learn E'
df.loc[df['col A'].str.contains("Learn", na=False) & df['interest'].str.contains("Math"), 'Col Start'] = 'Learn M'


Col A                Col Learn
Learn Science, Math  Learn S, Learn M
Learn Math, English  Learn M, Learn E
Learn Science        Learn S

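In my DataFrame, Col A and Interest may overlap and produce duplicate outputs. What I want is to capture all of them rather than overwrite them, appending each new entry with a comma.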
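To make the overwrite problem concrete, here is a small sketch that runs the same kind of sequential mask assignment twice: once replacing the cell and once appending to it. The column names `overwritten` and `appended` and the `labels` list are hypothetical, chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"col A": ["Learn Science, Math", "Learn Math, English", "Learn Science"]})
is_learn = df["col A"].str.contains("Learn", na=False)

# (subject to look for, label to record); hypothetical pairs based on the question.
labels = [("Science", "Learn S"), ("English", "Learn E"), ("Math", "Learn M")]

df["overwritten"] = ""
df["appended"] = ""
for subject, label in labels:
    mask = is_learn & df["col A"].str.contains(subject, na=False)
    # Plain assignment: whatever an earlier iteration wrote is lost.
    df.loc[mask, "overwritten"] = label
    # Append instead: keep the old value and add the new label after a comma.
    df.loc[mask, "appended"] = df.loc[mask, "appended"].apply(
        lambda old: f"{old}, {label}" if old else label
    )

print(df)
```

Note that the appended labels follow the order of the `labels` list, not the order in which the subjects appear in the text, so the second row comes out as "Learn E, Learn M".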

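Best answer

I think you need findall if you want to extract all the values from the list, together with a list comprehension and join to prepend the string Learn: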

#it is better to use loc to set values in the new column
df.loc[df['col A'].str.contains("Learn", na=False) & df['interest'].str.contains("Science"), 'Col Start'] = 'Learn'

#findall returns every match per row; prefix each with Learn and join with commas
df['new'] = df['col A'].str.findall('('+ pat + ')').apply(lambda x: ', '.join(['Learn ' + y for y in x]))
print(df)

   col A                          interest     Interests  Col Start  \
0  Learn English Lit              English      English    NaN
1  Go Mathematics                 Mathematics  Maths      NaN
2  Start Science                  Science      Science    NaN
3  Learn Science                  Science      Science    Learn
4  Go English                     English      English    NaN
5  Start Math                     Math         Math       NaN
6  Learn Math                     Math         Math       NaN
7  Go Biology                     Biology      Science    NaN
8  Learn Science, Math, Biology.  Science      Science    Learn

   new
0  Learn English
1  Learn Mathematics
2  Learn Science
3  Learn Science
4  Learn English
5  Learn Math
6  Learn Math
7  Learn Biology
8  Learn Science, Learn Math, Learn Biology
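The difference between the two string methods is what makes row 8 work. A minimal side-by-side sketch, using a shortened pattern and two sample strings picked for illustration:

```python
import pandas as pd

s = pd.Series(["Learn Science, Math, Biology.", "Go English"])
# Non-capturing group so findall returns the whole match for each hit.
pat = r"\b(?:Science|Math|Biology|English)\b"

# extract: one value per row, the first match only.
first_only = s.str.extract("(" + pat + ")", expand=False)

# findall: a list per row holding every match, in order of appearance.
all_matches = s.str.findall(pat)

# str.join collapses each list back into a single comma-separated string.
joined = all_matches.str.join(", ")

print(first_only.tolist())
print(joined.tolist())
```

`findall` keeps the matches in the order they occur in the text, which is why the joined output mirrors the original wording.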

Edit:

print(df)
   col A                Col Learn
0  Learn Science, Math  Learn S, Learn M
1  Learn Math, English  Learn M, Learn E
2  Learn Science        Learn S
3  Science              val

import numpy as np

#map each subject name to its short code
d = {'Science':'S', 'English':'E', 'Math':'M'}
#rows that contain the keyword Learn
mask = df['col A'].str.contains("Learn", na=False)
#find every subject listed in the dict keys, look up its code, and join the results prefixed with Learn
s = (df['col A'].str.findall('('+ '|'.join(d.keys()) + ')')
       .apply(lambda x: ', '.join(['Learn ' + d[y] for y in x])))

#use the joined string where Learn is present, otherwise keep the original value
df['new'] = np.where(mask, s, df['col A'])
print(df)
   col A                Col Learn         new
0  Learn Science, Math  Learn S, Learn M  Learn S, Learn M
1  Learn Math, English  Learn M, Learn E  Learn M, Learn E
2  Learn Science        Learn S           Learn S
3  Science              val               Science
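The code above hard-codes Learn. Since the sample data also contains Go and Start, one possible extension is to read the keyword from each row. This is only a sketch under assumptions: the function name `engagement`, the `KEYWORDS` list, and the sample rows are hypothetical and not part of the answer.

```python
import re
import pandas as pd

KEYWORDS = ["Learn", "Go", "Start"]                   # assumed from the sample data
CODES = {"Science": "S", "English": "E", "Math": "M"}  # same mapping as the dict above

keyword_re = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b")
subject_re = re.compile("|".join(CODES))

def engagement(text):
    """Return 'Keyword Code' pairs joined by commas, or the input if no keyword is found."""
    if not isinstance(text, str):
        return text                      # leave NaN and other non-strings untouched
    kw = keyword_re.search(text)
    if kw is None:
        return text                      # no keyword: keep the original value
    return ", ".join(f"{kw.group(1)} {CODES[m]}" for m in subject_re.findall(text))

df = pd.DataFrame({"col A": ["Learn Science, Math", "Go Math, English", "Start Science", "Science"]})
df["new"] = df["col A"].apply(engagement)
print(df)
```

A row that has a keyword but no recognised subject would yield an empty string here; whether that or the original text is preferable depends on the dataset.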

Regarding python - Pandas advice: slicing data across two columns and outputting new values, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50346992/
