
python - What is the correct way to bulk-add values to cells depending on other cells

Reposted. Author: 行者123. Updated: 2023-12-01 09:18:13

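Please suggest the correct way to bulk-add values to cells depending on other cells.
I have a csv file, and I need to check whether column 1 of each row contains the keyword AAA, BBB, or CCC; if it does, add the string XXX to the cell in column 3 and the string YYY to the cell in column 4 of the same row.
If column 1 of a row contains the keyword DDD, EEE, or FFF, add the string VVV to the cell in column 3 and the string WWW to the cell in column 4 of the same row.

There are about a thousand keywords, and they can appear in any letter case.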

Original csv:

1,"AAA 329823 3298","23"
2,"BBB 87682 23423","64"
3,"ccc 73838 72653","45"
4,"DDD 86738 86398","23"
5,"EEE 64833 34322","45"

What I want:

1,"AAA 329823 3298","23",XXX,YYY
2,"BBB 87682 23423","64",XXX,YYY
3,"ccc 73838 72653","45",XXX,YYY
4,"DDD 86738 86398","23","VVV","WWW"
5,"EEE 64833 34322","45","VVV","WWW"

Right now I use the following code, but I think it is the wrong approach:

import csv

# r_file and w_file are the input and output paths
with open(r_file, 'r', newline='') as csvinput:
    with open(w_file, 'w', newline='') as csvoutput:
        writer = csv.writer(csvoutput)
        reader = csv.reader(csvinput)

        for row in reader:
            if any(c in row[1] for c in ("AAA", "BBB", "CCC")):
                row.append("XXX")
                row.append("YYY")
            if any(c in row[1] for c in ("DDD", "EEE", "FFF")):
                row.append("VVV")
                row.append("WWW")
            # write each row back out, whether or not it was extended
            writer.writerow(row)

Best answer

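You can use numpy.select to set values by multiple conditions, with str.contains to check for substrings. Note that contains is case-sensitive by default, so the lowercase ccc row is left empty in the output below: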

import numpy as np

# convert to a NumPy array before [:, None]; recent pandas versions
# no longer support multi-dimensional indexing on a Series
m1 = df[1].str.contains("AAA|BBB|CCC").to_numpy()[:, None]
m2 = df[1].str.contains("DDD|EEE|FFF").to_numpy()[:, None]

df[[3,4]] = pd.DataFrame(np.select([m1, m2], [['XXX','YYY'],['VVV','WWW']], ['','']))
print (df)
   0                1   2    3    4
0  1  AAA 329823 3298  23  XXX  YYY
1  2  BBB 87682 23423  64  XXX  YYY
2  3  ccc 73838 72653  45
3  4  DDD 86738 86398  23  VVV  WWW
4  5  EEE 64833 34322  45  VVV  WWW
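
The question says the keywords can appear in any case, and the lowercase ccc row is expected to receive XXX and YYY. A minimal sketch of one way to get that with the same numpy.select idea, assuming it is acceptable to pass case=False to str.contains; the names csv_text and filled are illustrative and not part of the original answer:

```python
import io

import numpy as np
import pandas as pd

# Sample data from the question; header=None because the file has no header row.
csv_text = '''1,"AAA 329823 3298","23"
2,"BBB 87682 23423","64"
3,"ccc 73838 72653","45"
4,"DDD 86738 86398","23"
5,"EEE 64833 34322","45"'''
df = pd.read_csv(io.StringIO(csv_text), header=None)

# case=False makes the pattern match regardless of letter case,
# so "ccc" is treated like "CCC".
m1 = df[1].str.contains("AAA|BBB|CCC", case=False).to_numpy()[:, None]
m2 = df[1].str.contains("DDD|EEE|FFF", case=False).to_numpy()[:, None]

# Each (n, 1) mask broadcasts against a 2-element choice, giving an (n, 2) array.
filled = np.select([m1, m2], [["XXX", "YYY"], ["VVV", "WWW"]], default="")
df[3] = filled[:, 0]
df[4] = filled[:, 1]
print(df)
```

With only two groups this stays readable; with about a thousand keywords the patterns become long, which is what the dictionary-based approach further down addresses.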

Setup:

If the csv has no header, use the header=None parameter:

from io import StringIO

import pandas as pd

temp=u'''1,"AAA 329823 3298","23"
2,"BBB 87682 23423","64"
3,"ccc 73838 72653","45"
4,"DDD 86738 86398","23"
5,"EEE 64833 34322","45"'''
# after testing, replace StringIO(temp) with 'filename.csv'
# (pd.compat.StringIO was removed from pandas, so io.StringIO is used here)
df = pd.read_csv(StringIO(temp), header=None)

print (df)

   0                1   2
0  1  AAA 329823 3298  23
1  2  BBB 87682 23423  64
2  3  ccc 73838 72653  45
3  4  DDD 86738 86398  23
4  5  EEE 64833 34322  45

Edit:

#setup dictionary
d = {'AAA':['XXX','YYY'], 'BBB':['XXX','YYY'], 'CCC':['XXX','YYY'],
'DDD':['VVV','WWW'],'EEE':['VVV','WWW'], 'FFF':['VVV','WWW']}

#create DataFrame
df1 = pd.DataFrame.from_dict(d, orient='index', columns=['a','b'])
print (df1)
       a    b
AAA  XXX  YYY
BBB  XXX  YYY
CCC  XXX  YYY
DDD  VVV  WWW
EEE  VVV  WWW
FFF  VVV  WWW

#extract the matching dictionary key from column 1 into a new column
pat = '|'.join(r"\b{}\b".format(x) for x in d.keys())
#expand=False returns a Series, which can be assigned to a single column
df['new'] = df[1].str.extract('(' + pat + ')', expand=False)
print (df)
   0                1   2  new
0  1  AAA 329823 3298  23  AAA
1  2  BBB 87682 23423  64  BBB
2  3  ccc 73838 72653  45  NaN
3  4  DDD 86738 86398  23  DDD
4  5  EEE 64833 34322  45  EEE

#join df1 by column new
df = df.join(df1, on='new')
print (df)
   0                1   2  new    a    b
0  1  AAA 329823 3298  23  AAA  XXX  YYY
1  2  BBB 87682 23423  64  BBB  XXX  YYY
2  3  ccc 73838 72653  45  NaN  NaN  NaN
3  4  DDD 86738 86398  23  DDD  VVV  WWW
4  5  EEE 64833 34322  45  EEE  VVV  WWW
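
The dictionary lookup above matches case-sensitively, so the ccc row ends up with NaN. The following is a rough sketch of a case-insensitive variant that also writes the result back out as csv. It assumes the keywords can be normalised to upper case; groups, lookup, key, and pairs are hypothetical names introduced for illustration, and the grouping of keywords into tuples is just one way to avoid typing the same value pair a thousand times:

```python
import io
import re

import pandas as pd

# Sample data from the question; header=None because the file has no header row.
csv_text = '''1,"AAA 329823 3298","23"
2,"BBB 87682 23423","64"
3,"ccc 73838 72653","45"
4,"DDD 86738 86398","23"
5,"EEE 64833 34322","45"'''
df = pd.read_csv(io.StringIO(csv_text), header=None)

# Hypothetical grouping: each tuple of keywords shares one pair of output values.
groups = [
    (("AAA", "BBB", "CCC"), ("XXX", "YYY")),
    (("DDD", "EEE", "FFF"), ("VVV", "WWW")),
]
# Upper-cased keyword -> pair of values to append.
lookup = {kw.upper(): vals for kws, vals in groups for kw in kws}

# One alternation of escaped keywords, wrapped in word boundaries.
pat = r"\b(" + "|".join(re.escape(kw) for kw in lookup) + r")\b"

# Extract the first keyword found (ignoring case) and normalise it to upper case.
key = df[1].str.extract(pat, flags=re.IGNORECASE, expand=False).str.upper()

# Look up each key; rows without any keyword get two empty strings.
pairs = [lookup.get(k, ("", "")) for k in key]
df[3] = [p[0] for p in pairs]
df[4] = [p[1] for p in pairs]

# To write a file instead, pass a path, e.g. df.to_csv("out.csv", ...).
out = df.to_csv(index=False, header=False)
print(out)
```

Normalising the extracted keyword before the lookup keeps the dictionary small: only one spelling of each keyword has to be stored, regardless of how it is written in the file.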

Regarding "python - What is the correct way to bulk-add values to cells depending on other cells", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51035607/
