python - pandas - Drop rows holding a list of values if the list contains a value from another list

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:47:23

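I have a large amount of data, on the order of 100k rows. I am trying to drop a row from a DataFrame when the list stored in that row contains a value from another DataFrame. Here is a small-scale example.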

import pandas as pd

has = [['@a'], ['@b'], ['#c, #d, #e, #f'], ['@g']]
use = [1,2,3,5]
z = ['#d','@a']
df = pd.DataFrame({'user': use, 'tweet': has})
df2 = pd.DataFrame({'z': z})

              tweet  user
0              [@a]     1
1              [@b]     2
2  [#c, #d, #e, #f]     3
3              [@g]     5

    z
0  #d
1  @a

The desired result is:

  tweet  user
0  [@b]     2
1  [@g]     5

Things I have tried:

#this seems to work for dropping @a but not #d
for a in range(df.tweet.size):
    for search in df2.z:
        if search in df.loc[a].tweet:
            df.drop(a)

#this works for my small scale example but throws an error on my big data
df['tweet'] = df.tweet.apply(', '.join)
test = df[~df.tweet.str.contains('|'.join(df2['z'].astype(str)))]

#the error being "unterminated character set at position 1343770"
#i went to check what was on that line and it returned this
basket.iloc[1343770]

user_id                                 17060480
tweet      [#IfTheyWereBlackOrBrownPeople, #WTF]
Name: 4612505, dtype: object

Any help would be appreciated.

Best answer

Is ['#c, #d, #e, #f'] a single string, or is it meant to be a list like this: ['#c', '#d', '#e', '#f']? The code below assumes the latter.

has = [['@a'], ['@b'], ['#c', '#d', '#e', '#f'], ['@g']]
use = [1,2,3,5]
z = ['#d','@a']
df = pd.DataFrame({'user': use, 'tweet': has})
df2 = pd.DataFrame({'z': z})

A simple solution is:

screen = set(df2.z.tolist())
to_delete = list() # this will speed things up doing only 1 delete
for id, row in df.iterrows():
    if set(row.tweet).intersection(screen):
        to_delete.append(id)
df.drop(to_delete, inplace=True)
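
The same set-intersection idea can also be written as a boolean mask instead of an explicit `iterrows` loop. The following is only a sketch under the assumption that every `tweet` cell is a list of separate strings; the names `keep` and `result` are illustrative, and this variant was not part of the timings reported in this answer.

```python
import pandas as pd

has = [['@a'], ['@b'], ['#c', '#d', '#e', '#f'], ['@g']]
use = [1, 2, 3, 5]
df = pd.DataFrame({'user': use, 'tweet': has})
df2 = pd.DataFrame({'z': ['#d', '@a']})

# Values that disqualify a row
screen = set(df2['z'])

# True for rows whose tag list shares no element with screen
keep = df['tweet'].apply(lambda tags: screen.isdisjoint(tags))

# Boolean indexing returns a new frame; df itself is left unchanged
result = df[keep]
print(result)
```

Unlike `df.drop(..., inplace=True)`, this leaves the original frame untouched; call `result.reset_index(drop=True)` if the index should be renumbered from 0.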

Speed comparison (10,000 rows):

import time

st = time.time()
screen = set(df2.z.tolist())
to_delete = list()
for id, row in df.iterrows():
    if set(row.tweet).intersection(screen):
        to_delete.append(id)
df.drop(to_delete, inplace=True)
print(time.time()-st)
2.142000198364258

st = time.time()
for a in df.tweet.index:
    for search in df2.z:
        if search in df.loc[a].tweet:
            df.drop(a, inplace=True)
            break
print(time.time()-st)
43.99799990653992
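
As for the `str.contains` attempt in the question: "unterminated character set" is a regular-expression error, and the position it reports is an offset into the joined pattern string, not a row number. A likely cause is that some value in `df2['z']` contains a regex metacharacter such as `[`. If the regex approach is preferred, escaping each term with `re.escape` should avoid that. The sketch below uses made-up data (the tag `'#x[y'` is hypothetical, chosen only to trigger the problem).

```python
import re
import pandas as pd

# Hypothetical data: '#x[y' contains '[', which opens a regex character set
df = pd.DataFrame({'user': [1, 2, 3],
                   'tweet': [['@a'], ['#x[y'], ['@g']]})
terms = ['#x[y', '@a']

# Join each tag list into one string, as in the question
joined = df['tweet'].apply(', '.join)

# Escape every term so it is matched literally, then OR them together
pattern = '|'.join(re.escape(t) for t in terms)

# Keep rows whose joined string matches none of the terms
filtered = df[~joined.str.contains(pattern)]
print(filtered)
```

Note that this is substring matching, so a term like `#d` would also match `#dog`; the set-based approach above compares whole tags and does not have that issue.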

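This post is about "python - pandas - Drop rows holding a list of values if the list contains a value from another list"; a similar question was found on Stack Overflow: https://stackoverflow.com/questions/49209155/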
