
python - Pandas: improving an algorithm that finds substrings in a column

Reposted. Author: 行者123. Updated: 2023-11-30 22:49:15

I have a DataFrame, and I am trying to keep only the rows where a certain column contains one of several substrings.

I use:

df_res = pd.DataFrame()
for i in substr:
    res = df[df['event_address'].str.contains(i)]

df looks like:

member_id,event_address,event_time,event_duration
g1497o1ofm5a1963,fotki.yandex.ru/users/atanusha/albums,2015-05-01 00:00:05,8
g1497o1ofm5a1963,9829192.ru/apple-iphone.html,2015-05-01 00:00:15,2
g1497o1ofm5a1963,fotki.yandex.ru/users/atanusha/album/165150?&p=3,2015-05-01 00:00:17,2
g1497o1ofm5a1963,fotki.yandex.ru/tags/%D0%B1%D0%BE%D1%81%D0%B8%D0%BA%D0%BE%D0%BC?text=%D0%B1%D0%BE%D1%81%D0%B8%D0%BA%D0%BE%D0%BC&search_author=utpaladev&&p=2,2015-05-01 00:01:31,10
g1497o1ofm5a1963,3gmaster.net,2015-05-01 00:01:41,6
g1497o1ofm5a1963,fotki.yandex.ru/search.xml?text=%D0%B1%D0%BE%D1%81%D0%B8%D0%BA%D0%BE%D0%BC&&p=2,2015-05-01 00:02:01,6
g1497o1ofm5a1963,fotki.yandex.ru/search.xml?text=%D0%B1%D0%BE%D1%81%D0%B8%D0%BA%D0%BE%D0%BC&search_author=Sunny-Fanny&,2015-05-01 00:02:31,2
g1497o1ofm5a1963,fotki.9829192.ru/apple-iphone.html,2015-05-01 00:03:25,6

substr is:

123.ru/gadgets/communicators
320-8080.ru/mobilephones
3gmaster.net
3-q.ru/products/smartfony/s
9829192.ru/apple-iphone.html
9829192.ru/index.php?cat=1
acer.com/ac/ru/ru/content/group/smartphones
aj.ru

I get the desired result with this code, but it takes too long. I also tried using the column directly (substr is built as substr = urls.url.values.tolist()). I tried

res = df[df['event_address'].str.contains(urls.url)]

but it returns:

TypeError: 'Series' objects are mutable, thus they cannot be hashed

Is there any way to make this faster, or am I doing something wrong?

Best answer

Do this:

def check_exists(x):
    # Return True as soon as any substring occurs in the address.
    for i in substr:
        if i in x:
            return True
    return False

df2 = df.loc[df.event_address.map(check_exists)]
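
For reference, here is a minimal self-contained sketch of the same idea, run on a few rows and substrings copied from the question. The small inline DataFrame is only an illustration (the real df is presumably read from a file), and plain boolean indexing stands in for the indexer used above.

```python
import pandas as pd

# A few rows from the question's sample data (illustrative subset only).
df = pd.DataFrame({
    "member_id": ["g1497o1ofm5a1963"] * 4,
    "event_address": [
        "fotki.yandex.ru/users/atanusha/albums",
        "9829192.ru/apple-iphone.html",
        "3gmaster.net",
        "fotki.9829192.ru/apple-iphone.html",
    ],
})

# A few of the substrings from the question.
substr = ["3gmaster.net", "9829192.ru/apple-iphone.html", "aj.ru"]

def check_exists(x):
    # Plain substring test; stops at the first substring that matches.
    return any(s in x for s in substr)

# Keep only the rows whose address contains at least one substring.
df2 = df[df["event_address"].map(check_exists)]
print(df2["event_address"].tolist())
```

Note that the test is a literal `in` check, so characters such as `.` and `?` in the URLs are matched as-is rather than as regex syntax.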

Or, if you prefer to write it as a single line:

df.loc[df.event_address.map(lambda x: any(i in x for i in substr))]
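
As for the TypeError in the question: `str.contains` expects a single pattern string, not a Series. One possible alternative, sketched here rather than taken from the answer, is to join all the substrings into one regular expression and call `str.contains` once. Each substring is passed through `re.escape` first, because the URLs contain `.` and `?`, which are regex metacharacters. The inline df and the last row in it are assumptions made up for the example.

```python
import re
import pandas as pd

# Illustrative data; the last address is a hypothetical row added to show
# why escaping matters (it contains '?').
df = pd.DataFrame({"event_address": [
    "fotki.yandex.ru/users/atanusha/albums",
    "9829192.ru/apple-iphone.html",
    "3gmaster.net",
    "9829192.ru/index.php?cat=1",
]})

substr = [
    "3gmaster.net",
    "9829192.ru/apple-iphone.html",
    "9829192.ru/index.php?cat=1",
]

# Build one alternation pattern; re.escape makes every substring literal.
pattern = "|".join(re.escape(s) for s in substr)

# na=False treats missing addresses as non-matching.
mask = df["event_address"].str.contains(pattern, regex=True, na=False)
res = df[mask]
print(res["event_address"].tolist())
```

Whether this is faster than the `map` version depends on the number of substrings and rows, so it is worth timing both on the real data.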

Regarding python - Pandas: improving an algorithm that finds substrings in a column, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39847973/
