
python - Pandas: Replace values in a column using regular expressions

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:52:54

I have two DataFrames, and I need to add a new column to the first one using values from the second. The first df is:

ID,"url","used_at","active_seconds"
8075643aab791cec7dc9d18926958b67,"sberbank.ru/ru/person/promo/10mnl?utm_source=Vesti.ru&utm_medium=html&utm_campaign=10_million_users_SBOL_dec2015&utm_term=every14_syncbanners",2016-01-01 00:03:16,183
a04a8041ffa6fe1b85471ca5af1ee575,"online.rsb.ru/hb/faces/system/login/rslogin.jsp?credit=false",2016-01-01 00:04:36,42
a04a8041ffa6fe1b85471ca5af1ee575,"online.rsb.ru/hb/faces/system/login/sms/sms.jsp?smsAuth=true",2016-01-01 00:05:18,22
a04a8041ffa6fe1b85471ca5af1ee575,"online.rsb.ru/hb/faces/rs/RSIndex.jspx",2016-01-01 00:05:40,14
a04a8041ffa6fe1b85471ca5af1ee575,"online.rsb.ru/hb/faces/rs/payments/PaymentReq.jspx",2016-01-01 00:05:54,22
ba880911a6d54f6ea6d3145081a0e0dd,"homecredit.ru/help/quest/feedback.php",2016-01-01 00:06:12,2

The second df looks like this:

URL Code
citibank\.ru\/russia\/info\/rus\/contacts_form\.htm 15
citibank\.ru\/russia\/info\/rus\/contacts\.htm 15
gazprombank\.ru\/contacts\/ 15
gazprombank\.ru\/feedback\/ 15
gazprombank\.ru\/additional_office\/ 15
homecredit\.ru\/help\/quest\/feedback\.php 15
homecredit\.ru\/offices\/* 15

If the URLs were not regular expressions, I would use

df1['code'] = df1.url.map(df2.set_index('URL')['Code'])

But I can't do that, because df2.URL contains regular expressions. However,

df1['code'] = df1['url'].replace(df2['URL'], df2['Code'], regex=True)

does not work.
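A minimal sketch of why the plain map lookup cannot be reused here. The two small frames below are cut-down stand-ins built from rows shown above, not the full data. map looks each URL up as an exact key, and an escaped pattern string is never literally equal to the URL it describes:

```python
import pandas as pd

# Cut-down stand-ins for the two frames in the question (assumption:
# two rows each are enough to show the behaviour).
df1 = pd.DataFrame({
    "url": [
        "online.rsb.ru/hb/faces/rs/RSIndex.jspx",
        "homecredit.ru/help/quest/feedback.php",
    ]
})
df2 = pd.DataFrame({
    "URL": [
        r"gazprombank\.ru\/contacts\/",
        r"homecredit\.ru\/help\/quest\/feedback\.php",
    ],
    "Code": [15, 15],
})

# map() treats the df2.URL values as literal keys, not as patterns,
# so even the homecredit row finds no exact match and stays missing.
exact = df1["url"].map(df2.set_index("URL")["Code"])
print(exact.isna().all())
```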

Best answer

As noted in my comment, the pandas.Series.replace() method does not allow a Series object as the to_replace argument. Passing list-like values works instead:

# Pass plain arrays (not Series objects) for the patterns and their codes.
df1['code'] = df1.url.replace(df2.URL.values, df2.Code.values, regex=True)
print(df1[['url', 'code']])

This produces the following output:

                                                 url  \
0 sberbank.ru/ru/person/promo/10mnl?utm_source=V...
1 online.rsb.ru/hb/faces/system/login/rslogin.js...
2 online.rsb.ru/hb/faces/system/login/sms/sms.js...
3 online.rsb.ru/hb/faces/rs/RSIndex.jspx
4 online.rsb.ru/hb/faces/rs/payments/PaymentReq....
5 homecredit.ru/help/quest/feedback.php

code
0 sberbank.ru/ru/person/promo/10mnl?utm_source=V...
1 online.rsb.ru/hb/faces/system/login/rslogin.js...
2 online.rsb.ru/hb/faces/system/login/sms/sms.js...
3 online.rsb.ru/hb/faces/rs/RSIndex.jspx
4 online.rsb.ru/hb/faces/rs/payments/PaymentReq....
5 15

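To answer your additional comment: for rows where df1.url does not match any of the regex strings, there is no df2.Code to put in df1.code, but you can supply a value (for example None) to fill the column in those cases. This can be done, for example, by adding the following line: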

# Rows where nothing was replaced still hold the original URL; set those to None.
df1['code'] = df1.apply(lambda x: None if x.code == x.url else x.code, axis=1)

after which print(df1[['url', 'code']]) returns the following:

                                                 url  code
0 sberbank.ru/ru/person/promo/10mnl?utm_source=V... NaN
1 online.rsb.ru/hb/faces/system/login/rslogin.js... NaN
2 online.rsb.ru/hb/faces/system/login/sms/sms.js... NaN
3 online.rsb.ru/hb/faces/rs/RSIndex.jspx NaN
4 online.rsb.ru/hb/faces/rs/payments/PaymentReq.... NaN
5 homecredit.ru/help/quest/feedback.php 15.0
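As an alternative to replacing and then comparing against the original URL, the lookup can also be written out explicitly. The following is only a sketch under a few assumptions: lookup_code is a hypothetical helper (not a pandas function), the frames are cut-down copies of the sample data, the first matching pattern wins, and re.search is used, so a pattern may match anywhere in the URL.

```python
import re
import pandas as pd

# Cut-down copies of the sample data (assumption: two rows each).
df1 = pd.DataFrame({
    "url": [
        "online.rsb.ru/hb/faces/rs/RSIndex.jspx",
        "homecredit.ru/help/quest/feedback.php",
    ]
})
df2 = pd.DataFrame({
    "URL": [
        r"gazprombank\.ru\/contacts\/",
        r"homecredit\.ru\/help\/quest\/feedback\.php",
    ],
    "Code": [15, 15],
})

# Compile each pattern once and keep it next to its code.
compiled = [(re.compile(p), c) for p, c in zip(df2["URL"], df2["Code"])]

def lookup_code(url):
    """Hypothetical helper: code of the first pattern found in url, else None."""
    for pattern, code in compiled:
        if pattern.search(url):
            return code
    return None

# One lookup per row; URLs without a matching pattern get a missing value.
df1["code"] = [lookup_code(u) for u in df1["url"]]
print(df1[["url", "code"]])
```

In this variant unmatched URLs come out as missing values directly, so no follow-up apply step is needed. If whole-URL matches are wanted rather than matches anywhere in the string, pattern.fullmatch(url) could be used in place of pattern.search(url).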

Regarding python - Pandas: Replace values in a column using regular expressions, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41097221/
