python - Searching for a matching string pattern in a pandas DataFrame column

Reposted. Author: 行者123. Updated: 2023-11-28 22:37:14

I have a DataFrame like the one below:

 name         genre
satya |ACTION|DRAMA|IC|
satya |COMEDY|BIOPIC|SOCIAL|
abc |CLASSICAL|
xyz |ROMANCE|ACTION|DARMA|
def |DISCOVERY|SPORT|COMEDY|IC|
ghj |IC|

Now I want to query the DataFrame so that I get rows 1, 5 and 6, i.e. I want to find |IC| either on its own or in any combination with other genres.

So far I can do an exact search with the following:

df[df['genre'] == '|ACTION|DRAMA|IC|']  ######exact value yields row 1

or a string-contains search:

 df[df['genre'].str.contains('IC')]  ####yields row 1,2,3,5,6
# as BIOPIC has IC in that same for CLASSICAL also

But I don't want either of these.

#df[df['genre'].str.contains('|IC|')]  #### row 6
# This also not satisfying my need as i am missing rows 1 and 5

So my requirement is to find the genres that contain |IC|. (My string search fails because '|' is treated as the regex OR operator.)

Could someone suggest a regex or any other way to do this? Thanks in advance.

Best answer

I think you can add \ to the regex to escape the pipe, because an unescaped | is interpreted as OR:

'|'

A|B, where A and B can be arbitrary REs, creates a regular expression that will match either A or B. An arbitrary number of REs can be separated by the '|' in this way. This can be used inside groups (see below) as well. As the target string is scanned, REs separated by '|' are tried from left to right. When one pattern completely matches, that branch is accepted. This means that once A matches, B will not be tested further, even if it would produce a longer overall match. In other words, the '|' operator is never greedy. To match a literal '|', use \|, or enclose it inside a character class, as in [|].
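
The quoted behaviour can be checked with the standard `re` module alone. The sketch below is only an illustration: the `genres` list is a hypothetical subset of the question's data, and the variable names are not from the original post. It compares the unescaped pattern with the two literal forms mentioned in the quote.

```python
import re

# Hypothetical sample values taken from the question's genre column.
genres = ["|ACTION|DRAMA|IC|", "|COMEDY|BIOPIC|SOCIAL|", "|CLASSICAL|", "|IC|"]

# Unescaped: '|IC|' is an alternation of (empty), 'IC' and (empty).
# The empty branch matches at the start of any string, so every search succeeds.
unescaped = [bool(re.search('|IC|', g)) for g in genres]

# Escaped with backslashes: both pipes are literal characters.
escaped = [bool(re.search(r'\|IC\|', g)) for g in genres]

# Character class form from the quote: [|] also means a literal pipe.
char_class = [bool(re.search(r'[|]IC[|]', g)) for g in genres]

print(unescaped)
print(escaped)
print(char_class)
```

Writing the pattern as a raw string (`r'...'`) keeps Python from treating `\|` as a string escape sequence before the regex engine sees it.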

print(df['genre'].str.contains(r'\|IC\|'))
0 True
1 False
2 False
3 False
4 True
5 True
Name: genre, dtype: bool

print(df[df['genre'].str.contains(r'\|IC\|')])
name genre
0 satya |ACTION|DRAMA|IC|
4 def |DISCOVERY|SPORT|COMEDY|IC|
5 ghj |IC|
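
As a possible alternative to writing the backslashes by hand, the sketch below rebuilds the question's DataFrame and shows two other ways to treat `|IC|` literally: passing `regex=False` to `str.contains`, or letting `re.escape` produce the escaped pattern. The names `token`, `mask_literal` and `mask_escaped` are assumptions made for illustration and do not appear in the original answer.

```python
import re
import pandas as pd

# Rebuild the DataFrame from the question (values copied as posted).
df = pd.DataFrame({
    "name": ["satya", "satya", "abc", "xyz", "def", "ghj"],
    "genre": [
        "|ACTION|DRAMA|IC|",
        "|COMEDY|BIOPIC|SOCIAL|",
        "|CLASSICAL|",
        "|ROMANCE|ACTION|DARMA|",
        "|DISCOVERY|SPORT|COMEDY|IC|",
        "|IC|",
    ],
})

token = "|IC|"  # the delimited genre to look for

# Option 1: plain substring search, with no regex interpretation at all.
mask_literal = df["genre"].str.contains(token, regex=False)

# Option 2: stay in regex mode, but let re.escape add the backslashes.
mask_escaped = df["genre"].str.contains(re.escape(token))

print(df[mask_literal])
```

Both masks select the same rows as the escaped pattern above (index 0, 4 and 5). `regex=False` is probably the simpler choice when the search term is a fixed string, while `re.escape` is useful when the term has to be embedded in a larger regular expression.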

Regarding "python - Searching for a matching string pattern in a pandas DataFrame column", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36740680/
