
python - How to split a Pandas dataframe column by string based on multiple conditions

Reposted. Author: 行者123. Updated: 2023-12-03 18:30:26

I have a Pandas dataframe as shown below:

   ID    Col.A
28654    This is a dark chocolate which is sweet
39876    Sky is blue 1234 Sky is cloudy 3423
88776    Stars can be seen in the dark sky
35491    Schools are closed 4568 but shops are open
I am trying to split Col.A before the word dark or before the digits. My desired result is as follows:
   ID    Col.A                       Col.B
28654    This is a                   dark chocolate which is sweet
39876    Sky is blue                 1234 Sky is cloudy 3423
88776    Stars can be seen in the    dark sky
35491    Schools are closed          4568 but shops are open
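I tried grouping the rows that contain the word dark into one dataframe and the rows with digits into another, then splitting each group accordingly. After that, I can concatenate the resulting dataframes to get the expected result. The code is as follows: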
import pandas as pd

df = pd.DataFrame({'ID': [28654, 39876, 88776, 35491],
                   'Col.A': ['This is a dark chocolate which is sweet',
                             'Sky is blue 1234 Sky is cloudy 3423',
                             'Stars can be seen in the dark sky',
                             'Schools are closed 4568 but shops are open']})

# Rows that contain the word " dark "
df1 = df[df['Col.A'].str.contains(' dark ')]
# All remaining rows (anti-join via the merge indicator column)
df2 = df.merge(df1, indicator=True, how='left').loc[lambda x: x['_merge'] != 'both']
# Split each group on its own delimiter; split() drops the matched delimiter
df1 = df1['Col.A'].str.split(' dark ', expand=True)
df2 = df2['Col.A'].str.split(r'\d+', expand=True)
pd.concat([df1, df2], axis=0)
The result differs from what I expected. It is:
                          0                         1
0                 This is a  chocolate which is sweet
2  Stars can be seen in the                       sky
1              Sky is blue             Sky is cloudy
3        Schools are closed        but shops are open
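The digits and the word dark are missing from the result.
So how can I fix this and get the result without losing the word and digits I split on?
Is there a way to "slice before the expected word or digits" without removing them?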
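The word and the digits disappear because `str.split` consumes whatever its pattern matches; with a regex pattern it behaves like Python's `re.split`. A minimal sketch with the standard `re` module (outside pandas, on one sample row from the question) contrasts a consuming pattern with one where "dark" is only checked by a lookahead and therefore stays in the output:

```python
import re

text = "This is a dark chocolate which is sweet"

# Consuming pattern: the matched " dark " is removed from the pieces
consumed = re.split(r" dark ", text)

# Lookahead pattern: only the whitespace is matched; "dark" is asserted, not consumed
kept = re.split(r"\s+(?=dark\b)", text)

print(consumed)  # ['This is a', 'chocolate which is sweet']
print(kept)      # ['This is a', 'dark chocolate which is sweet']
```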

Best answer

Series.str.split

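# Split on the whitespace before the first standalone "dark" or number; the lookahead keeps that token
s = df['Col.A'].str.split(r'\s+(?=\b(?:dark|\d+)\b)', n=1, expand=True)
# Name the two pieces and attach them to ID (axis must be passed as a keyword in pandas >= 2.0)
df[['ID']].join(s.set_axis(['Col.A', 'Col.B'], axis=1))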
      ID                     Col.A                          Col.B
0  28654                 This is a  dark chocolate which is sweet
1  39876               Sky is blue        1234 Sky is cloudy 3423
2  88776  Stars can be seen in the                       dark sky
3  35491        Schools are closed        4568 but shops are open
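For reference, a self-contained sketch of the answer that rebuilds the question's sample data so it can be run on its own. It assumes pandas 1.4 or later, since it passes `regex=True` explicitly (that keyword was added in 1.4) so the pattern is never treated as a literal string; `set_axis` is called with `axis=` as a keyword, which pandas 2.x requires.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": [28654, 39876, 88776, 35491],
    "Col.A": [
        "This is a dark chocolate which is sweet",
        "Sky is blue 1234 Sky is cloudy 3423",
        "Stars can be seen in the dark sky",
        "Schools are closed 4568 but shops are open",
    ],
})

# Split once (n=1) on the whitespace that precedes the first standalone "dark" or number
parts = df["Col.A"].str.split(r"\s+(?=\b(?:dark|\d+)\b)", n=1, expand=True, regex=True)

# Rename the two resulting columns and attach them to the ID column
result = df[["ID"]].join(parts.set_axis(["Col.A", "Col.B"], axis=1))
print(result)
```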
Regex details:
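  • \s+ : matches one or more whitespace characters
  • (?=\b(?:dark|\d+)\b) : positive lookahead; requires the whitespace to be followed by the word dark or a number without consuming that token, so the split keeps it
      • \b : word boundary, to prevent partial matches
      • (?:dark|\d+) : non-capturing group
          • dark : first alternative, matches the literal word dark
          • \d+ : second alternative, matches one or more digits
      • \b : word boundary, to prevent partial matches

Because n=1, only the first match becomes a split point, so in the second row the trailing 3423 stays in Col.B.

See the online regex demo.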
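To see what the trailing `\b` guards against, a small check with Python's `re` module on made-up strings (not taken from the question): a word that merely starts with "dark", or digits glued to a letter, should not trigger a split. `maxsplit=1` is used here to mirror `n=1` in the pandas call.

```python
import re

# Same pattern as in the answer
pattern = re.compile(r"\s+(?=\b(?:dark|\d+)\b)")

# Made-up sample strings chosen to exercise the trailing \b
samples = [
    "The darkness fell",    # "dark" is only a prefix of "darkness": no split
    "Room 12B is free",     # digits followed by a letter: no split
    "Room 1234 is free",    # standalone number: split before it
    "It was dark outside",  # standalone word "dark": split before it
]

# Split each sample at most once
results = [pattern.split(s, maxsplit=1) for s in samples]
for r in results:
    print(r)
```

This prints `['The darkness fell']`, `['Room 12B is free']`, `['Room', '1234 is free']` and `['It was', 'dark outside']`.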

    Regarding python - How to split a Pandas dataframe column by string based on multiple conditions, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/67287696/
