
python - How to split long strings in a Pandas column on punctuation

Reposted. Author: 行者123. Updated: 2023-12-03 21:57:18

I have a df that looks like this:

words                                           col_a  col_b
I guess, because I have thought over that. Um,  1      0
That? yeah.                                     1      1
I don't always think you're up to something.    0      1

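I want to split df.words into a separate row wherever a punctuation character (.,?!:;) occurs. However, each new row should keep the col_a and col_b values of the original row. For example, the df above should end up looking like this:
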
words                                         col_a  col_b
I guess,                                      1      0
because I have thought over that.             1      0
Um,                                           1      0
That?                                         1      1
yeah.                                         1      1
I don't always think you're up to something.  0      1
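
For anyone who wants to reproduce the problem, the sample frame can be rebuilt roughly as follows. This is only a sketch: the question shows printed output rather than construction code, so the integer dtypes and the exact spacing inside the strings are assumptions.

```python
import pandas as pd

# Sample data copied from the table in the question.
# Assumption: col_a and col_b are plain integer columns.
df = pd.DataFrame({
    "words": [
        "I guess, because I have thought over that. Um,",
        "That? yeah.",
        "I don't always think you're up to something.",
    ],
    "col_a": [1, 1, 0],
    "col_b": [0, 1, 1],
})

print(df)
```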

Best answer

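One approach is to use str.findall with the pattern (.*?[.,?!:;]), which non-greedily matches any of these punctuation characters together with the characters preceding it, and then explode the resulting lists: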

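# Assumes pandas is installed and df is the frame shown in the question.
(df.assign(words=df.words.str.findall(r'(.*?[.,?!:;])'))  # list of fragments per row
   .explode('words')                                       # one row per fragment
   .reset_index(drop=True))                                # renumber rows 0..n-1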

                                          words  col_a  col_b
0                                      I guess,      1      0
1             because I have thought over that.      1      0
2                                           Um,      1      0
3                                         That?      1      1
4                                         yeah.      1      1
5  I don't always think you're up to something.      0      1
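
Two details of the findall approach are worth knowing. Each fragment after the first keeps its leading space, and any text that follows the last punctuation character in a row is not matched at all, so it is silently dropped. The self-contained sketch below is one possible variation, not part of the original answer. It adds a second regex alternative (.+$) to capture trailing text, strips whitespace, and discards empty fragments. The helper name split_on_punctuation and the fourth sample row are hypothetical, added only for illustration.

```python
import pandas as pd

# First alternative: a fragment ending in one punctuation character.
# Second alternative (an addition to the original answer): whatever text
# remains after the last punctuation character in the string.
PATTERN = r".*?[.,?!:;]|.+$"


def split_on_punctuation(frame, column="words"):
    """Hypothetical helper: one output row per punctuation-delimited fragment."""
    pieces = frame[column].str.findall(PATTERN)
    out = frame.assign(**{column: pieces}).explode(column).reset_index(drop=True)
    # Remove the leading/trailing spaces left over from the split.
    out[column] = out[column].str.strip()
    # Design choice: drop fragments that are empty or missing after stripping.
    out = out[out[column].fillna("").ne("")]
    return out.reset_index(drop=True)


df = pd.DataFrame({
    "words": [
        "I guess, because I have thought over that. Um,",
        "That? yeah.",
        "I don't always think you're up to something.",
        "Well, maybe not",  # hypothetical row: ends without punctuation
    ],
    "col_a": [1, 1, 0, 0],
    "col_b": [0, 1, 1, 0],
})

result = split_on_punctuation(df)
print(result)
```

With the original pattern, the fourth row would yield only "Well," and lose "maybe not". Note that rows whose text is empty or missing are removed entirely by this sketch, which may or may not be what you want.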

Regarding python - How to split long strings in a Pandas column on punctuation, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/61331415/
