python - Pandas read_csv does not respect regex sep

Reposted. Author: 太空宇宙. Updated: 2023-11-03 13:10:26

Data:

from io import StringIO
import pandas as pd

s = '''ID,Level,QID,Text,ResponseID,responseText,date_key,last
375280046,S,D3M,Which is your favorite?,D5M0,option 1,2012-08-08 00:00:00,ynot
375280046,S,D3M,How often? (at home, at work, other),D3M0,Work,2010-03-31 00:00:00,okkk
375280046,M,A78,Do you prefer a, b, or c?,A78C,a,2010-03-31 00:00:00,abc
376918925,M,A78,Which ONE (select only one),A78E,Milk,2004-02-02 00:00:00,launch Wed., '''

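# A regex separator requires the Python engine; this call raises the ValueError shown below
df = pd.read_csv(StringIO(s), sep=r',(?!\s)', engine='python')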

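Problem: I asked a related question here. However, I have run into a new problem. Note that the last line ends with a comma followed by a space. The regex in sep=r',(?!\s)' is supposed to ignore any comma that is followed by whitespace.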

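Question: Is there a way, using only pd.read_csv, to read the last column of the last row literally as launch Wed., so that the trailing comma is not treated as a separator/delimiter but kept as a literal comma in the text of the last column?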

Error:

ValueError: Expected 8 fields in line 5, saw 9. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.
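
The error can be reproduced outside pandas with re.split. A minimal sketch, assuming (as the 9-field count in the error points to) that the Python parser strips surrounding whitespace from each line before applying the regex; last_line is just the last line of the sample data copied into a variable:

```python
import re

SEP = r",(?!\s)"  # the question's separator: a comma NOT followed by whitespace

# Last line of the sample data, including its trailing ", "
last_line = "376918925,M,A78,Which ONE (select only one),A78E,Milk,2004-02-02 00:00:00,launch Wed., "

# Split the line as-is: the final comma is followed by a space, so it is not a separator
raw_fields = re.split(SEP, last_line)

# Split after stripping: the final comma now sits at the end of the string,
# and (?!\s) succeeds there because no whitespace follows it
stripped_fields = re.split(SEP, last_line.strip())

print(len(raw_fields), repr(raw_fields[-1]))  # 8 'launch Wed., '
print(len(stripped_fields), stripped_fields[-2:])  # 9 ['launch Wed.', '']
```

With the trailing space intact the line has the expected 8 fields; once it is stripped, the bare trailing comma produces an extra empty ninth field, which matches the error.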

Expected/desired output:

          ID Level  QID                                  Text ResponseID  \
0  375280046     S  D3M               Which is your favorite?       D5M0
1  375280046     S  D3M  How often? (at home, at work, other)       D3M0
2  375280046     M  A78             Do you prefer a, b, or c?       A78C
3  376918925     M  A78           Which ONE (select only one)       A78E

  responseText             date_key          last
0     option 1  2012-08-08 00:00:00          ynot
1         Work  2010-03-31 00:00:00          okkk
2            a  2010-03-31 00:00:00           abc
3         Milk  2004-02-02 00:00:00  launch Wed.,

Best answer

Let's look at this SO Post.

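Use the regex explained there, r',(?=\S)', which splits on a comma only when it is immediately followed by a non-whitespace character. Unlike ,(?!\s), it does not match a comma at the very end of a line, so once the trailing space has been dropped from the last line, its final comma is still kept as part of the last column.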
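
Where the two patterns differ is easiest to see on a few short strings. The sketch below (stdlib re only; the sample strings are made up for illustration) splits each one with the question's negative lookahead and the answer's positive lookahead:

```python
import re

NEG = r",(?!\s)"  # question: comma not followed by whitespace (end of string qualifies)
POS = r",(?=\S)"  # answer: comma followed by an actual non-whitespace character

samples = ["a,b", "a, b", "launch Wed.,"]

# Map each sample to (split with NEG, split with POS)
results = {text: (re.split(NEG, text), re.split(POS, text)) for text in samples}

for text, (neg, pos) in results.items():
    print(repr(text), neg, pos)
# 'a,b' ['a', 'b'] ['a', 'b']
# 'a, b' ['a, b'] ['a, b']
# 'launch Wed.,' ['launch Wed.', ''] ['launch Wed.,']
```

The patterns agree everywhere except on a comma at the very end of the string, so switching to ,(?=\S) keeps the trailing comma in the last column without changing how any other comma is handled.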

from io import StringIO
import pandas as pd

s = '''ID,Level,QID,Text,ResponseID,responseText,date_key,last
375280046,S,D3M,Which is your favorite?,D5M0,option 1,2012-08-08 00:00:00,ynot
375280046,S,D3M,How often? (at home, at work, other),D3M0,Work,2010-03-31 00:00:00,okkk
375280046,M,A78,Do you prefer a, b, or c?,A78C,a,2010-03-31 00:00:00,abc
376918925,M,A78,Which ONE (select only one),A78E,Milk,2004-02-02 00:00:00,launch Wed., '''

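# Split only on commas followed by a non-whitespace character (regex sep => Python engine)
df = pd.read_csv(StringIO(s), sep=r',(?=\S)', engine='python')
print(df)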

Output:

          ID Level  QID                                  Text ResponseID  \
0  375280046     S  D3M               Which is your favorite?       D5M0
1  375280046     S  D3M  How often? (at home, at work, other)       D3M0
2  375280046     M  A78             Do you prefer a, b, or c?       A78C
3  376918925     M  A78           Which ONE (select only one)       A78E

  responseText             date_key          last
0     option 1  2012-08-08 00:00:00          ynot
1         Work  2010-03-31 00:00:00          okkk
2            a  2010-03-31 00:00:00           abc
3         Milk  2004-02-02 00:00:00  launch Wed.,

Regarding "python - Pandas read_csv does not respect regex sep", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44787408/
