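I want to split the ji column of df into two columns on the comma delimiter. It would also be nice to remove the parentheses around the ji values. I have tried various approaches and keep getting errors. I would like to avoid lambda expressions for now. Any other ideas?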
Example:
ji length
0 (75.0, 5.0) 3283.458479
1 (96.0, 5.0) 1431.312901
2 (97.0, 5.0) 1364.592959
3 (247.0, 5.0) 3736.322308
4 (81.0, 7.0) 2655.910005
5 (93.0, 7.0) 1752.293687
6 (242.0, 7.0) 427.844417
7 (248.0, 7.0) 3725.823013
8 (254.0, 7.0) 2318.937332
9 (255.0, 7.0) 2292.673905
10 (242.0, 8.0) 145.811907
11 (254.0, 8.0) 2222.447786
12 (255.0, 8.0) 2196.184360
13 (248.0, 9.0) 441.222866
14 (253.0, 9.0) 853.095032
15 (256.0, 9.0) 2076.942682
16 (91.0, 10.0) 1743.310744
17 (93.0, 10.0) 1256.337420
18 (105.0, 10.0) 523.447658
19 (174.0, 10.0) 1530.617012
20 (176.0, 10.0) 1697.614009
21 (248.0, 10.0) 440.000463
22 (253.0, 10.0) 904.706003
23 (256.0, 10.0) 1991.662604
24 (258.0, 10.0) 1850.995862
25 (172.0, 11.0) 1301.179960
26 (174.0, 11.0) 1436.984094
27 (176.0, 11.0) 1695.954099
28 (179.0, 11.0) 1548.015013
29 (228.0, 11.0) 4640.928585
30 (242.0, 11.0) 169.617203
31 (251.0, 11.0) 784.921333
32 (253.0, 11.0) 983.118859
33 (255.0, 11.0) 1181.474433
34 (256.0, 11.0) 1303.398235
You can load the example above with:
import pandas as pd
from io import StringIO
csv = """\
ji:length
(75.0,5.0):3283.458479
(96.0,5.0):1431.312901
(97.0,5.0):1364.592959
(247.0,5.0):3736.322308
(81.0,7.0):2655.910005
(93.0,7.0):1752.293687
(242.0,7.0):427.844417
(248.0,7.0):3725.823013
(254.0,7.0):2318.937332
(255.0,7.0):2292.673905
(242.0,8.0):145.811907
(254.0,8.0):2222.447786
(255.0,8.0):2196.184360
(248.0,9.0):441.222866
(253.0,9.0):853.095032
(256.0,9.0):2076.942682
(91.0,10.0):1743.310744
(93.0,10.0):1256.337420
(105.0,10.0):523.447658
(174.0,10.0):1530.617012
(176.0,10.0):1697.614009
(248.0,10.0):440.000463
(253.0,10.0):904.706003
(256.0,10.0):1991.662604
(258.0,10.0):1850.995862
(172.0,11.0):1301.179960
(174.0,11.0):1436.984094
(176.0,11.0):1695.954099
(179.0,11.0):1548.015013
(228.0,11.0):4640.928585
(242.0,11.0):169.617203
(251.0,11.0):784.921333
(253.0,11.0):983.118859
(255.0,11.0):1181.474433
(256.0,11.0):1303.398235
"""
df = pd.read_csv(StringIO(csv), sep=":")
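If the values in the ji column are strings, use pop to extract the column, then str.strip and str.split with expand=True to get a DataFrame. The separator ', ' matches the printed example; the loader snippet above writes the values without a space after the comma, so split on ',' for data loaded that way: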
print (type(df.loc[0, 'ji']))
<class 'str'>
df[['a','b']] = df.pop('ji').str.strip('()').str.split(', ', expand=True).astype(float)
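As a minimal, self-contained sketch of the string case, the same chain can be run on a three-row frame. The sample frame and the pattern `,\s*` (a comma followed by optional whitespace, which pandas treats as a regular expression because it is longer than one character) are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Small stand-in for the question's data; the second value deliberately
# has no space after the comma.
df = pd.DataFrame({
    "ji": ["(75.0, 5.0)", "(96.0,5.0)", "(97.0, 5.0)"],
    "length": [3283.458479, 1431.312901, 1364.592959],
})

# Remove the column, strip the parentheses, split on a comma with optional
# trailing whitespace, and convert both parts to float.
parts = df.pop("ji").str.strip("()").str.split(r",\s*", expand=True).astype(float)
df[["a", "b"]] = parts

print(df)
```

Splitting on a pattern rather than the literal ', ' means the same line should work for both the printed example and the data produced by the loader snippet.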
Or, if there are no missing values and performance matters, use a list comprehension:
L = [x.strip('()').split(', ') for x in df.pop('ji')]
df[['a','b']] = pd.DataFrame(L, index=df.index).astype(float)
print (df)
length a b
0 3283.458479 75.0 5.0
1 1431.312901 96.0 5.0
2 1364.592959 97.0 5.0
3 3736.322308 247.0 5.0
4 2655.910005 81.0 7.0
5 1752.293687 93.0 7.0
6 427.844417 242.0 7.0
7 3725.823013 248.0 7.0
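The list comprehension variant can be sketched on a small frame as follows. The sample data and the non-default index (10, 11, 12) are assumptions chosen to show why passing index=df.index matters: without it the helper frame would get the index 0, 1, 2 and the new columns would not align with the existing rows.

```python
import pandas as pd

# Small stand-in for the question's data, with a non-default index.
df = pd.DataFrame(
    {
        "ji": ["(75.0, 5.0)", "(96.0, 5.0)", "(97.0, 5.0)"],
        "length": [3283.458479, 1431.312901, 1364.592959],
    },
    index=[10, 11, 12],
)

# Plain Python string methods on each value. This raises AttributeError
# if a value is NaN, because a float has no .strip() method.
rows = [x.strip("()").split(", ") for x in df.pop("ji")]

# Reuse the original index so the new columns align row by row.
df[["a", "b"]] = pd.DataFrame(rows, index=df.index).astype(float)

print(df)
```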
If the values are tuples, create a nested list of tuples and pass it to the DataFrame constructor:
print (type(df.loc[0, 'ji']))
<class 'tuple'>
df[['a','b']] = pd.DataFrame(df.pop('ji').values.tolist(), index=df.index)
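For the tuple case, a runnable sketch might look like this. The sample frame is an assumption built from the first rows of the question's data; the column already holds Python tuples, so no string handling is needed.

```python
import pandas as pd

# Small stand-in for the question's data, with real tuples in the column.
df = pd.DataFrame({
    "ji": [(75.0, 5.0), (96.0, 5.0), (97.0, 5.0)],
    "length": [3283.458479, 1431.312901, 1364.592959],
})

print(type(df.loc[0, "ji"]))  # <class 'tuple'>

# Turn the column into a list of tuples; the DataFrame constructor
# expands each tuple into one row with two columns.
df[["a", "b"]] = pd.DataFrame(df.pop("ji").values.tolist(), index=df.index)

print(df)
```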