
python - Create new dataframe columns from one column with varying values and types

Repost. Author: 行者123. Updated: 2023-12-04 02:32:40

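I am trying to create new columns named after each fish, with the integer counts as values, while preserving the index so that I can join the dataframes afterwards.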

import pandas as pd
df = pd.read_csv("fishCounts.csv",index_col=0)
countsdf = df[["Fish Count"]].copy()
countsdf.head()

Fish Count
0 38 Sand Bass, 16 Sculpin, 10 Blacksmith
1 138 Sculpin, 28 Sand Bass
2 150 Sculpin Released, 102 Sculpin, 40 Sanddab
3 156 Sculpin, 29 Sand Bass, 5 Black Croaker, 3 ...
4 161 Sculpin

countsdf.columns = ["fish"]
countsdf.fish = countsdf.fish.str.split(", ", expand=False)
countsdf.head()

fish
0 [38 Sand Bass, 16 Sculpin, 10 Blacksmith]
1 [138 Sculpin, 28 Sand Bass]
2 [150 Sculpin Released, 102 Sculpin, 40 Sanddab]
3 [156 Sculpin, 29 Sand Bass, 5 Black Croaker, 3...
4 [161 Sculpin]

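This is where I'm not sure how to proceed. Iterate over the dataframe rows? Turn the lists into dictionaries? Could I import the data differently to make this easier?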

Edit: this is what I am trying to achieve.

   Sand Bass  Sculpin  Blacksmith  Sculpin Released  Sanddab  Black Croaker
0         38       16          10
1         28      138
2                 102                           150       40
3         29      156                                                     5
4                 161

Best answer

IIUC, we can use str.split, str.extract, and stack together:

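# split each row on commas, then stack the pieces into one long Series
s = df['Fish Count'].str.split(',',expand=True).stack()
# capture the leading digits and the non-digit remainder as two columns
s.str.extract(r'(\d+)(\D+)')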

which yields:

       0                  1
0 0   38          Sand Bass
  1   16            Sculpin
  2   10         Blacksmith
1 0  138            Sculpin
  1   28          Sand Bass
2 0  150   Sculpin Released
  1  102            Sculpin
  2   40            Sanddab
3 0  156            Sculpin
  1   29          Sand Bass
  2    5      Black Croaker
  3    3                ...
4 0  161            Sculpin
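
Note that the extracted counts are still strings and the names keep the space that followed each comma. A minimal sketch of one way to tidy this up, assuming a small hypothetical sample in place of fishCounts.csv; the group names `count` and `fish` and the added `.dropna()` are illustrative choices, not part of the original answer:

```python
import pandas as pd

# Hypothetical sample standing in for fishCounts.csv
df = pd.DataFrame({"Fish Count": [
    "38 Sand Bass, 16 Sculpin, 10 Blacksmith",
    "138 Sculpin, 28 Sand Bass",
    "161 Sculpin",
]})

# Same split/stack step as above; dropna() guards against padding cells
s = df["Fish Count"].str.split(",", expand=True).stack().dropna()

# Named groups give the result readable column names
parts = s.str.extract(r"(?P<count>\d+)\s*(?P<fish>\D+)")

# Convert the counts to numbers and trim stray whitespace from the names
parts["count"] = pd.to_numeric(parts["count"])
parts["fish"] = parts["fish"].str.strip()

print(parts)
```

With numeric counts and clean names, later grouping or joining by fish name does not have to deal with keys such as ' Sculpin' versus 'Sculpin'.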

From there it is up to you which format you want or need.

s.str.extract(r'(\d+)(\D+)').groupby(level=[1]).agg(list)

0 1
0 [38, 138, 150, 156, 161] [ Sand Bass, Sculpin, Sculpin Released, Scu...
1 [16, 28, 102, 29] [ Sculpin, Sand Bass, Sculpin, Sand Bass]
2 [10, 40, 5] [ Blacksmith, Sanddab, Black Croaker]
3 [3] [ ...]

s.str.extract(r'(\d+)(\D+)').unstack(1)

     0                                   1
     0    1    2    3                  0           1               2    3
0   38   16   10  NaN          Sand Bass     Sculpin      Blacksmith  NaN
1  138   28  NaN  NaN            Sculpin   Sand Bass             NaN  NaN
2  150  102   40  NaN   Sculpin Released     Sculpin         Sanddab  NaN
3  156   29    5    3            Sculpin   Sand Bass   Black Croaker  ...
4  161  NaN  NaN  NaN            Sculpin         NaN             NaN  NaN
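
The layout asked for in the question (one column per fish name, original row index preserved) is not one of the formats shown here. A rough sketch of one way to get there, swapping in `explode` and `pivot_table` instead of `stack`/`unstack`; the sample rows are adapted from the question with the truncated fourth entry of row 3 left out, and the names `long_df` and `wide` are made up for this example:

```python
import pandas as pd

# Sample rows adapted from the question (row 3's truncated entry is omitted)
df = pd.DataFrame({"Fish Count": [
    "38 Sand Bass, 16 Sculpin, 10 Blacksmith",
    "138 Sculpin, 28 Sand Bass",
    "150 Sculpin Released, 102 Sculpin, 40 Sanddab",
    "156 Sculpin, 29 Sand Bass, 5 Black Croaker",
    "161 Sculpin",
]})

# One row per "count name" piece; explode() repeats the original index
long_df = (
    df["Fish Count"]
    .str.split(",")
    .explode()
    .str.extract(r"(?P<count>\d+)\s*(?P<fish>\D+)")
)
long_df["count"] = pd.to_numeric(long_df["count"])
long_df["fish"] = long_df["fish"].str.strip()

# Pivot to one column per fish; summing covers a fish listed twice in a row
wide = (
    long_df.reset_index()
    .pivot_table(index="index", columns="fish", values="count", aggfunc="sum")
    .astype("Int64")  # nullable integers keep missing cells as <NA>
)
wide.index.name = None

print(wide)
```

Because `wide` keeps the original row labels, something like `df.join(wide)` should line up with the source dataframe. Columns come out in alphabetical order rather than order of first appearance.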

s.str.extract(r'(\d+)(\D+)').values


array([['38', ' Sand Bass'],
       ['16', ' Sculpin'],
       ['10', ' Blacksmith'],
       ['138', ' Sculpin'],
       ['28', ' Sand Bass'],
       ['150', ' Sculpin Released'],
       ['102', ' Sculpin'],
       ['40', ' Sanddab'],
       ['156', ' Sculpin'],
       ['29', ' Sand Bass'],
       ['5', ' Black Croaker'],
       ['3', ' ...'],
       ['161', ' Sculpin']], dtype=object)

You can turn it into a dictionary.

# Dict keys must be unique, so two entries with the same count would collide here.
# Keying by fish name (fish: num) may be the better choice; this keeps num: fish.
{num : fish for num, fish in s.str.extract(r'(\d+)(\D+)').values}

{'38': ' Sand Bass',
'16': ' Sculpin',
'10': ' Blacksmith',
'138': ' Sculpin',
'28': ' Sand Bass',
'150': ' Sculpin Released',
'102': ' Sculpin',
'40': ' Sanddab',
'156': ' Sculpin',
'29': ' Sand Bass',
'5': ' Black Croaker',
'3': ' ...',
'161': ' Sculpin'}
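
Since fish names repeat across rows, a plain fish: num dict would also lose entries, as later rows overwrite earlier ones. One possible workaround, sketched on a hypothetical three-row sample, is to aggregate per fish before building the dict; summing is an assumption about what is wanted, and `long_df` and `totals` are illustrative names:

```python
import pandas as pd

# Hypothetical sample standing in for fishCounts.csv
df = pd.DataFrame({"Fish Count": [
    "38 Sand Bass, 16 Sculpin, 10 Blacksmith",
    "138 Sculpin, 28 Sand Bass",
    "161 Sculpin",
]})

# Split into pieces, then extract a numeric count and a trimmed name
long_df = (
    df["Fish Count"]
    .str.split(",")
    .explode()
    .str.extract(r"(?P<count>\d+)\s*(?P<fish>\D+)")
)
long_df["count"] = pd.to_numeric(long_df["count"])
long_df["fish"] = long_df["fish"].str.strip()

# Sum the counts per fish so each name appears exactly once as a key
totals = long_df.groupby("fish")["count"].sum().to_dict()

print(totals)
```

If the individual counts matter more than the totals, `.agg(list)` in place of `.sum()` would keep every value under its fish name.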

Regarding "python - Create new dataframe columns from one column with varying values and types", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63254598/
