Python: split one column into multiple columns and attach the split columns back to the original DataFrame

Reposted. Author: 行者123. Updated: 2023-12-04 09:49:20

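I want to split one column of a DataFrame into multiple columns, attach those columns back to the original DataFrame, and divide the original DataFrame according to whether the split column contains a particular string.

I have a DataFrame with a column whose values are separated by semicolons, as shown below.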

import pandas as pd
data = {'ID':['1','2','3','4','5','6','7'],
'Residence':['USA;CA;Los Angeles;Los Angeles', 'USA;MA;Suffolk;Boston', 'Canada;ON','USA;FL;Charlotte', 'NA', 'Canada;QC', 'USA;AZ'],
'Name':['Ann','Betty','Carl','David','Emily','Frank', 'George'],
'Gender':['F','F','M','M','F','M','M']}
df = pd.DataFrame(data)

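I then split the column as follows and divided the resulting columns into two parts, depending on whether the first piece is the string USA or not.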
address = df['Residence'].str.split(';',expand=True)
country = address[0] != 'USA'
USA, nonUSA = address[~country], address[country]

Now, if you run USA and nonUSA, you will notice that nonUSA has extra columns, and also a row with no country information. So I got rid of those NA values.
USA.columns = ['Country', 'State', 'County', 'City']
nonUSA.columns = ['Country', 'State']
nonUSA = nonUSA.dropna(axis=0, subset=[1])
nonUSA = nonUSA[nonUSA.columns[0:2]]

Now I want to attach USA and nonUSA to my original DataFrame, so that I end up with two DataFrames that look like this:
USAdata = pd.DataFrame({'ID':['1','2','4','7'], 
'Name':['Ann','Betty','David','George'],
'Gender':['F','F','M','M'],
'Country':['USA','USA','USA','USA'],
'State':['CA','MA','FL','AZ'],
'County':['Los Angeles','Suffolk','Charlotte','None'],
'City':['Los Angeles','Boston','None','None']})
nonUSAdata = pd.DataFrame({'ID':['3','6'],
'Name':['Carl','Frank'],
'Gender':['M','M'],
'Country':['Canada', 'Canada'],
'State':['ON','QC']})

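This is where I am stuck. How can I divide my original DataFrame into people whose Residence includes USA and people whose Residence does not, and attach the columns split out of Residence (USA and nonUSA) back to my original DataFrame?

(Also, I have posted everything I have so far, but I am curious whether there is a cleaner or smarter way to do this.)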

Best answer

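The original data has a unique index, and the code that follows does not change the index of either DataFrame, so you can use concat to stack the two pieces together and then add the result to the original with DataFrame.join, or with concat and axis=1: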

address = df['Residence'].str.split(';',expand=True)
country = address[0] != 'USA'
USA, nonUSA = address[~country], address[country]
USA.columns = ['Country', 'State', 'County', 'City']

nonUSA = nonUSA.dropna(axis=0, subset=[1])
nonUSA = nonUSA[nonUSA.columns[0:2]]
# Rename only after trimming to two columns; assigning two names to the
# four-column frame raises a length-mismatch ValueError.
nonUSA.columns = ['Country', 'State']
df = pd.concat([df, pd.concat([USA, nonUSA])], axis=1)

Or:
df = df.join(pd.concat([USA, nonUSA]))
print (df)
   ID                       Residence    Name  Gender  Country  State  \
0   1  USA;CA;Los Angeles;Los Angeles     Ann       F      USA     CA
1   2           USA;MA;Suffolk;Boston   Betty       F      USA     MA
2   3                       Canada;ON    Carl       M   Canada     ON
3   4                USA;FL;Charlotte   David       M      USA     FL
4   5                              NA   Emily       F      NaN    NaN
5   6                       Canada;QC   Frank       M   Canada     QC
6   7                          USA;AZ  George       M      USA     AZ

        County         City
0  Los Angeles  Los Angeles
1      Suffolk       Boston
2          NaN          NaN
3    Charlotte         None
4          NaN          NaN
5          NaN          NaN
6         None         None
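
The reason join and concat with axis=1 work here is that both match rows by index label rather than by position. A minimal sketch of that behaviour follows; the frames `left` and `right` are made-up illustrations, not part of the original data:

```python
import pandas as pd

# Hypothetical frames: 'right' is in a different row order and has no label 4.
left = pd.DataFrame({'Name': ['Ann', 'Carl', 'Emily']}, index=[0, 2, 4])
right = pd.DataFrame({'Country': ['Canada', 'USA']}, index=[2, 0])

# join is a left join on the index: every row of 'left' is kept,
# and rows without a matching label in 'right' get NaN.
joined = left.join(right)

# concat with axis=1 also aligns on index labels (an outer join by default).
side_by_side = pd.concat([left, right], axis=1)

print(joined)
```

This is why the row with Residence 'NA', which was dropped from nonUSA, simply comes back with missing values after the join.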

But it seems this can be simplified:
c = ['Country', 'State', 'County', 'City']
df[c] = df['Residence'].str.split(';',expand=True)
print (df)
   ID                       Residence    Name  Gender  Country  State  \
0   1  USA;CA;Los Angeles;Los Angeles     Ann       F      USA     CA
1   2           USA;MA;Suffolk;Boston   Betty       F      USA     MA
2   3                       Canada;ON    Carl       M   Canada     ON
3   4                USA;FL;Charlotte   David       M      USA     FL
4   5                              NA   Emily       F       NA   None
5   6                       Canada;QC   Frank       M   Canada     QC
6   7                          USA;AZ  George       M      USA     AZ

        County         City
0  Los Angeles  Los Angeles
1      Suffolk       Boston
2         None         None
3    Charlotte         None
4         None         None
5         None         None
6         None         None
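
The simplified version leaves everything in one DataFrame, while the question asks for two. One possible way to get from there to the two frames described in the question is sketched below. It is an illustration under assumptions, not part of the original answer: the names `parts`, `full` and `is_usa` are invented here, and the reindex step is an added guard for inputs that split into a number of pieces other than four.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['1', '2', '3', '4', '5', '6', '7'],
    'Residence': ['USA;CA;Los Angeles;Los Angeles', 'USA;MA;Suffolk;Boston',
                  'Canada;ON', 'USA;FL;Charlotte', 'NA', 'Canada;QC', 'USA;AZ'],
    'Name': ['Ann', 'Betty', 'Carl', 'David', 'Emily', 'Frank', 'George'],
    'Gender': ['F', 'F', 'M', 'M', 'F', 'M', 'M'],
})

cols = ['Country', 'State', 'County', 'City']
parts = df['Residence'].str.split(';', expand=True)
# Assumption: force exactly four columns so the rename below cannot fail.
parts = parts.reindex(columns=range(len(cols)))
parts.columns = cols

# Replace Residence with its split pieces; the index is unchanged, so join aligns.
full = df.drop(columns='Residence').join(parts)

# Boolean mask: True where the first piece is exactly 'USA'.
is_usa = full['Country'].eq('USA')
USAdata = full[is_usa]
# Non-USA rows must have a State (this drops the bare 'NA' row);
# County and City are not wanted for this group.
nonUSAdata = (full[~is_usa]
              .dropna(subset=['State'])
              .drop(columns=['County', 'City']))

print(USAdata)
print(nonUSAdata)
```

Missing County and City values stay as real missing values here rather than the string 'None' used in the question's expected output.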

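Regarding splitting one column into multiple columns in Python and attaching the split columns back to the original DataFrame, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/62035985/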
