gpt4 book ai didi

python - 使用python将单元格数据拆分为多行

转载 作者:行者123 更新时间:2023-12-03 16:35:44 25 4
gpt4 key购买 nike

我想使用 python 将单元格中包含的数据拆分为多行。下面给出了这样的一个例子:

这是我的数据:

fuel          cert_region   veh_class   air_pollution      city_mpg     hwy_mpg    cmb_mpg  smartway
ethanol/gas FC SUV 6/8 9/14 15/20 1/16 yes
ethanol/gas FC SUV 6/3 1/14 14/19 10/16 no

我想把它转换成这种形式:
fuel          cert_region   veh_class   air_pollution     city_mpg     hwy_mpg    cmb_mpg   smartway
ethanol FC SUV 6 9 15 1 yes
gas FC SUV 8 14 20 16 yes
ethanol FC SUV 6 1 14 10 no
gas FC SUV 3 14 19 16 no

以下代码返回错误:
import numpy as np
from itertools import chain

# return list from series of comma-separated strings
def chainer(s):
return list(chain.from_iterable(s.str.split('/')))

# calculate lengths of splits
lens = df_08['fuel'].str.split('/').map(len)

# create new dataframe, repeating or chaining as appropriate
res = pd.DataFrame({
'cert_region': np.repeat(df_08['cert_region'], lens),
'veh_class': np.repeat(df_08['veh_class'], lens),
'smartway': np.repeat(df_08['smartway'], lens),
'fuel': chainer(df_08['fuel']),
'air_pollution': chainer(df_08['air_pollution']),
'city_mpg': chainer(df_08['city_mpg']),
'hwy_mpg': chainer(df_08['hwy_mpg']),
'cmb_mpg': chainer(df_08['cmb_mpg'])})

它给了我这个错误:
 TypeError                                 Traceback (most recent call last)
<ipython-input-31-916fed75eee2> in <module>()
20 'fuel': chainer(df_08['fuel']),
21 'air_pollution_score': chainer(df_08['air_pollution_score']),
---> 22 'city_mpg': chainer(df_08['city_mpg']),
23 'hwy_mpg': chainer(df_08['hwy_mpg']),
24 'cmb_mpg': chainer(df_08['cmb_mpg']),

<ipython-input-31-916fed75eee2> in chainer(s)
4 # return list from series of comma-separated strings
5 def chainer(s):
----> 6 return list(chain.from_iterable(s.str.split('/')))
7
8 # calculate lengths of splits

TypeError: 'float' object is not iterable

但是 city_mpgObject数据类型:
   <class 'pandas.core.frame.DataFrame'>
RangeIndex: 2404 entries, 0 to 2403
Data columns (total 14 columns):
fuel 2404 non-null object
cert_region 2404 non-null object
veh_class 2404 non-null object
air_pollution 2404 non-null object
city_mpg 2205 non-null object
hwy_mpg 2205 non-null object
cmb_mpg 2205 non-null object
smartway 2404 non-null object

最佳答案

我的建议是退出 Pandas ,进行计算并将结果放回数据帧中。在我看来,操纵起来要容易得多,而且我想更快地相信:

from itertools import chain

第 1 步:转换为 dict :
M = df.to_dict('records')

第 2 步:进行列表理解并拆分值:
res = [[(key,*value.split('/'))
for key,value in d.items()]
for d in M]

步骤3:找到最长行的长度。我们需要这样来确保所有行的长度相同:
 longest = max(len(line) for line in chain(*res))
print(longest)
#3

第四步:最长的条目是3;我们需要确保调整小于 3 的行:
explode = [[(entry[0], entry[-1], entry[-1])
if len(entry) < longest else entry for entry in box]
for box in res]

print(explode)

[[('fuel', 'ethanol', 'gas'),
('cert_region', 'FC', 'FC'),
('veh_class', 'SUV', 'SUV'),
('air_pollution', '6', '8'),
('city_mpg', '9', '14'),
('hwy_mpg', '15', '20'),
('cmb_mpg', '1', '16'),
('smartway', 'yes', 'yes')],
[('fuel', 'ethanol', 'gas'),
('cert_region', 'FC', 'FC'),
('veh_class', 'SUV', 'SUV'),
('air_pollution', '6', '3'),
('city_mpg', '1', '14'),
('hwy_mpg', '14', '19'),
('cmb_mpg', '10', '16'),
('smartway', 'no', 'no')]]

第 4 步:现在我们可以将键与相应的值配对以获取字典:
result = {start[0] :(*start[1:],*end[1:])
for start,end in zip(*explode)}

print(result)

{'fuel': ('ethanol', 'gas', 'ethanol', 'gas'),
'cert_region': ('FC', 'FC', 'FC', 'FC'),
'veh_class': ('SUV', 'SUV', 'SUV', 'SUV'),
'air_pollution': ('6', '8', '6', '3'),
'city_mpg': ('9', '14', '1', '14'),
'hwy_mpg': ('15', '20', '14', '19'),
'cmb_mpg': ('1', '16', '10', '16'),
'smartway': ('yes', 'yes', 'no', 'no')}

将结果读入数据帧:
pd.DataFrame(result)

fuel cert_region veh_class air_pollution city_mpg hwy_mpg cmb_mpg smartway
0 ethanol FC SUV 6 9 15 1 yes
1 gas FC SUV 8 14 20 16 yes
2 ethanol FC SUV 6 1 14 10 no
3 gas FC SUV 3 14 19 16 no

关于python - 使用python将单元格数据拆分为多行,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/61149788/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com