gpt4 book ai didi

python-3.x - 读取由 nan 行分割的数据框,并在 Python 中将它们 reshape 为多个数据框

转载 作者:行者123 更新时间:2023-12-04 01:16:41 24 4
gpt4 key购买 nike

我有一个示例 excel 文件 data1.xlsx 来自 here ,其中有一个 Sheet1 如下:

enter image description here

现在我想用 openpyxlpandas 读取它,然后将它们转换成新的 df1df2 ,我最终将它们保存为 pricequantity 表:

价格表:

enter image description here

数量表

enter image description here

我用过的代码:

df = pd.read_excel('./data1.xlsx', sheet_name = 'Sheet1')
df_list = np.split(df, df[df.isnull().all(1)].index)

for df in df_list:
print(df, '\n')

输出:

         bj  Unnamed: 1  Unnamed: 2  Unnamed: 3 Unnamed: 4
0 year 2018.0 2019.0 2020.0 sum
1 price 12.0 4.0 5.0 21
2 quantity 5.0 5.0 3.0 13

bj Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4
3 NaN NaN NaN NaN NaN
4 sh NaN NaN NaN NaN
5 year 2018.0 2019.0 2020.0 sum
6 price 5.0 6.0 7.0 18
7 quantity 7.0 5.0 4.0 16

bj Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4
8 NaN NaN NaN NaN NaN

bj Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4
9 NaN NaN NaN NaN NaN
10 gz NaN NaN NaN NaN
11 year 2018.0 2019.0 2020.0 sum
12 price 2.0 3.0 1.0 6
13 quantity 6.0 9.0 3.0 18

bj Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4
14 NaN NaN NaN NaN NaN

bj Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4
15 NaN NaN NaN NaN NaN
16 sz NaN NaN NaN NaN
17 year 2018.0 2019.0 2020.0 sum
18 price 8.0 2.0 3.0 13
19 quantity 5.0 4.0 3.0 12

我如何在 Python 中做到这一点?非常感谢。

最佳答案

使用:

#add header=None for default columns names
df = pd.read_excel('./data1.xlsx', sheet_name = 'Sheet1', header=None)

#convert columns by second row
df.columns = df.iloc[1].rename(None)

#create new column `city` by forward filling non missing values by second column
df.insert(0, 'city', df.iloc[:, 0].mask(df.iloc[:, 1].notna()).ffill())
#convert floats to integers
df.columns = [int(x) if isinstance(x, float) else x for x in df.columns]
#convert column year to index
df = df.set_index('year')

print (df)
city 2018 2019 2020 sum
year
bj bj NaN NaN NaN NaN
year bj 2018.0 2019.0 2020.0 sum
price bj 12.0 4.0 5.0 21
quantity bj 5.0 5.0 3.0 13
NaN bj NaN NaN NaN NaN
sh sh NaN NaN NaN NaN
year sh 2018.0 2019.0 2020.0 sum
price sh 5.0 6.0 7.0 18
quantity sh 7.0 5.0 4.0 16
NaN sh NaN NaN NaN NaN
NaN sh NaN NaN NaN NaN
gz gz NaN NaN NaN NaN
year gz 2018.0 2019.0 2020.0 sum
price gz 2.0 3.0 1.0 6
quantity gz 6.0 9.0 3.0 18
NaN gz NaN NaN NaN NaN
NaN gz NaN NaN NaN NaN
sz sz NaN NaN NaN NaN
year sz 2018.0 2019.0 2020.0 sum
price sz 8.0 2.0 3.0 13
quantity sz 5.0 4.0 3.0 12

df1 = df.loc['price'].reset_index(drop=True)
print (df1)
city 2018 2019 2020 sum
0 bj 12.0 4.0 5.0 21
1 sh 5.0 6.0 7.0 18
2 gz 2.0 3.0 1.0 6
3 sz 8.0 2.0 3.0 13

df2 = df.loc['quantity'].reset_index(drop=True)
print (df2)
city 2018 2019 2020 sum
0 bj 5.0 5.0 3.0 13
1 sh 7.0 5.0 4.0 16
2 gz 6.0 9.0 3.0 18
3 sz 5.0 4.0 3.0 12

通过mode='a' 参数可以将DataFrame 写入现有文件,link :

with pd.ExcelWriter('data1.xlsx', mode='a') as writer:  
df1.to_excel(writer, sheet_name='price')
df2.to_excel(writer, sheet_name='quantity')

关于python-3.x - 读取由 nan 行分割的数据框,并在 Python 中将它们 reshape 为多个数据框,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/63245428/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com