
python - Pandas DataFrame (from CSV) with multiple header rows throughout the data

Reposted. Author: 行者123. Updated: 2023-11-28 17:11:23

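Test data file

I am working with a DataFrame created from a CSV file. The data contains header rows throughout, and each header row identifies the rows beneath it until the next header row.

The data looks like this.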

2001| |colour |Price | Quantity sold
Shoes|
Blank | High heal Shoes| red |£22|44
Blank | Low heal Shoes|red |£22|44
Slippers|
Blank | High heal Slippers| red |£22|44
Blank | High heal Slippers| blue |£22|44
Blank | Low heal Slippers| red |£22|44
2002| |colour |Price | Quantity sold
Shoes|
Blank | High heal Shoes| red |£22|44
Blank | Low heal Shoes|red |£22|44
Slippers|
Blank | High heal Slippers| red |£22|44
Blank | High heal Slippers| blue |£22|44
Blank | Low heal Slippers| red |£22|44

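What kind of structure is this?

I need to read through this DataFrame and, using the header rows (2001, 2002 and so on), get all the data for a particular item (say, slippers) for each year. Even adding a column next to each data row with the corresponding year would help.

I would appreciate some help on how to do this.

Best answer

Use: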

import pandas as pd

df = pd.read_csv('test.csv')

# name of the first column (here the first year, '2001')
col = df.columns[0]

# forward fill the first column so every row carries the last label seen
df[col] = df[col].ffill()
# convert the first column to numeric; non-year labels become NaN
num = pd.to_numeric(df[col], errors='coerce')
# forward fill the years; rows before the second year header take the first column name
df['Year'] = num.ffill().fillna(col)
# rename the columns
df = df.rename(columns={col: 'Shoes', 'Unnamed: 1': 'Names'})
# drop the year header rows and the category header rows
df = df[num.isnull() & df['colour'].notnull()].reset_index(drop=True)
print(df)
           Shoes       Names  colour  price  Quantity sold  Year
 0  Type A shoes  Sub type A     red     22              5  2001
 1  Type A shoes  Sub type A   green     11              5  2001
 2  Type A shoes  Sub type A  yellow     44              5  2001
 3  Type A shoes  Sub type B     red     33              5  2001
 4  Type A shoes  Sub type B   green     66              5  2001
 5  Type A shoes  Sub type B  yellow     22              5  2001
 6  Type B shoes  Sub type A     red     11              5  2001
 7  Type B shoes  Sub type A   green     44              5  2001
 8  Type B shoes  Sub type A  yellow     33              5  2001
 9  Type B shoes  Sub type B     red     66              5  2001
10  Type B shoes  Sub type B   green     21              5  2001
11  Type B shoes  Sub type B  yellow     22              5  2001
12  Type A shoes  Sub type A     red     22              5  2002
13  Type A shoes  Sub type A   green     11              5  2002
14  Type A shoes  Sub type A  yellow     44              5  2002
15  Type A shoes  Sub type B     red     33              5  2002
16  Type A shoes  Sub type B   green     66              5  2002
17  Type A shoes  Sub type B  yellow     22              5  2002
18  Type B shoes  Sub type A     red     11              5  2002
19  Type B shoes  Sub type A   green     44              5  2002
20  Type B shoes  Sub type A  yellow     33              5  2002
21  Type B shoes  Sub type B     red     66              5  2002
22  Type B shoes  Sub type B   green     21              5  2002
23  Type B shoes  Sub type B  yellow     22              5  2002
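
For readers who want to try this without the original test.csv, here is a minimal self-contained sketch of the same forward-fill idea. It is a variation, not the answer's exact code: the inline data is only modelled on the sample in the question (comma-separated, with empty first cells standing in for "Blank" and the £ sign dropped), the names raw, year_num, tidy_df and slippers are made up for illustration, and the year is cast to an integer instead of being left as a mix of strings and floats.

```python
import io
import pandas as pd

# Assumed stand-in for the CSV: empty first cells replace "Blank".
raw = """2001,,colour,Price,Quantity sold
Shoes,,,,
,High heel shoes,red,22,44
,Low heel shoes,red,22,44
Slippers,,,,
,High heel slippers,red,22,44
,High heel slippers,blue,22,44
,Low heel slippers,red,22,44
2002,,colour,Price,Quantity sold
Shoes,,,,
,High heel shoes,red,22,44
,Low heel shoes,red,22,44
Slippers,,,,
,High heel slippers,red,22,44
,High heel slippers,blue,22,44
,Low heel slippers,red,22,44
"""

df = pd.read_csv(io.StringIO(raw))
first = df.columns[0]                      # '2001', the first year header

# Carry each category / year label down to the rows beneath it.
filled = df[first].ffill()
# Only the repeated year header rows parse as numbers; everything else is NaN.
year_num = pd.to_numeric(filled, errors='coerce')
# Rows above the second year header belong to the year in the column name.
year = year_num.ffill().fillna(int(first)).astype(int)

tidy_df = (
    df.assign(Category=filled, Year=year)
      .rename(columns={'Unnamed: 1': 'Name'})
      # keep data rows only: not a year header, and colour is present
      .loc[year_num.isna() & df['colour'].notna()]
      .drop(columns=first)
      .reset_index(drop=True)
)

# The numeric columns were read as text because of the repeated header row.
for c in ['Price', 'Quantity sold']:
    tidy_df[c] = pd.to_numeric(tidy_df[c])

# All slipper rows, then a count per year.
slippers = tidy_df[tidy_df['Category'] == 'Slippers']
print(slippers.groupby('Year').size())
```

Once every data row carries its own Category and Year, the "all slippers for each year" request reduces to an ordinary boolean filter followed by a groupby.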

Edit:

import pandas as pd

df = pd.read_csv('testV2.csv', sep='\t')
# print(df)

# name of the first column (here the first year, '2001')
col = df.columns[0]

# forward fill the first column so every row carries the last label seen
df[col] = df[col].ffill()
# convert the first column to numeric; non-year labels become NaN
num = pd.to_numeric(df[col], errors='coerce')
# forward fill the years; rows before the second year header take the first column name
df['Year'] = num.ffill().fillna(col)
# rename the columns
df = df.rename(columns={col: 'Top Category', 'Unnamed: 1': 'Names'})
# drop the year header rows and rows whose first column is the literal 'Top Category'
df = df[num.isnull() & (df['Top Category'] != 'Top Category')].reset_index(drop=True)

print(df)

    Top Category   Names  Colour  Price  Sold  Year
 0        Item 1  Type 1       -      2   NaN  2001
 1        Item 2  Type 1       -      2   NaN  2001
 2        Item 3  Type 1     red      2     5  2001
 3        Item 3  Type 2    blue      2     5  2001
 4        Item 3  Type 3   green      2     5  2001
 5        item 4  Type 1     red      2     5  2001
 6        item 4  Type 2    blue      3   NaN  2001
 7        item 4  Type 3   green      3   NaN  2001
 8        Item 1  Type 1       -      3   NaN  2002
 9        Item 2  Type 1       -      3   NaN  2002
10        Item 3  Type 1     red      3     5  2002
11        Item 3  Type 2    blue      3     5  2002
12        Item 3  Type 3   green      3     5  2002
13         Item4  Type 1     red      3   NaN  2002
14         Item4  Type 2    blue      3   NaN  2002
15         Item4  Type 3   green      3   NaN  2002
16        Item 1  Type 1       -      3   NaN  2003
17        Item 2  Type 1       -      3   NaN  2003
18        Item 3  Type 1     red      3     5  2003
19        Item 3  Type 2    blue      3     5  2003
20        Item 3  Type 3   green      3     5  2003
21         Item4  Type 1     red      3   NaN  2003
22         Item4  Type 2    blue      3   NaN  2003
23         Item4  Type 3   green      3   NaN  2003
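
One thing visible in this output is that the same category is spelled differently across years ("item 4" in 2001, "Item4" in 2002 and 2003), so a plain equality filter on 'Top Category' would miss rows. The sketch below shows one possible way to normalise the labels before filtering. It is an illustration under assumptions, not part of the original answer: the small frame only copies a few values from the printed output, and the names sample and category_key are hypothetical.

```python
import pandas as pd

# Small stand-in for the tidy frame, with a few values copied from the output.
sample = pd.DataFrame({
    'Top Category': ['Item 3', 'item 4', 'Item 3', 'Item4'],
    'Names': ['Type 1', 'Type 1', 'Type 1', 'Type 1'],
    'Colour': ['red', 'red', 'red', 'red'],
    'Year': [2001, 2001, 2002, 2002],
})

# Hypothetical helper column: remove all whitespace and lower-case the label,
# so 'item 4' and 'Item4' map to the same key.
sample['category_key'] = (
    sample['Top Category'].str.replace(r'\s+', '', regex=True).str.lower()
)

# Select one item across all years, then count its rows per year.
item4 = sample[sample['category_key'] == 'item4']
print(item4.groupby('Year').size())
```

Keeping the original 'Top Category' column and adding a separate key leaves the source labels available for inspection.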

Regarding "python - Pandas DataFrame (from CSV) with multiple header rows throughout the data", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47261515/
