
Python - Transforming a dataframe and slicing

Reposted. Author: 太空狗. Updated: 2023-10-30 01:18:05

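I've attached a screenshot to help explain. I have a dataframe extracted from the Cleveland heart disease dataset. It has 76 columns, but the file puts them into 7 columns and wraps the extra columns onto the next line. I'm trying to figure out how to transform that dataframe into a readable format, like the dataframe shown on the right.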


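The variable xyz will always be the same, but the other letter variables I listed will differ. I thought I could start with data.loc[:, :'xyz'], but I'm not sure where to go from here: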

import pandas as pd

data = pd.read_csv("../resources/cleveland.data")
data.loc[:, :'xyz']

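Then I'll have to go from there and assign column names to these variables. Funnily enough, once I solve this, the training, testing, and validation parts will be much easier. Thanks in advance for your help. (I'm a noob.)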
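For context: `.loc[:, :'xyz']` slices columns by their labels, from the first column up to and including the column labelled `'xyz'`. It does not search cell values. A small sketch with a made-up frame (the column names `a`, `b`, `c` are hypothetical) shows the behaviour:

```python
import pandas as pd

# Hypothetical frame whose column *labels* include 'xyz'
frame = pd.DataFrame([[1, 2, 3, 4]], columns=["a", "b", "xyz", "c"])

# Label-based slicing: every column up to and including the label 'xyz'
subset = frame.loc[:, :"xyz"]
print(list(subset.columns))  # ['a', 'b', 'xyz']
```

In the Cleveland file, `xyz` appears as a value inside the rows rather than as a column label, which is why a value-based approach is needed.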

Best answer

Input data

1   a   b   c
d xyz 2 e
f g h xyz
3 i j k
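To experiment without the real file, the sample above can be loaded from a string. This sketch assumes the fields are whitespace-separated, as they appear here; `io.StringIO` stands in for the file path.

```python
import io

import pandas as pd

# The sample input: one physical line per row, fields separated by whitespace
raw = """1   a   b   c
d xyz 2 e
f g h xyz
3 i j k
"""

# header=None: no header row; sep=r"\s+": split on any run of whitespace
sample = pd.read_csv(io.StringIO(raw), header=None, sep=r"\s+")
print(sample.shape)  # (4, 4)
```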

Code

import pandas as pd
import numpy as np

# The raw file has no header row and its fields are separated by whitespace,
# so set header=None and split on any run of whitespace
df = pd.read_csv("../resources/cleveland.data", header=None, sep=r"\s+")
cols = df.columns.tolist()

# Reset the index to get the line number in the dirty file
df = df.reset_index()

# Melt the frame so that every value sits in a single column; sorting by the
# original line number, then by the original column, restores reading order
df = pd.melt(df, id_vars=['index'], value_vars=cols)
df = df.sort_values(by=['index', 'variable'])

# Drop the empty cells that read_csv pads onto shorter lines
df = df.dropna(subset=['value'])

# Every 'xyz' closes a record: number the 'xyz' markers 1, 2, 3, ... and
# back-fill so each value gets the number of the record it belongs to
df['line'] = np.where(df['value'] == 'xyz', 1, np.nan)
df['line'] = df['line'].cumsum()
df['line'] = df['line'].bfill()

# If the file doesn't end with 'xyz', the trailing values form one more record
df.loc[df['line'].isna(), 'line'] = df['line'].max() + 1

# Number the values within each record 1, 2, 3, ... (integers, not strings)
df['col_name'] = df.groupby('line').cumcount() + 1

# Pivot so each record becomes one row. Keeping col_name numeric preserves the
# column order; string names would sort as col_1, col_10, col_11, col_2, ...
df = df.pivot(index='line', columns='col_name', values='value')
df = df.add_prefix('col_')
print(df)
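The key step in the answer is how each value gets its record number. The same cumsum/bfill trick, isolated on the sample's token stream (a plain Series here rather than the melted frame), looks like this:

```python
import numpy as np
import pandas as pd

# A flattened token stream in reading order, as produced by melt + sort
tokens = pd.Series(["1", "a", "b", "c", "d", "xyz",
                    "2", "e", "f", "g", "h", "xyz",
                    "3", "i", "j", "k"])

# Mark each 'xyz' with 1 and every other value with NaN
marks = pd.Series(np.where(tokens == "xyz", 1, np.nan))

# cumsum numbers the markers 1, 2, ... while NaN positions stay NaN;
# bfill then copies each marker's number back onto the values before it
line = marks.cumsum().bfill()

# Values after the last 'xyz' are still NaN: they form one final record
line = line.fillna(line.max() + 1)
print(line.tolist())
```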

Output data

col_name col_1 col_2 col_3 col_4 col_5 col_6
line
1.0 1 a b c d xyz
2.0 2 e f g h xyz
3.0 3 i j k NaN NaN
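A different technique that skips melt and pivot, offered as a sketch rather than the answer's method: treat the file as one stream of whitespace-separated tokens, cut it after every `xyz`, and build the frame from the resulting lists. It assumes `xyz` occurs only as a record terminator; `records_from_tokens` is a hypothetical helper name.

```python
import pandas as pd


def records_from_tokens(tokens, terminator="xyz"):
    """Split a flat token sequence into records, each ending with `terminator`.

    Tokens after the last terminator form a final, shorter record.
    """
    records, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok == terminator:
            records.append(current)
            current = []
    if current:  # the input did not end with the terminator
        records.append(current)
    return records


raw = """1   a   b   c
d xyz 2 e
f g h xyz
3 i j k
"""

# str.split() with no argument splits on any whitespace, newlines included
records = records_from_tokens(raw.split())

# The DataFrame constructor pads shorter records with missing values
table = pd.DataFrame(records)
table.columns = [f"col_{i + 1}" for i in range(table.shape[1])]
print(table)
```

For the real file, reading its whole contents and calling `.split()` on that string would replace the inline sample; numeric columns can then be converted with `pd.to_numeric`.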

For the similar question on Stack Overflow, "Python - Transforming a dataframe and slicing", see: https://stackoverflow.com/questions/54727597/
