python - Reshaping a Pandas DataFrame fails

Reposted. Author: 太空宇宙. Updated: 2023-11-04 00:10:16

I want to reshape my DataFrame, which contains only key-value pairs.

For example:

             key                                              value
0 Message-ID <5525962.1075855679785.JavaMail.evans@thyme>
1 Date Wed, 13 Dec 2000 07:04:00 -0800 (PST)
2 From phillip.allen@enron.com
3 To christi.nicolay@enron.com, james.steffes@enron...
4 X-From Phillip K Allen
5 X-To Christi L Nicolay, James D Steffes, Jeff Dasov...
6 X-cc: None
7 X-bcc: None
8 X-Origin Allen-P
9 Message-ID <4650921.1075855679981.JavaMail.evans@thyme>
10 Date Tue, 5 Dec 2000 07:31:00 -0800 (PST)
11 From ina.rangel@enron.com
12 To amanda.huble@enron.com
13 X-From Ina Rangel
14 X-To Amanda Huble
15 X-cc: None
16 X-bcc: None
17 X-Origin Allen-P

I want to turn it into:

Message-ID       Date                  From             To        X-From                 X-To                            X-cc:  X-bcc:  X-Origin
<5525962.10... Wed, 13 Dec 2000... phillip.allen... christi.nicolay.. Phillip K Allen.. Christi L Nicolay, Ja... NaN NaN Allen-P
<4650921.10... Tue, 5 Dec 2000 ... ina.rangel... amanda.huble@... Ina Rangel Amanda Huble NaN NaN Allen-P

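I tried pivoting the table, but I am confused about what to pass as the index argument. Please help me solve this.

If you find that this is a duplicate, feel free to mark it as such.

Best answer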

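If every group always has exactly 9 values, you can use numpy.reshape to turn the value column into a 2D array and pass it to the DataFrame constructor, taking the first 9 values of the key column as the column names: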

print (df['value'].values.reshape(-1, 9))
[['<5525962.1075855679785.JavaMail.evans@thyme>'
'Wed, 13 Dec 2000 07:04:00 -0800 (PST)' 'phillip.allen@enron.com'
'christi.nicolay@enron.com, james.steffes@enron...' 'Phillip K Allen'
'Christi L Nicolay, James D Steffes, Jeff Dasov...' 'None' 'None'
'Allen-P']
['<4650921.1075855679981.JavaMail.evans@thyme>'
'Tue, 5 Dec 2000 07:31:00 -0800 (PST)' 'ina.rangel@enron.com'
'amanda.huble@enron.com' 'Ina Rangel' 'Amanda Huble' 'None' 'None'
'Allen-P']]


df = pd.DataFrame(df['value'].values.reshape(-1, 9), columns=df['key'].iloc[:9])
print (df)
key Message-ID \
0 <5525962.1075855679785.JavaMail.evans@thyme>
1 <4650921.1075855679981.JavaMail.evans@thyme>

key Date From \
0 Wed, 13 Dec 2000 07:04:00 -0800 (PST) phillip.allen@enron.com
1 Tue, 5 Dec 2000 07:31:00 -0800 (PST) ina.rangel@enron.com

key To X-From \
0 christi.nicolay@enron.com, james.steffes@enron... Phillip K Allen
1 amanda.huble@enron.com Ina Rangel

key X-To X-cc: X-bcc: X-Origin
0 Christi L Nicolay, James D Steffes, Jeff Dasov... None None Allen-P
1 Amanda Huble None None Allen-P
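
The reshape approach above can be sketched end to end on a small stand-in for the question's data. The sample values, the `n_keys` variable, and the `wide` name are assumptions made for illustration (the question has 9 keys per record; this toy has 3). The length check is an added safeguard, since reshape fails when the row count is not an exact multiple of the group size.

```python
import pandas as pd

# Hypothetical miniature of the question's data: two records, three keys each.
keys = ["Message-ID", "From", "To"]
df = pd.DataFrame({
    "key": keys * 2,
    "value": ["<1@thyme>", "a@enron.com", "b@enron.com",
              "<2@thyme>", "c@enron.com", "d@enron.com"],
})

n_keys = len(keys)  # 9 in the question, 3 in this toy example

# reshape needs the row count to be an exact multiple of the group size
if len(df) % n_keys != 0:
    raise ValueError("row count is not a multiple of the group size")

# One row per record; column names come from the first group's keys
wide = pd.DataFrame(
    df["value"].to_numpy().reshape(-1, n_keys),
    columns=df["key"].iloc[:n_keys].to_list(),
)
print(wide)
```

This relies on every record listing the same keys in the same order. If a record can omit a key, the values shift into the wrong columns without raising an error, so the grouping approach is the safer choice in that case.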

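If every group in the data always starts with a Message-ID row, use set_index with a helper Series built by applying cumsum to a boolean mask (eq, the method form of ==, marks the start of each group), then unstack. Apply this to the original key/value DataFrame, not to the result of the previous snippet. Note that unstack returns the columns in sorted order: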

df = df.set_index([df['key'].eq('Message-ID').cumsum(), 'key'])['value'].unstack()
print (df)
key Date From \
key
1 Wed, 13 Dec 2000 07:04:00 -0800 (PST) phillip.allen@enron.com
2 Tue, 5 Dec 2000 07:31:00 -0800 (PST) ina.rangel@enron.com

key Message-ID \
key
1 <5525962.1075855679785.JavaMail.evans@thyme>
2 <4650921.1075855679981.JavaMail.evans@thyme>

key To X-From \
key
1 christi.nicolay@enron.com, james.steffes@enron... Phillip K Allen
2 amanda.huble@enron.com Ina Rangel

key X-Origin X-To X-bcc: X-cc:
key
1 Allen-P Christi L Nicolay, James D Steffes, Jeff Dasov... None None
2 Allen-P Amanda Huble None None
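
A minimal sketch of the grouping approach, using hypothetical sample data in which the second record has no "To" row. The names `record`, `wide`, `pivoted`, and `order` are illustrative. It also shows two things the answer does not: restoring the original column order with reindex, and the equivalent pivot call, where the cumsum helper is what gets passed as the index argument the question asks about.

```python
import pandas as pd

# Hypothetical data: the second record is missing its "To" row.
df = pd.DataFrame({
    "key": ["Message-ID", "From", "To", "Message-ID", "From"],
    "value": ["<1@thyme>", "a@enron.com", "b@enron.com",
              "<2@thyme>", "c@enron.com"],
})

# True at each record start; cumsum turns the mask into 1, 1, 1, 2, 2
record = df["key"].eq("Message-ID").cumsum().rename("record")

# Index by (record number, key), then move the key level into columns
wide = df.set_index([record, "key"])["value"].unstack()

# unstack sorts the columns; restore the order of first appearance
order = df["key"].drop_duplicates().to_list()
wide = wide.reindex(columns=order)
print(wide)

# Equivalent pivot: the helper Series becomes the index column
pivoted = df.assign(record=record).pivot(
    index="record", columns="key", values="value"
)
```

Because rows are matched by key name rather than by position, a missing key produces NaN in that record instead of shifting later values. Both calls raise an error if a key occurs twice within the same record.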

Regarding "python - Reshaping a Pandas DataFrame fails", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/52733660/
