I have the following wide dataset:
import pandas as pd
from io import StringIO
testcsv = """P,N,N_relerr,F,F_relerr
10,6073.98,0.0022,61.973,0.0036
12,6412.97,0.0021,65.405,0.0036
4,4141.24,0.0019,42.8202,0.0032
6,5009.83,0.0019,51.9615,0.0031
8,5601.87,0.0025,57.8129,0.0042"""
csvfile = StringIO(testcsv)
df = pd.read_csv(csvfile)
P N N_relerr F F_relerr
0 10 6073.98 0.0022 61.9730 0.0036
1 12 6412.97 0.0021 65.4050 0.0036
2 4 4141.24 0.0019 42.8202 0.0032
3 6 5009.83 0.0019 51.9615 0.0031
4 8 5601.87 0.0025 57.8129 0.0042
I want to convert it into a long dataset with the "counts" (columns N and F) and their relative errors (N_relerr and F_relerr):
P which count err
0 10 N 6073.9800 0.0022
1 12 N 6412.9700 0.0021
2 4 N 4141.2400 0.0019
3 6 N 5009.8300 0.0019
4 8 N 5601.8700 0.0025
5 10 F 61.9730 0.0036
6 12 F 65.4050 0.0036
7 4 F 42.8202 0.0032
8 6 F 51.9615 0.0031
9 8 F 57.8129 0.0042
I need this format to draw error bars with plotnine, with the "N" and "F" counts distinguished from each other. My current, quite ugly solution is:
dflong = (df[['P', 'N', 'F']]
.melt(id_vars=['P'],
var_name='which',
value_name='count'))
dferr = (df[['P', 'N_relerr', 'F_relerr']]
.melt(id_vars=['P'],
var_name='which',
value_name='count_relerr'))
dflong['err'] = dferr['count_relerr'].copy()
My guess is that there is an elegant way to do this with MultiIndex columns and stack, starting from a dataset that looks like this:
N F
P counts relerr counts relerr
0 10 6073.98 0.0022 61.9730 0.0036
1 12 6412.97 0.0021 65.4050 0.0036
2 4 4141.24 0.0019 42.8202 0.0032
3 6 5009.83 0.0019 51.9615 0.0031
4 8 5601.87 0.0025 57.8129 0.0042
I can create that DataFrame with:
cols = {'P': 'P',
'N': ('N', 'counts'), 'N_relerr': ('N', "relerr"),
'F': ('F', 'counts'), 'F_relerr': ('F', 'relerr')}
nested_df = df.rename(columns=cols)
nested_df.columns = [c if isinstance(c, tuple)
else ('', c) for c in nested_df.columns]
nested_df.columns = pd.MultiIndex.from_tuples(nested_df.columns)
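(I suspect there must be a better way), but I haven't figured out how to use stack effectively to get what I want.
Does anyone know the canonical solution? Thanks!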
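For reference, the MultiIndex-plus-stack route the question guesses at can be made to work. The sketch below is an illustration, not the accepted answer. It builds the two-level columns by splitting column names on '_' instead of renaming to tuples by hand. The intermediate names N_counts and F_counts are chosen here only for illustration. It then stacks the outer column level into the rows. Depending on the pandas version, stack may emit a FutureWarning about its new implementation. Either way, the rows come out grouped by P rather than by which.

```python
import pandas as pd
from io import StringIO

testcsv = """P,N,N_relerr,F,F_relerr
10,6073.98,0.0022,61.973,0.0036
12,6412.97,0.0021,65.405,0.0036
4,4141.24,0.0019,42.8202,0.0032
6,5009.83,0.0019,51.9615,0.0031
8,5601.87,0.0025,57.8129,0.0042"""
df = pd.read_csv(StringIO(testcsv))

# Move P into the index and give every value column the form <which>_<quantity>
# (N_counts / F_counts are illustrative names, not from the original question)
wide = df.set_index('P').rename(columns={'N': 'N_counts', 'F': 'F_counts'})

# Splitting an Index of strings with expand=True yields a two-level MultiIndex:
# ('N', 'counts'), ('N', 'relerr'), ('F', 'counts'), ('F', 'relerr')
wide.columns = wide.columns.str.split('_', expand=True)
wide.columns.names = ['which', None]

# stack(level=0) moves the outer column level (N / F) into the row index,
# leaving one column per quantity; reset_index turns P and which into columns
dflong = (wide.stack(level=0)
          .rename(columns={'counts': 'count', 'relerr': 'err'})
          .reset_index())
print(dflong)
```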
Best answer
You can use pd.wide_to_long, which is well suited to "melting several columns at once", after a little column renaming.
import pandas as pd
from io import StringIO
testcsv = """P,N,N_relerr,F,F_relerr
10,6073.98,0.0022,61.973,0.0036
12,6412.97,0.0021,65.405,0.0036
4,4141.24,0.0019,42.8202,0.0032
6,5009.83,0.0019,51.9615,0.0031
8,5601.87,0.0025,57.8129,0.0042"""
csvfile = StringIO(testcsv)
df = pd.read_csv(csvfile)
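# Rename the columns with set_axis so each value column has the form <stub>_<which>
# (the inplace argument of set_axis was removed in pandas 2.0, so it is omitted here)
d1 = df.set_axis(['P', 'Count_N', 'Err_N', 'Count_F', 'Err_F'], axis=1)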
#Use pd.wide_to_long to reshape dataframe
pd.wide_to_long(d1, ['Count', 'Err'], 'P', 'which', sep='_', suffix='.+')
Output:
Count Err
P which
10 N 6073.9800 0.0022
12 N 6412.9700 0.0021
4 N 4141.2400 0.0019
6 N 5009.8300 0.0019
8 N 5601.8700 0.0025
10 F 61.9730 0.0036
12 F 65.4050 0.0036
4 F 42.8202 0.0032
6 F 51.9615 0.0031
8 F 57.8129 0.0042
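The result above keeps P and which in a row MultiIndex, while plotnine maps aesthetics to ordinary columns, so a final reset_index() is probably needed. The sketch below is a variation on the answer, not part of it. It derives the stub names with a function passed to rename instead of listing them by hand; the helper name to_stub_suffix is hypothetical. It also adds hypothetical ymin/ymax columns of the kind geom_errorbar uses, assuming N_relerr and F_relerr are fractional relative errors.

```python
import pandas as pd
from io import StringIO

testcsv = """P,N,N_relerr,F,F_relerr
10,6073.98,0.0022,61.973,0.0036
12,6412.97,0.0021,65.405,0.0036
4,4141.24,0.0019,42.8202,0.0032
6,5009.83,0.0019,51.9615,0.0031
8,5601.87,0.0025,57.8129,0.0042"""
df = pd.read_csv(StringIO(testcsv))

def to_stub_suffix(col):
    """Hypothetical helper: 'N' -> 'count_N', 'N_relerr' -> 'err_N'; 'P' is kept."""
    if col == 'P':
        return col
    if col.endswith('_relerr'):
        return 'err_' + col[:-len('_relerr')]
    return 'count_' + col

d1 = df.rename(columns=to_stub_suffix)

# Melt both stubs at once; the part after '_' becomes the 'which' column
dflong = (pd.wide_to_long(d1, stubnames=['count', 'err'], i='P', j='which',
                          sep='_', suffix=r'\w+')
          .reset_index())

# Hypothetical error-bar bounds, assuming err is a fractional relative error
dflong['ymin'] = dflong['count'] * (1 - dflong['err'])
dflong['ymax'] = dflong['count'] * (1 + dflong['err'])
print(dflong)
```

A plotnine layer along the lines of geom_errorbar(aes(x='P', ymin='ymin', ymax='ymax', color='which')) could then consume these columns directly.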
Regarding "python - How to use pandas melt to get values and their errors", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54371343/