I have a DataFrame df1 that looks like this:
Observed PeakFlow (cfs) Modelled Peak Flow (cfs)
9.78768 10.93963
1.999368 2.037152
11.63652 8.541796
3.237471 3.970588
54.04929 22.94427
4.68197 3.139319
16.41346 12.17337
14.97399 7.224458
2.114172 5.775542
22.80021 22.69659
25.3347 13.0805
33.4092 11.3452
13.81051 7.640867
6.794793 4.26161
9.008561 6.634675
5.957804 4.176471
2.337406 2.071208
32.6419 4.368421
3.567871 2.894737
5.776844 3.0387
39.54993 5.849845
4.511765 2.28483
6.989101 3.218266
14.63979 9.024768
I also have another DataFrame, df2, that looks like this:
1-1 Match | -15% Peak Flow | +25% Peak Flow
-----------------------------------------------------
X-Axis| Y-Axis | X-Axis| Y-Axis | X-Axis| Y-Axis
-----------------------------------------------------
0 | 0 | 0 | 0 | 0 | 0
200 | 200 | 200 | 170 | 200 | 250
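I want a single scatter plot built from these two DataFrames. The desired output is shown in the figure below. How can I do this?
When I load df2 from a csv, I get the result shown in the figure below. How can I remove the "Unnamed" labels and get the merged columns shown in the code?
You can use: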
print (df2)
1-1 Match -15% Peak Flow +25% Peak Flow
X-Axis Y-Axis X-Axis Y-Axis X-Axis Y-Axis
0 0 0 0 0 0 0
1 200 200 200 170 200 250
print (df2.columns)
MultiIndex(levels=[['+25% Peak Flow', '-15% Peak Flow', '1-1 Match'], ['X-Axis', 'Y-Axis']],
labels=[[2, 2, 1, 1, 0, 0], [0, 1, 0, 1, 0, 1]])
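ax = df1.plot.scatter(x='Modelled Peak Flow (cfs)', y='Observed PeakFlow (cfs)', s=50)
# draw one line per top-level column label of df2
# (note: groupby with axis=1 is deprecated since pandas 2.1)
for i, df3 in df2.groupby(level=0, axis=1):
    # use this group's X-Axis column as the index, leaving Y-Axis as the only column
    df3 = df3.set_index([(i, 'X-Axis')])
    df3.index.name = None
    df3.columns = [i]
    # print (df3)
    df3.plot(ax=ax)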
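For pandas versions where grouping over columns is no longer available, the same per-label split can be sketched without groupby. This is only an illustration: the `lines` dict is a name introduced here, and df2 is rebuilt by hand from the table in the question rather than read from the csv.

```python
import pandas as pd

# df2 rebuilt from the table in the question (assumption: same labels and values)
columns = pd.MultiIndex.from_product(
    [["1-1 Match", "-15% Peak Flow", "+25% Peak Flow"], ["X-Axis", "Y-Axis"]]
)
df2 = pd.DataFrame([[0, 0, 0, 0, 0, 0], [200, 200, 200, 170, 200, 250]],
                   columns=columns)

# One Series per top-level label: index = X-Axis values, values = Y-Axis values.
lines = {}
for label in df2.columns.get_level_values(0).unique():
    pair = df2[label]  # sub-frame with plain 'X-Axis' / 'Y-Axis' columns
    lines[label] = pair.set_index("X-Axis")["Y-Axis"].rename(label)
```

Each entry can then be drawn on the scatter axes with `lines[label].plot(ax=ax)`.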
If you need custom colors and markers:
ax = df1.plot.scatter(x='Modelled Peak Flow (cfs)',
                      y='Observed PeakFlow (cfs)',
                      s=50,
                      marker='d',
                      color='r')
df21 = df2.xs('1-1 Match', axis=1).set_index('X-Axis')
df21.index.name = None
df21.columns = ['1-1 Match']
df21.plot(c='black', ax=ax)
df22 = df2.xs('-15% Peak Flow', axis=1).set_index('X-Axis')
df22.index.name = None
df22.columns = ['-15% Peak Flow']
df22.plot(c='blue',ls='--', ax=ax)
df23 = df2.xs('+25% Peak Flow', axis=1).set_index('X-Axis')
df23.index.name = None
df23.columns = ['+25% Peak Flow']
df23.plot(c='blue',ls='--', ax=ax)
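The three nearly identical blocks above could also be written as one loop over a style table. The sketch below calls matplotlib directly instead of `DataFrame.plot`, which is a swapped-in approach and not the code above. The `styles` dict is a name introduced here, df1 holds only the first three rows from the question, and df2 is rebuilt by hand.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# First three rows of df1 from the question (assumption: enough for illustration)
df1 = pd.DataFrame({
    "Observed PeakFlow (cfs)": [9.78768, 1.999368, 11.63652],
    "Modelled Peak Flow (cfs)": [10.93963, 2.037152, 8.541796],
})
columns = pd.MultiIndex.from_product(
    [["1-1 Match", "-15% Peak Flow", "+25% Peak Flow"], ["X-Axis", "Y-Axis"]]
)
df2 = pd.DataFrame([[0, 0, 0, 0, 0, 0], [200, 200, 200, 170, 200, 250]],
                   columns=columns)

# Line style per reference line, mirroring the colors used above
styles = {
    "1-1 Match": {"color": "black", "linestyle": "-"},
    "-15% Peak Flow": {"color": "blue", "linestyle": "--"},
    "+25% Peak Flow": {"color": "blue", "linestyle": "--"},
}

fig, ax = plt.subplots()
ax.scatter(df1["Modelled Peak Flow (cfs)"], df1["Observed PeakFlow (cfs)"],
           s=50, marker="d", color="r")
for label, style in styles.items():
    ax.plot(df2[(label, "X-Axis")], df2[(label, "Y-Axis")], label=label, **style)
ax.set_xlabel("Modelled Peak Flow (cfs)")
ax.set_ylabel("Observed PeakFlow (cfs)")
ax.legend()
```

Keeping the styles in a dict means adding another reference line only needs one new entry.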
Edit 1:
The MultiIndex is the problem here, so you need:
import pandas as pd
df2 = pd.read_csv('file', header=[0,1])
print (df2)
1-1 Match Unnamed: 1_level_0 -15% Peak Flow Unnamed: 3_level_0 \
X-Axis Y-Axis X-Axis Y-Axis
0 0 0 0 0
1 200 200 200 170
+25% Peak Flow Unnamed: 5_level_0
X-Axis Y-Axis
0 0 0
1 200 250
cols = df2.columns.get_level_values(0)
cols = cols.where(~cols.str.contains('Unnamed')).to_series().ffill().tolist()
df2.columns = [cols, df2.columns.get_level_values(1)]
df2 = df2.sort_index(level=0, axis=1)
print (df2)
+25% Peak Flow -15% Peak Flow 1-1 Match
X-Axis Y-Axis X-Axis Y-Axis X-Axis Y-Axis
0 0 0 0 0 0 0
1 200 250 200 170 200 200
print (df2.columns)
MultiIndex(levels=[['+25% Peak Flow', '-15% Peak Flow', '1-1 Match'],
['X-Axis', 'Y-Axis']],
labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]])
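The header cleanup can also be wrapped in a small helper. The sketch below is an assumption-laden illustration: `fill_unnamed_header` is a name introduced here, and the inline `csv_text` is a guess at what the file looks like (merged header cells exported as empty fields). It skips the final `sort_index`, so the original column order is kept.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for the real file; the merged header
# cells are empty in the first header row.
csv_text = """1-1 Match,,-15% Peak Flow,,+25% Peak Flow,
X-Axis,Y-Axis,X-Axis,Y-Axis,X-Axis,Y-Axis
0,0,0,0,0,0
200,200,200,170,200,250
"""

def fill_unnamed_header(df):
    """Replace 'Unnamed: ...' top-level labels with the last real label to their left."""
    top = df.columns.get_level_values(0).to_series().reset_index(drop=True)
    # Blank out the placeholder labels, then forward-fill from the left
    top = top.mask(top.str.startswith("Unnamed")).ffill()
    out = df.copy()
    out.columns = pd.MultiIndex.from_arrays(
        [top.to_numpy(), df.columns.get_level_values(1)]
    )
    return out

df2 = fill_unnamed_header(pd.read_csv(io.StringIO(csv_text), header=[0, 1]))
```

With the real file, `io.StringIO(csv_text)` would be replaced by the file path.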