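How can I use join/merge/concat/append/add to paste these two tables together so that the Population Age 0 - 14 and Population Age 15 - 64 columns sit side by side?
I do not want the Cartesian product of the two DataFrames.
I tried:
population_ages = t3.merge(t4, on='Country Name', how='inner')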
t3
   Country Name  Year  Population Age 0 - 14
0         Aruba  1960              43.847771
1       Andorra  1960                    NaN
2   Afghanistan  1960              43.712284
3        Angola  1960              43.759289
4       Albania  1960              41.757282
t4
   Country Name  Population Age 15 - 64
0         Aruba               53.667355
1       Andorra                     NaN
2   Afghanistan               53.834637
3        Angola               53.587101
4       Albania               52.941044
Ideally:
   Country Name  Population Age 15 - 64  Population Age 0 - 14
0         Aruba               53.667355              43.847771
1       Andorra                     NaN                    NaN
2   Afghanistan               53.834637              43.712284
3        Angola               53.587101              43.759289
4       Albania               52.941044              41.757282
Trying:
population_ages = t3.merge(t4, on='Country Name', how='inner')
I get a DataFrame that is the Cartesian product of t3 and t4, with shape (734832, 4) instead of (13608, 4):
    Country Name  Year  Population Age 0 - 14  Population Age 15 - 64
0          Aruba  1960              43.847771               53.667355
1          Aruba  1960              43.847771               53.890141
2          Aruba  1960              43.847771               54.216911
3          Aruba  1960              43.847771               54.637810
4          Aruba  1960              43.847771               55.119324
5          Aruba  1960              43.847771               55.631104
6          Aruba  1960              43.847771               56.168560
7          Aruba  1960              43.847771               56.736549
8          Aruba  1960              43.847771               57.341782
9          Aruba  1960              43.847771               57.983109
10         Aruba  1960              43.847771               58.674343
11         Aruba  1960              43.847771               59.404758
12         Aruba  1960              43.847771               60.164749
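The row count is consistent with a many-to-many join: 734832 = 13608 × 54, which is what you would expect if each Country Name appears about 54 times (once per year) in both tables, so every t3 row for a country is paired with every t4 row for that country. The sketch below reproduces the effect on made-up toy data (the values and the names `left` and `right` are illustrative, not taken from the real dataset) and shows how the `validate` argument of `merge` can be used to fail loudly instead:

```python
import pandas as pd

# Toy data: 2 countries x 3 years each, so 'Country Name' repeats in both frames.
left = pd.DataFrame({
    'Country Name': ['Aruba'] * 3 + ['Angola'] * 3,
    'Year': [1960, 1961, 1962] * 2,
    'Population Age 0 - 14': [43.8, 43.9, 44.0, 43.7, 43.6, 43.5],
})
right = pd.DataFrame({
    'Country Name': ['Aruba'] * 3 + ['Angola'] * 3,
    'Population Age 15 - 64': [53.6, 53.8, 54.2, 53.5, 53.4, 53.3],
})

# Each left row is matched with every right row that shares its key:
# 2 countries * 3 * 3 = 18 rows instead of the 6 we wanted.
merged = left.merge(right, on='Country Name', how='inner')
print(merged.shape)

# validate= asks pandas to check key uniqueness and raise rather than multiply rows.
try:
    left.merge(right, on='Country Name', validate='one_to_one')
    unique_keys = True
except pd.errors.MergeError:
    unique_keys = False
print(unique_keys)
```

So a merge on 'Country Name' alone can only give one row per input row when that key is unique in at least one of the tables.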
How about:
t4['Population Age 0 - 14'] = t3['Population Age 0 - 14']
or
pd.concat([t4, t3['Population Age 0 - 14']], axis=1)
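One caveat with both suggestions: pandas aligns on index labels, not on row position, so they only paste the columns side by side when t3 and t4 share the same index and row order. A minimal sketch of the pitfall, using small made-up frames `a` and `b` (hypothetical names, with a deliberately shifted index on `b`):

```python
import pandas as pd

nan = float('nan')
a = pd.DataFrame({'Country Name': ['Aruba', 'Andorra', 'Angola'],
                  'Population Age 15 - 64': [53.667355, nan, 53.587101]})
# Same rows in the same order, but a different index (e.g. left over from filtering).
b = pd.DataFrame({'Population Age 0 - 14': [43.847771, nan, 43.759289]},
                 index=[10, 11, 12])

# No index labels match, so the assigned column ends up entirely NaN.
misaligned = a.copy()
misaligned['Population Age 0 - 14'] = b['Population Age 0 - 14']
print(misaligned['Population Age 0 - 14'].isna().all())

# Resetting the index on both sides makes the assignment purely positional.
aligned = a.reset_index(drop=True)
aligned['Population Age 0 - 14'] = b['Population Age 0 - 14'].reset_index(drop=True)
print(aligned)
```

If the two tables were loaded the same way and never filtered or sorted, their default indexes already match and the reset is unnecessary.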
Complete working example:
import pandas as pd
from io import StringIO

# Sample data; columns are separated by two or more spaces.
d1 = '''Country Name  Year  Population Age 0 - 14
Aruba         1960  43.847771
Andorra       1960  NaN
Afghanistan   1960  43.712284
Angola        1960  43.759289
Albania       1960  41.757282'''
d2 = '''Country Name  Population Age 15 - 64
Aruba         53.667355
Andorra       NaN
Afghanistan   53.834637
Angola        53.587101
Albania       52.941044'''

# A regex separator requires the python parsing engine.
t3 = pd.read_csv(StringIO(d1), sep=r'\s{2,}', engine='python')
print('\nt3:\n', t3)
t4 = pd.read_csv(StringIO(d2), sep=r'\s{2,}', engine='python')
print('\nt4:\n', t4)

print('\n--- merge ---\n')
print(pd.merge(t4, t3, on='Country Name'))
# Keep only the key and the wanted column so 'Year' is not carried over.
print(pd.merge(t4, t3[['Country Name', 'Population Age 0 - 14']], on='Country Name'))

print('\n--- concat ---\n')
print(pd.concat((t4, t3['Population Age 0 - 14']), axis=1))

print('\n--- [xxx] = [xxx] ---\n')
# Column assignment aligns on the index, which is identical in both frames here.
t4['Population Age 0 - 14'] = t3['Population Age 0 - 14']
print(t4)
Result:
   Country Name  Population Age 15 - 64  Population Age 0 - 14
0         Aruba               53.667355              43.847771
1       Andorra                     NaN                    NaN
2   Afghanistan               53.834637              43.712284
3        Angola               53.587101              43.759289
4       Albania               52.941044              41.757282
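If the full tables repeat each country once per year and the index cannot be trusted, another option is to build an explicit per-country row counter and merge on it together with 'Country Name'. This is only a sketch under the assumption that both tables list each country's rows in the same order; the helper column `row_in_country` and the toy values are hypothetical:

```python
import pandas as pd

t3 = pd.DataFrame({
    'Country Name': ['Aruba', 'Aruba', 'Angola', 'Angola'],
    'Year': [1960, 1961, 1960, 1961],
    'Population Age 0 - 14': [43.8, 43.9, 43.7, 43.6],
})
t4 = pd.DataFrame({
    'Country Name': ['Aruba', 'Aruba', 'Angola', 'Angola'],
    'Population Age 15 - 64': [53.6, 53.8, 53.5, 53.4],
})

# Number the rows within each country: 0, 1, ... in order of appearance.
t3['row_in_country'] = t3.groupby('Country Name').cumcount()
t4['row_in_country'] = t4.groupby('Country Name').cumcount()

# (country, counter) is unique in both frames, so the merge is one-to-one.
paired = (
    t3.merge(t4, on=['Country Name', 'row_in_country'], validate='one_to_one')
      .drop(columns='row_in_country')
)
print(paired)
```

If t4 actually carries a Year column in the real data, merging on ['Country Name', 'Year'] is simpler and does not depend on row order.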