
python - Creating a MultiIndexed DataFrame through the constructor

Reposted. Author: 太空宇宙. Last updated: 2023-11-03 12:44:08

Given two arrays:

x
[('010_628', '2543677'), ('010_228', '2543677'), ('015_634', '2543677')]

y
array([['me', 10228955],
       ['me', 10228955],
       ['me', 10228955]], dtype=object)

Currently, this code gives me a DataFrame with a flat index of tuples:

df = pd.DataFrame(x, index=y, columns=['pm_code', 'sec_pm'])
df
                pm_code   sec_pm
(me, 10228955)  010_628  2543677
(me, 10228955)  010_228  2543677
(me, 10228955)  015_634  2543677

How can I create a MultiIndex DataFrame like this?

                pm_code   sec_pm
state site_no
me    10228955  010_628  2543677
                010_228  2543677
                015_634  2543677

I've tried using pd.MultiIndex.from_tuples, but I couldn't get it to work. Thanks for your help.
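To also get the level names state and site_no shown in the desired output, one option is to pass names= to pd.MultiIndex.from_tuples. This is a minimal sketch, assuming each row of y should become one (state, site_no) tuple; the rows are converted with tuple() before being handed over:

```python
import numpy as np
import pandas as pd

x = [('010_628', '2543677'), ('010_228', '2543677'), ('015_634', '2543677')]
y = np.array([['me', 10228955],
              ['me', 10228955],
              ['me', 10228955]], dtype=object)

# One tuple per row of y; position 0 becomes 'state', position 1 'site_no'.
index = pd.MultiIndex.from_tuples([tuple(row) for row in y],
                                  names=['state', 'site_no'])
df = pd.DataFrame(x, index=index, columns=['pm_code', 'sec_pm'])
print(df)

# With a real MultiIndex, rows can be selected by the outer level alone.
print(df.loc['me'])
```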


Addendum: performance comparison

# unutbu #1
%timeit pd.DataFrame(x, index=pd.MultiIndex.from_arrays(y.T), columns=['pm_code', 'sec_pm'])
1000 loops, best of 3: 1.25 ms per loop

# unutbu #2
%timeit pd.DataFrame(x, index=pd.MultiIndex.from_tuples(y.tolist()), columns=['pm_code', 'sec_pm'])
1000 loops, best of 3: 1.47 ms per loop

# piRSquared
%timeit pd.DataFrame(x, index=y.T.tolist(), columns=['pm_code', 'sec_pm'])
1000 loops, best of 3: 1.41 ms per loop

# Andrew L
%timeit pd.DataFrame(x, index=[y[:,0], y[:,1]], columns=['pm_code', 'sec_pm'])
1000 loops, best of 3: 1.29 ms per loop

x2 = np.repeat(x, 10000, 0)
y2 = np.repeat(y, 10000, 0)

# unutbu #1
%timeit pd.DataFrame(x2, index=pd.MultiIndex.from_arrays(y2.T), columns=['pm_code', 'sec_pm'])
100 loops, best of 3: 17.3 ms per loop

# unutbu #2
%timeit pd.DataFrame(x2, index=pd.MultiIndex.from_tuples(y2.tolist()), columns=['pm_code', 'sec_pm'])
10 loops, best of 3: 30.5 ms per loop

# piRSquared
%timeit pd.DataFrame(x2, index=y2.T.tolist(), columns=['pm_code', 'sec_pm'])
10 loops, best of 3: 37.2 ms per loop

# Andrew L
%timeit pd.DataFrame(x2, index=[y2[:,0], y2[:,1]], columns=['pm_code', 'sec_pm'])
100 loops, best of 3: 22 ms per loop

Data from this question.
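The %timeit lines above are IPython magics. Outside IPython, a rough equivalent can be sketched with the standard-library timeit module. This sketch assumes the scaled-up index data is meant to come from y (np.repeat(y, ...)), first checks that all four constructions produce the same two-level labels, and then prints per-call timings, which will vary with the machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

x = [('010_628', '2543677'), ('010_228', '2543677'), ('015_634', '2543677')]
y = np.array([['me', 10228955],
              ['me', 10228955],
              ['me', 10228955]], dtype=object)
cols = ['pm_code', 'sec_pm']

# Scale both inputs up by repeating every row 10,000 times along axis 0.
x2 = np.repeat(x, 10000, 0)
y2 = np.repeat(y, 10000, 0)

# The four constructions from the addendum, applied to the large inputs.
candidates = {
    'from_arrays(y.T)': lambda: pd.DataFrame(
        x2, index=pd.MultiIndex.from_arrays(y2.T), columns=cols),
    'from_tuples(y.tolist())': lambda: pd.DataFrame(
        x2, index=pd.MultiIndex.from_tuples(y2.tolist()), columns=cols),
    'index=y.T.tolist()': lambda: pd.DataFrame(
        x2, index=y2.T.tolist(), columns=cols),
    'index=[y[:,0], y[:,1]]': lambda: pd.DataFrame(
        x2, index=[y2[:, 0], y2[:, 1]], columns=cols),
}

# Sanity check first: every variant should yield the same two-level labels.
results = {name: make() for name, make in candidates.items()}
reference = results['from_arrays(y.T)'].index.tolist()
for name, df in results.items():
    assert df.index.nlevels == 2, name
    assert df.index.tolist() == reference, name

# Best of 3 repeats, 5 calls each; report the time per call.
for name, make in candidates.items():
    best = min(timeit.repeat(make, number=5, repeat=3)) / 5
    print(f'{name:24s} {best * 1e3:.2f} ms per call')
```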

Best answer

You could use pd.MultiIndex.from_arrays(y.T):

In [53]: pd.DataFrame(x, index=pd.MultiIndex.from_arrays(y.T), columns=['pm_code', 'sec_pm'])
Out[53]:
             pm_code   sec_pm
me 10228955  010_628  2543677
   10228955  010_228  2543677
   10228955  015_634  2543677
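Why the transpose: y holds one record per row (shape (3, 2)), while MultiIndex.from_arrays expects one array per level. Transposing gives two rows, one for state and one for site_no. This sketch also passes names=, which the call above omits; the names are taken from the desired output in the question:

```python
import numpy as np
import pandas as pd

x = [('010_628', '2543677'), ('010_228', '2543677'), ('015_634', '2543677')]
y = np.array([['me', 10228955],
              ['me', 10228955],
              ['me', 10228955]], dtype=object)

print(y.shape)    # (3, 2): one row per record
print(y.T.shape)  # (2, 3): one row per index level

# from_arrays takes a sequence of equal-length arrays, one per level;
# iterating over y.T yields the 'state' row and then the 'site_no' row.
index = pd.MultiIndex.from_arrays(y.T, names=['state', 'site_no'])
df = pd.DataFrame(x, index=index, columns=['pm_code', 'sec_pm'])
print(df.index.get_level_values('state').tolist())  # ['me', 'me', 'me']
```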

or pd.MultiIndex.from_tuples(y.tolist()):

In [54]: pd.DataFrame(x, index=pd.MultiIndex.from_tuples(y.tolist()), columns=['pm_code', 'sec_pm'])
Out[54]:
             pm_code   sec_pm
me 10228955  010_628  2543677
   10228955  010_228  2543677
   10228955  015_634  2543677
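from_tuples works row-wise instead: y.tolist() turns the object array into a list of [state, site_no] lists, and each inner list becomes one index entry. A small sketch (with y redefined so it runs on its own) checking that both constructors yield the same labels:

```python
import numpy as np
import pandas as pd

y = np.array([['me', 10228955],
              ['me', 10228955],
              ['me', 10228955]], dtype=object)

rows = y.tolist()
print(rows[0])  # ['me', 10228955]: one inner list per record

by_level = pd.MultiIndex.from_arrays(y.T)   # one array per level
by_row = pd.MultiIndex.from_tuples(rows)    # one tuple-like per row

# Both constructors describe the same (state, site_no) pairs.
print(by_level.tolist() == by_row.tolist())  # True
```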

Regarding python - Creating a MultiIndexed DataFrame through the constructor, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45946507/
