
Pandas: How to add missing columns to a DataFrame when assigning Series as rows

Reposted. Author: 行者123. Last updated: 2023-12-05 06:40:13

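I have a bunch of pandas Series that are generated one at a time, and I want to assign each of them as a row in a DataFrame whose columns are the union of all the Series' index values.

For example: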

import numpy as np
import pandas as pd

# the names of all series are known in advance
df = pd.DataFrame(index=['A', 'B'])

# in reality there are many long series, not just two
a = pd.Series({'v':0, 'w':1, 'x':2, 'y':3}, name='A')
b = pd.Series({ 'x':4, 'y':5, 'z':6}, name='B')

# generate and assign each series as one row in the frame
for row in (a,b):
    # create new columns - this is what I want to eliminate
    for column in row.index.difference(df.columns):
        df[column] = np.nan

    df.loc[row.name] = row

print(df)

This produces the expected result:

     v    w    x    y    z
A  0.0  1.0  2.0  3.0  NaN
B  NaN  NaN  4.0  5.0  6.0

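Without the for column loop, however, it produces an empty DataFrame with no columns.

I would like to eliminate the for column loop. I do not know all the columns in advance. I would also have liked to assign np.nan to all new columns in a vectorized way, but that does not work because of an old issue I filed here: https://github.com/pandas-dev/pandas/issues/13658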
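One possible way to add all missing columns in a single step, rather than one at a time, is to reindex the columns to the union of the existing columns and the new Series' index. The sketch below illustrates that idea; it is not code from the original post, and it assumes that replacing df with the reindexed copy on each iteration is acceptable.

```python
import pandas as pd

# Frame with known row labels and no columns yet, as in the question.
df = pd.DataFrame(index=["A", "B"])

a = pd.Series({"v": 0, "w": 1, "x": 2, "y": 3}, name="A")
b = pd.Series({"x": 4, "y": 5, "z": 6}, name="B")

for row in (a, b):
    # Add every column that is still missing in one call; new cells are NaN.
    df = df.reindex(columns=df.columns.union(row.index))
    # Row assignment aligns the Series index with the frame's columns.
    df.loc[row.name] = row

print(df)
```

Note that reindex returns a new frame each time, so this removes the inner Python loop but still copies the data once per row.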

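Best answer

pd.DataFrame.set_value adds missing columns automatically. Note that this answer was written for the pandas of its time: set_value was deprecated in pandas 0.21 and removed in 1.0 (the deprecation notice pointed to the .at accessor instead), and Series.iteritems was removed in 2.0 in favor of Series.items.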

import pandas as pd

# start from a completely empty frame: no rows, no columns
df = pd.DataFrame()

# in reality there are many long series, not just two
a = pd.Series({'v':0, 'w':1, 'x':2, 'y':3}, name='A')
b = pd.Series({ 'x':4, 'y':5, 'z':6}, name='B')

# generate and assign each series as one row in the frame
for row in (a,b):
    for i, v in row.iteritems():
        # set a single cell; a missing row or column label is added automatically
        df.set_value(row.name, i, v)

print(df)

     v    w    x    y    z
A  0.0  1.0  2.0  3.0  NaN
B  NaN  NaN  4.0  5.0  6.0

This is still a loop, but set_value is very flexible.
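On current pandas releases, where set_value and iteritems are no longer available, roughly the same cell-by-cell loop can be sketched with label-based assignment, which enlarges the frame when a row or column label does not exist yet. This is an adaptation under that assumption, not the answer's original code, and it was not part of the timings reported here.

```python
import pandas as pd

# Start from a completely empty frame, as in the answer.
df = pd.DataFrame()

a = pd.Series({"v": 0, "w": 1, "x": 2, "y": 3}, name="A")
b = pd.Series({"x": 4, "y": 5, "z": 6}, name="B")

for row in (a, b):
    # Series.items() is the replacement for the removed iteritems().
    for col, value in row.items():
        # Setting a cell with .loc adds the row and/or column label if it is missing.
        df.loc[row.name, col] = value

print(df)
```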

Timing
Small data

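from timeit import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame()
# ten single-element Series, each named after its only index label
los = [pd.Series(1, [i], name=i) for i in range(10)]

# approach from the question: add missing columns one by one, then assign the row
stmt1 = """
for row in los:
    for column in row.index.difference(df.columns):
        df[column] = np.nan

    df.loc[row.name, row.index] = row
"""

# approach from the answer: set each cell with set_value
stmt2 = """
for row in los:
    for col, value in row.iteritems():
        df.set_value(row.name, col, value)
"""

setup = """
from __main__ import df, los, np
"""

print(timeit(stmt1, setup, number=100))
print(timeit(stmt2, setup, number=100))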

0.5426401197910309
0.01039268122985959

Large data

df = pd.DataFrame()
los = [pd.Series(1, [i], name=i) for i in range(1000)]

stmt1 = """
for row in los:
    for column in row.index.difference(df.columns):
        df[column] = np.nan

    df.loc[row.name, row.index] = row
"""

stmt2 = """
for row in los:
    for col, value in row.iteritems():
        df.set_value(row.name, col, value)
"""

setup = """
from __main__ import df, los, np
"""

print(timeit(stmt1, setup, number=100))
print(timeit(stmt2, setup, number=100))

63.69273182330653
1.1242545540444553
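Both timed variants grow the frame incrementally. If all the Series can be held in memory until the end (an assumption that may not fit the original use case, where they are generated one at a time), another option is to collect them in a list and build the frame once. The sketch below is not from the original answer and was not included in the timings above.

```python
import pandas as pd

a = pd.Series({"v": 0, "w": 1, "x": 2, "y": 3}, name="A")
b = pd.Series({"x": 4, "y": 5, "z": 6}, name="B")

# Collect the Series as they are generated ...
rows = []
for row in (a, b):
    rows.append(row)

# ... then build the frame in one call: each Series becomes a row labelled
# by its name, and the columns are the union of all Series indexes.
df = pd.DataFrame(rows)

print(df)
```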

Regarding "Pandas: How to add missing columns to a DataFrame when assigning Series as rows", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43084398/
