
python - Combining Series in Pandas

Reposted. Author: 太空宇宙. Updated: 2023-11-04 03:41:56

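I need to combine several Pandas Series that contain string values. The Series are messages produced by multiple validation steps. I am trying to combine these messages into a single Series so that I can attach it to a DataFrame. The problem is that the result is empty (all NaN).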

Here is an example:

import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'c', 'd'], 'b': ['aa', 'bb', 'cc', 'dd']})

index1 = df[df['a'] == 'b'].index
index2 = df[df['a'] == 'a'].index

series = df.iloc[index1].apply(lambda x: x['b'] + '-bbb', axis=1)
series += df.iloc[index2].apply(lambda x: x['a'] + '-aaa', axis=1)

print(series)
# >>> series
# 0 NaN
# 1 NaN
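
The NaN values come from index alignment: `+` between two Series matches rows by label, and a label present on only one side yields NaN. A minimal sketch of that behaviour, using two hand-built Series (the names `left` and `right` are illustrative, not from the question), with `combine_first` shown as one possible way to take the union when the labels do not overlap:

```python
import pandas as pd

# Stand-ins for the two message Series; labels 1 and 0 do not overlap.
left = pd.Series(['bb-bbb'], index=[1])
right = pd.Series(['a-aaa'], index=[0])

# `+` aligns on index labels; a label missing on either side gives NaN.
summed = left + right

# combine_first keeps `left` where it has a value and falls back to `right`.
combined = left.combine_first(right)

print(summed)
print(combined)
```

Note that `combine_first` silently prefers the left value when both sides carry the same label, so it only suits cases where each row has at most one message.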

Update

import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'c', 'd'], 'b': ['aa', 'bb', 'cc', 'dd']})

index1 = df[df['a'] == 'b'].index
index2 = df[df['a'] == 'a'].index

series1 = df.iloc[index1].apply(lambda x: x['b'] + '-bbb', axis=1)
series2 = df.iloc[index2].apply(lambda x: x['a'] + '-aaa', axis=1)
series3 = df.iloc[index2].apply(lambda x: x['a'] + '-ccc', axis=1)

# series3 causes a ValueError: cannot reindex from a duplicate axis
series = pd.concat([series1, series2, series3])
df['series'] = series
print(df)

Update 2

In this example the indexes appear to get mixed up.

import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'c', 'd'], 'b': ['aa', 'bb', 'cc', 'dd']})

index1 = df[df['a'] == 'a'].index
index2 = df[df['a'] == 'b'].index
index3 = df[df['a'] == 'c'].index

series1 = df.iloc[index1].apply(lambda x: x['a'] + '-aaa', axis=1)
series2 = df.iloc[index2].apply(lambda x: x['a'] + '-bbb', axis=1)
series3 = df.iloc[index3].apply(lambda x: x['a'] + '-ccc', axis=1)

print(series1)
print()
print(series2)
print()
print(series3)
print()

df['series'] = pd.concat([series1, series2, series3], ignore_index=True)
print(df)
print()

df['series'] = pd.concat([series2, series1, series3], ignore_index=True)
print(df)
print()

df['series'] = pd.concat([series3, series2, series1], ignore_index=True)
print(df)
print()

This produces the following output:

0    a-aaa
dtype: object

1    b-bbb
dtype: object

2    c-ccc
dtype: object

   a   b series
0  a  aa  a-aaa
1  b  bb  b-bbb
2  c  cc  c-ccc
3  d  dd    NaN

   a   b series
0  a  aa  b-bbb
1  b  bb  a-aaa
2  c  cc  c-ccc
3  d  dd    NaN

   a   b series
0  a  aa  c-ccc
1  b  bb  b-bbb
2  c  cc  a-aaa
3  d  dd    NaN

I would expect row 0 to contain only a, row 1 only b, and row 2 only c, but that is not the case...
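
One reading of the output above is that `ignore_index=True` throws away the original row labels and renumbers the result 0, 1, 2 in concatenation order, so the assignment lines up by position in the list rather than by the row each message came from. A minimal sketch, assuming each row has at most one message, that keeps the labels instead (boolean masks with `.loc` stand in here for the question's `iloc` lookups):

```python
import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'c', 'd'], 'b': ['aa', 'bb', 'cc', 'dd']})

# Each Series keeps the label of the row it was computed from.
series1 = df.loc[df['a'] == 'a', 'a'] + '-aaa'
series2 = df.loc[df['a'] == 'b', 'a'] + '-bbb'
series3 = df.loc[df['a'] == 'c', 'a'] + '-ccc'

# Without ignore_index the labels survive, so the order of the list is irrelevant.
df['series'] = pd.concat([series3, series2, series1])
print(df)
```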

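Update 3

Here is a better example that should show the expected behaviour. As I said, the use case is that, for a given DataFrame, a function evaluates each row and may return error messages for some rows as a Series (some index labels are present, others are not; if no errors are returned, the error Series is empty).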

In [12]:

s1 = pd.Series(['b', 'd'], index=[1, 3])
s2 = pd.Series(['a', 'b'], index=[0, 1])
s3 = pd.Series(['c', 'e'], index=[2, 4])
s4 = pd.Series([], index=[])
pd.concat([s1, s2, s3, s4]).sort_index()

# I'd like to get:
#
# 0 a
# 1 b b
# 2 c
# 3 d
# 4 e
Out[12]:
0 a
1 b
1 b
2 c
3 d
4 e
dtype: object
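
The desired output above, with one entry per label and clashing messages joined by a space, can be approximated by grouping the concatenated Series on its index. A minimal sketch, assuming all messages are strings; dropping empty Series before `concat` and the name `parts` are choices made here, not part of the question:

```python
import pandas as pd

s1 = pd.Series(['b', 'd'], index=[1, 3])
s2 = pd.Series(['a', 'b'], index=[0, 1])
s3 = pd.Series(['c', 'e'], index=[2, 4])
s4 = pd.Series([], index=[], dtype=object)  # a step that reported no errors

# Skip empty Series so they cannot affect the resulting dtypes.
parts = [s for s in (s1, s2, s3, s4) if not s.empty]

# Group on the index labels and join the messages that share a label.
combined = pd.concat(parts).groupby(level=0).agg(' '.join)
print(combined)
```

Messages for the same label are joined in the order their Series appear in `parts`.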

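Best answer

By default, concat keeps the existing indexes. If they clash, the result carries duplicate labels, and assigning it to a DataFrame column raises the ValueError you found, so you need to set ignore_index=True: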

In [33]:

series = pd.concat([series1, series2, series3], ignore_index=True)
df['series'] = series
print (df)
   a   b  series
0  a  aa  bb-bbb
1  b  bb   a-aaa
2  c  cc   a-ccc
3  d  dd     NaN
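
To see where the error actually arises: `pd.concat` itself keeps clashing labels, and it is the column assignment that fails because it cannot align a duplicated index with the frame. A minimal sketch using hand-built stand-ins for the three Series (values copied from the question; `caught` is just an illustrative variable):

```python
import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'c', 'd'], 'b': ['aa', 'bb', 'cc', 'dd']})

series1 = pd.Series(['bb-bbb'], index=[1])
series2 = pd.Series(['a-aaa'], index=[0])
series3 = pd.Series(['a-ccc'], index=[0])  # same label as series2

# concat succeeds, but the result carries label 0 twice.
series = pd.concat([series1, series2, series3])
print(series.index.is_unique)  # False

# Assigning it as a column needs a reindex, which duplicate labels prevent.
caught = None
try:
    df['series'] = series
except ValueError as exc:
    caught = exc
print(caught)

# With ignore_index=True the labels become 0, 1, 2 and the assignment works,
# but the link to the originating rows is lost.
df['series'] = pd.concat([series1, series2, series3], ignore_index=True)
print(df)
```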

Edit

I think I now understand what you want. You can get it by converting the Series to a DataFrame and then merging on the index:

In [96]:

df = pd.DataFrame({'a': ['a', 'b', 'c', 'd'], 'b': ['aa', 'bb', 'cc', 'dd']})

index1 = df[df['a'] == 'b'].index
index2 = df[df['a'] == 'a'].index

series1 = df.iloc[index1].apply(lambda x: x['b'] + '-bbb', axis=1)
series2 = df.iloc[index2].apply(lambda x: x['a'] + '-aaa', axis=1)
series3 = df.iloc[index2].apply(lambda x: x['a'] + '-ccc', axis=1)
# we now don't ignore the index in order to preserve the identity of the row we want to merge back to later
series = pd.concat([series1, series2, series3])
# construct a dataframe from the series and give the column a name
df1 = pd.DataFrame({'series':series})
# perform an outer merge on both df's indices
df.merge(df1, left_index=True, right_index=True, how='outer')

Out[96]:
   a   b  series
0  a  aa   a-aaa
0  a  aa   a-ccc
1  b  bb  bb-bbb
2  c  cc     NaN
3  d  dd     NaN
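
If one row per original record is preferred over the duplicated row 0 produced by the merge, the messages can be joined per label first and then attached with `DataFrame.join`. This is a sketch of an alternative, not the approach of the answer above; the space separator and the name `messages` are assumptions:

```python
import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'c', 'd'], 'b': ['aa', 'bb', 'cc', 'dd']})

series1 = df.loc[df['a'] == 'b', 'b'] + '-bbb'
series2 = df.loc[df['a'] == 'a', 'a'] + '-aaa'
series3 = df.loc[df['a'] == 'a', 'a'] + '-ccc'

# Collapse messages that share a row label into one string per label.
messages = (
    pd.concat([series1, series2, series3])
    .groupby(level=0)
    .agg(' '.join)
    .rename('series')
)

# join aligns on the index; rows without a message get NaN.
result = df.join(messages)
print(result)
```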

About python - Combining Series in Pandas: a similar question was found on Stack Overflow: https://stackoverflow.com/questions/25973514/
