
python - Expanding a column containing lists of tuples into the current DataFrame

Reposted. Author: 行者123. Updated: 2023-12-01 22:46:40

I have a DataFrame in the following format:

import pandas as pd

df = pd.DataFrame({'column_with_tuples': [[('word1', 10), ('word2', 20), ('word3', 30)], [('word4', 40), ('word5', 50), ('word6', 60)]],
                   'category': ['category1', 'category2']})

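I want to move the tuples into two separate columns and keep the category column, so that I can easily filter the most common words for each category.

So the final result should look like this: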

df_new = pd.DataFrame({'word': ['word1','word2', 'word3','word4','word5','word6'],
'frequency': [10, 20, 30, 40, 50, 60],
'category':['category1','category1', 'category1', 'category2', 'category2', 'category2']})

I tried the following code, but the result is not what I expected:

df_tuples = pd.concat([pd.DataFrame(x) for x in df['column_with_tuples']], ignore_index=True)

df_tuples.columns = ['word', 'frequency']

df.drop(['column_with_tuples'], axis=1, inplace=True)

df = pd.concat([df, df_tuples], axis=1)

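I would appreciate some help.

Best answer

You should use the .explode() method to expand the list of tuples in column_with_tuples into separate rows. Then use .rename() to rename that column to word, and unpack each tuple into the word and frequency columns with .apply(pd.Series). The category column is carried along by .explode(), so the .assign() call only sets it again. Finally, select the columns in the desired order and reset the index.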

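# One row per tuple; explode() repeats the category value for each new row
df_new = df.explode("column_with_tuples")
df_new = df_new.rename(columns={"column_with_tuples": "word"})
# Split each (word, frequency) tuple into two columns
df_new[["word", "frequency"]] = df_new["word"].apply(pd.Series)

# Redundant but harmless: category is already present after explode()
df_new = df_new.assign(category=df["category"])
df_new = df_new[["word", "frequency", "category"]]
df_new.reset_index(drop=True, inplace=True)
print(df_new)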
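
As a rough illustration of why the category column needs no separate step, the sketch below inspects the intermediate result of .explode() on the sample data from the question. The variable name exploded is only for illustration and does not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "column_with_tuples": [
        [("word1", 10), ("word2", 20), ("word3", 30)],
        [("word4", 40), ("word5", 50), ("word6", 60)],
    ],
    "category": ["category1", "category2"],
})

# explode() emits one row per list element and repeats the values of
# the other columns (here: category) for each of those rows.
exploded = df.explode("column_with_tuples")

# The original row labels are repeated too, which is why the answer
# resets the index at the end.
print(exploded.index.tolist())        # [0, 0, 0, 1, 1, 1]
print(exploded["category"].tolist())
```

Each cell of column_with_tuples now holds a single (word, frequency) tuple, so the remaining work is only to split that tuple into two columns.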

A simplified version of the code above:

df_new = df.explode("column_with_tuples").rename(columns={"column_with_tuples": "word"})
df_new[["word", "frequency"]] = df_new["word"].apply(pd.Series)
# category already comes along with explode(), so no assign() step is needed
df_new = df_new[["word", "frequency", "category"]].reset_index(drop=True)
print(df_new)

    word  frequency   category
0  word1         10  category1
1  word2         20  category1
2  word3         30  category1
3  word4         40  category2
4  word5         50  category2
5  word6         60  category2
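
A possible alternative, sketched here under the assumption that every list element is a two-item tuple: build the word and frequency columns directly from the list of tuples instead of calling pd.Series once per row, then pick the most frequent word per category, which is the goal stated in the question. The names exploded, pairs and top_words are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "column_with_tuples": [
        [("word1", 10), ("word2", 20), ("word3", 30)],
        [("word4", 40), ("word5", 50), ("word6", 60)],
    ],
    "category": ["category1", "category2"],
})

# One row per tuple, with a fresh 0..n-1 index so the join below lines up.
exploded = df.explode("column_with_tuples").reset_index(drop=True)

# Build both columns in one pass from the list of tuples.
pairs = pd.DataFrame(
    exploded["column_with_tuples"].tolist(),
    columns=["word", "frequency"],
)
df_new = pairs.join(exploded["category"])

# Most frequent word per category: idxmax() returns the row label of
# the largest frequency within each group.
top_words = df_new.loc[df_new.groupby("category")["frequency"].idxmax()]
print(top_words)
```

With the sample data this selects word3 for category1 and word6 for category2. For the top few words per category rather than a single one, sorting by frequency and then taking groupby("category").head(n) is a common variation.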

Regarding python - expanding a column containing lists of tuples into the current DataFrame, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/75170451/
