
python - Pandas DataFrame: how to turn one row into separate rows based on labelled column value

Reposted. Author: 太空宇宙. Updated: 2023-11-04 00:10:09

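I am building an entity-based sentiment classifier for forex news analysis. Several currencies may be identified in each news article, but I am struggling to turn one row (for example {'USD': 1, 'JPY': -1}, following the existing human labels) into separate rows.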

The sample DataFrame currently looks like this:

       sentiment                                               text
0   USD:1,CNY:-1  US economy is improving while China is struggling
1  USD:-1, JPY:1    Unemployment is high for US while low for Japan
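For anyone who wants to reproduce the problem, the frame above can be rebuilt roughly as follows. This is only a sketch: the variable name `df` is taken from the answer's code, and the values are copied from the table, including the stray space before `JPY` in the second row.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "sentiment": ["USD:1,CNY:-1", "USD:-1, JPY:1"],
    "text": [
        "US economy is improving while China is struggling",
        "Unemployment is high for US while low for Japan",
    ],
})

print(df)
```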

I would like to convert it into multiple rows like this:

   currency  sentiment                                               text
0       USD          1  US economy is improving while China is struggling
1       CNY         -1  US economy is improving while China is struggling
2       USD         -1    Unemployment is high for US while low for Japan
3       JPY          1    Unemployment is high for US while low for Japan

Thank you very much for your help.

Best answer

You can split the sentiment column on the pattern ,|: (comma or colon), then expand and stack the result.

Then use DataFrame.reindex together with Index.repeat to repeat each row (and so its text value) once per currency:score pair produced by the split.

import pandas as pd

# Split on both "," and ":" (tolerating surrounding spaces), then stack into one long Series.
s = df['sentiment'].str.split(r'\s*[,:]\s*', expand=True).stack()

# Repeat each row once per "currency:score" pair, then reset the index.
df1 = df.reindex(df.index.repeat(df['sentiment'].fillna("").str.split(',').apply(len)))
df1 = df1.reset_index(drop=True)

# Even positions of s hold the currency codes, odd positions hold the scores.
df1['currency'] = s[::2].reset_index(drop=True)
df1['sentiment'] = s[1::2].reset_index(drop=True)

print(df1.sort_index(axis=1))

Output:

   currency  sentiment                                               text
0       USD          1  US economy is improving while China is struggling
1       CNY         -1  US economy is improving while China is struggling
2       USD         -1    Unemployment is high for US while low for Japan
3       JPY          1    Unemployment is high for US while low for Japan
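As a possible alternative, not part of the original answer, the same reshaping can be sketched with DataFrame.explode, which is available in pandas 0.25 and later. The sample data is copied from the question. The names `exploded`, `parts` and `result` are made up for this illustration. The whitespace stripping and the conversion of scores to integers are assumptions about what the asker wants.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "sentiment": ["USD:1,CNY:-1", "USD:-1, JPY:1"],
    "text": [
        "US economy is improving while China is struggling",
        "Unemployment is high for US while low for Japan",
    ],
})

# Turn each label string into a list of "currency:score" pairs,
# then emit one row per list element.
pairs = df["sentiment"].str.split(",")
exploded = df.assign(sentiment=pairs).explode("sentiment").reset_index(drop=True)

# Split every pair into currency and score, stripping stray spaces.
parts = exploded["sentiment"].str.strip().str.split(":", expand=True)
exploded["currency"] = parts[0].str.strip()
exploded["sentiment"] = parts[1].str.strip().astype(int)  # assumes integer scores

result = exploded[["currency", "sentiment", "text"]]
print(result)
```

Because explode keeps the text value next to each pair, this variant does not rely on two separately built Series lining up by position.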

Regarding python - Pandas DataFrame: how to turn one row into separate rows based on labelled column value, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52800063/
