
python - Why does merging two DataFrames on a common column produce an empty result?

Reposted. Author: 行者123. Updated: 2023-11-30 22:37:27

I am working with two DataFrame objects that hold survey data, but I can't get them to merge correctly. Their structure looks like this:

In [93]: numeric_answers
Out[93]:
   ANSWER_COUNT  RESPONSE
1  50            1
2  21            2
4  3             4


In [94]: readable_values
Out[94]:
          MEANING
RESPONSE
1         male
2         female
3         transgender
5         non-binary, genderqueer, or gender non-conforming
6         a different identity (please specify)
4         prefer not to disclose
-9        Not answered
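To experiment with the problem, the two frames can be rebuilt by hand from the printouts above. This is a minimal sketch that assumes the printed rows are the complete data, that RESPONSE holds plain integers, and (as the two-line header suggests) that RESPONSE is the index of readable_values rather than a column; the real survey frames may differ in dtype.

```python
import pandas as pd

# numeric_answers: RESPONSE is an ordinary column; the index (1, 2, 4) is
# copied from the printout (hand-made reconstruction, an assumption).
numeric_answers = pd.DataFrame(
    {"ANSWER_COUNT": [50, 21, 3], "RESPONSE": [1, 2, 4]},
    index=[1, 2, 4],
)

# readable_values: RESPONSE is the *index* (its name is printed on the line
# below the column header) and MEANING is the only column.
readable_values = pd.DataFrame(
    {
        "MEANING": [
            "male",
            "female",
            "transgender",
            "non-binary, genderqueer, or gender non-conforming",
            "a different identity (please specify)",
            "prefer not to disclose",
            "Not answered",
        ]
    },
    index=pd.Index([1, 2, 3, 5, 6, 4, -9], name="RESPONSE"),
)

print(list(numeric_answers.columns))  # ['ANSWER_COUNT', 'RESPONSE']
print(list(readable_values.columns))  # ['MEANING']
print(readable_values.index.name)     # RESPONSE
```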

My goal is to:

  • merge them on the RESPONSE column
  • produce a DataFrame with the columns ['RESPONSE', 'MEANING', 'ANSWER_COUNT']
  • set missing values to N/A (although 0 would also be fine)

An example of the desired output:

RESPONSE  MEANING                                            ANSWER_COUNT
1         male                                               50
2         female                                             21
3         transgender                                        NaN
5         non-binary, genderqueer, or gender non-conforming  NaN
6         a different identity (please specify)              NaN
4         prefer not to disclose                             3
-9        Not answered                                       NaN

After reading the merge documentation, I concluded that what I need is pd.merge(readable_values, numeric_answers), but this produces an empty result:

Empty DataFrame
Columns: [RESPONSE, MEANING, ANSWER_COUNT]
Index: []
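Called without `on`, `pd.merge` joins on the intersection of the two frames' *column* labels; index names are not part of that default. A quick check, sketched below under the assumption that readable_values carries RESPONSE as its index (as its printout suggests), shows that this intersection is empty. Recent pandas versions raise a `MergeError` for such a call instead of returning an empty frame, so the exact symptom may depend on the version; the snippet catches the error for that reason. The two-row frames are an abbreviated, hand-made reconstruction.

```python
import pandas as pd

# Same shapes as in the question, abbreviated to two rows each (assumption).
numeric_answers = pd.DataFrame({"ANSWER_COUNT": [50, 21], "RESPONSE": [1, 2]})
readable_values = pd.DataFrame(
    {"MEANING": ["male", "female"]},
    index=pd.Index([1, 2], name="RESPONSE"),
)

# Without `on`, merge uses the columns both frames have in common.
common = readable_values.columns.intersection(numeric_answers.columns)
print(list(common))  # [] -> RESPONSE is not a column of readable_values

try:
    print(pd.merge(readable_values, numeric_answers))
except ValueError as exc:  # pandas' MergeError is a subclass of ValueError
    print("merge failed:", exc)
```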

After experimenting with various parameters, I got somewhat promising results with pd.merge(readable_values, numeric_answers, on='RESPONSE', how='outer'):

(Pdb) pd.merge(readable_values, numeric_answers, on='RESPONSE', how='outer')
   RESPONSE  MEANING                                            ANSWER_COUNT
0  1.0       male                                               NaN
1  2.0       female                                             NaN
2  3.0       transgender                                        NaN
3  5.0       non-binary, genderqueer, or gender non-conforming  NaN
4  6.0       a different identity (please specify)              NaN
5  4.0       prefer not to disclose                             NaN
6  -9.0      Not answered                                       NaN
7  1.0       NaN                                                50.0
8  2.0       NaN                                                21.0
9  4.0       NaN                                                3.0

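However, this simply appends the rows of one frame to the other, whereas I need rows that share the same RESPONSE value to be matched up into a single row. What is the idiomatic way to achieve this in pandas?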

Best answer

readable_values has RESPONSE as its index, not as a column.
You can merge them like this:

In [11]: numeric_answers.merge(readable_values, left_on='RESPONSE', right_index=True, how='outer')
Out[11]:
   ANSWER_COUNT  RESPONSE  MEANING
1  50.0          1         male
2  21.0          2         female
4  3.0           4         prefer not to disclose
4  NaN           3         transgender
4  NaN           5         non-binary, genderqueer, or gender non-conforming
4  NaN           6         a different identity (please specify)
4  NaN           -9        Not answered
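When checking whether the keys actually lined up, `merge` also accepts `indicator=True`, which adds a `_merge` column labelling each row `both`, `left_only` or `right_only`. A minimal sketch of the index-based merge above with that flag added; the frames are rebuilt by hand from the question, so their construction is an assumption.

```python
import pandas as pd

# Hand-made reconstruction of the question's frames (assumption).
numeric_answers = pd.DataFrame(
    {"ANSWER_COUNT": [50, 21, 3], "RESPONSE": [1, 2, 4]}, index=[1, 2, 4]
)
readable_values = pd.DataFrame(
    {"MEANING": ["male", "female", "transgender",
                 "non-binary, genderqueer, or gender non-conforming",
                 "a different identity (please specify)",
                 "prefer not to disclose", "Not answered"]},
    index=pd.Index([1, 2, 3, 5, 6, 4, -9], name="RESPONSE"),
)

# Left key: the RESPONSE column; right key: the RESPONSE index.
merged = numeric_answers.merge(
    readable_values, left_on="RESPONSE", right_index=True,
    how="outer", indicator=True,
)

# 'both' rows found a partner; 'right_only' rows are meanings nobody chose.
print(merged["_merge"].value_counts())
```

If every row came back as `left_only` or `right_only`, the keys did not match at all, which is the symptom shown in the question's outer merge.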

Alternatively, call reset_index on readable_values first:

In [12]: numeric_answers.merge(readable_values.reset_index(), on='RESPONSE', how='outer')
Out[12]:
   ANSWER_COUNT  RESPONSE  MEANING
0  50.0          1         male
1  21.0          2         female
2  3.0           4         prefer not to disclose
3  NaN           3         transgender
4  NaN           5         non-binary, genderqueer, or gender non-conforming
5  NaN           6         a different identity (please specify)
6  NaN           -9        Not answered
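The outer merges above contain the right rows, but not the column order or row order the question asked for. A sketch that matches the requested layout, assuming the frames are built as in the question: put readable_values (after reset_index) on the left, because a left merge keeps the left frame's key order, and optionally replace the missing counts with 0, which the question said was also acceptable.

```python
import pandas as pd

# Hand-made reconstruction of the question's frames (assumption).
numeric_answers = pd.DataFrame(
    {"ANSWER_COUNT": [50, 21, 3], "RESPONSE": [1, 2, 4]}, index=[1, 2, 4]
)
readable_values = pd.DataFrame(
    {"MEANING": ["male", "female", "transgender",
                 "non-binary, genderqueer, or gender non-conforming",
                 "a different identity (please specify)",
                 "prefer not to disclose", "Not answered"]},
    index=pd.Index([1, 2, 3, 5, 6, 4, -9], name="RESPONSE"),
)

# reset_index() turns RESPONSE into a column, so on="RESPONSE" refers to a
# column on both sides; how="left" keeps every meaning, in its original order.
result = readable_values.reset_index().merge(
    numeric_answers, on="RESPONSE", how="left"
)
print(result)  # columns RESPONSE, MEANING, ANSWER_COUNT; NaN where no answers

# Variant with 0 instead of NaN; the int cast is safe once no NaN remains.
counts_as_zero = result.fillna({"ANSWER_COUNT": 0}).astype({"ANSWER_COUNT": int})
print(counts_as_zero["ANSWER_COUNT"].tolist())  # [50, 21, 0, 0, 0, 3, 0]
```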

Note the difference in how the two are displayed:

In [21]: readable_values
Out[21]:
          MEANING
RESPONSE
1         male
2         female
3         transgender
5         non-binary, genderqueer, or gender non-conforming
6         a different identity (please specify)
4         prefer not to disclose
-9        Not answered

In [22]: readable_values.reset_index() # RESPONSE is now a column
Out[22]:
   RESPONSE  MEANING
0  1         male
1  2         female
2  3         transgender
3  5         non-binary, genderqueer, or gender non-conforming
4  6         a different identity (please specify)
5  4         prefer not to disclose
6  -9        Not answered
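Because readable_values is already indexed by RESPONSE, another option (not part of the answer above) is `DataFrame.join`, which aligns on the index: move RESPONSE into the index of numeric_answers with `set_index`, join, then call `reset_index()` to turn RESPONSE back into a column. A sketch under the same assumption that the frames are rebuilt by hand from the question:

```python
import pandas as pd

# Hand-made reconstruction of the question's frames (assumption).
numeric_answers = pd.DataFrame(
    {"ANSWER_COUNT": [50, 21, 3], "RESPONSE": [1, 2, 4]}, index=[1, 2, 4]
)
readable_values = pd.DataFrame(
    {"MEANING": ["male", "female", "transgender",
                 "non-binary, genderqueer, or gender non-conforming",
                 "a different identity (please specify)",
                 "prefer not to disclose", "Not answered"]},
    index=pd.Index([1, 2, 3, 5, 6, 4, -9], name="RESPONSE"),
)

result = (
    readable_values
    # join aligns on the index; how="left" is the default, so every
    # meaning is kept in readable_values' order
    .join(numeric_answers.set_index("RESPONSE"))
    # bring RESPONSE back as the first column
    .reset_index()
)
print(result)
```

Compared with `merge`, `join` needs no `left_on`/`right_index` bookkeeping here, at the cost of moving the key into the index first.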

Regarding "python - Why does merging two DataFrames on a common column produce an empty result?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43880848/
