
python - df.join(): ValueError: You are trying to merge on object and int64 columns

Reposted. Author: 行者123. Updated: 2023-12-03 15:09:24

None of these questions address the issue: Question 1 and Question 2, nor could I find the answer in the pandas documentation.



Hello, I am trying to find the root cause of this error:
ValueError: You are trying to merge on object and int64 columns.

I know I can work around it with the pandas concat or merge functions, but I am trying to understand what causes the error. The question is: why do I get this ValueError?

Here is the output of head(5) and info() for the two dataframes involved.
Output of print(the_big_df.head(5)):
  account  apt  apt_p  balance        date  day  flag  month  reps     reqid  year
0  AA0420    0    0.0  -578.30  2019-03-01    1     1      3    10  82f2d761  2019
1  AA0420    0    0.1  -578.30  2019-03-02    2     1      3    10  82f2d761  2019
2  AA0420    0    0.1  -578.30  2019-03-03    3     1      3    10  82f2d761  2019
3  AA0421    0    0.1  -607.30  2019-03-04    4     1      3    10  82f2d761  2019
4  AA0421    0    0.1  -610.21  2019-03-05    5     1      3    10  82f2d761  2019
Output of print(the_big_df.info()):
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36054 entries, 0 to 36053
Data columns (total 11 columns):
account 36054 non-null object
apt 36054 non-null int64
apt_p 36054 non-null float64
balance 36054 non-null float64
date 36054 non-null datetime64[ns]
day 36054 non-null int64
flag 36054 non-null int64
month 36054 non-null int64
reps 36054 non-null int32
reqid 36054 non-null object
year 36054 non-null int64
dtypes: datetime64[ns](1), float64(2), int32(1), int64(5), object(2)
memory usage: 3.2+ MB

This is the dataframe I pass to join(); output of print(df_to_join.head(5)):
      reqid     id
0  54580f39  13301
1  3ba905c0  77114
2  5f2d80da  13302
3  a1478e98  77115
4  9b09854b  78598
Output of print(df_to_join.info()):
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14332 entries, 0 to 14331
Data columns (total 2 columns):
reqid 14332 non-null object
dni 14332 non-null object

The very next line after the four prints above is:
the_max_df = the_big_df.join(df_to_join,on='reqid')

The output, as mentioned above, is:
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

Why does this happen, given that the info() output above clearly shows the column reqid is of type object in both dataframes? Thanks.

Best answer

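The problem here is a misunderstanding of how join works: the_big_df.join(df_to_join, on='reqid') does not join on the_big_df.reqid == df_to_join.reqid, as one might assume at first glance, but on the_big_df.reqid == df_to_join.index. Because reqid is of type object and the index is of type int64, you get the error.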

docs for join :

Join columns with other DataFrame either on index or on a key column.
...
on : str, list of str, or array-like, optional
Column or index level name(s) in the caller to join on the index in other, otherwise joins index-on-index.
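
To check this on your own frames, it can help to print the two dtypes that join actually compares: the key column of the caller and the index of the other frame. A minimal sketch, where `left` and `right` are small made-up frames standing in for the_big_df and df_to_join (their values are illustrative only):

```python
import pandas as pd

# Hypothetical stand-ins for the_big_df and df_to_join
left = pd.DataFrame({'reqid': ['82f2d761', '54580f39'],
                     'balance': [-578.30, -607.30]})
right = pd.DataFrame({'reqid': ['54580f39', '3ba905c0'],
                      'id': [13301, 77114]})

# left.join(right, on='reqid') compares these two, not the two reqid columns
caller_key_dtype = left['reqid'].dtype   # string-like key column in the caller
other_index_dtype = right.index.dtype    # default RangeIndex of the other frame

print(caller_key_dtype, other_index_dtype)
print(caller_key_dtype == other_index_dtype)
```

If the last line prints False, the key column and the other frame's index have different types, which is the situation described above.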



See the following example:
import pandas as pd

df1 = pd.DataFrame({'id1': [1, 2], 'val1': [11,12]})
df2 = pd.DataFrame({'id2': [3, 4], 'val2': [21,22]})
print(df1)
# id1 val1
#0 1 11
#1 2 12
print(df2)
# id2 val2
#0 3 21
#1 4 22

# join on df1.id1 (int64) == df2.index (int64)
print(df1.join(df2, on='id1'))
# id1 val1 id2 val2
#0 1 11 4.0 22.0
#1 2 12 NaN NaN

# now df3 same as df1 but id3 as object:
df3 = pd.DataFrame({'id3': ['1', '2'], 'val1': [11,12]})

# try to join on df3.id3 (object) == df2.index (int64)
df3.join(df2, on='id3')
#ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

Please note: the above applies to modern versions of pandas. Version 0.20.3 gives the following result:
>>> df3.join(df2, on='id3')
id3 val1 id2 val2
0 1 11 NaN NaN
1 2 12 NaN NaN
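
Given the explanation above, two ways to get the join the question was after are to move reqid into the index of the other frame before calling join, or to use merge on the column. The following is a minimal sketch rather than the asker's actual data: `big` and `to_join` are small hypothetical stand-ins for the_big_df and df_to_join.

```python
import pandas as pd

# Hypothetical stand-ins for the_big_df and df_to_join
big = pd.DataFrame({
    'account': ['AA0420', 'AA0421', 'AA0422'],
    'reqid': ['82f2d761', '54580f39', '3ba905c0'],
})
to_join = pd.DataFrame({
    'reqid': ['54580f39', '3ba905c0', '5f2d80da'],
    'id': [13301, 77114, 13302],
})

# Option 1: make reqid the index of the other frame, so that
# big.reqid is compared with to_join's (now string) index
joined = big.join(to_join.set_index('reqid'), on='reqid')

# Option 2: merge on the column name in both frames;
# how='left' mirrors the default behavior of join
merged = big.merge(to_join, on='reqid', how='left')

print(joined)
print(merged)
```

Both calls keep every row of `big` and fill `id` with NaN where no matching reqid exists in `to_join`.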

Regarding python - df.join(): ValueError: You are trying to merge on object and int64 columns, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57795399/
