
python - Pandas : merge on column of ByteArray

Reposted. Author: 太空宇宙. Updated: 2023-11-04 05:27:33

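Any ideas on how to join two pandas DataFrames on a commonly named bytearray field? The field in the source (Teradata) is an actual ByteArray, and on the Teradata side it cannot be cast to a character type or to anything else usable outside Teradata.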

The Teradata export reads into pandas DataFrames perfectly, but I can't merge two tables on their commonly named field (DatabaseId) when that field is a byte array.

(with pandas imported as pd, and itertools imported)

当我尝试简单合并时:

merge1 = pd.merge(tvm, dbase, on="DatabaseId")

I get the following error:

TypeError: type object argument after * must be a sequence, not itertools.imap
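The traceback does not say so directly, but a likely cause (an assumption, not confirmed in the question) is that the merge needs hashable key values, and a bytearray, being mutable, is unhashable. A quick Python 3 check of the two byte types; `is_hashable` is a hypothetical helper written for this illustration:

```python
# Check which of the two byte types can serve as a hash-table key.
sample_id = [2, 0, 243, 185]  # first DatabaseId value from the sample data


def is_hashable(value):
    """Return True if hash(value) succeeds, i.e. value can be a dict or join key."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


print(is_hashable(bytearray(sample_id)))  # False: bytearray is mutable
print(is_hashable(bytes(sample_id)))      # True: bytes is immutable
```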

I searched StackOverflow and found a similar problem for joining on a cell containing a collection:

dbase['DBID'] = dbase.DatabaseId.apply(lambda r: type(sorted(r.iteritems())))

But I got the error:

AttributeError: 'bytearray' object has no attribute 'iteritems'
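The recipe above targets cells that hold dicts, and a bytearray has no iteritems method. The underlying idea, building a hashable key column, still carries over: calling tuple on a bytearray yields a hashable tuple of ints. A sketch with a few sample rows from the question (the DBID column name is taken from the attempt above; this adaptation is a suggestion, not part of the thread):

```python
import pandas as pd

# Sample DatabaseId values, stored as bytearray like the Teradata export.
dbase = pd.DataFrame({
    'DatabaseId': [bytearray([2, 0, 243, 185]), bytearray([2, 0, 168, 114])],
    'DatabaseName': ['PCDW_CRS_BBCONV3_TB', 'PAMLIF_TB'],
})
tvm = pd.DataFrame({
    'DatabaseId': [bytearray([2, 0, 243, 185])],
    'TVMName': ['SGI_PAY_DENIED_SEP_112012'],
})

# tuple(bytearray) gives a tuple of ints, which is hashable and usable as a merge key.
dbase['DBID'] = dbase['DatabaseId'].apply(tuple)
tvm['DBID'] = tvm['DatabaseId'].apply(tuple)

merged = pd.merge(tvm, dbase, on='DBID')
print(merged[['TVMName', 'DatabaseName']])
```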

Update

Data sample

Data collected via pandas:
dbase = pd.read_sql('select databaseid, databasename from ud812.dbase sample 10', conn)
# conn is a connection to a Teradata database

The Teradata data types are Varchar for all columns except:

DatabaseID = bytearray (Byte(4))
TVMID = bytearray (Byte(6))
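Because DatabaseId is a fixed-width Byte(4) value, a plain integer is another hashable key one could derive from it. This is an alternative suggested here, not something from the thread; big-endian byte order is an arbitrary assumption, and any order works for joining as long as both tables use the same one. The helper `byte_key_to_int` and the column `DatabaseIdInt` are hypothetical names:

```python
import pandas as pd


def byte_key_to_int(value):
    """Interpret a bytes/bytearray value as an unsigned integer (big-endian assumed)."""
    return int.from_bytes(bytes(value), byteorder='big')


dbase = pd.DataFrame({'DatabaseId': [bytearray([2, 0, 243, 185]),
                                     bytearray([0, 0, 139, 8])]})
# Integer form of the key: hashable, compact, and easy to compare.
dbase['DatabaseIdInt'] = dbase['DatabaseId'].apply(byte_key_to_int)
print(dbase['DatabaseIdInt'].tolist())  # [33616825, 35592]
```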

>>> dbase.dtypes
DatabaseId object
DatabaseName object
dtype: object
>>> dbase
DatabaseId DatabaseName
0 [2, 0, 243, 185] PCDW_CRS_BBCONV3_TB
1 [2, 0, 168, 114] PAMLIF_TB
2 [2, 0, 133, 153] PADW_PRESN_TB
3 [2, 0, 29, 184] CEDW_MOBILE_TB
4 [2, 0, 190, 183] CEDW_MODEL_SCORE_TB
5 [2, 0, 71, 55] PBBBAM_TB
6 [2, 0, 169, 183] CEDW_OCC_TB
7 [2, 0, 201, 183] CCDW_DGTL_DEAL_TB
8 [0, 0, 139, 8] PRECDSS_TB
9 [2, 0, 142, 203] CDBDW_TB
>>>
>>>
>>> tvm.dtypes
TVMId object
DatabaseId object
TVMName object
TableKind object
CreateText object
dtype: object
>>> tvm
TVMId DatabaseId TVMName \
0 [230, 1, 41, 11, 0, 0] [2, 0, 67, 183] JCP_03538_112002
1 [214, 1, 60, 133, 0, 0] [2, 0, 186, 52] STL_AUTHNCTD_RULE_EXECN
2 [193, 2, 59, 48, 0, 0] [2, 0, 225, 150] uye177_Xsell_EM_OPCL_TB2
3 [0, 2, 235, 154, 0, 0] [2, 0, 244, 181] PL_CALCD_INVSTR_MTHLY_HIST_ST
4 [255, 1, 131, 76, 0, 0] [2, 0, 110, 63] IMH867_AVA0803_SNAP
5 [125, 1, 217, 138, 0, 0] [2, 0, 237, 153] FD_ACCT_STMT_ADR_ST
6 [224, 0, 80, 233, 0, 0] [2, 0, 243, 127] EXP_SRCH_RSLT_DESC
7 [208, 1, 72, 15, 0, 0] [2, 0, 8, 57] SGI_PAY_DENIED_SEP_112012
8 [246, 0, 27, 61, 0, 0] [2, 0, 143, 130] CR_INDIVD
9 [186, 1, 242, 167, 0, 0] [0, 0, 244, 18] wzu448_sb_apps

TableKind CreateText
0 T None
1 V CREATE VIEW ... ... ... ... ... ... ... ... ...
2 T None
3 V CREATE VIEW ... ... ... ... ... ... ... ... ...
4 T None
5 V CREATE VIEW ... ... ... ... ... ... ... ... ...
6 V CREATE VIEW ... ... ... ... ... ... ... ... ...
7 V CREATE VIEW ... ... ... ... ... ... ... ... ...
8 V CREATE VIEW ... ... ... ... ... ... ... ... ...
9 T None
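In the dtypes output above, the byte columns show up only as object, the same as ordinary text columns, so dtypes alone does not reveal which columns need converting. A sketch that inspects the values instead; `bytearray_columns` is a hypothetical helper, not a pandas function:

```python
import pandas as pd


def bytearray_columns(df):
    """Return names of columns holding at least one bytearray value.

    df.dtypes reports such columns only as object, so each value is inspected.
    """
    return [col for col in df.columns
            if df[col].map(lambda v: isinstance(v, bytearray)).any()]


tvm = pd.DataFrame({
    'TVMId': [bytearray([230, 1, 41, 11, 0, 0])],
    'DatabaseId': [bytearray([2, 0, 67, 183])],
    'TVMName': ['JCP_03538_112002'],
})
print(bytearray_columns(tvm))  # ['TVMId', 'DatabaseId']
```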

Best answer

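Convert your bytearray values to their immutable cousin, bytes. pandas factorizes merge keys with a hash table, and a bytearray is mutable and therefore unhashable; a bytes object holds the same data but is hashable: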

import pandas as pd

# Create your example `dbase`
DatabaseId_dbase = list(map(bytearray, [[2, 0, 243, 185], [2, 0, 168, 114],
[2, 0, 133, 153], [2, 0, 29, 184], [2, 0, 190, 183], [2, 0, 71, 55],
[2, 0, 169, 183], [2, 0, 201, 183], [0, 0, 139, 8], [2, 0, 142, 203]]))
DatabaseName = ['PCDW_CRS_BBCONV3_TB', 'PAMLIF_TB', 'PADW_PRESN_TB',
'CEDW_MOBILE_TB', 'CEDW_MODEL_SCORE_TB', 'PBBBAM_TB', 'CEDW_OCC_TB',
'CCDW_DGTL_DEAL_TB', 'PRECDSS_TB', 'CDBDW_TB']
dbase = pd.DataFrame({'DatabaseId': DatabaseId_dbase,
'DatabaseName': DatabaseName})

# Create your example `tvm`
DatabaseId_tvm = list(map(bytearray, [[2, 0, 67, 183], [2, 0, 186, 52],
[2, 0, 225, 150], [2, 0, 244, 181], [2, 0, 110, 63], [2, 0, 237, 153],
[2, 0, 243, 127], [2, 0, 243, 185], [2, 0, 143, 130], [0, 0, 244, 18]]))
TVMId = list(map(bytearray, [[230, 1, 41, 11, 0, 0], [214, 1, 60, 133, 0, 0],
[193, 2, 59, 48, 0, 0], [0, 2, 235, 154, 0, 0], [255, 1, 131, 76, 0, 0],
[125, 1, 217, 138, 0, 0], [224, 0, 80, 233, 0, 0], [208, 1, 72, 15, 0, 0],
[246, 0, 27, 61, 0, 0], [186, 1, 242, 167, 0, 0]]))
TVMName = ['JCP_03538_112002', 'STL_AUTHNCTD_RULE_EXECN',
'uye177_Xsell_EM_OPCL_TB2', 'PL_CALCD_INVSTR_MTHLY_HIST_ST',
'IMH867_AVA0803_SNAP', 'FD_ACCT_STMT_ADR_ST', 'EXP_SRCH_RSLT_DESC',
'SGI_PAY_DENIED_SEP_112012', 'CR_INDIVD', 'wzu448_sb_apps']
TableKind = ['T', 'V', 'T', 'V', 'T', 'V', 'V', 'V', 'V', 'T']
tvm = pd.DataFrame({'DatabaseId': DatabaseId_tvm, 'TVMId': TVMId,
'TVMName': TVMName, 'TableKind': TableKind})

# This line would fail with the following error
# TypeError: type object argument after * must be a sequence, not map
# merge = pd.merge(tvm, dbase, on='DatabaseId')

# Apply the `bytes` constructor to the `bytearray` columns
dbase['DatabaseId'] = dbase['DatabaseId'].apply(bytes)
tvm['DatabaseId'] = tvm['DatabaseId'].apply(bytes)
tvm['TVMId'] = tvm['TVMId'].apply(bytes)

# With hashable bytes keys, the merge now works
merge = pd.merge(tvm, dbase, on='DatabaseId')
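When many columns come back as bytearray, the per-column apply(bytes) calls can be generalized into one pass over every column. A minimal sketch; `bytearrays_to_bytes` is a hypothetical helper name, not part of pandas, and the two one-row frames reuse values from the answer's example:

```python
import pandas as pd


def bytearrays_to_bytes(df):
    """Return a copy of df in which every bytearray value is replaced by bytes.

    Non-bytearray values (strings, None, ...) are left untouched.
    """
    out = df.copy()
    for col in out.columns:
        is_ba = out[col].map(lambda v: isinstance(v, bytearray))
        if is_ba.any():
            out[col] = out[col].map(
                lambda v: bytes(v) if isinstance(v, bytearray) else v)
    return out


dbase_sample = pd.DataFrame({'DatabaseId': [bytearray([2, 0, 243, 185])],
                             'DatabaseName': ['PCDW_CRS_BBCONV3_TB']})
tvm_sample = pd.DataFrame({'DatabaseId': [bytearray([2, 0, 243, 185])],
                           'TVMId': [bytearray([208, 1, 72, 15, 0, 0])],
                           'TVMName': ['SGI_PAY_DENIED_SEP_112012']})

# Both inputs are converted before merging, so the join keys are hashable bytes.
merged = pd.merge(bytearrays_to_bytes(tvm_sample),
                  bytearrays_to_bytes(dbase_sample), on='DatabaseId')
```

Copying the frame first keeps the original bytearray data intact, in case it is needed later (for example, to write back to Teradata).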

The resulting merge:

   DatabaseId                     TVMId                    TVMName  \
0 b'\x02\x00\xf3\xb9' b'\xd0\x01H\x0f\x00\x00' SGI_PAY_DENIED_SEP_112012

TableKind DatabaseName
0 V PCDW_CRS_BBCONV3_TB

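(I had to change the DatabaseId field in one of the rows of your tvm, otherwise the merge would have been empty. I also left out the CreateText column; it is too awkward to post on SO.)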
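The bytes keys in the merge result print as escape sequences such as b'\xd0\x01H\x0f\x00\x00', where printable bytes appear as ASCII characters (0x48 shows as H). For display only, bytes.hex gives a uniform representation. A sketch; converting to hex is a suggestion here, not part of the original answer:

```python
import pandas as pd

# Bytes key values as they appear in the merge result.
merged = pd.DataFrame({
    'DatabaseId': [b'\x02\x00\xf3\xb9'],
    'TVMId': [b'\xd0\x01H\x0f\x00\x00'],
})

# Hex strings are easier to read and compare by eye than bytes escape sequences.
readable = merged.assign(
    DatabaseId=merged['DatabaseId'].map(bytes.hex),
    TVMId=merged['TVMId'].map(bytes.hex),
)
print(readable)
```

bytes.fromhex reverses the conversion if the hex strings ever need to become join keys again.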

Regarding python - Pandas : merge on column of ByteArray, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38245661/
