Python: The most efficient way to categorise transactions

Reposted. Author: 行者123. Updated: 2023-12-01 06:33:05

I have a large set of transactions that I need to categorise. The data looks like this:

transactions: [
    {
        "id": "20200117-16045-0",
        "date": "2020-01-17",
        "creationTime": null,
        "text": "SuperB Vesterbro T 74637",
        "originalText": "SuperB Vesterbro T 74637",
        "details": null,
        "category": null,
        "amount": {
            "value": -160.45,
            "currency": "DKK"
        },
        "balance": {
            "value": 12572.68,
            "currency": "DKK"
        },
        "type": "Card",
        "state": "Booked"
    },
    {
        "id": "20200117-4800-0",
        "date": "2020-01-17",
        "creationTime": null,
        "text": "Rent 45228",
        "originalText": "Rent 45228",
        "details": null,
        "category": null,
        "amount": {
            "value": -48.00,
            "currency": "DKK"
        },
        "balance": {
            "value": 12733.13,
            "currency": "DKK"
        },
        "type": "Card",
        "state": "Booked"
    },
    {
        "id": "20200114-1200-0",
        "date": "2020-01-14",
        "creationTime": null,
        "text": "Superbest 86125",
        "originalText": "SUPERBEST 86125",
        "details": null,
        "category": null,
        "amount": {
            "value": -12.00,
            "currency": "DKK"
        },
        "balance": {
            "value": 12781.13,
            "currency": "DKK"
        },
        "type": "Card",
        "state": "Booked"
    }
]

I load the data like this:

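import json
import pandas as pd

# The original snippet omitted the enclosing def; the function name here is illustrative.
def load_transactions(path='transactions.json'):
    with open(path) as transactions:
        file = json.load(transactions)

    # json_normalize yields one row whose 'transactions' cell holds the list of records
    data = pd.json_normalize(file)['transactions'][0]
    return pd.DataFrame(data)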
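
As a side note on the loading step, the nested "amount" and "balance" objects stay as dictionaries inside the DataFrame cells when the records are passed straight to pd.DataFrame. A minimal sketch of one way to flatten them, assuming the parsed file is a dict with a "transactions" key (the in-memory `file` below is a shortened, hypothetical stand-in for the real transactions.json):

```python
import pandas as pd

# Hypothetical, shortened stand-in for the parsed contents of transactions.json
file = {
    "transactions": [
        {
            "id": "20200117-16045-0",
            "text": "SuperB Vesterbro T 74637",
            "amount": {"value": -160.45, "currency": "DKK"},
        },
        {
            "id": "20200117-4800-0",
            "text": "Rent 45228",
            "amount": {"value": -48.00, "currency": "DKK"},
        },
    ]
}

# Passing the list of records to json_normalize expands nested dicts
# into dotted column names such as 'amount.value' and 'amount.currency'.
df = pd.json_normalize(file["transactions"])
print(df.columns.tolist())
```

With flat columns, something like df["amount.value"].sum() works directly, which may be convenient once the transactions have been categorised.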

So far I have the following categories, and I want to group the transactions by them:

CATEGORIES = {
    'Groceries': ['SuperB', 'Superbest'],
    'Housing': ['Insurance', 'Rent']
}

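Now I want to loop over every row in the DataFrame and assign each transaction to a group. I want to do this by checking whether text contains one of the values in the CATEGORIES dictionary.

If it does, the transaction should be categorised under the corresponding key of the CATEGORIES dictionary, for example Groceries.

What is the most efficient way to do this?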

Best answer

IIUC, we can build a pipe-delimited pattern from each list of values in your dictionary and do the assignment with .loc:

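for k, v in CATEGORIES.items():
    # Join the keywords of one category into a regex alternation, e.g. 'SuperB|Superbest'
    pat = '|'.join(v)
    # Set the category on every row whose text matches any of the keywords
    df.loc[df['text'].str.contains(pat), 'category'] = k
print(df[['text', 'category']])

                       text   category
0  SuperB Vesterbro T 74637  Groceries
1                Rent 45228    Housing
2           Superbest 86125  Groceries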
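
The loop above treats each keyword as a regular expression and matches case-sensitively. A hedged variant, assuming you want keywords taken literally, matching to ignore case, and missing text to be skipped, could look like this (the two extra rows in the sample frame are made up for illustration):

```python
import re
import pandas as pd

CATEGORIES = {
    'Groceries': ['SuperB', 'Superbest'],
    'Housing': ['Insurance', 'Rent'],
}

# Sample frame: the three texts from the question plus two hypothetical edge cases
df = pd.DataFrame({'text': ['SuperB Vesterbro T 74637', 'Rent 45228',
                            'Superbest 86125', 'Unknown shop 1', None]})
df['category'] = None

for category, keywords in CATEGORIES.items():
    # re.escape keeps characters such as '.' or '+' in a keyword from acting as regex syntax
    pat = '|'.join(re.escape(keyword) for keyword in keywords)
    # case=False ignores case; na=False treats missing text as "no match"
    mask = df['text'].str.contains(pat, case=False, na=False)
    df.loc[mask, 'category'] = category

print(df)
```

Note that this is still substring matching, so with case=False a keyword like 'Rent' would also match inside a word such as 'Different'. Whether that is acceptable depends on your data.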

A more efficient solution:

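We build a single list of all your values and pull them out with str.extract. At the same time we invert your dictionary, so that each value becomes a key that we can map back onto the category in your target DataFrame.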

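words = []
mapping_dict = {}
for k, v in CATEGORIES.items():
    for item in v:
        words.append(item)       # every keyword, across all categories
        mapping_dict[item] = k   # keyword -> category

# One regex with a single capture group extracts the first keyword found in each text
ext = df['text'].str.extract(f"({'|'.join(words)})")
df['category'] = ext[0].map(mapping_dict)
print(df[['text', 'category']])

                       text   category
0  SuperB Vesterbro T 74637  Groceries
1                Rent 45228    Housing
2           Superbest 86125  Groceries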
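
For reference, the extract-and-map idea can be wrapped into a small self-contained function. This is only a sketch: the function name `categorise`, the `default` label, and the fourth sample row are assumptions made for illustration and are not part of the original answer.

```python
import re
import pandas as pd

CATEGORIES = {
    'Groceries': ['SuperB', 'Superbest'],
    'Housing': ['Insurance', 'Rent'],
}

def categorise(texts, categories, default='Uncategorised'):
    """Map each text to the category of the first keyword it contains."""
    # Invert the dictionary: keyword -> category
    keyword_to_category = {kw: cat for cat, kws in categories.items() for kw in kws}
    # Try longer keywords first so a short keyword cannot shadow a longer one
    keywords = sorted(keyword_to_category, key=len, reverse=True)
    pattern = '(' + '|'.join(re.escape(kw) for kw in keywords) + ')'
    # expand=False returns a Series holding the matched keyword (NaN when nothing matches)
    matched = texts.str.extract(pattern, expand=False)
    return matched.map(keyword_to_category).fillna(default)

df = pd.DataFrame({'text': ['SuperB Vesterbro T 74637', 'Rent 45228',
                            'Superbest 86125', 'Unknown shop 1']})
df['category'] = categorise(df['text'], CATEGORIES)
print(df)
```

The design choice here is a single regex pass over the column instead of one pass per category, followed by a dictionary lookup on the extracted keyword. Rows without any keyword receive the fallback label rather than NaN.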

A similar question about "Python: The most efficient way to categorise transactions" can be found on Stack Overflow: https://stackoverflow.com/questions/59800959/
