
python - Summarize transaction chains in a DataFrame, linking rows by column value

Reposted | Author: 行者123 | Updated: 2023-12-04 21:01:47

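I am trying to chain multiple rows of a DataFrame together, connecting each receiver id to the row with the matching sender id, so that I get all possible paths.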

Here is a sample of my DataFrame:

   transaction_id sender_id receiver_id  amount
0          213234       002         125      10
1          223322       017         354      90
2          343443       125         689      70
3          324433       689         233       5
4          328909       354         456      10

Created with:
import pandas as pd

df = pd.DataFrame(
    {'transaction_id': {0: '213234', 1: '223322', 2: '343443', 3: '324433', 4: '328909'},
     'sender_id': {0: '002', 1: '017', 2: '125', 3: '689', 4: '354'},
     'receiver_id': {0: '125', 1: '354', 2: '689', 3: '233', 4: '456'},
     'amount': {0: 10, 1: 90, 2: 70, 3: 5, 4: 10}}
)

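My code should produce a list of the chained ids and the total amount of each transaction chain. For the first two rows of the example above, that would be something like: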
[('002', '125', '689', '233'), 85]
[('017', '354', '456'), 100]

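I have tried iterating over the rows and turning each row into an instance of a Node class, then traversing the linked list with a method, but I don't know what to do next: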
class Node:
    def __init__(self, transaction_id, sender_id, receiver_id, amount):
        self.transac = transaction_id
        self.val = sender_id
        self.next = receiver_id  # an id string, not a reference to another Node
        self.amount = amount

    def traverse(self):
        node = self  # start from the head node
        while node is not None:
            print(node.val)  # access the node value
            node = node.next  # move on to the next node

nodes = []  # keep every Node instead of overwriting a loop variable
for _, row in df.iterrows():
    nodes.append(Node(
        row["transaction_id"],
        row["sender_id"],
        row["receiver_id"],
        row["amount"]
    ))

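Additional information:
  • sender_id values are unique, so for each sender id there is only one possible transaction chain.
  • There are no cycles: no chain contains a receiver id that points back to a sender id earlier in the same path.

Best answer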

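    You have a directed graph whose edges are the sender id -> receiver id connections, and you are trying to enumerate all paths through this graph. This is actually much easier if you don't use a linked list.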
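As a minimal sketch of why a plain dictionary is enough, assume the rows are available as (sender_id, receiver_id, amount) tuples, for example from `df[['sender_id', 'receiver_id', 'amount']].itertuples(index=False)`. Each sender then maps to its single outgoing edge, and a path is found by following the dictionary until an id no longer appears as a sender. The names `rows`, `next_hop` and `walk` are illustrative and not part of the question or the answer.

```python
# Sample rows as (sender_id, receiver_id, amount) tuples (assumed input format)
rows = [
    ('002', '125', 10),
    ('017', '354', 90),
    ('125', '689', 70),
    ('689', '233', 5),
    ('354', '456', 10),
]

# Each sender has at most one outgoing edge, so a dict represents the graph:
# sender -> (receiver, amount)
next_hop = {sender: (receiver, amount) for sender, receiver, amount in rows}


def walk(start):
    """Follow edges from start until reaching an id that never sends."""
    path, total = [start], 0
    node = start
    while node in next_hop:  # terminates only because the graph has no cycles
        node, amount = next_hop[node]
        path.append(node)
        total += amount
    return tuple(path), total


for sender in next_hop:
    print(walk(sender))
```

This walks every chain from scratch, so shared tails (125 -> 689 -> 233) are traversed more than once; the answer's code below avoids that by reusing paths it has already found.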

    Note that your linked-list implementation doesn't actually link the nodes: your next values would have to reference other Node instances, not ids.
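For comparison, here is a hypothetical reworked version of the question's Node class in which `next` really references another Node instance. It assumes one Node per account id, looked up through an illustrative `get_node` helper, so that the node a sender points to is the same object that later receives that id's own outgoing edge. It is a sketch of what a working linked list would need, not the approach this answer recommends.

```python
rows = [
    ('002', '125', 10),
    ('017', '354', 90),
    ('125', '689', 70),
    ('689', '233', 5),
    ('354', '456', 10),
]


class Node:
    """One account id; next refers to the Node this account sends to."""

    def __init__(self, account_id):
        self.val = account_id
        self.next = None  # another Node instance (or None), not an id string
        self.amount = 0   # amount sent along the edge to self.next

    def traverse(self):
        """Yield this node and every node after it in the chain."""
        node = self  # start from the head node
        while node is not None:
            yield node
            node = node.next  # follow the object reference


nodes = {}  # account id -> Node, so every id maps to exactly one object


def get_node(account_id):
    """Return the Node for account_id, creating it on first use (hypothetical helper)."""
    if account_id not in nodes:
        nodes[account_id] = Node(account_id)
    return nodes[account_id]


for sender, receiver, amount in rows:
    head = get_node(sender)
    head.next = get_node(receiver)  # link to the receiver's Node object
    head.amount = amount

chain = list(nodes['002'].traverse())
print([node.val for node in chain], sum(node.amount for node in chain))
# prints ['002', '125', '689', '233'] 85
```

The traversal still starts from scratch for every head node, which is why a dictionary keyed by sender id turns out to be the simpler structure.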

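    Because your paths can't contain cycles, the graph is a directed acyclic graph (DAG). Your paths are also very simple: as you say, each sender id never has more than one receiver id, so following the edges from any sender traces out exactly one chain.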
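Both properties the answer relies on can be checked up front. A minimal sketch, assuming the same (sender_id, receiver_id, amount) tuples as input; the uniqueness half corresponds to pandas' `df.sender_id.is_unique`, and `check_chains` is an illustrative name. The cycle check simply follows each chain, so it is quadratic in the worst case, which is acceptable for a one-off sanity check.

```python
rows = [
    ('002', '125', 10),
    ('017', '354', 90),
    ('125', '689', 70),
    ('689', '233', 5),
    ('354', '456', 10),
]


def check_chains(rows):
    """Raise ValueError if a sender id repeats or the edges form a cycle."""
    senders = [sender for sender, _, _ in rows]
    if len(senders) != len(set(senders)):
        raise ValueError("sender ids are not unique")

    next_id = {sender: receiver for sender, receiver, _ in rows}
    for start in next_id:
        seen = {start}
        node = next_id[start]
        while node in next_id:  # stop once an id never sends
            if node in seen:
                raise ValueError(f"cycle through {node!r}")
            seen.add(node)
            node = next_id[node]


check_chains(rows)  # passes silently for the sample data
```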

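    Create a new DataFrame from yours with sender_id as the index and only the receiver_id and amount columns; that turns finding the next element of a path into a simple index lookup. Then you can iterate over those rows, follow each path, and sum the amounts along the way. The following code reuses paths that have already been found, so it doesn't have to traverse them again: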

    # receiver and amount columns, indexed by sender
    edges = df[['sender_id', 'receiver_id', 'amount']].set_index('sender_id')
    paths = {}  # sender -> [sender, receiver, receiver, receiver, ...]
    totals = {}  # sender -> total amount

    for sender, next_, amount in edges.itertuples():
        path = paths[sender] = [sender, next_]
        totals[sender] = amount
        while True:
            if next_ in paths:
                # re-use the already found path; skip its first element,
                # because next_ is already the last element of path
                path += paths[next_][1:]
                totals[sender] += totals[next_]
                break

            try:
                # the row where next_ is the sender: (receiver_id, amount)
                next_, amount = edges.loc[next_]
            except KeyError:
                break  # next_ never sends anything: path complete

            path.append(next_)
            totals[sender] += amount
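A different way to get the same reuse is memoized recursion: the chain starting at a sender is that sender followed by the chain starting at its receiver, and `functools.lru_cache` remembers every sub-chain once it has been computed. This is a swapped-in technique rather than the answer's code; it assumes the (sender_id, receiver_id, amount) tuples used in the earlier sketches, and `chain` and `next_hop` are illustrative names.

```python
from functools import lru_cache

rows = [
    ('002', '125', 10),
    ('017', '354', 90),
    ('125', '689', 70),
    ('689', '233', 5),
    ('354', '456', 10),
]

# sender -> (receiver, amount); valid because sender ids are unique
next_hop = {sender: (receiver, amount) for sender, receiver, amount in rows}


@lru_cache(maxsize=None)
def chain(sender):
    """Return (path, total) for the chain starting at sender, caching sub-chains."""
    if sender not in next_hop:
        return (sender,), 0  # an id that never sends ends the chain
    receiver, amount = next_hop[sender]
    rest_path, rest_total = chain(receiver)  # cached after the first call
    return (sender,) + rest_path, amount + rest_total


paths = {sender: list(chain(sender)[0]) for sender in next_hop}
totals = {sender: chain(sender)[1] for sender in next_hop}
```

Returning tuples keeps the cached values immutable. Each link adds one level of recursion, so chains approaching CPython's default recursion limit (about 1000) would need an iterative version like the answer's instead.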

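    The code can be made more efficient still by recording every sub-path as it is traversed. Then, by the time you reach the third row (sender id 125), that path has already been handled, because you had to traverse it to build the path that starts at 002 in the first row: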
    paths, totals = {}, {}  # start from empty dicts

    for sender, next_, amount in edges.itertuples():
        if sender in paths:
            # already handled as part of a longer path
            continue

        paths[sender], totals[sender] = [sender, next_], amount
        senders = [sender]  # all sender ids along the path

        while True:
            if next_ in paths:
                # re-use the already found path, minus next_ itself,
                # which every path in senders already ends with
                for s in senders:
                    paths[s] += paths[next_][1:]
                    totals[s] += totals[next_]
                break

            if next_ not in edges.index:
                break  # path complete

            # start a new path from this sender id
            paths[next_], totals[next_] = [next_], 0
            senders.append(next_)

            next_, amount = edges.loc[next_]
            for s in senders:
                paths[s].append(next_)
                totals[s] += amount

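    Either way, you now have the complete paths and totals computed for all transactions, and you can map them back onto the DataFrame as additional columns: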
    df['path'], df['total'] = df.sender_id.map(paths), df.sender_id.map(totals)

    For your input DataFrame, this produces:

       transaction_id sender_id receiver_id  amount                  path  total
    0          213234       002         125      10  [002, 125, 689, 233]     85
    1          223322       017         354      90       [017, 354, 456]    100
    2          343443       125         689      70       [125, 689, 233]     75
    3          324433       689         233       5            [689, 233]      5
    4          328909       354         456      10            [354, 456]     10

    Alternatively, you can pair up paths and totals by iterating over one of the dictionaries and looking up the other:
    for sender, path in paths.items():
        print(sender, path, totals[sender])

    For your specific example, this prints:

    002 ['002', '125', '689', '233'] 85
    125 ['125', '689', '233'] 75
    689 ['689', '233'] 5
    017 ['017', '354', '456'] 100
    354 ['354', '456'] 10
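The question only asks for complete chains such as [('002', '125', '689', '233'), 85], that is, the entries whose sender never appears as a receiver. A minimal sketch that filters the paths and totals dictionaries printed above (copied into the snippet so it runs on its own); with the DataFrame, the chain-starting rows would be `df[~df.sender_id.isin(df.receiver_id)]`, and the literal `receiver_ids` set stands in for `set(df.receiver_id)`.

```python
# paths and totals as printed above for the sample data
paths = {
    '002': ['002', '125', '689', '233'],
    '125': ['125', '689', '233'],
    '689': ['689', '233'],
    '017': ['017', '354', '456'],
    '354': ['354', '456'],
}
totals = {'002': 85, '125': 75, '689': 5, '017': 100, '354': 10}
receiver_ids = {'125', '354', '689', '233', '456'}  # e.g. set(df.receiver_id)

# keep only chains that start at an id nobody sends to
result = [[tuple(path), totals[sender]]
          for sender, path in paths.items()
          if sender not in receiver_ids]
print(result)
# prints [[('002', '125', '689', '233'), 85], [('017', '354', '456'), 100]]
```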

    Regarding "python - Summarize transaction chains in a DataFrame, linking rows by column value", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58394975/
