
python - Replace a row with another row at a specific index of a DataFrame and change cell values

Reposted. Author: 行者123. Updated: 2023-12-04 09:46:06

I have a sample csv like this:

   keys                key_regex                       datatype    detailed_datatype  precedence  val_regex  val_regex_2     val_regex_3  max_words  alpha_char_check
0  billingAddress      original_billing_key_regex      alphabetic  address            primary     NaN        NaN             NaN          NaN        NaN
1  deliveryAddress     original_delivery_key_regex     alphabetic  address            primary     NaN        NaN             NaN          NaN        NaN
2  notifyParty         original_notify_party_regex     alphabetic  alphabetic         primary     NaN        NaN             NaN          NaN        NaN
3  originAddress       original_seller_address_regex   alphabetic  address            primary     NaN        NaN             NaN          NaN        NaN
4  billingAddressAlt   alternative_billing_key_regex   alphabetic  address            tertiary    NaN        NaN             NaN          NaN        NaN
5  deliveryAddressAlt  alternative_delivery_key_regex  alphabetic  address            tertiary    NaN        NaN             NaN          5.0        1.0
6  originAddressAlt    alternative_seller_key_regex    alphabetic  address            tertiary    NaN        sample_val_re1  NaN          NaN        0.0

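I am trying to replace each row whose keys column value is a key in tertiary_row_replacement_dict with the row whose keys column value is the corresponding dictionary value, and then change the precedence column value from 'tertiary' to 'primary', while keeping the index positions the same as before.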

The expected output looks like this:
   keys                key_regex                       datatype    detailed_datatype  precedence  val_regex  val_regex_2     val_regex_3  max_words  alpha_char_check
0  billingAddress      alternative_billing_key_regex   alphabetic  address            primary     NaN        NaN             NaN          NaN        NaN
1  deliveryAddress     alternative_delivery_key_regex  alphabetic  address            primary     NaN        NaN             NaN          5.0        1.0
2  notifyParty         original_notify_party_regex     alphabetic  alphabetic         primary     NaN        NaN             NaN          NaN        NaN
3  originAddress       alternative_seller_key_regex    alphabetic  address            primary     NaN        sample_val_re1  NaN          NaN        0.0

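There are 3 original csvs. Each of them is large and contains many cases like this, i.e. keys with primary precedence and alternative keys with tertiary precedence. I have a key-replacement dictionary like this: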
tertiary_row_replacement_dict = {
"originAddress": "originAddressAlt",
"deliveryAddress": "deliveryAddressAlt",
# "totalAmount": "totalAmountAlt",
"billingAddress": "billingAddressAlt"
....
}

Assuming the keys and the corresponding values of this dictionary are always present in the csv, I have the following code:
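for k, new_k in tertiary_row_replacement_dict.items():
    # index label of the alternative (tertiary) row
    t2 = df.loc[df['keys']==new_k].index[0]
    # overwrite the primary row with the alternative row, promoting its precedence
    df.loc[df.loc[df['keys']==k].index[0]] = [i if i!='tertiary' else 'primary' for i in df.loc[t2]]
    # rename the alternative key to the primary key and drop the now-redundant row
    df = df.replace([new_k, 'tertiary'], [k, 'primary']).drop([t2])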

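It does what I am trying to do. Running it on the test csv alone takes about 0.034 seconds, and it is probably not the best or most optimized way to handle this case of simply replacing rows and changing cell values. Is there a faster alternative, given prior knowledge of which rows are to be replaced by which rows? (Using the dictionary is not mandatory; it could be a list of tuples or a list of lists instead if that helps with speed.)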
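To compare alternatives fairly, the loop above can be wrapped in a function and timed with the standard-library timeit module. The following is only a sketch: the function name replace_rows_loop, the names sample and mapping, and the reduced four-column sample are illustrative assumptions, and the measured time will depend on the machine and on the real csv size.

```python
import timeit

import numpy as np
import pandas as pd

# Reduced version of the sample data (four of the ten columns), for illustration only.
sample = pd.DataFrame({
    "keys": ["billingAddress", "deliveryAddress", "notifyParty", "originAddress",
             "billingAddressAlt", "deliveryAddressAlt", "originAddressAlt"],
    "key_regex": ["original_billing_key_regex", "original_delivery_key_regex",
                  "original_notify_party_regex", "original_seller_address_regex",
                  "alternative_billing_key_regex", "alternative_delivery_key_regex",
                  "alternative_seller_key_regex"],
    "precedence": ["primary"] * 4 + ["tertiary"] * 3,
    "max_words": [np.nan] * 5 + [5.0, np.nan],
})

mapping = {
    "originAddress": "originAddressAlt",
    "deliveryAddress": "deliveryAddressAlt",
    "billingAddress": "billingAddressAlt",
}


def replace_rows_loop(df, mapping):
    """The loop from the question, applied to a copy so the input is not modified."""
    df = df.copy()
    for k, new_k in mapping.items():
        t2 = df.loc[df["keys"] == new_k].index[0]  # alternative row
        t1 = df.loc[df["keys"] == k].index[0]      # primary row to overwrite
        df.loc[t1] = [v if v != "tertiary" else "primary" for v in df.loc[t2]]
        df = df.replace([new_k, "tertiary"], [k, "primary"]).drop([t2])
    return df


result = replace_rows_loop(sample, mapping)

# Average wall-clock time per call over a few repetitions.
runs = 20
seconds = timeit.timeit(lambda: replace_rows_loop(sample, mapping), number=runs) / runs
print(f"{seconds:.4f} s per call")
```

Any candidate replacement can be timed the same way by swapping the function inside the lambda.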

Best answer

You can use replace to map the tertiary keys to their primary keys, and groupby().first() to fill in the information:

inverse_dict = {v: k for k, v in tertiary_row_replacement_dict.items()}
(df.groupby(df['keys'].replace(inverse_dict))
   .first()
   .reset_index(drop=True)
)

Output:
    keys                key_regex                       datatype    detailed_datatype  precedence  val_regex  val_regex_2     val_regex_3  max_words  alpha_char_check
--  ------------------  ------------------------------  ----------  -----------------  ----------  ---------  --------------  -----------  ---------  ----------------
0   billingAddress      original_billing_key_regex      alphabetic  address            primary     nan        nan             nan          nan        nan
1   deliveryAddress     original_delivery_key_regex     alphabetic  address            primary     nan        nan             nan          5          1
2   notifyParty         original_notify_party_regex     alphabetic  alphabetic         primary     nan        nan             nan          nan        nan
3   originAddress       original_seller_address_regex   alphabetic  address            primary     nan        sample_val_re1  nan          nan        0
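
Note that groupby().first() takes the first non-null value in each column, so in the output above the primary rows keep their original key_regex and only their NaN cells are filled from the alternative rows. If the alternative row should fully overwrite the primary row, as in the question's expected output, one possible vectorized approach is sketched below. It is not part of the original answer: the helper name promote_alt_rows, the reduced four-column sample, and the assumption that every key occurs exactly once are all illustrative.

```python
import numpy as np
import pandas as pd


def promote_alt_rows(df, mapping):
    """Hypothetical helper: overwrite each primary row with its alternative row.

    Assumes every key and value of `mapping` occurs exactly once in df['keys'].
    """
    inverse = {alt: primary for primary, alt in mapping.items()}
    is_alt = df["keys"].isin(list(inverse))
    is_primary = df["keys"].isin(list(mapping))

    # Index label of each primary row, looked up by its key.
    target = pd.Series(df.index[is_primary.to_numpy()],
                       index=df.loc[is_primary, "keys"].to_numpy())

    alt = df[is_alt].copy()
    alt["keys"] = alt["keys"].map(inverse)          # rename Alt keys to primary keys
    alt["precedence"] = "primary"                   # promote the precedence
    alt.index = alt["keys"].map(target).to_numpy()  # take over the primary row's index

    # Rows that are neither replaced nor alternatives stay as they are.
    untouched = df[~is_alt & ~is_primary]
    return pd.concat([untouched, alt]).sort_index()


# Reduced version of the sample data (four of the ten columns), for illustration only.
sample = pd.DataFrame({
    "keys": ["billingAddress", "deliveryAddress", "notifyParty", "originAddress",
             "billingAddressAlt", "deliveryAddressAlt", "originAddressAlt"],
    "key_regex": ["original_billing_key_regex", "original_delivery_key_regex",
                  "original_notify_party_regex", "original_seller_address_regex",
                  "alternative_billing_key_regex", "alternative_delivery_key_regex",
                  "alternative_seller_key_regex"],
    "precedence": ["primary"] * 4 + ["tertiary"] * 3,
    "max_words": [np.nan] * 5 + [5.0, np.nan],
})

mapping = {
    "originAddress": "originAddressAlt",
    "deliveryAddress": "deliveryAddressAlt",
    "billingAddress": "billingAddressAlt",
}

result = promote_alt_rows(sample, mapping)
print(result)
```

Unlike the loop in the question, this sketch changes precedence only on the promoted rows rather than replacing every 'tertiary' value in the frame, and it builds the result in one pass instead of once per dictionary entry.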

Regarding "python - Replace a row with another row at a specific index of a DataFrame and change cell values", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62105742/
