
python-3.x - Matching and conditional explode in pandas

Reposted. Author: 行者123. Updated: 2023-12-02 17:25:27

I am facing a tricky problem: I need to match values before exploding.

My problem is best described with data, as shown below:

import pandas as pd

df = pd.DataFrame({
    'A': [
        [0.05, 0.055, 0.055, 0.06, 0.065, 0.07, 0.075, 0.075, 0.085, 0.09, 1.32],
        [0.4, 0.06, 0.06, 0.13, 0.135, 0.145, 0.155, 0.17],
        [3.81, 0.3, 0.4, 0.425, 0.445, 0.48, 0.51, 0.54, 0.58, 0.62, 0.66, 0.66, 0.705, 0.53, 0.57, 0.61],
        [7.395, 0.075, 0.085, 0.09, 0.095, 0.1, 0.11, 0.12, 0.13, 0.14],
        [0.105, 0.11, 0.12, 0.125, 0.135, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.205, 2.21]
    ],
    'B': [
        [0.680, 1.320],
        [0.520, 0.130, 0.135, 0.145, 0.155, 0.170],
        [8.035, 3.810],
        [0.945, 7.395],
        [1.790, 2.210]
    ],
    'C': [
        ['08/01/91', '08/01/10'],
        ['09/01/92', '09/01/93', '09/01/94', '09/01/95', '09/01/96', '09/01/10'],
        ['11/01/91', '11/01/10'],
        ['09/01/93', '09/01/21'],
        ['12/01/92', '12/01/10']
    ]
})
df

   A                                                                                                     B                                         C
0  [0.05, 0.055, 0.055, 0.06, 0.065, 0.07, 0.075, 0.075, 0.085, 0.09, 1.32]                              [0.68, 1.32]                              [08/01/91, 08/01/10]
1  [0.4, 0.06, 0.06, 0.13, 0.135, 0.145, 0.155, 0.17]                                                    [0.52, 0.13, 0.135, 0.145, 0.155, 0.17]   [09/01/92, 09/01/93, 09/01/94, 09/01/95, 09/01/96, 09/01/10]
2  [3.81, 0.3, 0.4, 0.425, 0.445, 0.48, 0.51, 0.54, 0.58, 0.62, 0.66, 0.66, 0.705, 0.53, 0.57, 0.61]     [8.035, 3.81]                             [11/01/91, 11/01/10]
3  [7.395, 0.075, 0.085, 0.09, 0.095, 0.1, 0.11, 0.12, 0.13, 0.14]                                       [0.945, 7.395]                            [09/01/93, 09/01/21]
4  [0.105, 0.11, 0.12, 0.125, 0.135, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.205, 2.21]                    [1.79, 2.21]                              [12/01/92, 12/01/10]

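The sum of the list elements in A is guaranteed to equal the sum of the list elements in B. The two lists are usually in the same order, but in some rows the order is reversed.

Take row 0, for example: the first 10 elements of A sum to 0.68, and the last element matches 1.32, so A and B line up in order.

Row 2, however, is reversed, because 3.81 (the first element of A) matches the last element of B. Columns B and C come from the same dataset, so both should be flipped together to match the order of A.

After matching and exploding, the output I want looks like this: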

       A      B         C
0   0.05   0.68  08/01/91
0  0.055   0.68  08/01/91
0  0.055   0.68  08/01/91
0   0.06   0.68  08/01/91
0  0.065   0.68  08/01/91
0   0.07   0.68  08/01/91
0  0.075   0.68  08/01/91
0  0.075   0.68  08/01/91
0  0.085   0.68  08/01/91
0   0.09   0.68  08/01/91
0   1.32   1.32  08/01/10
...
2   3.81   3.81  11/01/10
2    0.3  8.035  11/01/91
2    0.4  8.035  11/01/91
2  0.425  8.035  11/01/91
2  0.445  8.035  11/01/91
2   0.48  8.035  11/01/91
2   0.51  8.035  11/01/91
2   0.54  8.035  11/01/91
2   0.58  8.035  11/01/91
2   0.62  8.035  11/01/91
2   0.66  8.035  11/01/91
2   0.66  8.035  11/01/91
2  0.705  8.035  11/01/91
2   0.53  8.035  11/01/91
2   0.57  8.035  11/01/91
2   0.61  8.035  11/01/91

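Any ideas and approaches are greatly appreciated.

I found errors in the data above and have corrected them. The lists in B and C always have the same number of elements.

For row 1, the output I want is: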

1    0.4   0.52  09/01/92
1   0.06   0.52  09/01/92
1   0.06   0.52  09/01/92
1   0.13   0.13  09/01/93
1  0.135  0.135  09/01/94
1  0.145  0.145  09/01/95
1  0.155  0.155  09/01/96
1   0.17   0.17  09/01/10

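Best answer

The basic idea in the example below is to first build, for each row of the data, "matched" lists of equal length in every column, and then "transpose" those lists (explode is not really the right word here). This also extends easily to more columns; if you need any help generalizing the function, let me know.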

def match_row(row):
    # Map each B value to the C value at the same position.
    bc_mapping = {b: c for b, c in zip(row['B'], row['C'])}
    # Values present in both A and B match one-to-one.
    common_elements = set(row['A']).intersection(set(row['B']))
    # The one B value left over is the sum of the remaining A values.
    sum_elements = set(row['B']).difference(common_elements)
    assert len(sum_elements) == 1  # Sanity check

    common_elements = sorted(common_elements)
    sum_element = list(sum_elements)[0]
    number_of_free_elements = len(row['A']) - len(common_elements)

    # Free (summed) elements come first, followed by the one-to-one matches.
    return pd.Series({
        'A': [element for element in row['A'] if element not in common_elements] + common_elements,
        'B': [sum_element] * number_of_free_elements + common_elements,
        'C': [bc_mapping[sum_element]] * number_of_free_elements + [bc_mapping[element] for element in common_elements]
    })


# Concatenate the matched lists of all rows, then turn each column's list into a column of scalars.
df = (
    df
    .apply(match_row, axis=1)
    .aggregate('sum')
    .apply(pd.Series)
    .transpose()
)
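
The chain above drops the original row labels, while the desired output in the question keeps them (0, 0, ..., 2, 2). Here is a minimal sketch of one way to keep them, assuming pandas 1.3 or later, where `DataFrame.explode` accepts a list of columns. It repeats `match_row` so that it runs on its own, and uses rows 1 and 3 of the question's data; the names `matched` and `exploded` are only illustrative.

```python
import pandas as pd


def match_row(row):
    # Same logic as the answer's match_row, repeated so this sketch is self-contained.
    bc_mapping = {b: c for b, c in zip(row['B'], row['C'])}
    common_elements = set(row['A']).intersection(set(row['B']))
    sum_elements = set(row['B']).difference(common_elements)
    assert len(sum_elements) == 1  # Sanity check

    common_elements = sorted(common_elements)
    sum_element = list(sum_elements)[0]
    number_of_free_elements = len(row['A']) - len(common_elements)

    return pd.Series({
        'A': [e for e in row['A'] if e not in common_elements] + common_elements,
        'B': [sum_element] * number_of_free_elements + common_elements,
        'C': [bc_mapping[sum_element]] * number_of_free_elements
             + [bc_mapping[e] for e in common_elements],
    })


# Rows 1 and 3 of the question's data, keeping their original labels.
df = pd.DataFrame({
    'A': [
        [0.4, 0.06, 0.06, 0.13, 0.135, 0.145, 0.155, 0.17],
        [7.395, 0.075, 0.085, 0.09, 0.095, 0.1, 0.11, 0.12, 0.13, 0.14],
    ],
    'B': [
        [0.520, 0.130, 0.135, 0.145, 0.155, 0.170],
        [0.945, 7.395],
    ],
    'C': [
        ['09/01/92', '09/01/93', '09/01/94', '09/01/95', '09/01/96', '09/01/10'],
        ['09/01/93', '09/01/21'],
    ],
}, index=[1, 3])

# After match_row, the three lists in each row have equal length,
# so they can be exploded together; the row label is repeated per element.
matched = df.apply(match_row, axis=1)
exploded = matched.explode(['A', 'B', 'C'])
print(exploded)
```

Exploding several columns at once only works because `match_row` has already made the lists in each row the same length; pandas raises an error otherwise.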

Edit: generalizing to multiple columns. The multi-column case is not quite as simple, but the following should work:

import random

# Run this on the original list-valued df, not on the result of the previous snippet.
# D is an extra example column with one random value per element of B.
df['D'] = df['B'].apply(lambda b: [random.randint(0, 10) for _ in b])

def match_row(row, variable_column, reference_column):
    # Every column other than the variable and reference columns is carried along unchanged.
    fixed_columns = row.index.tolist()
    fixed_columns.remove(variable_column)
    fixed_columns.remove(reference_column)

    variable_elements = row[variable_column]
    reference_elements = row[reference_column]
    # One entry per reference element, holding that position's value from each fixed column.
    fixed_elements = row[fixed_columns].apply(pd.Series).T.values.tolist()

    fixed_elements_mapping = {
        reference: fixed_elements
        for reference, fixed_elements
        in zip(reference_elements, fixed_elements)
    }
    common_elements = set(variable_elements).intersection(set(reference_elements))
    sum_elements = set(reference_elements).difference(common_elements)
    assert len(sum_elements) == 1  # Sanity check

    common_elements = sorted(common_elements)
    sum_element = list(sum_elements)[0]
    number_of_free_elements = len(variable_elements) - len(common_elements)

    variable_and_reference_result = pd.Series({
        variable_column: [element for element in variable_elements if element not in common_elements] + common_elements,
        reference_column: [sum_element] * number_of_free_elements + common_elements,
    })
    fixed_columns_result = pd.Series({
        column_name: [fixed_elements_mapping[sum_element][i]] * number_of_free_elements +
                     [fixed_elements_mapping[element][i] for element in common_elements]
        for i, column_name
        in enumerate(fixed_columns)
    })
    return pd.concat([variable_and_reference_result, fixed_columns_result])


df = (
    df
    .apply(lambda row: match_row(row, 'A', 'B'), axis=1)
    .aggregate('sum')
    .apply(pd.Series)
    .transpose()
)
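
Both versions of `match_row` move the values shared by the two columns to the end of each row, whereas the desired output in the question keeps A in its original order (row 2 starts with 3.81). Here is a rough sketch of an order-preserving alternative in plain Python. It relies on the question's guarantee that consecutive runs of A sum to the entries of B, either in the given order or reversed. The helper names `segment_by_sums` and `match_in_order` and the tolerance `tol` are assumptions made for illustration and are not part of the original answer; the sketch also assumes all values are positive.

```python
def segment_by_sums(values, targets, tol=1e-9):
    """Walk `values` in order, closing a run whenever its running sum reaches
    the current target. Returns one target index per value, or None if the
    values cannot be split this way."""
    assignment = []
    pos = 0
    for idx, target in enumerate(targets):
        running = 0.0
        while abs(running - target) > tol:
            # Give up if the values ran out or the run overshot the target.
            if pos == len(values) or running > target + tol:
                return None
            running += values[pos]
            assignment.append(idx)
            pos += 1
    # Every value must have been consumed.
    return assignment if pos == len(values) else None


def match_in_order(a, b, c):
    """Return (a_value, b_value, c_value) tuples in the order of `a`,
    trying B/C as given first and flipped second."""
    for b_try, c_try in ((b, c), (b[::-1], c[::-1])):
        assignment = segment_by_sums(a, b_try)
        if assignment is not None:
            return [(x, b_try[i], c_try[i]) for x, i in zip(a, assignment)]
    raise ValueError("A cannot be split into runs that sum to B in either order")


# Row 0 of the question's data (B already in order).
rows0 = match_in_order(
    [0.05, 0.055, 0.055, 0.06, 0.065, 0.07, 0.075, 0.075, 0.085, 0.09, 1.32],
    [0.680, 1.320],
    ['08/01/91', '08/01/10'],
)

# Row 2 of the question's data (B and C must be flipped).
rows2 = match_in_order(
    [3.81, 0.3, 0.4, 0.425, 0.445, 0.48, 0.51, 0.54, 0.58, 0.62,
     0.66, 0.66, 0.705, 0.53, 0.57, 0.61],
    [8.035, 3.810],
    ['11/01/91', '11/01/10'],
)

for triple in rows2[:3]:
    print(triple)
```

The sums are compared with a tolerance rather than `==` because accumulated floating-point sums rarely equal the decimal totals exactly. If both orders happen to fit, this sketch keeps the given order.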

Regarding "python-3.x - Matching and conditional explode in pandas", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60047827/
