python - Multiply and sum certain columns by name in pandas

Repost. Author: 太空宇宙. Updated: 2023-11-03 13:28:59

I have a small sample dataset:

import pandas as pd

d = {
    'measure1_x': [10, 12, 20, 30, 21],
    'measure2_x': [11, 12, 10, 3, 3],
    'measure3_x': [10, 0, 12, 1, 1],
    'measure1_y': [1, 2, 2, 3, 1],
    'measure2_y': [1, 1, 1, 3, 3],
    'measure3_y': [1, 0, 2, 1, 1]
}
df = pd.DataFrame(d)
# reindex_axis was deprecated and later removed from pandas;
# reindex(columns=...) sets the same column order
df = df.reindex(columns=[
    'measure1_x', 'measure2_x', 'measure3_x', 'measure1_y', 'measure2_y', 'measure3_y'
])

It looks like this:

   measure1_x  measure2_x  measure3_x  measure1_y  measure2_y  measure3_y
0          10          11          10           1           1           1
1          12          12           0           2           1           0
2          20          10          12           2           1           2
3          30           3           1           3           3           1
4          21           3           1           1           3           1

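I named the columns almost identically, differing only in the "_x" and "_y" suffixes, to help identify which pairs should be multiplied. I want to multiply the pairs of columns that share a name once "_x" and "_y" are ignored, then sum the products to get a total. Keep in mind that my real dataset is large and its columns are not in a tidy order, so the naming is how the correct pairs are identified: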

total = measure1_x * measure1_y + measure2_x * measure2_y + measure3_x * measure3_y

So the desired output is:

   measure1_x  measure2_x  measure3_x  measure1_y  measure2_y  measure3_y  total
0          10          11          10           1           1           1     31
1          12          12           0           2           1           0     36
2          20          10          12           2           1           2     74
3          30           3           1           3           3           1    100
4          21           3           1           1           3           1     31

My attempt and thought process so far; I could not work out the syntax to go further:

#first identify the column names that has '_x' and '_y', then identify if 
#the column names are the same after removing '_x' and '_y', if the pair has
#the same name then multiply them, do that for all pairs and sum the results
#up to get the total number

for colname in df.columns:
    if "_x".lower() in colname.lower() or "_y".lower() in colname.lower():
        if "_x".lower() in colname.lower():
            colnamex = colname
        if "_y".lower() in colname.lower():
            colnamey = colname

#if colnamex[:-2] are the same for colnamex and colnamey then multiply and sum
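
The idea described in the comments above can be finished as a plain loop. This is only a minimal sketch of that idea, not the accepted answer's method: it assumes every column to be paired ends in exactly "_x" or "_y", and the names `base`, `partner` and `total` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'measure1_x': [10, 12, 20, 30, 21],
    'measure2_x': [11, 12, 10, 3, 3],
    'measure3_x': [10, 0, 12, 1, 1],
    'measure1_y': [1, 2, 2, 3, 1],
    'measure2_y': [1, 1, 1, 3, 3],
    'measure3_y': [1, 0, 2, 1, 1],
})

# Running sum of the pairwise products, one value per row
total = pd.Series(0, index=df.index)

for colname in list(df.columns):
    if colname.endswith('_x'):
        base = colname[:-2]            # strip the '_x' suffix
        partner = base + '_y'          # name of the matching '_y' column
        if partner in df.columns:      # multiply only when the pair exists
            total = total + df[colname] * df[partner]

df['total'] = total
print(df['total'].tolist())  # [31, 36, 74, 100, 31]
```

Because each pair is looked up by name, the column order does not matter here; the trade-off is a Python-level loop over the columns.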

Best answer

filter + np.einsum

I thought I'd try something a little different this time:

  • Select the _x columns and the _y columns separately.
  • Compute a sum of products. This is easy (and fast) to specify with einsum.

import numpy as np

df = df.sort_index(axis=1)  # optional, do this if your columns aren't sorted

i = df.filter(like='_x')  # columns whose name contains '_x'
j = df.filter(like='_y')  # columns whose name contains '_y'
df['Total'] = np.einsum('ij,ij->i', i, j)  # same as (i.values * j).sum(axis=1)

df
   measure1_x  measure2_x  measure3_x  measure1_y  measure2_y  measure3_y  Total
0          10          11          10           1           1           1     31
1          12          12           0           2           1           0     36
2          20          10          12           2           1           2     74
3          30           3           1           3           3           1    100
4          21           3           1           1           3           1     31
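
To make the subscript string 'ij,ij->i' less opaque, here is a small standalone check on plain NumPy arrays. The arrays `a` and `b` are illustrative values taken from the first two rows of the sample data. The subscripts multiply the two arrays element by element and sum over the column index j, which leaves one value per row i.

```python
import numpy as np

# First two rows of the _x and _y blocks from the sample data
a = np.array([[10, 11, 10],
              [12, 12, 0]])
b = np.array([[1, 1, 1],
              [2, 1, 0]])

# 'ij,ij->i': elementwise product, then sum over j (columns)
row_dot = np.einsum('ij,ij->i', a, b)

# The same computation written out explicitly
same = (a * b).sum(axis=1)

print(row_dot)  # [31 36]
print(np.array_equal(row_dot, same))  # True
```

Note that einsum pairs columns by position, not by name, which is why the answer sorts the columns first.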

A slightly more robust version that filters out non-numeric columns and performs an assertion up front:

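# Sort columns so the _x and _y blocks line up, and drop object (non-numeric) columns
df = df.sort_index(axis=1).select_dtypes(exclude=[object])
i = df.filter(regex='_x$')  # anchored so only names ending in _x match
j = df.filter(regex='_y$')  # anchored so only names ending in _y match

# Both blocks must have the same number of rows and columns
assert i.shape == j.shape

df['Total'] = np.einsum('ij,ij->i', i, j)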

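If the assertion fails, then the assumptions that 1) your columns are numeric and 2) the numbers of x and y columns are equal, as your question implies, do not hold for your actual dataset.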
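
If the assertion does fail, it helps to see which columns are responsible. The following is a rough diagnostic sketch, not part of the original answer: the helper `unmatched_columns` and the `sample` frame are hypothetical names made up for illustration, and it assumes the suffixes are exactly "_x" and "_y" at the end of the column name.

```python
import pandas as pd

def unmatched_columns(frame):
    """Report base names missing a partner and suffixed columns that are not numeric."""
    x_bases = {c[:-2] for c in frame.columns if c.endswith('_x')}
    y_bases = {c[:-2] for c in frame.columns if c.endswith('_y')}
    suffixed = [c for c in frame.columns if c.endswith(('_x', '_y'))]
    non_numeric = [c for c in suffixed
                   if not pd.api.types.is_numeric_dtype(frame[c])]
    # Bases with only an _x column, bases with only a _y column, non-numeric columns
    return sorted(x_bases - y_bases), sorted(y_bases - x_bases), non_numeric

# Hypothetical frame with one unpaired _x column and one non-numeric _y column
sample = pd.DataFrame({
    'measure1_x': [1, 2],
    'measure1_y': [3, 4],
    'measure2_x': [5, 6],      # no measure2_y partner
    'label_y': ['a', 'b'],     # non-numeric, and no label_x partner
})

only_x, only_y, non_numeric = unmatched_columns(sample)
print(only_x)       # ['measure2']
print(only_y)       # ['label']
print(non_numeric)  # ['label_y']
```

An empty result for all three lists suggests the two assumptions hold and the einsum step should line up once the columns are sorted.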

Regarding "python - Multiply and sum certain columns by name in pandas", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/50377228/
