gpt4 book ai didi

python - 在 Pandas 中匹配来自excel的索引函数

转载 作者:太空宇宙 更新时间:2023-11-04 09:25:20 25 4
gpt4 key购买 nike

Excel 中有一个匹配索引函数,如果元素出现在所需的列中,我会使用它来匹配

=iferror(INDEX($B$2:$F$8,MATCH($J4,$B$2:$B$8,0),MATCH(K$3,$B$1:$F$1,0)),0)

这是我现在正在使用的函数,它产生了很好的结果,但我想在 python 中实现它。

brand   N   Z   None
Honor 63 96 190
Tecno 0 695 763

我要这张 table

  brand L   N   Z
Honor 0 63 96
Tecno 0 0 695

它应该比较列和索引并给出适当的值

我已经尝试过 python 中的查找函数,但这给了我

ValueError: Row labels must have same size as column labels

最佳答案

基本上,您使用 Excel 公式所做的就是创建数据透视表之类的东西,您也可以使用 Pandas 来完成。例如。像这样:

# Define the columns and brands, you like to have in your result table
# along with the dataframe in variable df it's the only input
columns_query=['L', 'N', 'Z']
brands_query=['Honor', 'Tecno', 'Bar']

# no begin processing by selecting the columns
# which should be shown and are actually present
# add the brand, even if it was not selected
columns_present= {col for col in set(columns_query) if col in df.columns}
columns_present.add('brand')
# select the brands in question and take the
# info in columns we identified for these brands
# from this generate a "flat" list-like data
# structure using melt
# it contains records containing
# (brand, column-name and cell-value)
flat= df.loc[df['brand'].isin(brands_query), columns_present].melt(id_vars='brand')

# if you also want to see the columns and brands,
# for which you have no data in your original df
# you can use the following lines (if you don't
# need them, just skip the following lines until
# the next comment)
# the code just generates data points for the
# columns and rows, which would otherwise not be
# displayed and fills them wit NaN (the pandas
# equivalent for None)
columns_missing= set(columns_query).difference(columns_present)
brands_missing= set(brands_query).difference(df['brand'].unique())
num_dummies= max(len(brands_missing), len(columns_missing))
dummy_records= {
'brand': list(brands_missing) + [brands_query[0]] * (num_dummies - len(brands_missing)),
'variable': list(columns_missing) + [columns_query[0]] * (num_dummies - len(columns_missing)),
'value': [np.NaN] * num_dummies
}
dummy_records= pd.DataFrame(dummy_records)
flat= pd.concat([flat, dummy_records], axis='index', ignore_index=True)

# we get the result by the following line:
flat.set_index(['brand', 'variable']).unstack(level=-1)

对于我的测试数据,输出:

         value             
variable L N Z
brand
Bar NaN NaN NaN
Honor NaN 63.0 96.0
Tecno NaN 0.0 695.0

测试数据是(注意,上面我们没有看到 col None 和行 Foo,但是我们看到行 Bar 和列 L,它们实际上不存在于测试数据中,而是被“查询”):

   brand   N    Z  None
0 Honor 63 96 190
1 Tecno 0 695 763
2 Foo 8 111 231

您可以使用以下方法生成此测试数据:

import pandas as pd
import numpy as np
import io

raw=\
"""brand N Z None
Honor 63 96 190
Tecno 0 695 763
Foo 8 111 231"""

df= pd.read_csv(io.StringIO(raw), sep='\s+')

注意:输出中显示的结果是一个常规的 pandas 数据框。因此,如果您打算将数据写回 Excel 工作表,应该没有问题(pandas 提供了从 excel 文件读取/写入数据帧的方法)。

关于python - 在 Pandas 中匹配来自excel的索引函数,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/57942068/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com