gpt4 book ai didi

python - 带有重复标题值的 Pandas read_excel

转载 作者:行者123 更新时间:2023-12-04 22:18:10 24 4
gpt4 key购买 nike

我有一张我想读入 Pandas 多索引数据框的 excel 表。复杂之处在于 excel 表包含重复的标题值。阅读 pandas 时,将 .x 添加到第二级标题的末尾而不是第一级。有没有办法必须重命名顶级标题而不是二级标题?
示例 excel 文件:
enter image description here
阅读脚本:

from pathlib import Path
import pandas as pd


def main():
xl_file = Path('.') / 'pandasExample.xlsx'
df = pd.read_excel(xl_file, sheet_name='Sheet1', header=[
0, 1], skiprows=[0])
print(df)


if __name__ == '__main__':
main()
输出:
  Rectangle        Ellipse    Rectangle
Width Height a b Width.1 Height.1 Width.2 Height.2
0 10 20 1 2 20 30 40 50
期望的输出:
  Rectangle        Ellipse    Rectangle.1        Rectangle.2       
Width Height a b Width Height Width Height
0 10 20 1 2 20 30 40 50

最佳答案

这是一个不同的答案,可以产生问题中列出的确切所需输出。

from pathlib import Path
import pandas as pd
from typing import List


def rename_headers(headers: List[str]) -> List[str]:
header_dict = {}
new_headers = []
for header in headers:
header_prefix = header.split('.')[0]
header_occurance = header_dict.get(header_prefix, 0)
if header_occurance > 0:
new_header = header_prefix + f'.{header_occurance}'
else:
new_header = header_prefix
new_headers.append(new_header)
header_occurances[header_prefix] = header_occurance + 1
return new_headers

def main():
xl_file = Path('.') / 'pandasExample.xlsx'

# Read first level headers
header_df = pd.read_excel(xl_file, sheet_name='Sheet1', header=[
0], skiprows=[0], nrows=1)
headers = list(filter(lambda x: not x.startswith(
'Unnamed'), list(header_df.columns)))

# Generate the desired headers
new_headers = rename_headers(headers)

# Read in the full dataframe
df = pd.read_excel(xl_file, sheet_name='Sheet1', header=[
0, 1], skiprows=[0])

# Create a dictionary that identifies the parameters for each unique header
unique_headers = pd.unique(pd.Index(df.columns.get_level_values(0)))
parameters = {}
for header in unique_headers:
parameters[header] = pd.unique(
[column.split('.')[0] for column in df[header].columns])


unstack_df = df.head(1).stack()
# Keep order of the original index after stack
index = df.head(1).unstack().index.get_level_values(1)
unstack_df = unstack_df.reindex(zip([0] * len(index), index))
unstack_df = unstack_df.reset_index()

# Create the new level 0 and level 1 headers
level_0 = []
for header in new_headers:
level_0 += [header] * len(parameters[header.split('.')[0]])
level_1 = [parameter.split('.')[0] for parameter in unstack_df['level_1']]

# Rename level 0 and level 1 columns for the dataframe
df.columns = pd.MultiIndex.from_tuples(zip(level_0, level_1))
print(df)


if __name__ == '__main__':
main()
输出:
  Rectangle        Ellipse    Rectangle.1        Rectangle.2       
Width Height a b Width Height Width Height
0 10 20 1 2 20 30 40 50

关于python - 带有重复标题值的 Pandas read_excel,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/66788246/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com