gpt4 book ai didi

python - pandas to_excel() 忽略/允许重复的列名

转载 作者:太空宇宙 更新时间:2023-11-03 12:54:00 26 4
gpt4 key购买 nike

有没有办法在使用pandas的to_excel()函数后忽略重复的列名?

比如说,我有 old_wb.xlsx:

>> df1 = pd.read_excel('wb1.xlsx')
---------------------merged header--------------------
col1 col2 col3 col1 col4 col1 col2 col5
test test test test test test test test

并说我对我的 Excel 文件做了一些处理,例如,删除合并的标题并将其保存到另一个 Excel 文件:

>> df1.to_excel('new_wb.xlsx', 'Sheet1', merged_cells=False, header=None, index=False)

new_wb.xlsx 的列名如下所示:

        col1    col2   col3   col1.1   col4   col1.2   col2.1   col5
test test test test test test test test

它将 .1 添加到重复的列名中,并且它也随着重复的列名的增加而递增。

我尝试在使用 to_excel() 之前重命名列名,但没有成功。似乎重复项的重命名发生在 to_excel() 中。

>> df1.rename(columns=lambda x: x.replace('.1',''))

经过搜索,我发现 to_excel() 的参数是 mangle_dupe_cols=False,不幸的是它返回了:

ValueError: Setting mangle_dupe_cols=False is not supported yet

有关如何在保存“to_excel()”时忽略重复列名的任何帮助

最佳答案

@Ricky Aguilar has a great solution. I took his solution and just made it more dynamic.

现在您可以重命名所有重复的 header ,甚至不知道它们的值是什么

def dataframe_allowing_duplicate_headers():
# To Hold All The Possible Duplicate Tags ['.1', '.2', '.3', ...]
dup_id_range = []

# Load Your Excel File Using Pandas
dataframe = pandas.read_excel("path_to_excel_file", sheet_name="sheetname")

# Generate And Store All The Possible Duplicate Tags ['.1', '.2', '.3', ...]
for count in range(0, len(dataframe.columns)):
dup_id_range.append( '.{}'.format(count) )

# Search And Replace All Duplicate Headers To What It Was Set As Originally
def rename(dataframe, character_number):
duplicate_columns_chars = list(
filter(lambda v: v[(len(v)-character_number):] in dup_id_range,
dataframe.columns))

for duplicate_column in duplicate_columns_chars:
dataframe = dataframe.rename(
columns={duplicate_column:duplicate_column[:-character_number]})
return dataframe


# Replace The Possible Duplicates Respectfully Based On Columns Count
if len(dup_id_range) > 0:
dataframe = rename(dataframe, 2)
if len(dup_id_range) > 9:
dataframe = rename(dataframe, 3)
if len(dup_id_range) > 99:
dataframe = rename(dataframe, 4)
# If You Have More Than A Thousand Columns (lol)
#if len(dup_id_range) > 999:
# dataframe = rename(dataframe, 5)

return dataframe

用法:

# This Dataframe Will Have All Your Headers, Allowing Your Duplicates
my_dataframe = dataframe_allowing_duplicate_headers()

关于python - pandas to_excel() 忽略/允许重复的列名,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/50353740/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com