
python - Pandas - Read a table from Excel

Reposted. Author: 太空宇宙. Updated: 2023-11-03 21:21:07

I am trying to read a table from an Excel file into pandas, but I can't get it to work.

For example, the code below works for me; it reads the data from Sheet1 of File.xlsx:

df = pd.read_excel('File.xlsx', sheet_name='Sheet1')

But that sheet contains a table named "Table4", and I want to read that table into pandas using Python.

The table looks like this: [image: screenshot of the Excel table]

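The table grows in both rows and columns. Only the name of the table stays the same. The contents beside the table also change.

Is it possible to do this with pandas, or can you suggest an alternative?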

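Best answer

Pandas currently has no direct way to read a table, but the function below does it with the openpyxl library (the library pandas itself uses to read current Excel files).

Note that this technique comes from a blog post I did not write ( found here ), although my code is slightly different.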

import pandas as pd
import openpyxl

def read_table(file_name: str, table_name: str) -> pd.DataFrame:
    # openpyxl has no table info if read_only is True; data_only means any
    # formula cell yields its last saved value instead of the formula
    wb = openpyxl.load_workbook(file_name, read_only=False, data_only=True)
    for sheetname in wb.sheetnames:  # pulls as strings
        sheet = wb[sheetname]  # get the sheet object instead of string
        # tables are stored within sheets, not within the workbook,
        # although table names are unique in a workbook
        if table_name in sheet.tables:
            tbl = sheet.tables[table_name]  # get table object instead of string
            tbl_range = tbl.ref  # something like 'C4:F9'
            break  # we've got our table, bail from for-loop
    else:  # no sheet contained the table
        wb.close()
        raise KeyError(f"table {table_name!r} not found in {file_name!r}")
    data = sheet[tbl_range]  # a tuple of rows, where each row is a tuple of cells
    content = [[cell.value for cell in row] for row in data]  # loop through those row/cell tuples
    header = content[0]  # first row is column headers
    rest = content[1:]  # every row that isn't the first is data
    df = pd.DataFrame(rest, columns=header)
    wb.close()
    return df
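
To try this approach without an existing file, the sketch below builds a small workbook containing one table and then lists every table it finds, sheet by sheet. The helper names (make_sample_workbook, list_tables), the sample data, and the range C4:E6 are made up for illustration; only "Table4", "Sheet1" and "File.xlsx" echo the question.

```python
import os
import tempfile

import openpyxl
from openpyxl.worksheet.table import Table


def make_sample_workbook(path: str) -> None:
    """Write a hypothetical workbook with one table, 'Table4', at C4:E6."""
    wb = openpyxl.Workbook()
    ws = wb.active
    ws.title = "Sheet1"
    ws["A1"] = "unrelated content beside the table"
    rows = [("id", "name", "score"), (1, "ann", 9.5), (2, "bob", 7.0)]
    for r, row in enumerate(rows, start=4):          # table starts on row 4
        for c, value in enumerate(row, start=3):     # column 3 is 'C'
            ws.cell(row=r, column=c, value=value)
    ws.add_table(Table(displayName="Table4", ref="C4:E6"))
    wb.save(path)


def list_tables(path: str) -> dict:
    """Map each table name to (sheet title, cell range)."""
    wb = openpyxl.load_workbook(path, data_only=True)
    found = {}
    for ws in wb.worksheets:
        for name in ws.tables:                       # keys are table names
            found[name] = (ws.title, ws.tables[name].ref)
    wb.close()
    return found


sample_path = os.path.join(tempfile.mkdtemp(), "File.xlsx")
make_sample_workbook(sample_path)
print(list_tables(sample_path))
```

Calling read_table(sample_path, "Table4") on this file should then give a two-row DataFrame with the columns id, name and score. Because the lookup goes through the table name, it keeps working when the table's range changes.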

Versions:

In [50]: pd.__version__
Out[50]: '1.3.5'

In [51]: openpyxl.__version__
Out[51]: '3.0.9'
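
As a variant, once the table's range is known, the parsing itself can be handed back to pandas. The sketch below converts a range string into keyword arguments for pd.read_excel. The helper name read_excel_kwargs_for_ref is hypothetical, and it assumes the table has exactly one header row and no totals row.

```python
from openpyxl.utils import get_column_letter, range_boundaries


def read_excel_kwargs_for_ref(ref: str) -> dict:
    """Translate a range such as 'C4:F9' into pandas.read_excel keyword arguments.

    Assumption: the first row of the range is the header row and every
    following row is data (no totals row).
    """
    min_col, min_row, max_col, max_row = range_boundaries(ref)
    return {
        # column letters spanned by the table, e.g. 'C:F'
        "usecols": f"{get_column_letter(min_col)}:{get_column_letter(max_col)}",
        # rows above the header row are skipped
        "skiprows": min_row - 1,
        # number of data rows below the header
        "nrows": max_row - min_row,
    }


print(read_excel_kwargs_for_ref("C4:F9"))
# {'usecols': 'C:F', 'skiprows': 3, 'nrows': 5}
```

With tbl.ref obtained as in read_table, a call such as pd.read_excel('File.xlsx', sheet_name='Sheet1', **read_excel_kwargs_for_ref(tbl.ref)) would let pandas handle the header and type inference; whether that is preferable to building the DataFrame by hand depends on the data.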

A similar question about python - Pandas - Read a table from Excel can be found on Stack Overflow: https://stackoverflow.com/questions/54241345/
