
Python - pandas xls import - difficulties deleting certain rows

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:08:29

[miniconda, python 3]

My .xls data to download (password: stack): Download .xls

0) You may notice that my xls file has a large merged cell in the first row, and some merged cells in rows 2 and 3 as well. Is this a problem? If it is, can I somehow unmerge them?

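1) I want to delete the first row of this xls, because it holds no information that matters to me. I guess the problem is that the row is merged? I tried df = df.drop([0]) for this, but instead of deleting the big first row it deletes the row with the column headers (the one starting with "ID klienta"). Why is that?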

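2) After removing the first row, I would like to work with numbers from individual columns (in my example, I want to extract the data from the "Stav" column). How do I do that? I read somewhere that rows/columns can be indexed by header name (a string) alone. For example, I want to extract the data from the column headed "Stav" using: Stav = df['Stav']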

My code so far:

import pandas as pd
import numpy as np

print("\n\n*********************************************")
print("My xls processing script\n")
print("*********************************************\n")

#load data
df = pd.read_excel("file.xls")

#My unsuccessful attempt to get rid of the first row
#uncomment this and it will remove the second row instead of the first row
#df = df.drop([0])

#print preview of 5 rows, 4 columns
print(df.iloc[0:5, 0:4])
print("\n\n")

#My unsuccessful attempt to get the column data with header 'Stav'
Stav = df['Stav']
print(Stav)

Console output:

(xls_env) C:\Users\Slavek\Documents\PythonScripts>python xld_proj.py

*********************************************
My xls processing script

*********************************************

   Lidé, které jsem podpořil                 Unnamed: 1  Unnamed: 2  Unnamed: 3
0                 ID klienta                      Název        Stav  ID příběhu
1                        NaN                        NaN         NaN         NaN
2                zonky214882                       Jeep    na cestě      181187
3                zonky235862  Notebook k práci i relaxu    na cestě      206317
4                zonky230378               Dětský pokoj   v pořádku      199686



Traceback (most recent call last):
  File "C:\miniconda\envs\xls_env\lib\site-packages\pandas\core\indexes\base.py", line 2525, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Stav'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "xld_proj.py", line 20, in <module>
    Stav = df['Stav']
  File "C:\miniconda\envs\xls_env\lib\site-packages\pandas\core\frame.py", line 2139, in __getitem__
    return self._getitem_column(key)
  File "C:\miniconda\envs\xls_env\lib\site-packages\pandas\core\frame.py", line 2146, in _getitem_column
    return self._get_item_cache(key)
  File "C:\miniconda\envs\xls_env\lib\site-packages\pandas\core\generic.py", line 1842, in _get_item_cache
    values = self._data.get(item)
  File "C:\miniconda\envs\xls_env\lib\site-packages\pandas\core\internals.py", line 3843, in get
    loc = self.items.get_loc(item)
  File "C:\miniconda\envs\xls_env\lib\site-packages\pandas\core\indexes\base.py", line 2527, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Stav'

Best answer

I think you want to use the header option when reading the file:

df = pd.read_excel("file.xls", header=[0, 1, 2])

Then you can drop the header levels you don't need:

df.columns = df.columns.droplevel([0, 1])

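Or something along those lines. The table is a bit confusing because the variable names are spread across two sub-header rows. I would clean that up so they all sit on the same row.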
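As a rough illustration of why df.drop([0]) removed the "ID klienta" row, and of one way to get the names onto a single header row: by default read_excel uses the first sheet row as the column labels, so the merged title became the header and the real header row became data row 0. The sketch below does not read the actual file. It rebuilds the frame from the console output in the question (an assumption about the sheet's contents) and promotes row 0 to the column labels. The names `clean` and `stav` are only illustrative.

```python
import pandas as pd

# Hypothetical in-memory stand-in for what pd.read_excel("file.xls") returned:
# the merged title cell became the column labels, and the real header row
# ("ID klienta", ...) ended up as data row 0. Values are copied from the
# console output in the question.
df = pd.DataFrame(
    [
        ["ID klienta", "Název", "Stav", "ID příběhu"],
        [None, None, None, None],
        ["zonky214882", "Jeep", "na cestě", 181187],
        ["zonky235862", "Notebook k práci i relaxu", "na cestě", 206317],
        ["zonky230378", "Dětský pokoj", "v pořádku", 199686],
    ],
    columns=["Lidé, které jsem podpořil", "Unnamed: 1", "Unnamed: 2", "Unnamed: 3"],
)

# "Stav" is a cell value in row 0, not a column label, hence the KeyError.
assert "Stav" not in df.columns

# Promote row 0 to column labels, then drop that row and any all-empty rows.
clean = df.copy()
clean.columns = clean.iloc[0]
clean = clean.drop(index=0).dropna(how="all").reset_index(drop=True)
clean.columns.name = None  # remove the leftover "0" label on the column index

# Selecting by header name now works.
stav = clean["Stav"]
print(stav.tolist())
```

When reading the real file, passing header=1 (or skiprows=1) to read_excel should have a similar effect, assuming the real header row is the second row of the sheet.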

Or keep all the headers and have a look here: How do I change or access pandas MultiIndex column headers?
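If the multi-level header is kept, a column can still be reached through its level labels. The following is a minimal sketch on a hypothetical two-level header (the real sheet, read with three header rows, would have three levels and possibly different labels). It uses the tuple key, `get_level_values`, and `DataFrame.xs`.

```python
import pandas as pd

# Hypothetical two-level header, loosely modelled on the sheet in the question:
# level 0 is the merged title, level 1 holds the real column names.
columns = pd.MultiIndex.from_tuples(
    [
        ("Lidé, které jsem podpořil", "ID klienta"),
        ("Lidé, které jsem podpořil", "Název"),
        ("Lidé, které jsem podpořil", "Stav"),
    ]
)
df = pd.DataFrame(
    [
        ["zonky214882", "Jeep", "na cestě"],
        ["zonky230378", "Dětský pokoj", "v pořádku"],
    ],
    columns=columns,
)

# Inspect the labels stored in one header level.
names = list(df.columns.get_level_values(1))

# Select a column by its full tuple key ...
stav_by_tuple = df[("Lidé, které jsem podpořil", "Stav")]

# ... or by the level-1 label alone, using a cross-section on the columns.
# This returns a DataFrame whose remaining column label is the level-0 title.
stav_by_level = df.xs("Stav", axis=1, level=1)

print(names)
print(stav_by_tuple.tolist())
```

The tuple key returns a Series, while `xs` returns a one-column DataFrame here, so which one fits depends on what the later processing expects.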

Regarding "Python - pandas xls import - difficulties deleting certain rows", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/48694286/
