python - pandas.read_fwf ignores the supplied data types

Reposted by: 行者123 Updated: 2023-12-01 08:51:18

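I am importing a DataFrame from a text file and want to specify the data type of each column, but pandas seems to ignore the dtype input.

A working example: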

from io import StringIO
import pandas as pd

string = 'USAF   WBAN  STATION NAME                  CTRY ST CALL  LAT     LON      ELEV(M) BEGIN    END\n007026 99999 WXPOD 7026                    AF            +00.000 +000.000 +7026.0 20120713 20170822\n007070 99999 WXPOD 7070                    AF            +00.000 +000.000 +7070.0 20140923 20150926'

f = StringIO(string)

df = pd.read_fwf(f,
                 colspecs = [(0,6),
                             (7,12),
                             (13,41),
                             (43,45),
                             (48,50),
                             (51,55),
                             (57,64),
                             (65,73),
                             (74,81),
                             (82,90),
                             (91,101)],
                 dtypes = {'USAF'         : str,
                           'WBAN'         : str,
                           'STATION NAME' : str,
                           'CT'           : str,
                           'ST'           : str,
                           'CALL'         : str,
                           'LAT'          : float,
                           'LON'          : float,
                           'ELEV(M)'      : float,
                           'BEGIN'        : int,
                           'END'          : int,},
                 )
df.dtypes

This returns:

USAF              int64
WBAN              int64
STATION NAME     object
CT               object
ST              float64
CALL            float64
LAT             float64
LON             float64
ELEV(M)         float64
BEGIN             int64
END               int64
dtype: object

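Why does this happen, and how can I force the first column to be a string?

Best answer

There is a problem with dtype conversion in read_fwf: pandas guesses the column types itself and applies them. Use converters explicitly here. You have to do this while the DataFrame is being created, because if you convert afterwards the leading zeros will already have been lost.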

string = 'USAF   WBAN  STATION NAME                  CTRY ST CALL  LAT     LON      ELEV(M) BEGIN    END\n007026 99999 WXPOD 7026                    AF            +00.000 +000.000 +7026.0 20120713 20170822\n007070 99999 WXPOD 7070                    AF            +00.000 +000.000 +7070.0 20140923 20150926'

f = StringIO(string)
df = pd.read_fwf(f,
                 colspecs = [(0,6),
                             (7,12),
                             (13,41),
                             (43,45),
                             (48,50),
                             (51,55),
                             (57,64),
                             (65,73),
                             (74,81),
                             (82,90),
                             (91,101)],
                 # Passing str directly is equivalent to lambda x: str(x)
                 converters = {'USAF'         : str,
                               'WBAN'         : str,
                               'STATION NAME' : str,
                               'CT'           : str,
                               'ST'           : str,
                               'CALL'         : str}
                 )
>>> df.dtypes
USAF             object
WBAN             object
STATION NAME     object
CT               object
ST               object
CALL             object
LAT             float64
LON             float64
ELEV(M)         float64
BEGIN             int64
END               int64
dtype: object
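
One further thing may be worth checking. This is an observation rather than part of the answer above: the question passes `dtypes=`, whereas the pandas keyword is spelled `dtype` (singular). It is an assumption that the misspelled keyword was silently ignored in the asker's pandas version, though that would match the output shown. The sketch below uses only the first two columns of the sample data to contrast converting after the read with requesting strings at read time through `dtype`. Behaviour may differ on very old pandas releases, where the fixed-width parser did not support `dtype`.

```python
from io import StringIO
import pandas as pd

# First two columns of the sample data above, in the same fixed-width layout.
text = (
    "USAF   WBAN \n"
    "007026 99999\n"
    "007070 99999\n"
)
colspecs = [(0, 6), (7, 12)]

# 1) Let pandas infer the types, then convert afterwards:
#    USAF is parsed as an integer first, so the leading zeros are gone
#    before astype(str) ever runs.
inferred = pd.read_fwf(StringIO(text), colspecs=colspecs)
late = inferred["USAF"].astype(str).tolist()

# 2) Ask for strings at read time with the singular `dtype` keyword.
typed = pd.read_fwf(StringIO(text), colspecs=colspecs, dtype={"USAF": str})
early = typed["USAF"].tolist()

print("converted after reading:", late)
print("dtype given at read time:", early)
```

If `dtype` behaves this way in your pandas version, it is a shorter alternative to `converters` for columns that only need to stay as text. `converters` remains the more flexible choice when a column needs custom parsing.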

For more on python - pandas.read_fwf ignoring the supplied data types, see the similar question on Stack Overflow: https://stackoverflow.com/questions/53105792/
