python - Reading float values from an ASCII file with a variable number of columns


Reposted by: 行者123  Updated: 2023-11-30 23:22:56


I have ASCII files containing floats. Most lines have 10 columns, but some lines have fewer. An example looks like this:

* lat =   33.2813
  19.61  19.92  21.82  21.94  22.77  25.81  29.48  29.86  29.92  28.98
  27.94  25.78  23.68  23.37
* lat =   33.3438
  20.16  23.62  27.73  31.12  33.06  34.01  35.78  37.03  37.79  35.74
  34.12  31.83  33.98  28.57
* lat =   33.4063
  28.26  30.04  35.00  37.92  41.50  44.55  45.44  46.74  46.74  43.47
  37.67  35.67  35.67  31.64
* lat =   33.4688
  34.02  36.07  38.95  44.24  46.49  47.98  50.62  51.95  51.95  51.95
  48.31  41.03  38.01  34.58
* lat =   33.5313
  36.94  37.12  44.04  48.41  51.70  52.71  54.18  55.71  56.98  62.10
  57.26  49.05  44.18  41.50

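Lines starting with * are comments.

How can I read this file efficiently with numpy? (This is a toy example; my actual data files contain >> 1E6 values.) The numpy functions loadtxt/genfromtxt do not seem to handle a variable number of columns: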

   In [25]: np.loadtxt(fn, comments="*", dtype=float)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-25-2419eebb6114> in <module>()
----> 1 np.loadtxt(fn, comments="*", dtype=float)

/usr/lib/pymodules/python2.7/numpy/lib/npyio.pyc in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin)
    833             fh.close()
    834 
--> 835     X = np.array(X, dtype)
    836     # Multicolumn data are returned with shape (1, N, M), i.e.
    837     # (1, 1, M) for a single row - remove the singleton dimension there

ValueError: setting an array element with a sequence.

genfromtxt is more verbose, but does not work either:

    In [27]: np.genfromtxt(fn, comments="*", dtype=float)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-27-6c6e8879e4b9> in <module>()
----> 1 np.genfromtxt(fn, comments="*", dtype=float)

/usr/lib/pymodules/python2.7/numpy/lib/npyio.pyc in genfromtxt(fname, dtype, comments, delimiter, skiprows, skip_header, skip_footer, converters, missing, missing_values, filling_values, usecols, names, excludelist, deletechars, replace_space, autostrip, case_sensitive, defaultfmt, unpack, usemask, loose, invalid_raise)
   1636             # Raise an exception ?
   1637             if invalid_raise:
-> 1638                 raise ValueError(errmsg)
   1639             # Issue a warning ?
   1640             else:

ValueError: Some errors were detected !
    Line #2 (got 4 columns instead of 10)
    Line #5 (got 4 columns instead of 10)
    Line #8 (got 4 columns instead of 10)
    Line #11 (got 4 columns instead of 10)
    Line #14 (got 4 columns instead of 10)
    Line #17 (got 4 columns instead of 10)
    Line #20 (got 4 columns instead of 10)
    Line #23 (got 4 columns instead of 10)
    Line #26 (got 4 columns instead of 10)
    Line #29 (got 4 columns instead of 10)

There seems to be a kwarg invalid_raise, but setting it to False causes lines with fewer than 10 values to be ignored.
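
The behaviour described above can be reproduced on a small in-memory copy of the sample. This is only a sketch: the `sample` string and the name `full_rows` are introduced here for illustration, and the warnings filter is there because genfromtxt reports the skipped lines as warnings when invalid_raise=False.

```python
import io
import warnings

import numpy as np

# Two blocks copied from the example file in the question.
sample = """\
* lat =   33.2813
  19.61  19.92  21.82  21.94  22.77  25.81  29.48  29.86  29.92  28.98
  27.94  25.78  23.68  23.37
* lat =   33.3438
  20.16  23.62  27.73  31.12  33.06  34.01  35.78  37.03  37.79  35.74
  34.12  31.83  33.98  28.57
"""

with warnings.catch_warnings():
    # Silence the warning that lists the lines being skipped.
    warnings.simplefilter("ignore")
    full_rows = np.genfromtxt(
        io.StringIO(sample), comments="*", dtype=float, invalid_raise=False
    )

# Only the 10-column lines survive; the 4-column lines are silently lost.
print(full_rows.shape)
```

So invalid_raise=False avoids the exception but discards part of the data, which is why it does not solve the problem.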

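I would appreciate any help with this. I would be happy to write my own file parser in Cython, but I have not been able to find information on efficient string->float conversion in Cython...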
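
Before reaching for Cython, a plain Python and numpy baseline may be worth timing. The sketch below skips the comment lines, collects all remaining whitespace-separated tokens, and converts them to floats in a single step. The helper name `read_floats` and the `sample` string are hypothetical, and the final reshape assumes that every block holds exactly 14 values, as in the toy example; real files may differ.

```python
import io

import numpy as np


def read_floats(lines, comment="*"):
    """Return every float found in `lines` as one 1-D array.

    Lines whose first non-blank character is `comment` are skipped.
    """
    tokens = []
    for line in lines:
        if line.lstrip().startswith(comment):
            continue  # header line such as "* lat =   33.2813"
        tokens.extend(line.split())
    # numpy parses the numeric strings while building the float array.
    return np.array(tokens, dtype=float)


# Two blocks copied from the example file in the question.
sample = """\
* lat =   33.2813
  19.61  19.92  21.82  21.94  22.77  25.81  29.48  29.86  29.92  28.98
  27.94  25.78  23.68  23.37
* lat =   33.3438
  20.16  23.62  27.73  31.12  33.06  34.01  35.78  37.03  37.79  35.74
  34.12  31.83  33.98  28.57
"""

values = read_floats(io.StringIO(sample))

# Assumption: each latitude block contains the same number of values (14 here).
blocks = values.reshape(-1, 14)
print(values.size, blocks.shape)
```

For a real file, `read_floats(open(fn))` would take the place of the StringIO call. The intermediate list of strings costs memory, so for very large files this is a starting point to measure rather than a tuned solution.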

Best answer

Here is an approach that uses the pandas parser. If you only want a numpy array, use df.values.

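In [239]: import pandas as pd

In [240]: df = pd.read_csv('input.txt', header=None, skiprows=1, sep=r'\s+')  # skiprows=1 drops the first comment line

In [242]: df = df[df[0] != '*']  # filter out comment rows

In [245]: df = df.apply(pd.to_numeric, errors='coerce')  # replaces df.convert_objects(convert_numeric=True), which was removed from pandas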

In [246]: df
Out[246]: 
        0      1      2      3      4      5      6      7      8      9
0   19.61  19.92  21.82  21.94  22.77  25.81  29.48  29.86  29.92  28.98
1   27.94  25.78  23.68  23.37    NaN    NaN    NaN    NaN    NaN    NaN
3   20.16  23.62  27.73  31.12  33.06  34.01  35.78  37.03  37.79  35.74
4   34.12  31.83  33.98  28.57    NaN    NaN    NaN    NaN    NaN    NaN
6   28.26  30.04  35.00  37.92  41.50  44.55  45.44  46.74  46.74  43.47
7   37.67  35.67  35.67  31.64    NaN    NaN    NaN    NaN    NaN    NaN
9   34.02  36.07  38.95  44.24  46.49  47.98  50.62  51.95  51.95  51.95
10  48.31  41.03  38.01  34.58    NaN    NaN    NaN    NaN    NaN    NaN
12  36.94  37.12  44.04  48.41  51.70  52.71  54.18  55.71  56.98  62.10
13  57.26  49.05  44.18  41.50    NaN    NaN    NaN    NaN    NaN    NaN
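
The frame above pads the short rows with NaN, so df.values gives a 2-D array that still contains the padding. If a flat array of the actual values is wanted, one option is to mask the NaN entries out, as sketched below. The `sample` string and the names `padded` and `flat` are introduced here for illustration, and the approach assumes the file contains no genuine NaN values that need to be kept.

```python
import io

import numpy as np
import pandas as pd

# Two blocks copied from the example file in the question.
sample = """\
* lat =   33.2813
  19.61  19.92  21.82  21.94  22.77  25.81  29.48  29.86  29.92  28.98
  27.94  25.78  23.68  23.37
* lat =   33.3438
  20.16  23.62  27.73  31.12  33.06  34.01  35.78  37.03  37.79  35.74
  34.12  31.83  33.98  28.57
"""

df = pd.read_csv(io.StringIO(sample), header=None, skiprows=1, sep=r"\s+")
df = df[df[0] != "*"]                          # drop the remaining comment rows
df = df.apply(pd.to_numeric, errors="coerce")  # strings -> floats

padded = df.to_numpy()            # 2-D array, short rows padded with NaN
flat = padded[~np.isnan(padded)]  # 1-D array of real values, in file order
print(padded.shape, flat.size)
```

Boolean indexing walks the array row by row, so `flat` keeps the values in the order they appear in the file.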

Regarding python - Reading float values from an ASCII file with a variable number of columns, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/24223682/
