
python - How to debug OverflowError: value too large to convert to int32_t?


What I'm trying to do

I'm using PyArrow to read some CSVs and convert them to Parquet. Some of the files I read have a lot of columns and take up a lot of memory (enough to crash the machine running the job), so I'm reading the files in chunks.

This is what the function I use to generate the Arrow tables looks like (snippet for brevity):

from typing import Generator

import pyarrow as pa
import pyarrow.parquet as pq
from pyarrow import csv as arrow_csv


def generate_arrow_tables(
    input_buffer: pa.lib.Buffer,
    arrow_schema: pa.Schema,
    batch_size: int
) -> Generator[pa.Table, None, None]:
    """
    Generates an Arrow Table from given data.
    :param batch_size: Size of batch streamed from CSV at a time
    :param input_buffer: Takes in an Arrow BufferOutputStream
    :param arrow_schema: Takes in an Arrow Schema
    :return: Returns an Arrow Table
    """

    # Preparing convert options
    co = arrow_csv.ConvertOptions(column_types=arrow_schema, strings_can_be_null=True)

    # Preparing read options
    ro = arrow_csv.ReadOptions(block_size=batch_size)

    # Streaming contents of CSV into batches
    with arrow_csv.open_csv(input_buffer, convert_options=co, read_options=ro) as stream_reader:
        for chunk in stream_reader:
            if chunk is None:
                break

            # Emit batches from generator. Arrow schema is inferred unless explicitly specified
            yield pa.Table.from_batches(batches=[chunk], schema=arrow_schema)

And this is how I use that function to write the batches to S3 (snippet for brevity):

GB = 1024 ** 3

# data.size here is the size of the buffer
arrow_tables: Generator[Table, None, None] = generate_arrow_tables(pg_data, arrow_schema, min(data.size, GB ** 10))

# Iterate through generated tables and write to S3
count = 0
for table in arrow_tables:
    count += 1  # Count based on batch size

    # Write keys to S3
    file_name = f'{ARGS.run_id}-{count}.parquet'
    write_to_s3(table, output_path=f"s3://{bucket}/{bucket_prefix}/{file_name}")
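
write_to_s3 is not shown in the question, so the following is only a guess at what such a helper might look like, using pyarrow.parquet.write_table and pyarrow.fs; the name and call shape follow the snippet above, everything else is illustrative:

import pyarrow as pa
import pyarrow.parquet as pq
from pyarrow import fs

def write_to_s3(table: pa.Table, output_path: str) -> None:
    # Resolve the "s3://bucket/key" URI into an S3 filesystem and a bucket-relative path
    filesystem, path = fs.FileSystem.from_uri(output_path)
    # Write the table out as a single Parquet file
    pq.write_table(table, path, filesystem=filesystem)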

What went wrong

I'm getting the following error: OverflowError: value too large to convert to int32_t. Here is the stack trace (snippet for brevity):

[2021-08-04 11:26:45,479] {pod_launcher.py:156} INFO - b'    ro = arrow_csv.ReadOptions(block_size=batch_size)\n'
[2021-08-04 11:26:45,479] {pod_launcher.py:156} INFO - b' File "pyarrow/_csv.pyx", line 87, in pyarrow._csv.ReadOptions.__init__\n'
[2021-08-04 11:26:45,479] {pod_launcher.py:156} INFO - b' File "pyarrow/_csv.pyx", line 119, in pyarrow._csv.ReadOptions.block_size.__set__\n'
[2021-08-04 11:26:45,479] {pod_launcher.py:156} INFO - b'OverflowError: value too large to convert to int32_t\n'

How can I debug and/or fix this issue?

I'm happy to provide more information if needed.

Best Answer

If I understand correctly, the third argument to generate_arrow_tables is batch_size, which you pass to the CSV reader as block_size. I'm not sure what the value of data.size is, but you guard it with min(data.size, GB ** 10).

A block_size of 10 GB is not going to work. The error you're getting means the block size doesn't fit in a signed 32-bit integer (max ~2 GB).
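
For reference, a quick arithmetic check of the limit the error refers to (this snippet is mine, not part of the original answer):

GB = 1024 ** 3
INT32_MAX = 2 ** 31 - 1  # largest value a signed 32-bit integer can hold (~2.1 GB)

print(10 * GB > INT32_MAX)   # True: even 10 GB overflows int32
print(GB ** 10 > INT32_MAX)  # True: GB ** 10 is (1024**3)**10, astronomically larger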

That limit aside, I'm not sure using a block size much larger than the default (1 MB) is a good idea. I don't think you'll see much performance benefit, and you'll end up using more RAM than you need.
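
One way to make the call safe is to cap the block size well below the 32-bit limit. A minimal sketch, assuming the original intent was a cap of roughly 10 * GB rather than GB ** 10; the 64 MB figure is an arbitrary illustration, not a value recommended by the answer:

GB = 1024 ** 3
MB = 1024 ** 2

# Cap the CSV block size at something comfortably below the int32 limit.
# PyArrow's default block_size is 1 MB; 64 MB here is just an illustrative choice.
block_size = min(data.size, 64 * MB)

arrow_tables = generate_arrow_tables(pg_data, arrow_schema, block_size)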

Regarding python - How to debug OverflowError: value too large to convert to int32_t?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/68652157/
