python - pd.DataFrame.to_sql(method="multi") on GCP Postgres raises struct.error: 'h' format requires -32768 <= number <= 32767, with user-defined dtypes


Reposted by: 行者123. Updated: 2023-12-04 18:48:04

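This is my first question here, so please go easy on me!
I am trying to write a Pandas DataFrame (3,000,000 x 8) to a GCP-hosted Postgres database. I am using something similar to the following to write my data.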

import os
import random
from datetime import datetime

import numpy as np
import pandas as pd
import sqlalchemy
from google.cloud.sql.connector import connector
from sqlalchemy import Table,MetaData,Column,String,Integer,Float,DateTime,ARRAY,BigInteger
from sqlalchemy.orm import declarative_base  # needed for Base below

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path-to-your-keys"
Base = declarative_base()

os.environ['DB_USER'] = "root-user"
os.environ['DB_PROJECTID']  ="project-id-from-GCP"
os.environ["DB_NAME"] = "DB-NAME"
os.environ["DB_PASS"] = "your-password-for-the-GCP-DB"

def getconn():
    conn = connector.connect(
        os.environ["DB_PROJECTID"],
        "pg8000",
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASS"],
        db=os.environ["DB_NAME"],
    )
    return conn

db = sqlalchemy.create_engine(
        "postgresql+pg8000://",
        creator=getconn,
    )

def make_dummy_df():
    rng = np.random.default_rng()
    df = pd.DataFrame(rng.integers(0, 50000, size=(3000000, 1)), columns=['window'])
    df['start'] = list(pd.date_range(start=datetime(2020,1,1),end=datetime.today(),periods=int(df.shape[0])))
    df['end'] = list(pd.date_range(start=datetime(2020,1,1),end=datetime.today(),periods=int(df.shape[0])))
    df['degree'] = [random.randint(0,40) for _ in range(df.shape[0])]
    df['x'] = [random.sample(range(10000, 100000), 10) for _ in range(df.shape[0])]
    df['y'] = [random.sample(range(-100, 100), 10) for _ in range(df.shape[0])]
    df['z'] = [random.sample(range(100, 1000), 10) for _ in range(df.shape[0])]
    df['index'] = df.index
    return df

if __name__=="__main__":
    df = make_dummy_df()
    df.to_sql(
        "test1",
        con=db,
        if_exists="replace",
        index=False,
        method="multi",
        chunksize=10000,
        dtype={
             "index":BigInteger(),
             "window":Integer(),
             "degree":Integer(),
             "start":DateTime(),
             "end":DateTime(),
             "x":ARRAY(Float),
             "y":ARRAY(Float),
             "z":ARRAY(Float)
         })

The following error is raised when this runs in a Linux environment. The Linux machine is an AWS EC2 virtual machine running Ubuntu Server 20.04 LTS (HVM), SSD Volume Type, c4.8xlarge.

Linux ip-xxx-xx-xx-xx A.B.C-D-aws #21~20.04.1-Ubuntu SMP x86_64 x86_64 x86_64 GNU/Linu

Traceback (most recent call last):                                                                                                                       
  File "testing.py", line 53, in <module>                                                                               
    df.to_sql(                                                                                                                       
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pandas/core/generic.py", line 2963, in to_sql                     
    return sql.to_sql(                                                         
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pandas/io/sql.py", line 697, in to_sql                          
    return pandas_sql.to_sql(                                                  
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pandas/io/sql.py", line 1739, in to_sql                         
    total_inserted = sql_engine.insert_records(                                
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pandas/io/sql.py", line 1322, in insert_records           
    return table.insert(chunksize=chunksize, method=method)                    
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pandas/io/sql.py", line 950, in insert 
    num_inserted = exec_insert(conn, keys, chunk_iter)                                                                       
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pandas/io/sql.py", line 873, in _execute_insert_multi        
    result = conn.execute(stmt)                                                                                                                       
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1295, in execute        
    return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)                                                                               
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 325, in _execute_on_connection       
    return connection._execute_clauseelement(                                                                               
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1487, in _execute_clauseelement       
    ret = self._execute_context(                                                                                                                       
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1851, in _execute_context       
    self._handle_dbapi_exception(                                                                                                                       
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2036, in _handle_dbapi_exception       
    util.raise_(exc_info[1], with_traceback=exc_info[2])
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 207, in raise_
    raise exception
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1808, in _execute_context
    self.dialect.do_execute(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 732, in do_execute
    cursor.execute(statement, parameters)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pg8000/dbapi.py", line 455, in execute
    self._context = self._c.execute_unnamed(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pg8000/core.py", line 627, in execute_unnamed
    self.send_PARSE(NULL_BYTE, statement, oids)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pg8000/core.py", line 601, in send_PARSE
    val.extend(h_pack(len(oids)))
struct.error: 'h' format requires -32768 <= number <= 32767

Here are the module dependency versions:

Numpy:                        1.22.3
Pandas:                       1.4.1
SqlAlchemy:                   1.4.32
cloud-sql-python-connector:   0.5.2

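This question is specifically about Postgres on GCP + SQLAlchemy + df.to_sql(method="multi"). The dtypes of the fields can change if that solves the problem, but the arrays in the df must be written to the database as ARRAY.
So far I have tested splitting the DataFrame into smaller chunks using the following: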

n = int(round(df.shape[0]/20,0))
chunks = [df[i:i+n] for i in range(0,df.shape[0],n)]

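and then iterating over the chunks. I also tried dropping individual columns from the DataFrame and writing to the DB, to determine whether a single column was causing the problem - no luck. I changed all integer fields to BigInteger() - no luck.
Interestingly, if you do not pass the optional kwarg method="multi", df.to_sql works fine. I think the problem may lie in "multi", but I am not sure.
Thanks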

Best answer

With a similar setup, I avoided this error by using a smaller chunk size.
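A possible reason a smaller chunk size helps: the last frame of the traceback, `h_pack(len(oids))`, suggests that pg8000 packs the number of statement parameters into a signed 16-bit integer, and `method="multi"` appears to bind one parameter per cell. Under that reading, `chunksize=10000` with 8 columns would produce 80,000 parameters in a single statement. The sketch below works from that assumption only; `MAX_PARAMS`, `safe_chunksize`, and `fits_in_int16` are hypothetical names for illustration, not part of pandas or pg8000.

```python
import struct

# Assumption: the driver packs the parameter count with struct format 'h'
# (signed 16-bit), as the `h_pack(len(oids))` frame in the traceback suggests,
# so one statement can carry at most 32767 bind parameters.
MAX_PARAMS = 32767


def safe_chunksize(n_columns: int, max_params: int = MAX_PARAMS) -> int:
    """Largest number of rows per multi-row INSERT that stays under the limit."""
    if n_columns < 1:
        raise ValueError("n_columns must be at least 1")
    return max(1, max_params // n_columns)


def fits_in_int16(n_params: int) -> bool:
    """Return True if n_params can be packed with struct format 'h'."""
    try:
        struct.pack("!h", n_params)
        return True
    except struct.error:
        return False


# The question's settings: chunksize=10000 rows x 8 columns = 80000 parameters.
print(fits_in_int16(10000 * 8))              # False
print(safe_chunksize(8))                     # 4095
print(fits_in_int16(safe_chunksize(8) * 8))  # True
```

If the assumption holds, passing something like `chunksize=safe_chunksize(len(df.columns))` to `df.to_sql(..., method="multi")` should keep each statement under the limit. Note that splitting the DataFrame by hand does not help on its own while `chunksize=10000` is still passed for each piece.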

Regarding python - pd.DataFrame.to_sql(method="multi") on GCP Postgres raises struct.error: 'h' format requires -32768 <= number <= 32767, with user-defined dtypes, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/71752718/
