python - An alternative to fast_executemany for psycopg2

Reposted by: 行者123 · Updated: 2023-12-01 12:09:12

I have a Redshift server that I connect to via psycopg2 (note that ODBC is not supported on the company server, so I can't use pyodbc).

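Currently it takes over 10 minutes to write 30-35k rows from a DataFrame into the Redshift DB via df.to_sql(). As a workaround, I download the DF as a CSV, push the file to S3, and then use COPY to load it into the DB.
The fast_executemany solution from "Speeding up pandas.DataFrame.to_sql with fast_executemany of pyODBC" would have been perfect, but psycopg2 does not support it.
I also found d6tstack (see https://github.com/d6t/d6tstack/blob/master/examples-sql.ipynb), but its pd_to_psql works only with PostgreSQL, not with Redshift (Redshift can't COPY ... FROM STDIN).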
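
For readers who want to reproduce the CSV-to-S3-to-COPY workaround described above, it could look roughly like the sketch below. This is not the asker's actual code: the helper names build_copy_sql and copy_via_s3, the bucket, the key, and the IAM role ARN are all hypothetical, and it assumes boto3 is installed, AWS credentials are configured, and the cluster has an IAM role that can read the bucket.

```python
import io


def build_copy_sql(table, bucket, key, iam_role):
    """Return a Redshift COPY statement that loads a CSV file with a header row from S3."""
    # Identifiers are interpolated directly, so pass trusted values only.
    return (
        f"COPY {table} FROM 's3://{bucket}/{key}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS CSV IGNOREHEADER 1"
    )


def copy_via_s3(df, engine, table, bucket, key, iam_role):
    """Upload df to S3 as CSV, then load it into Redshift with COPY (hypothetical helper)."""
    import boto3  # assumption: boto3 is installed; imported lazily so the module loads without it

    # Serialize the DataFrame to CSV in memory instead of writing a local file.
    buf = io.StringIO()
    df.to_csv(buf, index=False)
    boto3.client("s3").put_object(
        Bucket=bucket, Key=key, Body=buf.getvalue().encode("utf-8")
    )

    # Run COPY on a raw DBAPI connection obtained from the SQLAlchemy engine.
    conn = engine.raw_connection()
    try:
        with conn.cursor() as cur:
            cur.execute(build_copy_sql(table, bucket, key, iam_role))
        conn.commit()
    finally:
        conn.close()
```

Serializing to an in-memory buffer avoids a temporary file on disk; for very large frames, writing to a file and uploading that may be the safer choice.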

Are there any alternatives I can use for my case?

This is my code:

import sqlalchemy as sa

# Connection settings (left blank here).
DATABASE = ""
USER = ""
PASSWORD = ""
HOST = "...us-east-1.redshift.amazonaws.com"
PORT = "5439"
SCHEMA = "public"

# Build the connection URL for the redshift+psycopg2 dialect.
server = "redshift+psycopg2://%s:%s@%s:%s/%s" % (USER, PASSWORD, HOST, PORT, DATABASE)
engine = sa.create_engine(server)
conn = engine.raw_connection()

# Empty the target table before reloading it.
with conn.cursor() as cur:
    cur.execute('truncate table_name')

# df is the pandas DataFrame to upload (defined elsewhere).
df.to_sql('table_name', engine, index=False, if_exists='append')

Best answer

If you're unable to use COPY from S3 and must rely on DML, you could try passing use_batch_mode=True to create_engine():

engine = create_engine('theurl', use_batch_mode=True)
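
For context, in the SQLAlchemy versions that accept this flag, batch mode makes the psycopg2 dialect send executemany() calls through psycopg2.extras.execute_batch, which groups many statements into each server round trip. Later SQLAlchemy releases replaced the flag with the executemany_mode parameter, so check the documentation for the version in use. If you'd rather call psycopg2 directly, a minimal sketch might look like this; the helper names build_insert_sql and insert_batched are hypothetical, and conn is assumed to be a psycopg2 connection such as the one returned by engine.raw_connection():

```python
def build_insert_sql(table, columns):
    """Return a single-row parameterized INSERT using psycopg2's %s placeholders."""
    # Table and column names are interpolated directly, so pass trusted identifiers only.
    cols = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO {table} ({cols}) VALUES ({placeholders})"


def insert_batched(conn, table, columns, rows, page_size=1000):
    """Insert rows (a list of tuples) with execute_batch, page_size statements per round trip."""
    # Imported lazily so the SQL helper above works without psycopg2 installed.
    from psycopg2.extras import execute_batch

    with conn.cursor() as cur:
        execute_batch(cur, build_insert_sql(table, columns), rows, page_size=page_size)
    conn.commit()
```

With a DataFrame, the rows could be produced with list(df.itertuples(index=False, name=None)) and the columns with list(df.columns).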

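A simple test inserting 500 rows from this machine into a Redshift cluster shows a reasonable improvement with batch mode enabled (engine is the default engine; bm_engine was created with use_batch_mode=True):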

In [31]: df = pd.DataFrame({'batchno': range(500)})

In [32]: %time df.to_sql('batch', engine, index=False, if_exists='append')
CPU times: user 87.8 ms, sys: 57.6 ms, total: 145 ms
Wall time: 1min 6s

In [33]: %time df.to_sql('batch', bm_engine, index=False, if_exists='append')
CPU times: user 10.3 ms, sys: 4.66 ms, total: 15 ms
Wall time: 9.96 s

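Note that Pandas 0.23.0, and 0.24.0 and later when method='multi' is passed to to_sql(), will not benefit from batch mode, because they use a multi-values insert instead of executemany if the underlying DBMS supports it. Using a multi-values insert should provide somewhat similar improvements in throughput, since fewer queries are issued.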
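
To illustrate why a multi-values insert issues fewer queries, here is a small self-contained sketch. It uses the standard library's sqlite3 module only so that it runs without a cluster, which means the placeholder is ? rather than psycopg2's %s. The helper name insert_multi_values and the chunk size are assumptions for illustration.

```python
import sqlite3


def insert_multi_values(conn, table, columns, rows, chunk_size=100):
    """Insert rows with one multi-row INSERT per chunk; return the number of statements issued."""
    # Table and column names are interpolated directly, so pass trusted identifiers only.
    cols = ", ".join(columns)
    row_placeholder = "(" + ", ".join("?" for _ in columns) + ")"
    statements = 0
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        # One statement carries every row of the chunk: VALUES (?), (?), ...
        sql = f"INSERT INTO {table} ({cols}) VALUES " + ", ".join([row_placeholder] * len(chunk))
        # Flatten the chunk's tuples into a single parameter list.
        params = [value for row in chunk for value in row]
        conn.execute(sql, params)
        statements += 1
    conn.commit()
    return statements


# Same shape as the 500-row test above, against an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batch (batchno INTEGER)")
statements = insert_multi_values(conn, "batch", ["batchno"], [(i,) for i in range(500)])
row_count = conn.execute("SELECT COUNT(*) FROM batch").fetchone()[0]
```

With Pandas 0.24.0 or later, the same idea is available without custom code by calling df.to_sql('table_name', engine, index=False, if_exists='append', method='multi', chunksize=1000); the chunk size of 1000 is only an example and is worth tuning.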

Regarding "python - An alternative to fast_executemany for psycopg2", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53656225/
