Update 20M records in a table and set col1 = col2 + col3 for each record using Spring Boot

Reposted by: bug小助手. Updated: 2023-10-28 10:20:10

I need to update 20M records in a table and set col1 = col2 + col3 for each record using Spring Boot.

try (Connection connection = dataSource.getConnection();
     PreparedStatement preparedStatement = connection.prepareStatement(
             "UPDATE table SET col1 = ? WHERE id = ?")) {

    for (TableEntity entity : tableEntities) {
        // calculate(...) and offersLeadScore are not defined in the question
        preparedStatement.setInt(1, calculate(offersLeadScore)); // Set new col1
        preparedStatement.setObject(2, entity.getId());
        preparedStatement.addBatch();
    }
    // All queued rows are sent as a single batch
    preparedStatement.executeBatch();
}

The above approach takes a very long time to execute and update all records in the table.

How can I achieve faster execution using Spring Boot?

I tried JPA's saveAll() and the approach above, but could not optimise either.

I need an industry-standard approach to update all records in one table faster.

More replies

It seems to me that this question doesn't satisfy the criteria of stackoverflow.com/help/minimal-reproducible-example. You mention col2+col3 in the title, but it never plays any role in the code. The code mentions objects and functions that are not introduced above. Please remove the unimportant details from your example code so that it's possible to suggest something.

Also, at first sight, I don't think this problem has anything to do with Spring. Spring only gives you a convenient way to get rid of boilerplate code and configuration. Optimising a batch update of 20 million records is not what Spring was invented for.
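
To illustrate the point that the heavy lifting belongs in the database rather than in Spring, here is a minimal sketch using plain JDBC. It assumes a numeric id primary key whose minimum and maximum are known, and it lets the database compute col2 + col3 for one id range per statement, committing after each range. RangeUpdater, ranges, updateAll, and the chunk size are hypothetical names and parameters, not something from the question; your_table_name is a placeholder.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import javax.sql.DataSource;

/** Sketch: let the database compute col2 + col3, one id range at a time. */
class RangeUpdater {

    // Placeholder table name; adjust table and columns to the real schema.
    static final String SQL =
            "UPDATE your_table_name SET col1 = col2 + col3 WHERE id BETWEEN ? AND ?";

    /** Splits [minId, maxId] into inclusive ranges of at most chunkSize ids. */
    static List<long[]> ranges(long minId, long maxId, long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<long[]> result = new ArrayList<>();
        for (long start = minId; start <= maxId; start += chunkSize) {
            long end = Math.min(start + chunkSize - 1, maxId);
            result.add(new long[] {start, end});
        }
        return result;
    }

    /** Runs one UPDATE per range, commits after each, and returns the rows updated. */
    static long updateAll(DataSource dataSource, long minId, long maxId, long chunkSize)
            throws SQLException {
        long updated = 0;
        try (Connection connection = dataSource.getConnection();
             PreparedStatement statement = connection.prepareStatement(SQL)) {
            connection.setAutoCommit(false);
            for (long[] range : ranges(minId, maxId, chunkSize)) {
                statement.setLong(1, range[0]);
                statement.setLong(2, range[1]);
                updated += statement.executeUpdate();
                connection.commit(); // keep each transaction small
            }
        }
        return updated;
    }
}
```

For example, RangeUpdater.updateAll(dataSource, 1, 20_000_000, 50_000) would issue 400 statements covering at most 50,000 ids each. The right chunk size depends on the database and its load, so treat that number as a placeholder.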

Recommended answers

Try this approach. It uses a Spring Boot service called TableService to update the 20 million records in the database table. The updateCol1ForTable method does the main work: it builds a SQL UPDATE that, for each record identified by its id, sets col1 to the sum of col2 and col3, and runs it as a JDBC batch.
Here is the code:

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.BatchPreparedStatementSetter;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;

import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

@Service
public class TableService {

    private final JdbcTemplate jdbcTemplate;

    @Autowired
    public TableService(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    public void updateCol1ForTable(List<TableEntity> tableEntities) {
        // The database computes col2 + col3; only the id is sent per row.
        String updateQuery = "UPDATE your_table_name SET col1 = col2 + col3 WHERE id = ?";

        jdbcTemplate.batchUpdate(updateQuery, new BatchPreparedStatementSetter() {
            @Override
            public void setValues(PreparedStatement preparedStatement, int i) throws SQLException {
                // Bind the id of the i-th entity to the WHERE clause
                preparedStatement.setLong(1, tableEntities.get(i).getId());
            }

            @Override
            public int getBatchSize() {
                // Every entity in the list goes into one batch
                return tableEntities.size();
            }
        });
    }
}

To efficiently update the records in batches, call this function from your service layer or controller.
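
One thing to watch in the code above: getBatchSize() returns the size of the whole list, so all rows go into one batch. Below is a minimal sketch of splitting the list into fixed-size batches first, using only the standard library. BatchPartitioner and the batch size are hypothetical and not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: split a large list into consecutive batches of bounded size. */
class BatchPartitioner {

    /** Returns consecutive sublist views of at most batchSize elements each. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            batches.add(items.subList(from, to)); // a view, not a copy
        }
        return batches;
    }
}
```

Each sublist could then be passed to updateCol1ForTable in turn, for example by looping over BatchPartitioner.partition(tableEntities, 1000). Spring's JdbcTemplate also has a batchUpdate overload that takes a batch size and a ParameterizedPreparedStatementSetter, which may make the helper unnecessary.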

Hope it works :)

Why? At best you are introducing a maintenance headache. Will every developer and user remember to update col1 appropriately whenever col2 and/or col3 is updated, and every time a row is inserted? A much better solution is to perform the calculation in the SELECT; that is a one-time change to the query. If for some reason you think you need to store the derived result, define col1 as a generated column. First drop the column, then re-add it:

alter table <your_table_name> drop col1; 
alter table <your_table_name> 
      add col1 bigint
          generated always as (col2::bigint + col3::bigint) stored;

The initial run/setup will not be fast. But the trade-off is that you have no maintenance: the value of col1 is calculated automatically when a row is added and whenever col2 and/or col3 is updated. No additional maintenance is required initially or later.
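
If the migration has to run from the Java side, the two statements could be wrapped as in the following sketch. It assumes PostgreSQL (the :: casts above are PostgreSQL syntax, and stored generated columns need version 12 or later) and plain JDBC. GeneratedColumnMigration and its methods are hypothetical names; the table name is checked against a simple identifier pattern because it is concatenated into the DDL.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;
import java.util.regex.Pattern;

/** Sketch: replace col1 with a stored generated column in one transaction. */
class GeneratedColumnMigration {

    // Only plain unquoted identifiers are accepted as table names.
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** Builds the two DDL statements from the answer for the given table. */
    static List<String> statements(String table) {
        if (!IDENTIFIER.matcher(table).matches()) {
            throw new IllegalArgumentException("unexpected table name: " + table);
        }
        return List.of(
                "ALTER TABLE " + table + " DROP COLUMN col1",
                "ALTER TABLE " + table + " ADD COLUMN col1 bigint"
                        + " GENERATED ALWAYS AS (col2::bigint + col3::bigint) STORED");
    }

    /** Executes both statements and commits, rolling back if either one fails. */
    static void apply(Connection connection, String table) throws SQLException {
        boolean previousAutoCommit = connection.getAutoCommit();
        connection.setAutoCommit(false);
        try (Statement statement = connection.createStatement()) {
            for (String sql : statements(table)) {
                statement.execute(sql);
            }
            connection.commit();
        } catch (SQLException e) {
            connection.rollback();
            throw e;
        } finally {
            connection.setAutoCommit(previousAutoCommit);
        }
    }
}
```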

More replies

What if we don't want to update col1 under some condition? For example, if col1 is greater than 100, don't update it, but update the remaining rows. (Just an example.)

Then a generated column will not work. It seems you have 4 viable choices. 1: A trigger that performs the calculation when needed. 2: Create a view that derives the appropriate value. 3: Just derive it during SELECT (a first step towards the view). 4: Leave it to the app (viable only if you can guarantee the app is the only access path; I never figured out how to do that). Without knowing all the details, I would lean towards #2.
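
For the example condition from the comment above (leave col1 alone when it is greater than 100), the derivation used by options 2 to 4 could look like the sketch below. Col1Rule, derive, and the SQL_EXPRESSION constant are hypothetical names; the CASE expression is one possible way to write the same rule in a view or SELECT.

```java
/** Sketch: the conditional rule for col1, once in Java and once as SQL text. */
class Col1Rule {

    // The same rule as a SQL expression, usable in a view or a SELECT list.
    static final String SQL_EXPRESSION =
            "CASE WHEN col1 > 100 THEN col1 ELSE col2 + col3 END";

    /** Keeps col1 when it is greater than 100, otherwise returns col2 + col3. */
    static long derive(long col1, long col2, long col3) {
        return col1 > 100 ? col1 : col2 + col3;
    }
}
```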
