Update 20M records in a table, setting col1 = col2 + col3 for each record, using Spring Boot.
try (Connection connection = dataSource.getConnection()) {
    String updateQuery = "UPDATE table SET col1 = ? WHERE id = ?";
    PreparedStatement preparedStatement = connection.prepareStatement(updateQuery);
    for (TableEntity entity : tableEntities) {
        preparedStatement.setInt(1, calculate(offersLeadScore)); // Set new col1
        preparedStatement.setObject(2, entity.getId());
        preparedStatement.addBatch(); // queue this row's update
    }
    preparedStatement.executeBatch(); // send all queued updates at once
}
The above approach takes a very long time to update all the records in the table.
How can I achieve faster execution using Spring Boot?
I tried JPA's saveAll() and the approach above, and could not optimise either.
I need an industry-standard approach to update all records in one table faster.
More replies
It seems to me that this question doesn't meet the criteria at stackoverflow.com/help/minimal-reproducible-example. You mention col2 + col3 in the title, but it never plays any role in the code. The code mentions objects and functions that are not introduced above. Please remove unimportant details from your example code so that it's possible to suggest something.
Also, at first sight, I don't think this problem has anything to do with Spring. Spring only gives you a convenient way to get rid of boilerplate code and configuration. Optimising a batch update of 20 million records is not what Spring was invented for.
Recommended answers
Try this approach. It uses a Spring Boot service called TableService to update a large dataset of 20 million records in a database table. The updateCol1ForTable method contains the main operation: it builds a SQL UPDATE statement that, using each record's id as the key, sets col1 to the sum of col2 and col3 for that record.
Here is the code:
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.BatchPreparedStatementSetter;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;

import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

@Service
public class TableService {

    private final JdbcTemplate jdbcTemplate;

    @Autowired
    public TableService(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    public void updateCol1ForTable(List<TableEntity> tableEntities) {
        String updateQuery = "UPDATE your_table_name SET col1 = col2 + col3 WHERE id = ?";
        jdbcTemplate.batchUpdate(updateQuery, new BatchPreparedStatementSetter() {
            @Override
            public void setValues(PreparedStatement preparedStatement, int i) throws SQLException {
                preparedStatement.setLong(1, tableEntities.get(i).getId());
            }

            @Override
            public int getBatchSize() {
                return tableEntities.size();
            }
        });
    }
}
To update the records efficiently in batches, call this method from your service layer or controller.
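Because the new value depends only on other columns in the same row, the update does not strictly need the entities loaded into Java at all. The following is a minimal sketch of one alternative, not the answerer's method: run the same SET col1 = col2 + col3 statement over consecutive id ranges. It assumes a numeric, reasonably dense id column. RangeChunkedUpdate, IdRange, RangeExecutor, and the chunk size are hypothetical names and values chosen for illustration, and the executor stands in for the real database call.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits an id span into ranges so each UPDATE touches a bounded number of rows. */
final class RangeChunkedUpdate {

    /** Inclusive id range [from, to]. */
    static final class IdRange {
        final long from;
        final long to;

        IdRange(long from, long to) {
            this.from = from;
            this.to = to;
        }
    }

    /** Stand-in for the database call; returns the number of rows affected. */
    interface RangeExecutor {
        int execute(String sql, long from, long to);
    }

    // Table and column names follow the answer above; adjust them to the real schema.
    static final String SQL =
            "UPDATE your_table_name SET col1 = col2 + col3 WHERE id BETWEEN ? AND ?";

    /** Splits [minId, maxId] into consecutive ranges of at most chunkSize ids. */
    static List<IdRange> split(long minId, long maxId, long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<IdRange> ranges = new ArrayList<>();
        for (long from = minId; from <= maxId; from += chunkSize) {
            long to = Math.min(from + chunkSize - 1, maxId);
            ranges.add(new IdRange(from, to));
        }
        return ranges;
    }

    /** Runs one UPDATE per range and returns the total number of rows reported as affected. */
    static long updateAll(long minId, long maxId, long chunkSize, RangeExecutor executor) {
        long total = 0;
        for (IdRange range : split(minId, maxId, chunkSize)) {
            total += executor.execute(SQL, range.from, range.to);
        }
        return total;
    }

    public static void main(String[] args) {
        // Fake executor for demonstration: pretends every id in the range exists.
        RangeExecutor fake = (sql, from, to) -> (int) (to - from + 1);
        System.out.println("rows updated: " + updateAll(1, 10, 4, fake));
    }
}
```

With Spring, the executor could be (sql, from, to) -> jdbcTemplate.update(sql, from, to), and minId and maxId could come from a SELECT MIN(id), MAX(id) query. Whether this beats a single unbounded UPDATE depends on the database and its locking and logging behaviour, so measure before adopting it.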
Hope it works :)
Why? At best you are introducing a maintenance headache. Will every developer and user remember to update col1 appropriately whenever col2 and/or col3 is updated, and every time a row is inserted? A much better solution is to perform the calculation in the SELECT; that is a one-time change to the SELECT. If for some reason you think you need to store the derived result, then define col1 as a generated column. First drop the column, then re-add it (demo here):
alter table <your_table_name> drop col1;
alter table <your_table_name>
    add col1 bigint
    generated always as (col2::bigint + col3::bigint) stored;
The initial run/setup will not be fast. But the trade-off is that you have no maintenance: the value of col1 is calculated automatically when a row is added and whenever col2 and/or col3 is updated. No additional maintenance is required, initially or later.
More replies
What if we don't want to update col1 under some condition? For example, if col1 is greater than 100, don't update it, but update the remaining rows. (Just an example.)
Then using a generated column will not work. It seems you have four choices. 1: A trigger which performs the calculation when needed. 2: Create a view which derives the appropriate value. 3: Just derive it during SELECT (a first step toward the view). 4: Leave it to the app (viable only if you can guarantee the app is the only access path; I never figured out how to do that). Without knowing all the details I would lean toward #2.
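To make the example condition concrete, here is a rough sketch of the rule "keep col1 when it is greater than 100, otherwise use col2 + col3" written as application code, which corresponds to option 4. DerivedCol1, effectiveCol1, and THRESHOLD are hypothetical names, and the sketch assumes none of the three columns is NULL.

```java
/** Hypothetical helper that derives the value a reader should see for col1. */
final class DerivedCol1 {

    // Threshold taken from the example in the comment above ("greater than 100").
    static final long THRESHOLD = 100;

    /**
     * Returns the stored col1 when it is greater than THRESHOLD,
     * otherwise the sum col2 + col3.
     * A view or SELECT could express the same rule as:
     *   CASE WHEN col1 > 100 THEN col1 ELSE col2 + col3 END
     */
    static long effectiveCol1(long col1, long col2, long col3) {
        return col1 > THRESHOLD ? col1 : col2 + col3;
    }

    public static void main(String[] args) {
        System.out.println(effectiveCol1(150, 1, 2)); // stored value is kept
        System.out.println(effectiveCol1(40, 1, 2));  // value is derived from col2 + col3
    }
}
```

Putting the CASE expression in a view (option 2) keeps the rule in one place for every client, which is the main reason to prefer it over repeating this logic in each application.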