Batch INSERT into 2 related tables while avoiding SQL injection
Question
I'm using Java 8, JDBC and MySQL. I want to insert a large amount of data (2,000 rows) into 2 tables. The tables have a 1-to-1 relationship. The first table is `order_items`:
| id | amount |
|:--------|----------------:|
| 1 | 20 |
| 2 | 25 |
| 3 | 30 |
The second table is `delivery_details`:
| orderItemId | message |
|----------------:|:-----------|
| 1 | hello. |
| 2 | salut. |
| 3 | ciao. |
`orderItemId` is a foreign key to `order_items`.
The data is represented in this class:
```java
import java.math.BigDecimal;
// Assumption: @SerializedName is Gson's annotation.
import com.google.gson.annotations.SerializedName;

public class OrderItemDelivery {
    @SerializedName("amount")
    private BigDecimal amount = null;
    @SerializedName("message")
    private String message = null;
    // getters and setters below
    // ...
    // ...
}
```
I need to execute the inserts as a batch to cut execution time. `List<OrderItemDelivery> orderItemDeliveries` contains 2,000 items. My current code is:
```java
Connection connection = this.hikariDataSource.getConnection();
connection.setAutoCommit(false);
Statement statement = connection.createStatement();
for (int x = 0; x < orderItemDeliveries.size(); x++) {
    OrderItemDelivery orderItemDelivery = orderItemDeliveries.get(x);
    // Values are formatted straight into the SQL text, which is what makes this injectable.
    String sql = String.format("INSERT INTO order_items (amount) VALUES ('%s')", orderItemDelivery.getAmount());
    statement.addBatch(sql);
    sql = String.format("INSERT INTO `delivery_details` (`orderItemId`, `message`) VALUES (LAST_INSERT_ID(), '%s')", orderItemDelivery.getMessage());
    statement.addBatch(sql);
}
statement.executeBatch();
statement.close();
// Re-enabling auto-commit commits the open transaction.
connection.setAutoCommit(true);
connection.close();
```
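This is really efficient, but it is open to SQL injection. If I were to use `PreparedStatement`, I would need one for the `order_items` batch and one for the `delivery_details` batch, and then `LAST_INSERT_ID()` would not work.
Is there any way around this? From what I've seen, there isn't, so I would need to prevent SQL injection by sanitising `message` and `amount` in Java, which appears to have limitations. For example, `message` can contain apostrophes and emojis. Can anyone think of another solution?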
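To make the risk concrete, here is a minimal sketch of what the string-formatted statement turns into for two sample messages. `InjectionDemo` and `formattedSql` are hypothetical names introduced for illustration; the SQL template is the one from the batch code in the question. Whether a smuggled second statement would actually run depends on driver settings, but in both cases the value escapes the string literal.

```java
final class InjectionDemo {
    // Mirrors the string-formatted INSERT from the question (unsafe on purpose).
    static String formattedSql(String message) {
        return String.format(
            "INSERT INTO `delivery_details` (`orderItemId`, `message`) VALUES (LAST_INSERT_ID(), '%s')",
            message);
    }

    public static void main(String[] args) {
        // A harmless apostrophe already unbalances the quotes.
        System.out.println(formattedSql("it's here"));
        // A hostile value can close the literal and append its own SQL.
        System.out.println(formattedSql("x'); DROP TABLE order_items; -- "));
    }
}
```

With a `PreparedStatement` the SQL text stays constant and the message travels as a bound parameter, so apostrophes and emojis need no manual escaping.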
**EDIT**
Here's a really efficient solution I've come up with:
```java
OrderItemDelivery orderItemDelivery;
// One multi-row INSERT with a (?) group per order item.
String orderItemSql = "INSERT INTO order_items (amount) VALUES (?) ";
for (int x = 1; x < orderItemDeliveries.size(); x++) {
    orderItemSql += ", (?)";
}
PreparedStatement preparedStatement = connection.prepareStatement(orderItemSql, Statement.RETURN_GENERATED_KEYS);
int i = 1;
for (int x = 0; x < orderItemDeliveries.size(); x++) {
    orderItemDelivery = orderItemDeliveries.get(x);
    preparedStatement.setDouble(i++, orderItemDelivery.getAmount().doubleValue());
}
preparedStatement.executeUpdate();
// Collect the generated ids in the order the driver returns them.
Long ids[] = new Long[orderItemDeliveries.size()];
ResultSet rs = preparedStatement.getGeneratedKeys();
int x = 0;
while (rs.next()) {
    ids[x] = rs.getLong(1);
    x++;
}
// Second multi-row INSERT: each row needs two placeholders (id and message).
String deliveryDetails = "INSERT INTO `delivery_details` (`orderItemId`, `message`) VALUES (?, ?)";
for (x = 1; x < orderItemDeliveries.size(); x++) {
    deliveryDetails += ", (?, ?)";
}
preparedStatement = connection.prepareStatement(deliveryDetails);
i = 1;
for (x = 0; x < orderItemDeliveries.size(); x++) {
    orderItemDelivery = orderItemDeliveries.get(x);
    preparedStatement.setLong(i++, ids[x]);
    preparedStatement.setString(i++, orderItemDelivery.getMessage());
}
preparedStatement.executeUpdate();
```
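So for this to work, the order of the `ids` must be sequential, and the order of `orderItemDeliveries` must not change between the first pass over the list and the second.
This feels a bit hacky, but it works. Am I missing something?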
# Answer 1
**Score**: 1
Here's what I ended up doing, using `getGeneratedKeys()`:
```java
OrderItemDelivery orderItemDelivery;
// One multi-row INSERT with a (?) group per order item.
String orderItemSql = "INSERT INTO order_items (amount) VALUES (?) ";
for (int x = 1; x < orderItemDeliveries.size(); x++) {
    orderItemSql += ", (?)";
}
PreparedStatement preparedStatement = connection.prepareStatement(orderItemSql, Statement.RETURN_GENERATED_KEYS);
int i = 1;
for (int x = 0; x < orderItemDeliveries.size(); x++) {
    orderItemDelivery = orderItemDeliveries.get(x);
    preparedStatement.setDouble(i++, orderItemDelivery.getAmount().doubleValue());
}
preparedStatement.executeUpdate();
// Collect the generated ids in the order the driver returns them.
Long ids[] = new Long[orderItemDeliveries.size()];
ResultSet rs = preparedStatement.getGeneratedKeys();
int x = 0;
while (rs.next()) {
    ids[x] = rs.getLong(1);
    x++;
}
// Second multi-row INSERT: each row needs two placeholders (id and message).
String deliveryDetails = "INSERT INTO `delivery_details` (`orderItemId`, `message`) VALUES (?, ?)";
for (x = 1; x < orderItemDeliveries.size(); x++) {
    deliveryDetails += ", (?, ?)";
}
preparedStatement = connection.prepareStatement(deliveryDetails);
i = 1;
for (x = 0; x < orderItemDeliveries.size(); x++) {
    orderItemDelivery = orderItemDeliveries.get(x);
    preparedStatement.setLong(i++, ids[x]);
    preparedStatement.setString(i++, orderItemDelivery.getMessage());
}
preparedStatement.executeUpdate();
```
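So for this to work, the order of the `ids` must be sequential, and the order of `orderItemDeliveries` must not change between the first pass over the list and the second.
This feels a bit hacky, but it works.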
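The ordering assumption can at least be guarded before the second insert. A minimal sketch, assuming the generated keys come back in insertion order (which this approach already relies on); `GeneratedKeyPairing`, `DeliveryRow` and `pair` are hypothetical names, not part of JDBC. It only verifies that the number of keys matches the number of items and pairs them by position.

```java
import java.util.ArrayList;
import java.util.List;

final class GeneratedKeyPairing {
    /** One (orderItemId, message) row destined for delivery_details. */
    static final class DeliveryRow {
        final long orderItemId;
        final String message;

        DeliveryRow(long orderItemId, String message) {
            this.orderItemId = orderItemId;
            this.message = message;
        }
    }

    /**
     * Pairs the i-th generated key with the i-th message.
     * Fails fast if the driver returned a different number of keys than rows sent.
     */
    static List<DeliveryRow> pair(List<Long> generatedIds, List<String> messages) {
        if (generatedIds.size() != messages.size()) {
            throw new IllegalStateException("Expected " + messages.size()
                + " generated keys but got " + generatedIds.size());
        }
        List<DeliveryRow> rows = new ArrayList<>(messages.size());
        for (int i = 0; i < messages.size(); i++) {
            rows.add(new DeliveryRow(generatedIds.get(i), messages.get(i)));
        }
        return rows;
    }
}
```

Running both inserts inside one transaction means a failed check can be rolled back instead of leaving `order_items` rows without their `delivery_details`.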
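# Answer 2
**Score**: 0
*Is it even possible with PreparedStatement?*
Good point, but since it is a 1:1 relationship you could use a separate sequence or AUTO_INCREMENT key for each table instead of `last_insert_id()`, given that they generate the same values for correlated records. In an OLTP setting with concurrent transactions I wouldn't do that, but since you're batching anyway it may be reasonable. You could force exclusive access by [locking][1] both tables in advance, if that's an option.
Letting the application track the key values, instead of relying on an auto-increment field, is also an option. Unfortunately MySQL, unlike Oracle, doesn't allow selecting the next value directly from a sequence. One way to do it: use a MAXKEY table with a field MAX. Say you want to insert 10 rows and MAX is at 200. Lock MAXKEY exclusively, select MAX (now you know your keys can start at 200 + 1), update MAXKEY to 200 + 10, and commit (releasing the lock). Then use 201...210 for the 2 sets of batched inserts with prepared queries.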
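The MAXKEY bookkeeping could look roughly like this. It is a sketch under assumptions: a single-row InnoDB table `MAXKEY` with a column `MAX` (names taken from the description), and `SELECT ... FOR UPDATE` swapped in as the exclusive lock in place of an explicit table lock. `KeyRangeReservation` and its methods are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class KeyRangeReservation {
    /** First usable key when the stored maximum is currentMax. */
    static long firstKey(long currentMax) {
        return currentMax + 1;
    }

    /** Value to write back to MAXKEY after reserving count keys. */
    static long newMax(long currentMax, int count) {
        return currentMax + count;
    }

    /** Reserves count consecutive keys and returns the first one. */
    static long reserve(Connection connection, int count) throws SQLException {
        boolean oldAutoCommit = connection.getAutoCommit();
        connection.setAutoCommit(false);
        try {
            long currentMax;
            // The row lock is held until commit, so concurrent callers wait here.
            try (Statement st = connection.createStatement();
                 ResultSet rs = st.executeQuery("SELECT `MAX` FROM MAXKEY FOR UPDATE")) {
                if (!rs.next()) {
                    throw new SQLException("MAXKEY is empty");
                }
                currentMax = rs.getLong(1);
            }
            try (PreparedStatement ps = connection.prepareStatement("UPDATE MAXKEY SET `MAX` = ?")) {
                ps.setLong(1, newMax(currentMax, count));
                ps.executeUpdate();
            }
            connection.commit(); // releases the lock
            return firstKey(currentMax);
        } catch (SQLException e) {
            connection.rollback();
            throw e;
        } finally {
            connection.setAutoCommit(oldAutoCommit);
        }
    }
}
```

With MAX at 200 and `count` 10, `reserve` returns 201 and leaves MAX at 210, matching the 201...210 range in the example.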
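You could use a stored procedure that accepts the values for both tables and inserts into each of them separately (see [this][2]), again using `last_insert_id()`, and call the procedure in batched fashion (see [this][3]).
As a last resort there are SQL sanitizers; something along the lines of org.apache.commons.lang.StringEscapeUtils.escapeSql() might do.
But prepared statements also add other optimizations. The SQL is sent to the server only once, together with a 2-dimensional array of values. The parsed query can be cached and reused for subsequent calls. You should be able to see some further performance improvement just from that.
The string concatenation version sends the whole query for each row; all of the queries are different, so each one has to be parsed and none can be found in the cache.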
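A minimal sketch of the batched `PreparedStatement` variant, assuming the rows are sent in chunks rather than as one statement. `BatchChunks`, `chunk` and `insertAmounts` are hypothetical names and the chunk size is arbitrary; the only JDBC calls used are the standard `addBatch()` and `executeBatch()`.

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class BatchChunks {
    /** Splits items into consecutive sublists of at most chunkSize elements. */
    static <T> List<List<T>> chunk(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < items.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, items.size());
            chunks.add(new ArrayList<>(items.subList(from, to)));
        }
        return chunks;
    }

    /** One constant SQL text; only the bound values change per row. */
    static void insertAmounts(Connection connection, List<BigDecimal> amounts, int chunkSize)
            throws SQLException {
        String sql = "INSERT INTO order_items (amount) VALUES (?)";
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            for (List<BigDecimal> part : chunk(amounts, chunkSize)) {
                for (BigDecimal amount : part) {
                    ps.setBigDecimal(1, amount);
                    ps.addBatch();
                }
                ps.executeBatch(); // sends this chunk's rows as one batch
            }
        }
    }
}
```

With MySQL Connector/J, the `rewriteBatchedStatements=true` connection property lets the driver rewrite such a batch into multi-row inserts.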
[1]: https://dev.mysql.com/doc/refman/8.0/en/lock-tables.html
[2]: https://www.tutorialspoint.com/how-to-insert-into-two-tables-using-a-single-mysql-query
[3]: https://www.benchresources.net/jdbc-calling-stored-procedure-using-callablestatement-with-batch-execution/
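# Answer 3
**Score**: 0
I suggest you try this. Even though it is not a batch approach, it is based on `PreparedStatement`, which should generally perform better than inlined SQL: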
```java
private void insertItems(Connection connection, Collection<OrderItemDelivery> orderItemDeliveries)
        throws SQLException
{
    try (PreparedStatement pst1 = connection.prepareStatement("INSERT INTO order_items (amount) VALUES (?)", new String[] { "id" });
         PreparedStatement pst2 = connection.prepareStatement("INSERT INTO delivery_details(orderItemId, message) VALUES (?, ?)"))
    {
        for (OrderItemDelivery orderItemDelivery : orderItemDeliveries)
        {
            // getAmount() returns a BigDecimal, so bind it as one.
            pst1.setBigDecimal(1, orderItemDelivery.getAmount());
            int x = pst1.executeUpdate();
            if (x != 1)
            {
                throw new SQLException("Row was not inserted");
            }
            try (ResultSet rs = pst1.getGeneratedKeys())
            {
                if (rs.next())
                {
                    long id = rs.getLong(1);
                    // Use the generated id as the foreign key of the matching delivery_details row.
                    pst2.setLong(1, id);
                    pst2.setString(2, orderItemDelivery.getMessage());
                    pst2.executeUpdate();
                }
                else
                {
                    throw new SQLException("Id was not generated");
                }
            }
        }
    }
}
```
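Note: You must try it first; not all database vendors implement the `getGeneratedKeys` method. If yours does not, just replace the generated-keys piece with a call to `LAST_INSERT_ID()`: it should work the same.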