Apache Camel SQL component: selecting all records in batch mode

Question

I'm using Apache Camel as an ETL tool to copy data from PostgreSQL (select * ...) to MariaDB (insert ...).

PostgreSQL holds a lot of records (more than 1M), and I want to process them in batches.

I've tried several flags (batchCount, batchSize), but none of them worked.

I've also searched the Apache Camel docs, without any success.

from("sql:SELECT * FROM my_schema.trees?dataSource=#postgersqlDataSource&batch=true")
    .convertBodyTo(String.class)
    .process(ex -> {
        log.info("batch insert for single table");
        List<Map<String, Object>> rows = ex.getIn().getBody(List.class);
        log.info(String.format("Value: %s", rows.size()));
    })
    .to("stream:out");

But the program crashed because it loads everything into memory (with 1 record it worked, of course).

Any advice?

It runs on Spring Boot.

Answer 1

Score: 1

The batch option only applies to the producer (e.g. to).
https://camel.apache.org/components/3.20.x/sql-component.html

Instead, take a look at outputType=StreamList, which you can combine with the Split EIP (in streaming mode) to process the rows without loading them all into memory.

This also means you process 1 row at a time:

from sql
  split
    process (1 row here)
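
The outline above could be fleshed out roughly as follows in the Java DSL. This is only a sketch under several assumptions that are not in the original post: the Camel documentation describes StreamList for the SQL producer, so the query is triggered from a one-shot timer and issued with to(...) rather than from(...); the class name TreesEtlRoute, the #mariadbDataSource bean, and the target table with columns id and name are all hypothetical placeholders.

```java
import java.util.Map;

import org.apache.camel.builder.RouteBuilder;
import org.springframework.stereotype.Component;

// Hypothetical route class; Spring Boot with the Camel starter picks up RouteBuilder beans.
@Component
public class TreesEtlRoute extends RouteBuilder {

    @Override
    public void configure() {
        // Assumed trigger: run the copy once after startup.
        from("timer:treesEtl?repeatCount=1")
            // StreamList hands back an iterator over the result set
            // instead of a fully materialized List.
            .to("sql:SELECT * FROM my_schema.trees"
                + "?dataSource=#postgersqlDataSource&outputType=StreamList")
            // The streaming split pulls rows from that iterator one at a time.
            .split(body()).streaming()
                .process(exchange -> {
                    // Each row arrives as a Map of column name to value.
                    Map<?, ?> row = exchange.getMessage().getBody(Map.class);
                    // Transform the row here if needed; this sketch passes it through unchanged.
                    exchange.getMessage().setBody(row);
                })
                // Assumed target table and columns; the named parameters
                // (:#id, :#name) are looked up in the Map body.
                .to("sql:INSERT INTO trees (id, name) VALUES (:#id, :#name)"
                    + "?dataSource=#mariadbDataSource")
            .end();
    }
}
```

Streaming on the Camel side does not by itself guarantee streaming at the driver level: the PostgreSQL JDBC driver may still buffer the whole result set unless a fetch size is set and autocommit is off, so the data source configuration may need adjusting as well.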

Posted on 2023-01-08 22:13:38
Link: https://go.coder-hub.com/75048429.html