What's the effective way to insert more than a million rows into postgresql server from another postgres server using Java?

Question


I have two PostgreSQL servers, and I need to copy table rows from the first server's format and convert them to the other server's format (different column names).

I use a Java application with Spring Boot and a JPA repository, which implements a findAll method that streams results with a fetch size of 1000.

@Query("select c from ExternalFormatEntity c")
@QueryHints(@javax.persistence.QueryHint(name = "org.hibernate.fetchSize",
  value = Constants.DEFAULT_FETCH_SIZE))
Stream<ExternalFormatEntity> findAllEntities();

After reading, I convert the rows and insert them in batches of 1000.

try (Stream<ExternalFormatEntity> allExtEntitiesStream = extFormatService.getAllEntities()) {
  LinkedList<CanonicalFormatEntity> canonicalEntityList = new LinkedList<>();
  allExtEntitiesStream.forEach(extEntity -> {
    if (Objects.nonNull(extEntity)) {
      canonicalEntityList.add(SomeConverter.convert(extEntity));
    }
    if (canonicalEntityList.size() >= DEFAULT_BATCH_SIZE) {
      List<CanonicalFormatEntity> copyList = new LinkedList<>(canonicalEntityList);
      canonicalEntityList.clear();
      Thread thread = new Thread(() -> {
        canonicalEntityRepository.saveAll(copyList);
        canonicalEntityRepository.flush();
        copyList.clear();
      });
      thread.start();
    }
  });
}
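
The saveAll/flush batching above can also be expressed at the plain JDBC level, which is what Hibernate's batch settings ultimately drive. Below is a minimal sketch of that idea, assuming a connection opened with `reWriteBatchedInserts=true` in the JDBC URL so the PostgreSQL driver can rewrite each batch into multi-row INSERTs; the table and column names are placeholders, not from the original post:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

public class JdbcBatchInsert {

    // Build a parameterized INSERT statement for the given table and columns.
    static String buildInsertSql(String table, List<String> columns) {
        return "INSERT INTO " + table + " (" + String.join(", ", columns)
                + ") VALUES ("
                + String.join(", ", Collections.nCopies(columns.size(), "?"))
                + ")";
    }

    // Insert rows of (id, value) pairs in batches of batchSize over an open
    // connection (assumed URL: jdbc:postgresql://...?reWriteBatchedInserts=true).
    static void insertBatch(Connection conn, List<Object[]> rows, int batchSize) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(
                buildInsertSql("canonical_table", List.of("id", "value")))) {
            int pending = 0;
            for (Object[] row : rows) {
                ps.setObject(1, row[0]);
                ps.setObject(2, row[1]);
                ps.addBatch();
                if (++pending == batchSize) {   // flush each full batch
                    ps.executeBatch();
                    pending = 0;
                }
            }
            if (pending > 0) {                  // don't lose the final partial batch
                ps.executeBatch();
            }
        }
        conn.commit();
    }

    public static void main(String[] args) {
        System.out.println(buildInsertSql("canonical_table", List.of("id", "value")));
    }
}
```

Note the final partial-batch flush: a batching loop that only fires on `size() >= DEFAULT_BATCH_SIZE` would silently drop the last, smaller batch.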

In my opinion, this operation should run faster than its current rate of about 1 hour per 1 million records. Can I speed it up, and if so, how?

First, I tried converting the table records from the first database to a CSV file, saving it on the other server, and using the Postgres Copy API to load it, but the total time was still unacceptable because of the additional hard-disk operations.

Maybe PostgreSQL has stream writing or something else? I can't find an answer in the official PostgreSQL docs.
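
The Copy API mentioned above can in fact stream without touching the disk: the PostgreSQL JDBC driver's CopyManager exposes copyOut/copyIn over arbitrary Java streams, so `COPY ... TO STDOUT` on the source can be piped straight into `COPY ... FROM STDIN` on the target, with the column renaming done in the SELECT list. The sketch below shows only the piping pattern in runnable form; the actual CopyManager calls need live connections, so they appear as comments, and all table and column names are placeholders:

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class CopyPipeSketch {

    // Pump CSV bytes through an in-memory pipe: the producer thread plays the
    // role of copyOut(...) on the source server, and the consumer plays
    // copyIn(...) on the target. With real connections the two commented
    // calls replace the plain byte write/read:
    //   srcConn.unwrap(PGConnection.class).getCopyAPI()
    //       .copyOut("COPY (SELECT ext_a, ext_b FROM external_table) TO STDOUT WITH (FORMAT csv)", pipeOut);
    //   dstConn.unwrap(PGConnection.class).getCopyAPI()
    //       .copyIn("COPY canonical_table (col_a, col_b) FROM STDIN WITH (FORMAT csv)", pipeIn);
    static String pump(String csv) throws IOException, InterruptedException {
        PipedOutputStream pipeOut = new PipedOutputStream();
        PipedInputStream pipeIn = new PipedInputStream(pipeOut, 1 << 16);
        Thread producer = new Thread(() -> {
            try (pipeOut) {
                pipeOut.write(csv.getBytes(StandardCharsets.UTF_8)); // stand-in for copyOut(...)
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        producer.start();
        try (pipeIn) {
            String received = new String(pipeIn.readAllBytes(), StandardCharsets.UTF_8); // stand-in for copyIn(...)
            producer.join();
            return received;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.print(pump("1,foo\n2,bar\n"));
    }
}
```

Because both ends work on a stream of COPY-format bytes, nothing is ever materialized on disk, which is exactly the overhead the CSV-file attempt above ran into.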

Answer 1

Score: 0


In my case, the following solution helped:

  1. Export the external table to a CSV file with zip compression
    (example from this StackOverflow answer: https://stackoverflow.com/a/3981807/3744622)

  2. Copy the small zipped file to the Postgres server's /tmp folder:
    scp root@ext_server:/path/to/file root@target_server:/tmp/

  3. Import the table from the zipped CSV file
    (example from this StackOverflow answer: https://stackoverflow.com/a/46228247/3744622)
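
The compression round-trip in steps 1 and 3 can also be sketched from Java with the standard GZIP streams, which is handy if the export/import should run inside the application rather than via psql. This is a stand-alone illustration under assumed file names; the stand-in strings take the place of the COPY output and input from the linked answers:

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipCsvRoundTrip {

    // Step 1: compress the exported CSV into a small file
    // (stand-in for "\copy ... TO STDOUT" piped through gzip).
    static Path export(String csv, Path dir) throws Exception {
        Path file = dir.resolve("data.csv.gz");
        try (OutputStream gz = new GZIPOutputStream(Files.newOutputStream(file))) {
            gz.write(csv.getBytes(StandardCharsets.UTF_8));
        }
        return file;
    }

    // Step 3: decompress on the target side (stand-in for gunzip piped into
    // "\copy canonical_table FROM STDIN CSV"). Step 2, the scp of the small
    // file, is simply a file transfer and is omitted here.
    static String importCsv(Path file) throws Exception {
        try (InputStream gz = new GZIPInputStream(Files.newInputStream(file))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempDirectory("copydemo");
        Path file = export("1,foo\n2,bar\n", tmp);
        System.out.print(importCsv(file));
    }
}
```

Compressing before the network hop is what keeps step 2 fast: CSV text typically compresses very well, so the file that crosses the wire is small.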

I achieved a total time of about 10 minutes.

Thank you all, this is a wonderful place :)

huangapple
  • Posted on 2020-08-21 16:18:20
  • Please keep this link when reposting: https://go.coder-hub.com/63519129.html