How to stream large result sets from QuestDB to a Java application?

Question

I have an issue with the PG driver in Java. I usually need to stream around 50-60 million rows from the database to my Java application; however, the PG driver internally creates a lot of garbage, and the resulting garbage collections slow down my application. The garbage comes from calls such as ResultSet.getDouble() and ResultSet.getTimestamp().

Is there an alternative, more GC-friendly way to stream data from QuestDB to a Java application?

Answer 1

Score: 1

You'll get better PG driver performance for large data transfers by setting the following connection properties:

  • binaryTransfer to 'true'
  • preferQueryMode to 'extended' or 'extendedForPrepared'
  • prepareThreshold to '-1' (use a server-side prepared statement from the first execution)

to enable binary transfer (which reduces the number of temporary objects), and:

  • disable autocommit on the connection
  • execute the query as a prepared statement
  • set the fetch size on the statement before calling executeQuery()

so that the PG driver fetches rows in batches of the fetch size instead of caching the entire result set in memory.
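The properties in the first list can also be passed as query parameters of the JDBC URL, which pgjdbc accepts in addition to a `Properties` object; this is handy when a connection pool or tool only takes a URL. A minimal sketch, assuming the same local QuestDB instance as in the example below (PostgreSQL wire port 8812, database `qdb`); `QuestDbUrls` and `streamingUrl` are hypothetical names:

```java
/**
 * Hypothetical helper: builds a pgjdbc URL that carries the
 * streaming-friendly connection properties as query parameters.
 */
final class QuestDbUrls {
    private QuestDbUrls() {
    }

    static String streamingUrl(String host, int port, String database) {
        return "jdbc:postgresql://" + host + ":" + port + "/" + database
                + "?binaryTransfer=true"       // binary instead of text encoding
                + "&preferQueryMode=extended"  // extended query protocol
                + "&prepareThreshold=-1";      // server-side prepare from the first execution
    }
}
```

Credentials can then be supplied separately, e.g. `DriverManager.getConnection(QuestDbUrls.streamingUrl("127.0.0.1", 8812, "qdb"), "admin", "quest")`. The sketch covers only the first list; autocommit and the fetch size are still set on the connection and statement as described in the second list.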

In binary mode, ResultSet.getTimestamp() still allocates an object for each row, so a workaround is to cast the timestamp to long in the query and convert it to a timestamp yourself on the client side (java.sql.Timestamp.setTime() and java.sql.Timestamp.setNanos()).
With the setup described above, transferring 60 million rows of 3 columns takes about 10 s; with the default settings it takes more than 5 times as long.

Example:

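    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;
    import java.util.Properties;

    public class QuestDbStreamingExample {
        public static void main(String[] args) throws SQLException {
            Properties properties = new Properties();
            properties.setProperty("user", "admin");
            properties.setProperty("password", "quest");
            // Binary transfer with a server-side prepared statement from the first execution
            properties.setProperty("binaryTransfer", "true");
            properties.setProperty("preferQueryMode", "extended");
            properties.setProperty("prepareThreshold", "-1");

            try (Connection connection = DriverManager.getConnection("jdbc:postgresql://127.0.0.1:8812/qdb", properties)) {
                // Autocommit must be off for the driver to fetch rows in batches
                connection.setAutoCommit(false);

                // Test table: 60 million rows of (double, timestamp, long)
                try (Statement create = connection.createStatement()) {
                    create.execute("create table x as (select rnd_double() d, cast(x as timestamp) ts, x l from long_sequence(60000000))");
                    connection.commit();
                }

                long start = System.currentTimeMillis();
                // The timestamp is read as a long to avoid allocating a Timestamp per row
                try (PreparedStatement select = connection.prepareStatement("select d, cast(ts as long), l from x")) {
                    // Must be set before executeQuery(): rows arrive 10,000 at a time
                    select.setFetchSize(10000);

                    double sumD = 0;
                    long sumTs = 0;
                    long sumL = 0;

                    try (ResultSet resultSet = select.executeQuery()) {
                        while (resultSet.next()) {
                            sumD += resultSet.getDouble(1);
                            sumTs += resultSet.getLong(2);
                            sumL += resultSet.getLong(3);
                        }
                    }
                    System.out.println(sumD + ", " + sumTs + ", " + sumL);
                }
                long end = System.currentTimeMillis();
                System.out.println("Elapsed: " + (end - start) + " ms");
            }
        }
    }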
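The example reads the timestamp column as a raw long. If the application still needs a `java.sql.Timestamp`, the `setTime()`/`setNanos()` workaround can reuse a single object instead of allocating one per row. A minimal sketch, assuming the column is QuestDB's microsecond-precision `timestamp` type, so that `cast(ts as long)` yields microseconds since the Unix epoch; `MicrosTimestamps` and `setFromMicros` are hypothetical names:

```java
import java.sql.Timestamp;

/**
 * Hypothetical helper: overwrites a reusable java.sql.Timestamp with a
 * QuestDB timestamp read as a long (microseconds since the Unix epoch).
 */
final class MicrosTimestamps {
    private MicrosTimestamps() {
    }

    static void setFromMicros(Timestamp target, long epochMicros) {
        // Split into whole seconds and the microsecond remainder; floor
        // semantics keep the remainder non-negative for pre-1970 values
        long seconds = Math.floorDiv(epochMicros, 1_000_000L);
        int microsOfSecond = (int) Math.floorMod(epochMicros, 1_000_000L);

        // setTime() takes milliseconds; whole seconds leave the nanos field at 0
        target.setTime(seconds * 1_000L);
        // setNanos() then supplies the full fractional second in nanoseconds
        target.setNanos(microsOfSecond * 1_000);
    }
}
```

Allocate one `Timestamp` before the `while (resultSet.next())` loop and call `MicrosTimestamps.setFromMicros(ts, resultSet.getLong(2))` for each row. Because the same object is overwritten on every row, it must not be kept beyond the current iteration; if the application only needs epoch values, keeping the raw `long` avoids the conversion altogether.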

huangapple
  • Posted on July 3, 2023 at 16:37:58
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76603123.html