Best way to export a million records formatted as JSON?

Question

I have a request to export roughly a million records for a one-time migration, and I need those records exported in JSON format, adhering to the same API contract object structure we normally expose to this other team.

The object I need to expose is composed of data from two tables, and I have a condition in place to limit the result set from several million down to one million.

The batching logic below does the initial fetch, then processes the records in batches of 1000 and adds them to a list that I eventually want to expose.

Service method below:

public List<NewEntity> getNewEntities() {
    int i = 0;
    int BATCH_SIZE = 1000;

    List<Entity> totalEntityList = new ArrayList<>();

    List<Entity> entityList = entityManager.createNamedQuery(Entity.SELECT_NEW_ENTITIES, Entity.class)
        .setFirstResult(i)
        .setMaxResults(BATCH_SIZE).getResultList();

    while (entityList.size() == BATCH_SIZE) {
        i += BATCH_SIZE;
        entityList = entityManager.createNamedQuery(Entity.SELECT_NEW_ENTITIES, Entity.class)
            .setFirstResult(i)
            .setMaxResults(BATCH_SIZE)
            .getResultList();
        totalEntityList.addAll(entityList);
    }

    return totalEntityList.stream()
        .map(entity -> entity.toNewEntity())
        .collect(toList());
}

This process took 2.5 hours (with a batch size of 1000) and 3.5 hours (with a batch size of 2000). The first time I ran it, I had a bug where I didn't save all of the records to the list, so I ran it again overnight. When I checked in the morning it had taken even longer and my IDE froze, so I don't know why the request eventually failed with a 500 error. I force quit my IDE and am trying again now, but I won't know whether it works for another 2 to 3 hours. This is very slow.

Is there a way to improve how I'm doing this? Are there any other approaches here?

I'm using Spring Boot, Java, and JPA.

Thanks!

EDIT (UPDATE 2): adjusted per the recommendations from the comments:

public void getNewEntities() {
    int i = 0;
    int BATCH_SIZE = 1000;

    List<Entity> entityList = entityManager.createNamedQuery(Entity.SELECT_NEW_ENTITIES, Entity.class)
        .setFirstResult(i)
        .setMaxResults(BATCH_SIZE).getResultList();

    try {
        FileWriter file = new FileWriter("new_entities.json");
        while (entityList.size() == BATCH_SIZE) {
            i += BATCH_SIZE;
            entityList = entityManager.createNamedQuery(Entity.SELECT_NEW_ENTITIES, Entity.class)
                .setFirstResult(i)
                .setMaxResults(BATCH_SIZE)
                .getResultList();
            file.write(new Gson().toJson(entityList));
        }
        file.close();
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}

Is this a better (correct) approach? I'm letting it run right now and I'm not sure how long it will take to write 1,000,000 records to a JSON file, but let me know if this is on the right track. It has currently been running for two hours and is only about 450,000 records in!

UPDATE 3: adding the Entity for visibility. I'm now assuming the issue could be with the mapping to locations, where I convert it to a list at the end. Should I just keep it as a stream instead? I obviously have a lot more fields, but I removed the ones that are just simple column fields with no complexity to them.

@Builder
@Getter
@Entity
@AllArgsConstructor
@NoArgsConstructor
@Table(name = "TABLE_NAME", schema = "SCHEMA")
@NamedQuery(name = SELECT_NEW_ENTITIES, query = FIND_NEW_ENTITIES)
public class Entity {

    public static final String SELECT_NEW_ENTITIES = "SELECT_NEW_ENTITIES";

    protected static final String FIND_NEW_ENTITIES = "" +
        "SELECT a FROM Entity a " +
        "WHERE a.code IS NOT NULL " +
        "AND a.someId IS NOT NULL " +
        "ORDER BY a.id ";

    @Id
    @Column(name = "NEW_ENTITY_ID")
    private Long id;

    @Column(name = "SHARED_ID")
    private Long sharedId;

    @OneToMany(cascade = CascadeType.ALL)
    @JoinColumn(name = "NEW_ENTITY_ID")
    @Builder.Default
    private List<Location> locations = new ArrayList<>();

    @Formula(value = "(SELECT eng.SOME_ID from SCHEMA.ANOTHER_TABLE eng WHERE eng.SHARED_ID = SHARED_ID)")
    private String someId;

    @Formula(value = "(SELECT eng.CODE from SCHEMA.ANOTHER_TABLE eng WHERE eng.SHARED_ID = SHARED_ID)")
    private String code;

    public NewEntity toNewEntity() {
        return NewEntity
            .builder()
            .newEntityId(this.getId())
            .code(code)
            .locations(mapLocations())
            .build();
    }

    private List<LocationModel> mapLocations() {
        return this.getLocations()
            .stream()
            .map(location -> LocationModel.builder()
                .id(location.getId())
                .newEntityId(location.getNewEntityId())
                .city(location.getCity())
                .state(location.getState())
                .country(location.getCountry())
                .build())
            .collect(toList());
    }
}

Answer 1

Score: 4


"The way I see this is, you are fetching a million records as an entity. After the query is executed, the resultset will be mapped into the Entity class. In your case, 2 @Formula are being executed for each record and also a list of locations being populated too. I suggest you fetching them directly via native queries.

I am writing this as pseudo-code:

@Transactional(readOnly = true)
public void writeMyEntitiesToJsonFile() {
    Gson gson = new Gson();

    // try-with-resources so the writer is flushed and closed even if mapping fails part-way
    try (FileWriter file = new FileWriter("new_entities.json")) {

        //this query will fetch all the records from your table
        String rawQuery = "SELECT \n"
                + "t.NEW_ENTITY_ID AS id,\n"
                + "t.SHARED_ID AS sharedId,\n"
                + "a.SOME_ID AS someId,\n"
                + "a.CODE AS code\n"
                + "FROM TABLE_NAME t\n"
                + "JOIN ANOTHER_TABLE a \n"
                + "ON t.SHARED_ID=a.SHARED_ID";

        List<Object[]> newEntityRawList = em.createNativeQuery(rawQuery)
                .getResultList();

        //mapping all those Object[] into a DTO: EntityDTO
        List<EntityDTO> newEntityDTOList = newEntityRawList
                .stream()
                .map(obj -> new EntityDTO((Number) obj[0], (Number) obj[1], (String) obj[2], (String) obj[3]))
                .collect(Collectors.toList());

        //this query will fetch all the locations from your table
        String locationQuery = "SELECT \n"
                + "l.newEntityId,\n"
                + "l.id,\n"
                + "l.city,\n"
                + "l.state,\n"
                + "l.country \n"
                + "FROM location l";
        List<Object[]> locationRawList = em.createNativeQuery(locationQuery)
                .getResultList();

        //mapping all those Object[] into a DTO: LocationDTO
        List<LocationDTO> locationDTOList = locationRawList
                .stream()
                .map(ob -> new LocationDTO((Number) ob[0], (Number) ob[1], (String) ob[2], (String) ob[3], (String) ob[4]))
                .collect(Collectors.toList());

        //using a lambda to build a map keyed by new entity id, with the list of locations as value
        Map<Long, List<LocationDTO>> mapOfNewEntityIdAndLocationList = locationDTOList
                .stream()
                .collect(Collectors.groupingBy(LocationDTO::getNewEntityId));

        //setting the list of locations on each EntityDTO
        for (EntityDTO ne : newEntityDTOList) {
            ne.setLocations(mapOfNewEntityIdAndLocationList.get(ne.getId()));
        }

        //writing data to file
        file.write(gson.toJson(newEntityDTOList));
    } catch (IOException io) {
        io.printStackTrace();
    }
}

@Getter
public class EntityDTO {

    private final long id;
    private final long sharedId;
    private final String someId;
    private final String code;

    @Setter
    private List<LocationDTO> locations;

    public EntityDTO(Number id,
            Number sharedId,
            String someId,
            String code) {
        this.id = id.longValue();
        this.sharedId = sharedId.longValue();
        this.someId = someId;
        this.code = code;
    }
}

@Getter
public class LocationDTO {

    private final long id;
    private final long newEntityId;
    private final String city;
    private final String state;
    private final String country;

    public LocationDTO(Number id,
            Number newEntityId,
            String city,
            String state,
            String country) {
        this.id = id.longValue();
        this.newEntityId = newEntityId.longValue();
        this.city = city;
        this.state = state;
        this.country = country;
    }
}

Instead of using the constructors, you can also use SqlResultSetMapping.
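
For illustration, here is a minimal sketch of what that could look like with @SqlResultSetMapping and @ConstructorResult (the mapping name EntityDTOMapping is made up for this example, and EntityDTO would need a constructor whose parameter types match the declared column types):

// Sketch only: annotations come from jakarta.persistence (javax.persistence on Spring Boot 2.x),
// and the mapping has to live on a managed @Entity class or in orm.xml.
@SqlResultSetMapping(
        name = "EntityDTOMapping",
        classes = @ConstructorResult(
                targetClass = EntityDTO.class,
                columns = {
                        @ColumnResult(name = "id", type = Long.class),
                        @ColumnResult(name = "sharedId", type = Long.class),
                        @ColumnResult(name = "someId", type = String.class),
                        @ColumnResult(name = "code", type = String.class)
                }))
@Entity
public class Entity {
    // ... existing mappings ...
}

// The native query can then return DTOs directly, with no manual Object[] casting:
List<EntityDTO> newEntityDTOList =
        em.createNativeQuery(rawQuery, "EntityDTOMapping").getResultList();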

Of course, this will still be an expensive operation, and you can work out the batching yourself from the other answers here.

This approach should eliminate the additional expensive per-row queries.

I did not run this myself, as I don't have any data to work with.

P.S. Please do try the batching too. Applying maxResults to the first native query is easy; fetching the corresponding data with the second query is a little trickier, because you will need a WHERE NEW_ENTITY_ID BETWEEN the minimum and maximum ids obtained from the first result (this should not be very hard, since you know how the data sits in your database tables).
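
A rough sketch of that range-based batching, reusing the names from the pseudo-code above (column names and aliases are assumptions, adjust them to the real schema):

// Page the entity query by offset (adding ORDER BY so the paging is deterministic),
// then fetch only the locations whose new-entity id falls inside the current page.
int offset = 0;
final int BATCH_SIZE = 1000;
List<EntityDTO> page;
do {
    List<Object[]> rawPage = em.createNativeQuery(rawQuery + " ORDER BY t.NEW_ENTITY_ID")
            .setFirstResult(offset)
            .setMaxResults(BATCH_SIZE)
            .getResultList();
    page = rawPage.stream()
            .map(obj -> new EntityDTO((Number) obj[0], (Number) obj[1], (String) obj[2], (String) obj[3]))
            .collect(Collectors.toList());
    if (!page.isEmpty()) {
        long minId = page.get(0).getId();
        long maxId = page.get(page.size() - 1).getId();
        List<Object[]> locationRawPage = em.createNativeQuery(
                        locationQuery + " WHERE l.newEntityId BETWEEN :minId AND :maxId")
                .setParameter("minId", minId)
                .setParameter("maxId", maxId)
                .getResultList();
        // map and group the locations exactly as above, attach them to this page,
        // then append the page to the JSON file before fetching the next one
    }
    offset += BATCH_SIZE;
} while (page.size() == BATCH_SIZE);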

Good luck!


Answer 2

Score: 1

You could parallelize your batching. Since you need to export 1,000,000 records, you could create, for example, 4 threads, each starting its data extraction from its own offset and writing the data into its own temporary file.

After batching is over, you merge the four temp files into the resulting one.

If one of the threads fails for some reason, you don't need to restart the whole process from the very beginning; you can reuse the results stored in the temp files.

Also, in your code you create a new, expensive Gson object for each chunk of 1000 records, i.e. 1000 objects for the whole batch. Instead, a single instance cached in a field or injected by the IoC container can be used.

I suggest you modify your code as follows:

public class BatchExtractor {
  private static final int BATCH_SIZE = 1000;

  private EntityManager entityManager;
  private Gson gson = new Gson();

  @SneakyThrows
  public void getNewEntities() {
    int limit = 250_000;
    int parallelism = 4;
    try (ExecutorService executorService = Executors.newFixedThreadPool(parallelism)) { // try-with-resources on ExecutorService requires Java 19+, where it became AutoCloseable
      List<Future<File>> futures = new ArrayList<>();
      for (int i = 0; i < parallelism; i++) {
        int offset = i * limit;
        Future<File> future = executorService.submit(() -> extract(offset, limit));
        futures.add(future);
      }
      File resultingFile = new File("new_entities.json");
      resultingFile.createNewFile();
      try (FileOutputStream fos = new FileOutputStream(resultingFile)) {
        futures.stream().map(this::getFile).forEach(tempFile -> transfer(tempFile, fos));
        fos.flush();
      }
    }
  }

  @SneakyThrows
  private File extract(int offset, int limit) {
    String fileName = "new_entities_temp_" + offset + ".json";
    File file = new File(fileName);
    file.createNewFile();
    int extractedCount = 0;
    try (FileWriter tempFile = new FileWriter(file)) {
      while (extractedCount < limit) {
        List<Entity> entityList = entityManager.createNamedQuery(Entity.SELECT_NEW_ENTITIES, Entity.class)
                .setFirstResult(offset + extractedCount)
                .setMaxResults(BATCH_SIZE)
                .getResultList();
        if (entityList.isEmpty()) {
          break; // no more rows in this window; avoids looping forever on an empty page
        }
        tempFile.write(gson.toJson(entityList));
        extractedCount += entityList.size();
      }
    }
    return file;
  }

  @SneakyThrows
  private long transfer(File tempFile, FileOutputStream fos) {
    return Files.copy(tempFile.toPath(), fos);
  }

  @SneakyThrows
  private File getFile(Future<File> fileFuture) {
    return fileFuture.get();
  }
}

UPDATE: You could also combine this approach with the streaming suggested in another answer.
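
For instance, the per-thread extract step could stream its window instead of paging it. This is only a sketch reusing the names from the code above; note that an EntityManager is not thread-safe, so in practice each worker would need its own EntityManager (or its own transactional service call):

@SneakyThrows
private File extract(int offset, int limit) {
    File file = new File("new_entities_temp_" + offset + ".json");
    try (FileWriter tempFile = new FileWriter(file)) {
        entityManager.createNamedQuery(Entity.SELECT_NEW_ENTITIES, Entity.class)
                .setFirstResult(offset)
                .setMaxResults(limit)
                .setHint("org.hibernate.fetchSize", "1000") // stream from the database in chunks
                .getResultStream()
                .forEach(entity -> {
                    try {
                        tempFile.write(gson.toJson(entity.toNewEntity()));
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                    entityManager.detach(entity); // keep the persistence context small
                });
    }
    return file;
}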


Answer 3

Score: 1

Use a stream instead of batching, and detach the objects so you don't pollute memory with objects you no longer need.

Create the Gson instance once and reuse it, as it is an expensive object.

In the code below I make a couple of assumptions:

  • Spring Boot 3.x
  • Java 17 or higher

@Transactional(readOnly=true)
public void getNewEntities() {
    var gson = new Gson();

    try (var writer = Files.newBufferedWriter(Path.of("new_entities.json"))) {
        var entities = entityManager.createNamedQuery(Entity.SELECT_NEW_ENTITIES, Entity.class)
            .setHint(QueryHints.HINT_FETCH_SIZE, "1000")
            .setHint(QueryHints.READ_ONLY, "true")
            .getResultStream();

        entities.forEach((entity) -> writeToFile(gson, writer, entity));      
    }
    catch(IOException ex) {
        throw new IllegalStateException(ex);
    };
}

private void writeToFile(Gson gson, Writer writer, Entity entity) {
    try {
        writer.write(gson.toJson(entity.toNewEntity()));
    } catch (Exception ex) {
        throw new IllegalStateException(ex);
    }
    entityManager.detach(entity);
}

Answer 4

Score: 1

In case you suffer from the entity cache blowing up and processing speed decreasing over time, you might consider using the EntityManager.clear() method.
(Creating lots of entities is one of the surest ways to bring down your system. Use a loop and call the clear method after every few hundred entities to get rid of the entities you no longer need.)
See also here.
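
Applied to the batching loop from the question, that could look roughly like this (a sketch under the same assumptions as the question's code):

int i = 0;
final int BATCH_SIZE = 1000;
List<Entity> entityList;
do {
    entityList = entityManager.createNamedQuery(Entity.SELECT_NEW_ENTITIES, Entity.class)
            .setFirstResult(i)
            .setMaxResults(BATCH_SIZE)
            .getResultList();
    // ... map this page and append it to the JSON file here ...
    entityManager.clear(); // drop the processed entities from the first-level cache
    i += BATCH_SIZE;
} while (entityList.size() == BATCH_SIZE);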

