Proper way to clear a buffer in a different thread

Question

I am facing the following problem.
I have a class, Processor, that receives data from a client app, runs a series of calculations, and inserts the results into a database.
I split the database insert out into a separate class, DataLayer.
To avoid blocking the calling thread, DataLayer does not insert into the database right away; instead it adds the results to a list, List<Result> buffer.
Every hour, DataLayer writes the buffer to the database and then clears it.
I am fairly sure I could run into a race condition when clearing the buffer.
What is the right way to do this without concurrency issues?

Code snippet:

public class Processor {

  private DataLayer dataLayer = new DataLayer();

  void accept(List<Data> data) {
     // receive data from client app
     List<Result> results = calculateResults(data);
     saveResults(results);
  }

  List<Result> calculateResults(List<Data> data) {
     // ... (body omitted in the question)
  }

  void saveResults(List<Result> results) {
     dataLayer.insertToDB(results);
  }

}


public class DataLayer {

   private ThreadPoolTaskScheduler taskScheduler;

   private List<Result> buffer = new ArrayList<>();

   public DataLayer() {
      scheduleHourlyCheck();
   }

   // Called from the Processor thread: only buffers, does not touch the database
   void insertToDB(List<Result> results) {
      this.buffer.addAll(results);
   }

   // Runs on the scheduler thread once per hour
   void scheduleHourlyCheck() {
     taskScheduler.schedule(() -> {
       jdbcTemplate.update(...);
       buffer.clear();   // ------> I am sure this shouldn't be done
     }, everyHour);
   }
}

My concern is this line:

buffer.clear();

I am fairly sure it could cause concurrency issues.
What is the proper way to handle this?

Answer 1

Score: 1

The smallest change you need to make is to wrap the ArrayList with Collections.synchronizedList:

private List<Result> buffer = Collections.synchronizedList(new ArrayList<>());

This would be required even if you didn't periodically call clear, since ArrayList itself is not thread-safe.

This prevents two threads from accessing the list at the same time, whether both are adding items or one is adding items while the other clears the list.

If you iterate over the list within jdbcTemplate.update(...), that iteration still needs to be synchronized externally:

synchronized (buffer) {
    jdbcTemplate.update(...);
}
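
To make the idea above concrete, here is a minimal sketch of a buffer built on Collections.synchronizedList. The class name SynchronizedBuffer, the flush method, and the sink list are hypothetical stand-ins: String replaces the question's Result type, and appending to sink replaces the jdbcTemplate.update(...) call. The sketch also moves clear() inside the same synchronized block, which goes one step beyond the snippet above, so that nothing can be added between the iteration and the clear.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the question's DataLayer (not the original class).
class SynchronizedBuffer {
    private final List<String> buffer =
            Collections.synchronizedList(new ArrayList<>());

    // A single addAll call is already guarded by the synchronized wrapper.
    void insert(List<String> results) {
        buffer.addAll(results);
    }

    // Iteration is not guarded by the wrapper, so it is done while holding
    // the list's own monitor. Clearing inside the same block means no insert
    // can slip in between the iteration and the clear.
    int flush(List<String> sink) {
        synchronized (buffer) {
            for (String result : buffer) {
                sink.add(result); // assumption: stands in for the database write
            }
            int written = buffer.size();
            buffer.clear();
            return written;
        }
    }
}
```

Note that while flush holds the lock, every call to insert waits, so a slow database write would stall the producers for its whole duration.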

Answer 2

Score: 0

The main problem is that some inserts could be lost. When you write the current list to the database (jdbcTemplate.update(...)) and clear it right afterwards (buffer.clear()), any insert that happens in between is never written, because it is cleared away immediately.
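
The lost insert can be illustrated by replaying one unlucky interleaving on a single thread, which makes the outcome deterministic. This is only a sketch: LostInsertReplay and the database list are hypothetical names, and plain strings stand in for Result objects and for the real table.

```java
import java.util.ArrayList;
import java.util.List;

class LostInsertReplay {
    // Replays the steps of two threads in the problematic order.
    static List<String> replay() {
        List<String> buffer = new ArrayList<>();
        List<String> database = new ArrayList<>(); // stand-in for the real table

        buffer.add("r1");        // Processor thread: insert before the flush
        database.addAll(buffer); // scheduler thread: the jdbcTemplate.update(...) step
        buffer.add("r2");        // Processor thread: insert lands between the two steps
        buffer.clear();          // scheduler thread: removes r2 before it was written

        return database;
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints [r1]
    }
}
```

After the replay, r2 is in neither the buffer nor the database.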

One way to solve that is to wrap the sections involved in synchronized blocks:

void insertToDB(List<Result> results) {
    synchronized (buffer) {
        this.buffer.addAll(results);
    }
}

void scheduleHourlyCheck() {
    taskScheduler.schedule(() -> {
        synchronized (buffer) {
            jdbcTemplate.update(...);
            buffer.clear();
        }
    }, everyHour);
}

If you want to keep the adders from waiting for all database updates to finish (and you certainly do), you can simply copy the list for the update:

void scheduleHourlyCheck() {
    taskScheduler.schedule(() -> {
        List<Result> temp;
        synchronized (buffer) {
            temp = new ArrayList<>(buffer);
            buffer.clear();
        }
        jdbcTemplate.update(...); // but use temp here
    }, everyHour);
}

Now you have exactly what you described in the question: a synchronized mechanism that periodically updates your database.
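
For readers who want to try the copy-and-clear approach outside Spring, the following is a self-contained sketch under stated assumptions. BufferedDataLayer, start, flush, and close are hypothetical names; a plain ScheduledExecutorService is swapped in for ThreadPoolTaskScheduler, and a Consumer stands in for the jdbcTemplate.update(...) call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch, not the original DataLayer.
class BufferedDataLayer<T> {
    private final List<T> buffer = new ArrayList<>();
    private final Consumer<List<T>> writer; // assumption: wraps the real database write
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    BufferedDataLayer(Consumer<List<T>> writer) {
        this.writer = writer;
    }

    // Started explicitly rather than in the constructor, so that "this"
    // is fully constructed before the scheduler thread can see it.
    void start(long period, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(this::flush, period, period, unit);
    }

    void insert(List<T> results) {
        synchronized (buffer) {
            buffer.addAll(results);
        }
    }

    // Copy and clear under the lock; perform the slow write outside it.
    void flush() {
        List<T> batch;
        synchronized (buffer) {
            if (buffer.isEmpty()) {
                return;
            }
            batch = new ArrayList<>(buffer);
            buffer.clear();
        }
        writer.accept(batch);
    }

    // Stops the periodic task and writes whatever is still buffered.
    void close() {
        scheduler.shutdown();
        flush();
    }
}
```

A caller would construct it with its own write function and then call start(1, TimeUnit.HOURS). One caveat of this sketch: if the writer throws, the copied batch is lost and the executor suppresses later runs of the periodic task, so a production version would need error handling around the write.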

huangapple
  • Posted on August 15, 2020, 22:22:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63427065.html