When writing a huge amount of data, parts of it get lost / When all data is present, the write process is very slow


Question

I have a problem with BufferedWriter when writing a large number of strings to a file.

Situation:
I have to read a large text file (>100k lines), perform some modifications on each line (remove whitespace, check for optional commands, etc.) and write the modified content to a new file.

I have tried two ways of writing the file, and each gives one of the following results:

  1. The write process is horribly slow, but all lines are processed.
  2. Several chunks of lines get lost during writing, leaving an incomplete result.

Approaches and results:

  1. Horribly slow but complete
// read file content and put it in List<String> fileContent
for (String line : fileContent)
{
    // opens the file, writes one line and closes it again, for every line
    try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(filename, true))))
    {
        writer.write(modifyFileContent(line));
    }
}

I already know that opening the file to write one line and closing it again straight away performs very poorly. Modifying a file of around 4M lines takes about 4 hours, which is not desirable. At least it works...

  2. Faster, but incomplete write
// read file content and put it in List<String> fileContent
// This is placed in a try/catch block, I'm omitting it here for brevity
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(filename, true)));
for (String line : fileContent)
{
    writer.write(modifyFileContent(line));
}
writer.close();

This is faster, but I get the following content in the result file (for debugging, each line ends with its line number from the original file):

...
Very long line with interesting content // line nb 567
Very long line with interesting content // line nb 568
Very long line with interesting content // line nb 569
Very long line wi
Very long line with interesting content // line nb 834
Very long line with interesting content // line nb 835
Very long line with interesting content // line nb 836
...

When I print these strings to the console, there are no gaps in the line numbering! So there seems to be a buffering issue somewhere...
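One buffering effect worth ruling out: a BufferedWriter keeps written characters in memory and only hands them to the file when its buffer fills, when flush() is called, or when the writer is closed. Anything that inspects the output before close() (an editor, a tail, another thread) can therefore see a file cut off mid-line, like "Very long line wi" above. That only accounts for a truncated end, not for gaps in the middle, so the sketch below is a check of the mechanism, not a diagnosis of this question; the file name and content are placeholders.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FlushDemo {

    // Returns {file size while the writer is still open, file size after close}.
    static long[] sizesAroundClose(Path target) throws IOException {
        long before;
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writer.write("Very long line with interesting content // 0");
            writer.newLine();
            // The characters are still in the writer's in-memory buffer here.
            before = Files.size(target);
        } // close() flushes the buffer to the file
        long after = Files.size(target);
        return new long[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("flushdemo", ".mod");
        long[] sizes = sizesAroundClose(target);
        System.out.println("before close: " + sizes[0] + " bytes, after close: " + sizes[1] + " bytes");
        Files.delete(target);
    }
}
```

In practice this means the output file should only be judged after the writer has been closed, which try-with-resources guarantees even when an exception is thrown.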

Other approaches:
I also tried the NIO variant, Files.newBufferedWriter, which also omitted several lines.
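To pin down exactly which lines go missing with either writer, a small checker could read the output back and report every place where the trailing line number does not follow the previous one, plus every line without a trailing number (such as the truncated "Very long line wi"). This is a debugging sketch that assumes each output line ends with its original line number, as in the debug output above; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GapChecker {

    // Trailing run of digits, e.g. "... // line nb 567" or "... // 567".
    // At most 18 digits, so Long.parseLong cannot overflow.
    private static final Pattern TRAILING_NUMBER = Pattern.compile("(\\d{1,18})\\s*$");

    // Reports lines without a trailing number and every place where the
    // number does not directly follow the previous numbered line.
    static List<String> findGaps(List<String> lines) {
        List<String> problems = new ArrayList<>();
        long previous = -1; // -1: no numbered line seen yet
        for (int i = 0; i < lines.size(); i++) {
            Matcher m = TRAILING_NUMBER.matcher(lines.get(i));
            if (!m.find()) {
                problems.add("line " + (i + 1) + " has no trailing number");
                continue;
            }
            long current = Long.parseLong(m.group(1));
            if (previous >= 0 && current != previous + 1) {
                problems.add("jump from " + previous + " to " + current);
            }
            previous = current;
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "Very long line with interesting content // line nb 569",
                "Very long line wi",
                "Very long line with interesting content // line nb 834");
        findGaps(sample).forEach(System.out::println);
    }
}
```

Fed with Files.readAllLines of the .mod file, it would flag the truncated line and the jump from 569 to 834; for files of several hundred MB, the same loop can be driven from a BufferedReader instead of a List.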

Question:
What am I missing here? Is there a way to get good write performance and correct output at the same time?
The input files are usually several hundred MB in size with millions of lines. Any hints are much appreciated.
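For inputs of several hundred MB, the intermediate List<String> fileContent is not strictly needed: each line can be read, modified and written in a single pass, so memory use stays flat. A minimal sketch with Files.newBufferedReader and Files.newBufferedWriter, assuming UTF-8 input and that overwriting (rather than appending to) an existing output file is acceptable; the trim() plus line-number suffix is only a placeholder for the real modification.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class StreamConvert {

    // Streams source -> target one line at a time; nothing is kept in a List.
    // Files.newBufferedWriter truncates an existing target by default.
    static long convert(Path source, Path target) throws IOException {
        long lineNb = 0;
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line.trim() + " // " + lineNb); // placeholder modification
                writer.newLine();
                lineNb++;
            }
        } // reader and writer are closed here; closing the writer flushes it
        return lineNb; // number of lines written
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("convert", ".txt");
        Files.write(source, List.of("  first  ", "second"), StandardCharsets.UTF_8);
        Path target = Files.createTempFile("convert", ".mod");
        System.out.println(convert(source, target) + " lines written");
        System.out.println(Files.readAllLines(target, StandardCharsets.UTF_8));
    }
}
```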

[edit]

Thanks to Sir Lopez I found a working solution. I had never come across RandomAccessFile before...

With this information, I suspect I ran into a race condition or some other thread-related issue... Since I only started working with threads recently, I suppose this was to be expected...
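On the threading side, Swing components are meant to be accessed only on the event dispatch thread (EDT), so a background task should not touch them directly. A sketch of a worker that keeps all file I/O in doInBackground() and reports progress only through setProgress(), whose property-change events SwingWorker delivers on the EDT, might look like this. The class name, the Path-based constructor and the placeholder modification are assumptions; no window is created, so it also runs headless.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Stream;
import javax.swing.SwingWorker;

// Hypothetical worker: file I/O only in doInBackground(), no Swing component calls.
public class ConversionWorker extends SwingWorker<Long, Void> {
    private final Path source;
    private final Path target;

    public ConversionWorker(Path source, Path target) {
        this.source = source;
        this.target = target;
    }

    @Override
    protected Long doInBackground() throws IOException {
        long totalLines;
        try (Stream<String> lines = Files.lines(source, StandardCharsets.UTF_8)) {
            totalLines = lines.count();
        }
        long lineNb = 0;
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line + " // " + lineNb); // placeholder modification
                writer.newLine();
                lineNb++;
                // Floating-point division, clamped to the 0..100 range setProgress accepts.
                setProgress((int) Math.min(100, 100.0 * lineNb / totalLines));
            }
        } // both closed here; closing the writer flushes it
        return lineNb;
    }

    public static void main(String[] args) throws Exception {
        Path source = Files.createTempFile("worker", ".txt");
        Files.write(source, List.of("first", "second", "third"), StandardCharsets.UTF_8);
        Path target = Paths.get(source + ".mod");
        ConversionWorker worker = new ConversionWorker(source, target);
        // In a GUI this listener would update a JProgressBar; it is called on the EDT.
        worker.addPropertyChangeListener(evt -> {
            if ("progress".equals(evt.getPropertyName())) {
                System.out.println("progress: " + evt.getNewValue() + "%");
            }
        });
        worker.execute();
        System.out.println(worker.get() + " lines written to " + target);
    }
}
```

The design choice here is that only one thread ever owns the reader and the writer, and the GUI learns about progress through events rather than by sharing the components with the worker.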

To give the full picture, I made a minimal example that shows the context in which my problem originally occurred. Any feedback is welcome:

package minex;

import static javax.swing.GroupLayout.Alignment.BASELINE;
import static javax.swing.GroupLayout.Alignment.LEADING;

import java.awt.EventQueue;
import java.awt.event.ActionEvent;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.OutputStreamWriter;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.swing.GroupLayout;
import javax.swing.JButton;
import javax.swing.JFileChooser;
import javax.swing.JFrame;
import javax.swing.JProgressBar;
import javax.swing.SwingWorker;
import javax.swing.WindowConstants;

/**
 * Read a file line by line, modify its content and write it to another file.
 * @author demo
 */
public class gui extends JFrame {

    /**
     * Background task, so the GUI isn't blocked and the progress bar can be updated.
     */
    class fileConversionWorker extends SwingWorker<Integer, Double> {
        private final File file;

        public fileConversionWorker(File file) {
            this.file = file;
        }

        /**
         * Count the lines in the provided file. Needed to compute the
         * progress fraction for the progress bar.
         * Quick and dirty, taken from https://stackoverflow.com/a/1277955
         * @param aFile File to read.
         * @return Number of lines present in aFile
         * @throws IOException if aFile cannot be read
         */
        private int countLines(File aFile) throws IOException {
            try (LineNumberReader reader = new LineNumberReader(new FileReader(aFile))) {
                while (reader.readLine() != null) {
                    // only advance; LineNumberReader keeps the count
                }
                return reader.getLineNumber();
            }
        }

        /**
         * Reads a file line by line, modifies each line and immediately
         * writes it to a different file. Runs on a worker thread, so it does
         * not touch Swing components; progress is reported via publish().
         * @return 0 when finished
         */
        @Override
        public Integer doInBackground() {
            int totalLines = 0;
            try {
                totalLines = countLines(file);
            } catch (IOException ex) {
                Logger.getLogger(gui.class.getName()).log(Level.SEVERE, null, ex);
            }
            // Only proceed when there is at least one line to manipulate.
            if (totalLines > 0) {
                String filename = file.getAbsolutePath() + ".mod";
                // Append mode (true): re-running adds to an existing .mod file.
                try (BufferedReader br = new BufferedReader(new FileReader(file));
                     BufferedWriter writer = new BufferedWriter(
                             new OutputStreamWriter(new FileOutputStream(filename, true)))) {
                    long lineNb = 0;
                    String line;
                    // Read original file, modify line and immediately write to new file
                    while ((line = br.readLine()) != null) {
                        writer.write(line + " // " + lineNb);
                        writer.newLine();
                        lineNb++;
                        // Cast before dividing; (double)(lineNb / totalLines) would be integer division.
                        publish((double) lineNb / totalLines);
                    }
                } catch (IOException ex) {
                    Logger.getLogger(gui.class.getName()).log(Level.SEVERE, null, ex);
                }
                // br and writer are closed at this point; closing the writer flushes it.
            }
            return 0;
        }

        /**
         * Runs on the event dispatch thread once doInBackground() has finished:
         * stop the busy indicator and re-enable the load button.
         */
        @Override
        public void done() {
            barProgress.setIndeterminate(false);
            butLoadFile.setEnabled(true);
        }

        /**
         * Update the progress bar (runs on the event dispatch thread).
         * @param aDoubles progress fractions published since the last call
         */
        @Override
        protected void process(List<Double> aDoubles) {
            barProgress.setIndeterminate(false);
            int amount = barProgress.getMaximum() - barProgress.getMinimum();
            double latest = aDoubles.get(aDoubles.size() - 1);
            barProgress.setValue((int) (barProgress.getMinimum() + amount * latest));
        }
    }

    /**
     * Start the GUI.
     */
    public static void main(String[] args) {
        EventQueue.invokeLater(() -> new gui().setVisible(true));
    }

    /**
     * Initialize all things needed.
     */
    public gui() {
        initComponents();
    }

    /**
     * Load a file and immediately begin processing it.
     * @param evt the button's action event
     */
    private void butLoadFileActionListener(ActionEvent evt) {
        JFileChooser fc = new JFileChooser("/home/demo/fileFolder");
        int returnVal = fc.showOpenDialog(gui.this);
        if (returnVal == JFileChooser.APPROVE_OPTION) {
            File file = fc.getSelectedFile();
            // Prevent a second worker from being started while this one runs.
            butLoadFile.setEnabled(false);
            // Still on the event dispatch thread: indicate that something is
            // happening while the worker counts the lines.
            barProgress.setIndeterminate(true);
            barProgress.setValue(0);
            fileConversionWorker worker = new fileConversionWorker(file);
            worker.execute();
        }
    }

    /**
     * Build and lay out the components.
     */
    private void initComponents() {
        setDefaultCloseOperation(WindowConstants.EXIT_ON_CLOSE);
        setResizable(false);
        setTitle("Min Example");
        butLoadFile = new JButton("Load file");
        butLoadFile.addActionListener((ActionEvent evt) -> butLoadFileActionListener(evt));
        barProgress = new JProgressBar();
        barProgress.setStringPainted(true);
        barProgress.setMinimum(0);
        GroupLayout layout = new GroupLayout(getContentPane());
        getContentPane().setLayout(layout);
        layout.setHorizontalGroup(
            layout.createParallelGroup(LEADING)
                .addComponent(butLoadFile, GroupLayout.PREFERRED_SIZE, 200, GroupLayout.PREFERRED_SIZE)
                .addComponent(barProgress, GroupLayout.PREFERRED_SIZE, 200, GroupLayout.PREFERRED_SIZE)
        );
        layout.setVerticalGroup(
            layout.createParallelGroup(BASELINE)
                .addGroup(layout.createSequentialGroup()
                    .addComponent(butLoadFile, GroupLayout.PREFERRED_SIZE, 20, GroupLayout.PREFERRED_SIZE)
                    .addComponent(barProgress, GroupLayout.PREFERRED_SIZE, 20, GroupLayout.PREFERRED_SIZE)
                )
        );
        pack();
    }

    /** Button to load a file. */
    private JButton butLoadFile;
    /** Progress bar to visualize progress. */
    private JProgressBar barProgress;
}

[/edit]
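Progress values for a bar like the one above need floating-point division. (double)(lineNb / totalLines) divides first, in integer arithmetic, and only then converts, so the result is 0.0 whenever lineNb is smaller than totalLines; casting one operand first gives the intended fraction. A tiny illustration with hypothetical method names:

```java
public class ProgressFraction {

    // Integer division happens first, then the cast: 0.0 while lineNb < totalLines.
    static double truncated(long lineNb, int totalLines) {
        return (double) (lineNb / totalLines);
    }

    // Cast first, so the division is done in floating point.
    static double fraction(long lineNb, int totalLines) {
        return (double) lineNb / totalLines;
    }

    public static void main(String[] args) {
        System.out.println(truncated(500, 1000)); // 0.0
        System.out.println(fraction(500, 1000));  // 0.5
    }
}
```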

Answer 1

Score: 0

Maybe this can help you:

https://stackoverflow.com/questions/1062113/fastest-way-to-write-huge-data-in-text-file-java

https://www.quora.com/How-do-to-read-and-write-large-size-file-in-Java-efficiently
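One tweak that is often suggested for very large outputs is giving BufferedWriter a larger buffer than its default via the BufferedWriter(Writer, int) constructor, so the underlying stream is written in fewer, bigger chunks. A minimal sketch; the 1 MiB size is an arbitrary assumption, and whether it helps measurably depends on the disk and the JVM, so it is worth timing against the real files rather than taking it on faith.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class LargeBufferWrite {

    // Assumed size for illustration only (1 MiB of chars); tune by measuring.
    static final int BUFFER_CHARS = 1 << 20;

    // Writes `count` sample lines through one writer with an enlarged buffer.
    static void writeLines(Path target, int count) throws IOException {
        try (Writer raw = new OutputStreamWriter(new FileOutputStream(target.toFile()), StandardCharsets.UTF_8);
             BufferedWriter writer = new BufferedWriter(raw, BUFFER_CHARS)) {
            for (int i = 0; i < count; i++) {
                writer.write("Very long line with interesting content // " + i);
                writer.newLine();
            }
        } // writer is closed first (flushing its buffer), then raw
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("largebuffer", ".mod");
        long start = System.nanoTime();
        writeLines(target, 1_000_000);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(Files.size(target) + " bytes in " + millis + " ms");
        Files.delete(target);
    }
}
```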

huangapple
  • Published on 22 September 2020 at 22:33:33
  • When reposting, please keep the link to this article: https://go.coder-hub.com/64011975.html