How to read contents of a multipart file inputstream in Java

Question

I have a Thymeleaf HTML form that accepts an uploaded file as input and then makes a POST request to a Java controller for the multipart file. I then convert the file into an InputStream. While I am able to read the file's size and input type, I am not able to successfully print out the contents.

For example, for a .doc file, if I try methods I have found to print out the file's contents, it merely prints a series of numbers, which I'm assuming is an encoding. Does a method exist to print out the contents of an uploaded .doc file?

The controller action I'm currently using to attempt to print out the file's contents is:

@PostMapping("/file-upload")
public String uploadFile(@RequestParam("fileUpload") MultipartFile fileUpload, Model model) throws IOException {
    InputStream fis = fileUpload.getInputStream();

    for (int i = 0; i < fis.available(); i++) {
        System.out.println("" + fis.read());
    }

    return "home";
}

And the form I am using to submit the file is:

<form th:action="@{/file-upload}" enctype="multipart/form-data" method="POST">
    <div class="container">
        <div class="row" style="margin: 1em;">
            <div class="col-sm-2">
                <label for="fileUpload">Upload a New File:</label>
            </div>
            <div class="col-sm-6">
                <input type="file" class="form-control-file" id="fileUpload" name="fileUpload">
            </div>
            <div class="col-sm-4">
                <button type="submit" class="btn btn-dark">Upload</button>
            </div>
        </div>
    </div>
</form>

Answer 1

Score: 1

Do not use InputStream.available(). From the documentation:

> It is never correct to use the return value of this method to allocate a buffer intended to hold all data in this stream.

Only a return value of -1 from read() indicates the end of the InputStream.
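As a rough illustration of the point above, the sketch below compares the loop from the question with a loop that stops only when read() returns -1. The class and method names are made up for this example, and an in-memory ByteArrayInputStream stands in for the uploaded file's stream; a real upload stream may report different available() values.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class; not part of the original question or answer.
class ReadToEndSketch {

    // Mirrors the loop from the question and counts how many bytes it reads.
    // available() shrinks while i grows, so the loop stops early.
    static int countWithAvailable(InputStream in) throws IOException {
        int count = 0;
        for (int i = 0; i < in.available(); i++) {
            in.read();
            count++;
        }
        return count;
    }

    // Reads the whole stream, stopping only when read() reports -1.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = new byte[10];
        int flawed = countWithAvailable(new ByteArrayInputStream(sample));
        int correct = readAll(new ByteArrayInputStream(sample)).length;
        System.out.println("available() loop read " + flawed + " of " + sample.length + " bytes");
        System.out.println("read-until-minus-one loop read " + correct + " of " + sample.length + " bytes");
    }
}
```

With this 10-byte in-memory stream, the available() loop stops after 5 bytes, while the second loop reads all 10.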

> For example, for a .doc file, if I try methods I have found to print out the file's contents, it merely prints a series of numbers. Which I'm assuming is an encoding.

Your assumption is incorrect. A .doc file is a complex binary format, not just a text encoding. (Try opening a .doc file in Notepad.)
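One way to see that a .doc file is binary is to look at its first bytes. Word 97-2003 documents are typically stored as OLE2 compound files, which begin with the eight-byte signature D0 CF 11 E0 A1 B1 1A E1. The sketch below checks for that signature; the class and method names are hypothetical, and the check only identifies the container format, it does not extract any text.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original question or answer.
class DocSignatureSketch {

    // Signature at the start of an OLE2 compound file (the usual .doc container).
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    // Returns true if the stream starts with the OLE2 signature.
    // Note: this consumes up to eight bytes from the stream.
    static boolean looksLikeOle2(InputStream in) throws IOException {
        for (int expected : OLE2_MAGIC) {
            if (in.read() != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] docHeader = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
                            (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};
        byte[] plainText = "Hello, world".getBytes(StandardCharsets.UTF_8);

        System.out.println(looksLikeOle2(new ByteArrayInputStream(docHeader)));
        System.out.println(looksLikeOle2(new ByteArrayInputStream(plainText)));
    }
}
```

Because the check consumes bytes, it would need its own stream (or a mark/reset on a buffered one) if the same data is to be read again afterwards.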

You are getting numbers because you are printing numbers. InputStream.read() returns an int. "" + fis.read() converts each returned int to a String.
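To make the difference between byte values and text concrete, here is a small sketch. It assumes the upload is a plain text file encoded as UTF-8, which is not the .doc case from the question, and the class and method names are invented for the example. It prints the raw int values that read() returns, then decodes the same bytes through an InputStreamReader.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original question or answer.
class DecodeTextSketch {

    // Decodes the whole stream as text using the given charset.
    // Only meaningful when the bytes really are text in that charset.
    static String readText(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "Hi".getBytes(StandardCharsets.UTF_8);

        // What the question's code does: each byte is printed as a number.
        InputStream raw = new ByteArrayInputStream(data);
        int b;
        while ((b = raw.read()) != -1) {
            System.out.println(b); // prints 72, then 105
        }

        // Decoding the same bytes as UTF-8 text instead.
        System.out.println(readText(new ByteArrayInputStream(data), StandardCharsets.UTF_8)); // prints Hi
    }
}
```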

If you really want to print the contents of the file, write the bytes directly:

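int b;
// read() returns -1 at end of stream, otherwise a byte value from 0 to 255
while ((b = fis.read()) >= 0) {
    System.out.write(b);
}
// System.out is a PrintStream and may buffer raw writes, so flush at the end
System.out.flush();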

If you’re using Java 9 or later, you can just use:

fis.transferTo(System.out);

However, neither option will show the contents of a Word document in a readable form. You will need a library that can read the text content from a Word file, like Apache POI. (There are other libraries available; you may want to search for them.)

huangapple
  • Posted on July 27, 2020, 08:19:35
  • When reposting, please keep the link to this post: https://go.coder-hub.com/63107131.html