java JarEntry getSize() returns -1

Question

EDIT 3

It turns out some jar files will properly report entry.getSize() with JarInputStream and some don't. All the jar files I created going back to 2013 (on OSX Java 8 and higher) work, and various others like mongo-spark-connector-10.0.0.jar work. Others like antlr-runtime-4.7.2.jar, mongodb-driver-core-4.9.0.jar, and hadoop-azure-3.2.0.jar do not. But all files properly report size when accessed via JarFile.

I have a valid JarInputStream js sourced from a database (i.e. not using files on filesystem). In case there are some nuances to byte[], streams, and unzipping, this is how the JarInputStream is set up:

byte[] bb = get complete set of bytes; // print bb.length is 14523 so OK
ByteArrayInputStream bas = new ByteArrayInputStream(bb);
JarInputStream js = new JarInputStream(bas);

I iterate it in this way:

JarEntry entry;
while ((entry = js.getNextJarEntry()) != null) {
    if (!entry.isDirectory() && entry.getName().endsWith(".class")) {
        // Strip the ".class" suffix (6 characters) and turn path separators into dots
        String className = entry.getName().replace('/', '.').substring(0, entry.getName().length() - 6);

        long len = entry.getSize();
        long zlen = entry.getCompressedSize();
        System.out.println("  class [" + className + "]: Z " + zlen + "; unZ " + len);

        if (len > 0) {
            byte[] classBytes = new byte[(int) len]; // array length must be an int
            js.read(classBytes); // note: a single read() may return fewer bytes than requested
            System.out.println("captured [" + className + "]");
            classBytesMap.put(className, classBytes);
        }
    }
    // ...
}

The loop "works" in that it correctly pulls all the class names out, so it is clearly walking the input stream properly. However, both entry.getSize() and entry.getCompressedSize() are always -1. This is openjdk version "17.0.7" 2023-04-18.

The javadoc states that -1 means the size is unknown but there must be a size or some other means by which to process just this entry before moving further down the stream.

I am not wed to this approach; the ultimate goal is to walk a JarInputStream and extract classname-classbyte entries.

EDIT

As a test, the byte[] bb array is written back to file XX2.jar and then examined using regular JarFile classes. This works:

JarFile jf = new JarFile("XX2.jar");
for (java.util.Enumeration<JarEntry> e = jf.entries(); e.hasMoreElements();) {
    entry = e.nextElement();
    if (!entry.isDirectory() && entry.getName().endsWith(".class")) {
        String className = entry.getName().replace('/', '.').substring(0, entry.getName().length() - 6);
        int x = (int) entry.getSize();
        int cx = (int) entry.getCompressedSize();
        System.out.println("  class [" + className + "]: Z " + cx + "; unZ " + x);
    }
}

class [org.bson.AbstractBsonReader$1]: Z 506; unZ 792
class [org.bson.AbstractBsonReader$Context]: Z 497; unZ 1227
class [org.bson.AbstractBsonReader$Mark]: Z 813; unZ 2201

However, trying to read that file using a regular FileInputStream as we see in many other SO examples does not work:

JarInputStream jarInputStream = new JarInputStream(new FileInputStream("XX2.jar"));

The loop runs, but size and compressedSize are STILL -1.

EDIT 2

Here is a complete example that demos the problem:

import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarFile;
import java.io.FileInputStream;

class jartest {
    public static void showEntry(JarEntry entry) {
        if (!entry.isDirectory() && entry.getName().endsWith(".class")) {
            String className = entry.getName().replace('/', '.').substring(0, entry.getName().length() - 6);
            int x = (int) entry.getSize();
            int cx = (int) entry.getCompressedSize();
            System.out.println("  class [" + className + "]: Z " + cx + "; unZ " + x);
        }
    }

    public static void main(String[] args) {
        try {
            JarEntry entry;
            String fname = "XX2.jar";

            // THIS WORKS
            JarFile jf = new JarFile(fname);
            for (java.util.Enumeration<JarEntry> e = jf.entries(); e.hasMoreElements();) {
                entry = e.nextElement();
                showEntry(entry);
            }

            // THIS DOES NOT WORK
            JarInputStream js = new JarInputStream(new FileInputStream(fname));
            while ((entry = js.getNextJarEntry()) != null) {
                showEntry(entry);
            }
        } catch (Exception e) {
            System.out.println("fail: " + e);
        }
    }
}

$ java jartest
class [grun$ExecutionContextImpl]: Z 706; unZ 1245
class [grun]: Z 3210; unZ 6476
class [grun$ExecutionContextImpl]: Z -1; unZ -1
class [grun]: Z -1; unZ -1

Any clues? Overall it looks like streams are not working but filename based logic does work.

Answer 1

Score: 1

According to the javadoc, -1 means that the respective size is unknown.

According to the source code, the values of a ZipEntry / JarEntry's size fields are read from the ZIP or JAR file itself. The logic is a bit complicated [1], but the immediate explanation for getting a -1 is that the code was unable to extract sizes from the LOC (local file) header, or that the sizes were -1 in the LOC header.
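One common reason the LOC header carries no usable sizes is that the entry was written in streaming mode: general-purpose flag bit 3 is set and the real sizes follow the entry data in a data descriptor, which a forward-only JarInputStream has not seen yet when it returns the entry. JarFile reads the central directory instead, which would explain why it always reports sizes. Whatever the cause, the stream signals end-of-entry on its own, so the size is not needed to extract the bytes. Below is a minimal sketch of that idea, assuming Java 9+ for readAllBytes(); the class name JarStreamClasses and the helpers readClasses and sampleJar are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

class JarStreamClasses {
    /** Maps class name to class bytes for every .class entry, without consulting getSize(). */
    static Map<String, byte[]> readClasses(byte[] jarBytes) {
        Map<String, byte[]> classes = new LinkedHashMap<>();
        try (JarInputStream js = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            JarEntry entry;
            while ((entry = js.getNextJarEntry()) != null) {
                String name = entry.getName();
                if (entry.isDirectory() || !name.endsWith(".class")) {
                    continue;
                }
                // Strip ".class" (6 characters) and convert the path to a class name
                String className = name.substring(0, name.length() - 6).replace('/', '.');
                // read() returns -1 at the end of the current entry, so
                // readAllBytes() stops at the entry boundary, not at the end of the JAR
                classes.put(className, js.readAllBytes());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return classes;
    }

    /** Hypothetical test data: a one-entry JAR written in streaming mode (sizes not set up front). */
    static byte[] sampleJar() {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (JarOutputStream jos = new JarOutputStream(bos)) {
            jos.putNextEntry(new JarEntry("demo/Fake.class"));
            jos.write(new byte[] {1, 2, 3, 4, 5}); // placeholder payload, not a real class file
            jos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        readClasses(sampleJar())
            .forEach((name, bytes) -> System.out.println(name + ": " + bytes.length + " bytes"));
    }
}
```

The same loop should work for a byte[] fetched from a database, since nothing in it depends on the file system. On Java 8, a manual read loop into a ByteArrayOutputStream could replace readAllBytes().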

To figure out what is actually going on in your case, you will need to decode the JAR file's headers by hand. Note that a JAR file is actually a ZIP file with a manifest, so you can use the format description in the ZIP file format Wikipedia page as a reference.
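Decoding a header by hand does not need much code. The sketch below reads the fixed part of a local file header using the field offsets from the ZIP format description (signature at 0, flags at 6, method at 8, compressed size at 18, uncompressed size at 22, name length at 26, name at 30). The class LocHeaderDump and its helpers are hypothetical names, and it only handles classic 32-bit size fields; for ZIP64 entries those fields hold 0xFFFFFFFF and the real sizes live in the extra field, which this sketch does not parse.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

class LocHeaderDump {
    static final int LOC_SIGNATURE = 0x04034b50; // "PK\3\4", stored little-endian

    /** Describes the local file header that starts at the given offset. */
    static String describeLoc(byte[] zip, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt(offset) != LOC_SIGNATURE) {
            throw new IllegalArgumentException("no local file header at offset " + offset);
        }
        int flags = buf.getShort(offset + 6) & 0xFFFF;
        int method = buf.getShort(offset + 8) & 0xFFFF;   // 0 = stored, 8 = deflated
        long csize = buf.getInt(offset + 18) & 0xFFFFFFFFL;
        long size = buf.getInt(offset + 22) & 0xFFFFFFFFL;
        int nameLen = buf.getShort(offset + 26) & 0xFFFF;
        String name = new String(zip, offset + 30, nameLen, StandardCharsets.UTF_8);
        // Bit 3: sizes and CRC are deferred to a data descriptor after the entry data
        boolean dataDescriptor = (flags & 0x08) != 0;
        return name + ": method=" + method + " csize=" + csize + " size=" + size
                + " dataDescriptor=" + dataDescriptor;
    }

    /** Hypothetical test data: a one-entry JAR written in streaming mode. */
    static byte[] sampleJar() {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (JarOutputStream jos = new JarOutputStream(bos)) {
            jos.putNextEntry(new JarEntry("demo/Fake.class"));
            jos.write(new byte[] {1, 2, 3, 4, 5});
            jos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        // The first local file header of a ZIP normally starts at offset 0
        System.out.println(describeLoc(sampleJar(), 0));
    }
}
```

If dataDescriptor comes back true for the entries that report -1, the LOC header simply does not contain the sizes, which matches the behaviour seen through JarInputStream.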

As suggested by @g00se's comment, the problem could be that the JAR file you are trying to read is incomplete or malformed. So another alternative would be to write the bytes to a file and see if a ZIP tool or the jar command can read the file.
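That check can also be done from Java rather than with an external tool. The following is a rough sketch, assuming the JAR bytes are already in memory: it writes them to a temporary file and opens the file with JarFile, which fails with an IOException if the archive cannot be read. JarBytesCheck, sizesViaJarFile and sampleJar are illustrative names only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

class JarBytesCheck {
    /** Writes the bytes to a temp file and returns entry name to uncompressed size as seen by JarFile. */
    static Map<String, Long> sizesViaJarFile(byte[] jarBytes) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        Path tmp = null;
        try {
            tmp = Files.createTempFile("jar-check-", ".jar");
            Files.write(tmp, jarBytes);
            // JarFile reads the central directory; an unreadable archive fails here
            try (JarFile jf = new JarFile(tmp.toFile())) {
                jf.stream().forEach(entry -> sizes.put(entry.getName(), entry.getSize()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException("bytes are not a readable JAR", e);
        } finally {
            if (tmp != null) {
                tmp.toFile().delete(); // best-effort cleanup of the temp file
            }
        }
        return sizes;
    }

    /** Hypothetical test data: a one-entry JAR written in streaming mode. */
    static byte[] sampleJar() {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (JarOutputStream jos = new JarOutputStream(bos)) {
            jos.putNextEntry(new JarEntry("demo/Fake.class"));
            jos.write(new byte[] {1, 2, 3, 4, 5});
            jos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        sizesViaJarFile(sampleJar())
            .forEach((name, size) -> System.out.println(name + " -> " + size));
    }
}
```

If this succeeds and reports plausible sizes, the bytes are most likely a complete archive, and the -1 values point at how the entries were written rather than at corruption.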


[1] Classic ZIP and ZIP64 represent the sizes differently in the LOC headers. The ZipEntry code has to unpick this.

huangapple
  • Posted on 2023-06-11 22:15:45
  • Please keep this link when reposting: https://go.coder-hub.com/76450896.html