Java JarEntry getSize() returns -1
Question
EDIT 3
It turns out that some JAR files properly report entry.getSize() with JarInputStream and some don't. All the JAR files I have created going back to 2013 (on OSX, Java 8 and higher) work, as do various others such as mongo-spark-connector-10.0.0.jar. Others, such as antlr-runtime-4.7.2.jar, mongodb-driver-core-4.9.0.jar, and hadoop-azure-3.2.0.jar, do not. But all files properly report sizes when accessed via JarFile.
I have a valid JarInputStream js sourced from a database (i.e. not from files on the filesystem). In case there are some nuances to byte[], streams, and unzipping, this is how the JarInputStream is set up:
byte[] bb = get complete set of bytes; // print bb.length is 14523 so OK
ByteArrayInputStream bas = new ByteArrayInputStream(bb);
JarInputStream js = new JarInputStream(bas);
I iterate it in this way:
JarEntry entry;
while ((entry = js.getNextJarEntry()) != null) {
    if (!entry.isDirectory() && entry.getName().endsWith(".class")) {
        String className = entry.getName().replace('/', '.').substring(0, entry.getName().length() - 6);
        long len = entry.getSize();
        long zlen = entry.getCompressedSize();
        System.out.println(" class [" + className + "]: Z " + zlen + "; unZ " + len);
        if (len > 0) {
            byte[] classBytes = new byte[(int) len]; // array sizes must be int, so cast the long
            js.read(classBytes); // note: a single read() may return fewer bytes than requested
            System.out.println("captured [" + className + "]");
            classBytesMap.put(className, classBytes);
        }
        ...
The loop "works" in that it correctly pulls out all the class names, so it is clearly walking the input stream properly. However, both entry.getSize() and entry.getCompressedSize() are always -1. This is openjdk version "17.0.7" 2023-04-18.
The javadoc states that -1 means the size is unknown, but there must be a size or some other means by which to process just this entry before moving further down the stream.
I am not wed to this approach; the ultimate goal is to walk a JarInputStream and extract classname-classbyte entries.
EDIT
As a test, the byte[] bb array is written back to the file XX2.jar and then examined using the regular JarFile class. This works:
JarFile jf = new JarFile("XX2.jar");
for (java.util.Enumeration<JarEntry> e = jf.entries(); e.hasMoreElements();) {
    entry = e.nextElement();
    if (!entry.isDirectory() && entry.getName().endsWith(".class")) {
        String className = entry.getName().replace('/', '.').substring(0, entry.getName().length() - 6);
        int x = (int) entry.getSize();
        int cx = (int) entry.getCompressedSize();
        System.out.println(" class [" + className + "]: Z " + cx + "; unZ " + x);
    }
}
class [org.bson.AbstractBsonReader$1]: Z 506; unZ 792
class [org.bson.AbstractBsonReader$Context]: Z 497; unZ 1227
class [org.bson.AbstractBsonReader$Mark]: Z 813; unZ 2201
However, trying to read that file using a regular FileInputStream, as we see in many other SO examples, does not work:
JarInputStream jarInputStream = new JarInputStream(new FileInputStream("XX2.jar"));
The loop runs, but size and compressedSize are STILL -1.
EDIT 2
Here is a complete example that demos the problem:
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarFile;
import java.io.FileInputStream;

class jartest {
    public static void showEntry(JarEntry entry) {
        if (!entry.isDirectory() && entry.getName().endsWith(".class")) {
            String className = entry.getName().replace('/', '.').substring(0, entry.getName().length() - 6);
            int x = (int) entry.getSize();
            int cx = (int) entry.getCompressedSize();
            System.out.println(" class [" + className + "]: Z " + cx + "; unZ " + x);
        }
    }

    public static void main(String[] args) {
        try {
            JarEntry entry;
            String fname = "XX2.jar";
            // THIS WORKS
            JarFile jf = new JarFile(fname);
            for (java.util.Enumeration<JarEntry> e = jf.entries(); e.hasMoreElements();) {
                entry = e.nextElement();
                showEntry(entry);
            }
            // THIS DOES NOT WORK
            JarInputStream js = new JarInputStream(new FileInputStream(fname));
            while ((entry = js.getNextJarEntry()) != null) {
                showEntry(entry);
            }
        } catch (Exception e) {
            System.out.println("fail: " + e);
        }
    }
}
$ java jartest
class [grun$ExecutionContextImpl]: Z 706; unZ 1245
class [grun]: Z 3210; unZ 6476
class [grun$ExecutionContextImpl]: Z -1; unZ -1
class [grun]: Z -1; unZ -1
Any clues? Overall, it looks like the stream-based approach is not working, while the filename-based logic does work.
Answer 1
Score: 1
According to the javadoc, -1 means that the respective size is unknown.
According to the source code, the values of a ZipEntry / JarEntry's size fields are read from the ZIP or JAR file itself. The logic is a bit complicated<sup>1</sup>, but the immediate explanation for getting a -1 is either that the code was unable to extract sizes from the LOC (local file) header, or that the sizes were -1 in the LOC header.
To figure out what is actually going on in your case, you will need to decode the JAR file's headers by hand. Note that a JAR file is actually a ZIP file with a manifest, so you can use the format description in the ZIP file format Wikipedia page as a reference.
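As a starting point for decoding the headers by hand, here is a minimal sketch that reads the fixed fields of the first local file (LOC) header from a byte array. It assumes the archive starts with a LOC header at offset 0 (no prepended data) and uses the field offsets from the ZIP format description. The class and method names are hypothetical, and the sample JAR is built in memory only so the sketch has something to inspect. One thing worth checking is general-purpose flag bit 3: when it is set, the sizes are stored in a data descriptor after the entry data rather than in the LOC header, which would be consistent with a streaming reader not knowing them up front.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

class LocHeaderDump {
    /** Describes the local file header assumed to start at offset 0 of a ZIP/JAR byte array. */
    static String describeFirstLoc(byte[] jarBytes) {
        // ZIP headers are little-endian.
        ByteBuffer buf = ByteBuffer.wrap(jarBytes).order(ByteOrder.LITTLE_ENDIAN);
        // The fixed part of a LOC header is 30 bytes and starts with "PK\3\4".
        if (buf.remaining() < 30 || buf.getInt(0) != 0x04034b50) {
            throw new IllegalArgumentException("no local file header at offset 0");
        }
        int flags = buf.getShort(6) & 0xFFFF;           // general-purpose bit flags
        int method = buf.getShort(8) & 0xFFFF;          // 0 = stored, 8 = deflated
        long compressedSize = buf.getInt(18) & 0xFFFFFFFFL;
        long size = buf.getInt(22) & 0xFFFFFFFFL;
        int nameLen = buf.getShort(26) & 0xFFFF;
        String name = new String(jarBytes, 30, nameLen, StandardCharsets.UTF_8);
        // Bit 3 set: sizes and CRC follow the entry data in a data descriptor.
        boolean deferred = (flags & 0x08) != 0;
        return name + ": method=" + method
                + " compressedSize=" + compressedSize
                + " size=" + size
                + " sizesDeferredToDataDescriptor=" + deferred;
    }

    /** Builds a one-entry JAR in memory (illustrative data only). */
    static byte[] sampleJar() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (JarOutputStream jos = new JarOutputStream(bos)) {
            jos.putNextEntry(new JarEntry("demo/Hello.class"));
            jos.write(new byte[] {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE});
            jos.closeEntry();
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(describeFirstLoc(sampleJar()));
    }
}
```

Running the same method over the bytes of a JAR that reports -1 and one that does not should show whether the flag and the size fields differ between them.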
As suggested by @g00se's comment, the problem could be that the JAR file you are trying to read is incomplete or malformed. So another alternative would be to write the bytes to a file and see if a ZIP tool or the jar command can read the file.
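The same check can also be done from Java. The following is a rough sketch, not a definitive test: it writes the bytes to a temporary file and opens it with JarFile, which throws an IOException if it cannot read the archive. The class name, method name, and sample data are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

class JarFileSizeCheck {
    /** Writes the bytes to a temp file and returns entry name -> uncompressed size as seen by JarFile. */
    static Map<String, Long> sizesViaJarFile(byte[] jarBytes) throws IOException {
        Path tmp = Files.createTempFile("check", ".jar");
        try {
            Files.write(tmp, jarBytes);
            Map<String, Long> sizes = new LinkedHashMap<>();
            // Opening fails with an IOException if the archive cannot be read.
            try (JarFile jf = new JarFile(tmp.toFile())) {
                for (Enumeration<JarEntry> e = jf.entries(); e.hasMoreElements();) {
                    JarEntry entry = e.nextElement();
                    sizes.put(entry.getName(), entry.getSize());
                }
            }
            return sizes;
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    /** Builds a one-entry JAR in memory (illustrative data only). */
    static byte[] sampleJar() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (JarOutputStream jos = new JarOutputStream(bos)) {
            jos.putNextEntry(new JarEntry("demo/Hello.class"));
            jos.write(new byte[] {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE});
            jos.closeEntry();
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        sizesViaJarFile(sampleJar()).forEach((name, size) -> System.out.println(name + " -> " + size));
    }
}
```

If this succeeds and reports sensible sizes, as the question's own JarFile test did, the archive is probably not truncated, and the -1 values are more likely a property of reading it as a stream.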
<sup>1 - Classic ZIP and ZIP64 represent the sizes differently in the LOC headers. The ZipEntry code has to unpick this.</sup>
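For the stated goal of collecting classname-to-bytes pairs from a JarInputStream, one possible workaround is to avoid getSize() altogether and read each entry until the stream signals the end of that entry. The sketch below assumes Java 9 or later for InputStream.readAllBytes(). The class and method names are hypothetical, and this is not taken from the question or from the answer above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

class JarStreamReader {
    /** Reads every .class entry from the stream into a className -> bytes map. */
    static Map<String, byte[]> readClasses(InputStream in) throws IOException {
        Map<String, byte[]> classes = new LinkedHashMap<>();
        try (JarInputStream js = new JarInputStream(in)) {
            JarEntry entry;
            while ((entry = js.getNextJarEntry()) != null) {
                String name = entry.getName();
                if (entry.isDirectory() || !name.endsWith(".class")) {
                    continue;
                }
                String className = name.substring(0, name.length() - ".class".length())
                        .replace('/', '.');
                // read() returns -1 at the end of the current entry, so this
                // collects exactly this entry's bytes without needing its size.
                classes.put(className, js.readAllBytes());
            }
        }
        return classes;
    }

    /** Builds a small JAR in memory (illustrative data only). */
    static byte[] sampleJar() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (JarOutputStream jos = new JarOutputStream(bos)) {
            jos.putNextEntry(new JarEntry("demo/Hello.class"));
            jos.write(new byte[] {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE});
            jos.closeEntry();
            jos.putNextEntry(new JarEntry("README.txt"));
            jos.write("not a class".getBytes(StandardCharsets.UTF_8));
            jos.closeEntry();
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> classes = readClasses(new ByteArrayInputStream(sampleJar()));
        classes.forEach((name, bytes) -> System.out.println(name + ": " + bytes.length + " bytes"));
    }
}
```

Because the loop never allocates a buffer from getSize(), it should behave the same whether or not the archive reports sizes in its local headers.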