Java Bytecode Mystery: Illegal Operation Order in Constructors
Question
I've been reverse engineering a Java app and stumbled upon something interesting: the bytecode of one constructor seems to break the rules by not initializing the superclass first.
I'm trying to figure out how this is possible. Is this normal behavior for a Java compiler, or is it some sneaky obfuscation technique? (Note: the original class name hasn't been stripped by the obfuscator, which suggests the obfuscation was not very thorough, so the bytecode structure is less likely to be a deliberate result of obfuscation.)
Could anyone offer some insight into what the original code might have looked like to generate such unconventional bytecode? I'm eager to learn and unravel this mystery. Thanks a bunch!
Here is the bytecode.
final class a/ka$a extends java/lang/Thread {
<ClassVersion=51>
<SourceFile=CLThreadPool.java>
private synthetic a.ka a;
public ka$a(a.ka arg0, java.lang.String arg1, boolean arg2) { // <init> //(La/ka;Ljava/lang/String;Z)V
<localVar:index=0 , name=this , desc=La/ka$a;, sig=null, start=L0, end=L4>
<localVar:index=2 , name=name , desc=Ljava/lang/String;, sig=null, start=L0, end=L4>
<localVar:index=3 , name=daemon , desc=Z, sig=null, start=L0, end=L4>
L0 {
aload 0 // reference to self
aload 1 // reference to arg0
putfield a/ka$a.a:a.ka
}
L1 {
aload 0 // reference to self
aload 1 // reference to arg0
new java/lang/StringBuilder
dup
aload 2 // reference to arg1
invokestatic java/lang/String.valueOf(Ljava/lang/Object;)Ljava/lang/String;
invokespecial java/lang/StringBuilder.<init>(Ljava/lang/String;)V
ldc ".pool[" (java.lang.String)
invokevirtual java/lang/StringBuilder.append(Ljava/lang/String;)Ljava/lang/StringBuilder;
aload 1 // reference to arg0
dup
invokestatic a/ka.a(La/ka;)I
dup_x1
iconst_1
iadd
invokestatic a/ka.a(La/ka;I)V
invokevirtual java/lang/StringBuilder.append(I)Ljava/lang/StringBuilder;
ldc "]" (java.lang.String)
invokevirtual java/lang/StringBuilder.append(Ljava/lang/String;)Ljava/lang/StringBuilder;
invokevirtual java/lang/StringBuilder.toString()Ljava/lang/String;
invokespecial java/lang/Thread.<init>(Ljava/lang/ThreadGroup;Ljava/lang/String;)V
}
L2 {
aload 0 // reference to self
iload 3
invokevirtual a/ka$a.setDaemon(Z)V
}
L3 {
return
}
L4 {
}
}
The compiler preferences were also still in the obfuscated jar:
eclipse.preferences.version=1
org.eclipse.jdt.core.compiler.codegen.inlineJsrBytecode=enabled
org.eclipse.jdt.core.compiler.codegen.methodParameters=do not generate
org.eclipse.jdt.core.compiler.codegen.targetPlatform=1.7
org.eclipse.jdt.core.compiler.codegen.unusedLocal=preserve
org.eclipse.jdt.core.compiler.compliance=1.7
org.eclipse.jdt.core.compiler.debug.lineNumber=generate
org.eclipse.jdt.core.compiler.debug.localVariable=generate
org.eclipse.jdt.core.compiler.debug.sourceFile=generate
org.eclipse.jdt.core.compiler.problem.assertIdentifier=error
org.eclipse.jdt.core.compiler.problem.enumIdentifier=error
org.eclipse.jdt.core.compiler.release=disabled
org.eclipse.jdt.core.compiler.source=1.7
I've considered the possibility that a static method was involved and that its logic got inlined during compilation, but despite my attempts I haven't been able to reproduce similar output. I also noticed the synthetic field and the fact that this class is an inner class. These factors seem to play a role in the unusual bytecode structure.
Answer 1
Score: 6
Loading `this` and assigning its instance fields before calling another constructor is perfectly allowed by the JVM. It is the Java language that does not allow setting instance fields before calling `this(...)` or `super(...)`.
From the JVM spec:
> Each instance initialization method, except for the instance initialization method derived from the constructor of class `Object`, must call either another instance initialization method of `this` or an instance initialization method of its direct superclass `super` before its instance members are accessed.
>
> However, instance fields of `this` that are declared in the current class may be assigned by `putfield` before calling any instance initialization method.
So the JVM does not allow reading instance fields before calling another constructor, but it does allow writing them. A Java compiler is free to rearrange the code into something like this, provided it can prove that the behavior is unchanged.
It could also be the work of an obfuscator. One possibility is that this is intended to make (naive) decompilers output illegal code. A decompiler looking at this might read the line numbers and assume that there is a `this.a = arg0;` statement on the first line, so the decompiler's output would not compile.
In your particular case, based on the fact that the field is synthetic and this is an inner class, this field is highly likely to store the enclosing instance.
For example, the `Inner` class below needs to store an instance of `Outer`, and a synthetic field is created for that.
class Outer {
    class Inner extends Thread {
        // the JVM representation of Inner's constructor takes a parameter of Outer
        // so that it can be assigned to the field storing the enclosing instance
    }
}
My compiler generates a `putfield` before the superclass constructor call that assigns the first constructor parameter to the synthetic field. In Java code, the `Inner` class would look something like this (this is invalid Java code, just for illustrative purposes):
class Inner extends Thread {
    private final Outer $this0;
    Inner(Outer arg0) {
        this.$this0 = arg0;
        super();
    }
}
Generating the `putfield` before the `super` call is required (credits to Holger for informing me of this!), because the superclass constructor could call a method that the subclass overrides. That method could access the enclosing instance.
class Outer {
    public void outerMethod() {
    }
    class Inner extends SomeSuperClass {
        @Override
        public void superclassConstructorWillCallThis() {
            // This accesses the enclosing instance, i.e. Outer.this
            outerMethod();
            // if the enclosing instance field is not set before the superclass constructor call,
            // this call will throw a NullPointerException
        }
    }
}
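To see this ordering matter in practice, here is a small runnable sketch. All names in it (`EnclosingInstanceDemo`, `Base`, `describe`, `label`) are made up for illustration and do not come from the question. The superclass constructor calls an overridden method that reads a field of the enclosing instance. With a compiler that stores the enclosing instance before the `super` call, as described above, the read succeeds.

```java
public class EnclosingInstanceDemo {
    // Deliberately not a compile-time constant, so the inner class really has to
    // go through its (synthetic) enclosing-instance field to read it.
    private final String label;

    EnclosingInstanceDemo(String label) {
        this.label = label;
    }

    // Hypothetical superclass whose constructor calls an overridable method.
    static abstract class Base {
        final String seenDuringConstruction;

        Base() {
            // Runs before any constructor body of the subclass.
            seenDuringConstruction = describe();
        }

        abstract String describe();
    }

    class Inner extends Base {
        @Override
        String describe() {
            // Implicitly reads EnclosingInstanceDemo.this.label.
            return "enclosing label = " + label;
        }
    }

    static String run() {
        EnclosingInstanceDemo outer = new EnclosingInstanceDemo("outer-label");
        Inner inner = outer.new Inner();
        return inner.seenDuringConstruction;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

If the enclosing-instance field were only assigned after the `super` call, `describe()` would run against a null enclosing instance and the construction would fail with a `NullPointerException`.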
Answer 2
Score: 4
In addition to the answer from Sweeper, the original code might have looked like this:
class ka {
    static int a = 0;
    class a extends Thread {
        a(String arg1, boolean arg2) {
            super(arg1 + ".pool[" + a++ + "]");
            setDaemon(arg2);
        }
    }
}
Compiling this fragment with Java 8 will give bytecode similar to yours.
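Reading the posted bytecode a little more literally suggests a possible refinement, offered here only as a guess. The `invokespecial` targets `Thread.<init>(ThreadGroup, String)` with `arg0` as the first argument, so the outer class is presumably a `ThreadGroup` subclass. The two static `a/ka.a(...)` calls take the outer instance as a parameter, which looks like synthetic accessors for a private instance field, not a static one. The sketch below follows that reading. `CLThreadPool` is borrowed from the `SourceFile` attribute, `name` and `daemon` from the local variable table, and `Worker` and `counter` are invented.

```java
// Hypothetical reconstruction; class and field names are guesses.
public class CLThreadPool extends ThreadGroup {
    // Private instance counter; older compilers reach it from the inner class
    // through synthetic static accessors, like the a/ka.a(...) calls in the question.
    private int counter;

    public CLThreadPool(String name) {
        super(name);
    }

    final class Worker extends Thread {
        Worker(String name, boolean daemon) {
            // The enclosing instance doubles as the thread group argument.
            super(CLThreadPool.this, name + ".pool[" + counter++ + "]");
            setDaemon(daemon);
        }
    }

    static String[] demo() {
        CLThreadPool pool = new CLThreadPool("workers");
        Worker first = pool.new Worker("io", true);
        Worker second = pool.new Worker("io", true);
        return new String[] { first.getName(), second.getName() };
    }

    public static void main(String[] args) {
        for (String threadName : demo()) {
            System.out.println(threadName);
        }
    }
}
```

Whether a given compiler emits exactly the same instruction sequence depends on its version and settings, so treat this as a starting point for experiments, not a confirmed original.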