Java Bytecode Mystery: Illegal Operation Order in Constructors

Question

I've been reverse engineering a Java app and stumbled upon something interesting. The bytecode I found seems to break the rules: the constructor does not initialize the superclass first.

I'm trying to figure out how this is possible. Could this be normal behavior for a Java compiler, or is it some sneaky obfuscation technique? (Note: the obfuscator hasn't stripped the original class name, which suggests the obfuscation was not very thorough, so the bytecode structure is less likely to be a deliberate result of obfuscation.)

Could anyone offer some insight into what the original code might have looked like to generate such unconventional bytecode? I'm eager to learn and unravel this mystery. Thanks a bunch!

Here is the bytecode.

final class a/ka$a extends java/lang/Thread {
     <ClassVersion=51>
     <SourceFile=CLThreadPool.java>

     private synthetic a.ka a;

     public ka$a(a.ka arg0, java.lang.String arg1, boolean arg2) { // <init> //(La/ka;Ljava/lang/String;Z)V
         <localVar:index=0 , name=this , desc=La/ka$a;, sig=null, start=L0, end=L4>
         <localVar:index=2 , name=name , desc=Ljava/lang/String;, sig=null, start=L0, end=L4>
         <localVar:index=3 , name=daemon , desc=Z, sig=null, start=L0, end=L4>

         L0 {
             aload 0 // reference to self
             aload 1 // reference to arg0
             putfield a/ka$a.a:a.ka
         }
         L1 {
             aload 0 // reference to self
             aload 1 // reference to arg0
             new java/lang/StringBuilder
             dup
             aload 2 // reference to arg1
             invokestatic java/lang/String.valueOf(Ljava/lang/Object;)Ljava/lang/String;
             invokespecial java/lang/StringBuilder.<init>(Ljava/lang/String;)V
             ldc ".pool[" (java.lang.String)
             invokevirtual java/lang/StringBuilder.append(Ljava/lang/String;)Ljava/lang/StringBuilder;
             aload 1 // reference to arg0
             dup
             invokestatic a/ka.a(La/ka;)I
             dup_x1
             iconst_1
             iadd
             invokestatic a/ka.a(La/ka;I)V
             invokevirtual java/lang/StringBuilder.append(I)Ljava/lang/StringBuilder;
             ldc "]" (java.lang.String)
             invokevirtual java/lang/StringBuilder.append(Ljava/lang/String;)Ljava/lang/StringBuilder;
             invokevirtual java/lang/StringBuilder.toString()Ljava/lang/String;
             invokespecial java/lang/Thread.<init>(Ljava/lang/ThreadGroup;Ljava/lang/String;)V
         }
         L2 {
             aload 0 // reference to self
             iload 3
             invokevirtual a/ka$a.setDaemon(Z)V
         }
         L3 {
             return
         }
         L4 {
         }
     }


The compiler preferences were also still in the obfuscated jar:

eclipse.preferences.version=1
org.eclipse.jdt.core.compiler.codegen.inlineJsrBytecode=enabled
org.eclipse.jdt.core.compiler.codegen.methodParameters=do not generate
org.eclipse.jdt.core.compiler.codegen.targetPlatform=1.7
org.eclipse.jdt.core.compiler.codegen.unusedLocal=preserve
org.eclipse.jdt.core.compiler.compliance=1.7
org.eclipse.jdt.core.compiler.debug.lineNumber=generate
org.eclipse.jdt.core.compiler.debug.localVariable=generate
org.eclipse.jdt.core.compiler.debug.sourceFile=generate
org.eclipse.jdt.core.compiler.problem.assertIdentifier=error
org.eclipse.jdt.core.compiler.problem.enumIdentifier=error
org.eclipse.jdt.core.compiler.release=disabled
org.eclipse.jdt.core.compiler.source=1.7

I've considered the possibility that a static method was involved and that its logic ended up getting inlined during compilation, but I haven't been able to reproduce similar output. I also noticed that the class has a synthetic field and is an inner class. These factors seem to play a role in the unusual bytecode structure.

Answer 1

Score: 6

Loading this and assigning its instance fields before calling another constructor is perfectly allowed by the JVM. It is the Java language that does not allow setting instance fields before calling this(...) or super(...).

From the JVM spec:

> Each instance initialization method, except for the instance initialization method derived from the constructor of class Object, must call either another instance initialization method of this or an instance initialization method of its direct superclass super before its instance members are accessed.
>
> However, instance fields of this that are declared in the current class may be assigned by putfield before calling any instance initialization method.

So the JVM does not allow reading instance fields before calling another constructor, but it allows writes. A Java compiler is free to rearrange the code to produce something like this, provided it can prove that the behavior is the same.

It could also be the work of an obfuscator. One possibility is that this is intended to cause (naive) decompilers to output illegal code. A decompiler looking at this might read the line numbers and assume that there is a this.a = arg0; statement on the first line, causing the decompiler's output to not compile.

In your particular case, based on the fact that the field is synthetic and this is an inner class, this field is highly likely to store the enclosing instance.

For example, the Inner class below needs to store an instance of Outer, and a synthetic field is created for that.

class Outer {
    class Inner extends Thread {
        // the JVM representation of Inner's constructor takes a parameter of Outer
        // so that it can be assigned to the field storing the enclosing instance
    }
}

My compiler generates a putfield before the superclass constructor call, which assigns the first constructor parameter to the synthetic field. In Java code, the Inner class would look something like this (this is invalid Java code, just for illustrative purposes):

class Inner extends Thread {
    // compilers typically name this synthetic field this$0
    private final Outer this$0;

    Inner(Outer arg0) {
        this.this$0 = arg0;
        super();
    }
}

Generating putfield before the super call is required (credits to Holger for informing me of this!), because the superclass constructor could call a method that the subclass overrides. That method could access the enclosing instance.

class Outer {
    public void outerMethod() {

    }

    class Inner extends SomeSuperClass {
        
        @Override
        public void superclassConstructorWillCallThis() {
            // This accesses the enclosing instance, i.e. Outer.this
            outerMethod();
            // if the enclosing instance field is not set before the superclass constructor call,
            // this call will throw a NullPointerException
        }
    }
}
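
The scenario above can be tried out with a small self-contained sketch. All names here (Base, hook, describe, seenDuringSuper) are made up for illustration and do not come from the decompiled application. The sketch assumes a reasonably modern compiler, which emits the putfield for the enclosing instance before the superclass constructor call, so the overridden method can already reach the Outer instance while Base's constructor is still running.

```java
// Hypothetical names throughout: Base, hook, describe and seenDuringSuper are
// invented for this sketch and do not come from the decompiled application.
class Base {
    Base() {
        // Runs before the subclass constructor body and dispatches to the override.
        hook();
    }

    void hook() {
    }
}

class Outer {
    private final String label;

    Outer(String label) {
        this.label = label;
    }

    String describe() {
        return "Outer(" + label + ")";
    }

    class Inner extends Base {
        // No initializer on purpose: an initializer would run after Base()
        // returns and overwrite the value stored by hook().
        String seenDuringSuper;

        @Override
        void hook() {
            // Implicitly Outer.this.describe(), so the synthetic enclosing-instance
            // field must already be assigned when Base() calls this method.
            seenDuringSuper = describe();
        }
    }

    public static void main(String[] args) {
        Inner inner = new Outer("pool").new Inner();
        System.out.println(inner.seenDuringSuper); // prints "Outer(pool)"
    }
}
```

If the synthetic field were assigned only after the superclass constructor returned, the call to describe() inside hook() would fail with a NullPointerException instead.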

Answer 2

Score: 4

In addition to the answer from Sweeper, the original code might have looked like this:

class ka {
    static int a = 0;
    class a extends Thread {
        a(String arg1, boolean arg2) {
            super(arg1 + ".pool["+ a++ + "]");
            setDaemon(arg2);
        }
    }
}

Compiling this fragment with Java 8 will give bytecode similar to yours.
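
A somewhat closer reconstruction can be sketched from two further details in the question's bytecode, though this is only a guess at the original source. First, arg0 (the enclosing instance) is passed as the ThreadGroup argument of Thread.<init>(ThreadGroup, String), which suggests the outer class extends ThreadGroup. Second, the counter is read and written through static accessors that take the outer instance (a/ka.a(La/ka;)I and a/ka.a(La/ka;I)V), which is what a compiler targeting 1.7 generates for a private instance field of the outer class. CLThreadPool is taken from the SourceFile attribute, and name and daemon from the local variable table. Worker, counter, and the outer constructor are hypothetical.

```java
// Hypothetical reconstruction, not the confirmed original source.
// CLThreadPool comes from the SourceFile attribute; Worker and counter are invented names.
class CLThreadPool extends ThreadGroup {
    // A private instance field: the inner class reaches it through the outer
    // instance (via synthetic accessor methods when compiled for Java 7).
    private int counter;

    CLThreadPool(String name) {
        super(name);
    }

    final class Worker extends Thread {
        public Worker(String name, boolean daemon) {
            // The enclosing instance doubles as the thread group, matching
            // Thread.<init>(ThreadGroup, String) in the bytecode.
            super(CLThreadPool.this, name + ".pool[" + counter++ + "]");
            setDaemon(daemon);
        }
    }

    public static void main(String[] args) {
        CLThreadPool pool = new CLThreadPool("demo");
        Worker first = pool.new Worker("io", true);
        Worker second = pool.new Worker("io", false);
        System.out.println(first.getName());  // prints "io.pool[0]"
        System.out.println(second.getName()); // prints "io.pool[1]"
    }
}
```

Both CLThreadPool.this and counter are legal inside the super(...) arguments because they refer to the enclosing instance rather than to the object under construction, which is why the compiler has to store the enclosing instance before invoking the superclass constructor.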

huangapple
  • Published on July 18, 2023
  • Please keep this link when reposting: https://go.coder-hub.com/76710612.html
