Why is an exception raised on some Java files when using the javalang Python package?

Question


I am aiming to obtain all method names in a Java file and count their lines of code.
I actually managed to obtain something using the javalang parser for Python. It identifies all methods but not constructors, which makes me think there is a specific property I should use to check whether a constructor exists and retrieve its name (any idea?).

The javalang Python package works well most of the time, but it fails on some files like the following one, which raises an exception:

/*
 * Decompiled with CFR 0.152.
 */
package com.vaadin.flow.di;
import com.vaadin.flow.component.Component;
import com.vaadin.flow.component.HasElement;
import com.vaadin.flow.component.UI;
import com.vaadin.flow.i18n.I18NProvider;
import com.vaadin.flow.router.NavigationEvent;
import com.vaadin.flow.server.BootstrapListener;
import com.vaadin.flow.server.DependencyFilter;
import com.vaadin.flow.server.VaadinService;
import com.vaadin.flow.server.VaadinServiceInitListener;
import com.vaadin.flow.server.VaadinSession;
import com.vaadin.flow.server.communication.IndexHtmlRequestListener;
import java.io.Serializable;
import java.util.stream.Stream;
public interface Instantiator
extends Serializable {
    @Deprecated
    public boolean init(VaadinService var1);
    public Stream<VaadinServiceInitListener> getServiceInitListeners();
    @Deprecated
    default public Stream<BootstrapListener> getBootstrapListeners(Stream<BootstrapListener> serviceInitListeners) {
        return serviceInitListeners;
    }
    default public Stream<IndexHtmlRequestListener> getIndexHtmlRequestListeners(Stream<IndexHtmlRequestListener> indexHtmlRequestListeners) {
        return indexHtmlRequestListeners;
    }
    default public Stream<DependencyFilter> getDependencyFilters(Stream<DependencyFilter> serviceInitFilters) {
        return serviceInitFilters;
    }
    public <T> T getOrCreate(Class<T> var1);
    default public <T extends HasElement> T createRouteTarget(Class<T> routeTargetType, NavigationEvent event) {
        return (T)((HasElement)this.getOrCreate(routeTargetType));
    }
    public <T extends Component> T createComponent(Class<T> var1);
    public static Instantiator get(UI ui) {
        if (!1.$assertionsDisabled && ui == null) {
            throw new AssertionError();
        }
        VaadinSession session = ui.getSession();
        if (!1.$assertionsDisabled && session == null) {
            throw new AssertionError();
        }
        return session.getService().getInstantiator();
    }
    default public I18NProvider getI18NProvider() {
        return this.getOrCreate(I18NProvider.class);
    }
    static {
        if (1.$assertionsDisabled) {
            // empty if block
        }
    }
}

which causes an exception:

Traceback (most recent call last):
  File "/Users/irene/PycharmProjects/pythonProject/main.py", line 113, in <module>
    main()
  File "/Users/irene/PycharmProjects/pythonProject/main.py", line 109, in main
    jlca.start()
  File "/Users/irene/PycharmProjects/pythonProject/JLCodeAnalyzer.py", line 80, in start
    tree = javalang.parse.parse(code_text)
  File "/usr/local/lib/python3.10/site-packages/javalang/parse.py", line 53, in parse
    return parser.parse()
  File "/usr/local/lib/python3.10/site-packages/javalang/parser.py", line 110, in parse
    return self.parse_compilation_unit()
  File "/usr/local/lib/python3.10/site-packages/javalang/parser.py", line 302, in parse_compilation_unit
    type_declaration = self.parse_type_declaration()
  File "/usr/local/lib/python3.10/site-packages/javalang/parser.py", line 347, in parse_type_declaration
    return self.parse_class_or_interface_declaration()
  File "/usr/local/lib/python3.10/site-packages/javalang/parser.py", line 360, in parse_class_or_interface_declaration
    type_declaration = self.parse_normal_interface_declaration()
  File "/usr/local/lib/python3.10/site-packages/javalang/parser.py", line 436, in parse_normal_interface_declaration
    body = self.parse_interface_body()
  File "/usr/local/lib/python3.10/site-packages/javalang/parser.py", line 966, in parse_interface_body
    declaration = self.parse_interface_body_declaration()
  File "/usr/local/lib/python3.10/site-packages/javalang/parser.py", line 981, in parse_interface_body_declaration
    declaration = self.parse_interface_member_declaration()
  File "/usr/local/lib/python3.10/site-packages/javalang/parser.py", line 1008, in parse_interface_member_declaration
    declaration = self.parse_interface_method_or_field_declaration()
  File "/usr/local/lib/python3.10/site-packages/javalang/parser.py", line 1018, in parse_interface_method_or_field_declaration
    member = self.parse_interface_method_or_field_rest()
  File "/usr/local/lib/python3.10/site-packages/javalang/parser.py", line 1035, in parse_interface_method_or_field_rest
    rest = self.parse_interface_method_declarator_rest()
  File "/usr/local/lib/python3.10/site-packages/javalang/parser.py", line 1082, in parse_interface_method_declarator_rest
    body = self.parse_block()
  File "/usr/local/lib/python3.10/site-packages/javalang/parser.py", line 1274, in parse_block
    statement = self.parse_block_statement()
  File "/usr/local/lib/python3.10/site-packages/javalang/parser.py", line 1339, in parse_block_statement
    return self.parse_statement()
  File "/usr/local/lib/python3.10/site-packages/javalang/parser.py", line 1389, in parse_statement
    condition = self.parse_par_expression()
  File "/usr/local/lib/python3.10/site-packages/javalang/parser.py", line 2032, in parse_par_expression
    self.accept(')')
  File "/usr/local/lib/python3.10/site-packages/javalang/parser.py", line 131, in accept
    self.illegal("Expected '%s'" % (accept,))
  File "/usr/local/lib/python3.10/site-packages/javalang/parser.py", line 119, in illegal
    raise JavaSyntaxError(description, at)
javalang.parser.JavaSyntaxError

And here is the JLCodeAnalyzer:

import javalang
import os
from javalang import tree

class JLCodeAnalyzer:
    def __init__(self, code_path):
        self.code_path = code_path
        self.codelines = None

    def get_method_start_end(self, method_node, tree):
        startpos = None
        endpos = None
        startline = None
        endline = None
        for path, node in tree:
            if startpos is not None and method_node not in path:
                endpos = node.position
                endline = node.position.line if node.position is not None else None
                break
            if startpos is None and node == method_node:
                startpos = node.position
                startline = node.position.line if node.position is not None else None
        return startpos, endpos, startline, endline

    def get_method_text(self, startpos, endpos, startline, endline, last_endline_index, codelines, tree):
        if startpos is None:
            return "", None, None, None
        else:
            startline_index = startline - 1
            endline_index = endline - 1 if endpos is not None else None

            # 1. check for and fetch annotations
            if last_endline_index is not None:
                for line in codelines[(last_endline_index + 1):(startline_index)]:
                    if "@" in line:
                        startline_index = startline_index - 1
            meth_text = "<ST>".join(codelines[startline_index:endline_index])
            meth_text = meth_text[:meth_text.rfind("}") + 1]

            # 2. remove trailing rbrace for last methods & any external content/comments
            # if endpos is None and
            if not abs(meth_text.count("}") - meth_text.count("{")) == 0:
                # imbalanced braces
                brace_diff = abs(meth_text.count("}") - meth_text.count("{"))

                for _ in range(brace_diff):
                    meth_text = meth_text[:meth_text.rfind("}")]
                    meth_text = meth_text[:meth_text.rfind("}") + 1]

            meth_lines = meth_text.split("<ST>")
            meth_text = "".join(meth_lines)
            last_endline_index = startline_index + (len(meth_lines) - 1)

            return meth_text, (startline_index + 1), (last_endline_index + 1), last_endline_index

    def get_java_files(self, directory):
        '''
        :param directory: path to the main directory of java files
        :return: list of java files found
        search for all .java files recursively in "directory"
        '''

        java_files = []
        for root, _, files in os.walk(directory):
            for file in files:
                if file.endswith(".java"):
                    java_files.append(os.path.join(root, file))
        return java_files

    def start(self):
        java_files = self.get_java_files(self.code_path)

        for target_file in java_files:
            with open(target_file, 'r') as r:
                codelines = r.readlines()
                code_text = ''.join(codelines)
                lex = None
                print("working on ", target_file)
                tree = javalang.parse.parse(code_text)
                methods = {}
                for _, method_node in tree.filter(javalang.tree.MethodDeclaration):
                    startpos, endpos, startline, endline = self.get_method_start_end(method_node, tree)
                    method_text, startline, endline, lex = self.get_method_text(startpos, endpos, startline, endline, lex, codelines, tree)
                    methods[method_node.name] = method_text

                print(f"total methods in {target_file} = {len(methods)}")

I tried compiling the code online and got some syntax errors.
For example:

error: ')' expected
if (!1.$assertionsDisabled && ui == null) {
^

But this code was downloaded from a Maven repository, so I would expect its syntax to be correct!
Or am I wrong?

Answer 1

Score: 2


That is not valid Java code. For example:

    if (!1.$assertionsDisabled && ui == null) {

I suspect that what you have actually done is downloaded a .class file and decompiled it<sup>1</sup>. Unfortunately, the decompiler has not worked properly, and what you have there cannot be compiled.

My suggestions are to try to get the actual Java source code (if it exists) or just exclude it from your analysis.
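
If excluding such files is the route taken, one way to do it is to catch the parser's exception and move on to the next file. The following is only a minimal sketch under stated assumptions: `parse_or_skip` and `javalang_parse_or_skip` are hypothetical helper names, not part of javalang, and the input is assumed to be a dict mapping file names to source text. `javalang.parser.JavaSyntaxError` is the exception shown in the traceback above; `javalang.tokenizer.LexerError` is included on the assumption that some decompiled files may also fail during tokenizing.

```python
def parse_or_skip(sources, parse, errors):
    """Parse each source text and collect the names of those that fail.

    sources: dict mapping a file name to its text
    parse:   callable that turns text into a syntax tree
    errors:  tuple of exception types meaning "this file is unparsable"
    """
    trees, skipped = {}, []
    for name, text in sources.items():
        try:
            trees[name] = parse(text)
        except errors:
            skipped.append(name)  # exclude the file instead of aborting the run
    return trees, skipped


def javalang_parse_or_skip(sources):
    """Wire the helper to javalang (third-party package, imported lazily)."""
    import javalang
    errors = (javalang.parser.JavaSyntaxError, javalang.tokenizer.LexerError)
    return parse_or_skip(sources, javalang.parse.parse, errors)
```

In the question's `start()` method, the call to `javalang.parse.parse(code_text)` could be wrapped in the same `try`/`except` so that a single broken decompiled file no longer stops the whole analysis, and the skipped names could be logged for later inspection.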

There are hints (e.g. in the import statements) that the ".class" file you have decompiled was originally compiled from Vaadin rather than Java.


<sup>1 - Note that a Maven repository will typically hold compiled artifacts rather than source code. You would typically get source code from a source code repository.</sup>

huangapple
  • Posted on April 4, 2023, 15:46:51
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75926758.html