Does a multi-language parsing / function extraction toolkit exist?
Question
I'm looking for a way to extract function names and their definitions from multiple different programming languages. I would like to avoid writing extractors by hand as I want to support about 15 programming languages.
Is there a library / program that could be used to achieve this? Searching didn't give me any useful results.
I'm currently using Go for my application, but I don't mind handling this in a different language.
The app itself will be open-source so proprietary solutions are not desired.
Answer 1
Score: 2
If you just want to extract functions instead of parsing the source files then the traditional way to do this is using ctags.
Most Unix-like OSes either come with ctags already installed or have it available as a package. However, ctags is not a single program. Like other Unix utilities, it may have started as a single program, but by now there are several implementations of ctags.
The most widely used implementation is probably Exuberant Ctags. It has fairly good coverage of languages, but it does not handle a lot of the more modern ones (for example, it does not natively handle Go). It currently supports around 40 languages: http://ctags.sourceforge.net/languages.html
Universal Ctags is a more recent project and, I believe, started as a fork of Exuberant Ctags. Universal Ctags supports a lot more languages (including Go): https://github.com/universal-ctags/ctags/tree/master/parsers
Ctags generates a tags file containing information about all the objects found. The actual format of the tags file depends on the implementation of the ctags program, but it generally contains the type of object found (variable, class, function, etc.), the file it was found in, the line number and, for Exuberant Ctags, the search term you need to find the object (sometimes a string literal, sometimes a regexp).
Answer 2
Score: 1
This isn't easy to do, because each language has different rules about legal syntax and what constitutes a "function".
I can offer my company's DMS Software Reengineering Toolkit as a way to do this. We've fought the battle of parsing multiple languages (maybe all of your 15, see list of languages supported by DMS) and building various kinds of fact-extraction machinery. You'd have to customize it for the specific facts you want to extract.
[Yes, it's proprietary. The OP added the non-proprietary requirement after I answered this question. Other folks might not have this constraint.]