Posted 2023-07-07 03:32:51

I am looking for a regular expression in Python that identifies all the function bodies in a C file

Question


I am looking for a regular expression in Python that identifies every C function definition.

I want to automate inserting a comment at the start of each function. For example, a function looks like this:

static my_struct1* alloc_mem (my_struct2* a)
{
...
}

I want to insert a comment so that it looks like this:

static my_struct1* alloc_mem (my_struct2* a)
{
/* my comment */
...
}

So I want to find the head of every function body (the part ending with {) and insert the comment there.

I tried the code below:

import re

def insert_comment():
    comment = "/* my comment */"  # not inserted yet; this version only prints the matches
    pattern = r'[^(if|else|switch|for|if\s+|else\s+|switch\s+|for\s+)]\(.*\)(\s|\n)*\{'
    matches = list(re.finditer(pattern, content))
    for match in matches:
        print('*****')
        print(match.group())

# filename is the path of the C source file
with open(filename, "r") as i:
    content = i.read()
    insert_comment()

But this also matches if and else statements that contain nested parentheses.

For example,

if(MACRO(expa) && MACRO(expb)) {

Then the matched pattern will be

O(expb)) {

What would be a better regular expression for finding the start of a function body?

Answer 1

Score: 0


As others have commented, C syntax is too complex to analyze strictly
without a dedicated parser.
For your requirements, however, some under- or over-detection looks
acceptable (harmless): the script only inserts comments, so a wrong match
should not introduce defects into the original source code.

Here is a simple starting point that inserts comments at the desired positions
under limited conditions:

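#!/usr/bin/python
import regex  # third-party module: pip install regex

# filename is the path of the C source file
with open(filename) as f:
    s = f.read()
    # \1 puts the matched function head (up to and including "{") back in front of the comment
    m = regex.sub(r'\b(?:if|switch|for|while|until)\b(*SKIP)(*FAIL)|(([A-Za-z_]\w*\s*\((?:[^()]+|(?2))*\))\s*{)', r'\1\n/* my comment */', s)
    print(m)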

[Explanations]

regex is a PyPI module that provides additional regular-expression features.
(?:if|switch|for|while|until) lists the reserved words whose syntax
resembles a C function call (to be excluded).
\b(?:if|switch|for|while|until)\b(*SKIP)(*FAIL) discards these matches.
(([A-Za-z_]\w*\s*\((?:[^()]+|(?2))*\))\s*{) is a recursive regex that
matches a C function name followed by balanced parentheses and the opening brace.
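
If installing the regex module is not an option, roughly the same idea can be sketched with the standard re module plus a small loop that counts parentheses by hand. This is not the recursive regex above but a swapped-in technique, and insert_comments, find_closing_paren and the constants are hypothetical names chosen for illustration. It assumes parentheses inside string literals and comments are balanced, and it has the same under/over-detection caveats.

```python
import re

COMMENT = "/* my comment */"
# Control-flow keywords that look like calls but never start a function body.
KEYWORDS = {"if", "switch", "for", "while"}
# A candidate is an identifier followed by an opening parenthesis.
CANDIDATE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")
# After the closing parenthesis, only whitespace may precede the brace.
OPEN_BRACE = re.compile(r"\s*\{")


def find_closing_paren(text, open_pos):
    """Return the index of the ')' balancing the '(' at open_pos, or -1."""
    depth = 0
    for i in range(open_pos, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return i
    return -1


def insert_comments(source, comment=COMMENT):
    """Insert comment after the '{' of each 'name(...) {' that is not a keyword."""
    out = []
    copied = 0  # everything before this index is already in out
    pos = 0     # where the search for the next candidate resumes
    while True:
        m = CANDIDATE.search(source, pos)
        if m is None:
            break
        pos = m.end()  # by default, keep scanning inside the parentheses
        if m.group(1) in KEYWORDS:
            continue
        close = find_closing_paren(source, m.end() - 1)
        if close == -1:
            continue  # unbalanced parentheses: not a usable candidate
        brace = OPEN_BRACE.match(source, close + 1)
        if brace is None:
            continue  # an ordinary call or prototype, no body follows
        out.append(source[copied:brace.end()])
        out.append("\n" + comment)
        copied = pos = brace.end()
    out.append(source[copied:])
    return "".join(out)


sample = "some_function(foo) {\n// function\n}\nwhile(true) {\n// reserved word\n}\n"
print(insert_comments(sample))
```

Like the first regex, this sketch does not exclude uppercase macros such as TAILQ_FOREACH; a check on m.group(1) could be added next to the keyword test.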

[Edit]
If you are sure that all macro names consist of uppercase letters, you can exclude them by tweaking the regex as follows:

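#!/usr/bin/python
import regex  # third-party module: pip install regex

# filename is the path of the C source file
with open(filename) as f:
    s = f.read()
    # \1 puts the matched function head (up to and including "{") back in front of the comment
    m = regex.sub(r'\b(?:if|switch|for|while|until|[A-Z_]+)\b(*SKIP)(*FAIL)|(([A-Za-z_]\w*\s*\((?:[^()]+|(?2))*\))\s*{)', r'\1\n/* my comment */', s)
    print(m)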

Input example:

static my_struct1* alloc_mem (my_struct2* a)
{ 
// function
}
some_function(foo) {
// function
}
if(MACRO(expa) && MACRO(expb)) {
// reserved word
}
TAILQ_FOREACH(entry, hent, next) {
// macro
}
while(true) {
// reserved word
}

Output:

static my_struct1* alloc_mem (my_struct2* a)
{
/* my comment */
// function
}
some_function(foo) {
/* my comment */
// function
}
if(MACRO(expa) && MACRO(expb)) {
// reserved word
}
TAILQ_FOREACH(entry, hent, next) {
// macro
}
while(true) {
// reserved word
}

[Explanation]

The pattern [A-Z_]+, which matches names made up only of uppercase letters
and underscores, is added to the exclusion list:

(?:if|switch|for|while|until|[A-Z_]+)
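
To see which identifiers this exclusion covers, the same character class can be tried on its own with the standard re module. This is only a small sketch, and is_excluded_as_macro is a hypothetical helper name. It assumes that \b[A-Z_]+\b applied at the start of an identifier behaves like a full match of that identifier, which means a macro name containing a digit would not be excluded.

```python
import re

# Same character class as in the exclusion list: uppercase letters and underscores only.
MACRO_LIKE = re.compile(r"[A-Z_]+")


def is_excluded_as_macro(name):
    """True when the whole identifier consists of uppercase letters and underscores."""
    return MACRO_LIKE.fullmatch(name) is not None


for name in ("TAILQ_FOREACH", "MACRO", "alloc_mem", "LIST_FOREACH2"):
    print(name, is_excluded_as_macro(name))
```

If the macros in your code base contain digits, widening the class to [A-Z0-9_]+ may be worth trying, at the cost of also skipping any function whose name is written entirely in uppercase.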
