I am looking for a regular expression in Python that identifies all the function bodies in a C file
Question
I am looking for a regular expression in Python that identifies every C function.
I want to automate inserting a comment at the start of each function.
For example, a function looks like this:
static my_struct1* alloc_mem (my_struct2* a)
{
...
}
I want to insert a comment so that it looks like this:
static my_struct1* alloc_mem (my_struct2* a)
{
/* my comment */
...
}
So I want to identify the head of every function body (the part ending with {) and insert the comment there.
I tried the code below:
import re

def insert_comment(content):
    comment = "/* my comment */"  # not inserted yet; this attempt only prints the matches
    pattern = r'[^(if|else|switch|for|if\s+|else\s+|switch\s+|for\s+)]\(.*\)(\s|\n)*\{'
    matches = list(re.finditer(pattern, content))
    for match in matches:
        print('*****')
        print(match.group())

with open(filename, "r") as i:
    content = i.read()
insert_comment(content)
But this also matches if and else statements that contain nested ( expressions.
For example, given
if(MACRO(expa) && MACRO(expb)) {
the matched pattern will be
O(expb)) {
What would be a better regular expression to find the start of a function body?
Answer 1
Score: 0
As others have commented, C syntax is too complex to analyze strictly without a dedicated parser.
For your requirements, however, some under- or over-detection looks acceptable (harmless): the edit only inserts comments, so it cannot introduce defects into the original source code.
Here is a simple starting point that inserts comments at the desired points under limited conditions:
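#!/usr/bin/python
import regex  # third-party module from PyPI, not the standard-library re

# filename is assumed to hold the path of the C source file
with open(filename) as f:
    s = f.read()

# \1 puts the matched function head (up to and including "{") back before the comment
m = regex.sub(r'\b(?:if|switch|for|while|until)\b(*SKIP)(*FAIL)|(([A-Za-z_]\w*\s*\((?:[^()]+|(?2))*\))\s*{)', r'\1\n/* my comment */', s)
print(m)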
[Explanations]
regex is a PyPI module with additional regular-expression functionality.
(?:if|switch|for|while|until) lists the reserved names whose syntax resembles a C function (to be excluded).
\b(?:if|switch|for|while|until)\b(*SKIP)(*FAIL) discards these matches.
(([A-Za-z_]\w*\s*\((?:[^()]+|(?2))*\))\s*{) is a recursive regex that matches a C function name followed by balanced parentheses and the opening brace.
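If installing the third-party regex module is not an option, the same idea can be approximated with the standard-library re module, which has no recursion and no (*SKIP)(*FAIL): find each name followed by (, then balance the parentheses with a manual scan. This is a sketch of a different technique, not the code above. The names KEYWORDS, find_matching_paren and insert_comments are made up for illustration, and parentheses inside string literals or comments are not handled.

```python
import re

# Hypothetical names for illustration; the keyword list mirrors the answer's regex.
KEYWORDS = {"if", "switch", "for", "while", "until"}
NAME_PAREN = re.compile(r"\b([A-Za-z_]\w*)\s*\(")
OPEN_BRACE = re.compile(r"\s*\{")


def find_matching_paren(text, open_pos):
    """Return the index of the ')' that closes text[open_pos], or -1."""
    depth = 0
    for i in range(open_pos, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return i
    return -1


def insert_comments(source, comment="/* my comment */"):
    """Insert comment on a new line after each 'name(...) {' head."""
    out = []
    last = 0  # start of the text not yet copied into out
    pos = 0   # where the next search begins
    while True:
        m = NAME_PAREN.search(source, pos)
        if m is None:
            break
        pos = m.end()  # default: keep scanning just past this '('
        if m.group(1) in KEYWORDS:
            continue
        close = find_matching_paren(source, m.end() - 1)
        if close == -1:
            continue
        brace = OPEN_BRACE.match(source, close + 1)
        if brace is None:
            continue
        out.append(source[last:brace.end()])
        out.append("\n" + comment)
        last = pos = brace.end()
    out.append(source[last:])
    return "".join(out)


if __name__ == "__main__":
    sample = (
        "static my_struct1* alloc_mem (my_struct2* a)\n{\n// function\n}\n"
        "if(MACRO(expa) && MACRO(expb)) {\n// reserved word\n}\n"
    )
    print(insert_comments(sample))
```

Unlike the recursive pattern, the manual scan also balances parentheses that are not preceded by a name, so the two approaches can disagree on unusual declarations.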
[Edit]
If you are sure the macro names consist only of uppercase letters (and underscores), you can exclude them by tweaking the regex:
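#!/usr/bin/python
import regex  # third-party module from PyPI, not the standard-library re

# filename is assumed to hold the path of the C source file
with open(filename) as f:
    s = f.read()

# [A-Z_]+ in the exclusion list skips all-uppercase macro names; \1 keeps the matched head
m = regex.sub(r'\b(?:if|switch|for|while|until|[A-Z_]+)\b(*SKIP)(*FAIL)|(([A-Za-z_]\w*\s*\((?:[^()]+|(?2))*\))\s*{)', r'\1\n/* my comment */', s)
print(m)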
Input example:
static my_struct1* alloc_mem (my_struct2* a)
{
// function
}
some_function(foo) {
// function
}
if(MACRO(expa) && MACRO(expb)) {
// reserved word
}
TAILQ_FOREACH(entry, hent, next) {
// macro
}
while(true) {
// reserved word
}
Output:
static my_struct1* alloc_mem (my_struct2* a)
{
/* my comment */
// function
}
some_function(foo) {
/* my comment */
// function
}
if(MACRO(expa) && MACRO(expb)) {
// reserved word
}
TAILQ_FOREACH(entry, hent, next) {
// macro
}
while(true) {
// reserved word
}
[Explanation]
The regex [A-Z_]+, which matches names made up only of uppercase letters and underscores, is appended to the exclusion list:
(?:if|switch|for|while|until|[A-Z_]+)
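To see which names the exclusion list catches, the alternation can be tried on isolated identifiers with the standard-library re module. This is only a sketch: is_excluded is a hypothetical helper, and re.fullmatch on a single identifier stands in for the \b...\b anchoring used in the full pattern.

```python
import re

# The exclusion alternation from the answer, applied to one identifier at a time.
EXCLUDED = re.compile(r"(?:if|switch|for|while|until|[A-Z_]+)")


def is_excluded(name):
    """Return True if the whole identifier is a listed keyword or all uppercase/underscore."""
    return EXCLUDED.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ("while", "TAILQ_FOREACH", "alloc_mem", "LIST_FOREACH2", "My_Macro"):
        print(name, is_excluded(name))
```

Note that [A-Z_]+ does not cover digits, so a macro name such as LIST_FOREACH2 is not excluded, while a real function whose name is entirely uppercase would be skipped.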