English:
Flex doesn't seem to be recognising my definitions correctly
Question
I am trying to match comments (a comment starts and ends with curly braces) using the following code:
com ^"\{"(.|\n)*"\}"$
%option noyywrap
%%
[^{com}] ;
{com} printf("%s",yytext);
%%
void main()
{
yylex();
}
on this piece of text:
first line {first comment}
second line {multiline
comment}
I get the following output:
{comm}co{mcomm}
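It seems to match only the letters c, o and m (the result changes with the word used to define the comment), yet it also includes the curly braces. I tried changing the test text, but without success.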
<details>
<summary>English:</summary>
I am trying to match comments (a comment starts and ends with curly brackets) using this code:
com ^"\{"(.|\n)*"\}"$
%option noyywrap
%%
[^{com}] ;
{com} printf("%s",yytext);
%%
void main()
{
yylex();
}
on this piece of text:
first line {first comment}
second line {multiline
comment}
I am getting this output:
{comm}co{mcomm}
which seems to match only the letters c, o and m (this changes when I change the word `com`: it matches every letter of the word used to define the comment), but at the same time it includes the curly brackets.
I tried changing the test text, but with no success.
</details>
# Answer 1
**Score**: 1
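The `com` macro is now simple enough that you could consider embedding the regex directly in your rule instead of going through the `com` definition. But that is a stylistic matter that could go either way.
Also, if you wanted to strip the comments instead of the non-comments, or if you wanted to exclude the braces from the output comment text, you could do that by modifying the actions of those rules accordingly.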
<details>
<summary>English:</summary>
There are several problems with your scanner definition. The biggest one is that the pattern in this rule ...
> [^{com}] ;
... doesn't mean at all what you seem to think it means. What you have in mind seems to be to ignore anything that does not match the regex to which `com` expands, but the pattern in that rule doesn't mean anything remotely like that. It just matches and discards one character at a time that is not among `{`, `}`, `c`, `o`, `m`.
The idiomatic way to handle input that is not matched by any (other) rule is to add a match-anything rule at the end of the rule list. That would bring you to something along these lines ...
com ^"\{"(.|\n)*"\}"$
%%
{com} printf("%s",yytext);
.     /* discard a single character that is otherwise unmatched */;
%%
But then you will see that you have some secondary problems:
- The definition of `com` expands to a regular expression that is anchored to the beginning and end of a line. From your example input, you at least don't seem to want to anchor to the beginning of the line, but the fact that you have a closing delimiter at all suggests that you probably don't want to anchor to the end of the line, either.
- The `com` regex does not stop matching at the first closing brace. It will collect everything from the opening brace of the first comment to the closing brace of the last comment.
Additionally, as a matter of style, it is not necessary or idiomatic to both quote and escape the curly braces.
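The over-matching problem from the second bullet can be illustrated outside flex. The sketch below ignores the `^` and `$` anchors and compares where a match starting at an opening brace would end for `\{(.|\n)*\}` versus `\{[^}]*\}`. Both function names are hypothetical, and the code only models the longest-match behaviour for these two specific patterns.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Match length for \{(.|\n)*\} starting at s (s[0] must be '{').
 * The starred group can consume anything, and flex prefers the longest
 * match, so the match runs to the LAST '}' in the remaining input.
 * Returns 0 if there is no '}' at all. */
static size_t greedy_match_len(const char *s)
{
    assert(s != NULL && s[0] == '{');
    const char *last = strrchr(s, '}');
    return last ? (size_t)(last - s) + 1 : 0;
}

/* Match length for \{[^}]*\} starting at s.
 * [^}]* cannot step over a '}', so the match ends at the FIRST '}'. */
static size_t bounded_match_len(const char *s)
{
    assert(s != NULL && s[0] == '{');
    const char *first = strchr(s, '}');
    return first ? (size_t)(first - s) + 1 : 0;
}
```

For the input `{a} text {b}`, the first function reports 12 characters (both comments and the text between them), while the second reports 3 (just `{a}`).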
This variation, then, will serve your intended purposes better:
com \{[^}]*\}
%%
{com} printf("%s",yytext);
. ;
%%
That version of `com` expands to a regular expression matching an opening curly brace (`{`), anywhere, followed by any number of characters other than a closing curly brace (`}`), followed by a closing curly brace, anywhere on the line.
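As a rough cross-check of those semantics, the following C sketch emulates the two rules by hand. It is not what flex generates; `emulate_fixed_rules` is a hypothetical helper that assumes the whole input is already in memory as a NUL-terminated string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper emulating the corrected rules on `in`:
 *   \{[^}]*\}  -> copy the whole comment to out (the printf action)
 *   .          -> discard any other single character except newline
 *   newline    -> matches no rule, so flex's default rule echoes it
 * `out` must have room for strlen(in) + 1 bytes. */
static void emulate_fixed_rules(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    size_t n = 0;
    const char *p = in;
    while (*p != '\0') {
        /* A comment needs an opening brace here and a closing one later. */
        const char *close = (*p == '{') ? strchr(p, '}') : NULL;
        if (close != NULL) {
            size_t len = (size_t)(close - p) + 1;  /* braces included */
            memcpy(out + n, p, len);
            n += len;
            p = close + 1;
        } else {
            if (*p == '\n') {
                out[n++] = '\n';                   /* default rule: echo */
            }
            ++p;                                   /* move past one character */
        }
    }
    out[n] = '\0';
}
```

An opening brace with no closing brace after it falls through to the single-character branch and is discarded, which is also what the `.` rule would do with it.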
You don't actually need the trivial `main()` supplied in your original flex input, as linking with `-lfl` provides an equivalent one, and `%option noyywrap` is not essential for the purposes of this discussion (the same library also supplies a default `yywrap()`), so that's a complete solution. Demo:
$ flex com.l
$ gcc -o com lex.yy.c -lfl
$ ./com <<'EOF'
first line {first comment}
second line {multiline
comment}
EOF
{first comment}
{multiline
comment}
Of course, that `com` macro is now simple enough that, given it is used only once anyway, you could consider incorporating the regex directly into your rule instead of via the `com` definition. But that's a stylistic matter that could go either way.
Also, if you wanted to strip the comments instead of the non-comments, or if you wanted to exclude the braces from the output comment text, then you could do that by making appropriate modifications to the actions of those rules.
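Both variations can be sketched in plain C. The helpers below are hypothetical stand-ins: `text` and `len` play the role of flex's `yytext` and `yyleng` for a matched comment, and `strip_comments` models a scanner whose comment rule has an empty action while everything else is echoed. Inside a real flex action, one plausible way to print the body without braces would be `printf("%.*s", (int)yyleng - 2, yytext + 1);`, though that line is an assumption on my part and not from the answer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy a matched comment without its braces.
 * The match always starts with '{' and ends with '}', so len >= 2.
 * `out` must have room for len - 1 bytes. */
static void comment_body(const char *text, size_t len, char *out)
{
    assert(text != NULL && out != NULL && len >= 2);
    memcpy(out, text + 1, len - 2);
    out[len - 2] = '\0';
}

/* Hypothetical helper: the opposite filter, keeping everything except
 * the comments. `out` must have room for strlen(in) + 1 bytes. */
static void strip_comments(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    size_t n = 0;
    const char *p = in;
    while (*p != '\0') {
        const char *close = (*p == '{') ? strchr(p, '}') : NULL;
        if (close != NULL) {
            p = close + 1;      /* comment matched: empty action */
        } else {
            out[n++] = *p++;    /* anything else is echoed */
        }
    }
    out[n] = '\0';
}
```

With these, `{first comment}` becomes `first comment`, and `first line {first comment}` is reduced to `first line ` (trailing space kept).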
</details>