Nom 7 doesn't backtrack when `alt` branch fails to parse
Question
It seems you're encountering some unexpected behavior with the `alt` combinator in the nom parsing library for Rust.
In the original code, cases #6 and #7 fail because of how `alt` works in nom. `alt` does try the next branch when a branch fails, but in #6 and #7 the first branch does not fail. `multispace0` matches zero or more whitespace characters, so it succeeds even when there is no whitespace, and `ident` successfully matches `a`. The inner `delimited` therefore succeeds, and `alt` returns `Ok` with `?}` (or `? }`) as the remaining input.
Here's the relevant part of your code for reference:
    delimited(
        char('{'),
        alt((
            delimited(multispace0, map(ident, Res::Good), multispace0),
            map(opt(is_not("}")), |s| Res::Bad(s.unwrap_or_default())),
        )),
        char('}')
    )
Once `alt` has returned `Ok`, its choice is final. The outer `delimited` then runs `char('}')` on the remaining input, finds `?` instead, and returns an `Err`. That failure happens outside `alt`, so `alt` is never asked to try its second branch: a later failure does not reopen a choice that has already succeeded. This matches the errors you see, where the reported input is `"?}"` and the code is `Char`, which is the closing `char('}')` failing.
In contrast, in `brace_expr2` the `char('{')` and `char('}')` parsers sit inside each branch of `alt`. When the closing `char('}')` fails on `?`, the whole first branch fails, so `alt` moves on to the second branch and it works as expected.
So, in summary, the behavior you're observing in #6 and #7 comes from the closing `char('}')` sitting outside the `alt`. Moving the braces into each branch, as you did in `brace_expr2`, lets `alt` work as you originally expected.
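To make the difference between the two shapes concrete, here is a minimal, dependency-free sketch of ordered choice. It is a toy model and not nom's implementation: `PResult`, `ch`, `anything`, `good_branch`, `bad_branch`, `factored` and `unfactored` are hypothetical names invented for this illustration, whitespace handling is left out, and every error is collapsed to `None`.

```rust
// A toy model of ordered choice, NOT nom itself. All names are hypothetical.
type PResult<'a, T> = Option<(&'a str, T)>;

#[derive(Debug, PartialEq)]
enum Res<'a> {
    Good(&'a str),
    Bad(&'a str),
}

// Match a single expected character.
fn ch(c: char, input: &str) -> PResult<'_, ()> {
    input.strip_prefix(c).map(|rest| (rest, ()))
}

// Match an identifier: a letter or '_' followed by letters, digits or '_'.
fn ident(input: &str) -> PResult<'_, &str> {
    let first = input.chars().next()?;
    if !(first.is_alphabetic() || first == '_') {
        return None;
    }
    let end = input
        .find(|c: char| !(c.is_alphanumeric() || c == '_'))
        .unwrap_or(input.len());
    Some((&input[end..], &input[..end]))
}

// Take everything up to (not including) the next '}'. Never fails.
fn anything(input: &str) -> PResult<'_, &str> {
    let end = input.find('}').unwrap_or(input.len());
    Some((&input[end..], &input[..end]))
}

// Shape of `brace_expr`: the closing brace is checked AFTER the choice.
fn factored(input: &str) -> Option<Res<'_>> {
    let (rest, ()) = ch('{', input)?;
    // Ordered choice: the first branch that succeeds wins.
    let (rest, res) = ident(rest)
        .map(|(r, s)| (r, Res::Good(s)))
        .or_else(|| anything(rest).map(|(r, s)| (r, Res::Bad(s))))?;
    // The choice is already made; a failure here cannot revisit it.
    ch('}', rest)?;
    Some(res)
}

// Shape of `brace_expr2`: each branch checks its own closing brace.
fn good_branch(input: &str) -> Option<Res<'_>> {
    let (rest, ()) = ch('{', input)?;
    let (rest, s) = ident(rest)?;
    ch('}', rest)?;
    Some(Res::Good(s))
}

fn bad_branch(input: &str) -> Option<Res<'_>> {
    let (rest, ()) = ch('{', input)?;
    let (rest, s) = anything(rest)?;
    ch('}', rest)?;
    Some(Res::Bad(s))
}

fn unfactored(input: &str) -> Option<Res<'_>> {
    // A failed closing brace now fails the whole first branch.
    good_branch(input).or_else(|| bad_branch(input))
}
```

In this model, `factored("{a?}")` fails outright while `unfactored("{a?}")` falls through to the second branch, which mirrors the two outputs for case #6 in the question.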
I am encountering some odd behavior in Nom 7 that I don't understand. I was under the impression that a failed parse in a branch of `alt` would backtrack and try the next branch of the `alt`, but this seems not to be the case.
In my case, I am trying to parse text of the form `{stuff between braces}`, where if the stuff is a valid identifier (alphanumerics and underscores), I return `Ok(Good(stuff))`; if it's not a valid identifier, I return `Ok(Bad(stuff))`; and if the text isn't a bunch of stuff between curly braces, then a `nom::Err` is returned.
    use nom::{
        branch::alt,
        bytes::complete::is_not,
        character::complete::{char, multispace0, satisfy},
        combinator::{map, opt, recognize},
        multi::many0_count,
        sequence::{delimited, pair},
        IResult,
    };

    #[derive(Debug)]
    enum Res<'a> {
        Good(&'a str),
        Bad(&'a str),
    }

    fn ident(input: &str) -> IResult<&str, &str> {
        recognize(pair(
            satisfy(|c| c.is_alphabetic() || c == '_'),
            many0_count(satisfy(|c| c.is_alphanumeric() || c == '_')),
        ))(input)
    }

    fn brace_expr(input: &str) -> IResult<&str, Res<'_>> {
        delimited(
            char('{'),
            alt((
                // Try to parse an identifier optionally surrounded by whitespace
                delimited(multispace0, map(ident, Res::Good), multispace0),
                // Otherwise, just take everything between the braces
                map(opt(is_not("}")), |s| Res::Bad(s.unwrap_or_default())),
            )),
            char('}'),
        )(input)
    }

    fn main() {
        println!("1. {:?}", brace_expr("{}"));
        println!("2. {:?}", brace_expr("{a}"));
        println!("3. {:?}", brace_expr("{ a }"));
        println!("4. {:?}", brace_expr("{?a}"));
        println!("5. {:?}", brace_expr("{ ?a }"));
        println!("6. {:?}", brace_expr("{a?}"));
        println!("7. {:?}", brace_expr("{ a? }"));
    }
Output
1. Ok(("", Bad("")))
2. Ok(("", Good("a")))
3. Ok(("", Good("a")))
4. Ok(("", Bad("?a")))
5. Ok(("", Bad(" ?a ")))
6. Err(Error(Error { input: "?}", code: Char }))
7. Err(Error(Error { input: "? }", code: Char }))
I understand 1-5. They attempted to parse `delimited(multispace0, map(ident, Res::Good), multispace0)`, and either succeeded (#2, #3), or failed (#1, #4, #5) and moved onto the second branch of the `alt`, but in either case the `alt` succeeds and we get an `Ok`.
But I do not understand #6 and #7. When the first branch of the `alt` fails, instead of moving onto the second, they fail outright, resulting in an `Err`. Doesn't this contradict the fact that `alt` is supposed to try the next branch in the case of failure?
The only thing that seems different about #6 and #7 is that they succeed at parsing some `ident` before failing, whereas the `Ok(Bad)` ones never succeed at parsing an `ident`. But that shouldn't affect whether `alt` tries the next branch, should it?
EDIT:
If I "un-factor-out" the `char`s, then it works:
    fn brace_expr2(input: &str) -> IResult<&str, Res<'_>> {
        alt((
            delimited(
                char('{'),
                delimited(multispace0, map(ident, Res::Good), multispace0),
                char('}'),
            ),
            delimited(
                char('{'),
                map(opt(is_not("}")), |s| Res::Bad(s.unwrap_or_default())),
                char('}'),
            ),
        ))(input)
    }

    fn main() {
        println!("1. {:?}", brace_expr2("{}"));
        println!("2. {:?}", brace_expr2("{a}"));
        println!("3. {:?}", brace_expr2("{ a }"));
        println!("4. {:?}", brace_expr2("{?a}"));
        println!("5. {:?}", brace_expr2("{ ?a }"));
        println!("6. {:?}", brace_expr2("{a?}"));
        println!("7. {:?}", brace_expr2("{ a? }"));
    }
1. Ok(("", Bad("")))
2. Ok(("", Good("a")))
3. Ok(("", Good("a")))
4. Ok(("", Bad("?a")))
5. Ok(("", Bad(" ?a ")))
6. Ok(("", Bad("a?")))
7. Ok(("", Bad(" a? ")))
I still don't understand why `alt` doesn't work in the first case, but does in the second case.
Answer 1
Score: 1
That's a fun one!
In cases 6 and 7, your first `alt` arm consumes "a" (or " a" in case 7) and succeeds. The parser then expects the character "}", but it finds "?" (or "? ").
The second `alt` arm is never reached in 6 and 7.
To solve this while keeping the structure of your original solution, the first arm needs to fail when `ident` is followed by anything it does not accept.
I guess `peek()` is in order here, to check that "}" comes next, meaning the identifier really is good.
But, IMHO, `take_while` might be a better fit.
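One way to read the `take_while` suggestion is: first grab everything up to the closing brace, then decide afterwards whether it is an identifier. The sketch below shows that idea with plain `std` string methods rather than nom, so it is an illustration of the approach and not the answerer's code. `brace_expr_scan` and `is_ident` are hypothetical names, and the result is an `Option` instead of nom's `IResult`.

```rust
#[derive(Debug, PartialEq)]
enum Res<'a> {
    Good(&'a str),
    Bad(&'a str),
}

// True if `s` is a letter or '_' followed only by letters, digits or '_'.
fn is_ident(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_alphabetic() || c == '_' => {
            chars.all(|c| c.is_alphanumeric() || c == '_')
        }
        _ => false,
    }
}

// Hypothetical helper: returns the remaining input and the classified content,
// or None if the input does not start with a `{...}` group.
fn brace_expr_scan(input: &str) -> Option<(&str, Res<'_>)> {
    let rest = input.strip_prefix('{')?;
    // Take everything up to the first '}' without judging it yet.
    let end = rest.find('}')?;
    let (inner, remaining) = (&rest[..end], &rest[end + 1..]);
    // Classify afterwards: an identifier with optional surrounding
    // whitespace is Good, anything else is Bad with the raw content.
    let trimmed = inner.trim_matches(|c: char| matches!(c, ' ' | '\t' | '\r' | '\n'));
    let res = if is_ident(trimmed) {
        Res::Good(trimmed)
    } else {
        Res::Bad(inner)
    };
    Some((remaining, res))
}
```

Because the content is consumed once and classified afterwards, there is no choice point to get stuck on, which is why this shape sidesteps the problem from cases 6 and 7.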