Nom 7 doesn't backtrack when `alt` branch fails to parse


Problem

You are seeing behavior from the alt combinator in nom, the Rust parsing library, that looks surprising but is consistent with how it works.

In the original code, alt does try the next branch when a branch fails. The catch in cases #6 and #7 is that the first branch does not fail: it succeeds after consuming only part of the text between the braces, and the failure happens afterwards, in the char('}') that follows the alt inside the outer delimited.

Here's the relevant part of your code for reference:

delimited(
    char('{'),
    alt((
        delimited(multispace0, map(ident, Res::Good), multispace0),
        map(opt(is_not("}")), |s| Res::Bad(s.unwrap_or_default())),
    )),
    char('}')
)

In cases #6 and #7, ident does not fail. It matches a and stops at the ?, and multispace0 cannot fail either, because it accepts zero or more whitespace characters. The first branch therefore returns Ok with ?} (or ? }) left as the remaining input, and alt returns that result without trying the second branch. The closing char('}') then sees ? and fails, which is the Err you get, with code: Char. By that point alt has already returned, so nothing backtracks into it: alt picks the first branch that succeeds on its own and is not re-entered because a later parser failed.

In contrast, brace_expr2 moves char('{') and char('}') inside each branch. Now the failing char('}') is part of the first branch, so the whole branch fails, and alt tries the second branch from the original input, which succeeds.

In summary, the Err in #6 and #7 comes from the interaction between alt and the parser that follows it, not from ident or multispace0. Putting the closing brace inside each branch, as in brace_expr2, makes the first branch fail on a partial match, and alt then behaves as you originally expected.

Original question:

I am encountering some odd behavior in Nom 7 that I don't understand. I was under the impression that a failed parse in a branch of alt would backtrack and try the next branch of the alt, but this seems not to be the case.

In my case, I am trying to parse text of the form {stuff between braces}. If the stuff is a valid identifier (alphanumerics and underscores), I return Ok(Good(stuff)); if it's not a valid identifier, I return Ok(Bad(stuff)); and if the text isn't a bunch of stuff between curly braces, then a nom::Err is returned.

use nom::{
	branch::alt,
	bytes::complete::is_not,
	character::complete::{char, multispace0, satisfy},
	combinator::{map, opt, recognize},
	multi::many0_count,
	sequence::{delimited, pair},
	IResult,
};

#[derive(Debug)]
enum Res<'a> {
	Good(&'a str),
	Bad(&'a str),
}

fn ident(input: &str) -> IResult<&str, &str> {
	recognize(pair(
		satisfy(|c| c.is_alphabetic() || c == '_'),
		many0_count(satisfy(|c| c.is_alphanumeric() || c == '_')),
	))(input)
}

fn brace_expr(input: &str) -> IResult<&str, Res<'_>> {
	delimited(
		char('{'),
		alt((
			// Try to parse an identifier optionally surrounded by whitespace
			delimited(multispace0, map(ident, Res::Good), multispace0),
			// Otherwise, just take everything between the braces
			map(opt(is_not("}")), |s| Res::Bad(s.unwrap_or_default())),
		)),
		char('}'),
	)(input)
}

fn main() {
	println!("1. {:?}", brace_expr("{}"));
	println!("2. {:?}", brace_expr("{a}"));
	println!("3. {:?}", brace_expr("{ a }"));
	println!("4. {:?}", brace_expr("{?a}"));
	println!("5. {:?}", brace_expr("{ ?a }"));
	println!("6. {:?}", brace_expr("{a?}"));
	println!("7. {:?}", brace_expr("{ a? }"));
}

Output

1. Ok(("", Bad("")))
2. Ok(("", Good("a")))
3. Ok(("", Good("a")))
4. Ok(("", Bad("?a")))
5. Ok(("", Bad(" ?a ")))
6. Err(Error(Error { input: "?}", code: Char }))
7. Err(Error(Error { input: "? }", code: Char }))

I understand #1 through #5. They attempted to parse delimited(multispace0, map(ident, Res::Good), multispace0), and either succeeded (#2, #3), or failed (#1, #4, #5) and moved on to the second branch of the alt. In either case the alt succeeds and we get an Ok.

But I do not understand #6 and #7. When the first branch of the alt fails, instead of moving on to the second, they fail outright, resulting in an Err. Doesn't this contradict the fact that alt is supposed to try the next branch in the case of failure?

The only thing that seems different about #6 and #7 is that they succeed at parsing some ident before failing, whereas the Ok(Bad) ones never succeed at parsing an ident. But that shouldn't affect whether alt tries the next branch, should it?

EDIT:

If I “un-factor-out” the chars, then it works:

fn brace_expr2(input: &str) -> IResult<&str, Res<'_>> {
	alt((
		delimited(
			char('{'),
			delimited(multispace0, map(ident, Res::Good), multispace0),
			char('}'),
		),
		delimited(
			char('{'),
			map(opt(is_not("}")), |s| Res::Bad(s.unwrap_or_default())),
			char('}'),
		),
	))(input)
}

fn main() {
	println!("1. {:?}", brace_expr2("{}"));
	println!("2. {:?}", brace_expr2("{a}"));
	println!("3. {:?}", brace_expr2("{ a }"));
	println!("4. {:?}", brace_expr2("{?a}"));
	println!("5. {:?}", brace_expr2("{ ?a }"));
	println!("6. {:?}", brace_expr2("{a?}"));
	println!("7. {:?}", brace_expr2("{ a? }"));
}

Output

1. Ok(("", Bad("")))
2. Ok(("", Good("a")))
3. Ok(("", Good("a")))
4. Ok(("", Bad("?a")))
5. Ok(("", Bad(" ?a ")))
6. Ok(("", Bad("a?")))
7. Ok(("", Bad(" a? ")))

I still don't understand why alt doesn't work in the first case, but does in the second case.

Answer 1

Score: 1


That's a fun one!

In cases 6 and 7, your first alt arm succeeds: it consumes "a" (or " a" in case 7) and stops there. alt is then done, and the outer delimited expects the character "}", but it finds "?" (or "? ").

So the second alt arm is never tried in cases 6 and 7.

To solve this while keeping the structure you have, the first arm needs to fail when ident is followed by anything other than the closing brace.
I guess peek() is one way to do that: check that "}" comes next, which means the identifier really is the whole content.

But, IMHO, I think take_while might be a better fit.

huangapple
  • Posted on May 30, 2023, 12:00:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76361538.html