pyparsing: NotAny(FollowedBy()) failing

Question

I have some input data like

[gog1] [G1] [gog2] [gog3] [gog4] [G2] [gog5] [G3] [gog6]

and want to find every gog that is not followed by a G. In this case I want to get gog2 and gog3 (and maybe gog6).

Looks pretty simple, right? But I failed:

    import pyparsing as pp
    from pyparsing import *

    def pyparsing_test():
        # this doesn't help either
        # ParserElement.enable_left_recursion(force=True)

        data = """ [gog1] [G1] [gog2] [gog3] [gog4] [G2] [gog5] [G3] [gog6] """

        # an item looks like [<letters><digit>]; the letters are its type
        poi_type = Word(alphas).set_results_name('type')
        poi = Suppress('[') + poi_type + Char(nums) + Suppress(']')

        def cnd_is_type(val):
            return lambda toks: toks.type == val

        def cnd_is_not_type(val):
            return lambda toks: toks.type != val

        poi_gog = poi('gog').add_condition(cnd_is_type('gog'))
        poi_g = poi('g').add_condition(cnd_is_type('G'))
        poi_not_g = poi('not_g').add_condition(cnd_is_not_type('G'))

        # three attempts; only the last assignment is actually used below
        pattern = poi_gog + ~poi_g
        # WTF, this finds only `gog6`, why??
        pattern = poi_gog + NotAny(FollowedBy(poi_g))
        # WTF, same, only `gog6`
        pattern = poi_gog + poi_not_g.suppress()
        # WTF, this works better but finds only `gog2`, why not `gog3` also?

        r = pattern.search_string(data)
        print(data)
        print('=' * 10)
        print(r)

    pyparsing_test()

Answer 1

Score: 0

I would go for the regular expression module re:

    import re

    data = """ [gog1] [G1] [gog2] [gog3] [gog4] [G2] [gog5] [G3] [gog6] """
    # raw string, so that \[ is passed to re unchanged;
    # (?!...G) rejects a gog when the 4th character after it is a G
    m = re.findall(r'\[(gog.)(?!...G)', data)
    print(m)

The result is:

    ['gog2', 'gog3', 'gog6']

The regexp can still be improved if you want to exclude the last gog, need to handle numbers larger than 9, or want to make it more robust.
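As a rough sketch of those improvements, assuming every item is written as [name followed by digits] and items are separated only by whitespace, the pattern could be tightened as below. The names not_before_g and before_non_g are made up for this example.

```python
import re

data = """ [gog1] [G1] [gog2] [gog3] [gog4] [G2] [gog5] [G3] [gog6] """

# \d+ accepts numbers of any length; the negative lookahead rejects a gog
# whose next item is a [G<number>] item, whatever whitespace separates them.
not_before_g = re.compile(r'\[(gog\d+)\](?!\s*\[G\d+\])')

# Stricter variant: additionally require that another item follows and that
# it is not a [G<number>] item, which drops a trailing gog such as gog6.
before_non_g = re.compile(r'\[(gog\d+)\](?=\s*\[(?!G\d+\]))')

loose = not_before_g.findall(data)
strict = before_non_g.findall(data)
multi_digit = not_before_g.findall("[gog10] [G11] [gog12] [gog13]")

print(loose)        # ['gog2', 'gog3', 'gog6']
print(strict)       # ['gog2', 'gog3']
print(multi_digit)  # ['gog12', 'gog13']
```

Which variant fits depends on whether the last gog in the input should count as "not followed by G".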

Answer 2

Score: 0

Finally, we know what is happening, thanks to @ptmcg!
His original answer is on GitHub: https://github.com/pyparsing/pyparsing/issues/482#issuecomment-1546779260

Summary:

First of all, you need to use grouping together with StringEnd(), and this one works:

    pattern = poi_gog + FollowedBy(Group(poi_not_g) | StringEnd())

As for the problem in the title: NotAny() has a bug, it skips parse actions (and conditions). That means the type check attached to poi_g is never run inside ~poi_g, which matches what the question saw: only gog6, the one item with nothing after it, was found. The current pyparsing version is 3.0.9.
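To make the effect of the skipped condition concrete, here is a small plain-Python model of the lookahead. It is only an illustration of the behaviour described above, not pyparsing's actual implementation, and the names items, next_is_g and find_gogs are invented for the example.

```python
import re

data = " [gog1] [G1] [gog2] [gog3] [gog4] [G2] [gog5] [G3] [gog6] "

# Split the input into (type, digit) pairs, e.g. ('gog', '1'), ('G', '1'), ...
items = re.findall(r'\[([A-Za-z]+)(\d)\]', data)

def next_is_g(i, check_condition):
    """Model of the lookahead: does an item of type 'G' follow items[i]?"""
    if i + 1 >= len(items):
        return False  # nothing follows, so the lookahead cannot match
    if not check_condition:
        return True   # condition skipped: any following item counts as a match
    return items[i + 1][0] == 'G'

def find_gogs(check_condition):
    """Return every gog whose negated lookahead succeeds."""
    return [kind + digit
            for i, (kind, digit) in enumerate(items)
            if kind == 'gog' and not next_is_g(i, check_condition)]

without_condition = find_gogs(check_condition=False)
with_condition = find_gogs(check_condition=True)

print(without_condition)  # ['gog6']
print(with_condition)     # ['gog2', 'gog3', 'gog6']
```

With the condition ignored, every gog that has any neighbour is rejected, so only the last one survives. With the condition applied, the result is the one the question expected.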

Posted by huangapple on 2023-05-11 18:06:25
Source: https://go.coder-hub.com/76226442.html