pyparsing: NotAny(FollowedBy()) failing
Question
I have some input data like
[gog1] [G1] [gog2] [gog3] [gog4] [G2] [gog5] [G3] [gog6]
and want to find every gog that is not followed by a G. So in this case I want to get gog2 and gog3 (and maybe gog6).
Looks pretty simple, right? But I failed:
import pyparsing as pp
from pyparsing import *

def pyparsing_test():
    # this doesn't help either
    # ParserElement.enable_left_recursion(force=True)
    data = """ [gog1] [G1] [gog2] [gog3] [gog4] [G2] [gog5] [G3] [gog6] """

    # a token such as [gog2]: its letters are stored under the name 'type'
    poi_type = Word(alphas).set_results_name('type')
    poi = Suppress('[') + poi_type + Char(nums) + Suppress(']')

    def cnd_is_type(val):
        return lambda toks: toks.type == val

    def cnd_is_not_type(val):
        return lambda toks: toks.type != val

    poi_gog = poi('gog').add_condition(cnd_is_type('gog'))
    poi_g = poi('g').add_condition(cnd_is_type('G'))
    poi_not_g = poi('not_g').add_condition(cnd_is_not_type('G'))

    # each variant was tried on its own; only the last assignment is used below
    pattern = poi_gog + ~poi_g
    # WTF, this finds only `gog6`. Why??
    pattern = poi_gog + NotAny(FollowedBy(poi_g))
    # WTF, same: only `gog6`
    pattern = poi_gog + poi_not_g.suppress()
    # WTF, this works better but finds only `gog2`. Why not `gog3` too?

    r = pattern.search_string(data)
    print(data)
    print('=' * 10)
    print(r)

pyparsing_test()
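For reference, the intended result can be pinned down without any parser library. The sketch below uses a hypothetical helper, gogs_not_followed_by_g (it is not part of pyparsing or of the original question), and assumes every token has the shape [letters digits]. It tokenizes the input and keeps a gog only when the type of the next token is not G; the include_last flag covers the "(and maybe gog6)" case.

```python
import re


def gogs_not_followed_by_g(data, include_last=True):
    """Return the 'gog' tokens whose next token is not a 'G' token.

    Hypothetical helper; assumes tokens look like [letters digits].
    """
    # Split the input into (type, number) pairs, e.g. ('gog', '2')
    tokens = re.findall(r'\[([A-Za-z]+)(\d+)\]', data)
    result = []
    for i, (kind, num) in enumerate(tokens):
        if kind != 'gog':
            continue
        if i + 1 < len(tokens):
            # Keep this gog only if the next token's type is not 'G'
            if tokens[i + 1][0] != 'G':
                result.append(kind + num)
        elif include_last:
            # The last gog has no following token at all
            result.append(kind + num)
    return result


data = " [gog1] [G1] [gog2] [gog3] [gog4] [G2] [gog5] [G3] [gog6] "
print(gogs_not_followed_by_g(data))                      # ['gog2', 'gog3', 'gog6']
print(gogs_not_followed_by_g(data, include_last=False))  # ['gog2', 'gog3']
```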
Answer 1
Score: 0
I would go for the regexp module re:
import re

data = """ [gog1] [G1] [gog2] [gog3] [gog4] [G2] [gog5] [G3] [gog6] """
# '[gog' plus one character, unless a 'G' comes three characters later (as in '] [G')
m = re.findall(r'\[(gog.)(?!...G)', data)
print(m)
The result is:
['gog2', 'gog3', 'gog6']
The regexp can still be improved if you want to exclude the last gog, need to handle numbers larger than 9, or want to make it more robust in general.
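A minimal sketch of such an improvement, assuming every token looks like [gog<digits>] or [G<digits>] separated only by whitespace: capture the whole number with \d+, require the closing bracket, and reject the match when the next token is a G token. The second pattern adds a lookahead that requires some token to follow, which drops a trailing gog such as gog6. The pattern names are illustrative only.

```python
import re

# Whole number (\d+) plus the closing bracket; reject if the next token is [G<digits>].
NOT_FOLLOWED_BY_G = re.compile(r'\[(gog\d+)\](?!\s*\[G\d+\])')

# Same, but additionally require that some token follows (drops a trailing gog).
NOT_FOLLOWED_BY_G_NOT_LAST = re.compile(r'\[(gog\d+)\](?=\s*\[)(?!\s*\[G\d+\])')

data = " [gog1] [G1] [gog2] [gog3] [gog4] [G2] [gog5] [G3] [gog6] "
print(NOT_FOLLOWED_BY_G.findall(data))           # ['gog2', 'gog3', 'gog6']
print(NOT_FOLLOWED_BY_G_NOT_LAST.findall(data))  # ['gog2', 'gog3']

# Multi-digit numbers are handled as well
data2 = "[gog10] [G11] [gog12] [gog13]"
print(NOT_FOLLOWED_BY_G.findall(data2))          # ['gog12', 'gog13']
```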
Answer 2
Score: 0
Finally, we know what's happening, thanks to @ptmcg!
His original answer on GitHub: https://github.com/pyparsing/pyparsing/issues/482#issuecomment-1546779260
Summary:
First of all, you need to wrap poi_not_g in a Group and combine it with StringEnd() inside FollowedBy; this one works:
pattern = poi_gog + FollowedBy(Group(poi_not_g) | StringEnd())
About the title problem: NotAny() has a bug; it skips parse actions (and therefore conditions). The current pyparsing version is 3.0.9.
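The first point, checking the next token with a lookahead instead of consuming it, also explains why poi_gog + poi_not_g.suppress() in the question finds gog2 but not gog3: search_string resumes scanning after the end of each match, so once [gog3] has been consumed as the follower of gog2, it can no longer start a match of its own. The same effect can be reproduced with the standard re module. This is only an analogy to the pyparsing expressions (a sketch, assuming the token format from the question), not how pyparsing implements FollowedBy.

```python
import re

data = " [gog1] [G1] [gog2] [gog3] [gog4] [G2] [gog5] [G3] [gog6] "

# Consuming (like poi_gog + poi_not_g.suppress()): the following non-G token
# is part of the match, so scanning resumes after it and gog3 is skipped.
consuming = re.findall(r'\[(gog\d+)\]\s*\[(?!G)\w+\]', data)

# Lookahead (like FollowedBy(Group(poi_not_g))): the next token is only checked.
lookahead = re.findall(r'\[(gog\d+)\](?=\s*\[(?!G)\w+\])', data)

# Lookahead with an end-of-input alternative (like ... | StringEnd()).
lookahead_or_end = re.findall(r'\[(gog\d+)\](?=\s*(?:\[(?!G)\w+\]|$))', data)

print(consuming)         # ['gog2']
print(lookahead)         # ['gog2', 'gog3']
print(lookahead_or_end)  # ['gog2', 'gog3', 'gog6']
```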