英文:
Why is my Parser written in Racket Brag exceeding 128 MB memory limit
问题
I'm writing an esolang I designed (called RifL) in Racket using the beautiful racket textbook. I'm 99% done, and was testing RifL by writing a larger program in it. When the RifL program got large enough, running it became very very slow, and eventually, running it crashed the evaluation because the program ran out of memory.
I have identified that the problem is coming from the parser I built, which is written in brag. My original parser looks like so:
#lang brag
RifL-program: \[convert-to-deck\] (/NEWLINE \[convert-to-deck\])\*
convert-to-deck: convert-to-name /DIVIDER convert-to-stack
convert-to-name: ((S-PIP-CARD \[/COMMA\])\* S-PIP-CARD) |
((C-PIP-CARD \[/COMMA\])\* C-PIP-CARD) |
((H-PIP-CARD \[/COMMA\])\* H-PIP-CARD) |
((D-PIP-CARD \[/COMMA\])\* D-PIP-CARD)
convert-to-stack: (entry\* /NEWLINE)\* entry\*
entry: (S-PIP-CARD | C-PIP-CARD | H-PIP-CARD | D-PIP-CARD| ROYAL-CARD | JOKER | FACE-DOWN) \[/COMMA\]
I've cut off as many of the processes needed to run RifL, to isolate the problem. Below is a link to a git that has a tokenizer, a lexer, a parser, and a tester file. You will need the beautiful racket package and brag package installed in racket to run the tester.
If you run the tester file in racket with a 128 MB limit, it should succeed on parsing the first code example, and exceed memory on the second. In these files I provided, the parser has been simplified to:
RifL-program: [anything] (/NEWLINE [anything])*
@anything: (S-PIP-CARD | C-PIP-CARD | H-PIP-CARD | D-PIP-CARD| ROYAL-CARD | JOKER | FACE-DOWN
| /COMMA | /NEWLINE | /DIVIDER)*
This is the minimum complexity of the Parser grammar needed to exceed the memory limit. This alarms me, as it is such a simple grammar. My question is, am I doing something wrong, and if so, what? Is the problem in the tokenizer or lexer? If I'm not doing something wrong, does brag just suck? And if brag sucks, what can I replace it with, that doesn't require me to refactor my whole executer, and doesn't force me to build a full parser from scratch?
Matthew Butterick or Greg Hendershott, if you're listening, please save me from writing a full parser.
英文:
I'm writing an esolang I designed (called RifL) in Racket using the beautiful racket textbook. I'm 99% done, and was testing RifL by writing a larger program in it. When the RifL program got large enough, running it became very very slow, and eventually, running it crashed the evaluation because the program ran out of memory.
I have identified that the problem is coming from the parser I built, which is written in brag. My original parser looks like so:
#lang brag
RifL-program: \[convert-to-deck\] (/NEWLINE \[convert-to-deck\])\*
convert-to-deck: convert-to-name /DIVIDER convert-to-stack
convert-to-name: ((S-PIP-CARD \[/COMMA\])\* S-PIP-CARD) |
((C-PIP-CARD \[/COMMA\])\* C-PIP-CARD) |
((H-PIP-CARD \[/COMMA\])\* H-PIP-CARD) |
((D-PIP-CARD \[/COMMA\])\* D-PIP-CARD)
convert-to-stack: (entry\* /NEWLINE)\* entry\*
entry: (S-PIP-CARD | C-PIP-CARD | H-PIP-CARD | D-PIP-CARD| ROYAL-CARD | JOKER | FACE-DOWN) \[/COMMA\]
I've cut off as many of the processes needed to run RifL, to isolate the problem. Below is a link to a git that has a tokenizer, a lexer, a parser, and a tester file. You will need the beautiful racket package and brag package installed in racket to run the tester.
https://github.com/Jesse-Hamlin-Navias/Rifl-parser-fail
If you run the tester file in racket with a 128 MB limit, it should succeed on parsing the first code example, and exceed memory on the second. In these files I provided, the parser has been simplified to:
RifL-program: [anything] (/NEWLINE [anything])*
@anything: (S-PIP-CARD | C-PIP-CARD | H-PIP-CARD | D-PIP-CARD| ROYAL-CARD | JOKER | FACE-DOWN
| /COMMA | /NEWLINE | /DIVIDER)*
This is the minimum complexity of the Parser grammar needed to exceed the memory limit. This alarms me, as it is such a simple grammar. My question is, am I doing something wrong, and if so, what? Is the problem in the tokenizer or lexer? If I'm not doing something wrong, does brag just suck? And if brag sucks, what can I replace it with, that doesn't require me to refactor my whole executer, and doesn't force me to build a full parser from scratch?
Matthew Butterick or Greg Hendershott, if you're listening, please save me from writing a full parser.
答案1
得分: 1
It turns out its pretty important. Also removing all the * was also integral. The parser is still slow with large code, and can crash, but it crashes way less, so a big improvement. I think brag is probably not built well, and could be improved, but approaching the grammar right is important.
RifL-program: () | 转换为牌组 | RifL-program-sub [/NEWLINE] [转换为牌组]
@RifL-program-sub: [转换为牌组] | RifL-program-sub [/NEWLINE] [转换为牌组]
转换为牌组: 转换为名称 /分隔符 转换为堆栈
转换为名称: [(s-名称 [/逗号])] S-PIP-卡 | [(c-名称 [/逗号])] C-PIP-卡 |
[(h-名称 [/逗号])] H-PIP-卡 | [(d-名称 [/逗号])] D-PIP-卡
@s-名称: [(s-名称 [/逗号])] S-PIP-卡
@c-名称: [(c-名称 [/逗号])] C-PIP-卡
@h-名称: [(h-名称 [/逗号])] H-PIP-卡
@d-名称: [(d-名称 [/逗号])] D-PIP-卡
转换为堆栈: [(子堆栈 [/逗号])] [/NEWLINE] [入口]
@子堆栈: [(子堆栈 [/逗号])] [/NEWLINE] [入口]
入口: (S-PIP-卡 | C-PIP-卡 | H-PIP-卡 | D-PIP-卡 | ROYAL-卡 | JOKER | FACE-DOWN)
英文:
It turns out its pretty important. Also removing all the * was also integral. The parser is still slow with large code, and can crash, but it crashes way less, so a big improvement. I think brag is probably not built well, and could be improved, but approaching the grammar right is important.
#lang brag
RifL-program: () | convert-to-deck | RifL-program-sub [/NEWLINE] [convert-to-deck]
@RifL-program-sub: [convert-to-deck] | RifL-program-sub [/NEWLINE] [convert-to-deck]
convert-to-deck: convert-to-name /DIVIDER convert-to-stack
convert-to-name: [(s-name [/COMMA])] S-PIP-CARD | [(c-name [/COMMA])] C-PIP-CARD |
[(h-name [/COMMA])] H-PIP-CARD | [(d-name [/COMMA])] D-PIP-CARD
@s-name: [(s-name [/COMMA])] S-PIP-CARD
@c-name: [(c-name [/COMMA])] C-PIP-CARD
@h-name: [(h-name [/COMMA])] H-PIP-CARD
@d-name: [(d-name [/COMMA])] D-PIP-CARD
convert-to-stack: [(sub-stack [/COMMA])] [/NEWLINE] [entry]
@sub-stack: [(sub-stack [/COMMA])] [/NEWLINE] [entry]
entry: (S-PIP-CARD | C-PIP-CARD | H-PIP-CARD | D-PIP-CARD| ROYAL-CARD | JOKER | FACE-DOWN)
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论