为什么我的用Racket编写的解析器超出了128 MB的内存限制?

huangapple go评论61阅读模式
英文:

Why is my Parser written in Racket Brag exceeding 128 MB memory limit

问题

I'm writing an esolang I designed (called RifL) in Racket using the beautiful racket textbook. I'm 99% done, and was testing RifL by writing a larger program in it. When the RifL program got large enough, running it became very very slow, and eventually, running it crashed the evaluation because the program ran out of memory.

I have identified that the problem is coming from the parser I built, which is written in brag. My original parser looks like so:

#lang brag
RifL-program: \[convert-to-deck\] (/NEWLINE \[convert-to-deck\])\*
convert-to-deck: convert-to-name /DIVIDER convert-to-stack
convert-to-name: ((S-PIP-CARD \[/COMMA\])\* S-PIP-CARD) |
                 ((C-PIP-CARD \[/COMMA\])\* C-PIP-CARD) |
                 ((H-PIP-CARD \[/COMMA\])\* H-PIP-CARD) |
                 ((D-PIP-CARD \[/COMMA\])\* D-PIP-CARD)
convert-to-stack: (entry\* /NEWLINE)\* entry\*
entry: (S-PIP-CARD | C-PIP-CARD | H-PIP-CARD | D-PIP-CARD| ROYAL-CARD | JOKER | FACE-DOWN) \[/COMMA\]

I've cut off as many of the processes needed to run RifL, to isolate the problem. Below is a link to a git that has a tokenizer, a lexer, a parser, and a tester file. You will need the beautiful racket package and brag package installed in racket to run the tester.

Link to GitHub Repository

If you run the tester file in racket with a 128 MB limit, it should succeed on parsing the first code example, and exceed memory on the second. In these files I provided, the parser has been simplified to:

RifL-program: [anything] (/NEWLINE [anything])*
@anything: (S-PIP-CARD | C-PIP-CARD | H-PIP-CARD | D-PIP-CARD| ROYAL-CARD | JOKER | FACE-DOWN
               | /COMMA | /NEWLINE | /DIVIDER)*

This is the minimum complexity of the Parser grammar needed to exceed the memory limit. This alarms me, as it is such a simple grammar. My question is, am I doing something wrong, and if so, what? Is the problem in the tokenizer or lexer? If I'm not doing something wrong, does brag just suck? And if brag sucks, what can I replace it with, that doesn't require me to refactor my whole executer, and doesn't force me to build a full parser from scratch?

Matthew Butt­erick or Greg Hendershott, if you're listening, please save me from writing a full parser.

英文:

I'm writing an esolang I designed (called RifL) in Racket using the beautiful racket textbook. I'm 99% done, and was testing RifL by writing a larger program in it. When the RifL program got large enough, running it became very very slow, and eventually, running it crashed the evaluation because the program ran out of memory.

I have identified that the problem is coming from the parser I built, which is written in brag. My original parser looks like so:

#lang brag
RifL-program: \[convert-to-deck\] (/NEWLINE \[convert-to-deck\])\*
convert-to-deck: convert-to-name /DIVIDER convert-to-stack
convert-to-name: ((S-PIP-CARD \[/COMMA\])\* S-PIP-CARD) |
                 ((C-PIP-CARD \[/COMMA\])\* C-PIP-CARD) |
                 ((H-PIP-CARD \[/COMMA\])\* H-PIP-CARD) |
                 ((D-PIP-CARD \[/COMMA\])\* D-PIP-CARD)
convert-to-stack: (entry\* /NEWLINE)\* entry\*
entry: (S-PIP-CARD | C-PIP-CARD | H-PIP-CARD | D-PIP-CARD| ROYAL-CARD | JOKER | FACE-DOWN) \[/COMMA\]

I've cut off as many of the processes needed to run RifL, to isolate the problem. Below is a link to a git that has a tokenizer, a lexer, a parser, and a tester file. You will need the beautiful racket package and brag package installed in racket to run the tester.

https://github.com/Jesse-Hamlin-Navias/Rifl-parser-fail

If you run the tester file in racket with a 128 MB limit, it should succeed on parsing the first code example, and exceed memory on the second. In these files I provided, the parser has been simplified to:

RifL-program: [anything] (/NEWLINE [anything])*
@anything: (S-PIP-CARD | C-PIP-CARD | H-PIP-CARD | D-PIP-CARD| ROYAL-CARD | JOKER | FACE-DOWN
               | /COMMA | /NEWLINE | /DIVIDER)*

This is the minimum complexity of the Parser grammar needed to exceed the memory limit. This alarms me, as it is such a simple grammar. My question is, am I doing something wrong, and if so, what? Is the problem in the tokenizer or lexer? If I'm not doing something wrong, does brag just suck? And if brag sucks, what can I replace it with, that doesn't require me to refactor my whole executer, and doesn't force me to build a full parser from scratch?

Matthew Butt­erick or Greg Hendershott, if you're listening, please save me from writing a full parser.

答案1

得分: 1

左递归右递归!

It turns out its pretty important. Also removing all the * was also integral. The parser is still slow with large code, and can crash, but it crashes way less, so a big improvement. I think brag is probably not built well, and could be improved, but approaching the grammar right is important.

RifL-program: () | 转换为牌组 | RifL-program-sub [/NEWLINE] [转换为牌组]
@RifL-program-sub: [转换为牌组] | RifL-program-sub [/NEWLINE] [转换为牌组]
转换为牌组: 转换为名称 /分隔符 转换为堆栈
转换为名称: [(s-名称 [/逗号])] S-PIP-卡 | [(c-名称 [/逗号])] C-PIP-卡 |
                 [(h-名称 [/逗号])] H-PIP-卡 | [(d-名称 [/逗号])] D-PIP-卡
@s-名称: [(s-名称 [/逗号])] S-PIP-卡
@c-名称: [(c-名称 [/逗号])] C-PIP-卡
@h-名称: [(h-名称 [/逗号])] H-PIP-卡
@d-名称: [(d-名称 [/逗号])] D-PIP-卡
转换为堆栈: [(子堆栈 [/逗号])] [/NEWLINE] [入口]
@子堆栈: [(子堆栈 [/逗号])] [/NEWLINE] [入口]
入口: (S-PIP-卡 | C-PIP-卡 | H-PIP-卡 | D-PIP-卡 | ROYAL-卡 | JOKER | FACE-DOWN)
英文:

Left right recursion!

It turns out its pretty important. Also removing all the * was also integral. The parser is still slow with large code, and can crash, but it crashes way less, so a big improvement. I think brag is probably not built well, and could be improved, but approaching the grammar right is important.

#lang brag
RifL-program: () | convert-to-deck | RifL-program-sub [/NEWLINE] [convert-to-deck]
@RifL-program-sub: [convert-to-deck] | RifL-program-sub [/NEWLINE] [convert-to-deck]
convert-to-deck: convert-to-name /DIVIDER convert-to-stack
convert-to-name: [(s-name [/COMMA])] S-PIP-CARD | [(c-name [/COMMA])] C-PIP-CARD |
                 [(h-name [/COMMA])] H-PIP-CARD | [(d-name [/COMMA])] D-PIP-CARD
@s-name: [(s-name [/COMMA])] S-PIP-CARD
@c-name: [(c-name [/COMMA])] C-PIP-CARD
@h-name: [(h-name [/COMMA])] H-PIP-CARD
@d-name: [(d-name [/COMMA])] D-PIP-CARD
convert-to-stack: [(sub-stack [/COMMA])] [/NEWLINE] [entry]
@sub-stack: [(sub-stack [/COMMA])] [/NEWLINE] [entry]
entry: (S-PIP-CARD | C-PIP-CARD | H-PIP-CARD | D-PIP-CARD| ROYAL-CARD | JOKER | FACE-DOWN)

huangapple
  • 本文由 发表于 2023年3月7日 03:51:10
  • 转载请务必保留本文链接:https://go.coder-hub.com/75655235.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定