March 4, 2023 07:35:37

Trying to make a lexer using Beautiful Racket

Question

I am new to Racket and I am trying to write a tokenizer for a small language using the Beautiful Racket library. I have defined the grammar in a separate file and it seems to be completely fine. I have also created a parser that uses the 'parse-to-datum' procedure in Beautiful Racket, which is also working fine. However, I am encountering an error with my tokenizer. When the parser encounters an ID such as 'A', it produces this error message:

Encountered unexpected token of type 'ID (value "A") while parsing 'unknown

I assume this error has to do with the way I am tokenizing IDs. Can you help me adjust my tokenizer to correctly handle IDs? Here is the specific program I am trying to parse:

10 read A 
20 read B 
30 gosub 400 
40 if C = 400 then write C 
50 if C = 0 then goto 1000 
400 C = A + B : return 
$$

Here is my grammar:

program -> linelist $$
linelist -> line linelist | epsilon
line -> idx stmt linetail* [EOL]
idx -> nonzero_digit digit*
linetail -> :stmt | epsilon
stmt -> id = expr | if expr then stmt | read id | write expr | goto idx | gosub idx | return
expr -> id etail | num etail | (expr)
etail -> + expr | - expr | = expr | epsilon
id -> [a-zA-Z]+
num -> numsign digit digit*
numsign -> + | - | epsilon
nonzero_digit -> 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
digit -> 0 | nonzero_digit

Here is my tokenizer:

#lang br/quicklang
(require brag/support)
(define (make-tokenizer port)
  (port-count-lines! port) ; get line data
  (define (next-token)
  
    (define-lex-abbrevs
      (lower-letter (:/ "a" "z"))
      (upper-letter (:/ #\A #\Z))
      (digit (:/ "0" "9")))
      
    (define odai-lexer
      (lexer
       
       [whitespace (token 'WS lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start) #:skip? #t)]
       
       ["{" (token 'PROG-START lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["}" (token 'PROG-STOP lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["$" (token 'DOLLAR lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       
       ["read" (token 'READ lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["write" (token 'WRITE lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       [";" (token 'DELIMIT lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       
       ["if" (token 'IF lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["then" (token 'THEN lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       
       ["=" (token 'ASSIGN-OP lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["+" (token 'ADD-OP lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["-" (token 'SUB-OP lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["(" (token 'OPENa-PAREN lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       [")" (token 'CLOSE-PAREN lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       
       ["goto" (token 'GOTO lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["gosub" (token 'GOSUB lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["return" (token 'RETURN lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       
       [(:+ (:or lower-letter upper-letter)) (token 'ID lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       [(:+ digit) (token 'DIGIT (string->number lexeme) #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       
       [any-char (token 'MISC lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]))
    (odai-lexer port))
  next-token)
(provide make-tokenizer)

I have tried adjusting the way I define the "ID" in my tokenizer, and I have also tried defining the grammar for ID in many different ways. Currently, I am simply calling it:

id : LETTER+

Answer 1

Score: 1

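The lexer produces a single ID token for each identifier and never produces a LETTER token, so a rule such as id : LETTER+ most likely has nothing to match. That would explain why the parser reports the ID token as unexpected. Try referring to the ID token directly in the parser:

id -> ID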
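
As a minimal sketch of how a brag grammar can refer to the token types the tokenizer emits, the fragment below handles only lines such as "10 read A" or "40 write C". The token names READ, WRITE, ID, and DIGIT are the ones produced by the tokenizer in the question. The reduced set of rules is an assumption made for illustration and is not the asker's full grammar.

```racket
#lang brag
; Sketch only: a reduced grammar, not the full language from the question.
; Uppercase names are token types emitted by the tokenizer;
; lowercase names are grammar rules.
program : line*
line    : idx stmt
idx     : DIGIT
stmt    : READ id | WRITE id
id      : ID
```

The point of the sketch is that every terminal in the grammar has to match a token type the tokenizer actually emits. Because the tokenizer returns 'READ rather than the string "read", the grammar refers to READ, and because it returns 'ID for a whole identifier, the id rule refers to ID rather than to individual letters.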
