Trying to make a lexer using Beautiful Racket

Question

I am new to Racket and am trying to write a tokenizer for a grammar using the Beautiful Racket library. I have defined the grammar in a separate file, and it seems to be fine. I have also created a parser that uses the 'parse-to-datum' procedure from Beautiful Racket, which also appears to work. However, I am getting an error that seems to come from my tokenizer. When the parser encounters an ID such as 'A', it produces this error message:

Encountered unexpected token of type 'ID (value "A") while parsing 'unknown

I assume this error has to do with the way I am tokenizing IDs. Can you help me adjust my tokenizer to handle IDs correctly? Here is the program I am trying to parse:

10 read A 
20 read B 
30 gosub 400 
40 if C = 400 then write C 
50 if C = 0 then goto 1000 
400 C = A + B : return 
$$

Here is my grammar:

program -> linelist $$ 
linelist -> line linelist | epsilon 
line -> idx stmt linetail* [EOL]
idx -> nonzero_digit digit* 
linetail -> :stmt | epsilon 
stmt -> id = expr | if expr then stmt | read id | write expr | goto idx | gosub idx | return
expr -> id etail | num etail | (expr)
etail -> + expr | - expr | = expr | epsilon
id -> [a-zA-Z]+
num -> numsign digit digit*
numsign -> + | - | epsilon 
nonzero_digit -> 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
digit -> 0 | nonzero_digit

Here is my tokenizer:

#lang br/quicklang
(require brag/support)

(define (make-tokenizer port)
  (port-count-lines! port) ; get line data
  (define (next-token)
  
    (define-lex-abbrevs
      (lower-letter (:/ "a" "z"))
      (upper-letter (:/ #\A #\Z))
      (digit (:/ "0" "9")))
      
    (define odai-lexer
      (lexer
       
       [whitespace (token 'WS lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start) #:skip? #t)]
       
       ["{" (token 'PROG-START lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["}" (token 'PROG-STOP lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["$" (token 'DOLLAR lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       
       ["read" (token 'READ lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["write" (token 'WRITE lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       [";" (token 'DELIMIT lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       
       ["if" (token 'IF lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["then" (token 'THEN lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       
       ["=" (token 'ASSIGN-OP lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["+" (token 'ADD-OP lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["-" (token 'SUB-OP lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["(" (token 'OPEN-PAREN lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       [")" (token 'CLOSE-PAREN lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       
       ["goto" (token 'GOTO lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["gosub" (token 'GOSUB lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["return" (token 'RETURN lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       
       [(:+ (:or lower-letter upper-letter)) (token 'ID lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       [(:+ digit) (token 'DIGIT (string->number lexeme) #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]

       
       [any-char (token 'MISC lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]))
    (odai-lexer port))
  next-token)

(provide make-tokenizer)

I have tried changing how I define the ID token in my tokenizer, and I have also tried writing the grammar rule for id in several different ways. Currently, the rule is simply:

id : LETTER+

Answer 1

Score: 1

The lexer produces an ID token, so try this rule in the parser:

id -> ID
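
The reason is that the tokenizer shown in the question never emits a LETTER token: a run of letters is returned as a single token of type 'ID, so a rule written as LETTER+ has nothing to match. In brag notation, which the question's own rule already uses, the fix might look like the fragment below. This is only a sketch: the rest of the asker's brag grammar is not shown, so the other rules are left out here and would still need to be written in terms of the token types the tokenizer actually produces.

```racket
#lang brag
; Sketch only: the other rules of the grammar are not shown in the question.
; Uppercase names in brag refer to token types emitted by the tokenizer,
; so ID here matches the (token 'ID lexeme ...) produced by the lexer.
id : ID
```

To check which token types the tokenizer really produces for a given input, apply-tokenizer-maker from brag/support can be called with make-tokenizer and a source string; it gathers the resulting tokens into a list.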

huangapple
  • Posted on March 4, 2023, 07:35:37
  • Source link: https://go.coder-hub.com/75632708.html