Trying to make a lexer using Beautiful Racket
Question
I am new to Racket and am trying to tokenize a grammar using the Beautiful Racket library. I have defined the grammar in a separate file, and it seems to be fine. I have also written a parser that uses the parse-to-datum procedure from Beautiful Racket, which otherwise works. However, when the parser encounters an ID such as 'A', it produces this error message:
Encountered unexpected token of type 'ID (value "A") while parsing 'unknown
I assume this error has to do with the way I am tokenizing IDs. Can you help me adjust my tokenizer to handle IDs correctly? Here is the program I am trying to parse:
10 read A
20 read B
30 gosub 400
40 if C = 400 then write C
50 if C = 0 then goto 1000
400 C = A + B : return
$$
Here is my grammar:
program -> linelist $$
linelist -> line linelist | epsilon
line -> idx stmt linetail* [EOL]
idx -> nonzero_digit digit*
linetail -> :stmt | epsilon
stmt -> id = expr | if expr then stmt | read id | write expr | goto idx | gosub idx | return
expr -> id etail | num etail | (expr)
etail -> + expr | - expr | = expr | epsilon
id -> [a-zA-Z]+
num -> numsign digit digit*
numsign -> + | - | epsilon
nonzero_digit -> 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
digit -> 0 | nonzero_digit
Here is my tokenizer:
#lang br/quicklang
(require brag/support)
(define (make-tokenizer port)
  (port-count-lines! port) ; get line data
  (define (next-token)
    (define-lex-abbrevs
      (lower-letter (:/ "a" "z"))
      (upper-letter (:/ #\A #\Z))
      (digit (:/ "0" "9")))
    (define odai-lexer
      (lexer
       [whitespace (token 'WS lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start) #:skip? #t)]
       ["{" (token 'PROG-START lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["}" (token 'PROG-STOP lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["$" (token 'DOLLAR lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["read" (token 'READ lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["write" (token 'WRITE lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       [";" (token 'DELIMIT lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["if" (token 'IF lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["then" (token 'THEN lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["=" (token 'ASSIGN-OP lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["+" (token 'ADD-OP lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["-" (token 'SUB-OP lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["(" (token 'OPENa-PAREN lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       [")" (token 'CLOSE-PAREN lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["goto" (token 'GOTO lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["gosub" (token 'GOSUB lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       ["return" (token 'RETURN lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       [(:+ (:or lower-letter upper-letter)) (token 'ID lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       [(:+ digit) (token 'DIGIT (string->number lexeme) #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]
       [any-char (token 'MISC lexeme #:position (+ (pos lexeme-start)) #:line (line lexeme-start))]))
    (odai-lexer port))
  next-token)
(provide make-tokenizer)
I have tried adjusting the way I define the 'ID token in my tokenizer, and I have also tried defining the grammar rule for id in many different ways. Currently, I define it as:
id : LETTER+
Answer 1
Score: 1
The lexer produces an ID token, not a LETTER token, so the rule id : LETTER+ never matches. Try this rule in the parser instead:
id -> ID
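As a sketch of what that could look like in brag's own notation (brag writes rules with a colon rather than an arrow), here is a deliberately tiny grammar covering only the read statement. The file name grammar.rkt and the reduced rule set are assumptions for illustration, not the asker's full grammar. The point is that uppercase names such as READ and ID have to match the token types the lexer emits.

```racket
#lang brag
;; grammar.rkt (hypothetical file name; reduced to one statement for illustration)
;; READ and ID are token types produced by the lexer, not grammar rules.
stmt : READ id
id   : ID
```

One way to check the grammar without involving the tokenizer is to pass hand-built tokens to parse-to-datum, which accepts a list of tokens as well as a token-producing function. This assumes the grammar above is saved next to the test file as grammar.rkt:

```racket
#lang racket
;; Assumes the grammar sketch above is saved in the same directory as grammar.rkt.
(require "grammar.rkt" brag/support)
;; Hand-built tokens with the same types the question's lexer emits for "read A".
(parse-to-datum (list (token 'READ "read") (token 'ID "A")))
```

If this call succeeds while the same input fails through the tokenizer, the mismatch is likely in the token types the lexer emits rather than in the grammar.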