Trying to make a lexer using Beautiful Racket

Question

I am new to Racket and I am trying to tokenize a small language using the Beautiful Racket library. I have defined the grammar in a separate file and it seems to be fine. I have also created a parser that uses the 'parse-to-datum' procedure from Beautiful Racket, which works. However, I am encountering an error with my tokenizer: when the parser encounters an ID such as 'A', it produces this error message:

Encountered unexpected token of type 'ID (value "A") while parsing 'unknown

I assume this error has to do with the way I am tokenizing IDs. Can you help me adjust my tokenizer to correctly handle IDs? Here is the specific grammar I am trying to parse:

  10 read A
  20 read B
  30 gosub 400
  40 if C = 400 then write C
  50 if C = 0 then goto 1000
  400 C = A + B : return
  $$

Here is my grammar:

  program -> linelist $$
  linelist -> line linelist | epsilon
  line -> idx stmt linetail* [EOL]
  idx -> nonzero_digit digit*
  linetail -> :stmt | epsilon
  stmt -> id = expr | if expr then stmt | read id | write expr | goto idx | gosub idx | return
  expr -> id etail | num etail | (expr)
  etail -> + expr | - expr | = expr | epsilon
  id -> [a-zA-Z]+
  num -> numsign digit digit*
  numsign -> + | - | epsilon
  nonzero_digit -> 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  digit -> 0 | nonzero_digit

Here is my tokenizer:

  #lang br/quicklang
  (require brag/support)

  (define (make-tokenizer port)
    (port-count-lines! port) ; get line data
    (define (next-token)
      (define-lex-abbrevs
        (lower-letter (:/ "a" "z"))
        (upper-letter (:/ #\A #\Z))
        (digit (:/ "0" "9")))
      (define odai-lexer
        (lexer
         [whitespace (token 'WS lexeme #:position (pos lexeme-start) #:line (line lexeme-start) #:skip? #t)]
         ["{" (token 'PROG-START lexeme #:position (pos lexeme-start) #:line (line lexeme-start))]
         ["}" (token 'PROG-STOP lexeme #:position (pos lexeme-start) #:line (line lexeme-start))]
         ["$" (token 'DOLLAR lexeme #:position (pos lexeme-start) #:line (line lexeme-start))]
         ["read" (token 'READ lexeme #:position (pos lexeme-start) #:line (line lexeme-start))]
         ["write" (token 'WRITE lexeme #:position (pos lexeme-start) #:line (line lexeme-start))]
         [";" (token 'DELIMIT lexeme #:position (pos lexeme-start) #:line (line lexeme-start))]
         ["if" (token 'IF lexeme #:position (pos lexeme-start) #:line (line lexeme-start))]
         ["then" (token 'THEN lexeme #:position (pos lexeme-start) #:line (line lexeme-start))]
         ["=" (token 'ASSIGN-OP lexeme #:position (pos lexeme-start) #:line (line lexeme-start))]
         ["+" (token 'ADD-OP lexeme #:position (pos lexeme-start) #:line (line lexeme-start))]
         ["-" (token 'SUB-OP lexeme #:position (pos lexeme-start) #:line (line lexeme-start))]
         ["(" (token 'OPEN-PAREN lexeme #:position (pos lexeme-start) #:line (line lexeme-start))]
         [")" (token 'CLOSE-PAREN lexeme #:position (pos lexeme-start) #:line (line lexeme-start))]
         ["goto" (token 'GOTO lexeme #:position (pos lexeme-start) #:line (line lexeme-start))]
         ["gosub" (token 'GOSUB lexeme #:position (pos lexeme-start) #:line (line lexeme-start))]
         ["return" (token 'RETURN lexeme #:position (pos lexeme-start) #:line (line lexeme-start))]
         [(:+ (:or lower-letter upper-letter)) (token 'ID lexeme #:position (pos lexeme-start) #:line (line lexeme-start))]
         [(:+ digit) (token 'DIGIT (string->number lexeme) #:position (pos lexeme-start) #:line (line lexeme-start))]
         [any-char (token 'MISC lexeme #:position (pos lexeme-start) #:line (line lexeme-start))]))
      (odai-lexer port))
    next-token)

  (provide make-tokenizer)

I have tried adjusting the way I define "ID" in my tokenizer, and I have also tried defining the grammar rule for id in many different ways. Currently, I simply define it as:

id : LETTER+


Answer 1

Score: 1

The lexer produces an ID token, so try this rule in the parser:

id -> ID
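
In a brag grammar, each terminal on a rule's right-hand side must name a token type that the tokenizer actually emits. Since the tokenizer above produces 'ID (and 'DIGIT) tokens rather than individual LETTER tokens, the relevant rules could look like the following sketch; the num rule here is an assumption based on the tokenizer's 'DIGIT token, not something given in the question:

```racket
; brag grammar fragment (sketch): terminal names must match the
; token types emitted by make-tokenizer
id : ID       ; one ID token per identifier; the lexer already groups letters
num : DIGIT   ; the lexer emits a single DIGIT token per run of digits
```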


huangapple
  • Posted on March 4, 2023, 07:35:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75632708.html