英文:
How can I access the whole input from an arbitrary parser in a sequence?
问题
你正在处理DNS消息解析器的代码,主要使用了megaparsec
库。你遇到的问题是在处理记录(Record)部分的压缩情况时,需要考虑整个输入,但是在达到第三个解析器时,部分输入已经被消耗,导致提前到达输入末尾而解析失败。对于这种情况,有一些可能的解决方法和建议:
-
使用
M.lookAhead
: 尽管你提到了它可能会涉及到来回移动偏移量,但是在某些情况下,使用M.lookAhead
仍然是一个有效的方法。你可以尝试将其应用在前两个解析器上,以查看接下来的内容,而不会消耗输入。 -
手动管理偏移量: 如果
megaparsec
没有提供直接的解决方法,你可以尝试手动管理偏移量。在解析压缩部分之前,记录当前的偏移量,然后在处理完压缩部分后将偏移量还原到之前记录的位置。 -
考虑其他解析库: 如果
megaparsec
不够灵活,你可以考虑使用其他解析库,如attoparsec
或parsec
,它们可能提供更适合处理这种情况的API。 -
重构代码: 有时,重构代码可以使问题更容易解决。你可以考虑将解析记录的部分拆分成多个子解析器,以更好地处理压缩情况。
最终的解决方案可能取决于你的具体需求和代码结构。你可以尝试上述建议中的一个或多个,以找到适合你情况的最佳方法。如果你需要更多具体的代码示例或进一步的帮助,请随时提出。
英文:
I'm working through a DNS message parser. I have defined the following using megaparsec
:
Header
data DNSHeader = DNSHeader
{ hid :: !Word16,
hflags :: !Word16,
hnumQuestions :: !Word16,
hnumAnswers :: !Word16,
hnumAuthorities :: !Word16,
hnumAdditionals :: !Word16
}
deriving stock (Show)
-- >>> _debugBuilderOutput $ header2Bytes (DNSHeader 1 2 3 4 5 6)
-- "000100020003000400050006"
header2Bytes :: DNSHeader -> Builder
header2Bytes h =
word16BE (hid h)
<> word16BE (hflags h)
<> word16BE (hnumQuestions h)
<> word16BE (hnumAnswers h)
<> word16BE (hnumAuthorities h)
<> word16BE (hnumAdditionals h)
parseHeader :: M.Parsec Void B.ByteString DNSHeader
parseHeader =
DNSHeader
<$> M.word16be
<*> M.word16be
<*> M.word16be
<*> M.word16be
<*> M.word16be
<*> M.word16be
Question
data DNSQuestion = DNSQuestion
{ qname :: B.ByteString,
qtype :: !Word16,
qclass :: !Word16
}
deriving stock (Show)
parseQuestion :: M.Parsec Void B.ByteString DNSQuestion
parseQuestion =
DNSQuestion
<$> decodeDNSNameSimple
<*> M.word16be
<*> M.word16be
decodeDNSNameSimple :: M.Parsec Void B.ByteString B.ByteString
decodeDNSNameSimple = do
len <- M.word8
if len == 0
then pure mempty
else do
name <- B.pack <$> replicateM (fromIntegral len) M.word8
rest <- decodeDNSNameSimple
pure $ name <> (if B.null rest then mempty else "." <> rest)
Record
data DNSRecord = DNSRecord
{ rname :: B.ByteString,
rtype :: !Word16,
rclass :: !Word16,
rttl :: !Word32,
rdataLength :: !Word16,
rdata :: B.ByteString
}
deriving stock (Show)
parseRecord :: M.Parsec Void B.ByteString DNSRecord
parseRecord = do
name <- decodeDNSName
rtype <- M.word16be
rclass <- M.word16be
rttl <- M.word32be
rdataLength <- M.word16be
rdata <- B.pack <$> replicateM (fromIntegral rdataLength) M.word8
pure $ DNSRecord name rtype rclass rttl rdataLength rdata
decodeDNSName :: M.Parsec Void B.ByteString B.ByteString
decodeDNSName = do
len <- M.word8
if len == 0
then pure mempty
else
if (len .&. 0b1100_0000) == 0b1100_0000
then decodeCompressedDNSName len
else do
name <- B.pack <$> replicateM (fromIntegral len) M.word8
rest <- decodeDNSName
pure $ name <> (if B.null rest then mempty else "." <> rest)
decodeCompressedDNSName :: Word8 -> M.Parsec Void B.ByteString B.ByteString
decodeCompressedDNSName l = do
offset' <- M.word8
let bytes = ((fromIntegral l :: Word16) .&. 0b0011_1111) `shiftL` 8
pointer = bytes .|. (fromIntegral offset' :: Word16)
currentPos <- M.getOffset
-- TODO: get to the offset defined by pointer (considering the whole input)
-- M.setOffset (fromIntegral pointer) ???
result <- decodeDNSName
M.setOffset currentPos
pure result
Parsing header, question and record
parseDNSResponse :: M.Parsec Void B.ByteString (DNSHeader, DNSQuestion, DNSRecord)
parseDNSResponse = do
header <- parseHeader
question <- parseQuestion
record <- parseRecord
pure (header, question, record)
I'm currently stuck in the last function of the Record section, which handles possible compression (len == 0b1100_0000
). I need to consider the whole input when moving the offset, and when I get to the third parser part of the input has already been consumed, so I reach end of input early and the parsing fails. Doing a M.lookAhead
of the 2 first parsers would require going back and forth with the offsets. I have tried a few things without success and I'm getting a bit lost. Am I in the right direction here? Do you have a recommendation?
This is the example I'm trying to parse:
_exampleResponse :: B.ByteString
_exampleResponse = Base16.decodeLenient "e35d8180000100010000000003777777076578616d706c6503636f6d0000010001c00c000100010000508900045db8d822"
Header and question parsing works otherwise.
I could consider moving to other methods or parsers (attoparsec
, etc) if they have an API more aligned with my use case, so feel free to suggest alternatives.
答案1
得分: 1
以下是您要翻译的部分:
"Turns out you can do this with megaparsec
, as it allows manipulating parser state without consuming the inputs in the process. Not sure if there are other alternatives that offer this.
Calling getInput
at the very beginning of your parser gives the whole input, to which you can then apply a modified parseRecord
so you can modify the parser input mid-parsing, do what you need, and then restore the previous state:"
英文:
Turns out you can do this with megaparsec
, as it allows manipulating parser state without consuming the inputs in the process. Not sure if there are other alternatives that offer this.
Calling getInput
at the very beginning of your parser gives the whole input, to which you can then apply a modified parseRecord
so you can modify the parser input mid-parsing, do what you need, and then restore the previous state:
import Text.Megaparsec qualified as M
type DNSParser a = M.Parsec Void B.ByteString a -- For readability
parseDNSPacket :: DNSParser DNSPacket
parseDNSPacket = do
fullInput <- M.getInput -- 1. Get input to this parser
let parseRecord' = parseRecord fullInput -- 2. Apply the parser to it
header <- parseHeader
-- ...
answers <- replicateM (fromIntegral $ hnumAnswers header) parseRecord'
-- ... ...
parseRecord :: B.ByteString -> DNSParser DNSRecord
parseRecord fullInput = do
name <- decodeDNSName fullInput
-- ... ...
decodeDNSName :: B.ByteString -> DNSParser B.ByteString
decodeDNSName input = do
len <- M.word8
-- ...
if (len .&. 0b1100_0000) == 0b1100_0000
then decodeCompressedDNSName input len
else do
-- ... ...
decodeCompressedDNSName :: B.ByteString -> Word8 -> DNSParser B.ByteString
decodeCompressedDNSName input len = do
offset' <- fromIntegral <$> M.word8
let bytes = ((fromIntegral len :: Word16) .&. 0b0011_1111) `shiftL` 8
pointer = fromIntegral (bytes .|. offset')
currentPos <- M.getOffset -- 3. Save the current input
M.setInput input -- 4. Set input to the argument `i`
M.skipCount pointer M.word8 -- 5. Perform the full input parsing
name <- decodeDNSName input
M.setInput currentInput -- 6. Restore the previous input
pure name -- 7. Profit!
Now it passes simple lookup tests!
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论