HXT: Handling byte order mark from HTTP response body
Question
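When parsing the XML response body of an HTTP call with HXT, the response body may begin with a byte order mark (BOM), which causes an error message. You want to know how to parse XML that may or may not carry a BOM without the error message being printed.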
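You can try the following approaches:
- Strip the BOM: before parsing, check whether the response body starts with a BOM by inspecting its first few bytes, remove it if it does, and then pass the cleaned body to the XML parser.
- Configure the XML parser: some XML parsers can be configured to handle a BOM. Check the HXT documentation for a comparable option that lets the XML be parsed without printing the error message.
Either approach can help avoid BOM-related error messages when parsing the XML.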
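The first approach can be sketched at the byte level, before the body is decoded to a String. This is a minimal sketch that assumes the body is UTF-8, where the BOM is the three bytes EF BB BF. The names `utf8Bom` and `stripUtf8Bom` are hypothetical helpers, not part of HXT or http-conduit, and `stripPrefix` assumes bytestring 0.10.8 or later.

```haskell
import qualified Data.ByteString as BS
import Data.Maybe (fromMaybe)

-- The UTF-8 encoding of U+FEFF (the byte order mark).
utf8Bom :: BS.ByteString
utf8Bom = BS.pack [0xEF, 0xBB, 0xBF]

-- Hypothetical helper: drop a leading UTF-8 BOM if present,
-- otherwise return the input unchanged.
stripUtf8Bom :: BS.ByteString -> BS.ByteString
stripUtf8Bom bs = fromMaybe bs (BS.stripPrefix utf8Bom bs)
```

The helper would be applied to the strict body (the result of `toStrict $ getResponseBody response`) before it is decoded with `Data.ByteString.UTF8.toString`.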
Original question:
Using HXT I'm parsing the XML response body of an HTTP call that was made using http-conduit.
val <- runX $ readString [withValidate no] (Data.ByteString.UTF8.toString . toStrict $ getResponseBody response) >>> getChildren >>> ...
Depending on the version of the API, I found that the response body includes a byte order mark before the XML:
error: ""\65279<?xml version=\"1.0\" encoding=\"utf-8\"?><Enume..."" (line 1, column 1):
unexpected "\65279"
expecting xml declaration, comment, processing instruction, "<!DOCTYPE" or "<"
Since the BOM may or may not be there, I did the following:
...
let resBody = Data.ByteString.UTF8.toString . toStrict $ getResponseBody response
    parseBody body = runX $ readString [withValidate no] body >>> getChildren >>> ...
xs <- parseBody resBody
val <- case xs of
  x : _ -> pure x
  _ -> head <$> (parseBody $ drop 1 resBody)
...
It works, but it's printing the error message when the BOM is present. What are the options for parsing the XML with a possible BOM so that it's not printing error messages?
Answer 1
Score: 0
Okay, given that you're willing to assume the encoding is UTF-8 as you do here, then probably the simplest is to just pattern match to discard a BOM:
case toString ... of
  '\65279':s -> s
  s -> s
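The same pattern match can be packaged as a small function. This is a minimal sketch: `dropBom` is a hypothetical name, and `'\xFEFF'` is the same character as `'\65279'`, written in hexadecimal.

```haskell
-- Hypothetical helper: drop a single leading U+FEFF (the BOM)
-- from already-decoded text; any other input is returned as-is.
dropBom :: String -> String
dropBom ('\xFEFF' : rest) = rest
dropBom s                 = s
```

With the names from the question, the call would be `parseBody (dropBom resBody)`. Since the BOM is gone before the parser sees the input, the retry with `drop 1` should no longer be needed, and the failed first parse that printed the error would not happen.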
As an aside, having just looked through the XML spec to see how encodings are supposed to be handled, let me just say: eugh, gross. There appears to be no encoding-agnostic way to specify what encoding to use, so the only really correct, robust thing to do is try a bunch during parsing and hope one succeeds.
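When a BOM is present, its byte pattern can at least narrow the candidates before any trial parsing. The following is a minimal sketch of BOM-only detection. It does not read the encoding declaration and does not fall back to trial decoding, and the type `BomEncoding` and the function `sniffBom` are hypothetical names introduced for illustration.

```haskell
import qualified Data.ByteString as BS

-- Encodings that can be recognised from a byte order mark alone.
data BomEncoding = Utf8 | Utf16BE | Utf16LE | Utf32BE | Utf32LE
  deriving (Eq, Show)

-- Hypothetical helper: guess the encoding from a leading BOM, if any.
-- The 4-byte UTF-32 marks are checked before the 2-byte UTF-16 ones
-- because FF FE is a prefix of FF FE 00 00.
sniffBom :: BS.ByteString -> Maybe BomEncoding
sniffBom bs
  | starts [0x00, 0x00, 0xFE, 0xFF] = Just Utf32BE
  | starts [0xFF, 0xFE, 0x00, 0x00] = Just Utf32LE
  | starts [0xEF, 0xBB, 0xBF]       = Just Utf8
  | starts [0xFE, 0xFF]             = Just Utf16BE
  | starts [0xFF, 0xFE]             = Just Utf16LE
  | otherwise                       = Nothing
  where
    starts prefix = BS.pack prefix `BS.isPrefixOf` bs
```

A `Nothing` result only means that no BOM was found. The encoding would then still have to be determined some other way, for example from the HTTP headers or the XML declaration.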