Question

Using HXT I'm parsing the XML response body of an HTTP call that was made using http-conduit.

val <- runX $ readString [withValidate no] (Data.ByteString.UTF8.toString . toStrict $ getResponseBody response) >>> getChildren >>> ...

Depending on the version of the API, I found that the response body includes a byte order mark before the XML:

error: ""\65279<?xml version=\"1.0\" encoding=\"utf-8\"?><Enume..."" (line 1, column 1):
unexpected "\65279"
expecting xml declaration, comment, processing instruction, "<!DOCTYPE" or "<"

Since the BOM may or may not be there, I did the following:

...
let resBody = Data.ByteString.UTF8.toString . toStrict $ getResponseBody response
    parseBody body = runX $ readString [withValidate no] body >>> getChildren >>> ...
xs <- parseBody resBody
val <- case xs of
  x : _ -> pure x
  _ -> head <$> (parseBody $ drop 1 resBody)
...

It works, but it's printing the error message when the BOM is present. What are the options for parsing the XML with a possible BOM so that it's not printing error messages?

Answer 1

Score: 0

Okay, given that you're willing to assume the encoding is UTF-8 as you do here, then probably the simplest is to just pattern match to discard a BOM:

case toString ... of
    '\65279':s -> s
    s -> s
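
The same pattern match can be pulled out into a small standalone function. This is only a sketch: `stripBom` is a made-up name, and it assumes the body has already been decoded to a `String` as UTF-8, so a BOM shows up as the single character U+FEFF (written `'\65279'` in Haskell).

```haskell
module Main where

-- | Drop one leading U+FEFF (the decoded byte order mark), if present.
-- Hypothetical helper; assumes the input was already decoded as UTF-8.
stripBom :: String -> String
stripBom ('\65279' : rest) = rest
stripBom s                 = s

main :: IO ()
main = do
  let withBom    = "\65279<?xml version=\"1.0\" encoding=\"utf-8\"?><root/>"
      withoutBom = "<?xml version=\"1.0\" encoding=\"utf-8\"?><root/>"
  -- Both inputs should normalise to the same string.
  print (stripBom withBom == withoutBom)
  print (stripBom withoutBom == withoutBom)
```

With something like this in place, the question's `parseBody` would presumably be called once, as `parseBody (stripBom resBody)`, so the parser never sees the BOM and has nothing to complain about.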

As an aside, having just looked through the XML spec to see how encodings are supposed to be handled, let me just say: eugh, gross. There appears to be no encoding-agnostic way to specify what encoding to use, so the only really correct, robust thing to do is try a bunch during parsing and hope one succeeds.
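
When a BOM is present, one partial heuristic is to inspect the raw bytes before decoding them. The sketch below is an illustration rather than a full encoding detector: `BomEncoding` and `sniffBom` are hypothetical names, only the UTF-8 and UTF-16 marks are recognised, and UTF-32 (whose little-endian mark also begins with FF FE) is deliberately ignored.

```haskell
module Main where

import qualified Data.ByteString as BS

-- | Encodings this sketch can recognise from a leading BOM (hypothetical type).
data BomEncoding = Utf8 | Utf16BE | Utf16LE
  deriving (Show, Eq)

-- | Return the encoding suggested by a leading BOM, if any, together with
-- the remaining bytes with that BOM removed.
-- Assumption: UTF-32 input is out of scope and is not distinguished here.
sniffBom :: BS.ByteString -> (Maybe BomEncoding, BS.ByteString)
sniffBom bytes
  | Just rest <- BS.stripPrefix (BS.pack [0xEF, 0xBB, 0xBF]) bytes = (Just Utf8, rest)
  | Just rest <- BS.stripPrefix (BS.pack [0xFE, 0xFF]) bytes       = (Just Utf16BE, rest)
  | Just rest <- BS.stripPrefix (BS.pack [0xFF, 0xFE]) bytes       = (Just Utf16LE, rest)
  | otherwise                                                      = (Nothing, bytes)

main :: IO ()
main = do
  let body = BS.pack [0xEF, 0xBB, 0xBF, 0x3C, 0x61, 0x2F, 0x3E] -- BOM, then "<a/>"
  print (fst (sniffBom body))
  print (BS.length (snd (sniffBom body)))
```

If no BOM is found, the bytes come back unchanged and the caller still has to fall back on an assumed encoding, such as the UTF-8 assumption made in the question.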
