cXML .NET XmlReader error: The parameter entity replacement text must nest properly within markup declarations

Question

Is there something I need to configure in the XmlReaderSettings to get .NET (4.8, 6, 7) to handle some cXML without throwing the following exception?

Unhandled exception. System.Xml.Schema.XmlSchemaException: The parameter entity replacement text must nest properly within markup declarations.

Sample cXML input

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE cXML SYSTEM "http://xml.cxml.org/schemas/cXML/1.2.041/cXML.dtd">
<cXML payloadID="donkeys@example.com" timestamp="2023-02-13T01:01:01Z">
  <Header>
  </Header>
  <Request deploymentMode="production">
  </Request>
</cXML>

Sample Application

using System.IO;
using System.Xml;
using System.Xml.Linq;

namespace Donkeys
{
    internal class Program
    {
        static void Main()
        {
            XmlReaderSettings settings = new()
            {
                XmlResolver = new XmlUrlResolver(),
                DtdProcessing = DtdProcessing.Parse,
                ValidationType = ValidationType.DTD,
            };

            using FileStream fs = File.OpenRead("test.xml"); // sample cXML from question
            using XmlReader reader = XmlReader.Create(fs, settings);

            XDocument.Load(reader); // this blows up with the XmlSchemaException above
        }
    }
}

I'm looking to use the XmlUrlResolver to cache the DTDs, but unless I ignore validation I get the error above, and I'm not really sure why.

So far I've tried different validation flags, but nothing validates at all unless I use ValidationType.DTD, which goes pop.

The actual resolver seems to work fine; if I subclass it, it is returning the DTD (as a MemoryStream) as expected.

I can add an event handler to ignore the issue but this feels lamer than I'd like.

using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

namespace Donkeys
{
    internal class Program
    {
        // the one validation message I want to ignore
        private const string NestingError = "The parameter entity replacement text must nest properly within markup declarations.";

        static void Main()
        {
            XmlReaderSettings settings = new()
            {
                XmlResolver = new XmlUrlResolver(),
                DtdProcessing = DtdProcessing.Parse,
                ValidationType = ValidationType.DTD,
                IgnoreComments = true
            };

            settings.ValidationEventHandler += Settings_ValidationEventHandler;

            using FileStream fs = File.OpenRead("test.xml");
            using XmlReader reader = XmlReader.Create(fs, settings);

            XDocument dogs = XDocument.Load(reader);
        }

        private static void Settings_ValidationEventHandler(object? sender, ValidationEventArgs e)
        {
            // this seems fragile: it depends on the exact wording of the message
            if (string.Equals(e.Message, NestingError, StringComparison.OrdinalIgnoreCase))
                return;

            throw e.Exception;
        }
    }
}

Answer 1

Score: 1

I've spent some time over the last few days looking into this and trying to get my head around what's going on here.

As far as I can tell, the error "The parameter entity replacement text must nest properly within markup declarations" is being reported incorrectly. My understanding of the spec is that this message means that you have mismatched < and > characters in the replacement text of a parameter entity in a DTD.

The following example is taken from this O'Reilly book sample page and demonstrates something that genuinely should reproduce this error:

<!ENTITY % finish_it ">">
<!ENTITY % bad "won't work" %finish_it;

Indeed the .NET DTD parser reports the same error for these two lines of DTD.

This doesn't mean you can't have < and > characters in parameter entity replacement text at all: the following two lines declare an empty element named Z, albeit in a somewhat roundabout way:

<!ENTITY % Nested "<!ELEMENT Z EMPTY>">
%Nested;

The .NET DTD parser parses this successfully.

However, the .NET DTD parser appears to be objecting to this line in the cXML DTD, which defines the Object.ANY parameter entity:

<!ENTITY % Object.ANY '|xades:QualifyingProperties|cXMLSignedInfo|Extrinsic'>

There are of course no < and > characters in the replacement text, so the error is baffling.

This is by no means a new problem. I found this unanswered Stack Overflow question, which reports essentially the same problem, and this MSDN Forum post from 2007 describes it too. So is this unclear but intentional behaviour, or a bug that has been in .NET for 15+ years? I don't know.

For those who want to look into this further, the following is about the minimum needed to reproduce the problem. The C# code to read the XML file can be adapted from the question; I don't see the need to repeat it here:

example.dtd:

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT A EMPTY>
<!ENTITY % Rest '|A' >
<!ELEMENT example (#PCDATA %Rest;)*>

example.xml:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE example SYSTEM "example.dtd">
<example/>

There are various ways to tweak this to get rid of the error. One way is to move the | character from the parameter entity into the ELEMENT example declaration. Replacing #PCDATA with another element (which you would also have to define) is another way.
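
For anyone who wants to try both variants without touching disk or network, here is a minimal sketch of a harness. It is an illustration only: InMemoryDtdResolver, DtdRepro, IsValid and Run are made-up names, the resolver simply hands back the same DTD text for whatever URI is requested, and TweakedDtd is my reading of the first tweak described above (the | moved out of the parameter entity). I have not recorded its output here, so run it to see what your .NET version reports.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Schema;

namespace Donkeys
{
    // Hypothetical helper: serves one DTD string for every external entity request.
    internal sealed class InMemoryDtdResolver : XmlResolver
    {
        private readonly byte[] dtdBytes;

        public InMemoryDtdResolver(string dtd)
        {
            this.dtdBytes = Encoding.UTF8.GetBytes(dtd);
        }

        public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
        {
            // Ignore the requested URI and return the in-memory DTD.
            return new MemoryStream(this.dtdBytes);
        }
    }

    internal static class DtdRepro
    {
        // example.xml from above (XML declaration omitted, it is optional).
        private const string Document =
            "<!DOCTYPE example SYSTEM \"example.dtd\">\n<example/>";

        // example.dtd exactly as given above.
        internal const string OriginalDtd =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
            "<!ELEMENT A EMPTY>\n" +
            "<!ENTITY % Rest '|A' >\n" +
            "<!ELEMENT example (#PCDATA %Rest;)*>";

        // Assumed form of the first tweak: the | lives in the ELEMENT declaration.
        internal const string TweakedDtd =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
            "<!ELEMENT A EMPTY>\n" +
            "<!ENTITY % Rest 'A' >\n" +
            "<!ELEMENT example (#PCDATA | %Rest;)*>";

        // Returns true if the document parses and validates against the given DTD;
        // otherwise returns false and puts the parser's message in error.
        internal static bool IsValid(string dtd, out string error)
        {
            var settings = new XmlReaderSettings
            {
                XmlResolver = new InMemoryDtdResolver(dtd),
                DtdProcessing = DtdProcessing.Parse,
                ValidationType = ValidationType.DTD,
            };

            try
            {
                using (var reader = XmlReader.Create(new StringReader(Document), settings))
                {
                    while (reader.Read()) { }
                }

                error = string.Empty;
                return true;
            }
            catch (Exception ex) when (ex is XmlSchemaException || ex is XmlException)
            {
                error = ex.Message;
                return false;
            }
        }

        // Prints the outcome for each DTD variant.
        internal static void Run()
        {
            foreach (var dtd in new[] { OriginalDtd, TweakedDtd })
            {
                Console.WriteLine(IsValid(dtd, out string error) ? "valid" : "error: " + error);
            }
        }
    }
}
```

Calling DtdRepro.Run() from a Main method prints one line per variant, which makes it easy to try further tweaks by adding more DTD strings to the array.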

But enough of the theory behind the problem. How can you actually move forwards with this?

I would take a local copy of the cXML DTD and adjust it to work around this error. You can download the DTD from the URL in your sample cXML input. The %Object.ANY; parameter entity is only used once in the DTD: I would replace this one occurrence with the replacement text, |xades:QualifyingProperties|cXMLSignedInfo|Extrinsic.

You then need to adjust the .NET XML parser to use your modified copy of the cXML DTD instead of fetching the one from the given URL. You create a custom URL resolver for this, for example:

using System;
using System.IO;
using System.Xml;

namespace Donkeys
{
    internal class CXmlUrlResolver : XmlResolver
    {
        private static readonly Uri CXml1_2_041 = new Uri("http://xml.cxml.org/schemas/cXML/1.2.041/cXML.dtd");

        // Handles every URI other than the cXML DTD.
        private readonly XmlResolver urlResolver;

        public CXmlUrlResolver()
        {
            this.urlResolver = new XmlUrlResolver();
        }

        public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
        {
            if (absoluteUri == CXml1_2_041)
            {
                // Return a Stream that reads from your custom version of the DTD,
                // for example:
                return File.OpenRead(@"SomeFilePathHere\cXML-1.2.041.dtd");
            }

            return this.urlResolver.GetEntity(absoluteUri, role, ofObjectToReturn);
        }
    }
}

This checks which URI is being requested and, if it matches the cXML URI, returns a stream that reads from your customised copy of the DTD. If some other URI is given, it passes the request to the nested XmlResolver, which then deals with it. You will of course need to use an instance of CXmlUrlResolver instead of XmlUrlResolver when creating your XmlReaderSettings.

I don't know how many versions of cXML you will have to deal with, but if you are dealing with multiple versions, you might have to create a custom copy of the DTD for each version, and have your resolver return the correct local copy for each different URI.
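
If you do end up with several versions, one way to sketch that is a resolver that looks the requested URI up in a table of local copies. This is only an illustration under assumptions: MultiVersionCXmlResolver and CXmlSettings are hypothetical names, and the local path is the same placeholder as above, so substitute wherever your patched DTDs actually live.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

namespace Donkeys
{
    // Hypothetical resolver: maps each known DTD URI to a locally patched copy
    // and hands everything else to a plain XmlUrlResolver.
    internal sealed class MultiVersionCXmlResolver : XmlResolver
    {
        private readonly Dictionary<Uri, string> localCopies;
        private readonly XmlResolver fallback = new XmlUrlResolver();

        public MultiVersionCXmlResolver(IDictionary<Uri, string> localCopies)
        {
            this.localCopies = new Dictionary<Uri, string>(localCopies);
        }

        public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
        {
            if (this.localCopies.TryGetValue(absoluteUri, out var path))
            {
                // Known cXML DTD: read the patched local file instead of the URL.
                return File.OpenRead(path);
            }

            return this.fallback.GetEntity(absoluteUri, role, ofObjectToReturn);
        }
    }

    internal static class CXmlSettings
    {
        // Builds reader settings that validate against the local DTD copies.
        internal static XmlReaderSettings Create()
        {
            var resolver = new MultiVersionCXmlResolver(new Dictionary<Uri, string>
            {
                // Placeholder path; add one entry per cXML version you handle.
                [new Uri("http://xml.cxml.org/schemas/cXML/1.2.041/cXML.dtd")] = @"SomeFilePathHere\cXML-1.2.041.dtd",
            });

            return new XmlReaderSettings
            {
                XmlResolver = resolver,
                DtdProcessing = DtdProcessing.Parse,
                ValidationType = ValidationType.DTD,
            };
        }
    }
}
```

Passing CXmlSettings.Create() to XmlReader.Create in place of the settings from the question should then be the only change needed in the calling code. Keeping the table outside the resolver means supporting a new version is a one-line addition rather than another if branch.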

A similar approach is given in this MSDN Forums post from 2008, which also deals with difficulties parsing cXML with .NET. It features a custom URL resolver created by subclassing XmlUrlResolver. Those who prefer composition over inheritance may prefer my custom URL resolver instead.

Posted by huangapple on 2023-02-14 05:15:15
Source: https://go.coder-hub.com/75441251.html