MS Word Web Add-In fails when search notation for special characters is used with matchWildcards set to true
Question
I'm trying to run a complex string search in a document from a JavaScript web add-in for Word. It mostly works: I search for strings that begin with "XYZ ", followed by 1 to 10 alphanumeric characters, a period, and another 1 to 10 alphanumeric characters. The search string:
body.search('XYZ [0-9a-zA-Z]{1,10}.[0-9a-zA-Z]{1,10}', { matchWildcards: true });
...finds most of them but misses some because it doesn't recognize the hard-coded blank. If I search instead for a string like:
body.search('XYZ^w^#^#^#.^#^#^#', { matchWildcards: false });
...using search notation for special characters (specifically ^w for whitespace) then that will catch all the whitespaces, but is too specific for a practical search.
Whenever I try to combine search notation with wildcards, or even if I just set matchWildcards to true in the above, I get a general exception. Is there no way to combine these terms or to otherwise designate a whitespace with wildcards enabled without hard coding the white space?
NOTE: I've looked carefully at the actual characters, expecting some Unicode difference to be the culprit. I've even opened up the document and parsed the XML. I can't find a difference in the characters themselves, although there is some difference in the XML.
Answer 1
Score: 0
Figured it out, although I'd still like an explanation for why you can't use special character search notation with matchWildcards set to true.
In my case, it turned out that some whitespaces were being missed because they were non-breaking spaces (character code 160) rather than regular spaces (character code 32). I solved it by accepting either one as the fourth character of the search string, like this:
var sSearchString = "XYZ[" + String.fromCharCode(32, 160) + "][0-9a-zA-Z]{1,10}.[0-9a-zA-Z]{1,10}";
searchResults = body.search(sSearchString, { matchWildcards: true });
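The same idea can be wrapped in a small helper so the pattern is built in one place. This is only a sketch: buildWildcardPattern and findMatches are hypothetical names, and findMatches assumes it is called inside a Word add-in where the Office.js global Word is already loaded.

```javascript
// Build a Word wildcard pattern whose separator matches either a regular
// space (char code 32) or a non-breaking space (char code 160).
// buildWildcardPattern is a hypothetical helper name, not part of Office.js.
function buildWildcardPattern(prefix) {
  const anySpace = "[" + String.fromCharCode(32, 160) + "]";
  const token = "[0-9a-zA-Z]{1,10}";
  // In the wildcard pattern above, the period sits between the two tokens.
  return prefix + anySpace + token + "." + token;
}

// Hypothetical wrapper: assumes the Office.js `Word` global is available.
// It is only defined here, not called, so the helper above can be checked
// outside of Word as well.
async function findMatches(prefix) {
  return Word.run(async (context) => {
    const results = context.document.body.search(
      buildWildcardPattern(prefix),
      { matchWildcards: true }
    );
    results.load("text");
    await context.sync();
    // Return the matched text of every range that was found.
    return results.items.map((range) => range.text);
  });
}
```

Calling findMatches("XYZ") from the add-in should then give the text of each match, whichever of the two space characters follows the prefix. If other space-like characters turn up in the document, their character codes could be added to the bracketed set in the same way.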