How can I get the desired range in a protected Word document (NetOffice.Word, C#)?

Question

I have a protected document (doc.ProtectionType == wdAllowOnlyFormFields). It has some areas that can be edited. Everything else is protected, even from copying. I'm using the NetOffice.Word library, and I'm trying to programmatically find text and create a bookmark at the found range. The problem is that when I call wordDoc.Content.Duplicate.Find.Execute(smthParams), I get the exception "COMException: This method or property is not available because the object refers to a protected area of the document." However, I can get any range of text manually without any problems:

var range = doc.Content.Duplicate;
range.SetRange(start, end);

I can create a bookmark in a range obtained this way without any problem. But I can't use this approach to find the range that corresponds to the text I'm looking for. This is how I'm trying to create the bookmark:

public void CreateBookmarkTest()
{
    Document doc = Context.WordDocument;

    var searchText = "smth text";
    var bookmarkName = "newBookmark";
    
    using Range docRange = doc.Content.Duplicate;

    foreach (var paragraph in docRange.Paragraphs)
    {
        using Range paragraphRange = paragraph.Range;
        var text = paragraphRange.Text;
        var startParagraph = paragraphRange.Start;
        var endParagraph = paragraphRange.End;

        var startIndex = text.IndexOf(searchText);
        if (startIndex >= 0)
        {
            text = GetParagraphTextWithHiddenSymbols(paragraphRange, text);
            startIndex = text.IndexOf(searchText);
            var startFoundRange = startParagraph + startIndex;
            var end = startFoundRange + searchText.Length;

            paragraphRange.SetRange(startFoundRange, end);

            var foundText = paragraphRange.Text;
            if (foundText == searchText)
            {
                doc.Bookmarks.Add(bookmarkName, paragraphRange);
                break;
            }
        }
    }
}

private string GetParagraphTextWithHiddenSymbols(Range paragraphRange, string initialText)
{
    var text = initialText;
    foreach (Field field in paragraphRange.Fields)
    {
        int index = text.IndexOf(field.Result.Text);
        if (index >= 0)
        {
            text = text.Replace(field.Result.Text, $"{{{field.Code.Text}}} {field.Result.Text}{(char)21}");
        }
    }
    return text;
}

The problem is that foundText does not always equal searchText. Sometimes foundText is offset, and I haven't figured out how to fix that yet. This approach also seems slow and suboptimal to me. Perhaps there is a correct way to implement search and replace (ideally through Find.Execute). I'm also wondering whether there is any way to get the areas that are allowed for editing (or just to find out whether the current Range is editable).
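A likely reason for the offset is that Range.Text and the Start/End positions do not count the same characters. Hidden content, such as a field's code text and its markers, occupies character positions but does not appear in the string being searched. Under that assumption, one way to correct the offset is to describe a paragraph's hidden runs as (offset, length) pairs and map an index found in the visible text back to a document position.

The sketch is plain C# with no Word dependency. `MapVisibleIndex` and the run list are hypothetical. Collecting the runs from the paragraph's Fields or ContentControls is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not a NetOffice API). Maps an index in the *visible*
// paragraph text to an absolute document position. hiddenRuns lists runs that
// occupy positions but are missing from Range.Text, as (Offset, Length) with
// Offset counted in document positions from the paragraph start.
int MapVisibleIndex(int paragraphStart, int visibleIndex,
                    IEnumerable<(int Offset, int Length)> hiddenRuns)
{
    int position = visibleIndex; // candidate offset from the paragraph start
    foreach (var run in hiddenRuns.OrderBy(r => r.Offset))
    {
        // A run starting at or before the candidate lies in front of the
        // visible character, so that character is pushed right by its length.
        if (run.Offset > position)
            break;
        position += run.Length;
    }
    return paragraphStart + position;
}

// Visible text "abc" in a paragraph starting at position 100, with a
// 3-position hidden run after "a" and a 2-position hidden run after "b".
var runs = new List<(int Offset, int Length)> { (1, 3), (5, 2) };
string visible = "abc";
string searchText = "bc";
int found = visible.IndexOf(searchText, StringComparison.Ordinal);       // 1
int start = MapVisibleIndex(100, found, runs);                            // 104
int end = MapVisibleIndex(100, found + searchText.Length - 1, runs) + 1;  // 108
Console.WriteLine($"SetRange({start}, {end})"); // SetRange(104, 108)
```

The end is computed by mapping the last matched character and adding 1, rather than adding searchText.Length to the start. That way, any hidden run inside the match stays inside the range.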

I tried adapting my code using Oscar's idea from the answer below. It works much better, but it still fails on large paragraphs with many unprotected input fields.

Thanks a lot for your help, friend!

Answer 1

Score: 0

The problem is probably caused by hidden text in the paragraph, such as a field's code text. This happens whether you use NetOffice, Microsoft.Office.Interop.Word, VBA, etc.
You can try my code first. Although it's not a perfect solution yet, note this snippet:

if (range.Text != searchText)
{
    Console.WriteLine(range.Text);
    System.Diagnostics.Debugger.Break();
}

At least it shows you where to start debugging, because it prints the text that failed to match. You can refine the code further from there.

using NetOffice.WordApi.Enums;
using Word = NetOffice.WordApi;

Test();

// The following code applies only to the content (main body) of the document itself and does not include footnotes, comments, headers, footers, or other parts of the document.
void Test()
{
    // just a test file for me
    //const string fFullnameStr = @"C:\Users\oscar\Dropbox\VS\VBA\stackoverflow.docm";
    const string fFullnameStr = @"C:\Users\oscar\Dropbox\VS\stackoverflow\VBA\Naive Bayes classifier.docx";
    Word.Application wordApplication = new Word.Application();
    wordApplication.DisplayAlerts = WdAlertLevel.wdAlertsNone;
    wordApplication.Visible = true; // just for testing, to watch
    Word.Document doc = wordApplication.Documents.Open(fFullnameStr); // in your code: Context.WordDocument

    /* for test
    if (doc.ProtectionType != WdProtectionType.wdAllowOnlyFormFields)
        Console.WriteLine(doc.ProtectionType);
    doc.Close();
    doc.Protect(WdProtectionType.wdAllowOnlyFormFields);
    just for test */
    int i = 0;

    //var searchText = "smth text";
    // https://github.com/Aldman/ProtectedRangeSearch/blob/main/FindTextTests.cs#L15
    var searchText = "based on a common"; // other test phrases: "diameter features", "assume that the value"
    var bookmarkName = "newBookmark";

    Word.Range rng = doc.Content; // or doc.Content.Duplicate

    if (doc.ProtectionType != WdProtectionType.wdAllowOnlyFormFields)
    {
        // Not form-protected: Find.Execute is available.
        if (doc.ActiveWindow.View.ShowFieldCodes)
            doc.ActiveWindow.View.ShowFieldCodes = false;
        while (rng.Find.Execute(findText: searchText, matchCase: true, matchWholeWord: true, matchWildcards: false,
                matchSoundsLike: false, matchAllWordForms: false, forward: true, wrap: WdFindWrap.wdFindStop))
        {
            rng.Bookmarks.Add(bookmarkName + i++.ToString()); //rng.Select(); // just for test
        }
    }
    else
    {
        // Form-protected: Find throws, so locate the text paragraph by paragraph.
        foreach (var paragraph in rng.Paragraphs) // http://msdn.microsoft.com/en-us/en-us/Iibrary/office/ff837006.aspx redirects to: https://learn.microsoft.com/en-us/office/vba/api/Word.Range.Paragraphs
        {
            Word.Range range = paragraph.Range;
            int paragraphStart = range.Start; // SetRange below moves range, so remember the paragraph bounds
            int paragraphEnd = range.End;
            var text = range.Text;
            var index = text.IndexOf(searchText); int indexPre = index;
            if (index < 0) continue; // nothing to find in this paragraph
            var start = 0;

            #region GetParagraphTextWithHiddenSymbols
            foreach (Word.Field item in range.Fields)
            {
                // separate variable, so that "index" still refers to the search text afterwards
                int fieldIndex = text.IndexOf(item.Result.Text, start);
                if (fieldIndex >= 0)
                {
                    // "{" + code + "}" + result + chr(21) roughly mirrors the field's begin marker, code, separator, result and end marker
                    string expanded = "{" + item.Code.Text + "}" + item.Result.Text + ((char)21).ToString();
                    text = text.Substring(0, fieldIndex) + expanded + text.Substring(fieldIndex + item.Result.Text.Length);
                    start = fieldIndex + expanded.Length;
                }
            }

            // A ContentControl has a hidden marker at its start and at its end, so add two placeholders (" ").
            // ContentControl positions are absolute, so convert them to offsets within the paragraph.
            foreach (Word.ContentControl item in range.ContentControls)
            {
                text = text.Substring(0, item.Range.Start - 1 - paragraphStart) + " " + item.Range.Text + " "
                    + text.Substring(item.Range.End - 1 - paragraphStart);
            }
            #endregion

            while (index >= 0)
            {
                // search from the current hit; IndexOf(searchText) would always return the first hit again
                index = text.IndexOf(searchText, indexPre);
                if (index < 0) break;

                start = paragraphStart + index;
                var end = start + searchText.Length;
                range.SetRange(start, end);

                // shift the range right until its text matches, without leaving the paragraph
                while (range.Text != searchText && end < paragraphEnd)
                {
                    range.SetRange(++start, ++end);
                }

                if (range.Text != searchText)
                {
                    Console.WriteLine(range.Text);
                    System.Diagnostics.Debugger.Break();
                }

                range.Bookmarks.Add(bookmarkName + i++.ToString());

                text = paragraph.Range.Text; start = 0;
                index = text.IndexOf(searchText, indexPre + 1);
                indexPre = index;
            }
        }
    }

    wordApplication.Visible = true; // just for testing, to watch
    doc.ActiveWindow.View.ReadingLayout = false; // just for testing, to watch
    if (doc.ProtectionType != WdProtectionType.wdNoProtection)
        doc.Unprotect(123.ToString()); // just for test
}
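The inner `while (range.Text != searchText ...)` loop in the code above corrects a wrong start estimate by shifting a fixed-width window one position at a time. Here is that idea in isolation, as a sketch:

- The hypothetical delegate `getText(start, end)` stands in for `range.SetRange(start, end)` followed by `range.Text`.
- The scan is bounded by the paragraph end, so it cannot run on indefinitely.
- A miss returns null instead of breaking into the debugger.

The fake document string is made up for illustration.

```csharp
using System;

// Hypothetical helper modelling the "nudge" loop. getText(start, end) stands in
// for range.SetRange(start, end) followed by range.Text. Starting from an
// estimated position, the window is shifted right one position at a time,
// never past paragraphEnd. Returns null when no window matches.
(int Start, int End)? SlideToMatch(Func<int, int, string> getText,
                                    int estimatedStart, string searchText,
                                    int paragraphEnd)
{
    int length = searchText.Length;
    for (int start = estimatedStart; start + length <= paragraphEnd; start++)
    {
        if (getText(start, start + length) == searchText)
            return (start, start + length);
    }
    return null;
}

// Fake paragraph: 8 hidden positions ("{HIDDEN}") sit between "A " and the
// search text, so the index found in the visible text "A smth text" (2) is too
// small for the document positions.
string document = "A {HIDDEN}smth text";
string GetText(int s, int e) => document.Substring(s, e - s);
var match = SlideToMatch(GetText, 2, "smth text", document.Length);
Console.WriteLine(match is { } m ? $"SetRange({m.Start}, {m.End})" : "not found"); // SetRange(10, 19)
```

Because the window keeps the width of searchText, this approach only works when no hidden characters fall inside the match itself.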

It makes sense that the Find object cannot execute a search when the protection type is wdAllowOnlyFormFields. I think this is because the Find class is not just a search facility: it also includes replace (editing) functionality. You need to either unprotect the document, change the way it is protected, or use the current workaround. The code above branches on the protection type to cover both the unprotected case and the workaround. Besides this foreach-paragraph approach, you could also use a regular expression to locate the text. Whichever method you use, you have to handle hidden text, such as fields' code text, properly in order to get accurate results.
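To illustrate the regular-expression alternative mentioned above, here is a minimal sketch using System.Text.RegularExpressions:

- `Regex.Escape` makes the search text literal.
- The `\b` anchors approximate `matchWholeWord: true`. They are defined by `\w`, so the result may differ from Word's own whole-word rule. They also only make sense when the search text begins and ends with a word character.

`FindWholeWordMatches` is a hypothetical helper. The indices it returns refer to the visible paragraph text, so they still need the hidden-text adjustment before being used with `SetRange`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: start index of every non-overlapping, case-sensitive,
// whole-word occurrence of searchText in text (e.g. a paragraph's Range.Text).
List<int> FindWholeWordMatches(string text, string searchText)
{
    string pattern = @"\b" + Regex.Escape(searchText) + @"\b";
    return Regex.Matches(text, pattern)
                .Select(m => m.Index)
                .ToList();
}

// "smth texts" is skipped because "text" is followed by another word character.
string paragraphText = "smth text, smth texts and smth text\r";
foreach (int index in FindWholeWordMatches(paragraphText, "smth text"))
    Console.WriteLine(index); // 0, then 26
```

Unlike calling `text.IndexOf(searchText)` repeatedly, `Regex.Matches` already advances past each hit. The same occurrence cannot be returned twice.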

.csproj file:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net6.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="NetOfficeFw.Core" Version="1.9.3" />
    <PackageReference Include="NetOfficeFw.Word" Version="1.9.3" />
  </ItemGroup>

  <ItemGroup>
    <FrameworkReference Include="Microsoft.WindowsDesktop.App.WindowsForms" />
  </ItemGroup>

</Project>

An alternative version that uses ShowFieldCodes:

void Test_ShowFieldCodes()
{
    // just a test file for me
    const string fFullnameStr = @"C:\Users\oscar\Dropbox\VS\VBA\stackoverflow.docm";
    Word.Application wordApplication = new Word.Application();
    wordApplication.DisplayAlerts = WdAlertLevel.wdAlertsNone;
    //wordApplication.Visible = true; // just for testing, to watch
    Word.Document doc = wordApplication.Documents.Open(fFullnameStr); // in your code: Context.WordDocument

    int i = 0;
    var searchText = "smth text";
    var bookmarkName = "newBookmark";

    Word.Range rng = doc.Content; // or doc.Content.Duplicate

    if (doc.ProtectionType != WdProtectionType.wdAllowOnlyFormFields)
    {
        while (rng.Find.Execute(findText: searchText, matchCase: true, matchWholeWord: true, matchWildcards: false,
                matchSoundsLike: false, matchAllWordForms: false, forward: true, wrap: WdFindWrap.wdFindStop))
        {
            // if the hit is inside a content control, put the bookmark just after the control
            if ((bool)rng.Information(WdInformation.wdInContentControl))
                rng.SetRange(rng.Paragraphs[1].Range.ContentControls[1].Range.End + 1,
                    rng.Paragraphs[1].Range.ContentControls[1].Range.End + 1);
            rng.Bookmarks.Add(bookmarkName + i++.ToString());
        }
    }
    else
    {
        //rng = doc.Content.Duplicate;
        foreach (var paragraph in rng.Paragraphs) // http://msdn.microsoft.com/en-us/en-us/Iibrary/office/ff837006.aspx redirects to: https://learn.microsoft.com/en-us/office/vba/api/Word.Range.Paragraphs
        {
            Word.Range range = paragraph.Range;
            int paragraphStart = range.Start; // SetRange below moves range, so remember the paragraph bounds
            int paragraphEnd = range.End;
            var text = range.Text;
            var index = text.IndexOf(searchText); int indexPre = 0;
            var start = 0;

            while (index >= 0)
            {
                int rawIndex = index; // index of this hit in the text with field codes hidden

                if (paragraph.Range.Fields.Count > 0)
                {
                    doc.ActiveWindow.View.ShowFieldCodes = true;
                    text = paragraph.Range.Text;
                    // if there are fields, this is the index with ShowFieldCodes = false + the index with ShowFieldCodes = true, plus 1
                    index = index + text.IndexOf(searchText, indexPre) + 1;
                    doc.ActiveWindow.View.ShowFieldCodes = false;
                }

                start = paragraphStart + index;
                var end = start + searchText.Length;
                range.SetRange(start, end);

                // shift the range right until its text matches, without leaving the paragraph
                while (range.Text != searchText && end < paragraphEnd && range.End < doc.Content.End - 1)
                {
                    //range.Select(); // just for test
                    range.SetRange(++start, ++end);
                }

                if (range.Text != searchText && range.End < doc.Content.End - 1)
                {
                    Console.WriteLine(range.Text);
                    System.Diagnostics.Debugger.Break();
                }

                if (range.Text == searchText)
                {
                    if ((bool)range.Information(WdInformation.wdInContentControl))
                        range.SetRange(range.Paragraphs[1].Range.ContentControls[1].Range.End + 1,
                            range.Paragraphs[1].Range.ContentControls[1].Range.End + 1);
                    range.Bookmarks.Add(bookmarkName + i++.ToString());
                }
                text = paragraph.Range.Text; start = 0;
                // continue after the current hit (searching from indexPre + 1 would find the first hit again)
                index = text.IndexOf(searchText, rawIndex + 1);
                indexPre = index;
            }
        }
    }

    wordApplication.Visible = true; // just for testing, to watch
    //doc.Unprotect(1.ToString()); // just for test
}

Update (2023-07-12): ContentControls as well

So the answer is that there are no fields in your file at all. What it contains are ContentControls, not Fields: ActiveDocument.ContentControls.Count is 3 and ActiveDocument.Fields.Count is 0.
I have updated the code above accordingly.
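Following the observation that each ContentControl has a start and an end marker that occupy positions but are not part of the text, here is a sketch of the placeholder idea from the `ContentControls` loop in the first code block. It works with paragraph-relative offsets and processes the markers in document order.

It assumes the start marker sits at `Range.Start - 1` and the end marker at `Range.End`. This is consistent with the code jumping to `Range.End + 1` to get past a control. That assumption and the helper name `AlignForContentControls` belong to this sketch only and are not part of the NetOffice API. You would feed it the `Range.Start`/`Range.End` values of `paragraph.Range.ContentControls`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper, not a NetOffice API. Rebuilds a paragraph's Range.Text
// so that index i corresponds to document position paragraphStart + i,
// ASSUMING each content control has a start marker at Range.Start - 1 and an
// end marker at Range.End that take a position but are missing from
// Range.Text. controls holds the absolute (Range.Start, Range.End) values.
string AlignForContentControls(string visibleText, int paragraphStart,
                               IEnumerable<(int Start, int End)> controls,
                               char placeholder = ' ')
{
    var sb = new StringBuilder(visibleText);
    // Insert in ascending document order: every earlier insertion has already
    // shifted the text, so the paragraph-relative offset is the right index.
    foreach (int marker in controls.SelectMany(c => new[] { c.Start - 1, c.End })
                                   .OrderBy(p => p))
    {
        int offset = marker - paragraphStart;
        sb.Insert(Math.Clamp(offset, 0, sb.Length), placeholder);
    }
    return sb.ToString();
}

// "Name: " + [content control "John"] + "!" in a paragraph starting at 100:
// start marker at 106, control text 107..110, end marker at 111, "!" at 112.
string aligned = AlignForContentControls("Name: John!", 100, new[] { (107, 111) }, '#');
Console.WriteLine(aligned);                       // Name: #John#!
Console.WriteLine(100 + aligned.IndexOf("John")); // 107
```

Processing the markers in one sorted pass avoids the mistake of mixing absolute document positions with string indices. That mix only happens to work for the first paragraph of the document.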

huangapple
  • Posted on 2023-07-10 21:09:25
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76654086.html