Posted: 2023-03-15 18:19:03

Cursor location for multiple matches (Officer Package in R)

Question

I am trying to identify the exact location at which to insert or replace items using the officer package in R.

For example:

cursor_reach(document, "Adverse Events")

finds only the first match for "Adverse Events".

So if body text mentioning Adverse Events comes first, the cursor points to that text. What I really want is to find the heading that contains "Adverse Events", so that I can insert some automated text after this heading.

There does not seem to be a way to move the cursor to just after a heading if the phrase "Adverse Events" appears before the heading?

Thanks!

I tried this:

cursor_reach(document, "Adverse Events")

cursor_reach(document, "Adverse Events")

But this does not work...

Answer 1

Score: 1

You can set your cursor location directly with the following:

# assuming your rdocx object is called document and the
# desired cursor location is 3

document$officer_cursor$which <- 3

Here's a wrapper function for figuring out where the desired cursor location should be:

library(officer)
library(dplyr)
library(xml2)

cursor_reach_list <- function(x, keyword) {

  nodes_with_text <- xml_find_all(x$doc_obj$get(), "/w:document/w:body/*|/w:ftr/*|/w:hdr/*")
  if (length(nodes_with_text) < 1) {
    stop("no text found in the document", call. = FALSE)
  }
  text_ <- xml_text(nodes_with_text)
  test_ <- grepl(pattern = keyword, x = text_)
  if (!any(test_)) {
    stop(keyword, " has not been found in the document",
         call. = FALSE)
  }
  # note: everything above was taken directly from officer's cursor_reach function

  # get the paragraph style id associated with each node ("" if it has none);
  # [1] keeps one value per node even if a node holds several w:pStyle elements
  style_ <- unlist(sapply(nodes_with_text,
                          function(x) {
                            ss <- xml_find_all(x, ".//w:pStyle")
                            if (length(ss) == 0) return("")
                            xml_attr(ss, "val", default = "")[1]
                          }))

  # put the results in a table, replacing style ids with readable style names
  result <- data.frame(para = seq_along(text_),
                       keyword.found = test_,
                       style_id = style_) %>%
    left_join(styles_info(x) %>%
                filter(style_type == "paragraph") %>%
                select(style_id, style_name),
              by = "style_id") %>%
    select(-style_id)

  print(result)
}

Demonstration with a simple document:


# create simple document for testing #####
doc <- read_docx()
doc <- body_add_par(doc, "A paragraph of normal text that contains the keywords Adverse Events, and precedes any heading.")
doc <- body_add_par(doc, "Some other text.")
doc <- body_add_par(doc, "Header Adverse Events", style = "heading 1")
doc <- body_add_par(doc, "Another paragraph after the header, to beef up the document.")
print(doc, "temp_file.docx")
rm(doc)

# load document & use cursor_reach_list to identify desired location #####

doc <- read_docx("temp_file.docx")
cursor_reach_list(doc, "Adverse Events")

# Result:
#  para keyword.found style_name
#1    1          TRUE     Normal
#2    2         FALSE     Normal
#3    3          TRUE  heading 1
#4    4         FALSE     Normal
#5    5         FALSE           

# Both paragraphs 1 & 3 contain the keywords, but para 1 has the Normal style
# while para 3 has the heading 1 style.

# move cursor to para 3
doc$officer_cursor$which <- 3

# insert text after heading
doc <- body_add_par(doc, "additional text in next line", pos = "after")

# save result to different location for ease of verification
print(doc, "temp_file1.docx")

I'm not familiar with your actual use case, so the actual changing of cursor location and insertion of new text are left as manual actions after ascertaining the appropriate location for the cursor. You can probably automate everything in a wrapper function, based on your needs.
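
As a rough illustration of that last point, here is a minimal sketch of one such wrapper. The function name cursor_reach_heading and its style_pattern argument are made up for this example and are not part of officer. It assumes the cursor can be set through x$officer_cursor$which as shown above, that heading styles have names starting with "heading" (as in the default template), and it matches the keyword as a fixed string rather than a regular expression.

```r
library(officer)
library(xml2)

# Hypothetical helper (not part of officer): move the cursor to the first
# body element that contains `keyword` and whose paragraph style name
# matches `style_pattern`.
cursor_reach_heading <- function(x, keyword, style_pattern = "^heading") {
  nodes <- xml_find_all(x$doc_obj$get(), "/w:document/w:body/*")
  text_ <- xml_text(nodes)

  # style id of each body element ("" when it has no w:pStyle)
  style_id <- vapply(seq_along(nodes), function(i) {
    ss <- xml_find_all(nodes[[i]], ".//w:pStyle")
    if (length(ss) == 0) "" else xml_attr(ss[[1]], "val", default = "")
  }, character(1))

  # translate style ids into style names such as "heading 1"
  info <- styles_info(x)
  info <- info[info$style_type == "paragraph", ]
  style_name <- info$style_name[match(style_id, info$style_id)]

  hits <- which(grepl(keyword, text_, fixed = TRUE) &
                  !is.na(style_name) &
                  grepl(style_pattern, style_name, ignore.case = TRUE))
  if (length(hits) == 0) {
    stop("no heading containing '", keyword, "' was found", call. = FALSE)
  }

  # assumption: the cursor position lives in x$officer_cursor$which
  x$officer_cursor$which <- hits[1]
  x
}

# build a small test document in a temporary file
tmp <- tempfile(fileext = ".docx")
doc <- read_docx()
doc <- body_add_par(doc, "Normal text mentioning Adverse Events before any heading.")
doc <- body_add_par(doc, "Some other text.")
doc <- body_add_par(doc, "Header Adverse Events", style = "heading 1")
print(doc, target = tmp)

# reload it, jump to the heading, and insert text right after it
doc <- read_docx(tmp)
doc <- cursor_reach_heading(doc, "Adverse Events")
doc <- body_add_par(doc, "additional text in next line", pos = "after")
print(doc, target = tempfile(fileext = ".docx"))
```

If several headings contain the keyword, this sketch simply takes the first one; you may want to return all hits instead and choose among them.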
