Cursor location for multiple matches (officer package in R)

Question

I am trying to identify the exact location at which to insert or replace items using the officer package in R.

For example:

cursor_reach(document, "Adverse Events")

finds only the first paragraph whose text matches "Adverse Events".

So if the body text mentions Adverse Events before the heading does, the cursor points to that earlier paragraph. What I really want is to find the heading that contains "Adverse Events", so that I can insert some automated text after this heading.

There does not seem to be a way to move the cursor to just after a heading if the words "Adverse Events" also appear before that heading?

Thanks!

I tried this

cursor_reach(document, "Adverse Events")

cursor_reach(document, "Adverse Events")

But this does not work...

Answer 1

Score: 1

You can set your cursor location directly with the following:

# assuming your rdocx object is called document and the
# desired cursor location is 3
document$officer_cursor$which <- 3

Here's a wrapper function for figuring out where the desired cursor location should be:

library(officer)
library(dplyr)
library(xml2)

cursor_reach_list <- function(x, keyword) {

  nodes_with_text <- xml_find_all(x$doc_obj$get(), "/w:document/w:body/*|/w:ftr/*|/w:hdr/*")
  if (length(nodes_with_text) < 1) {
    stop("no text found in the document", call. = FALSE)
  }
  text_ <- xml_text(nodes_with_text)
  test_ <- grepl(pattern = keyword, x = text_)
  if (!any(test_)) {
    stop(keyword, " has not been found in the document",
         call. = FALSE)
  }
  # note: everything above was taken directly from officer's cursor_reach function

  # get the paragraph style associated with each paragraph
  # (only the first w:pStyle is used, so a node that holds several styled
  # paragraphs, such as a table, still yields exactly one value)
  style_ <- unlist(sapply(nodes_with_text,
                          function(x) {
                            ss <- xml_find_all(x, ".//w:pStyle")
                            if (length(ss) == 0) return("")
                            xml_attr(ss[[1]], "val", default = "")
                          }))

  # put the results in a table, mapping each style id to its style name
  result <- data.frame(para = seq_along(text_),
                       keyword.found = test_,
                       style_id = style_) %>%
    left_join(styles_info(x) %>%
                filter(style_type == "paragraph") %>%
                select(style_id, style_name),
              by = "style_id") %>%
    select(-style_id)

  print(result)
}

Demonstration with a simple document:


# create simple document for testing #####
doc <- read_docx()
doc <- body_add_par(doc, "A paragraph of normal text that contains the keywords Adverse Events, and precedes any heading.")
doc <- body_add_par(doc, "Some other text.")
doc <- body_add_par(doc, "Header Adverse Events", style = "heading 1")
doc <- body_add_par(doc, "Another paragraph after the header, to beef up the document.")
print(doc, "temp_file.docx")
rm(doc)

# load document & use cursor_reach_list to identify desired location #####

doc <- read_docx("temp_file.docx")
cursor_reach_list(doc, "Adverse Events")

# Result:
#  para keyword.found style_name
#1    1          TRUE     Normal
#2    2         FALSE     Normal
#3    3          TRUE  heading 1
#4    4         FALSE     Normal
#5    5         FALSE           

# Both paragraphs 1 & 3 contain the keywords, but para 1 uses the Normal style
# while para 3 uses heading 1.

# move cursor to para 3
doc$officer_cursor$which <- 3

# insert text after heading
doc <- body_add_par(doc, "additional text in next line", pos = "after")

# save result to different location for ease of verification
print(doc, "temp_file1.docx")

I'm not familiar with your actual use case, so moving the cursor and inserting the new text are left as manual steps once you have identified the appropriate location. Depending on your needs, you can probably automate all of this in a wrapper function.
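
One possible way to automate this is sketched here, not as a definitive implementation. The names `first_heading_match` and `cursor_reach_heading` are hypothetical and are not part of officer. The sketch assumes that heading styles have names starting with "heading" (as "heading 1" does in the demonstration), that the keyword should be matched literally rather than as a regular expression, and that the installed officer version keeps the cursor in `officer_cursor$which`, as the code above does.

```r
# Hypothetical helper (not part of officer): given the text and the style
# name of each top-level body node, return the index of the first node that
# is styled as a heading and contains the keyword.
first_heading_match <- function(text, style_name, keyword) {
  # assumption: heading styles are named "heading 1", "heading 2", ...
  is_heading <- grepl("^heading", style_name, ignore.case = TRUE)
  # fixed = TRUE treats the keyword as literal text, not as a regex
  has_keyword <- grepl(keyword, text, fixed = TRUE)
  hits <- which(is_heading & has_keyword)
  if (length(hits) == 0) {
    stop("no heading containing '", keyword, "' was found", call. = FALSE)
  }
  hits[1]
}

# Hypothetical wrapper: move the cursor of an rdocx object to the first
# heading that contains the keyword, then return the modified object.
cursor_reach_heading <- function(x, keyword) {
  nodes <- xml2::xml_find_all(x$doc_obj$get(), "/w:document/w:body/*")
  text <- xml2::xml_text(nodes)
  # first w:pStyle under each node; NA when a node has no paragraph style
  style_id <- xml2::xml_attr(xml2::xml_find_first(nodes, ".//w:pStyle"), "val")
  # translate style ids into style names, as cursor_reach_list does
  styles <- officer::styles_info(x)
  styles <- styles[styles$style_type == "paragraph", ]
  style_name <- styles$style_name[match(style_id, styles$style_id)]
  x$officer_cursor$which <- first_heading_match(text, style_name, keyword)
  x
}

# Usage with the demonstration file created above:
# doc <- officer::read_docx("temp_file.docx")
# doc <- cursor_reach_heading(doc, "Adverse Events")
# doc <- officer::body_add_par(doc, "additional text in next line", pos = "after")
```

Keeping the matching logic in a separate function that only takes plain vectors makes it easy to check without a Word file, and lets you swap in a different rule (for example, a specific heading level) without touching the XML handling.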

huangapple
  • Posted on March 15, 2023, 18:19:03
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75743350.html