Fill in text in adjacent blank variables till another text appears in R

Question

I'm trying to fill empty variables with the text to their left until another text appears. I want to do this for one specific row only.

Current table:

    var1  var2   var3   var3   var4
    A     textA         textB
    B     1      2      3      4
    c     3      4      5      6

Desired output:

    var1  var2   var3   var3   var4
    A     textA  textA  textB  textB
    B     1      2      3      4
    c     3      4      5      6

What's an elegant way to do this? My current solution looks something like the line below, but I'd like to use general logic instead of naming each variable:

    mutate(var3 = case_when(var1 == "A" & is.na(var3) ~ var2))

Answer 1

Score: 4

We can extract the row where 'var1' is "A", unlist it, and apply na.locf0 from zoo to replace each NA with the previous non-NA value:

    library(zoo)
    # logical index of the row(s) to fill
    i1 <- df1$var1 == "A"
    # carry the last non-NA value forward across every column except var1
    df1[i1, -1] <- na.locf0(unlist(df1[i1, -1]))

Output:

    df1
      var1  var2  var3 var3.1  var4
    1    A textA textA  textB textB
    2    B     1     2      3     4
    3    c     3     4      5     6

Or, with base R, build a numeric index from the non-NA elements (cumsum) and use it to repeat the non-NA values of the extracted row:

    v1 <- unlist(df1[i1, -1])
    # cumsum(!is.na(v1)) numbers each run that starts at a non-NA value
    df1[i1, -1] <- na.omit(v1)[cumsum(!is.na(v1))]
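
To see why the cumsum index works, here is a minimal trace on the "A" row alone. The vector is typed in by hand from the example data rather than pulled from df1, and the names idx and filled are used only for illustration.

```r
# Row "A" of the example data, with the var1 column dropped
v1 <- c(var2 = "textA", var3 = NA, var3.1 = "textB", var4 = NA)

# TRUE where a value is present; cumsum() turns that into a running counter
# that increases by one at every non-NA value
idx <- cumsum(!is.na(v1))
unname(idx)     # 1 1 2 2

# na.omit() keeps only the present values ("textA", "textB");
# indexing by idx repeats each one over the NAs that follow it
filled <- na.omit(v1)[idx]
unname(filled)  # "textA" "textA" "textB" "textB"
```

One caveat of this indexing trick: if the first element is NA, its index is 0, and R silently drops zero indices, so the result would be shorter than the row. The trick therefore assumes the row starts with a non-NA value, as it does here.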

Or use the tidyverse: reshape to 'long' format (pivot_longer), apply fill to replace each NA with the previous non-NA value, and reshape back to 'wide' format with pivot_wider:

    library(dplyr)
    library(tidyr)
    df1 %>%
      pivot_longer(cols = -var1, values_transform = as.character) %>%
      fill(value) %>%
      pivot_wider(names_from = name, values_from = value)
    # A tibble: 3 × 5
      var1  var2  var3  var3.1 var4
      <chr> <chr> <chr> <chr>  <chr>
    1 A     textA textA textB  textB
    2 B     1     2     3      4
    3 c     3     4     5      6
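
In long format, fill() runs down a single value column, so a fill could carry the last value of one original row into a leading NA of the next. That does not happen with this data, but a hedged variant that groups by var1 first keeps every fill inside its own row. This is a sketch that assumes var1 identifies each row uniquely; it is not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Example data, rebuilt here so the block is self-contained
df1 <- data.frame(
  var1   = c("A", "B", "c"),
  var2   = c("textA", "1", "3"),
  var3   = c(NA, 2L, 4L),
  var3.1 = c("textB", "3", "5"),
  var4   = c(NA, 4L, 6L),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  pivot_longer(cols = -var1, values_transform = as.character) %>%
  group_by(var1) %>%   # fill() respects groups, so fills stay within one row
  fill(value) %>%
  ungroup() %>%
  pivot_wider(names_from = name, values_from = value)

res
```

Calling ungroup() before pivot_wider returns a plain tibble rather than a grouped one.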

If the NAs occur only in alternating columns, another option is across2 from dplyover, which pairs each column that has NAs with the column to its left:

    library(dplyover)
    df1 %>%
      mutate(across2(c(3, 5), c(2, 4),
                     ~ case_match(.x, NA ~ .y, .default = as.character(.x)),
                     .names = "{xcol}"))

Output:

      var1  var2  var3 var3.1  var4
    1    A textA textA  textB textB
    2    B     1     2      3     4
    3    c     3     4      5     6

Data:

    df1 <- structure(list(var1 = c("A", "B", "c"), var2 = c("textA", "1",
    "3"), var3 = c(NA, 2L, 4L), var3.1 = c("textB", "3", "5"), var4 = c(NA,
    4L, 6L)), class = "data.frame", row.names = c(NA, -3L))
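
Since the question asks for general logic rather than named variables, the row-wise idea can also be wrapped in a small base R helper. The function name fill_rows_forward and its arguments are hypothetical, made up for this sketch. It converts the fill columns to character up front and maps leading NAs to NA instead of dropping them.

```r
# Example data, rebuilt here so the block is self-contained
df1 <- data.frame(
  var1   = c("A", "B", "c"),
  var2   = c("textA", "1", "3"),
  var3   = c(NA, 2L, 4L),
  var3.1 = c("textB", "3", "5"),
  var4   = c(NA, 4L, 6L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: forward-fill NAs across the columns of selected rows.
# `rows` is a logical vector marking rows to fill; `skip` names columns to leave alone.
fill_rows_forward <- function(df, rows, skip = "var1") {
  cols <- setdiff(names(df), skip)
  # Convert to character first so text can be written into numeric columns
  df[cols] <- lapply(df[cols], as.character)
  for (r in which(rows)) {
    v <- unlist(df[r, cols], use.names = FALSE)
    idx <- cumsum(!is.na(v))
    # Prepend NA so that idx == 0 (leading NAs) selects NA instead of nothing
    df[r, cols] <- c(NA, v[!is.na(v)])[idx + 1]
  }
  df
}

res <- fill_rows_forward(df1, df1$var1 == "A")
res
```

Passing a different logical vector, for example df1$var1 %in% c("A", "B"), would fill several rows with the same call.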

Answer 2

Score: 1

Here is an option, but it is only practical for a few columns, because each one is named explicitly. It assumes the blanks are empty strings ("") rather than NA:

    library(dplyr)
    df %>%
      mutate(var3 = ifelse(var3 == "", var2, var3),
             var4 = ifelse(var4 == "", var3.1, var4))

Output:

      var1  var2  var3  var3.1 var4
      <chr> <chr> <chr> <chr>  <chr>
    1 A     textA textA textB  textB
    2 B     1     2     3      4
    3 c     3     4     5      6
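
This answer compares against "", while the methods in the first answer look for NA, so the two assume different representations of a blank cell. A minimal base R sketch, assuming the blanks were read in as empty strings, converts them to NA first so that the NA-based methods apply. The data frame below is a hand-built stand-in, not the asker's actual data.

```r
# Stand-in data where blank cells are empty strings
df <- data.frame(
  var1   = c("A", "B", "c"),
  var2   = c("textA", "1", "3"),
  var3   = c("", "2", "4"),
  var3.1 = c("textB", "3", "5"),
  var4   = c("", "4", "6"),
  stringsAsFactors = FALSE
)

# df == "" is a logical matrix; using it as an index replaces every "" with NA
df[df == ""] <- NA

is.na(df$var3)  # TRUE FALSE FALSE
```

After this step the blanks are NA, which is what is.na(), na.locf0, and fill() all look for.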

Posted by huangapple on 2023-02-09 01:48:22