tidyr::pivot_longer() reports duplicated names in R, but there are no apparent duplicate column names in the dataset

Question

My goal is to replace the value 99999 with the value adjacent to it, unless that value is also 99999.
I took the advice I got here before, and now I have a new problem.

MRE:
'as' is a data frame combining 9 different cohort datasets; it has 10030 obs. of 7060 variables. For now I am mainly dealing with as$AS1_WEIGHT ... as$AS9_WEIGHT.

  > as %>%
  +   select(starts_with("AS") & ends_with("_WEIGHT")) %>% head() %>% dput()
  structure(list(AS1_WEIGHT = c(72, 59, 50, 55.2, 82.1, 50.4),
      AS2_WEIGHT = c(74.8, NA, NA, 54.8, 84.5, 52.5), AS3_WEIGHT = c(75.2,
      NA, NA, 55.9, 81.7, 54.6), AS4_WEIGHT = c(75, NA, NA, 55.1,
      80.6, NA), AS5_WEIGHT = c(75.4, NA, NA, 58.8, 89.5, NA),
      AS6_WEIGHT = c(77.3, NA, NA, NA, NA, NA), AS7_WEIGHT = c(70.7,
      NA, NA, 56, NA, NA), AS8_WEIGHT = c(73.8, NA, NA, 55.5, NA,
      NA), AS9_WEIGHT = c(74.5, NA, NA, 54.8, NA, 52)), row.names = c(NA,
  -6L), class = c("tbl_df", "tbl", "data.frame"))

  as %<>%
    mutate(row = row_number()) %>%
    tidyr::pivot_longer(starts_with("AS") & ends_with("_WEIGHT")) %>%
    mutate(value = if_else(value == '99999', lead(value), value), .by = row) %>%
    pivot_wider(names_from = name, values_from = value)

This returns:

  Error in `tidyr::pivot_longer()`:
  ! Names must be unique.
  ✖ These names are duplicated:
    * "name" at locations 7049 and 7053.
    * "value" at locations 7050 and 7054.
  ℹ Use argument `names_repair` to specify repair strategy.
  Run `rlang::last_trace()` to see where the error occurred.

So I ran this code to see which columns are duplicated:

  > dup_col <- duplicated(base::as.list(as))
  > colnames(as[dup_col])
  character(0)

I also ran this to check that I am referring to the right columns:

  > as %>%
  +   select(starts_with("AS") & ends_with("_WEIGHT")) %>%
  +   colnames()
  [1] "AS1_WEIGHT" "AS2_WEIGHT" "AS3_WEIGHT" "AS4_WEIGHT" "AS5_WEIGHT" "AS6_WEIGHT" "AS7_WEIGHT" "AS8_WEIGHT"
  [9] "AS9_WEIGHT"

Thank you in advance!

Answer 1

Score: 1

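I suspect your data frame already has a column named name or value before you run pivot_longer(), which by default creates output columns with exactly those names. As noted here, the error message doesn't make it obvious that this is the problem. The reported locations are consistent with it: 7060 variables plus row, minus the 9 pivoted columns, leaves 7052 columns, so the new name and value columns land at positions 7053 and 7054 and clash with existing columns at 7049 and 7050.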

Try grep("name", colnames(as)) and grep("value", colnames(as)) to find those columns.
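Since grep() matches patterns, it also flags columns such as surname or value2. A minimal sketch of an exact-name check follows; `df` is a hypothetical stand-in for the real data frame, not the asker's data.

```r
# Hypothetical stand-in for the real data frame
df <- data.frame(AS1_WEIGHT = c(72, 59), name = c("a", "b"), value2 = 1:2)

# Default output column names used by tidyr::pivot_longer()
defaults <- c("name", "value")

# Exact matches only: which default names already exist as columns?
clashes <- intersect(defaults, colnames(df))
clashes

# Positions of the clashing columns in the data frame
clash_pos <- which(colnames(df) %in% defaults)
clash_pos
```

Note also that duplicated(as.list(as)) compares column contents rather than column names, which is why that check came back empty even though a name clash exists.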

Either rename those columns in your data frame or use pivot_longer(..., names_to = "a_new_name_col", values_to = "a_new_value_col").

  library(tidyr)  # also re-exports %>%

  # Fails: the data already has `name` and `value` columns
  data.frame(a = 1:2, name = 3:4, value = 7:8) %>%
    tidyr::pivot_longer(a)
  #Error in `vec_cbind()`:
  #! Names must be unique.
  #✖ These names are duplicated:
  #  * "name" at locations 1 and 3.
  #  * "value" at locations 2 and 4.
  #ℹ Use argument `names_repair` to specify repair strategy.
  #Run `rlang::last_trace()` to see where the error occurred.

  # Works: no clash with the default output names
  data.frame(a = 1:2, name2 = 3:4, value2 = 7:8) %>%
    tidyr::pivot_longer(a)
  ## A tibble: 2 × 4
  #  name2 value2 name  value
  #  <int>  <int> <chr> <int>
  #1     3      7 a         1
  #2     4      8 a         2
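Applied to the pipeline from the question, the second option might look like the sketch below. It assumes dplyr 1.1.0 or later (for `.by`). The toy tibble `df` and the column names `wave` and `weight` are hypothetical and chosen only for illustration. It also compares against the number 99999 rather than the string '99999'.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: existing `name`/`value` columns plus weight columns
df <- tibble(
  name = c("x", "y"),
  value = c(1, 2),
  AS1_WEIGHT = c(72, 99999),
  AS2_WEIGHT = c(99999, 59),
  AS3_WEIGHT = c(75.2, 60)
)

fixed <- df %>%
  mutate(row = row_number()) %>%
  pivot_longer(
    starts_with("AS") & ends_with("_WEIGHT"),
    names_to = "wave",     # hypothetical name, avoids clashing with `name`
    values_to = "weight"   # hypothetical name, avoids clashing with `value`
  ) %>%
  # Within each original row, replace 99999 with the next column's value
  mutate(weight = if_else(weight == 99999, lead(weight), weight), .by = row) %>%
  pivot_wider(names_from = wave, values_from = weight) %>%
  select(-row)

fixed
```

This keeps the replacement logic from the question unchanged: lead() only looks one column ahead, and a 99999 in the last weight column would become NA because there is no following value.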

Posted by huangapple on June 2, 2023 at 13:57:47
When reposting, please keep the link to this article: https://go.coder-hub.com/76387472.html