tidyr::pivot_longer() reports duplicated names although the dataset has no apparent duplicate column names in R

Question

My goal is to replace the value 99999 with the value adjacent to it, unless that adjacent value is also 99999.
I followed advice I received on an earlier question, and now I have run into a new problem.

MRE:
'as' is a dataframe with 9 different cohort datasets; 10030 obs of 7060 variables. I am mainly (as of now) dealing with as$AS1_WEIGHT ... as$AS9_WEIGHT

> as %>%
+     select(starts_with("AS") & ends_with("_WEIGHT")) %>%
+     head() %>%
+     dput()

structure(list(AS1_WEIGHT = c(72, 59, 50, 55.2, 82.1, 50.4), 
    AS2_WEIGHT = c(74.8, NA, NA, 54.8, 84.5, 52.5), AS3_WEIGHT = c(75.2, 
    NA, NA, 55.9, 81.7, 54.6), AS4_WEIGHT = c(75, NA, NA, 55.1, 
    80.6, NA), AS5_WEIGHT = c(75.4, NA, NA, 58.8, 89.5, NA), 
    AS6_WEIGHT = c(77.3, NA, NA, NA, NA, NA), AS7_WEIGHT = c(70.7, 
    NA, NA, 56, NA, NA), AS8_WEIGHT = c(73.8, NA, NA, 55.5, NA, 
    NA), AS9_WEIGHT = c(74.5, NA, NA, 54.8, NA, 52)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))


as %<>%
  mutate(row = row_number()) %>%
  tidyr::pivot_longer(starts_with("AS") & ends_with("_WEIGHT")) %>%
  mutate(value = if_else(value == '99999', lead(value), value), .by = row) %>%
  pivot_wider(names_from = name, values_from = value)

returns:

Error in tidyr::pivot_longer():
! Names must be unique.
✖ These names are duplicated:

  • "name" at locations 7049 and 7053.
  • "value" at locations 7050 and 7054.
    ℹ Use argument names_repair to specify repair strategy.
    Run rlang::last_trace() to see where the error occurred.

So I ran this code to see which columns are duplicated:

> dup_col <- duplicated(base::as.list(as))
colnames(as[dup_col])

character(0)

I also ran the following to check that I am referring to the right columns:

> as %>%
  select(starts_with("AS") & ends_with("_WEIGHT")) %>%
  colnames()

[1] "AS1_WEIGHT" "AS2_WEIGHT" "AS3_WEIGHT" "AS4_WEIGHT" "AS5_WEIGHT" "AS6_WEIGHT" "AS7_WEIGHT" "AS8_WEIGHT"
[9] "AS9_WEIGHT"

Thank you in advance!

Answer 1

Score: 1


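I suspect your data frame already has a column named name or value before you run pivot_longer(), which by default creates columns with exactly those names. The locations in your error fit that explanation: with 7060 variables plus row, pivoting away the 9 weight columns leaves 7052 columns, so the new name and value columns land at positions 7053 and 7054 and collide with existing columns at 7049 and 7050. As has been noted elsewhere, the error message does not make it clear that this is the problem.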

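Try grep("name", colnames(as)) and grep("value", colnames(as)) to find those columns. Both calls return the positions of every column whose name contains that string, so look for the exact matches among the results.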
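Because grep() matches substrings, a wide data frame can return many hits. As a minimal sketch, assuming a small toy data frame df in place of as (df and all its columns are made up for illustration), an exact-match check against the two default output names could look like this:

```r
# Toy stand-in for `as`; the columns `name`, `value`, and `value_2`
# are hypothetical and only illustrate the collision.
df <- data.frame(
  AS1_WEIGHT = c(72, 59),
  name       = c("a", "b"),
  value      = c(1, 2),
  value_2    = c(3, 4)
)

# grep() matches substrings, so it also flags `value_2`
grep("value", colnames(df), value = TRUE)

# Exact match: only the columns that collide with the default
# output names of pivot_longer()
clash <- intersect(c("name", "value"), colnames(df))
clash
```

If clash comes back empty, the defaults are not the cause and the problem lies elsewhere.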

Either rename those columns in your data frame, or give the new columns different names with pivot_longer( ... names_to = "a_new_name_col", values_to = "a_new_value_col")

data.frame(a = 1:2, name = 3:4, value = 7:8) %>%
  tidyr::pivot_longer(a)
#Error in `vec_cbind()`:
#! Names must be unique.
#✖ These names are duplicated:
#  * "name" at locations 1 and 3.
#  * "value" at locations 2 and 4.
#ℹ Use argument `names_repair` to specify repair strategy.
#Run `rlang::last_trace()` to see where the error occurred.

data.frame(a = 1:2, name2 = 3:4, value2 = 7:8) %>%
  tidyr::pivot_longer(a)
## A tibble: 2 × 4
#  name2 value2 name  value
#  <int>  <int> <chr> <int>
#1     3      7 a         1
#2     4      8 a         2
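Applied to the pipeline from the question, the names_to / values_to option might look roughly like the sketch below. It runs on a toy data frame rather than the real as, the output names cohort and weight are arbitrary choices for illustration, and it assumes dplyr 1.1.0 or later for the .by argument.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for `as` that already contains `name` and `value`
# columns (hypothetical data, for illustration only)
df <- data.frame(
  AS1_WEIGHT = c(72, 99999),
  AS2_WEIGHT = c(74.8, 59),
  name       = c("x", "y"),
  value      = c(7, 8)
)

wide <- df %>%
  mutate(row = row_number()) %>%
  pivot_longer(
    starts_with("AS") & ends_with("_WEIGHT"),
    names_to  = "cohort",  # replaces the default "name"
    values_to = "weight"   # replaces the default "value"
  ) %>%
  # within each original row, swap 99999 for the next weight
  mutate(weight = if_else(weight == 99999, lead(weight), weight), .by = row) %>%
  pivot_wider(names_from = cohort, values_from = weight)

wide
```

The pre-existing name and value columns pass through untouched, since the pivot no longer tries to create columns with those names.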

huangapple
  • Published on June 2, 2023, 13:57:47
  • Link to this post: https://go.coder-hub.com/76387472.html