June 2, 2023 13:57:47

tidyr::pivot_longer() with duplicate problems with no apparent duplicate column names or dataset in R

Question

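My goal is to replace each 99999 with the value next to it, unless that value is also 99999.
I followed advice from an earlier question, and now I have run into a new problem.

MRE:
'as' is a data frame combining 9 cohort datasets: 10030 observations of 7060 variables. For now I am mainly working with as$AS1_WEIGHT through as$AS9_WEIGHT.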

> as %>%
+     select(starts_with("AS") & ends_with("_WEIGHT")) %>%
+     head() %>%
+     dput()

Output:

structure(list(AS1_WEIGHT = c(72, 59, 50, 55.2, 82.1, 50.4), 
    AS2_WEIGHT = c(74.8, NA, NA, 54.8, 84.5, 52.5), AS3_WEIGHT = c(75.2, 
    NA, NA, 55.9, 81.7, 54.6), AS4_WEIGHT = c(75, NA, NA, 55.1, 
    80.6, NA), AS5_WEIGHT = c(75.4, NA, NA, 58.8, 89.5, NA), 
    AS6_WEIGHT = c(77.3, NA, NA, NA, NA, NA), AS7_WEIGHT = c(70.7, 
    NA, NA, 56, NA, NA), AS8_WEIGHT = c(73.8, NA, NA, 55.5, NA, 
    NA), AS9_WEIGHT = c(74.5, NA, NA, 54.8, NA, 52)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

as %<>%
  mutate(row = row_number()) %>%
  tidyr::pivot_longer(starts_with("AS") & ends_with("_WEIGHT")) %>%
  mutate(value = if_else(value == '99999', lead(value), value), .by = row) %>%
  pivot_wider(names_from = name, values_from = value)

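returns:

Error in tidyr::pivot_longer():
! Names must be unique.
✖ These names are duplicated:

"name" at locations 7049 and 7053.
"value" at locations 7050 and 7054.
ℹ Use argument names_repair to specify repair strategy.
Run rlang::last_trace() to see where the error occurred.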

So I ran this code to see which columns are duplicated:

> dup_col <- duplicated(base::as.list(as))
colnames(as[dup_col])
character(0)

I then ran another snippet to check that I am referring to the right columns:

> as %>%
  select(starts_with("AS") & ends_with("_WEIGHT")) %>%
  colnames()
[1] "AS1_WEIGHT" "AS2_WEIGHT" "AS3_WEIGHT" "AS4_WEIGHT" "AS5_WEIGHT" "AS6_WEIGHT" "AS7_WEIGHT" "AS8_WEIGHT"
[9] "AS9_WEIGHT"

Thank you in advance!

Answer 1

Score: 1

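I suspect you already have a column named name or value before you run pivot_longer, which by default tries to create columns with exactly those names. As has been noted elsewhere, the error message does not make it obvious that this is the cause.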

Try grep("name", colnames(as)) and grep("value", colnames(as)) to find those columns.
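
Note that grep() matches the pattern anywhere in a column name, so with 7060 columns it may also return names that merely contain "name" or "value". A minimal sketch of an exact-match check follows; the data frame df and its columns are made up for illustration and are not the asker's data.

```r
# Hypothetical stand-in for the real data: one weight column plus
# pre-existing columns, two of which collide with pivot_longer()'s
# default output names ("name" and "value").
df <- data.frame(AS1_WEIGHT = c(72, 59), name = 3:4, value = 7:8,
                 surname = c("a", "b"))

# grep() does partial (regex) matching, so "surname" is reported too.
partial <- grep("name", colnames(df), value = TRUE)

# Exact matches only: the columns that actually clash.
clashes <- intersect(c("name", "value"), colnames(df))

print(partial)
print(clashes)
```

If clashes comes back non-empty on the real data frame, those are the columns to rename or to work around.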

Either rename them in your data frame or use pivot_longer( ... names_to = "a_new_name_col", values_to = "a_new_value_col").

data.frame(a = 1:2, name = 3:4, value = 7:8) %>%
  tidyr::pivot_longer(a)
#Error in `vec_cbind()`:
#! Names must be unique.
#✖ These names are duplicated:
#  * "name" at locations 1 and 3.
#  * "value" at locations 2 and 4.
#ℹ Use argument `names_repair` to specify repair strategy.
#Run `rlang::last_trace()` to see where the error occurred.
data.frame(a = 1:2, name2 = 3:4, value2 = 7:8) %>%
  tidyr::pivot_longer(a)
## A tibble: 2 × 4
#  name2 value2 name  value
#  <int>  <int> <chr> <int>
#1     3      7 a         1
#2     4      8 a         2
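
As a rough sketch of the second option applied to a pipeline like the one in the question: the tibble below is invented (including the 99999 entry and the clashing name and value columns), and "wave" and "weight" are arbitrary replacement names, not anything from the asker's data. It assumes dplyr 1.1.0 or later, since mutate() only accepts .by from that version on.

```r
library(dplyr)
library(tidyr)

# Invented example data: two weight columns plus `name` and `value`
# columns that would clash with pivot_longer()'s default output names.
df <- tibble(
  AS1_WEIGHT = c(72, 99999),
  AS2_WEIGHT = c(74.8, 60),
  name       = c("x", "y"),
  value      = c(1, 2)
)

out <- df %>%
  mutate(row = row_number()) %>%
  # Non-default output names avoid the "Names must be unique" error.
  pivot_longer(starts_with("AS") & ends_with("_WEIGHT"),
               names_to = "wave", values_to = "weight") %>%
  # Within each original row, replace 99999 with the next weight.
  mutate(weight = if_else(weight == 99999, lead(weight), weight),
         .by = row) %>%
  pivot_wider(names_from = wave, values_from = weight)

# AS1_WEIGHT in row 2 becomes 60, the value taken from AS2_WEIGHT.
print(out)
```

The existing name and value columns pass through untouched, because pivot_longer() no longer tries to create columns with the same names.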
