tidyr::pivot_longer() reports duplicated names despite no apparent duplicate column names in the dataset (R)
Question
My goal is to replace the value 99999 with the value adjacent to it, unless that adjacent value is also 99999.
I followed the advice I got here earlier, and now I have a new problem.
MRE:
'as' is a data frame combining 9 different cohort datasets; it has 10030 observations of 7060 variables. For now I am mainly working with as$AS1_WEIGHT ... as$AS9_WEIGHT.
> as %>%
+ select(starts_with("AS") & ends_with("_WEIGHT")) %>% head() %>% dput()
structure(list(AS1_WEIGHT = c(72, 59, 50, 55.2, 82.1, 50.4),
AS2_WEIGHT = c(74.8, NA, NA, 54.8, 84.5, 52.5), AS3_WEIGHT = c(75.2,
NA, NA, 55.9, 81.7, 54.6), AS4_WEIGHT = c(75, NA, NA, 55.1,
80.6, NA), AS5_WEIGHT = c(75.4, NA, NA, 58.8, 89.5, NA),
AS6_WEIGHT = c(77.3, NA, NA, NA, NA, NA), AS7_WEIGHT = c(70.7,
NA, NA, 56, NA, NA), AS8_WEIGHT = c(73.8, NA, NA, 55.5, NA,
NA), AS9_WEIGHT = c(74.5, NA, NA, 54.8, NA, 52)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
as %<>%
mutate(row = row_number()) %>%
tidyr::pivot_longer(starts_with("AS") & ends_with("_WEIGHT")) %>%
mutate(value = if_else(value == '99999', lead(value), value), .by = row) %>%
pivot_wider(names_from = name, values_from = value)
returns:
Error in `tidyr::pivot_longer()`:
! Names must be unique.
✖ These names are duplicated:
- "name" at locations 7049 and 7053.
- "value" at locations 7050 and 7054.
ℹ Use argument `names_repair` to specify repair strategy.
Run `rlang::last_trace()` to see where the error occurred.
So I ran this code to see which columns are duplicated:
> dup_col <- duplicated(base::as.list(as))
colnames(as[dup_col])
character(0)
I also ran this to check that I am selecting the right columns:
> as %>%
select(starts_with("AS") & ends_with("_WEIGHT")) %>%
colnames()
[1] "AS1_WEIGHT" "AS2_WEIGHT" "AS3_WEIGHT" "AS4_WEIGHT" "AS5_WEIGHT" "AS6_WEIGHT" "AS7_WEIGHT" "AS8_WEIGHT"
[9] "AS9_WEIGHT"
Thank you in advance!
Answer 1
Score: 1
I suspect you already have a column named "name" or "value" before you run pivot_longer, which by default tries to create columns with those names. As noted here, the error message does not make it obvious that this is the cause.
Try grep("name", colnames(as)) and grep("value", colnames(as)) to find those columns.
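Keep in mind that grep() matches substrings, so it can also flag columns that merely contain "name" or "value". If you only want the columns that clash exactly with the defaults, something along these lines may help. This is a minimal sketch: df and its column names are made up as a stand-in for as.

# Hypothetical stand-in for the real data; the column names are made up.
df <- data.frame(AS1_WEIGHT = c(72, 59), name = c("a", "b"),
                 surname = c("x", "y"), value = c(1, 2))

# grep() matches substrings, so "surname" is reported as well.
partial_hits <- grep("name", colnames(df), value = TRUE)

# Exact comparison keeps only the columns that clash with the
# pivot_longer() defaults names_to = "name" and values_to = "value".
clashes <- intersect(colnames(df), c("name", "value"))
clash_positions <- which(colnames(df) %in% c("name", "value"))

print(partial_hits)
print(clashes)
print(clash_positions)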
Either rename those columns in your data frame, or use pivot_longer( ... names_to = "a_new_name_col", values_to = "a_new_value_col").
data.frame(a = 1:2, name = 3:4, value = 7:8) %>%
tidyr::pivot_longer(a)
#Error in `vec_cbind()`:
#! Names must be unique.
#✖ These names are duplicated:
# * "name" at locations 1 and 3.
# * "value" at locations 2 and 4.
#ℹ Use argument `names_repair` to specify repair strategy.
#Run `rlang::last_trace()` to see where the error occurred.
data.frame(a = 1:2, name2 = 3:4, value2 = 7:8) %>%
tidyr::pivot_longer(a)
## A tibble: 2 × 4
# name2 value2 name value
# <int> <int> <chr> <int>
#1 3 7 a 1
#2 4 8 a 2
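For completeness, here is a rough sketch of how the pipeline from the question might look with names_to and values_to set. The data frame df is a made-up miniature of as, and the names "wave" and "weight" are arbitrary choices of mine; it assumes dplyr 1.1.0 or later for the .by argument. The replacement step is left as in the question, apart from comparing against the number 99999 rather than a string.

library(dplyr)
library(tidyr)

# Hypothetical miniature of `as`: weight columns plus pre-existing
# "name" and "value" columns that clash with pivot_longer()'s defaults.
df <- tibble(
  AS1_WEIGHT = c(72, 99999, 50),
  AS2_WEIGHT = c(74.8, 60, 99999),
  AS3_WEIGHT = c(75.2, 61, 62),
  name  = c("a", "b", "c"),
  value = c(1, 2, 3)
)

fixed <- df %>%
  mutate(row = row_number()) %>%
  pivot_longer(
    starts_with("AS") & ends_with("_WEIGHT"),
    names_to = "wave",      # avoids the default "name"
    values_to = "weight"    # avoids the default "value"
  ) %>%
  # Within each original row, replace 99999 with the next wave's value.
  mutate(weight = if_else(weight == 99999, lead(weight), weight), .by = row) %>%
  pivot_wider(names_from = wave, values_from = weight) %>%
  select(-row)

print(fixed)

The existing "name" and "value" columns are carried through untouched as identifier columns, so nothing in the original data needs to be renamed.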