Differences between two columns having missing values in one
Question
This is an example of my data in long (longitudinal) format, containing:
- id = Individuals
- grup_int = categorical variable of group
- gen = variables to be measured
- time = point in time for id and gene
- value
        id grup_int gen   time  value
     <dbl>    <dbl> <chr> <chr> <dbl>
1 60801001        1 adrb2 2     2.11
2 60801001        1 ccl2  1     0.941
3 60801001        1 ccl2  2     0.248
4 60801001        1 ccl3  1     5.65
5 60801001        1 ccl3  2    NA
What I want to do across the whole dataset is subtract the value at time = 1 from the value at time = 2, grouped by id and grup_int, something like this:
df %>% dplyr::group_by(id, gen) %>% dplyr::mutate(d_post_pre = value[time == 2] - value[time == 1])
`d_post_pre` must be size 1, not 0.
ℹ The error occurred in group 1: id = 60801001, gen = "adrb2".
As you can see, the adrb2 gene has no entry at all for time == 1, and that is what triggers the error. The cases that can occur in value are:
- Missing rows at any time point, as in "adrb2"
- NAs at any time point, as in "ccl3"; this shouldn't matter
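To make the failure concrete, here is a minimal base R sketch of what happens inside the "adrb2" group. The toy vectors are assumptions that mirror that single row and are not taken from the real data frame.

```r
# Toy vectors mirroring the "adrb2" group: one row, recorded at time "2".
value <- c(2.11)
time  <- c("2")

post <- value[time == "2"]  # one element
pre  <- value[time == "1"]  # numeric(0), because no row matches

length(pre)         # 0
length(post - pre)  # 0: subtracting a zero-length vector gives a zero-length result
```

Inside a grouped mutate() the group has one row, so a zero-length result cannot be recycled to fit it, which is what the "must be size 1, not 0" message reports.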
Can you suggest a way to make this line robust to missing time points, or to fill in some placeholder text or string that can be filtered out afterwards?
Thanks in advance!
df <- structure(list(id = structure(c(60801001, 60801001, 60801001,
60801001, 60801001), label = "Identificador", format.spss = "F9.0", display_width = 14L),
grup_int = structure(c(1, 1, 1, 1, 1), format.spss = "F2.0"),
gen = c("adrb2", "ccl2", "ccl2", "ccl3", "ccl3"), time = c("2",
"1", "2", "1", "2"), value = c(2.1098254, 0.94088, 0.24778089,
5.6529145, 0.06939283)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L))
Answer 1
Score: 2
What about something like this? Use pivot_wider to define value columns for time == 1 and time == 2, and then you can simply subtract the columns. Then you can convert to long format again.
df |>
tidyr::pivot_wider(
id_cols = c("id", "grup_int", "gen"),
names_from = "time",
names_prefix = "time_",
values_from = "value"
) |>
dplyr::mutate(d_post_pre = time_2 - time_1) |>
tidyr::pivot_longer(
cols = c("time_1", "time_2"),
names_to = "time",
names_prefix = "time_"
) |>
dplyr::filter(!is.na(value)) |>
dplyr::select(id, grup_int, gen, time, value, d_post_pre, everything())
# # A tibble: 5 × 6
# id grup_int gen time value d_post_pre
# <dbl> <dbl> <chr> <chr> <dbl> <dbl>
# 1 60801001 1 adrb2 2 2.11 NA
# 2 60801001 1 ccl2 1 0.941 -0.693
# 3 60801001 1 ccl2 2 0.248 -0.693
# 4 60801001 1 ccl3 1 5.65 -5.58
# 5 60801001 1 ccl3 2 0.0694 -5.58
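If you would rather stay in long format, a possible alternative is to keep the grouped mutate() from the question and look up each time point with match(), which returns NA when the time point is absent. This is only a sketch: diff_post_pre is a hypothetical helper name, not a dplyr function, and the data frame below rebuilds the values from the dput() without the SPSS attributes. It assumes at most one row per id, gen and time; with duplicates, match() silently uses the first one.

```r
# Hypothetical helper (not part of dplyr): match() returns NA when the
# requested time point is absent, and indexing a vector with NA yields NA,
# so the result always has length 1.
diff_post_pre <- function(value, time, post = "2", pre = "1") {
  value[match(post, time)] - value[match(pre, time)]
}

# Same values as the dput() in the question, without the SPSS attributes.
df <- data.frame(
  id = 60801001,
  grup_int = 1,
  gen = c("adrb2", "ccl2", "ccl2", "ccl3", "ccl3"),
  time = c("2", "1", "2", "1", "2"),
  value = c(2.1098254, 0.94088, 0.24778089, 5.6529145, 0.06939283)
)

res <- df |>
  dplyr::group_by(id, gen) |>
  dplyr::mutate(d_post_pre = diff_post_pre(value, time)) |>
  dplyr::ungroup()
```

With this version the "adrb2" row gets d_post_pre = NA instead of raising an error. Rows whose value is NA are also kept, whereas the filter(!is.na(value)) step in the pivot approach drops them along with the placeholder rows created by pivot_wider.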