Differences between two columns having missing values in one

Question

Here is an example of my data in long (longitudinal) format. It contains:

  1. id = Individuals
  2. grup_int = categorical variable of group
  3. gen = variables to be measured
  4. time = point in time for id and gene
  5. value

        id grup_int gen   time   value
     <dbl>    <dbl> <chr> <chr>  <dbl>
1 60801001        1 adrb2 2     2.11  
2 60801001        1 ccl2  1     0.941 
3 60801001        1 ccl2  2     0.248 
4 60801001        1 ccl3  1     5.65  
5 60801001        1 ccl3  2     NA

What I want to do across the whole dataset is subtract the value at time = 1 from the value at time = 2 for each id and gen, something like this:

df %>%
  dplyr::group_by(id, gen) %>%
  dplyr::mutate(d_post_pre = value[time == 2] - value[time == 1])

This fails with the following error:

`d_post_pre` must be size 1, not 0.
ℹ The error occurred in group 1: id = 60801001, gen = "adrb2".

As you can see, the adrb2 gene has no row at all for time == 1, and that is what triggers the error. Two situations can occur in value:

  • A missing row at some time point, as in "adrb2"
  • An NA at some time point, as in "ccl3"; this should not matter

Can you suggest a way to make this line robust to missing rows, or to insert some placeholder text or string that I can filter out afterwards?

Thanks in advance!


df <- structure(list(id = structure(c(60801001, 60801001, 60801001, 
60801001, 60801001), label = "Identificador", format.spss = "F9.0", display_width = 14L), 
    grup_int = structure(c(1, 1, 1, 1, 1), format.spss = "F2.0"), 
    gen = c("adrb2", "ccl2", "ccl2", "ccl3", "ccl3"), time = c("2", 
    "1", "2", "1", "2"), value = c(2.1098254, 0.94088, 0.24778089, 
    5.6529145, 0.06939283)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L))

Answer 1

Score: 2

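What about something like this? Use pivot_wider to turn the values for time == 1 and time == 2 into separate columns, and then you can simply subtract one column from the other. Any id/gen combination that lacks a time point gets NA in that column, so the difference becomes NA instead of raising an error. Then you can convert back to long format.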

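df |>
  # one row per id/gen, with columns time_1 and time_2 (NA where a time point is absent)
  tidyr::pivot_wider(
    id_cols = c("id", "grup_int", "gen"),
    names_from = "time",
    names_prefix = "time_",
    values_from = "value"
  ) |>
  dplyr::mutate(d_post_pre = time_2 - time_1) |>
  # back to long format; the values go into the default "value" column
  tidyr::pivot_longer(
    cols = c("time_1", "time_2"),
    names_to = "time",
    names_prefix = "time_"
  ) |>
  # drop rows whose value is NA, including those created for absent time points
  dplyr::filter(!is.na(value)) |>
  dplyr::select(id, grup_int, gen, time, value, d_post_pre, everything())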

# # A tibble: 5 × 6
#         id grup_int gen   time    value d_post_pre
#      <dbl>    <dbl> <chr> <chr>   <dbl>      <dbl>
# 1 60801001        1 adrb2 2      2.11       NA    
# 2 60801001        1 ccl2  1      0.941      -0.693
# 3 60801001        1 ccl2  2      0.248      -0.693
# 4 60801001        1 ccl3  1      5.65       -5.58 
# 5 60801001        1 ccl3  2      0.0694     -5.58 
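If you would rather stay in long format, here is an alternative sketch; it is not part of the answer above. It assumes each id/gen group has at most one row per time point, and the has_both column is a hypothetical name added only for filtering. It relies on match() returning NA when the time point is absent, so that value[NA] gives a single NA instead of a zero-length vector.

```r
library(dplyr)

# Same values as the dput() in the question, without the SPSS attributes.
df <- tibble::tibble(
  id = rep(60801001, 5),
  grup_int = rep(1, 5),
  gen = c("adrb2", "ccl2", "ccl2", "ccl3", "ccl3"),
  time = c("2", "1", "2", "1", "2"),
  value = c(2.1098254, 0.94088, 0.24778089, 5.6529145, 0.06939283)
)

out <- df %>%
  group_by(id, gen) %>%
  mutate(
    # match() gives NA for an absent time point, and value[NA] is a length-1 NA
    d_post_pre = value[match("2", time)] - value[match("1", time)],
    # hypothetical flag: TRUE when the group has rows for both time points
    has_both = all(c("1", "2") %in% time)
  ) %>%
  ungroup()

out
```

Groups with an absent time point end up with d_post_pre = NA and has_both = FALSE, so they can be dropped later with dplyr::filter(has_both) if needed.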

huangapple
  • Posted on April 4, 2023, 04:18:59
  • Source link: https://go.coder-hub.com/75923462.html