April 4, 2023, 04:18:59


Differences between two columns having missing values in one

Question


This is an example of my dataset in long (longitudinal) format. It contains:

id = Individuals
grup_int = categorical variable of group
gen = variables to be measured
time = point in time for id and gene
value


        id grup_int gen   time   value
     <dbl>    <dbl> <chr> <chr>  <dbl>
1 60801001        1 adrb2 2     2.11  
2 60801001        1 ccl2  1     0.941 
3 60801001        1 ccl2  2     0.248 
4 60801001        1 ccl3  1     5.65  
5 60801001        1 ccl3  2     NA

What I want to do across my whole dataset is subtract the value at time = 1 from the value at time = 2, grouped by id and grup_int, something like this:


df %>% dplyr::group_by(id, gen) %>% dplyr::mutate(d_post_pre = value[time == 2] - value[time == 1])

`d_post_pre` must be size 1, not 0.
ℹ The error occurred in group 1: id = 60801001, gen = "adrb2".

As you can see, the adrb2 gene has no entry at all for time == 1, and that is what throws this error. The cases that can occur in value are:

Missing rows at any time point, as in "adrb2"
NAs at any time point, as in "ccl3"; this shouldn't matter
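To make the difference between the two cases concrete, here is a minimal base R sketch. The vectors are copied by hand from the example rows above, and the names `value_adrb2`, `time_adrb2`, `value_ccl3`, `time_ccl3`, `d_adrb2` and `d_ccl3` are only illustrative. It shows that a missing row yields a zero-length result, which `mutate()` cannot recycle, while an explicit NA still yields a length-1 result.

```r
# adrb2 group: only a time "2" row exists
value_adrb2 <- c(2.1098254)
time_adrb2  <- c("2")

# value[time == 1] matches no row, so it has length 0,
# and subtracting a length-0 vector gives a length-0 result
d_adrb2 <- value_adrb2[time_adrb2 == 2] - value_adrb2[time_adrb2 == 1]
length(d_adrb2)

# ccl3 group: both rows exist, one value is NA
value_ccl3 <- c(5.65, NA)
time_ccl3  <- c("1", "2")

# Both subsets have length 1, so the result is a single NA
d_ccl3 <- value_ccl3[time_ccl3 == 2] - value_ccl3[time_ccl3 == 1]
length(d_ccl3)
```

So the error comes from the absent row, not from the NA.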

Can you suggest a way to make this line robust to missing rows, or alternatively to insert some text or string that I can filter out afterwards?

Thanks in advance!


df <- structure(list(id = structure(c(60801001, 60801001, 60801001, 
60801001, 60801001), label = "Identificador", format.spss = "F9.0", display_width = 14L), 
    grup_int = structure(c(1, 1, 1, 1, 1), format.spss = "F2.0"), 
    gen = c("adrb2", "ccl2", "ccl2", "ccl3", "ccl3"), time = c("2", 
    "1", "2", "1", "2"), value = c(2.1098254, 0.94088, 0.24778089, 
    5.6529145, 0.06939283)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L))

Answer 1

Score: 2


What about something like this? Use pivot_wider to define value columns for time == 1 and time == 2, and then you can simply subtract the columns. Then you can convert to long format again.

df |>
  tidyr::pivot_wider(
    id_cols = c("id", "grup_int", "gen"),
    names_from = "time",
    names_prefix = "time_",
    values_from = "value"
  ) |>
  dplyr::mutate(d_post_pre = time_2 - time_1) |>
  tidyr::pivot_longer(
    cols = c("time_1", "time_2"),
    names_to = "time",
    names_prefix = "time_"
  ) |>
  dplyr::filter(!is.na(value)) |>
  dplyr::select(id, grup_int, gen, time, value, d_post_pre, everything())

# # A tibble: 5 × 6
#         id grup_int gen   time    value d_post_pre
#      <dbl>    <dbl> <chr> <chr>   <dbl>      <dbl>
# 1 60801001        1 adrb2 2      2.11       NA    
# 2 60801001        1 ccl2  1      0.941      -0.693
# 3 60801001        1 ccl2  2      0.248      -0.693
# 4 60801001        1 ccl3  1      5.65       -5.58 
# 5 60801001        1 ccl3  2      0.0694     -5.58
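If you would rather stay in long format, one possible alternative (not part of the answer above) is to look up each time point with `match()`. It returns NA when the time point is absent, and indexing with NA gives a length-1 NA instead of a zero-length vector. The sketch below rebuilds the example data as a plain data frame and assumes at most one row per time point within each id/grup_int/gen group; the result name `res` is only illustrative.

```r
library(dplyr)

# Example data from the question, rebuilt without the SPSS attributes
df <- data.frame(
  id = rep(60801001, 5),
  grup_int = rep(1, 5),
  gen = c("adrb2", "ccl2", "ccl2", "ccl3", "ccl3"),
  time = c("2", "1", "2", "1", "2"),
  value = c(2.1098254, 0.94088, 0.24778089, 5.6529145, 0.06939283)
)

res <- df %>%
  group_by(id, grup_int, gen) %>%
  # match() gives the position of the time point in the group, or NA if
  # it is absent; value[NA] is NA, so the difference always has length 1
  mutate(d_post_pre = value[match("2", time)] - value[match("1", time)]) %>%
  ungroup()

res
```

Groups that lack either time point, such as adrb2, get NA in `d_post_pre` and can be dropped afterwards with `dplyr::filter(!is.na(d_post_pre))` if needed.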
