Keeping one weight measurement for participants with multiple measurements in R

Question

I am trying to figure out the best way to deal with this issue. I am working with a longitudinal dataset that has multiple weight measurements for each participant on the same day. I want to keep only the first observation (measurement) for each participant on each day. I am using R.
This is an example of what the data look like:

    ID  Date  Weight
    1   2/1   160
    1   2/1   159
    1   2/1   160.5
    2   2/1   200
    2   2/1   198
    2   2/1   201

I am not sure how to approach this yet.
I would like the resulting dataset to look like this (keeping only the first observation):

    ID  Date  Weight
    1   2/1   160
    2   2/1   200

Answer 1

Score: 1

We can use slice_head() after grouping by 'ID' and 'Date':

    library(dplyr)
    df1 %>%
      group_by(ID, Date) %>%   # one group per participant per day
      slice_head(n = 1) %>%    # keep the first row of each group
      ungroup()

Output:

    # A tibble: 2 × 3
         ID Date  Weight
      <int> <chr>  <dbl>
    1     1 2/1      160
    2     2 2/1      200

Or with duplicated in base R

    df1[!duplicated(df1[c("ID", "Date")]), ]

Data:

    df1 <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L), Date = c("2/1",
    "2/1", "2/1", "2/1", "2/1", "2/1"), Weight = c(160, 159, 160.5,
    200, 198, 201)), class = "data.frame", row.names = c(NA, -6L))
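
As a further option in the same spirit, dplyr's distinct() with .keep_all = TRUE keeps the first row for each unique ID/Date combination. The following is a minimal sketch on the sample data, not part of the original answer; the name first_per_day is only illustrative.

```r
library(dplyr)

# Sample data from the question
df1 <- data.frame(
  ID     = c(1L, 1L, 1L, 2L, 2L, 2L),
  Date   = c("2/1", "2/1", "2/1", "2/1", "2/1", "2/1"),
  Weight = c(160, 159, 160.5, 200, 198, 201)
)

# Keep the first row for every distinct ID/Date pair;
# .keep_all = TRUE retains the remaining columns (here, Weight)
first_per_day <- df1 %>%
  distinct(ID, Date, .keep_all = TRUE)

first_per_day
```

This avoids the explicit group_by()/ungroup() pair, at the cost of being less obvious about which row is kept.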

Answer 2

Score: 1

If you prefer data.table (which is especially fast for large data sets), you could go with:

    library(data.table)
    df1 <- as.data.table(df1)
    # Number the rows within each ID/Date group
    df1[, rowNum := seq_len(.N), by = .(ID, Date)]
    # Keep only the first row of each group
    df1 <- df1[rowNum == 1]
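
If the helper column is not wanted, data.table's unique() method with a by argument may be a shorter route, since it keeps the first row for each combination of the by columns. This is a sketch on the sample data rather than the answer's own code; dt and first_per_day are illustrative names.

```r
library(data.table)

# Sample data from the question, built directly as a data.table
dt <- data.table(
  ID     = c(1L, 1L, 1L, 2L, 2L, 2L),
  Date   = c("2/1", "2/1", "2/1", "2/1", "2/1", "2/1"),
  Weight = c(160, 159, 160.5, 200, 198, 201)
)

# unique() on a data.table keeps the first row per ID/Date combination
first_per_day <- unique(dt, by = c("ID", "Date"))

first_per_day
```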

Answer 3

Score: 1

Another way using filter combined with row_number():

    library(dplyr)
    df1 %>%
      group_by(ID, Date) %>%          # group by Date as well, so each day is handled separately
      filter(row_number() == 1) %>%   # keep the first row of each group
      ungroup()

Output:

         ID Date  Weight
      <int> <chr>  <dbl>
    1     1 2/1      160
    2     2 2/1      200
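
All of the approaches in this thread take "first" to mean first in row order. If the rows are not already in measurement order, the data would need to be sorted before filtering. The sketch below assumes a hypothetical Time column (zero-padded "HH:MM" strings) that is not part of the original data, and uses base R for the sort.

```r
# Hypothetical data: the 'Time' column is an assumption made for illustration
df2 <- data.frame(
  ID     = c(1L, 1L, 1L, 2L, 2L, 2L),
  Date   = rep("2/1", 6),
  Time   = c("09:30", "08:15", "10:00", "11:00", "07:45", "12:30"),
  Weight = c(159, 160, 160.5, 198, 200, 201)
)

# Sort so the earliest measurement comes first within each ID/Date
df2_sorted <- df2[order(df2$ID, df2$Date, df2$Time), ]

# Then keep the first row per ID/Date, as in the other answers
first_obs <- df2_sorted[!duplicated(df2_sorted[c("ID", "Date")]), ]

first_obs
```

Sorting on character times only works here because the strings are zero-padded; real timestamps would be better parsed into a date-time class first.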

Answer 4

Score: 0

If you simply want to keep only the first measurement per ID, regardless of date, use negated duplicated().

    dat[!duplicated(dat$ID), ]
    #   ID Date Weight
    # 1  1  2/1    160
    # 4  2  2/1    200

Data:

    dat <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L), Date = c("2/1",
    "2/1", "2/1", "2/1", "2/1", "2/1"), Weight = c(160, 159, 160.5,
    200, 198, 201)), class = "data.frame", row.names = c(NA, -6L))
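
Because the question describes a longitudinal dataset, it may matter whether duplicates are checked on ID alone or on ID and Date together. The sketch below uses made-up data spanning two dates (dat2 is hypothetical, not from the question) to show the difference.

```r
# Hypothetical data with two measurement days per participant
dat2 <- data.frame(
  ID     = c(1L, 1L, 1L, 1L, 2L, 2L),
  Date   = c("2/1", "2/1", "2/2", "2/2", "2/1", "2/2"),
  Weight = c(160, 159, 161, 160.5, 200, 199)
)

# Checking ID only keeps one row per participant overall
by_id <- dat2[!duplicated(dat2$ID), ]

# Checking ID and Date keeps one row per participant per day
by_id_date <- dat2[!duplicated(dat2[c("ID", "Date")]), ]

nrow(by_id)       # 2
nrow(by_id_date)  # 4
```

With a single date, as in the sample data, both versions return the same rows; they diverge as soon as a participant has measurements on more than one day.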

huangapple
  • Posted on February 9, 2023 at 02:15:54
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75390091.html