Keeping one weight measurement for participants with multiple measurements in R
Question
I am working in R with a longitudinal dataset that has multiple weight measurements for each participant on the same day. I want to keep only the first observation (measurement) for each participant on a given day.
Here is an example of what the data look like:
ID Date Weight
1 2/1 160
1 2/1 159
1 2/1 160.5
2 2/1 200
2 2/1 198
2 2/1 201
I am not sure how to approach this yet. I would like the dataset to look like this, keeping only the first observation:
ID Date Weight
1 2/1 160
2 2/1 200
Answer 1
Score: 1
We can use slice_head() after grouping by 'ID' and 'Date':
    library(dplyr)
    df1 %>%
      group_by(ID, Date) %>%   # one group per participant per day
      slice_head(n = 1) %>%    # keep the first row of each group
      ungroup()
Output:
    # A tibble: 2 × 3
         ID Date  Weight
      <int> <chr>  <dbl>
    1     1 2/1      160
    2     2 2/1      200
Or with duplicated() in base R:
    df1[!duplicated(df1[c("ID", "Date")]), ]
Data:
    df1 <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L), Date = c("2/1",
    "2/1", "2/1", "2/1", "2/1", "2/1"), Weight = c(160, 159, 160.5,
    200, 198, 201)), class = "data.frame", row.names = c(NA, -6L))
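To make the base R approach easier to follow, here is a small sketch of what duplicated() returns for the sample data. The names is_repeat and first_obs are chosen here for illustration only.

```r
# Sample data from the question
df1 <- data.frame(
  ID     = c(1L, 1L, 1L, 2L, 2L, 2L),
  Date   = c("2/1", "2/1", "2/1", "2/1", "2/1", "2/1"),
  Weight = c(160, 159, 160.5, 200, 198, 201)
)

# TRUE marks a row whose (ID, Date) pair already appeared in an earlier row
is_repeat <- duplicated(df1[c("ID", "Date")])
is_repeat
# [1] FALSE  TRUE  TRUE FALSE  TRUE  TRUE

# Negating the vector keeps only the first row of each (ID, Date) pair
first_obs <- df1[!is_repeat, ]
first_obs$Weight
# [1] 160 200
```

Because duplicated() works on whole rows of the two-column subset, a participant measured on several days would keep one row for each day.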
Answer 2
Score: 1
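If you prefer data.table (which is especially fast on large data sets), you could number the rows within each ID and Date group and keep the first one:
    library(data.table)
    df1 <- as.data.table(df1)
    # rowNum counts rows within each (ID, Date) group: 1, 2, 3, ...
    df1[, rowNum := seq_len(.N), by = .(ID, Date)]
    # keep the first row of each group; the helper column rowNum stays in the result
    df1 <- df1[rowNum == 1]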
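If the helper column is not wanted, data.table offers other ways to get the same rows. The sketch below is one possible variant, not the answerer's code. It uses unique() with its by argument, and .SD[1] per group. The names dt, first_obs, and first_obs_sd are illustrative.

```r
library(data.table)

# Sample data from the question
dt <- data.table(
  ID     = c(1L, 1L, 1L, 2L, 2L, 2L),
  Date   = c("2/1", "2/1", "2/1", "2/1", "2/1", "2/1"),
  Weight = c(160, 159, 160.5, 200, 198, 201)
)

# unique() with `by` keeps the first row for each (ID, Date) combination
first_obs <- unique(dt, by = c("ID", "Date"))
first_obs

# Alternative: take the first row of each group's subset (.SD)
first_obs_sd <- dt[, .SD[1], by = .(ID, Date)]
first_obs_sd
```

Both variants leave dt itself unchanged and return a new table with one row per participant per day.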
Answer 3
Score: 1
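Another way is to use filter() combined with row_number(), grouping by both ID and Date so that each participant keeps one row per day:
    library(dplyr)
    df1 %>%
      group_by(ID, Date) %>%
      filter(row_number() == 1) %>%   # first row within each group
      ungroup()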
         ID Date  Weight
      <int> <chr>  <dbl>
    1     1 2/1      160
    2     2 2/1      200
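The choice of grouping columns matters once a participant has measurements on more than one day. The sketch below extends the sample with a hypothetical second date (3/1) for participant 1. The values and the name df2 are made up for illustration.

```r
library(dplyr)

# Hypothetical data: participant 1 is measured on two days
df2 <- data.frame(
  ID     = c(1L, 1L, 2L, 2L, 1L, 1L),
  Date   = c("2/1", "2/1", "2/1", "2/1", "3/1", "3/1"),
  Weight = c(160, 159, 200, 198, 161, 162)
)

# Grouping by ID alone: one row per participant across all days
by_id <- df2 %>%
  group_by(ID) %>%
  filter(row_number() == 1) %>%
  ungroup()

# Grouping by ID and Date: one row per participant per day
by_id_date <- df2 %>%
  group_by(ID, Date) %>%
  filter(row_number() == 1) %>%
  ungroup()

nrow(by_id)       # 2 rows: the 3/1 measurements are dropped entirely
nrow(by_id_date)  # 3 rows: participant 1 keeps one row for 2/1 and one for 3/1
```

For a longitudinal dataset, grouping by both columns is usually what "first measurement on that day" calls for.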
Answer 4
Score: 0
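If you simply want to keep only the first measurement for each participant, use negated duplicated(). Note that this checks ID only, so it keeps one row per participant regardless of date; pass dat[c("ID", "Date")] instead of dat$ID to keep one row per participant per day.
    dat[!duplicated(dat$ID), ]
    #   ID Date Weight
    # 1  1  2/1    160
    # 4  2  2/1    200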
Data:
    dat <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L), Date = c("2/1",
    "2/1", "2/1", "2/1", "2/1", "2/1"), Weight = c(160, 159, 160.5,
    200, 198, 201)), class = "data.frame", row.names = c(NA, -6L))
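duplicated() keeps whichever row appears first in the data frame, so "first measurement" depends on row order. If the rows are not stored in measurement order, sorting first may help. The sketch below assumes a Time column, which is hypothetical and not part of the original data, holding zero-padded "HH:MM" strings.

```r
# Hypothetical data: rows are NOT in measurement order; `Time` is an
# assumed column recording when each measurement was taken
dat <- data.frame(
  ID     = c(1L, 1L, 1L, 2L, 2L),
  Date   = c("2/1", "2/1", "2/1", "2/1", "2/1"),
  Time   = c("09:30", "08:15", "10:00", "14:00", "07:45"),
  Weight = c(159, 160, 160.5, 198, 200)
)

# Sort by participant, day, and time so the earliest measurement comes first
# (zero-padded "HH:MM" strings sort correctly as text)
dat_sorted <- dat[order(dat$ID, dat$Date, dat$Time), ]

# Keep the first row of each (ID, Date) pair in the sorted data
first_obs <- dat_sorted[!duplicated(dat_sorted[c("ID", "Date")]), ]
first_obs$Weight
# [1] 160 200
```

If the data carry no time information, the row order of the file is the only available definition of "first".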