February 9, 2023, 02:15:54

Keeping one weight measurement for participants with multiple measurements in R

Question

I am trying to figure out the best way to handle this. I am working with a longitudinal dataset that has multiple weight measurements for each participant on the same day. I want to keep only the first observation (measurement) for each participant on that day. I am using R.
This is an example of what the data looks like:

ID     Date    Weight
1     2/1     160
1     2/1     159
1     2/1     160.5
2     2/1     200
2     2/1     198
2     2/1     201

I am not sure how to approach this yet.
I would like the dataset to look like this (keeping only the first observation):

ID     Date    Weight
1     2/1     160
2     2/1     200

Answer 1

Score: 1

We can use slice_head after grouping by 'ID' and 'Date'

library(dplyr)
df1 %>%
   group_by(ID, Date) %>%
   slice_head(n = 1) %>%
   ungroup()

Output:

# A tibble: 2 × 3
     ID Date  Weight
  <int> <chr>  <dbl>
1     1 2/1      160
2     2 2/1      200

Or with duplicated in base R

df1[!duplicated(df1[c("ID", "Date")]), ]

Data:

df1 <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L), Date = c("2/1", 
"2/1", "2/1", "2/1", "2/1", "2/1"), Weight = c(160, 159, 160.5, 
200, 198, 201)), class = "data.frame", row.names = c(NA, -6L))

Answer 2

Score: 1

If you like data.table (especially fast for large data sets) you could go with:

library(data.table)
df1 <- as.data.table(df1)
df1[, rowNum := seq_len(.N), by = .(ID, Date)]
df1 <- df1[rowNum == 1]
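
A shorter variant, offered as a sketch rather than as the answerer's method: data.table's unique() method accepts a by argument and keeps the first row of each group, so no helper column is needed. The names dt and first_obs are made up for this illustration; the data is the sample from the question.

```r
library(data.table)

# Sample data from the question
dt <- data.table(
  ID     = c(1L, 1L, 1L, 2L, 2L, 2L),
  Date   = c("2/1", "2/1", "2/1", "2/1", "2/1", "2/1"),
  Weight = c(160, 159, 160.5, 200, 198, 201)
)

# Keep the first row for each ID/Date combination
first_obs <- unique(dt, by = c("ID", "Date"))
```

Unlike the rowNum approach, this leaves no extra column behind in the result.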

Answer 3

Score: 1

Another way using filter combined with row_number():

library(dplyr)
df1 %>%
  group_by(ID) %>%
  filter(row_number() == 1) %>%
  ungroup()

     ID Date  Weight
  <int> <chr>  <dbl>
1     1 2/1      160
2     2 2/1      200
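
Note that the snippet above groups by ID only. On the sample data, which contains a single date, that gives the same result as grouping by ID and Date. With several dates per participant it would keep just one row per participant overall. A minimal sketch of the difference, using a hypothetical data frame df2 (not from the question) in which participant 1 has a second date:

```r
library(dplyr)

# Hypothetical data for illustration: participant 1 is measured on two dates
df2 <- data.frame(
  ID     = c(1L, 1L, 1L, 1L, 2L, 2L),
  Date   = c("2/1", "2/1", "2/2", "2/2", "2/1", "2/1"),
  Weight = c(160, 159, 161, 160.5, 200, 198)
)

# Grouping by ID only: one row per participant, so the 2/2 visit is dropped
by_id <- df2 %>%
  group_by(ID) %>%
  filter(row_number() == 1) %>%
  ungroup()

# Grouping by ID and Date: first measurement per participant per day
by_id_date <- df2 %>%
  group_by(ID, Date) %>%
  filter(row_number() == 1) %>%
  ungroup()
```

Here by_id has two rows and by_id_date has three, so for a longitudinal dataset the second form is probably what the question is after.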

Answer 4

Score: 0

If you simply want to keep only the first measurement, use negated duplicated.

dat[!duplicated(dat$ID), ]
#   ID Date Weight
# 1  1  2/1    160
# 4  2  2/1    200

Data:

dat <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L), Date = c("2/1", 
"2/1", "2/1", "2/1", "2/1", "2/1"), Weight = c(160, 159, 160.5, 
200, 198, 201)), class = "data.frame", row.names = c(NA, -6L))
