Filtering longitudinal data by last recorded event, and counting number of rows that meet criteria (and storing that value)
# Question
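I have some (medical) longitudinal data on whether bronchiectasis was detected. I want to count the number of participants with and without bronchiectasis at their latest follow-up/check, and then store those two counts so I can easily insert the values into a qmd report. (I have other outcomes that need the same approach.)
Is there a simpler or more accurate way to do what I'm doing?
Example df: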
| id | date_of_followup | bronchiectasis |
| --- | --- | --- |
| 1 | 01/01/2022 | N |
| 1 | 01/12/2022 | Y |
| 2 | 02/07/2022 | Y |
| 2 | 5/6/2023 | Y |
| 3 | 01/05/2021 | N |
| 3 | 3/12/2022 | N |
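This is my current code:
```R
# Participants whose latest follow-up shows bronchiectasis
df1 <- df %>%
  group_by(id) %>%
  filter(date_of_followup == max(date_of_followup)) %>%
  filter(bronchiectasis == "Y")
bronchiectasisyes <- sum(df1$bronchiectasis == "Y")

# Participants whose latest follow-up shows no bronchiectasis
df2 <- df %>%
  group_by(id) %>%
  filter(date_of_followup == max(date_of_followup)) %>%
  filter(bronchiectasis == "N")
bronchiectasisno <- sum(df2$bronchiectasis == "N")
```
This gets me the desired outcome, but it is a bit messy, and I suspect there is a more elegant/robust method.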
# Answer 1
**Score**: 0
Fix the date, keep the latest row per id, then summarize:
```r
library(dplyr)
df %>%
  # Parse the character dates so that max/ordering is chronological
  mutate(date_of_followup = as.Date(date_of_followup, format = "%m/%d/%Y")) %>%
  # Keep each participant's most recent follow-up
  slice_max(date_of_followup, by = id) %>%
  count(bronchiectasis)
#   bronchiectasis n
# 1              N 1
# 2              Y 2
```
To confirm this is working, see the interim step:
```r
df %>%
  mutate(date_of_followup = as.Date(date_of_followup, format = "%m/%d/%Y")) %>%
  slice_max(date_of_followup, by = id)
#   id date_of_followup bronchiectasis
# 1  1       2022-01-12              Y
# 2  2       2023-05-06              Y
# 3  3       2022-03-12              N
```
(I'm assuming m/d/Y format for your dates ... fix as appropriate.)
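The question also asks to store the two counts for later use in a report. A minimal sketch of one way to do that follows, computing the latest-visit rows once and deriving both values from them. The names `latest`, `bronchiectasis_yes` and `bronchiectasis_no` are made up for illustration, the dates are still assumed to be m/d/Y, and `with_ties = FALSE` is my own addition on the assumption that you want exactly one row per participant even if two visits share a date. The `by =` argument needs dplyr 1.1.0 or later.

```r
library(dplyr)

# Example data from the question (dates assumed to be m/d/Y)
df <- data.frame(
  id = c(1, 1, 2, 2, 3, 3),
  date_of_followup = c("01/01/2022", "01/12/2022", "02/07/2022",
                       "5/6/2023", "01/05/2021", "3/12/2022"),
  bronchiectasis = c("N", "Y", "Y", "Y", "N", "N")
)

# One row per participant: their most recent follow-up
latest <- df %>%
  mutate(date_of_followup = as.Date(date_of_followup, format = "%m/%d/%Y")) %>%
  # Assumption: with_ties = FALSE keeps a single row if two visits share the latest date
  slice_max(date_of_followup, by = id, with_ties = FALSE)

# Plain integers that can be dropped into report text
bronchiectasis_yes <- sum(latest$bronchiectasis == "Y")
bronchiectasis_no  <- sum(latest$bronchiectasis == "N")
```

If the qmd is rendered with the knitr engine, the values can then be referenced inline, e.g. `` `r bronchiectasis_yes` ``. Using `sum()` rather than pulling rows out of `count()` also gives 0 instead of a missing row when one outcome does not occur.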