Filtering longitudinal data by last recorded event, and counting number of rows that meet criteria (and storing that value)

Question

I have some (medical) longitudinal data on the detection (or not) of bronchiectasis. I want to count the number of participants with and without bronchiectasis at their latest follow-up/check, then store the number of patients who have bronchiectasis and the number who don't, so that I can easily insert those values into a qmd report. (I have other outcomes where I need this method.)

Is there a simpler/more accurate method than what I'm doing?

Example df:

| id | date_of_followup | bronchiectasis |
| --- | --- | --- |
| 1 | 01/01/2022 | N |
| 1 | 01/12/2022 | Y |
| 2 | 02/07/2022 | Y |
| 2 | 5/6/2023 | Y |
| 3 | 01/05/2021 | N |
| 3 | 3/12/2022 | N |

This is my current code:

```r
df1 <- df %>%
  group_by(id) %>%
  filter(date_of_followup == max(date_of_followup)) %>%
  filter(bronchiectasis == "Y")

bronchiectasisyes <- sum(df1$bronchiectasis == "Y")

df2 <- df %>%
  group_by(id) %>%
  filter(date_of_followup == max(date_of_followup)) %>%
  filter(bronchiectasis == "N")

bronchiectasisno <- sum(df2$bronchiectasis == "N")
```

This gets me the desired outcome, but it is perhaps a bit messy, and I suspect there is a more elegant/robust method.
# Answer 1

**Score**: 0

Fixing the date, filtering for max-per-id, then summarizing:

```r
library(dplyr)
df %>%
  mutate(date_of_followup = as.Date(date_of_followup, format = "%m/%d/%Y")) %>%
  slice_max(date_of_followup, by = id) %>%
  count(bronchiectasis)
#   bronchiectasis n
# 1              N 1
# 2              Y 2
```

To confirm this is working, see the interim step:

```r
df %>%
  mutate(date_of_followup = as.Date(date_of_followup, format = "%m/%d/%Y")) %>%
  slice_max(date_of_followup, by = id)
#   id date_of_followup bronchiectasis
# 1  1       2022-01-12              Y
# 2  2       2023-05-06              Y
# 3  3       2022-03-12              N
```

(I'm assuming m/d/Y format for your dates ... fix as appropriate.)
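Since the question also asks about storing the two counts for reuse in a report, here is a minimal sketch of one way to do that. It rebuilds the example data from the question, keeps the m/d/Y date assumption from above, and reuses the asker's own variable names `bronchiectasisyes` and `bronchiectasisno`. The intermediate name `latest` is just an illustrative choice, and `na.rm = TRUE` is an added assumption about how missing statuses should be treated.

```r
library(dplyr)

# Example data copied from the question (dates assumed to be m/d/Y)
df <- data.frame(
  id = c(1, 1, 2, 2, 3, 3),
  date_of_followup = c("01/01/2022", "01/12/2022", "02/07/2022",
                       "5/6/2023", "01/05/2021", "3/12/2022"),
  bronchiectasis = c("N", "Y", "Y", "Y", "N", "N")
)

# One row per participant: the most recent follow-up
# (the `by` argument of slice_max() needs dplyr 1.1.0 or later)
latest <- df %>%
  mutate(date_of_followup = as.Date(date_of_followup, format = "%m/%d/%Y")) %>%
  slice_max(date_of_followup, by = id)

# Store each count as a plain number; sum() gives 0 when a status never occurs
bronchiectasisyes <- sum(latest$bronchiectasis == "Y", na.rm = TRUE)
bronchiectasisno  <- sum(latest$bronchiectasis == "N", na.rm = TRUE)
```

Computing `latest` once avoids repeating the group-and-filter step for each status, and the stored values can then be referenced from inline R code in the qmd file.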

Posted by huangapple on August 4, 2023 at 22:14:39
When reposting, please keep the link to this article: https://go.coder-hub.com/76836731.html