Filtering longitudinal data by last recorded event, and counting the number of rows that meet criteria (and storing that value)

# Question

I have some (medical) longitudinal data on whether bronchiectasis was detected. I want to count the number of participants with and without bronchiectasis at their latest follow-up/check, and then store both counts so that I can easily insert them into a qmd report. (I have other outcomes that need the same method.)

Is there a simpler or more accurate way to do what I'm doing?

Example df:

| id | date_of_followup | bronchiectasis |
| --- | --- | --- |
| 1 | 01/01/2022 | N |
| 1 | 01/12/2022 | Y |
| 2 | 02/07/2022 | Y |
| 2 | 5/6/2023 | Y |
| 3 | 01/05/2021 | N |
| 3 | 3/12/2022 | N |

This is my current code:

```R
library(dplyr)

# Keep each participant's row with the maximum date_of_followup,
# then only the rows where bronchiectasis was detected
df1 <- df %>%
    group_by(id) %>%
    filter(date_of_followup == max(date_of_followup)) %>%
    filter(bronchiectasis == "Y")

bronchiectasisyes <- sum(df1$bronchiectasis == "Y")

# Same filter, for participants without bronchiectasis
df2 <- df %>%
    group_by(id) %>%
    filter(date_of_followup == max(date_of_followup)) %>%
    filter(bronchiectasis == "N")

bronchiectasisno <- sum(df2$bronchiectasis == "N")
```

This gets me the desired outcome, but it is a bit messy, and I suspect there is a more elegant/robust method.

# Answer 1
**Score**: 0

Fix the date, filter for the latest row per id, then summarize:

```r
library(dplyr)

df %>%
  # Parse the character dates so that they compare chronologically
  mutate(date_of_followup = as.Date(date_of_followup, format = "%m/%d/%Y")) %>%
  # Keep the latest follow-up per id (`by =` needs dplyr >= 1.1.0)
  slice_max(date_of_followup, by = id) %>%
  count(bronchiectasis)
#   bronchiectasis n
# 1              N 1
# 2              Y 2
```

To confirm this is working, see the interim step:

```r
df %>%
  mutate(date_of_followup = as.Date(date_of_followup, format = "%m/%d/%Y")) %>%
  slice_max(date_of_followup, by = id)
#   id date_of_followup bronchiectasis
# 1  1       2022-01-12              Y
# 2  2       2023-05-06              Y
# 3  3       2022-03-12              N
```

(I'm assuming m/d/Y format for your dates; fix as appropriate.)
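
The question also asks for the two counts to be stored for reuse in a qmd report, which the pipeline above does not do by itself. A minimal sketch of one way to do that follows; it rebuilds the example data from the question, keeps the asker's variable names `bronchiectasisyes` and `bronchiectasisno`, and introduces `latest` as an illustrative name that is not in the original. The m/d/Y date format is still an assumption.

```r
library(dplyr)

# Example data copied from the question (dates assumed to be m/d/Y)
df <- data.frame(
  id = c(1, 1, 2, 2, 3, 3),
  date_of_followup = c("01/01/2022", "01/12/2022", "02/07/2022",
                       "5/6/2023", "01/05/2021", "3/12/2022"),
  bronchiectasis = c("N", "Y", "Y", "Y", "N", "N")
)

# One row per participant: the latest follow-up
# (`latest` is an illustrative name; `by =` needs dplyr >= 1.1.0)
latest <- df %>%
  mutate(date_of_followup = as.Date(date_of_followup, format = "%m/%d/%Y")) %>%
  slice_max(date_of_followup, by = id)

# Store each count as a plain number for later use in the report
bronchiectasisyes <- sum(latest$bronchiectasis == "Y")
bronchiectasisno  <- sum(latest$bronchiectasis == "N")
```

With the example data this gives 2 and 1, matching the `count()` output above. In the body of the qmd file, an inline expression such as `` `r bronchiectasisyes` `` should then print the stored value. Note that `slice_max()` keeps tied rows by default, so a participant with two records on the same latest date would be counted twice unless `with_ties = FALSE` is set.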


huangapple
  • Posted on August 4, 2023 at 22:14:39
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76836731.html