Create a table of percentages based on the values in two columns
Question
I am very lost here, so I apologize if this makes no sense
My original data frame looks like this, where 1 indicates yes and 0 indicates no:
ever_eaten_banana <- c(1,1,0,0)
allergic_banana <- c(1,0,1,0)
ever_eaten_shellfish <- c(0,1,1,1)
allergic_shellfish <- c(0,1,0,0)
df <- data.frame(ever_eaten_banana,allergic_banana,ever_eaten_shellfish,allergic_shellfish)
df
  ever_eaten_banana allergic_banana ever_eaten_shellfish allergic_shellfish
1                 1               1                    0                  0
2                 1               0                    1                  1
3                 0               1                    1                  0
4                 0               0                    1                  0
My goal is a table that shows the percentage of people who have ever eaten each food who are also allergic to said food, excluding anyone who says they're allergic without ever eating the food.
So for bananas, 2 people have tried them and only 1 of those 2 is also allergic, so I am looking for 50%.
My goal is a table that looks like this:
Food | Percent |
---|---|
Bananas | 50% |
Shellfish | 33% |
I fiddled around with trying to make a handful of functions but never got anywhere close so any help would be greatly appreciated.
Answer 1
Score: 2
One option in `tidyverse`: reshape to 'long' format, set 'allergic' to 0 in every case where 'ever_eaten' and 'allergic' are not both 1, and then get the proportion of 'allergic' among 'ever_eaten', grouped by 'Food'.
library(dplyr)
library(tidyr)
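df %>%
  # reshape so each row is one person-food pair with columns Food, ever_eaten, allergic
  pivot_longer(cols = everything(), names_to = c(".value", "Food"),
               names_pattern = "(.*)_([^_]+)$") %>%
  # count an allergy only if the person has actually eaten the food
  mutate(allergic = +(allergic & ever_eaten)) %>%
  # percentage of eaters who are allergic, per food (reframe() and .by need dplyr >= 1.1.0)
  reframe(Percent = round(100 * sum(allergic) / sum(ever_eaten)), .by = 'Food')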
Output:
# A tibble: 2 × 2
  Food      Percent
  <chr>       <dbl>
1 banana         50
2 shellfish      33
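To make the reshape step easier to follow, here is a small sketch that stops after `pivot_longer()` and shows the intermediate 'long' table. It assumes the same `df` as in the question (rebuilt here so the snippet runs on its own); the name `long` is only illustrative.

```r
library(tidyr)

# Same data as in the question
df <- data.frame(
  ever_eaten_banana    = c(1, 1, 0, 0),
  allergic_banana      = c(1, 0, 1, 0),
  ever_eaten_shellfish = c(0, 1, 1, 1),
  allergic_shellfish   = c(0, 1, 0, 0)
)

# ".value" turns the part before the last underscore (ever_eaten / allergic)
# into column names; the part after the last underscore becomes "Food"
long <- pivot_longer(df, cols = everything(),
                     names_to = c(".value", "Food"),
                     names_pattern = "(.*)_([^_]+)$")
long
```

The result should have one row per person and food (8 rows here), with the columns `Food`, `ever_eaten` and `allergic`, which is what the later `mutate()` and `reframe()` steps operate on.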
Answer 2
Score: 2
Using base `table` and `reshape`:
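# rename "ever_eaten_*" to "evereaten_*" so a single "_" separates variable name and food
dflong <- reshape(
  setNames(df, gsub("ever_eaten", "evereaten", names(df))),
  varying=TRUE, timevar="food", sep="_", direction="long"
)
# keep only people who have eaten the food, take row-wise proportions, keep the allergic == 1 column
with(dflong[dflong$evereaten == 1,], prop.table(table(food, allergic),1) )[, 2, drop=FALSE]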
#           allergic
#food                1
#  banana    0.5000000
#  shellfish 0.3333333
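Neither answer formats the result exactly like the requested table (with a `%` sign), so here is a minimal base R sketch that does, without reshaping. It assumes the column names follow the `ever_eaten_<food>` / `allergic_<food>` pattern from the question; `foods`, `percent` and `result` are illustrative names, not part of either answer.

```r
# Same data as in the question
df <- data.frame(
  ever_eaten_banana    = c(1, 1, 0, 0),
  allergic_banana      = c(1, 0, 1, 0),
  ever_eaten_shellfish = c(0, 1, 1, 1),
  allergic_shellfish   = c(0, 1, 0, 0)
)

# Assumed: the food names are known and match the column suffixes
foods <- c("banana", "shellfish")

percent <- sapply(foods, function(food) {
  eaten    <- df[[paste0("ever_eaten_", food)]] == 1
  allergic <- df[[paste0("allergic_", food)]] == 1
  # allergic people among those who have eaten the food, as a percentage
  100 * sum(eaten & allergic) / sum(eaten)
})

result <- data.frame(Food = foods,
                     Percent = paste0(round(percent), "%"),
                     row.names = NULL)
result
```

Note that a food nobody has eaten would give a division by zero (`NaN`), so that case may need separate handling in real data.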