February 24, 2023, 05:19:30

Create a table of percentages based on the values in two columns

Question

I am very lost here, so I apologize if this makes no sense.

My original data frame looks like this, where 1 indicates yes and 0 indicates no:

ever_eaten_banana <- c(1,1,0,0)
allergic_banana <- c(1,0,1,0)
ever_eaten_shellfish <- c(0,1,1,1)
allergic_shellfish <- c(0,1,0,0)
df <- data.frame(ever_eaten_banana,allergic_banana,ever_eaten_shellfish,allergic_shellfish)
df
   ever_eaten_banana allergic_banana ever_eaten_shellfish allergic_shellfish
1                 1               1                    0                  0
2                 1               0                    1                  1
3                 0               1                    1                  0
4                 0               0                    1                  0

My goal is a table showing, for each food, the percentage of people who have eaten it who are also allergic to it. Anyone who says they're allergic without ever having eaten the food should be excluded.

For bananas, 2 people have tried them and only 1 of those is also allergic, so I am looking for 50%.

The table should look like this:

Food	Percent
Bananas	50%
Shellfish	33%

I fiddled around with writing a handful of functions but never got anywhere close, so any help would be greatly appreciated.

Answer 1

Score: 2

One option in the tidyverse: reshape to 'long' format, set 'allergic' to 0 in every row that does not have both ever_eaten and allergic equal to 1, then take the ratio of allergic to ever_eaten, grouped by 'Food'.

library(dplyr)
library(tidyr)
df %>%
   # split names such as "ever_eaten_banana" into value columns and a Food key
   pivot_longer(cols = everything(), names_to = c(".value", "Food"), 
      names_pattern = "(.*)_([^_]+)$") %>%
   # count an allergy only when the food was actually eaten
   mutate(allergic = +(allergic & ever_eaten)) %>% 
   # percentage of eaters who are allergic, per food
   reframe(Percent = round(100 * sum(allergic)/sum(ever_eaten)), .by = 'Food')

-output

# A tibble: 2 × 2
  Food      Percent
  <chr>       <dbl>
1 banana         50
2 shellfish      33
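
To see what the mutate() and reframe() steps compute, the same ratio can be worked out per food directly on the wide data frame. This is only a minimal base R sketch, not part of the answer above: the helper name allergy_percent and the foods vector are made up for illustration, and it assumes the columns follow the ever_eaten_<food> / allergic_<food> naming from the question.

```r
# Data from the question: 1 = yes, 0 = no
df <- data.frame(
  ever_eaten_banana    = c(1, 1, 0, 0),
  allergic_banana      = c(1, 0, 1, 0),
  ever_eaten_shellfish = c(0, 1, 1, 1),
  allergic_shellfish   = c(0, 1, 0, 0)
)

# Hypothetical helper: percentage of people who have eaten `food`
# and are also allergic to it
allergy_percent <- function(data, food) {
  eaten    <- data[[paste0("ever_eaten_", food)]] == 1
  allergic <- data[[paste0("allergic_", food)]] == 1
  # An allergy only counts when the person has actually eaten the food
  round(100 * sum(eaten & allergic) / sum(eaten))
}

foods <- c("banana", "shellfish")  # assumed list of food suffixes
result <- data.frame(
  Food    = foods,
  Percent = vapply(foods, function(f) allergy_percent(df, f),
                   numeric(1), USE.NAMES = FALSE)
)
result
```

For the question's data this gives 50 for banana and 33 for shellfish, matching the tibble above. The reshaping approach scales better once there are many foods, since the food names no longer have to be listed by hand.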

Answer 2

Score: 2

Using base table and reshape:

dflong <- reshape(
    setNames(df, gsub("ever_eaten", "evereaten", names(df))),
    varying=TRUE, timevar="food", sep="_", direction="long"
)
with(dflong[dflong$evereaten == 1,], prop.table(table(food, allergic),1) )[, 2, drop=FALSE]
#           allergic
#food                1
#  banana    0.5000000
#  shellfish 0.3333333
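
The result above is a table of proportions, while the question asks for a Food/Percent table. One possible way to get there, sketched under the assumption that the reshape call behaves as shown in this answer, is to pull the "1" column out of the proportion table and format it. The names props and out are illustrative only.

```r
# Data from the question: 1 = yes, 0 = no
df <- data.frame(
  ever_eaten_banana    = c(1, 1, 0, 0),
  allergic_banana      = c(1, 0, 1, 0),
  ever_eaten_shellfish = c(0, 1, 1, 1),
  allergic_shellfish   = c(0, 1, 0, 0)
)

# Same reshape as in the answer: one row per person and food
dflong <- reshape(
  setNames(df, gsub("ever_eaten", "evereaten", names(df))),
  varying = TRUE, timevar = "food", sep = "_", direction = "long"
)

# Row-wise proportions of allergic = 0/1 among people who ate each food
props <- with(dflong[dflong$evereaten == 1, ],
              prop.table(table(food, allergic), 1))

# Keep the allergic == 1 column and format it as a percentage string
out <- data.frame(
  Food      = rownames(props),
  Percent   = paste0(round(100 * props[, "1"]), "%"),
  row.names = NULL
)
out
```

Percent is a character column here because of the "%" suffix. Dropping the paste0() call keeps it numeric if further calculation is needed.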
