How to get stats from a dataframe?

Question


I am a little rusty in R and would appreciate your help.

I have a data frame that I need to get some stats from. A simplified version looks like this:

df <- data.frame(tech=c("Leonardo", "Leonardo", "Leonardo", "John", "John", "John", "Will", "Will", "Will", "Bob"),
         type=c("V", "P", "V", "V", "P", "V", "V", "P", "V", "V"),
         breed=c("A", "A", "A", "B", "B", "B", "C", "C", "A", "B"),
         central=c("J", "J", "K", "J", "K", "J", "K", "K", "K", "J"))

I need the percentage of each "type" for each technician in a separate data frame, and then another data frame with the percentage of each "type" for each technician within each breed. (For example, if a technician has only one type, either V or P, it should show as 100%.)

I have seen other threads where users wanted similar information (for example, "Summarizing by subgroup percentage in R"), but they had a numeric column to compute the percentage from. In my case, the values are the categories V or P. I imagine the approach is the same, but the solution suggested in that post does not work for my data.

Is there a simple way of doing this? I appreciate your help.

Answer 1

Score: 1


If I'm understanding your question right, this is how I'd do it using dplyr:

# Percentage of V and P within each tech/breed group
df |>
  dplyr::group_by(tech, breed) |>
  dplyr::summarize(pct_v = sum(type == "V") / dplyr::n() * 100,
                   pct_p = sum(type == "P") / dplyr::n() * 100)
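
The question also asks for the percentages per technician regardless of breed. The same idea should work with only tech as the grouping variable. What follows is a minimal sketch, not a definitive solution: it assumes the sample df from the question, R 4.1 or later (for the |> pipe), and dplyr 1.0.0 or later (for the .groups argument). The name pct_by_tech is made up for illustration.

```r
library(dplyr)

# Sample data copied from the question
df <- data.frame(
  tech    = c("Leonardo", "Leonardo", "Leonardo", "John", "John", "John", "Will", "Will", "Will", "Bob"),
  type    = c("V", "P", "V", "V", "P", "V", "V", "P", "V", "V"),
  breed   = c("A", "A", "A", "B", "B", "B", "C", "C", "A", "B"),
  central = c("J", "J", "K", "J", "K", "J", "K", "K", "K", "J")
)

# One row per technician; mean() of a logical vector is the share of TRUE values,
# so this is equivalent to sum(type == "V") / n()
pct_by_tech <- df |>
  group_by(tech) |>
  summarize(
    pct_v = mean(type == "V") * 100,
    pct_p = mean(type == "P") * 100,
    .groups = "drop"  # hypothetical choice: return an ungrouped result
  )

print(pct_by_tech)
```

A technician with a single type, such as Bob here, gets 100 in one column and 0 in the other, which matches the behavior requested in the question.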

huangapple
  • Posted on May 23, 2023 at 00:54:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76308396.html