How to get stats from a dataframe?
Question
I am a little rusty in R and would appreciate your help.
I have a data frame that I need to get some stats from. A simplified version looks like this:
df <- data.frame(tech=c("Leonardo", "Leonardo", "Leonardo", "John", "John", "John", "Will", "Will", "Will", "Bob"),
                 type=c("V", "P", "V", "V", "P", "V", "V", "P", "V", "V"),
                 breed=c("A", "A", "A", "B", "B", "B", "C", "C", "A", "B"),
                 central=c("J", "J", "K", "J", "K", "J", "K", "K", "K", "J"))
I need the percentage of each "type" for each technician, in a new data frame. I then need a second data frame containing the percentage of each "type" for each technician by breed. (For example, if a technician has only one type, either V or P, that type should show as 100%.)
I have seen other topics where the user wanted similar information (Summarizing by subgroup percentage in R), but they had a numerical value to compute the percentage from. In my case, I only have V or P. I imagine the approach is the same, but I tried the solution suggested in that post and it does not work in my case.
Is there a simple way of doing this? I appreciate your help.
Answer 1
Score: 1
If I'm understanding your question right, this is how I'd do it using dplyr:
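# Percentage of each type within every tech/breed combination.
# Group by tech alone to get the per-technician percentages instead.
# The native pipe |> needs R >= 4.1; dplyr's %>% works the same way here.
df |>
  dplyr::group_by(tech, breed) |>
  dplyr::summarize(pct_v = sum(type == "V") / dplyr::n() * 100,
                   pct_p = sum(type == "P") / dplyr::n() * 100)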
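For the first part of the question (percentage of each type per technician, without breed), a base R alternative might look like the sketch below. It is not the answerer's method; it assumes the same df as in the question, and the name pct_by_tech is made up for illustration.

```r
# Sample data copied from the question
df <- data.frame(
  tech    = c("Leonardo", "Leonardo", "Leonardo", "John", "John", "John", "Will", "Will", "Will", "Bob"),
  type    = c("V", "P", "V", "V", "P", "V", "V", "P", "V", "V"),
  breed   = c("A", "A", "A", "B", "B", "B", "C", "C", "A", "B"),
  central = c("J", "J", "K", "J", "K", "J", "K", "K", "K", "J")
)

# Cross-tabulate tech against type, then turn each row into proportions
# (margin = 1 normalises within each technician) and scale to percent.
counts <- table(df$tech, df$type)
pct_by_tech <- as.data.frame.matrix(prop.table(counts, margin = 1) * 100)

# pct_by_tech (hypothetical name) has one row per technician and one column per type
print(pct_by_tech)
```

Because the table is built from whatever values appear in type, this also works if more categories than V and P show up later. A technician with a single type, such as Bob, gets 100 in that column and 0 in the other.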