2023-08-08 22:54:33

Count valid/non-NA observations grouped by variable

Question

I have been googling for about two hours now trying to find the solution to a simple problem, but I am not getting anywhere. I am not even finding the right function I could use to get where I want to get, so I have to resort to asking for help even though it seems a very basic question.

I have a survey spanning various countries, and I am in the process of creating a dataframe with statistics derived from the data (such as national mean on a given variable).

Let's say the dataframe df looks like this:

country <- c(1,1,1,1,1,1,2,2,2,2,2,2)
var <- c(1,NA,2,1,2,2,3,3,1,3,4,NA)
df <- cbind.data.frame(country, var)

I can easily count the nominal n's:

df %>% group_by(country) %>% summarize(n=n())

But how do I count valid observations on the variable var?

Answer 1

Score: 2

Any of these will do it.
The three function calls group_by()/summarise()/n() can be replaced by a single one, count().

country <- c(1,1,1,1,1,1,2,2,2,2,2,2)
var <- c(1,NA,2,1,2,2,3,3,1,3,4,NA)
df <- cbind.data.frame(country, var)
suppressPackageStartupMessages(
  library(dplyr)
)
df %>%
  na.exclude() %>%
  count(country)
#>   country n
#> 1       1 5
#> 2       2 5
df %>%
  na.omit() %>%
  count(country)
#>   country n
#> 1       1 5
#> 2       2 5
df %>%
  tidyr::drop_na() %>%
  count(country)
#>   country n
#> 1       1 5
#> 2       2 5

Created on 2023-08-08 with reprex v2.0.2
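
A possible variation, not part of the original answer: since the question is about building a table of per-country statistics, it may be handy to keep the nominal n and the valid n in the same summary rather than dropping rows first. The sketch below assumes dplyr and tidyr are installed. The column `other` is a hypothetical second variable added only to illustrate that na.omit() drops a row when any column is NA, while drop_na(var) checks only var.

```r
library(dplyr)

country <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
var <- c(1, NA, 2, 1, 2, 2, 3, 3, 1, 3, 4, NA)
df <- data.frame(country, var)

# Nominal n and valid n side by side: is.na() flags missing values,
# so sum(!is.na(var)) counts the non-missing ones within each group.
counts <- df %>%
  group_by(country) %>%
  summarise(n = n(), n_valid = sum(!is.na(var)))

# Hypothetical second variable (an assumption for illustration),
# missing in one row where var is observed.
df2 <- df %>% mutate(other = c(NA, rep(1, 11)))

# na.omit() removes a row if ANY column is NA, so country 1 loses
# one row whose var is actually valid.
any_col <- df2 %>% na.omit() %>% count(country)

# drop_na(var) looks only at var, so the count ignores `other`.
var_only <- df2 %>% tidyr::drop_na(var) %>% count(country)
```

Here `counts` holds n = 6 and n_valid = 5 for both countries. With the extra column, `any_col` gives 4 for country 1, whereas `var_only` still gives 5, so naming the column explicitly is the safer choice once the data frame has more than one variable.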
