February 19, 2023 01:25:23


Apply filter to the table function

Question


I'm looking for a way to execute a simple task faster than I am currently able to.

I want to use the table function in R on part of a dataframe. Of course it would be possible to first use subset and then table, but this is a bit tedious. (In my case, during a first inspection of the data, I want to check the frequency of NAs on individual variables in a multi-national survey for each of the 25 participating countries. So I'd need to create 25 subsets, make the table, and then remove the subsets again because I don't need them anymore.)

Here is some example data:

a <- c(1,1,1,1,1,2,2,2,2,2)
b <- c(1,3,99,99,2,3,2,99,1,1)
df <- cbind.data.frame(a,b)

And this is my current workaround:

df1 <- subset(df, a == 1)
table(df1$b)
df2 <- subset(df, a == 2)
table(df2$b)
rm(df1, df2)

Is there a simpler way?

Also, I feel like I am spamming with ultra-basic questions like these. If anyone has a suggestion on how I could have found the answer directly I'd be happy to hear it. Other than trying some code myself, I googled terms like 'r apply filter to table', 'r filter table function', 'r table subset dataframe', etc.

Answer 1

Score: 2


Assuming 99 is your NA code, here is a way using the purrr package, which I find excellent for seeing how many NAs there are in each column:

library(purrr)
df |>
  map_df(~ sum(. == 99))
#>       a     b
#>   <int> <int>
#> 1     0     3
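If you would rather not load purrr, base R can give the same per-column count. A minimal sketch, again assuming 99 is the missing-value code: comparing the whole data frame with 99 yields a logical matrix, and colSums() counts the TRUE values in each column.

```r
# Example data from the question
a <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
b <- c(1, 3, 99, 99, 2, 3, 2, 99, 1, 1)
df <- cbind.data.frame(a, b)

# df == 99 gives a logical matrix (one column per variable);
# colSums() counts the TRUE values, i.e. the 99 codes, per column
na_counts <- colSums(df == 99)
print(na_counts)
```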

Answer 2

Score: 2


Can you provide an example of the structure of the original data (multi-national survey)?

You could probably answer your question with much tidier code using the dplyr package, with something like:

library(dplyr)

survey_data %>%
  select(country, column1, column2) %>%  # choose your desired columns (add more as needed)
  group_by(country) %>%
  summarise(across(everything(), ~ sum(is.na(.x))))  # NA count per column and country
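Without dplyr, a similar grouped count can be sketched in base R with rowsum(), which adds up the rows of a matrix within each group. This sketch assumes 99 is the only missing-value code and uses a in place of the country variable; with real survey data you would group by the country column instead, and vars would list the survey variables you want to check.

```r
# Example data from the question; a stands in for the country variable
a <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
b <- c(1, 3, 99, 99, 2, 3, 2, 99, 1, 1)
df <- cbind.data.frame(a, b)

# Assumption: 99 is the only missing-value code, so recode it as NA
df[] <- lapply(df, function(x) replace(x, x %in% 99, NA))

# Columns to check: everything except the grouping variable
vars <- setdiff(names(df), "a")

# is.na() returns a logical matrix; multiplying by 1L makes it integer
# so rowsum() can add up the missing values within each group of a
na_by_group <- rowsum(is.na(df[vars]) * 1L, group = df$a)
print(na_by_group)
```

The result has one row per group and one column per checked variable, so with 25 countries and many variables it stays a single compact matrix.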

Answer 3

Score: 2


You could split the data frame on your a variable and use lapply() to apply table() to each element of the resulting list, like this:

lapply(split(df, df$a), \(x) table(x))
#> $`1`
#>    b
#> a   1 2 3 99
#>   1 1 1 1  2
#> 
#> $`2`
#>    b
#> a   1 2 3 99
#>   2 2 1 1  1

Created on 2023-02-18 with reprex v2.0.2
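If you only need the distribution of b within each group, splitting the b vector rather than the whole data frame gives plain one-way tables without the redundant a dimension. A minimal sketch with the question's example data:

```r
# Example data from the question
a <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
b <- c(1, 3, 99, 99, 2, 3, 2, 99, 1, 1)
df <- cbind.data.frame(a, b)

# split() turns b into a list with one vector per value of a;
# lapply() then builds a one-way frequency table for each vector
tables_by_a <- lapply(split(df$b, df$a), table)
print(tables_by_a)
```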

Answer 4

Score: 2


Just use it inside lapply():

alv <- unique(df$a)
lapply(alv, \(x) table(subset(df, a == x, b))) |> setNames(alv)
# $`1`
# b
# 1  2  3 99 
# 1  1  1  2 
# 
# $`2`
# b
# 1  2  3 99 
# 2  1  1  1

However, it might be better to code 99 (and probably others) as NA,

df[] <- lapply(df, \(x) replace(x, x %in% c(99), NA))

and count the NAs in b for each individual a.

with(df, tapply(b, a, \(x) sum(is.na(x))))
# 1 2 
# 2 1
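Once 99 has been recoded as NA, keep in mind that table() leaves NA values out by default. Its useNA argument keeps them as a separate column, so a single two-way table shows the NA frequency for every value of a alongside the other counts. A short sketch, recoding with a simple assignment instead of the lapply() call above:

```r
# Example data from the question
a <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
b <- c(1, 3, 99, 99, 2, 3, 2, 99, 1, 1)
df <- cbind.data.frame(a, b)

# Recode the 99 code in b as NA
df$b[df$b == 99] <- NA

# useNA = "ifany" adds an NA column only when NAs are actually present
tab <- table(df$a, df$b, useNA = "ifany")
print(tab)

# Pull out just the NA counts for each value of a
tab[, is.na(colnames(tab))]
```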

Answer 5

Score: 2


Just use table() on the whole data frame and pull out the parts you want afterwards. When indexing into the two-way table, give the a and b values as character strings. For example,

a <- c(1,1,1,1,1,2,2,2,2,2)
b <- c(1,3,99,99,2,3,2,99,1,1)
df <- cbind.data.frame(a,b)
full <- table(df$a, df$b)
full["1",] # corresponds to subset a == 1
#>  1  2  3 99 
#>  1  1  1  2
full["2",] # corresponds to subset a == 2
#>  1  2  3 99 
#>  2  1  1  1
full[, "99"] # corresponds to subset b == 99
#> 1 2 
#> 2 1

Created on 2023-02-18 with reprex v2.0.2
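Closest to the literal question of applying a filter inside the table call itself, base R's xtabs() accepts a subset argument, so each per-group table is a single expression with no temporary data frames to create and remove. A minimal sketch with the question's example data:

```r
# Example data from the question
a <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
b <- c(1, 3, 99, 99, 2, 3, 2, 99, 1, 1)
df <- cbind.data.frame(a, b)

# One-way table of b, restricted to the rows where a == 1
tab_a1 <- xtabs(~ b, data = df, subset = a == 1)
print(tab_a1)

# All groups at once as a two-way table; index rows by the group label
tab_full <- xtabs(~ a + b, data = df)
print(tab_full["2", ])
```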
