Apply filter to the table function

Question

I'm looking for a way to execute a simple task faster than I am currently able to.

I want to use the table function in R on part of a dataframe. Of course it would be possible to first use subset and then table, but this is a bit tedious. (In my case, during a first inspection of the data, I want to check the frequency of NAs on individual variables in a multi-national survey for each of the 25 participating countries. So I'd need to create 25 subsets, make the table, and then remove the subsets again because I don't need them anymore.)

Here is some example data:

    a <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
    b <- c(1, 3, 99, 99, 2, 3, 2, 99, 1, 1)
    df <- cbind.data.frame(a, b)

And this is the workaround solution.

    df1 <- subset(df, a == 1)
    table(df1$b)
    df2 <- subset(df, a == 2)
    table(df2$b)
    rm(df1, df2)

Is there a simpler way?

Also, I feel like I am spamming with ultra-basic questions like these. If anyone has a suggestion on how I could have found the answer directly I'd be happy to hear it. Other than trying some code myself, I googled terms like 'r apply filter to table', 'r filter table function', 'r table subset dataframe', etc.

Answer 1

Score: 2

Assuming 99 marks your NAs, the purrr package gives a convenient way to count how many there are in each column:

    library(purrr)
    df |>
      map_df(~ sum(. == 99))
    #>       a     b
    #>   <int> <int>
    #> 1     0     3
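
For readers who would rather avoid a package dependency, the same per-column count can be sketched in base R. This is not part of the original answer; it assumes, as above, that 99 is the only missing-value code and that the columns hold no genuine NA values (which would turn the sums into NA).

```r
# Example data from the question
a <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
b <- c(1, 3, 99, 99, 2, 3, 2, 99, 1, 1)
df <- data.frame(a, b)

# df == 99 is a logical matrix; colSums() counts the TRUE values per column
n_missing <- colSums(df == 99)
n_missing
```

The result is a named vector with one entry per column, so `n_missing[["b"]]` gives the count for b alone.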

Answer 2

Score: 2

Can you provide an example of the structure of the original data (multi-national survey)?

You could probably answer your question with much tidier code using the dplyr package, with functions such as:

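    library(dplyr)

    # survey_data and the column names are placeholders for your own data
    survey_data %>%
      select(column1, column2, country) %>%  # choose your desired columns
      group_by(country) %>%
      # funs() has been deprecated since dplyr 0.8.0; across() replaces it
      summarise(across(everything(), ~ sum(is.na(.x))))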
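
Since the original survey data is not shown, here is a minimal runnable sketch on made-up data. The names `survey_data`, `country`, `q1` and `q2` are hypothetical, and `across()` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical survey data: two countries, two question variables
survey_data <- data.frame(
  country = c("AT", "AT", "BE", "BE", "BE"),
  q1      = c(1, NA, 3, NA, NA),
  q2      = c(NA, 2, 2, 2, NA)
)

# One row per country, one NA count per remaining column
na_counts <- survey_data %>%
  group_by(country) %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

na_counts
```

The grouping column is excluded from `everything()` automatically, so only the question variables are summarised.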

Answer 3

Score: 2

You could split the data frame on your a variable and use lapply to apply table to each piece, like this:

    lapply(split(df, df$a), \(x) table(x))
    #> $`1`
    #>    b
    #> a   1 2 3 99
    #>   1 1 1 1  2
    #>
    #> $`2`
    #>    b
    #> a   1 2 3 99
    #>   2 2 1 1  1

Created on 2023-02-18 with reprex v2.0.2
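
If only the one-way table of b within each level of a is wanted, as in the question's workaround, one variant is to split just that column. This is a sketch on the example data, not part of the original answer.

```r
# Example data from the question
a <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
b <- c(1, 3, 99, 99, 2, 3, 2, 99, 1, 1)
df <- data.frame(a, b)

# split() returns a list of b-vectors, one per value of a;
# lapply() then tabulates each vector separately
tabs <- lapply(split(df$b, df$a), table)

tabs[["1"]]  # same counts as table(subset(df, a == 1)$b)
```

The list is named by the values of a, so with 25 countries each table can be looked up by country code.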

Answer 4

Score: 2

Just use subset inside an lapply.

    alv <- unique(df$a)
    lapply(alv, \(x) table(subset(df, a == x, b))) |> setNames(alv)
    # $`1`
    # b
    #  1  2  3 99
    #  1  1  1  2
    #
    # $`2`
    # b
    #  1  2  3 99
    #  2  1  1  1

However, it might be better to code 99 (and probably others) as NA,

    df[] <- lapply(df, \(x) replace(x, x %in% c(99), NA))

and count the NAs in b for each individual a.

    with(df, tapply(b, a, \(x) sum(is.na(x))))
    # 1 2
    # 2 1
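
One caveat after recoding: table() drops NA values by default, so the recoded entries would disappear from a cross-tabulation. A sketch using table()'s `useNA` argument keeps them visible (here only b is recoded, for brevity):

```r
# Example data from the question
a <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
b <- c(1, 3, 99, 99, 2, 3, 2, 99, 1, 1)
df <- data.frame(a, b)

# Recode the sentinel value 99 in b as NA
df$b[df$b == 99] <- NA

# useNA = "ifany" adds an <NA> column because b now contains NAs
tab <- table(a = df$a, b = df$b, useNA = "ifany")
tab

# NA counts per level of a: the column whose name is NA
na_per_group <- tab[, is.na(colnames(tab))]
na_per_group
```

This yields both the full frequency table and the per-group NA counts from a single call to table().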

Answer 5

Score: 2

Just use table() on the whole data frame and pull out the parts you want afterwards. When indexing into the two-way table, pass the a and b values as character strings. For example,

    a <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
    b <- c(1, 3, 99, 99, 2, 3, 2, 99, 1, 1)
    df <- cbind.data.frame(a, b)
    full <- table(df$a, df$b)
    full["1", ]   # corresponds to subset a == 1
    #>  1  2  3 99
    #>  1  1  1  2
    full["2", ]   # corresponds to subset a == 2
    #>  1  2  3 99
    #>  2  1  1  1
    full[, "99"]  # corresponds to subset b == 99
    #> 1 2
    #> 2 1

Created on 2023-02-18 with reprex v2.0.2
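
Because the question is about how frequent the missing code is in each group, it may also help to turn the counts into row proportions. The following is a sketch with base R's prop.table(), not part of the original answer; `margin = 1` means each row (each value of a) sums to 1.

```r
# Example data from the question
a <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
b <- c(1, 3, 99, 99, 2, 3, 2, 99, 1, 1)
df <- data.frame(a, b)

full <- table(df$a, df$b)

# Row-wise shares: each count divided by its row total
shares <- prop.table(full, margin = 1)

# Share of the code 99 within each level of a
shares[, "99"]
```

With groups of unequal size, these shares are easier to compare across countries than raw counts.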

huangapple
  • Posted on 2023-02-19 01:25:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75495086.html