

Apply filter to the table function






I'm looking for a way to execute a simple task faster than I am currently able to.

I want to use the table function in R on part of a dataframe. Of course it would be possible to first use subset and then table, but this is a bit tedious. (In my case, during a first inspection of the data, I want to check the frequency of NAs on individual variables in a multi-national survey for each of the 25 participating countries. So I'd need to create 25 subsets, make the table, and then remove the subsets again because I don't need them anymore.)

Here is some example data:

a <- c(1,1,1,1,1,2,2,2,2,2)
b <- c(1,3,99,99,2,3,2,99,1,1)
df <- cbind.data.frame(a,b)

And this is the workaround solution.

df1 <- subset(df, a == 1)
df2 <- subset(df, a == 2)
rm(df1, df2)

Is there a simpler way?

Also, I feel like I am spamming with ultra-basic questions like these. If anyone has a suggestion on how I could have found the answer directly I'd be happy to hear it. Other than trying some code myself, I googled terms like 'r apply filter to table', 'r filter table function', 'r table subset dataframe', etc.


Score: 2


Assuming 99 is your NA code, the purrr package offers a way that I find excellent for seeing how many NAs there are in each column:

library(purrr)

df |>
  map_df(~sum(. == 99))
#>       a     b
#>   <int> <int>
#> 1     0     3
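For readers without purrr, roughly the same count can be sketched in base R. Like the answer, this assumes 99 is the only missing-value code. It also assumes no column already holds a real NA, since a real NA would make the comparison itself return NA.

```r
# Example data from the question
a <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
b <- c(1, 3, 99, 99, 2, 3, 2, 99, 1, 1)
df <- data.frame(a, b)

# df == 99 is a logical matrix; colSums() counts the TRUEs in each column
count_99 <- colSums(df == 99)
count_99
#> a b 
#> 0 3 
```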


Score: 2



Can you provide an example of the structure of the original data (multi-national survey)?

You could probably answer your question with much tidier code using the dplyr package, with functions such as:

survey_data %>%
  select(column1, column2, country, etc) %>%  #choose your desired columns
  group_by(country) %>%
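Since the real survey structure was not posted, here is only a rough sketch of the group-then-summarise idea. It uses base R's aggregate() instead of the dplyr pipeline the answer has in mind, so that it runs without extra packages. The data frame and the column names q1 and q2 are invented for illustration.

```r
# Hypothetical stand-in for the survey; all names and values are made up
survey_data <- data.frame(
  country = c("AT", "AT", "BE", "BE", "BE"),
  q1 = c(1, NA, 2, NA, NA),
  q2 = c(NA, 3, 4, 5, 1)
)

# is.na() marks the missing cells; aggregate() sums those marks per country
na_counts <- aggregate(
  is.na(survey_data[c("q1", "q2")]),
  by = list(country = survey_data$country),
  FUN = sum
)
na_counts
```

The result has one row per country and one column of NA counts per selected variable, which is the shape a first inspection across 25 countries would need.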


Score: 2


You could split on your variable a and use lapply to apply table to each element of the resulting list, like this:

lapply(split(df, df$a), \(x) table(x))
#> $`1`
#>    b
#> a   1 2 3 99
#>   1 1 1 1  2
#> $`2`
#>    b
#> a   1 2 3 99
#>   2 2 1 1  1

Created on 2023-02-18 with reprex v2.0.2
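A small variation, offered as a sketch rather than as part of the answer: splitting only the column of interest avoids carrying the constant a dimension into every table. The name tables_by_a is just an illustrative choice.

```r
# Example data from the question
a <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
b <- c(1, 3, 99, 99, 2, 3, 2, 99, 1, 1)
df <- data.frame(a, b)

# Split only column b by the groups in a, then tabulate each piece
tables_by_a <- lapply(split(df$b, df$a), table)

# One-way frequency table of b for the group a == 1
tables_by_a[["1"]]
```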


Score: 2

Just use it in an lapply.

alv <- unique(df$a)
lapply(alv, \(x) table(subset(df, a == x, b))) |> setNames(alv)
# $`1`
# b
# 1  2  3 99 
# 1  1  1  2 
# $`2`
# b
# 1  2  3 99 
# 2  1  1  1 

However, it might be better to code 99 (and probably others) as NA,

df[] <- lapply(df, \(x) replace(x, x %in% c(99), NA))

and count the NAs in b for each individual a.

with(df, tapply(b, a, \(x) sum(is.na(x))))
# 1 2 
# 2 1 
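The recode-then-count idea might be extended to several missing-value codes and several columns along these lines. The code 98 and the names missing_codes, value_cols and recoded are assumptions made for the sketch; only 99 appears in the question. Unlike the one-liner above, it leaves the grouping column a untouched.

```r
# Example data from the question
a <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
b <- c(1, 3, 99, 99, 2, 3, 2, 99, 1, 1)
df <- data.frame(a, b)

missing_codes <- c(98, 99)            # hypothetical: 98 is not in the question
value_cols <- setdiff(names(df), "a") # every column except the grouping variable

# Recode the missing-value codes to NA in the value columns only
recoded <- df
recoded[value_cols] <- lapply(
  df[value_cols],
  function(x) replace(x, x %in% missing_codes, NA)
)

# For each value column, count the NAs within each level of a
na_by_group <- sapply(
  recoded[value_cols],
  function(x) tapply(is.na(x), recoded$a, sum)
)
na_by_group
```

Rows of na_by_group are the levels of a and columns are the inspected variables, so with more survey items the result grows sideways rather than needing one call per item.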


Score: 2


Just use table() on the whole dataframe, and pull out the parts you want afterwards. You convert the a and b values to character values when indexing into the two-way table. For example,

a <- c(1,1,1,1,1,2,2,2,2,2)
b <- c(1,3,99,99,2,3,2,99,1,1)
df <- cbind.data.frame(a,b)

full <- table(df$a, df$b)
full["1",] # corresponds to subset a == 1
#>  1  2  3 99 
#>  1  1  1  2
full["2",] # corresponds to subset a == 2
#>  1  2  3 99 
#>  2  1  1  1

full[, "99"] # corresponds to subset b == 99
#> 1 2 
#> 2 1

Created on 2023-02-18 with reprex v2.0.2
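If the groups differ in size, raw counts can be hard to compare. As a sketch building on the same two-way table, prop.table() with margin = 1 turns each row into within-group shares. The name share is illustrative only.

```r
# Example data from the question
a <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
b <- c(1, 3, 99, 99, 2, 3, 2, 99, 1, 1)
df <- data.frame(a, b)

full <- table(df$a, df$b)

# margin = 1 divides each row by its row total, giving shares within each a
share <- prop.table(full, margin = 1)
share[, "99"] # share of 99s in each group
#>   1   2 
#> 0.4 0.2 
```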

