将数值列分组后进行分箱以计算每组的出现次数。

huangapple go评论131阅读模式
英文:

(R) Bin a numeric column to count occurrences after group by

问题

抱歉,代码部分不要翻译。以下是翻译好的内容:

"Apologies if the title of the post is a bit confusing. Let's say I have the following data frame:

I want to bin the position column. Let's say with a size of 10. Then based on the condition (either soft or stiff), I would like to count the occurrences in the info column. So the data would look something like this (not the actual result from the data above):

To make it easier, it is probably better to create two data frames based on condition and then apply the binning and counting, but I am stuck on this part. Any help is appreciated. Many thanks."

英文:

Apologies if the title of the post is a bit confusing. Let's say I have the following data frame:

set.seed(123)
test <- data.frame("chr" = rep("chr1",30), "position" = sample(c(1:50), 30, replace = F) , 
         "info" = sample(c("X","Y"), 30, replace = T), 
         "condition"= sample(c("soft","stiff"), 30, replace = T) )

## head(test)
   chr position info condition
1 chr1       31    Y      soft
2 chr1       15    Y      soft
3 chr1       14    X      soft
4 chr1        3    X      soft
5 chr1       42    X     stiff
6 chr1       43    X     stiff

I want to bin the position column. Let's say with a size of 10. Then based on the condition (either soft or stiff), I would like to count the occurrences in the info column. So the data would look something like this (not the actual result from the data above)

   chr start end condition count_Y count_X
1 chr1   1    10    soft      2       3
2 chr1   1    10    stiff     0       2
3 chr1   11   20    soft      2       5
4 chr1   11   20    soft      1       2
5 chr1   21   30    soft      2       0
6 chr1   21   30    stiff     0       4

To make it easier, it is probably better to create two data frames based on condition and then apply the binning and counting, but I am stuck on this part. Any help is appreciated. Many thanks.

答案1

得分: 3

使用cut或更简单的整数除法%/%进行分箱(感谢@MrFlick的提示),dplyr::counttidyr::pivot_wider,您可以执行以下操作:

library(dplyr, warn=FALSE)
library(tidyr)

test |>
  mutate(
    bin = position %/% 10 + 1,
    start = (bin - 1) * 10 + 1,
    end = bin * 10
  ) |>
  count(chr, start, end, condition, info) |>
  tidyr::pivot_wider(
    names_from = info, 
    values_from = n, 
    names_prefix = "count_",
    values_fill = 0
  )
#> # A tibble: 9 × 6
#>   chr   start   end condition count_X count_Y
#>   <chr> <dbl> <dbl> <chr>       <int>   <int>
#> 1 chr1      1    10 soft            4       0
#> 2 chr1      1    10 stiff           2       1
#> 3 chr1     11    20 soft            3       3
#> 4 chr1     21    30 soft            1       1
#> 5 chr1     21    30 stiff           3       1
#> 6 chr1     31    40 soft            0       2
#> 7 chr1     31    40 stiff           2       1
#> 8 chr1     41    50 soft            0       1
#> 9 chr1     41    50 stiff           4       1
英文:

Using cut or even easier using integer division %/% for the binning (Thx to @MrFlick for the hint), dplyr::count and tidyr::pivot_wider you could do:

library(dplyr, warn=FALSE)
library(tidyr)

test |>
  mutate(
    bin = position %/% 10 + 1,
    start = (bin - 1) * 10 + 1,
    end = bin * 10
  ) |>
  count(chr, start, end, condition, info) |>
  tidyr::pivot_wider(
    names_from = info, 
    values_from = n, 
    names_prefix = "count_",
    values_fill = 0
  )
#> # A tibble: 9 × 6
#>   chr   start   end condition count_X count_Y
#>   <chr> <dbl> <dbl> <chr>       <int>   <int>
#> 1 chr1      1    10 soft            4       0
#> 2 chr1      1    10 stiff           2       1
#> 3 chr1     11    20 soft            3       3
#> 4 chr1     21    30 soft            1       1
#> 5 chr1     21    30 stiff           3       1
#> 6 chr1     31    40 soft            0       2
#> 7 chr1     31    40 stiff           2       1
#> 8 chr1     41    50 soft            0       1
#> 9 chr1     41    50 stiff           4       1

答案2

得分: 1

以下是翻译好的部分:

"Alternatively please check the below code base r approach" 可选择请检查下面的base r代码方法。

"# Bin the "position" column with a bin size of 10" 使用大小为10的箱子对“position”列进行分箱。

"test$position_bin <- cut(test$position, breaks = seq(0, 50, by = 10), include.lowest = TRUE)" 使用大小为10的箱子对“position”列进行分箱。

"# Count occurrences in the "info" column based on the "condition"" 基于“condition”计算“info”列中的发生次数。

"count_result <- table(test$position_bin, test$condition, test$info) %>% as.data.frame() %>%" 计算表中“position_bin”、“condition”和“info”的值,并转换为数据框。

"setNames(c('position_bin','condition','info','Freq')) %>%" 设置数据框的列名为“position_bin”、“condition”、“info”和“Freq”。

"reshape(idvar = c('position_bin','condition'), timevar = 'info', v.names = 'Freq', direction = 'wide')" 重新塑造数据框,将“position_bin”和“condition”作为标识变量,将“info”作为时间变量,将“Freq”作为值的名称,方向为“wide”。

"<sup>Created on 2023-08-10 with reprex v2.0.2</sup>" 创建于2023年08月10日,使用reprex v2.0.2

英文:

Alternatively please check the below code base r approach

# Bin the &quot;position&quot; column with a bin size of 10
test$position_bin &lt;- cut(test$position, breaks = seq(0, 50, by = 10), include.lowest = TRUE)

# Count occurrences in the &quot;info&quot; column based on the &quot;condition&quot;
count_result &lt;- table(test$position_bin, test$condition, test$info) %&gt;% as.data.frame() %&gt;% 
  setNames(c(&#39;position_bin&#39;,&#39;condition&#39;,&#39;info&#39;,&#39;Freq&#39;)) %&gt;% 
  reshape(idvar = c(&#39;position_bin&#39;,&#39;condition&#39;), timevar = &#39;info&#39;, v.names = &#39;Freq&#39;, direction = &#39;wide&#39;)

<sup>Created on 2023-08-10 with reprex v2.0.2</sup>

   position_bin condition Freq.X Freq.Y
1        [0,10]      soft      4      0
2       (10,20]      soft      3      3
3       (20,30]      soft      1      1
4       (30,40]      soft      0      2
5       (40,50]      soft      0      1
6        [0,10]     stiff      2      1
7       (10,20]     stiff      0      0
8       (20,30]     stiff      3      1
9       (30,40]     stiff      2      1
10      (40,50]     stiff      4      1

huangapple
  • 本文由 发表于 2023年8月11日 03:19:19
  • 转载请务必保留本文链接:https://go.coder-hub.com/76878751.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定