2023年8月11日 03:19:19go评论131阅读模式

英文:

(R) Bin a numeric column to count occurrences after group by

问题

抱歉，代码部分不要翻译。以下是翻译好的内容：

"Apologies if the title of the post is a bit confusing. Let's say I have the following data frame:

I want to bin the position column. Let's say with a size of 10. Then based on the condition (either soft or stiff), I would like to count the occurrences in the info column. So the data would look something like this (not the actual result from the data above):

To make it easier, it is probably better to create two data frames based on condition and then apply the binning and counting, but I am stuck on this part. Any help is appreciated. Many thanks."

英文:

Apologies if the title of the post is a bit confusing. Let's say I have the following data frame:

set.seed(123)
test &lt;- data.frame(&quot;chr&quot; = rep(&quot;chr1&quot;,30), &quot;position&quot; = sample(c(1:50), 30, replace = F) , 
         &quot;info&quot; = sample(c(&quot;X&quot;,&quot;Y&quot;), 30, replace = T), 
         &quot;condition&quot;= sample(c(&quot;soft&quot;,&quot;stiff&quot;), 30, replace = T) )

## head(test)
   chr position info condition
1 chr1       31    Y      soft
2 chr1       15    Y      soft
3 chr1       14    X      soft
4 chr1        3    X      soft
5 chr1       42    X     stiff
6 chr1       43    X     stiff

   chr start end condition count_Y count_X
1 chr1   1    10    soft      2       3
2 chr1   1    10    stiff     0       2
3 chr1   11   20    soft      2       5
4 chr1   11   20    soft      1       2
5 chr1   21   30    soft      2       0
6 chr1   21   30    stiff     0       4

To make it easier, it is probably better to create two data frames based on condition and then apply the binning and counting, but I am stuck on this part. Any help is appreciated. Many thanks.

答案1

得分: 3

使用cut或更简单的整数除法%/%进行分箱（感谢@MrFlick的提示），dplyr::count和tidyr::pivot_wider，您可以执行以下操作：

library(dplyr, warn=FALSE)
library(tidyr)

test |&gt;
  mutate(
    bin = position %/% 10 + 1,
    start = (bin - 1) * 10 + 1,
    end = bin * 10
  ) |&gt;
  count(chr, start, end, condition, info) |&gt;
  tidyr::pivot_wider(
    names_from = info, 
    values_from = n, 
    names_prefix = &quot;count_&quot;,
    values_fill = 0
  )
#&gt; # A tibble: 9 &#215; 6
#&gt;   chr   start   end condition count_X count_Y
#&gt;   &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;chr&gt;       &lt;int&gt;   &lt;int&gt;
#&gt; 1 chr1      1    10 soft            4       0
#&gt; 2 chr1      1    10 stiff           2       1
#&gt; 3 chr1     11    20 soft            3       3
#&gt; 4 chr1     21    30 soft            1       1
#&gt; 5 chr1     21    30 stiff           3       1
#&gt; 6 chr1     31    40 soft            0       2
#&gt; 7 chr1     31    40 stiff           2       1
#&gt; 8 chr1     41    50 soft            0       1
#&gt; 9 chr1     41    50 stiff           4       1

英文:

Using cut or even easier using integer division %/% for the binning (Thx to @MrFlick for the hint), dplyr::count and tidyr::pivot_wider you could do:

library(dplyr, warn=FALSE)
library(tidyr)

test |&gt;
  mutate(
    bin = position %/% 10 + 1,
    start = (bin - 1) * 10 + 1,
    end = bin * 10
  ) |&gt;
  count(chr, start, end, condition, info) |&gt;
  tidyr::pivot_wider(
    names_from = info, 
    values_from = n, 
    names_prefix = &quot;count_&quot;,
    values_fill = 0
  )
#&gt; # A tibble: 9 &#215; 6
#&gt;   chr   start   end condition count_X count_Y
#&gt;   &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;chr&gt;       &lt;int&gt;   &lt;int&gt;
#&gt; 1 chr1      1    10 soft            4       0
#&gt; 2 chr1      1    10 stiff           2       1
#&gt; 3 chr1     11    20 soft            3       3
#&gt; 4 chr1     21    30 soft            1       1
#&gt; 5 chr1     21    30 stiff           3       1
#&gt; 6 chr1     31    40 soft            0       2
#&gt; 7 chr1     31    40 stiff           2       1
#&gt; 8 chr1     41    50 soft            0       1
#&gt; 9 chr1     41    50 stiff           4       1

答案2

得分: 1

以下是翻译好的部分：

"Alternatively please check the below code base r approach" 可选择请检查下面的base r代码方法。

"# Bin the "position" column with a bin size of 10" 使用大小为10的箱子对“position”列进行分箱。

"test$position_bin <- cut(test$position, breaks = seq(0, 50, by = 10), include.lowest = TRUE)" 使用大小为10的箱子对“position”列进行分箱。

"# Count occurrences in the "info" column based on the "condition"" 基于“condition”计算“info”列中的发生次数。

"count_result <- table(test$position_bin, test$condition, test$info) %>% as.data.frame() %>%" 计算表中“position_bin”、“condition”和“info”的值，并转换为数据框。

"setNames(c('position_bin','condition','info','Freq')) %>%" 设置数据框的列名为“position_bin”、“condition”、“info”和“Freq”。

"reshape(idvar = c('position_bin','condition'), timevar = 'info', v.names = 'Freq', direction = 'wide')" 重新塑造数据框，将“position_bin”和“condition”作为标识变量，将“info”作为时间变量，将“Freq”作为值的名称，方向为“wide”。

"<sup>Created on 2023-08-10 with reprex v2.0.2</sup>" 创建于2023年08月10日，使用reprex v2.0.2。

英文:

Alternatively please check the below code base r approach

# Bin the &quot;position&quot; column with a bin size of 10
test$position_bin &lt;- cut(test$position, breaks = seq(0, 50, by = 10), include.lowest = TRUE)

# Count occurrences in the &quot;info&quot; column based on the &quot;condition&quot;
count_result &lt;- table(test$position_bin, test$condition, test$info) %&gt;% as.data.frame() %&gt;% 
  setNames(c(&#39;position_bin&#39;,&#39;condition&#39;,&#39;info&#39;,&#39;Freq&#39;)) %&gt;% 
  reshape(idvar = c(&#39;position_bin&#39;,&#39;condition&#39;), timevar = &#39;info&#39;, v.names = &#39;Freq&#39;, direction = &#39;wide&#39;)

<sup>Created on 2023-08-10 with reprex v2.0.2</sup>

   position_bin condition Freq.X Freq.Y
1        [0,10]      soft      4      0
2       (10,20]      soft      3      3
3       (20,30]      soft      1      1
4       (30,40]      soft      0      2
5       (40,50]      soft      0      1
6        [0,10]     stiff      2      1
7       (10,20]     stiff      0      0
8       (20,30]     stiff      3      1
9       (30,40]     stiff      2      1
10      (40,50]     stiff      4      1

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

将数值列分组后进行分箱以计算每组的出现次数。

问题

答案1

答案2

如何在Snowpark（Python）数据框中实现explode功能，而不使用explode()函数？

如何计算每个组的百分比取决于不同的变量？

删除的列重新出现在列级别中

根据列表项的行拆分列表。

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论