英文:
(R) Bin a numeric column to count occurrences after group by
问题
抱歉,代码部分不要翻译。以下是翻译好的内容:
"Apologies if the title of the post is a bit confusing. Let's say I have the following data frame:
I want to bin the position
column. Let's say with a size of 10. Then based on the condition (either soft or stiff), I would like to count the occurrences in the info
column. So the data would look something like this (not the actual result from the data above):
To make it easier, it is probably better to create two data frames based on condition and then apply the binning and counting, but I am stuck on this part. Any help is appreciated. Many thanks."
英文:
Apologies if the title of the post is a bit confusing. Let's say I have the following data frame:
set.seed(123)
test <- data.frame("chr" = rep("chr1",30), "position" = sample(c(1:50), 30, replace = F) ,
"info" = sample(c("X","Y"), 30, replace = T),
"condition"= sample(c("soft","stiff"), 30, replace = T) )
## head(test)
chr position info condition
1 chr1 31 Y soft
2 chr1 15 Y soft
3 chr1 14 X soft
4 chr1 3 X soft
5 chr1 42 X stiff
6 chr1 43 X stiff
I want to bin the position
column. Let's say with a size of 10. Then based on the condition (either soft or stiff), I would like to count the occurrences in the info
column. So the data would look something like this (not the actual result from the data above)
chr start end condition count_Y count_X
1 chr1 1 10 soft 2 3
2 chr1 1 10 stiff 0 2
3 chr1 11 20 soft 2 5
4 chr1 11 20 soft 1 2
5 chr1 21 30 soft 2 0
6 chr1 21 30 stiff 0 4
To make it easier, it is probably better to create two data frames based on condition and then apply the binning and counting, but I am stuck on this part. Any help is appreciated. Many thanks.
答案1
得分: 3
使用cut
或更简单的整数除法%/%
进行分箱(感谢@MrFlick的提示),dplyr::count
和tidyr::pivot_wider
,您可以执行以下操作:
library(dplyr, warn=FALSE)
library(tidyr)
test |>
mutate(
bin = position %/% 10 + 1,
start = (bin - 1) * 10 + 1,
end = bin * 10
) |>
count(chr, start, end, condition, info) |>
tidyr::pivot_wider(
names_from = info,
values_from = n,
names_prefix = "count_",
values_fill = 0
)
#> # A tibble: 9 × 6
#> chr start end condition count_X count_Y
#> <chr> <dbl> <dbl> <chr> <int> <int>
#> 1 chr1 1 10 soft 4 0
#> 2 chr1 1 10 stiff 2 1
#> 3 chr1 11 20 soft 3 3
#> 4 chr1 21 30 soft 1 1
#> 5 chr1 21 30 stiff 3 1
#> 6 chr1 31 40 soft 0 2
#> 7 chr1 31 40 stiff 2 1
#> 8 chr1 41 50 soft 0 1
#> 9 chr1 41 50 stiff 4 1
英文:
Using cut
or even easier using integer division %/%
for the binning (Thx to @MrFlick for the hint), dplyr::count
and tidyr::pivot_wider
you could do:
library(dplyr, warn=FALSE)
library(tidyr)
test |>
mutate(
bin = position %/% 10 + 1,
start = (bin - 1) * 10 + 1,
end = bin * 10
) |>
count(chr, start, end, condition, info) |>
tidyr::pivot_wider(
names_from = info,
values_from = n,
names_prefix = "count_",
values_fill = 0
)
#> # A tibble: 9 × 6
#> chr start end condition count_X count_Y
#> <chr> <dbl> <dbl> <chr> <int> <int>
#> 1 chr1 1 10 soft 4 0
#> 2 chr1 1 10 stiff 2 1
#> 3 chr1 11 20 soft 3 3
#> 4 chr1 21 30 soft 1 1
#> 5 chr1 21 30 stiff 3 1
#> 6 chr1 31 40 soft 0 2
#> 7 chr1 31 40 stiff 2 1
#> 8 chr1 41 50 soft 0 1
#> 9 chr1 41 50 stiff 4 1
答案2
得分: 1
以下是翻译好的部分:
"Alternatively please check the below code base r
approach" 可选择请检查下面的base r
代码方法。
"# Bin the "position" column with a bin size of 10" 使用大小为10的箱子对“position”列进行分箱。
"test$position_bin <- cut(test$position, breaks = seq(0, 50, by = 10), include.lowest = TRUE)" 使用大小为10的箱子对“position”列进行分箱。
"# Count occurrences in the "info" column based on the "condition"" 基于“condition”计算“info”列中的发生次数。
"count_result <- table(test$position_bin, test$condition, test$info) %>% as.data.frame() %>%" 计算表中“position_bin”、“condition”和“info”的值,并转换为数据框。
"setNames(c('position_bin','condition','info','Freq')) %>%" 设置数据框的列名为“position_bin”、“condition”、“info”和“Freq”。
"reshape(idvar = c('position_bin','condition'), timevar = 'info', v.names = 'Freq', direction = 'wide')" 重新塑造数据框,将“position_bin”和“condition”作为标识变量,将“info”作为时间变量,将“Freq”作为值的名称,方向为“wide”。
"<sup>Created on 2023-08-10 with reprex v2.0.2</sup>" 创建于2023年08月10日,使用reprex v2.0.2。
英文:
Alternatively please check the below code base r
approach
# Bin the "position" column with a bin size of 10
test$position_bin <- cut(test$position, breaks = seq(0, 50, by = 10), include.lowest = TRUE)
# Count occurrences in the "info" column based on the "condition"
count_result <- table(test$position_bin, test$condition, test$info) %>% as.data.frame() %>%
setNames(c('position_bin','condition','info','Freq')) %>%
reshape(idvar = c('position_bin','condition'), timevar = 'info', v.names = 'Freq', direction = 'wide')
<sup>Created on 2023-08-10 with reprex v2.0.2</sup>
position_bin condition Freq.X Freq.Y
1 [0,10] soft 4 0
2 (10,20] soft 3 3
3 (20,30] soft 1 1
4 (30,40] soft 0 2
5 (40,50] soft 0 1
6 [0,10] stiff 2 1
7 (10,20] stiff 0 0
8 (20,30] stiff 3 1
9 (30,40] stiff 2 1
10 (40,50] stiff 4 1
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论