Return all pairs of rows that share common index columns and whose two number columns each sum to > 0 [R]
Question
I'd like to return all combinations of comboindex (column 1) where the sum of number1 in rowA and number1 in rowB is > 0 AND the sum of number2 in rowA and number2 in rowB is > 0, and where both rows share the same indexa, indexb, and indexc (that is, rows are only paired within the same index group).
For example:
# create example data
example_df <- data.frame(
  comboindex = c(LETTERS[1:4], LETTERS[1:6]),
  indexa = c(rep("A", 4), rep("CCC", 6)),
  indexb = c(rep("B", 4), rep("DDD", 6)),
  indexc = c(rep("C", 4), rep("EEE", 6)),
  number1 = c(290, 340, -200, -108, 150, -190, 500, 1000, -300, 128),
  number2 = c(-120, 100, -400, 180, -140, 200, -155, -900, 22000, 175),
  stringsAsFactors = FALSE
)
Desired output (manually created) for the example_df above:
# the sums are written out as calculations so you can see which numbers add up to a positive total
desired_df <- data.frame(
  indexcombo = c("A_D", "B_D", "B_C", "D_E", "C_F", "A_F", "C_E"),
  indexa = c("A", "A", "CCC", "CCC", "CCC", "CCC", "CCC"),
  indexb = c("B", "B", "DDD", "DDD", "DDD", "DDD", "DDD"),
  indexc = c("C", "C", "EEE", "EEE", "EEE", "EEE", "EEE"),
  number1a = c(290, 340, -190, 1000, 500, 150, 500),
  number1b = c(-108, -108, 500, -300, 128, 128, -300),
  sumnumber1 = c(sum(290, -108), sum(340, -108), sum(-190, 500), sum(1000, -300), sum(500, 128), sum(150, 128), sum(500, -300)),
  number2a = c(-120, 100, 200, -900, -155, -140, -155),
  number2b = c(180, 180, -155, 22000, 175, 175, 22000),
  sumnumber2 = c(sum(-120, 180), sum(100, 180), sum(200, -155), sum(-900, 22000), sum(-155, 175), sum(-140, 175), sum(-155, 22000)),
  stringsAsFactors = FALSE
)
My real data are thousands of lines long, so a loop isn't practical. Can anyone help me with an efficient data.table, dplyr/tidyverse, or matrix function? Thanks!
Answer 1
Score: 2
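library(dplyr)  # join_by() with an inequality condition needs dplyr >= 1.1.0

example_df %>%
  # self-join within each index group; the inequality keeps each pair only once
  left_join(example_df, by = join_by(indexa, indexb, indexc, comboindex < comboindex)) %>%
  mutate(indexcombo = paste(comboindex.x, comboindex.y, sep = "_"), .before = 0) %>%
  mutate(sumnumber1 = number1.x + number1.y,
         sumnumber2 = number2.x + number2.y) %>%
  # rows without a partner have NA sums, so filter() drops them too
  filter(sumnumber1 > 0, sumnumber2 > 0)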
Result
  indexcombo comboindex.x indexa indexb indexc number1.x number2.x comboindex.y number1.y number2.y sumnumber1 sumnumber2
1        A_D            A      A      B      C       290      -120            D      -108       180        182         60
2        B_D            B      A      B      C       340       100            D      -108       180        232        280
3        A_F            A    CCC    DDD    EEE       150      -140            F       128       175        278         35
4        B_C            B    CCC    DDD    EEE      -190       200            C       500      -155        310         45
5        C_E            C    CCC    DDD    EEE       500      -155            E      -300     22000        200      21845
6        C_F            C    CCC    DDD    EEE       500      -155            F       128       175        628         20
7        D_E            D    CCC    DDD    EEE      1000      -900            E      -300     22000        700      21100
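As a cross-check, the same seven pairs can be reproduced without dplyr. The following is a minimal base R sketch, not part of the original answer: it assumes the example_df from the question and uses merge() for the self-join. The names pairs and check are illustrative only.

```r
# Example data copied from the question
example_df <- data.frame(
  comboindex = c(LETTERS[1:4], LETTERS[1:6]),
  indexa = c(rep("A", 4), rep("CCC", 6)),
  indexb = c(rep("B", 4), rep("DDD", 6)),
  indexc = c(rep("C", 4), rep("EEE", 6)),
  number1 = c(290, 340, -200, -108, 150, -190, 500, 1000, -300, 128),
  number2 = c(-120, 100, -400, 180, -140, 200, -155, -900, 22000, 175),
  stringsAsFactors = FALSE
)

# Self-join on the three index columns: every row is paired with every
# row of the same group (including itself, and in both orders)
pairs <- merge(example_df, example_df,
               by = c("indexa", "indexb", "indexc"),
               suffixes = c(".x", ".y"))

# Keep each unordered pair once, then apply both sum conditions
pairs <- pairs[pairs$comboindex.x < pairs$comboindex.y, ]
pairs$sumnumber1 <- pairs$number1.x + pairs$number1.y
pairs$sumnumber2 <- pairs$number2.x + pairs$number2.y
check <- pairs[pairs$sumnumber1 > 0 & pairs$sumnumber2 > 0, ]
check$indexcombo <- paste(check$comboindex.x, check$comboindex.y, sep = "_")

sort(check$indexcombo)
```

Because merge() materialises every within-group pairing before filtering, this version will likely need more memory than the inequality join on large groups; it is meant for verification rather than as a replacement.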
Answer 2
Score: 1
An attempt in data.table:
library(data.table)
setDT(example_df)
# integer version of comboindex, used in the non-equi join condition
example_df[, cmb := as.integer(as.factor(comboindex))]
example_df[
  example_df,
  on = .(indexa, indexb, indexc, cmb < cmb),
  {
    # x.* columns come from the outer table, i.* columns from the table passed as i
    s1 = x.number1 + i.number1
    s2 = x.number2 + i.number2
    data.table(x.comboindex, i.comboindex, s1, s2)[s1 > 0 & s2 > 0]
  },
  by = .EACHI,
  allow.cartesian = TRUE
]
##   indexa indexb indexc   cmb x.comboindex i.comboindex    s1    s2
##   <char> <char> <char> <int>       <char>       <char> <num> <num>
##1:      A      B      C     4            A            D   182    60
##2:      A      B      C     4            B            D   232   280
##3:    CCC    DDD    EEE     3            B            C   310    45
##4:    CCC    DDD    EEE     5            C            E   200 21845
##5:    CCC    DDD    EEE     5            D            E   700 21100
##6:    CCC    DDD    EEE     6            A            F   278    35
##7:    CCC    DDD    EEE     6            C            F   628    20
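Since the question also asks about matrix functions, a base R variant built on outer() is another possibility. This is only a sketch and not part of the answer above: pairs_in_group is a hypothetical helper name, and it assumes comboindex is already sorted within each index group, so that the upper triangle of the sum matrix corresponds to "first label < second label".

```r
# Example data copied from the question
example_df <- data.frame(
  comboindex = c(LETTERS[1:4], LETTERS[1:6]),
  indexa = c(rep("A", 4), rep("CCC", 6)),
  indexb = c(rep("B", 4), rep("DDD", 6)),
  indexc = c(rep("C", 4), rep("EEE", 6)),
  number1 = c(290, 340, -200, -108, 150, -190, 500, 1000, -300, 128),
  number2 = c(-120, 100, -400, 180, -140, 200, -155, -900, 22000, 175),
  stringsAsFactors = FALSE
)

# Hypothetical helper: find the qualifying pairs inside one index group
pairs_in_group <- function(g) {
  s1 <- outer(g$number1, g$number1, "+")  # s1[i, j] = number1[i] + number1[j]
  s2 <- outer(g$number2, g$number2, "+")  # same for number2
  # upper.tri() keeps i < j, so each pair appears once and no row pairs with itself
  idx <- which(s1 > 0 & s2 > 0 & upper.tri(s1), arr.ind = TRUE)
  data.frame(
    indexcombo = paste(g$comboindex[idx[, "row"]], g$comboindex[idx[, "col"]], sep = "_"),
    indexa = g$indexa[idx[, "row"]],
    indexb = g$indexb[idx[, "row"]],
    indexc = g$indexc[idx[, "row"]],
    sumnumber1 = s1[idx],
    sumnumber2 = s2[idx],
    stringsAsFactors = FALSE
  )
}

# Split by the three index columns, process each group, and stack the results
groups <- split(example_df, example_df[c("indexa", "indexb", "indexc")], drop = TRUE)
result <- do.call(rbind, lapply(groups, pairs_in_group))
rownames(result) <- NULL
result
```

Each group of n rows allocates n x n matrices, so this suits many moderately sized groups better than a few very large ones; for the latter, the non-equi join is probably the safer choice.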