Return all combinations of rows where the sum of two number columns each sum to >0 combined and have common index columns [R]

huangapple go评论89阅读模式
英文:

Return all combinations of rows where the sum of two number columns each sum to >0 combined and have common index columns [R]

问题

Sure, here's the translated portion of your text:

我想要返回 comboindex(第一列)的所有组合,其中 rowA 的 number1 和 rowB 的 number1 之和 >0,并且 rowA 的 number2 和 rowB 的 number2 之和 >0,并且 indexaindexbindexc 用于索引两行(indexaindexbindexc 在两行中都相同)。

例如:

  1. # 创建示例数据
  2. example_df <- data.frame(
  3. comboindex=c(LETTERS[1:4],LETTERS[1:6]),
  4. indexa=c(rep("A",4),rep("CCC",6)),
  5. indexb=c(rep("B",4),rep("DDD",6)),
  6. indexc=c(rep("C",4),rep("EEE",6)),
  7. number1=c(290, 340, -200, -108, 150, -190, 500, 1000, -300, 128),
  8. number2=c(-120, 100, -400, 180, -140, 200, -155, -900, 22000, 175),
  9. stringsAsFactors = F
  10. )

期望的数据(手动创建),这将是给定上述 example_df 的期望结果:

  1. # 添加计算和的计算(这样你可以看到在我的示例中加到+的数字)
  2. desired_df <- data.frame(
  3. indexcombo=c("A_D","B_D","B_C","D_E","C_F","A_F","C_E"),
  4. indexa=c("A","A","CCC","CCC","CCC","CCC","CCC"),
  5. indexb=c("B","B","DDD","DDD","DDD","DDD","DDD"),
  6. indexc=c("C","C","EEE","EEE","EEE","EEE","EEE"),
  7. number1a=c(290,340,340,1000,-300,150,500),
  8. number1b=c(-108,-108,-200,1000,128,128,-300),
  9. sumnumber1=c(sum(290,-108),sum(340,-108),sum(340,-200),sum(1000,-300),sum(500,128),sum(150,128),sum(500,-300)),
  10. number2a=c(-120,100,200,-900,-155,-140,-155),
  11. number2b=c(180,180,-155,22000,175,175,22000),
  12. sumnumber2=c(sum(-120,180),sum(100,180),sum(200,-155),sum(-900,22000),sum(-155,175),sum(-140,175),sum(-155,22000)),
  13. stringsAsFactors = F
  14. )

对于数千行的真实数据,使用循环可能不太实际。你可以考虑使用 data.table、dplyr/tidyverse 或矩阵函数来提高效率。希望这有助于你解决问题!

英文:

I'd like to return all combinations of the comboindex (column 1) where the sum of number1 in rowA and number1 in rowB &gt;0 AND the sum of number2 in rowA and number2 in rowB &gt;0, and indexa,indexb, and indexc are used to index both rows (indexa, indexb, and indexc are common to both rows).

For example:

  1. #create example data
  2. example_df &lt;- data.frame(
  3. comboindex=c(LETTERS[1:4],LETTERS[1:6]),
  4. indexa=c(rep(&quot;A&quot;,4),rep(&quot;CCC&quot;,6)),
  5. indexb=c(rep(&quot;B&quot;,4),rep(&quot;DDD&quot;,6)),
  6. indexc=c(rep(&quot;C&quot;,4),rep(&quot;EEE&quot;,6)),
  7. number1=c(290, 340, -200, -108, 150, -190, 500, 1000, -300, 128),
  8. number2=c(-120, 100, -400, 180, -140, 200, -155, -900, 22000, 175),
  9. stringsAsFactors = F
  10. )

Desired data (manually created) which would be the results as desired given the above example_df:

  1. # adding the sums as calculations (so you can see the numbers that add to + in my example)
  2. desired_df &lt;- data.frame(
  3. indexcombo=c(&quot;A_D&quot;,&quot;B_D&quot;,&quot;B_C&quot;,&quot;D_E&quot;,&quot;C_F&quot;,&quot;A_F&quot;,&quot;C_E&quot;),
  4. indexa=c(&quot;A&quot;,&quot;A&quot;,&quot;CCC&quot;,&quot;CCC&quot;,&quot;CCC&quot;,&quot;CCC&quot;,&quot;CCC&quot;),
  5. indexb=c(&quot;B&quot;,&quot;B&quot;,&quot;DDD&quot;,&quot;DDD&quot;,&quot;DDD&quot;,&quot;DDD&quot;,&quot;DDD&quot;),
  6. indexc=c(&quot;C&quot;,&quot;C&quot;,&quot;EEE&quot;,&quot;EEE&quot;,&quot;EEE&quot;,&quot;EEE&quot;,&quot;EEE&quot;),
  7. number1a=c(290,340,340,1000,-300,150,500),
  8. number1b=c(-108,-108,-200,1000,128,128,-300),
  9. sumnumber1=c(sum(290,-108),sum(340,-108),sum(340,-200),sum(1000,-300),sum(500,128),sum(150,128),sum(500,-300)),
  10. number2a=c(-120,100,200,-900,-155,-140,-155),
  11. number2b=c(180,180,-155,22000,175,175,22000),
  12. sumnumber2=c(sum(-120,180),sum(100,180),sum(200,-155),sum(-900,22000),sum(-155,175),sum(-140,175),sum(-155,22000)),
  13. stringsAsFactors = F
  14. )

My real data are thousands of lines long so a loop isnt practical. Can anyone help me with an efficient data.table, dplyr/tidyverse, or matrix function? Thanks!

答案1

得分: 2

  1. library(dplyr)
  2. df %>%
  3. left_join(df, by = join_by(indexa, indexb, indexc, comboindex < comboindex)) %>%
  4. mutate(indexcombo = paste(comboindex.x, comboindex.y, sep = "_"), .before = 0) %>%
  5. mutate(sumnumber1 = number1.x + number1.y,
  6. sumnumber2 = number2.x + number2.y) %>%
  7. filter(sumnumber1 > 0, sumnumber2 > 0)
英文:
  1. library(dplyr)
  2. df %&gt;%
  3. left_join(df, by = join_by(indexa, indexb, indexc, comboindex &lt; comboindex)) %&gt;%
  4. mutate(indexcombo = paste(comboindex.x, comboindex.y, sep = &quot;_&quot;), .before = 0) %&gt;%
  5. mutate(sumnumber1 = number1.x + number1.y,
  6. sumnumber2 = number2.x + number2.y) %&gt;%
  7. filter(sumnumber1 &gt; 0, sumnumber2 &gt; 0)

Result

  1. indexcombo comboindex.x indexa indexb indexc number1.x number2.x comboindex.y number1.y number2.y sumnumber1 sumnumber2
  2. 1 A_D A A B C 290 -120 D -108 180 182 60
  3. 2 B_D B A B C 340 100 D -108 180 232 280
  4. 3 A_F A CCC DDD EEE 150 -140 F 128 175 278 35
  5. 4 B_C B CCC DDD EEE -190 200 C 500 -155 310 45
  6. 5 C_E C CCC DDD EEE 500 -155 E -300 22000 200 21845
  7. 6 C_F C CCC DDD EEE 500 -155 F 128 175 628 20
  8. 7 D_E D CCC DDD EEE 1000 -900 E -300 22000 700 21100

答案2

得分: 1

  1. 试图使用 *data.table* 的方法:
  2. library(data.table)
  3. setDT(example_df)
  4. example_df[, cmb := as.integer(as.factor(comboindex))]
  5. example_df[
  6. example_df,
  7. on=.(indexa, indexb, indexc, cmb < cmb),
  8. {
  9. s1 = x.number1 + i.number1
  10. s2 = x.number2 + i.number2
  11. data.table(x.comboindex, i.comboindex, s1, s2)[s1 > 0 & s2 > 0]
  12. },
  13. by=.EACHI,
  14. allow.cartesian=TRUE
  15. ]
  16. ## indexa indexb indexc cmb x.comboindex i.comboindex s1 s2
  17. ## <char> <char> <char> <int> <char> <char> <num> <num>
  18. ##1: A B C 4 A D 182 60
  19. ##2: A B C 4 B D 232 280
  20. ##3: CCC DDD EEE 3 B C 310 45
  21. ##4: CCC DDD EEE 5 C E 200 21845
  22. ##5: CCC DDD EEE 5 D E 700 21100
  23. ##6: CCC DDD EEE 6 A F 278 35
  24. ##7: CCC DDD EEE 6 C F 628 20
英文:

An attempt in data.table:

  1. library(data.table)
  2. setDT(example_df)
  3. example_df[, cmb := as.integer(as.factor(comboindex))]
  4. example_df[
  5. example_df,
  6. on=.(indexa, indexb, indexc, cmb &lt; cmb),
  7. {
  8. s1 = x.number1 + i.number1
  9. s2 = x.number2 + i.number2
  10. data.table(x.comboindex, i.comboindex, s1, s2)[s1 &gt; 0 &amp; s2 &gt; 0]
  11. },
  12. by=.EACHI,
  13. allow.cartesian=TRUE
  14. ]
  15. ## indexa indexb indexc cmb x.comboindex i.comboindex s1 s2
  16. ## &lt;char&gt; &lt;char&gt; &lt;char&gt; &lt;int&gt; &lt;char&gt; &lt;char&gt; &lt;num&gt; &lt;num&gt;
  17. ##1: A B C 4 A D 182 60
  18. ##2: A B C 4 B D 232 280
  19. ##3: CCC DDD EEE 3 B C 310 45
  20. ##4: CCC DDD EEE 5 C E 200 21845
  21. ##5: CCC DDD EEE 5 D E 700 21100
  22. ##6: CCC DDD EEE 6 A F 278 35
  23. ##7: CCC DDD EEE 6 C F 628 20

huangapple
  • 本文由 发表于 2023年5月11日 05:14:58
  • 转载请务必保留本文链接:https://go.coder-hub.com/76222601.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定