Return all combinations of rows where the sum of two number columns each sum to >0 combined and have common index columns [R]

huangapple go评论60阅读模式
英文:

Return all combinations of rows where the sum of two number columns each sum to >0 combined and have common index columns [R]

问题

Sure, here's the translated portion of your text:

我想要返回 comboindex(第一列)的所有组合,其中 rowA 的 number1 和 rowB 的 number1 之和 >0,并且 rowA 的 number2 和 rowB 的 number2 之和 >0,并且 indexaindexbindexc 用于索引两行(indexaindexbindexc 在两行中都相同)。

例如:

# 创建示例数据
example_df <- data.frame(
  comboindex=c(LETTERS[1:4],LETTERS[1:6]),
  indexa=c(rep("A",4),rep("CCC",6)),
  indexb=c(rep("B",4),rep("DDD",6)),
  indexc=c(rep("C",4),rep("EEE",6)),
  number1=c(290, 340, -200, -108, 150, -190, 500, 1000, -300, 128),
  number2=c(-120, 100, -400, 180, -140, 200, -155, -900, 22000, 175),
  stringsAsFactors = F
)

期望的数据(手动创建),这将是给定上述 example_df 的期望结果:

# 添加计算和的计算(这样你可以看到在我的示例中加到+的数字)
desired_df <- data.frame(
  indexcombo=c("A_D","B_D","B_C","D_E","C_F","A_F","C_E"),
  indexa=c("A","A","CCC","CCC","CCC","CCC","CCC"),
  indexb=c("B","B","DDD","DDD","DDD","DDD","DDD"),
  indexc=c("C","C","EEE","EEE","EEE","EEE","EEE"),
  number1a=c(290,340,340,1000,-300,150,500),
  number1b=c(-108,-108,-200,1000,128,128,-300),
  sumnumber1=c(sum(290,-108),sum(340,-108),sum(340,-200),sum(1000,-300),sum(500,128),sum(150,128),sum(500,-300)),
  number2a=c(-120,100,200,-900,-155,-140,-155),
  number2b=c(180,180,-155,22000,175,175,22000),
  sumnumber2=c(sum(-120,180),sum(100,180),sum(200,-155),sum(-900,22000),sum(-155,175),sum(-140,175),sum(-155,22000)),
  stringsAsFactors = F
)

对于数千行的真实数据,使用循环可能不太实际。你可以考虑使用 data.table、dplyr/tidyverse 或矩阵函数来提高效率。希望这有助于你解决问题!

英文:

I'd like to return all combinations of the comboindex (column 1) where the sum of number1 in rowA and number1 in rowB &gt;0 AND the sum of number2 in rowA and number2 in rowB &gt;0, and indexa,indexb, and indexc are used to index both rows (indexa, indexb, and indexc are common to both rows).

For example:

#create example data

example_df &lt;- data.frame(
  comboindex=c(LETTERS[1:4],LETTERS[1:6]),
  indexa=c(rep(&quot;A&quot;,4),rep(&quot;CCC&quot;,6)),
  indexb=c(rep(&quot;B&quot;,4),rep(&quot;DDD&quot;,6)),
  indexc=c(rep(&quot;C&quot;,4),rep(&quot;EEE&quot;,6)),
  number1=c(290, 340, -200, -108, 150, -190, 500, 1000, -300, 128),
  number2=c(-120, 100, -400, 180, -140, 200, -155, -900, 22000, 175),
  stringsAsFactors = F
)

Desired data (manually created) which would be the results as desired given the above example_df:

# adding the sums as calculations (so you can see the numbers that add to + in my example)

desired_df &lt;- data.frame(
  indexcombo=c(&quot;A_D&quot;,&quot;B_D&quot;,&quot;B_C&quot;,&quot;D_E&quot;,&quot;C_F&quot;,&quot;A_F&quot;,&quot;C_E&quot;),
  indexa=c(&quot;A&quot;,&quot;A&quot;,&quot;CCC&quot;,&quot;CCC&quot;,&quot;CCC&quot;,&quot;CCC&quot;,&quot;CCC&quot;),
  indexb=c(&quot;B&quot;,&quot;B&quot;,&quot;DDD&quot;,&quot;DDD&quot;,&quot;DDD&quot;,&quot;DDD&quot;,&quot;DDD&quot;),
  indexc=c(&quot;C&quot;,&quot;C&quot;,&quot;EEE&quot;,&quot;EEE&quot;,&quot;EEE&quot;,&quot;EEE&quot;,&quot;EEE&quot;),
  number1a=c(290,340,340,1000,-300,150,500),
  number1b=c(-108,-108,-200,1000,128,128,-300),
  sumnumber1=c(sum(290,-108),sum(340,-108),sum(340,-200),sum(1000,-300),sum(500,128),sum(150,128),sum(500,-300)),
  number2a=c(-120,100,200,-900,-155,-140,-155),
  number2b=c(180,180,-155,22000,175,175,22000),
  sumnumber2=c(sum(-120,180),sum(100,180),sum(200,-155),sum(-900,22000),sum(-155,175),sum(-140,175),sum(-155,22000)),
  stringsAsFactors = F
)

My real data are thousands of lines long so a loop isnt practical. Can anyone help me with an efficient data.table, dplyr/tidyverse, or matrix function? Thanks!

答案1

得分: 2

library(dplyr)
df %>%
  left_join(df, by = join_by(indexa, indexb, indexc, comboindex < comboindex)) %>%
  mutate(indexcombo = paste(comboindex.x, comboindex.y, sep = "_"), .before = 0) %>%
  mutate(sumnumber1 = number1.x + number1.y,
         sumnumber2 = number2.x + number2.y) %>%
  filter(sumnumber1 > 0, sumnumber2 > 0)
英文:
library(dplyr)
df %&gt;%
  left_join(df, by = join_by(indexa, indexb, indexc, comboindex &lt; comboindex)) %&gt;%
  mutate(indexcombo = paste(comboindex.x, comboindex.y, sep = &quot;_&quot;), .before = 0) %&gt;%
  mutate(sumnumber1 = number1.x + number1.y,
         sumnumber2 = number2.x + number2.y) %&gt;%
  filter(sumnumber1 &gt; 0, sumnumber2 &gt; 0)

Result

  indexcombo comboindex.x indexa indexb indexc number1.x number2.x comboindex.y number1.y number2.y sumnumber1 sumnumber2
1        A_D            A      A      B      C       290      -120            D      -108       180        182         60
2        B_D            B      A      B      C       340       100            D      -108       180        232        280
3        A_F            A    CCC    DDD    EEE       150      -140            F       128       175        278         35
4        B_C            B    CCC    DDD    EEE      -190       200            C       500      -155        310         45
5        C_E            C    CCC    DDD    EEE       500      -155            E      -300     22000        200      21845
6        C_F            C    CCC    DDD    EEE       500      -155            F       128       175        628         20
7        D_E            D    CCC    DDD    EEE      1000      -900            E      -300     22000        700      21100

答案2

得分: 1

试图使用 *data.table* 的方法:

    library(data.table)
    setDT(example_df)
    
    example_df[, cmb := as.integer(as.factor(comboindex))]
    example_df[
        example_df,
        on=.(indexa, indexb, indexc, cmb < cmb),
        {
            s1 = x.number1 + i.number1
            s2 = x.number2 + i.number2
            data.table(x.comboindex, i.comboindex, s1, s2)[s1 > 0 & s2 > 0]
        },
        by=.EACHI,
        allow.cartesian=TRUE
    ]
    
    ##   indexa indexb indexc   cmb x.comboindex i.comboindex    s1    s2
    ##   <char> <char> <char> <int>       <char>       <char> <num> <num>
    ##1:      A      B      C     4            A            D   182    60
    ##2:      A      B      C     4            B            D   232   280
    ##3:    CCC    DDD    EEE     3            B            C   310    45
    ##4:    CCC    DDD    EEE     5            C            E   200 21845
    ##5:    CCC    DDD    EEE     5            D            E   700 21100
    ##6:    CCC    DDD    EEE     6            A            F   278    35
    ##7:    CCC    DDD    EEE     6            C            F   628    20
英文:

An attempt in data.table:

library(data.table)
setDT(example_df)

example_df[, cmb := as.integer(as.factor(comboindex))]
example_df[
    example_df,
    on=.(indexa, indexb, indexc, cmb &lt; cmb),
    {
        s1 = x.number1 + i.number1
        s2 = x.number2 + i.number2
        data.table(x.comboindex, i.comboindex, s1, s2)[s1 &gt; 0 &amp; s2 &gt; 0]
    },
    by=.EACHI,
    allow.cartesian=TRUE
]

##   indexa indexb indexc   cmb x.comboindex i.comboindex    s1    s2
##   &lt;char&gt; &lt;char&gt; &lt;char&gt; &lt;int&gt;       &lt;char&gt;       &lt;char&gt; &lt;num&gt; &lt;num&gt;
##1:      A      B      C     4            A            D   182    60
##2:      A      B      C     4            B            D   232   280
##3:    CCC    DDD    EEE     3            B            C   310    45
##4:    CCC    DDD    EEE     5            C            E   200 21845
##5:    CCC    DDD    EEE     5            D            E   700 21100
##6:    CCC    DDD    EEE     6            A            F   278    35
##7:    CCC    DDD    EEE     6            C            F   628    20

huangapple
  • 本文由 发表于 2023年5月11日 05:14:58
  • 转载请务必保留本文链接:https://go.coder-hub.com/76222601.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定