May 11, 2023 05:14:58

Return all combinations of rows where the sum of two number columns each sum to >0 combined and have common index columns [R]

Question

I'd like to return all combinations of comboindex (column 1) where the sum of number1 in rowA and number1 in rowB is > 0 AND the sum of number2 in rowA and number2 in rowB is > 0, and where both rows share the same indexa, indexb, and indexc values.

For example:

# create example data
example_df <- data.frame(
  comboindex=c(LETTERS[1:4],LETTERS[1:6]),
  indexa=c(rep("A",4),rep("CCC",6)),
  indexb=c(rep("B",4),rep("DDD",6)),
  indexc=c(rep("C",4),rep("EEE",6)),
  number1=c(290, 340, -200, -108, 150, -190, 500, 1000, -300, 128),
  number2=c(-120, 100, -400, 180, -140, 200, -155, -900, 22000, 175),
  stringsAsFactors = FALSE
)

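Desired data (manually created), i.e. the result I want given the example_df above:

# the sums are written as calculations so you can see which numbers add up to a positive value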
desired_df <- data.frame(
  indexcombo=c("A_D","B_D","B_C","D_E","C_F","A_F","C_E"),
  indexa=c("A","A","CCC","CCC","CCC","CCC","CCC"),
  indexb=c("B","B","DDD","DDD","DDD","DDD","DDD"),
  indexc=c("C","C","EEE","EEE","EEE","EEE","EEE"),
  number1a=c(290,340,-190,1000,500,150,500),
  number1b=c(-108,-108,500,-300,128,128,-300),
  sumnumber1=c(sum(290,-108),sum(340,-108),sum(-190,500),sum(1000,-300),sum(500,128),sum(150,128),sum(500,-300)),
  number2a=c(-120,100,200,-900,-155,-140,-155),
  number2b=c(180,180,-155,22000,175,175,22000),
  sumnumber2=c(sum(-120,180),sum(100,180),sum(200,-155),sum(-900,22000),sum(-155,175),sum(-140,175),sum(-155,22000)),
  stringsAsFactors = FALSE
)

My real data are thousands of rows long, so a loop isn't practical. Can anyone help me with an efficient data.table, dplyr/tidyverse, or matrix function? Thanks!

Answer 1

Score: 2

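library(dplyr)  # join_by() with inequality conditions needs dplyr >= 1.1.0

example_df %>%
  # self-join within each index group; `<` keeps each unordered pair once
  left_join(example_df, by = join_by(indexa, indexb, indexc, comboindex < comboindex)) %>%
  mutate(indexcombo = paste(comboindex.x, comboindex.y, sep = "_"), .before = 0) %>%
  mutate(sumnumber1 = number1.x + number1.y,
         sumnumber2 = number2.x + number2.y) %>%
  # rows without a partner have NA sums and are dropped here as well
  filter(sumnumber1 > 0, sumnumber2 > 0)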

Result

  indexcombo comboindex.x indexa indexb indexc number1.x number2.x comboindex.y number1.y number2.y sumnumber1 sumnumber2
1        A_D            A      A      B      C       290      -120            D      -108       180        182         60
2        B_D            B      A      B      C       340       100            D      -108       180        232        280
3        A_F            A    CCC    DDD    EEE       150      -140            F       128       175        278         35
4        B_C            B    CCC    DDD    EEE      -190       200            C       500      -155        310         45
5        C_E            C    CCC    DDD    EEE       500      -155            E      -300     22000        200      21845
6        C_F            C    CCC    DDD    EEE       500      -155            F       128       175        628         20
7        D_E            D    CCC    DDD    EEE      1000      -900            E      -300     22000        700      21100
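
For readers without a recent dplyr, the same idea can be sketched in base R: join the table to itself on the three index columns, keep each unordered pair once, then filter on the two sums. This is only an illustrative sketch, not part of the original answer; the names `pairs` and `result` are made up here, and `merge()` builds every within-group pair before filtering, so it may use more memory than the non-equi join above.

```r
# Example data from the question
example_df <- data.frame(
  comboindex = c(LETTERS[1:4], LETTERS[1:6]),
  indexa = c(rep("A", 4), rep("CCC", 6)),
  indexb = c(rep("B", 4), rep("DDD", 6)),
  indexc = c(rep("C", 4), rep("EEE", 6)),
  number1 = c(290, 340, -200, -108, 150, -190, 500, 1000, -300, 128),
  number2 = c(-120, 100, -400, 180, -140, 200, -155, -900, 22000, 175),
  stringsAsFactors = FALSE
)

# Self-join on the index columns: each row is paired with every row of its group
pairs <- merge(example_df, example_df,
               by = c("indexa", "indexb", "indexc"),
               suffixes = c(".x", ".y"))

# Keep each unordered pair once (and drop a row paired with itself)
pairs <- pairs[pairs$comboindex.x < pairs$comboindex.y, ]

# Pairwise sums and a label for the pair
pairs$sumnumber1 <- pairs$number1.x + pairs$number1.y
pairs$sumnumber2 <- pairs$number2.x + pairs$number2.y
pairs$indexcombo <- paste(pairs$comboindex.x, pairs$comboindex.y, sep = "_")

# Keep pairs where both sums are positive, in a stable order
result <- pairs[pairs$sumnumber1 > 0 & pairs$sumnumber2 > 0, ]
result <- result[order(result$indexa, result$indexcombo), ]
print(result[, c("indexcombo", "indexa", "indexb", "indexc",
                 "sumnumber1", "sumnumber2")])
```

On the example data this should give the same seven pairs as the dplyr result above.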

Answer 2

Score: 1

An attempt in *data.table*:

    library(data.table)
    setDT(example_df)
    
    example_df[, cmb := as.integer(as.factor(comboindex))]
    example_df[
        example_df,
        on=.(indexa, indexb, indexc, cmb < cmb),
        {
            s1 = x.number1 + i.number1
            s2 = x.number2 + i.number2
            data.table(x.comboindex, i.comboindex, s1, s2)[s1 > 0 & s2 > 0]
        },
        by=.EACHI,
        allow.cartesian=TRUE
    ]
    
    ##   indexa indexb indexc   cmb x.comboindex i.comboindex    s1    s2
    ##   <char> <char> <char> <int>       <char>       <char> <num> <num>
    ##1:      A      B      C     4            A            D   182    60
    ##2:      A      B      C     4            B            D   232   280
    ##3:    CCC    DDD    EEE     3            B            C   310    45
    ##4:    CCC    DDD    EEE     5            C            E   200 21845
    ##5:    CCC    DDD    EEE     5            D            E   700 21100
    ##6:    CCC    DDD    EEE     6            A            F   278    35
    ##7:    CCC    DDD    EEE     6            C            F   628    20
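
Both answers pair rows within each (indexa, indexb, indexc) group, so the work grows roughly with the square of the group size rather than with the total row count. As a rough, hedged way to gauge this before running a join on thousands of rows, the sketch below counts the candidate pairs per group in base R. The names `group_sizes`, `n_rows` and `candidate_pairs` are invented for this illustration.

```r
# Example data from the question
example_df <- data.frame(
  comboindex = c(LETTERS[1:4], LETTERS[1:6]),
  indexa = c(rep("A", 4), rep("CCC", 6)),
  indexb = c(rep("B", 4), rep("DDD", 6)),
  indexc = c(rep("C", 4), rep("EEE", 6)),
  number1 = c(290, 340, -200, -108, 150, -190, 500, 1000, -300, 128),
  number2 = c(-120, 100, -400, 180, -140, 200, -155, -900, 22000, 175),
  stringsAsFactors = FALSE
)

# Count the rows in each index group
group_sizes <- aggregate(comboindex ~ indexa + indexb + indexc,
                         data = example_df, FUN = length)
names(group_sizes)[names(group_sizes) == "comboindex"] <- "n_rows"

# Each group of n rows yields choose(n, 2) unordered pairs to test
group_sizes$candidate_pairs <- choose(group_sizes$n_rows, 2)

print(group_sizes)
print(sum(group_sizes$candidate_pairs))
```

If a few groups hold most of the rows, the candidate-pair total will be dominated by those groups, which is where memory use of a self-join is most likely to become a concern.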
