Excel - Dynamic SUM formula [0,1]
Question
I am trying to build a dynamic Excel formula that sums the values in one column over each run where the other column holds a 1 followed by 0s. When a new 1 appears after the last 0, the sum starts again and runs until the next 1.
As you can see in the picture, the first value (in F6) must be 323 (202+47+74). Then a 1 appears with no 0 below it, so the value stays as it is (1997). Then comes another 1 followed by two 0s, which gives 9 (4+3+2), and so on.
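To pin down the expected result before looking at the formulas, here is a minimal reference sketch in Python (not Excel). The function name group_sums and the two sample lists are illustrative assumptions; the numbers are the ones quoted in the question and in the table of a later answer.

```python
def group_sums(flags, values):
    """For each row whose flag is 1, sum the values from that row
    down to the row just before the next 1; other rows stay blank."""
    out = [""] * len(flags)
    start = None  # index of the most recent row whose flag is 1
    for i, flag in enumerate(flags):
        if flag == 1:
            start = i
            out[start] = 0
        if start is not None:
            out[start] += values[i]
    return out


# Sample data (assumed layout: one flag and one value per row)
flags = [1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0]
values = [202, 47, 74, 1997, 4, 3, 2, 2, 5, 1981, 3, 8]
result = group_sums(flags, values)
```

With this data the first three rows collapse into 323, the lone 1 keeps 1997, and the next group gives 9, as described above.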
Answer 1
Score: 5
Here is another alternative.
• Formula used in cell F2:
=LET(
a,B2:B13,
b,C2:C13,
c,SCAN(0,a,LAMBDA(x,y,x+y)),
d,UNIQUE(c),
IF(a=0,"",INDEX(MMULT(N(TOROW(c)=d),b),XMATCH(c,d))))
Notes: here is what each variable does:
• First, we take the two ranges, B2:B13 and C2:C13, and define them as a and b respectively.
• Next, we use the SCAN() function to return an array of running results. Starting from an initial value of 0, it walks through each value of a, and the LAMBDA() adds the current value y to the value x accumulated over the previous iterations. The result is assigned to c.
SCAN(0,a,LAMBDA(x,y,x+y))
• Now, we use the UNIQUE() function to remove any duplicates from c. The result is d, which the following variant returns on its own:
=LET(
a,B2:B13,
b,C2:C13,
c,SCAN(0,a,LAMBDA(x,y,x+y)),
d,UNIQUE(c),
d)
• Next, we perform a matrix multiplication using MMULT().
N(TOROW(c)=d)
The above compares each value in the array c with each value in the array d and returns a Boolean array with one row per value of d and one column per value of c, holding TRUE where the two are equal and FALSE otherwise; N() turns these into 1s and 0s. MMULT() then multiplies this matrix by b, which yields one row per value of d: each entry is the sum of the b values whose entry in c matches that value of d.
MMULT(N(TOROW(c)=d),b)
• Finally, we use IF() and INDEX() with XMATCH() to return the required output.
==> The XMATCH() function finds the position of each value of c within the array d, and INDEX() uses that position to return the corresponding value of the matrix multiplication above. Lastly, everything is wrapped in IF(), which checks each value in the array a: where it equals 0 the result is empty, otherwise it is the value returned by INDEX() and XMATCH().
Of course, you don't have to fill down, as the formula spills dynamically.
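The steps above can be mirrored in a short Python sketch, which may help when checking the logic outside Excel. It is only a model under stated assumptions: the function name answer1_model and the sample lists are hypothetical, and plain list operations stand in for SCAN, UNIQUE, MMULT, INDEX and XMATCH.

```python
from itertools import accumulate


def answer1_model(a, b):
    # c: running total of the flags (stands in for SCAN); rows of one group share a number
    c = list(accumulate(a))
    # d: distinct group numbers in first-seen order (stands in for UNIQUE)
    d = list(dict.fromkeys(c))
    # One sum per group (stands in for MMULT of the 0/1 match matrix with b)
    sums = [sum(bv for cv, bv in zip(c, b) if cv == dv) for dv in d]
    # Blank where the flag is 0, else the sum of that row's group (IF/INDEX/XMATCH)
    return ["" if av == 0 else sums[d.index(cv)] for av, cv in zip(a, c)]


# Sample data (assumed): flags as a, values as b
a = [1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0]
b = [202, 47, 74, 1997, 4, 3, 2, 2, 5, 1981, 3, 8]
spilled = answer1_model(a, b)
```

Because the group totals are computed once per distinct group rather than once per row, the intermediate matrix stays small when there are few groups.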
Answer 2
Score: 4
This could also be done without helpers/lambda using MMULT:
=LET(a, B3:B14,
b, C3:C14,
x, SEQUENCE(ROWS(a)+1),
y, DROP(MMULT(--(TOROW(x)<=x),--VSTACK(a,1)),-1),
IF(a,MMULT(--(TOROW(y)=y),b),""))
It first stores the ranges as a and b for easy reference in the formula.
Then x is a counter running from 1 to the number of rows of range a, plus 1 (*).
x is used in y: TOROW(x)<=x marks, for each row, every position at or above it, and multiplying that matrix by the values of a counts the 1s found in a up to and including that row (*).
(*) An extra 1 is appended to the range a in order to simulate a final 1, marking where the last group should stop being summed.
So y is a counter that starts at the first 1 found and increases by 1 each time a value in a equals 1 again.
Once this has been calculated, the helper row at the end is no longer needed, since the count now reaches down to where the sum should end. Therefore we drop the last row from y so that its size matches the range to sum (b).
Finally, it checks whether a is 1. If so, it sums the rows of b whose value in y equals the value of y in that row.
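As a cross-check of the description above, the two matrix products can be written out in plain Python. This is a sketch, not the Excel implementation: matmul and answer2_model are hypothetical helper names, and nested lists stand in for the arrays that MMULT operates on.

```python
def matmul(matrix, vector):
    """Multiply a list-of-rows matrix by a column vector given as a flat list."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def answer2_model(a, b):
    n = len(a)
    x = list(range(1, n + 2))                          # SEQUENCE(ROWS(a)+1)
    lower = [[int(j <= i) for j in x] for i in x]      # --(TOROW(x)<=x)
    y = matmul(lower, a + [1])[:-1]                    # running count of 1s, last row dropped
    same = [[int(yj == yi) for yj in y] for yi in y]   # --(TOROW(y)=y)
    totals = matmul(same, b)                           # group total repeated on every row
    return [t if flag else "" for flag, t in zip(a, totals)]


# Sample data (assumed)
a = [1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0]
b = [202, 47, 74, 1997, 4, 3, 2, 2, 5, 1981, 3, 8]
out = answer2_model(a, b)
```

Both matrices are square in the number of rows, which is consistent with this approach slowing down and eventually hitting limits on large inputs.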
Answer 3
Score: 3
Sum Up Groups
=LET(o,B3:B14,v,C3:C14,c,1,n,"",
r,ROWS(o),rs,SEQUENCE(r),iss,FILTER(rs,o=c,""),
ies,IF(ROWS(iss)=1,r,VSTACK(DROP(iss-1,1),r)),
MAP(rs,LAMBDA(mr,
IF(INDEX(o,mr)<>c,n,LET(
ri,XMATCH(mr,iss),is,INDEX(iss,ri,1),
ie,INDEX(ies,ri,1),sr,SEQUENCE(ie-is+1,,is),
SUM(INDEX(v,sr)))))))
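The answer gives no walkthrough, so the following Python sketch records one reading of the formula: iss holds the start row of each group, ies the matching end row, and each start row receives the sum of its slice of values. The name answer3_model and the 0-based indexing are assumptions made for the sketch.

```python
def answer3_model(o, v):
    r = len(o)
    # Start index of every group: rows where the flag equals 1 (iss, 0-based here)
    starts = [i for i in range(r) if o[i] == 1]
    # End index of every group: one before the next start, or the last row (ies)
    ends = [s - 1 for s in starts[1:]] + [r - 1]
    out = [""] * r
    for s, e in zip(starts, ends):
        out[s] = sum(v[s:e + 1])   # SUM(INDEX(v, SEQUENCE(ie-is+1,,is)))
    return out


# Sample data (assumed)
o = [1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0]
v = [202, 47, 74, 1997, 4, 3, 2, 2, 5, 1981, 3, 8]
groups = answer3_model(o, v)
```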
Answer 4
Score: 3
Here is another array-based approach, i.e. one that spills the entire result at once (formula 1):
=LET(A,A1:A12, B,B1:B12, n,ROWS(A), seq,SEQUENCE(n),
MAP(seq, LAMBDA(s,IF(INDEX(A,s)=1,LET(end, @FILTER(seq, (seq>s) * (A=1),n+1),
SUM(FILTER(B, (seq>=s) * (seq<end)))),""))))
Alternatively, you can use formula 2. The previous formula carries some calculation overhead, because we only need the first element of the first FILTER call, yet the entire FILTER output is computed. The following formula avoids that.
=LET(A,A1:A12, B,B1:B12, n,ROWS(A), seq,SEQUENCE(n), idx,IF(A=1,seq,0),
MAP(idx, LAMBDA(x, IF(x=0,"", SUM(FILTER(B, (seq>=x)
* (seq<XLOOKUP(x+1,idx,idx,n+1,1))))))))
Here is the output for formula 1:
Formula 1 iterates via MAP over all index positions of the input (seq). On each iteration (s) it uses INDEX to check whether the column A value at index s is equal to 1; if it is not, it returns an empty string. If it is, it uses the FILTER function to find the index position (end) of the next 1 value after the current position (s). Since we are only interested in the first value of the FILTER output (the index position of the next 1 value), we use the implicit intersection operator, @. If no position satisfies the condition, the third input argument of FILTER makes it return n+1, where n is the number of rows of the input data.
Since end represents the index position of the next 1 value, or the number of rows plus one when no further 1 value is found, we can now use FILTER again to select the column B values (B) at index positions s to end-1 and sum them.
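The description of formula 1 translates almost line by line into Python. This is a sketch under assumptions: formula1_model is a hypothetical name, list comprehensions stand in for FILTER, and taking the first element stands in for the @ operator.

```python
def formula1_model(A, B):
    n = len(A)
    seq = range(1, n + 1)              # SEQUENCE(n), 1-based positions
    out = []
    for s in seq:
        if A[s - 1] != 1:
            out.append("")             # not the start of a group
            continue
        # All later positions holding a 1; only the first one is needed
        later = [t for t in seq if t > s and A[t - 1] == 1]
        end = later[0] if later else n + 1
        # Sum the B values at positions s .. end-1
        out.append(sum(B[t - 1] for t in seq if s <= t < end))
    return out


# Sample data (assumed)
A = [1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0]
B = [202, 47, 74, 1997, 4, 3, 2, 2, 5, 1981, 3, 8]
res1 = formula1_model(A, B)
```

The list named later is built in full even though only its first element is used, which is the overhead the answer mentions.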
Formula 2 first identifies the index positions where A is equal to 1 (idx), with 0 everywhere else. By construction, the non-zero values of idx are in ascending order. To find the end of the interval (the position of the following 1 value in A), it uses XLOOKUP with match mode 1 (exact match or next larger item) to look up x+1, i.e. the first idx value after x. This returns the position of the next 1 value in A, or n+1 if there is none. The values of B to sum are therefore those at index positions seq from x up to, but not including, the output of XLOOKUP.
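A Python sketch of formula 2 makes the role of the lookup explicit. Here formula2_model is a hypothetical name, and min() over the candidates with a default stands in for XLOOKUP with match mode 1 and its if-not-found argument n+1.

```python
def formula2_model(A, B):
    n = len(A)
    seq = list(range(1, n + 1))
    # idx: the 1-based position where A is 1, otherwise 0
    idx = [s if a == 1 else 0 for s, a in zip(seq, A)]
    out = []
    for x in idx:
        if x == 0:
            out.append("")
            continue
        # Smallest idx value >= x+1, i.e. the position of the next 1, else n+1
        end = min((v for v in idx if v >= x + 1), default=n + 1)
        out.append(sum(b for s, b in zip(seq, B) if x <= s < end))
    return out


# Sample data (assumed)
A = [1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0]
B = [202, 47, 74, 1997, 4, 3, 2, 2, 5, 1981, 3, 8]
res2 = formula2_model(A, B)
```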
Performance Analysis
Here is a summary, based on different scenarios, of the performance of the answers provided to this question. I am considering the following scenarios:
- A: worst-case scenario, a 1 in the first row and zeros in all the rest
- B: random set for column A, with significantly more 0s than 1s
- C: random set for column A, with a uniform distribution of 1s and 0s
In all cases I am considering an input of 7000 rows.
To generate a non-uniform [0,1] distribution I use the following:
=LET(rnd, RANDARRAY(10000,1,1,10,1),IF(rnd=1,1,0))
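For anyone who wants to reproduce comparable input outside Excel, the same idea can be sketched in Python: draw a whole number from 1 to 10 and map only the value 1 to a flag of 1, so that roughly one row in ten starts a group. The function name random_flags and the seed parameter are assumptions added for repeatability.

```python
import random


def random_flags(rows, seed=None):
    """Return a list of 0/1 flags where 1 appears with probability about 1/10."""
    rng = random.Random(seed)
    # randint(1, 10) is inclusive on both ends, like RANDARRAY(..., 1, 10, 1)
    return [1 if rng.randint(1, 10) == 1 else 0 for _ in range(rows)]


sample = random_flags(10000, seed=0)
```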
I am considering the following solutions:
- Provided by @VBasic2008, with the correction indicated in the comment section of his answer.
- Provided by @MayukhBhattacharya. This solution uses MMULT, but it works for more than 7500 rows.
- Provided by @P.b (the initial approach, not the approach provided in the comment section, which doesn't give a correct result). It is worth noting that this solution stops working at around 7500 rows.
- Provided by @DavidLeal: formula 2, using XLOOKUP. Formula 1 was inefficient, so it was not considered in the analysis.
Here are the results for Excel Desktop:
Scenario | MayukhBhattacharya | VBasic2008 | DavidLeal | P.b |
---|---|---|---|---|
A | 10ms | 1,070ms | 10ms | 7,300ms |
B | 560ms | 1,310ms | 970ms | 7,710ms |
C | 2,640ms | 2,270ms | 4,780ms | 7,590ms |
I would say the solution provided by @MayukhBhattacharya is the best one across the scenarios tested, followed by the solution provided by @VBasic2008, which works pretty well but takes significant time in the worst-case scenario (A).
It caught my attention that running the same test under Excel for Web (free version) does not give the same results. Again, the MayukhBhattacharya and VBasic2008 solutions are the best, but this time the solution provided by VBasic2008 performs better:
Scenario | MayukhBhattacharya | VBasic2008 | DavidLeal | P.b |
---|---|---|---|---|
A | 0ms | 0ms | 10ms | 9,950ms |
B | 770ms | 40ms | 1,570ms | 10,080ms |
C | 3,640ms | 850ms | 9,930ms | 10,440ms |
I also tested the @JvdV solution, provided in the comment section of the P.b answer, which is ultimately a variation of the @MayukhBhattacharya approach. It is more efficient than the @P.b solution, but worse than the @MayukhBhattacharya solution.
It is worth mentioning that the @MayukhBhattacharya solution can be optimized: because the lookup_array argument of XMATCH is sorted in ascending order, we can use a binary search in XMATCH by setting the argument search_mode=2, which gives an improvement of around 9%. The same optimization applies to the @VBasic2008 solution, since it uses XMATCH with lookup_array in ascending order.
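To illustrate why a sorted lookup array permits a faster search, here is a small Python sketch that contrasts a linear scan with a binary search. It uses Python's bisect module as a stand-in and says nothing about how Excel implements XMATCH internally; linear_match and binary_match are hypothetical names.

```python
from bisect import bisect_left


def linear_match(sorted_values, target):
    """1-based position found by scanning from the start."""
    return sorted_values.index(target) + 1


def binary_match(sorted_values, target):
    """1-based position found by halving the search range; needs ascending order."""
    pos = bisect_left(sorted_values, target)
    if pos == len(sorted_values) or sorted_values[pos] != target:
        raise ValueError(f"{target} not found")
    return pos + 1


group_ids = list(range(1, 1001))       # ascending, like the distinct running totals
position = binary_match(group_ids, 750)
```

Both functions return the same position; the binary version inspects on the order of log2(n) entries instead of up to n.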
Here is the link to the Excel file used for the performance analysis. The summary of the results is on the first tab. Be aware that the file is configured with Formulas -> Calculation Options -> Manual, so that no calculation interferes with the timing of another. It uses volatile Excel functions (RANDARRAY, NOW), so manual mode also prevents automatic recalculation.
It would be interesting if others could verify the results.
Conclusion
Consider the idea of finding the start/end of each group, used by @VBasic2008 and by @DavidLeal: using INDEX performs better than FILTER. For example, the @DavidLeal solution modified to remove FILTER, as follows, performs similarly to the @VBasic2008 solution, but not better:
= LET(A,A1:A12, B,B1:B12, n,ROWS(A), seq,SEQUENCE(n), idx,
IF(A=1,seq,0), MAP(idx, LAMBDA(x, IF(x=0,"",
LET(xe, XLOOKUP(x+1,idx,idx,n+1,1)-1,SUM(INDEX(B, SEQUENCE(xe-x+1,,x))))))))
The @VBasic2008 solution has a simpler way of finding the start/end of the intervals than the previous formula, and therefore performs better.
Solutions using MMULT to identify each group work better when the calculation involves a reduced portion of the data, as in the @MayukhBhattacharya solution compared to the @P.b solution. This also helps avoid possible Excel limits.
Both approaches are good strategies, taking the previous considerations into account.
It is still an open question why some scenarios work better depending on the Excel platform (Desktop, Web) used. Probably the functions are not implemented internally in the same way, or the platforms use different versions of the same functions.
Answer 5
Score: 1
You can do this easily with a (hidden) extra column. Keep the running total in the hidden column, and display that running total only when the [0, 1] column is 1.
Please note, in future posts provide the data as a table we can copy, not as a screenshot.
Matching the rows in your data, in row 3 in the hidden column (F is the hidden column here):
=IF(B3=1, IF(B4 = 1, C3, F4+C3), IF(B4 = 1, C3, C3+F4))
Copy that down column F.
In row 3 in the display column (G here):
=IF(B3 = 1, F3, "")
Of course, copy that down column G.
| A | B | C | G |
---|---|---|---|---|
1 | | | | |
2 | | | | |
3 | | 1 | 202 | 323 |
4 | | 0 | 47 | |
5 | | 0 | 74 | |
6 | | 1 | 1997 | 1997 |
7 | | 1 | 4 | 9 |
8 | | 0 | 3 | |
9 | | 0 | 2 | |
10 | | 1 | 2 | 2 |
11 | | 1 | 5 | 5 |
12 | | 1 | 1981 | 1981 |
13 | | 1 | 3 | 11 |
14 | | 0 | 8 | |
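The hidden-column idea amounts to a running total built from the bottom row upwards, which restarts whenever the row below starts a new group. A Python sketch of that reading follows; the name answer5_model is hypothetical, and an empty cell below the data is assumed to behave like 0.

```python
def answer5_model(flags, values):
    n = len(flags)
    hidden = [0] * (n + 1)                 # extra slot models the empty cell below the data
    for i in reversed(range(n)):
        next_flag = flags[i + 1] if i + 1 < n else 0
        if next_flag == 1:
            hidden[i] = values[i]          # the row below starts a new group: restart
        else:
            hidden[i] = values[i] + hidden[i + 1]
    # Display column: show the running total only on rows whose flag is 1
    return [hidden[i] if flags[i] == 1 else "" for i in range(n)]


# Sample data: columns B and C of the table, rows 3 to 14
flags = [1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0]
values = [202, 47, 74, 1997, 4, 3, 2, 2, 5, 1981, 3, 8]
shown = answer5_model(flags, values)
```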
Answer 6
Score: 1
A simpler answer than many of the others suggested is as follows (inspired by @richardcook).
Use a helper column for the cases where there are multiple zeroes in column B:
The formula for the helper column is =IF(AND(C4=0,C5=0),D4+F5,D4)
and the formula for the final answer is =IFS(C4=0,,AND(C4=1,C5=1),D4,AND(C4=1,C5=0),D4+F5)
The block of IFs works similarly to RichardCook's answer: the formula checks whether the initial column is 1 or 0 and, if it is 1, checks whether zeroes follow. If they do, it uses the helper column to find the total.
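The two formulas can be modelled in Python to check them against sample data. This is a sketch under assumptions: answer6_model is a hypothetical name, flags play the role of column C, values of column D and helper of column F, an empty cell below the data is treated as 0, and the blank branch of IFS is shown as an empty string.

```python
def answer6_model(flags, values):
    n = len(flags)
    ext = flags + [0]                      # the cell below the last row compares equal to 0
    helper = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        # =IF(AND(C4=0,C5=0), D4+F5, D4)
        if ext[i] == 0 and ext[i + 1] == 0:
            helper[i] = values[i] + helper[i + 1]
        else:
            helper[i] = values[i]
    out = []
    for i in range(n):
        # =IFS(C4=0,, AND(C4=1,C5=1), D4, AND(C4=1,C5=0), D4+F5)
        if ext[i] == 0:
            out.append("")
        elif ext[i + 1] == 1:
            out.append(values[i])
        else:
            out.append(values[i] + helper[i + 1])
    return out


# Sample data (assumed)
flags = [1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0]
values = [202, 47, 74, 1997, 4, 3, 2, 2, 5, 1981, 3, 8]
final = answer6_model(flags, values)
```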