组合折叠在一个序列上。

huangapple go评论66阅读模式
英文:

combining folds over a sequence

问题

我有一个惰性序列,我想在它上面运行多个for循环(例如,对一组数字进行循环,计算所有复合数的数量,并获得所有素数的总和)。

  • 我宁愿不将代码合并成一个循环,因为这会使单独更改其中一个循环变得更加困难,而有可能破坏其他部分。
  • 我宁愿不按顺序运行它们,因为生成列表的成本高昂,而且无法一次全部存储在内存中。

我记得从我的 Haskell 时代,for 循环等同于折叠操作,所以我想可能有一种方法将我的多个 for 循环转换成折叠操作,然后将它们批量处理成单个折叠,可以在单次迭代中运行,产生结果的元组。

我不知道这是否已经有一个名称,所以不知道在哪里寻找更多阅读材料或现有库。

英文:

I've got a lazy sequence, and I want to run several for-loops over it (e.g. over a list of numbers, count all the composite numbers, and get the sum of all the primes).

  • I would rather not combine the code into a single loop, because it makes it harder to change one in isolation without risking breaking the other.
  • I would rather not run them in sequence, because the list is expensive to produce and is too big to be stored in memory all at once.

I think I remember from my Haskell days that for-loops are equivalent to folds, so I figured there may be a way to convert my several for-loops into folds, and then batch them into a single fold that can be run over the data in a single iteration, yielding a tuple of the results.

I don't know if this already has a name, so I don't know where to look for further reading or existing libraries.

答案1

得分: 1

我不知道它是否有名字,除了*(左)折叠*之外。由于你提到了Haskell,它定义了一个左折叠如下:

ghci> :t foldl
foldl :: Foldable t => (b -> a -> b) -> b -> t a -> b

其中函数(b -> a -> b)是'折叠'函数,b是初始值。

所以,假设你想计算合数的数量并求素数的和,就像OP给出的示例一样,你可以定义两个独立变化的折叠函数:

countComposite :: (Integral a, Num b) => b -> a -> b
countComposite acc x = if isComposite x then acc + 1 else acc

sumPrimes :: Integral a => a -> a -> a
sumPrimes acc x = if isPrime x then acc + x else acc

这里假设存在适当的isPrimeisComposite函数。

现在你可以轻松将它们组合成一个累积值的函数:

ghci> foldl (\(a1, a2) x -> (countComposite a1 x, sumPrimes a2 x)) (0, 0) [1..30]
(20,129)

这个表达式用元组(0, 0)初始化折叠。'折叠'函数现在是一个lambda表达式,它将元组的值模式匹配为两个累加器值a1a2,然后通过调用每个单独的'折叠'函数来产生一个新的元组,每个元组中都有它们的累加器和x

可能的一般化

如果你想进行一般化,我们可以注意到sumPrimes显然是一个二进制操作(一个接受两个相同类型的值并产生该类型单个值的函数)。

如果a ~ bcountComposite也可以是一个二进制操作。

因此,值得探讨是否它们都是幺半群。我明白这些特定函数只是占位符函数,所以我不想详细研究这两个特定函数是否会导致幺半群。

然而,如果你的所有单独的累加器函数都是幺半群,那么它们可以组合,元组(或积类型)是它们的组合方式。

再次强调,幺半群可以累积。在Haskell中,这是一个通用折叠的特化,称为fold。(然而,那个特定的实现基于右折叠,所以我不确定它是否适用于OP的情况。即使经过7-8年的Haskell学习,有时候我仍然很难理解它的惰性在实践中是如何工作的。无论如何,从左边积累幺半群的想法仍然适用,如链接的fold文档也解释了。)

如果你想要更多面向对象的观点,幺半群是组合体,因此你还可以将累加器函数定义为多态对象,然后将它们组合成组合体累加器。

英文:

I don't know if it has a name other than (left) fold. Since you mentioned Haskell, it defines a left fold like this:

ghci> :t foldl
foldl :: Foldable t => (b -> a -> b) -> b -> t a -> b

where the function (b -> a -> b) is the 'folder' function and b the initial value.

So, assuming that you want to count the composites and sum the primes, as the OP gives as an example, you might define two separate folder functions that you can vary independently of each other:

countComposite :: (Integral a, Num b) => b -> a -> b
countComposite acc x = if isComposite x then acc + 1 else acc

sumPrimes :: Integral a => a -> a -> a
sumPrimes acc x = if isPrime x then acc + x else acc

Here assuming the existence of suitable isPrime and isComposite functions.

You can now easily combine them into a function that accumulates a tuple of values:

ghci> foldl (\(a1, a2) x -> (countComposite a1 x, sumPrimes a2 x)) (0, 0) [1..30]
(20,129)

This expression initializes the fold with the tuple (0, 0). The 'folder' function is now the lambda expression that pattern-matches the tuple values into two accumulator values a1 and a2 and then produces a new tuple by calling each of the individual 'folder' functions with each their accumulator and x.

Possible generalization

If you want to generalize, we may notice that sumPrimes is clearly a binary operation (a function taking exactly two values of the same type and producing a single value of that type).

countComposite could also be a binary operation, if a ~ b.

Thus, it might be relevant to investigate if both are monoids. I do realize that these particular functions are only placeholder functions, so I don't want to go into a detailed examination of whether or not these two particular functions give rise to monoids.

If, however, all your individual accumulator functions are monoidal, then they do compose, and tuples (or product types) are the way they compose.

And again, monoids accumulate. In Haskell, this is a specialization of a general fold simply called fold. (That particular implementation, however, is based on a right fold, so I'm not sure that it'll work in the OP scenario. Even after 7-8 years with Haskell, I sometimes have a hard time wrapping my head around the way that its laziness works out in practice. In any case, the idea of accumulating a monoid from the left can still be applied, as the linked fold documentation also explains.)

If you want a more object-oriented view, monoids are Composites, so you also define accumulator functions as a polymorphic object and then combine them into a Composite accumulator.

答案2

得分: 0

Rust的宏似乎足够强大,可以将for循环的源代码转换为表示fold的某种数据结构。

然后,给定一些fold的表示,也许类似于这样:

trait Fold {
  type Input;
  type Output;

  fn run<I: Iterator<Self::Input>>(self, input: I) -> Self::Output {
    for x in input {
      self.step(x);
    }
    self.complete()
  }

  fn step(&mut self, x: Self::Input);
  fn complete(self) -> Self::Output;
}

我认为你可以组合任意的fold以执行它们都在单次遍历中,类似于这样:

fn batch
  <T: Copy, A: Fold<Input = T>, B: Fold<Input = T>>
  (a: A, b: B)
  -> impl Fold<Input = T, Output = (A::Output, B::Output)>
{
  Batch{a,b}
}

struct Batch<A, B>{a: A, b: B}

impl<T, A, B> Fold for Batch<A, B> 
  where
    T: Copy,
    A: Fold<Input = T>,
    B: Fold<Input = T>,
{
  type Input = T;
  type Output = (A::Output, B::Output);

  fn step(&mut self, x: Self::Input) {
    self.a.step(x);
    self.b.step(x);
  }
  fn complete(self) -> Self::Output {
    (
      self.a.complete(),
      self.b.complete()
    )
  }
}
英文:

Rust's macros seem powerful enough that they may be able to convert the source code for a for-loop into some data structure representing a fold.

Then given some representation of a fold, maybe something like this:

trait Fold {
  type Input;
  type Output;

  fn run&lt;I: Iterator&lt;Self::Input&gt;&gt;(self, input: I) -&gt; Self::Output {
    for x in input {
      self.step(x);
    }
    self.complete()
  }

  fn step(&amp;mut self, x: Self::Input);
  fn complete(self) -&gt; Self::Output;
}

I figure you could combine arbitrary folds in order to perform them all in a single pass, something like this:

fn batch
  &lt;T:Copy, A:Fold&lt;Input = T&gt;, B:Fold&lt;Input = T&gt;&gt;
  (a: A, b: B)
  -&gt; impl Fold&lt;Input = T, Output = (A::Output, B::Output)&gt;
{
  Batch{a,b}
}

struct Batch&lt;A, B&gt;{a: A, b: B}

impl&lt;T, A, B&gt; Fold for Batch&lt;A, B&gt; 
  where
    T: Copy,
    A: Fold&lt;Input = T&gt;,
    B: Fold&lt;Input = T&gt;,
{
  type Input = T;
  type Output = (A::Output, B::Output);

  fn step(&amp;mut self, x: Self::Input) {
    self.a.step();
    self.b.step();
  }
  fn complete(self) -&gt; Self::Output {
    (
      self.a.complete(),
      self.b.complete()
    )
  }
}

huangapple
  • 本文由 发表于 2023年4月4日 09:38:53
  • 转载请务必保留本文链接:https://go.coder-hub.com/75924878.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定