在强类型编程语言中对数据进行“sum-types”的映射函数。

huangapple go评论75阅读模式
英文:

Mapping functions over data in 'sum-types' in strongly typed programming languages

问题

今天我在F#中遇到了一个实现挑战,涉及一个大型DU(和类型),基本上我想能够在DU的值上映射一组函数,而不必在右侧重复匹配的情况。动机是,很容易意外返回错误的情况,而类型检查器显然无法捕获。

为了说明问题,这里是我为F#使用反射提出的一个候选解决方案 '模式':

open FSharp.Reflection
type MyData =
    | Foo of int
    | Bar of bool
    | Baz of string
    | Hello of int
    override this.ToString () =
        let (case, _ ) = FSharpValue.GetUnionFields(this, typeof<MyData>)
        case.Name
    static member map (f : int -> int) (g : bool -> bool) (h : string -> string) x : MyData =
        let f' = box << f
        let g' = box << g
        let h' = box << h
        let cases = FSharpType.GetUnionCases typeof<MyData>
        let matchCase = cases |> Array.find (fun case -> case.Name = x.ToString())
        let res =
            match x with
            | Foo v -> f' v
            | Bar v -> g' v
            | Baz v -> h' v
            | Hello v -> f' v
        FSharpValue.MakeUnion(matchCase, [| res |]) :?> MyData

let mapMyData : MyData -> MyData = MyData.map ((+) 1) not (fun s -> s.ToUpper())
let foo : MyData = Foo 42
let newFoo : MyData = mapMyData foo
let bar : MyData = Bar true
let newBar : MyData = mapMyData bar

现在我好奇有哪些其他解决方案适用于此问题,无论是在F#中还是在其他具有和类型的语言中。我很容易看到这在例如Haskell、Purescript、Idris、Rust中可能是一个普遍的问题。

问题可以归纳为:

  1. 在不重复右侧案例名称的情况下对和类型的值进行映射。
  2. 尽可能充分利用类型检查器以确保安全。
英文:

Today I encountered an implementation challenge with a large DU (sum-type) in F#, where I basically wanted to be able to map a set of functions over the values of the DU without having to repeat the matched case on the right hand side. The motivation being, that it would be quite easy to accidently return a wrong case, which the type checker obviously wouldn't be able to catch.
To illustrate the problem, here is one candidate solution 'pattern' I came up with for F# using reflection

open FSharp.Reflection
type MyData =
    | Foo of int
    | Bar of bool
    | Baz of string
    | Hello of int
    override this.ToString () =
        let (case, _ ) = FSharpValue.GetUnionFields(this, typeof&lt;MyData&gt;)
        case.Name
    static member map (f : int -&gt; int) (g : bool -&gt; bool) (h : string -&gt; string) x : MyData =
        let f&#39; = box &lt;&lt; f
        let g&#39; = box &lt;&lt; g
        let h&#39; = box &lt;&lt; h
        let cases = FSharpType.GetUnionCases typeof&lt;MyData&gt;
        let matchCase = cases |&gt; Array.find (fun case -&gt; case.Name = x.ToString())
        let res =
            match x with
            | Foo v -&gt; f&#39; v
            | Bar v -&gt; g&#39; v
            | Baz v -&gt; h&#39; v
            | Hello v -&gt; f&#39; v
        FSharpValue.MakeUnion(matchCase, [| res |]) :?&gt; MyData

let mapMyData : MyData -&gt; MyData = MyData.map ((+) 1) not (fun s -&gt; s.ToUpper())
let foo : MyData = Foo 42
let newFoo : MyData = mapMyData foo
let bar : MyData = Bar true
let newBar : MyData = mapMyData bar

And now I am curious about what other solutions exists for this problem, in F# as well as in other languages that has sum-types. I can easily see this being a general problem in for example Haskell, Purescript, Idris, Rust.

The problem can be distilled down to:

  1. Mapping over the values of a sum-type without repeating the case names on the right-hand side.
  2. Leverage the type checker as much as possible for safety.

答案1

得分: 4

以下是您要翻译的内容:

The motivation being, that it would be quite easy to accidentally return a wrong case, which the type checker obviously wouldn't be able to catch.

如果将事物多态化到足够程度,类型检查器实际上可以捕捉到这种错误。我将使用Haskell来说明,因为这是我的母语,但我确信有F#的类似方法。

{-# LANGUAGE LambdaCase #-}

data MyDataParameterized a b c d
  = Foo a
  | Bar b
  | Baz c
  | Hello d

-- 这个函数只有*一种*良好类型化的实现,不会无限循环
quadramapParameterized :: (a -> a') -> (b -> b') -> (c -> c') -> (d -> d') ->
    MyDataParameterized a b c d -> MyDataParameterized a' b' c' d'
quadramapParameterized fa fb fc fd = \case
    Foo a -> Foo (fa a)
    Bar b -> Bar (fb b)
    Baz c -> Baz (fc c)
    Hello d -> Hello (fd d)

type MyData = MyDataParameterized Int Bool String Int

-- 有许多许多“错误”的实现方法,但
-- 正确的实现方法显然是正确的
quadramap :: (Int -> Int) -> (Bool -> Bool) -> (String -> String) ->
    MyData -> MyData
quadramap fInt fBool fString = quadramapParameterized fInt fBool fString fInt

当然,以这种方式做事情非常痛苦,因此从实际角度来看,大多数人不这样做。他们只需承担手动确保在一个顶级折叠中使用相同的构造函数的负担,然后使用该折叠来避免在尽可能多的地方手动模式匹配。与工作日程序员面临的其他负担相比,“左边和右边相同的构造函数”是非常轻松的 - 它是代码的一个非常局部的属性,通常在您出错时非常明显,以至于在您有意返回不同的构造函数的罕见情况下,需要一条评论来解释为什么以安抚读者。

英文:

> The motivation being, that it would be quite easy to accidently return a wrong case, which the type checker obviously wouldn't be able to catch.

If you make things polymorphic enough, the type checker actually can catch such errors. I'll speak in Haskell because it's my native tongue, but I'm sure there's an F# analog.

{-# LANGUAGE LambdaCase #-}

data MyDataParameterized a b c d
  = Foo a
  | Bar b
  | Baz c
  | Hello d

-- there is only *one* well-typed implementation of this function that doesn&#39;t loop forever
quadramapParameterized :: (a -&gt; a&#39;) -&gt; (b -&gt; b&#39;) -&gt; (c -&gt; c&#39;) -&gt; (d -&gt; d&#39;) -&gt;
    MyDataParameterized a b c d -&gt; MyDataParameterized a&#39; b&#39; c&#39; d&#39;
quadramapParameterized fa fb fc fd = \case
    Foo a -&gt; Foo (fa a)
    Bar b -&gt; Bar (fb b)
    Baz c -&gt; Baz (fc c)
    Hello d -&gt; Hello (fd d)

type MyData = MyDataParameterized Int Bool String Int

-- there are many, many &quot;wrong&quot; implementations of this type, but
-- the right one is just so obviously right
quadramap :: (Int -&gt; Int) -&gt; (Bool -&gt; Bool) -&gt; (String -&gt; String) -&gt;
    MyData -&gt; MyData
quadramap fInt fBool fString = quadramapParameterized fInt fBool fString fInt

Of course, doing things this way is wildly painful, so practically speaking mostly people don't. They just take on the burden of manually ensuring the same constructor is used on both sides in one top-level fold, then use that fold to avoid manual pattern matching in as many places as that makes sense. Compared to other burdens facing the workaday programmer, "same constructor left and right" is very light -- it's a very local property of your code, and usually blatantly obvious when you screw up, to the point that in the rare situation where you intended to return a different constructor it warrants a comment explaining why to assuage the reader.

答案2

得分: 4

以下是翻译好的部分:

我认为处理大数据结构的小部分的通常方法是使用某种泛型。对于您的特定示例,您可以使用“Scrap Your Boilerplate”泛型来编写一个相当简单的Haskell解决方案,该解决方案来自“syb”包:

import Data.Data               -- from `base`
import Data.Generics.Aliases   -- from `syb`

data MyData
  = Foo Int
  | Bar Bool
  | Baz String
  | Hello Int
  deriving (Show, Data)

myMap :: (Int -> Int) -> (Bool -> Bool) -> (String -> String) -> MyData -> MyData
myMap f g h x = gmapT (mkT f `extT` g `extT` h) x

这段代码如预期工作:

λ> map (myMap (*2) not ("hello "++)) [Foo 42, Bar True, Baz "world", Hello 5]
[Foo 84,Bar False,Baz "hello world",Hello 10]

与“常规”函数式语言工具不同,泛型通常具有在语言之间甚至在语言内部广泛不同的语法和设计,因此给定的解决方案可能非常特定于某种情况。

英文:

I think the usual approach for uniform processing of small parts of large data structures is to use some kind of generics. For your specific example, you can write a fairly straightforward Haskell solution using "Scrap Your Boilerplate" generics from the syb package:

import Data.Data               -- from `base`
import Data.Generics.Aliases   -- from `syb`

data MyData
  = Foo Int
  | Bar Bool
  | Baz String
  | Hello Int
  deriving (Show, Data)

myMap :: (Int -&gt; Int) -&gt; (Bool -&gt; Bool) -&gt; (String -&gt; String) -&gt; MyData -&gt; MyData
myMap f g h x = gmapT (mkT f `extT` g `extT` h) x

which works as expected:

λ&gt; map (myMap (*2) not (&quot;hello &quot;++)) [Foo 42, Bar True, Baz &quot;world&quot;, Hello 5]
[Foo 84,Bar False,Baz &quot;hello world&quot;,Hello 10]

Unlike "normal" functional language facilities, generics tend to have syntax and design that differs widely between languages, or even within a language, and a given solution may be very case-specific.

答案3

得分: 2

在Rust中至少(我不能代表你提到的其他语言)可以原地修改值而不需要获取和返回所有权:

enum MyData {
    Foo(i32),
    Bar(bool),
    Baz(String),
    Hello(i32),
}

impl MyData {
    fn map(&mut self, f: fn(&mut i32), g: fn(&mut bool), h: fn(&mut str)) {
        use MyData::*;
        match self {
            Foo(i) => f(i),
            Bar(b) => g(b),
            Baz(s) => h(s),
            Hello(i) => f(i),
        }
    }
}

Playground.

英文:

In Rust at least (I can't speak for the other languages you mention), one can mutate the value in-place rather than taking & returning ownership:

enum MyData {
    Foo(i32),
    Bar(bool),
    Baz(String),
    Hello(i32),
}

impl MyData {
    fn map(&amp;mut self, f: fn(&amp;mut i32), g: fn(&amp;mut bool), h: fn(&amp;mut str)) {
        use MyData::*;
        match self {
            Foo(i) =&gt; f(i),
            Bar(b) =&gt; g(b),
            Baz(s) =&gt; h(s),
            Hello(i) =&gt; f(i),
        }
    }
}

Playgroud.

答案4

得分: 1

只返回翻译好的部分:

  1. 如果您希望联合中的每种情况都是类型安全的,您可以使其类型安全。

  2. 访问者模式有效地执行此操作,它使用唯一类型编码联合的每种情况,所以您也可以这样做。

  3. 如果您愿意让类型检查器给出警告而不是显式错误,如果有人弄错了,您可以通过使其成为泛型来省去大部分样板代码。

  4. 调用者可能潜在地传递错误的函数,但对于我来说,为了简单起见,我可能会选择这种方法。

英文:

If you want each case in the union to be typesafe you can make it typesafe

type Foo = Foo&#39; of int
type Bar = Bar&#39; of bool

type MyData =
    | Foo of Foo
    | Bar of Bar
    static member bimapish (f : int -&gt; int) (g : bool -&gt; bool) x : MyData =
        match x with
        | Foo (Foo&#39; v) -&gt; Foo (Foo&#39; (f v))
        | Bar (Bar&#39; v) -&gt; Bar (Bar&#39; (g v))

let foo = Foo (Foo&#39; 42)
let newFoo = MyData.bimapish ((+) 1) not foo

OO Visitors effectively do this, they encode each case of the union with a unique type, so you can probably do it that way too.

or have i missed the point?
(I accept I have repeated the case name, but your motivation was type safety, not brevity)?


someone put a comment in and deleted it, I think they objected that I was allowing this error...fair enough...but there's a cost.

type Foo = Foo&#39; of int
type Bar = Bar&#39; of int

type MyData =
    | Foo of Foo
    | Bar of Bar
    static member bimapish (f : int -&gt; int) (g : int -&gt; int) x : MyData =
        match x with
        | Foo (Foo&#39; v) -&gt; Foo (Foo&#39; (f v))
        // WRONG this is applying an &#39;f&#39; to a Bar!!! but not type error
        | Bar (Bar&#39; v) -&gt; Bar (Bar&#39; (f v)) 

let foo = Foo (Foo&#39; 42)
let newFoo = MyData.bimapish ((+) 1) ((+) 1) foo

in which case you would have to effectively make the 'map' functions typesafe...but this is becoming verbose to the caller.

type Foo = Foo&#39; of int
type Bar = Bar&#39; of int

type MyData =
    | Foo of Foo
    | Bar of Bar
    static member bimapish (f : Foo -&gt; Foo) (g : Bar -&gt; Bar) x : MyData =
        match x with
        | Foo v -&gt; Foo (f v)
        //| Bar v -&gt; Bar (f v) // now get a type error!
        | Bar v -&gt; Bar (g v) // this works though

let foo = Foo (Foo&#39; 42)
let newFoo = 
    MyData.bimapish 
        (fun (Foo&#39; x) -&gt; Foo&#39; (x + 1)) 
        (fun (Bar&#39; x) -&gt; Bar&#39; (x + 2)) 
        foo

If you are happy for the type checker to give you a warning rather than an explicit error if someone gets it wrong, you can dispense with much of the boiler plate by simply making it generic

type MyData&lt;&#39;a,&#39;b&gt; =
    | Foo of &#39;a
    | Bar of &#39;b
    static member bimapish (f : &#39;a -&gt; &#39;a) (g : &#39;b -&gt; &#39;b) x : MyData&lt;&#39;a,&#39;b&gt; =
        match x with
        | Foo v -&gt; Foo (f v)
        // this DOESNT cause an error but does cause a warning, &#39;a has to be the same as &#39;b...which is enough for me to know its wrong
        //| Bar v -&gt; Bar (f v)
        | Bar v -&gt; Bar (g v)

let foo = Foo 42
let newFoo = 
    MyData.bimapish 
        ((+) 1) 
        ((+) 2) 
        foo

ok, and the caller CAN potentially pass the wrong functions.
but for me I'd probably go with this simply for simplicity.

huangapple
  • 本文由 发表于 2023年6月21日 23:27:30
  • 转载请务必保留本文链接:https://go.coder-hub.com/76524919.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定