英文:
Restricting a type variable to a class type in a data declaration in Haskell
问题
我想声明一种数据类型,该数据类型由可比较元素的列表构成。这是因为我编写了一个只在其输入列表已排序时才能正确工作的函数,我希望通过某种方式让编译器阻止它在未排序的列表上被意外使用(虽然我认为无法阻止用户说谎,但至少我希望他们声明列表已排序)。
在我看来,除非列表的元素在Ord中,否则排序列表没有意义,我想尝试让编译器至少强制执行这一点。
data WrappedList a = WrappedList [a]
data SortedList (Ord a) => SortedList a = SortedList [a]
SortedList是我失败的尝试之一,而WrappedList实际上是可以编译的。我找不到任何这方面的示例,所以也许我完全错了?
(注:我刚刚开始学习Haskell,这只是一个玩具问题。)
英文:
I want to declare a data type that is constructed from a list of comparable elements. This is because I wrote a function that only works correctly if its input lists are sorted, and I wanted a way to get the compiler to stop its being accidentally used on unsorted lists. (I don't think there's any way to stop a user lying, but I at least wanted them to declare the lists as sorted).
It doesn't make sense (to me) to have a sorted list unless the elements of the list are in Ord, and I wanted to try to get the compiler to enforce at least this much.
data WrappedList a = WrappedList [a]
data SortedList (Ord a) => a = SortedList [a]
SortedList is one of my failed attempts, whilst WrappedList does actually compile. I can't find any examples of this, so maybe I've missed the point completely?
(N.B. I've just starting learning Haskell and this is a toy problem.)
答案1
得分: 3
你可以使用 DataTypeContexts 扩展来声明它,但正如文档中所描述的,它被认为是一个缺陷,不鼓励使用。
你可以在 DatatypeContexts Deprecated in Latest GHC: Why? 中找到为什么它被认为是一个缺陷的原因。
data Ord a => SortedList = SortedList [a]
而不是将约束添加到你的数据类型中,你可以将约束添加到实际需要这些约束的函数中。例如,你可以将 Ord a =>
添加到构建 SortedList
的函数或使用二分查找查找元素的函数中。
buildSortedList :: Ord a => [a] -> SortedList [a]
buildSortedList = ...
英文:
You can declare it with DataTypeContexts extension, but as described in the document, it's considered a misfeature and you're discouraged to use it.
You can find why it's considered a misfeature at DatatypeContexts Deprecated in Latest GHC: Why?.
data Ord a => SortedList = SortedList [a]
Instead of adding constraints to your data type, you can add constraints to functions that actually need the constraints. For example, you can add Ord a =>
to a function to build SortedList
or a function to find an element using binary search.
buildSortedList :: Ord a => [a] -> SortedList [a]
buildSortedList = ...
答案2
得分: 3
如果你想创建一个名为 SortedList
的类型,你想要编写的函数很可能会具有类似以下的类型签名:
myFunc :: Ord a => SortedList a -> ...
如果 myFunc
没有 Ord
约束,那么它将无法使用比较函数(这意味着它无法观察列表是否已排序;这并不是说已排序完全没有用,但你将失去已排序的知识带来的大部分效用)。
这意味着如果有人想要使用不带 Ord
约束的类型调用 myFunc
,编译器仍然会对其进行调用。因此,如果他们无法证明 Ord a
,则实际上不会捕获任何额外的错误。因此,只是简单地向 SortedList
自身添加一个 Ord a
约束可能不值得花费太多精力;它实际上为你提供了很少的好处。像这样的一个简单类型:
data SortedList a = SortedList [a]
也可以用来创建这种情况,即很难无意中在未排序的列表上调用你的函数(调用者必须故意欺骗你,或者至少对来自其他地方的列表是否已排序表示疏忽),并且对于那些对 SortedList
中的顺序有任何有趣用途的函数的 Ord
约束将捕获任何试图使用元素类型实际上无法排序(通过规范顺序)的列表的人。
实际上,Haskell 以前实际上有一种做到你想要的功能的方法,但在使用过程中,专家和社区共识是最好根本不使用这个功能。它已被弃用,在较新版本的 GHC 中不再默认可用,最终将被完全移除。因此,由于你正在学习 Haskell,我建议你根本不要学习如何使用这个功能。学会在没有它的情况下解决问题是一项有用的技能,将会在未来的 Haskell 中发挥作用;依赖它是一条死路。
如果你真的想在构造 SortedList
包装器的地方进行 Ord a
检查,那么在更现代的 Haskell 中实现这一点的方法是使用 GADTs
语言扩展。然而,这是一项更高级的 Haskell 功能,所以在你还在学习的时候可能不应该尝试处理它。
但为了完整起见,你可以编写如下类型:
data SortedList a where
SortedList :: Ord a => [a] -> SortedList a
这意味着当应用 SortedList
构造器时,编译器不仅会检查 Ord a
,它实际上会将 Ord
实例存储在 SortedList
值中。然后,使用 SortedList
的函数实际上不需要 Ord
约束,因为它们在模式匹配 SortedList
时可以访问存储的实例。
但要小心在所有地方使用这种技术;它可能会导致性能问题。如果你使用了许多具有存储实例的值,你可能会使用大量内存来存储对相同实例的引用(并花费时间来解引用所有这些引用)。更糟糕的是,这种技术会破坏编译器通常可以进行的许多优化。编译器通常可以内联和专门化具有类型类约束的函数,以便它们最终调用静态已知的类型类方法,这比通过运行时字典调用它们要快得多。当你管理字典传递(通过将它们存储在 GADTs 中)而不是编译器时,编译器很难进行这些优化。
如果你深入研究 GADTs,你还会发现它们允许你“隐藏”类型参数,这会引发一大堆需要牢牢掌握 Haskell 类型系统和语法才能掌握的问题。或者至少当出现问题时,你可能会得到非常令人困惑的错误消息,这使得它们很难纠正。
GADTs 是一项非常强大的功能,可以启用使用普通的 Haskell 数据类型无法完成的代码结构方式。我的个人建议是将它们保留用于像那样的情况,而不仅仅是为了更早地捕获编译器将在任何情况下捕获的错误。这是一个你必须自己权衡成本与收益的决策。但无论如何,学习和掌握 GADTs 在你的 Haskell 之旅中肯定是非常值得的!
如果你想进一步确实有一个编译器强制执行的数据类型,使其成为已排序列表,那么实际上有方法可以实现这一点。最直接的方法甚至不需要 GADTs
!技巧在于使用模块系统。(如果你还不熟悉编写多模块程序,那么可能将这个留到以后;让编译器强制执行你的不变性的冲动非常符合 Haskell 的精神,但在你刚刚入门的时候,你无法做到 Haskell 能做的一切)
在与其他代码*分离
英文:
Assuming you could create your SortedList
type, the function you want to write will most likely have a type that looks a bit like:
myFunc :: Ord a => SortedList a -> ...
If myFunc
doesn't have the Ord
constraint then it won't be able to use comparison functions (which would mean it's completely unable to observe whether the list is sorted or not; that doesn't make the fact that it's sorted completely useless, but certainly you would lose a large part of the utility of knowing it was already sorted).
That means that if someone wants to call myFunc
with a type that doesn't have an Ord
constraint, the compiler is going to call them on it anyway. So stopping them from constructing the SortedList a
if they can't prove Ord a
doesn't actually catch any additional errors. So simply adding an Ord a
constraint to SortedList
itself is probably not something you should put a lot of effort into; it actually buys you very little. A simple type like this:
data SortedList a = SortedList [a]
-- Actually it could be a newtype, which is a bit more efficient; but don't worry
-- about it if you don't know what that is yet
newtype SortedList a = SortedList [a]
works fine to create the situation where it's hard to accidentally call your function on an unsorted list (the caller would have to deliberately lie to you, or at least be negligent about assuming a list from elsewhere was sorted), and the Ord
constraints on functions that do anything interesting with the order in a SortedList
will catch anyone using your functions with lists that actually can't be sorted (by a canonical order) because their element type doesn't even have such an order.
Haskell in fact used to have a feature for doing exactly what you're after, but after the experience of having it the expert and community consensus was that it is better not to have the feature at all. It has been deprecated, is no longer available by default in newer GHC versions, and will be removed entirely at some point. So since you are learning Haskell, I recommend you simply never learn how to use this feature. Learning to solve problems without it is a useful skill that will carry over into future Haskell; learning to rely on it is a dead end.
If you really want the Ord a
check to be made at the point where the SortedList
wrapper is constructed the way to do that in more modern Haskell is to use the GADTs
language extension. This is however a more advanced Haskell feature, so probably not something you should try to tackle when you're still learning the ropes.
But for completeness, it would allow you to write a type like this:
data SortedList a where
SortedList :: Ord a => [a] -> SortedList a
This means that when the SortedList
constructor is applied the compiler will not just check Ord a
, it will actually store the Ord
instance inside the SortedList
value. Functions that use a SortedList
then don't actually need an Ord
constraint, because they get access to the stored instance when pattern matching on the SortedList
.
Be careful about using this technique everywhere though; it can cause significant performance issues. If you use many values with a stored instance you'll potentially use a lot of memory storing references to the same instance (and time dereferencing all those references). But even worse this technique defeats a lot of optimisations the compiler can usually make. The compiler can often inline and specialise functions with typeclass constraints so that they end up calling statically-known typeclass methods, which can be much faster than calling them through a runtime dictionary. When you're managing the dictionary passing (by having them stored in GADTs) instead of the compiler, it can become much harder for the compiler to make these optimisations.
If you dig further into GADTs you'll also find that they let you "hide" type parameters, and that opens a large can of worms that you need a very firm grasp on Haskell's type system and syntax to keep hold of. Or at least the error messages you're likely to get when something goes wrong are very confusing to a novice, which makes them hard to correct.
GADTs are a really powerful feature that enables ways of structuring your code that simply couldn't be done with vanilla Haskell data types. My personal rubrik is to save them for cases like that, and not use them merely to catch an error a bit earlier that the compiler would have caught anyway. That's a cost-benefit tradeoff you'll have to make for yourself. But certainly GADTs are well worth learning and mastering at some point in your Haskell journey!
If you want to go further and actually have a data type that the compiler enforces to be a sorted list, then there in fact are ways to do that. The most straightforward way doesn't even need GADTs
! The trick is to use the module system. (If you're not already comfortable with writing multi-module programs, then probably save this for later; your instinct to make the compiler enforce your invariants is very good Haskell, but you won't be able to do everything that Haskell is capable of while you're just getting started)
In a separate module from your other code (i.e. in a file SortedList.hs
), write something like this:
module SortedList
( SortedList -- note this just exports the *type* name, not the constructor
, fromUnsorted
, empty
, elements
, sortedInsert
, unsafeFromSorted
)
where
import Data.List (sort, insert)
newtype SortedList a = SL [a]
fromUnsorted :: Ord a => [a] -> SortedList a
fromUnsorted = SL . sort
empty :: SortedList a
empty = SL []
elements :: SortedList a -> [a]
elements (SL xs) = xs
sortedInsert :: Ord a => a -> SortedList a -> SortedList a
sortedInsert x (SL xs) = SL $ insert x xs
-- Optionally include a function like this to allow a programmer to declare
-- that a list is *already* sorted. Having an option like this is equivalent
-- to exporting the constructor, so it loses the guarantee that the list is
-- *always* sorted. But there are often ways to build a list that is sorted by
-- construction (e.g. [1..100]), so having to call `sort` on them is a
-- performance hit. Take your pick between performance and safety.
unsafeFromSorted :: [a] -> SortedList a
unsafeFromSorted = SL
The key thing is that we haven't exported the constructor SL
(which I've named differently from the SortedList
only to avoid confusion when I'm talking about them). Nobody outside this module can apply SL
to a random unsorted list. They would have to use the fromUnsorted
function which is going to sort the list. Alternatively if they already have a SortedList
they can use sortedInsert
to add a new element, but that maintains the property that the output list is sorted if the input was. (Or they can use empty
, since an empty list is obviously always sorted)
If those are the only ways that other parts of the program can get access to a SortedList
, then you always know that such a list is sorted. The elements
function extracts the elements as a raw list (needed since the constructor isn't available so nobody can pattern match to get them); this throws away guarantee but allows full access to ordinary list functions.
The sortedInsert
function isn't strictly necessary; you could always use fromUnsorted . insert x . elements
to do the same thing. But that is much less efficient, since it requires re-checking that the list is sorted to convert it back to a SortedList
at the end, and we know that insert
is always going to produce a sorted result if the input is sorted. So including an analogue of insert
makes it easier and more efficient to work with SortedList
s. Similarly you could think of other list operations that can be done (more efficiently) in a way that maintains the sorted invariant and include those. The more of this you do, the less need there is for something like unsafeFromSorted
, as people will be able to work with SortedList
s directly instead of working with ordinary lists and then converting them at the end.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论