Kevin Hoffman's Blog
Blathering about development, dragons, and all that lies between
Constraining Types with DataKinds in Haskell
Exploring polymorphism and type constraints in Haskell with DataKinds
Late to the party as always, I discover why people love type systems.
I’m no stranger to type systems or functional programming. I still consider myself a worthless newb when it comes to Haskell, however. I have been trying to learn more and not let AI sap my inspiration and desire to tinker.
When I think about type systems, my mind usually wanders to Rust structs or even C++ class hierarchies. I’m used to defining constraints like this. For example, if I’m building a Player
type, it’s easy to define an experience_points
field that has a value between 0 and the largest signed 64-bit whole number.
As always, I think about game models when trying to learn new concepts. In classical OOP, I might define a GameObject
type, and then I might derive/inherit from that an Item
(which can have inventory and be in an environment) and a Player
. The core object has shared properties like ID
, name
, environment
, etc. Players have information like a name and experience points.
This is all pretty much review for me. My next step in modeling might be to add some code that validates these objects at runtime. The main constraint I want to add is that an object can only ever have its environment set to a reference to a room. You can only ever put a thing inside a room (small nit: I know inventory is a thing, but I have a different idea for that).
Constraining Types
Here is where my lack of experience with higher kinded types shows. I basically never consider the idea that I might be able to model a compile-time set of types that enforces my rules. This changes the game–everything goes from hoping that I catch all the edge cases at runtime to encoding those edge cases directly in types, making it physically impossible to use a type in a way that’s not allowed.
In the “old school” days, I would have a field like objectType
which might be an enum. Then I would write a bunch of tedious runtime logic to enforce rules. Let’s create a sum type for this instead:
data ObjectKind = RoomK | PlayerK | ItemK
Next I want to create references to these objects. Rooms are singletons, and so they lack an instance ID. Whereas players and items are clones of a prototype, and so include both the prototype name (e.g. std.player
or mordor.ring
) and an instance ID, which could be a short hex string like #f3ef7e
.
I could enforce this rule about references at runtime, and try and throw errors when the constructor is used to create an instance reference to a room. Or, now that my world has been expanded, I can create a data structure that takes a kind as a parameter and I can use one of the ObjectKind
variants as a tag of sorts.
-- Rooms have no instance, only a prototype
data ObjectRef (k :: ObjectKind) where
RoomRef :: Text -> ObjectRef 'RoomK
InstRef :: Text -> Text -> ObjectRef
showRef :: ObjectRef k -> Text
showRef (RoomRef proto) = proto
showRef (InstRef proto objId) = proto <> "#" <> objId
The 'RoomK
tag here is made possible through a language extension (DataKinds
) that creates kind constructors from regular data constructors. In ObjectRef
, it’s physically impossible to create a room reference with an instance ID because that function is of type Text -> ObjectRef 'RoomK
.
Room references print out like mordor.room12
while item and player references print like std.sword#f734e
. I could use this opportunity to make a dig at “generics” and how these type system constraints are better, but I won’t. Nope, I’ll take the high ground.
Let’s take a look at the ObjectData
type now. Remember this is a data structure that is enforcing compile-time constraints that I might otherwise have to code (and pray it works). DataKinds with tags makes it obvious what’s going on here:
-- All objects have the same core object data
data ObjectData (k :: ObjectKind) = ObjectData
{ objRef :: ObjectRef k
, objName :: Text
, objEnv :: ObjectRef 'RoomK
, objInventory :: [SomeInstRef]
, objVisible :: Visibility
, objPersistent :: Bool
}
objRef
here is something akin to self
, it’s what the object looks like as a reference. Note that it’s passing the kind parameter through to the ObjectRef
data type.
Next we see objEnv
. This field isn’t just a regular ObjectRef
, it’s a compile-time verified type that can only ever be a reference to a room via the ObjectRef 'RoomK
type.
Take a look at the objInventory
field. Here we have a list of objects that can be contained within the object. All objects (including rooms) can contain inventory, but rooms cannot be inventory. So let’s take a look at the SomeInstRef
type, which, as the name implies, only allows references to instances (which automatically excludes rooms).
-- Instanced Ref (no RoomRef allowed)
data InstancedRef (k :: ObjectKind) where
IsInst :: ObjectRef k -> InstancedRef k
mkInstancedRef :: ObjectRef k -> Maybe (InstancedRef k)
mkInstancedRef ref@(InstRef _ _) = Just (IsInst ref)
mkInstancedRef (RoomRef _) = Nothing
data SomeInstRef = forall k. SomeInstRef (InstancedRef k)
-- A wrapper for heterogeneous inventories
data SomeObjectRef = forall k. SomeRef (ObjectRef k)
deriving instance Show SomeObjectRef
Here a “smart constructor” is being used (mkInstancedRef
) which prevents the creation of an instance ref to a room object. Because the pattern match in mkInstancedRef
is just a regular pattern match, it is invoked at runtime and will give me a Nothing
if I try and create a reference to a room.
This isn’t necessarily a bad thing, but look back at the inventory type ([SomeInstRef]
). While eventually you’ll run into runtime problems using mkInstancedRef
the type system isn’t actually enforcing this at compile time. Compile time enforcement was kind of the point of this blog post.
So I created a type class called IsInstantiable
, which will only have members of 'ItemK
and 'PlayerK
. It might seem odd that players are instantiable, but think of every logged in player as some instance of the std.player
prototype, e.g. std.player#bob
or std.player#eff7de
.
class IsInstantiable (k :: ObjectKind)a
instance IsInstantiable 'PlayerK
instance IsInstantiable 'ItemK
-- Note: No instance for 'RoomK
It took me a while to realize that you can create instances of a type class that takes a kind as a parameter. It’s still kind of blowing my mind.
And now I can modify the SomeInstRef
type used to store heterogenous inventory to use this new constraint:
data SomeInstRef = forall k. IsInstantiable k =>
SomeInstRef (InstancedRef k)
So now the SomeInstRef
type (note the GADT syntax again) is simply not defined when the kind
type is 'RoomK
.
You haven’t yet seen the function that will add an object to an object’s inventory. While we know (indirectly) that we can’t create a data type unsuitable for being in inventory, we can also change the signature for addToInventory
so that only an IsInstantiable
type can be added to an inventory:
addToInventory :: IsInstantiable j => ObjectData k
-> InstancedRef j -> ObjectData k
addToInventory obj inst =
obj { objInventory = SomeInstRef inst : objInventory obj }
The addToInventory
function can now work like this:
addToInventory darkDungeon creepySnake
Another tiny thing to note is that we don’t pass a SomeInstRef
into the function, we pass an InstancedRef
, which is a data record. We only need the SomeInstRef
to store heterogenous inventory items in a single list of a single type.
Wrap-up
It might not be obvious at first. It might take a while to sink in. I know it didn’t quite click for me until I had to try out DataKind
s in anger when building something “real” rather than hello world (or worse, a math-related sample).
Think about building just the inventory system the “old” way. You would have to enforce the instantiability requirement at every single call point that attempted to add an item to inventory. That’s not bad, you might be thinking. Just make sure to route all of that through a single function that enforces the constraints.
That sounds good on paper, but I know from painful experience that such solutions only last a short time in production. At some point in the very near future, someone’s going to push an enhancement that circumvents the runtime check.
But if we are using data kinds, type classes, and other fundamental aspects of Haskell’s type system, we literally cannot write code that violates the rules. This is so much better in a thousand ways than runtime checking. Think about it, if your types are strict, then nobody will ever be able to push a pull request that sneaks a violation of the rules into the codebase.
I have a ton of other stuff that I’ve learned while tinkering with this, but I’ll save it for another post, or maybe even another video using my disembodied avatar. Hope you geek out on this stuff as much as I do.