Thursday, May 17, 2012

Phantom type examples in Ocaml

What are phantom types

Phantom types are a way for multiple types to have the same exact underlying representation but still be seen as a distinct type by the compiler. The common example is units of measurements. Feet and meters are both distances and are best represented as an int or float, but it makes no sense to add a foot to a meter. Phantom types allow you to solve this and still represent a foot and a meter exactly the same. One can think of this as the opposite of duck typing. In duck typing, if you only care if the current value you are working with has four legs, a cat a dog and a table are the effectively the same thing. In phantom types you care about what something actually represents and that means they are different, even though underneath they are represented with the same type.

The trick to phantom types is the module system. In Ocaml you have a module and that module defines the interface it exposes to the world. For phantom types part of that interface is providing a type, for example rstring, but not giving a concrete definition of it, leaving it as an abstract concept. Then the module definition is explicit that rstring is actually a string and it can use a type of rstring just like a type of string. So outside the module the compiler just knows some type exists by a name but no more, and inside the module the compiler knows that the same type is actually a string (for example) and all the string APIs are valid on it.

The repo for the examples is located here.

Examples

estring - Encoded strings

This code is my paraphrasing of Martin Jambon's example of phantom typing located here.

When working with strings you might want to make sure you don't mix and match strings of incompatible encodings. You probably don't want to append a utf8 string to a utf16 string. But underneath it all, both of these can be represented the same way and a bunch of operations are going to be the same. Concatenating two utf8 strings is the same operation as concatenating two utf16 strings. The estring module lets you express this. For example, we can write code like this:

let () =
  let s1 = Estring.make_utf8 "hello" in
  let s2 = Estring.make_utf8 "world" in
  let s3 = Estring.concat (Estring.make_utf8 " ") [s1; s2] in
  Printf.printf "%s\n" (Estring.str s3)

This says that we want to make two utf8 strings from a regular string, and then concatenate them together with " " as the separator. Then we convert that back to a string for output. This works because everything is the same estring type. The following code will not compile because things aren't the same type:

let () =
  let s1 = Estring.make_utf8 "hello" in
  let s2 = Estring.make_utf16 "world" in
  let s3 = Estring.concat (Estring.make_utf8 " ") [s1; s2] in
  Printf.printf "%s\n" (Estring.str s3)

One weakness of estring is that it is not extendible. Remember, it is only inside the estring module that the compiler knows it is a string. That means other modules cannot define a latin1 encoding type and use the estring module to create a latin1 estring type.

rstring - Read only string

Another use of phantom types is to provide read-only interfaces to mutable types. With rstring we provide a phantom type called rstring and then define a subset of the string API that we want to expose. The major downside to rstring is that it copies the string when you turn a string to an rstring and vice versa. This is for obvious reasons: you can't ensure that nobody else has a reference to the string unless you know you made the version you're referencing. Maybe something like Linear types would work well here. In a real system I would consider offering an unsafe_make and an unsafe_str set of functions that don't copy. That way you could avoid the copy if that is important (it probably isn't) but at least you know you're polluting your code with unsafe's. Code for rstring is pretty straight forward, usage looks like:

let () =
  let s1 = Rstring.make "hello" in
  let s2 = Rstring.make "world" in
  let s3 = Rstring.concat " " [s1; s2] in
  Printf.printf "%s\n" (Rstring.str s3)

Other ideas

Another common example of using phantom types that I did not provide is something like a file or socket interface. You can encode information, such as if you opened the file for reading or writing or both and then make it so it is a compile-time error if you try to write to a file opened for reading.

No comments:

Post a Comment