Monday, December 23, 2013

Gen_server in Ocaml

Note, this post is written against the 2.0.1 version of gen_server

Erlang comes with a rich set of small concurrency primitives to make handling and manipulating state easier. The most generic of the frameworks is the gen_server which is also the most commonly used. A gen_server provides a way to control state over multiple requests. It serializes operations and handles both synchronous and asynchronous communication with clients. The strength of a gen_server is the ability to create multiple, lightweight, servers inside an application where each operation inside of it runs in serial but individually the gen_servers run concurrently.

While it is not possible to provide all of the Erlang semantics in Ocaml, we can create something roughly analogous. We can also get some properties that Erlang can not give us. In particular, the implementation of gen_server provided here:

Tuesday, July 9, 2013

Experimenting in API Design: Riakc

Disclaimer: Riakc's API is in flux so not all of the code here is guaranteed to work by the time you read this post. However the general principles should hold.

While not perfect, Riakc attempts to provide an API that is very hard to use incorrectly, and hopefully easy to use correctly. The idea being that using Riakc incorrectly will result in a compile-time error. Riakc derives its strength from being written in Ocaml, a language with a very expressive type system. Here are some examples of where I think Riakc is successful.

Siblings

In Riak, when you perform a GET you can get back multiple values associated with the a single key. This is known as siblings. However, a PUT can only associate one value with a key. However, it is convenient to use the same object type for both GET and PUT. In the case of Riakc, that is a Riakc.Robj.t. But, what to do if you create a Robj.t with siblings and try to PUT? In the Ptyhon client you will get a runtime error. Riakc solves this by using phantom types. A Robj.t isn't actually just that, it's a 'a Robj.t. The API requires that 'a to be something specific at different parts of the code. Here is the simplified type for GET:

val get :
  t ->
  b:string ->
  string ->
  ([ `Maybe_siblings ] Robj.t, error) Deferred.Result.t

And here is the simplified type for PUT:

val put :
  t ->
  b:string ->
  ?k:string ->
  [ `No_siblings ] Robj.t ->
  (([ `Maybe_siblings ] Robj.t * key), error) Deferred.Result.t

The important part of the API is that GET returns a [ `Maybe_siblings ] Riak.t and PUT takes a [ `No_siblings ] Riak.t. How does one convert something that might have siblings to something that definitely doesn't? With Riakc.Robj.set_content

val set_content  : Content.t -> 'a t -> [ `No_siblings ] t

set_content takes any kind of Robj.t, and a single Content.t and produces a [ `No_siblings ] Riak.t, because if you set contents to one value obviously you cannot have siblings. Now the type system can ensure that any call to PUT must have a set_content prior to it.

Setting 2i

If you use the LevelDB backend you can use secondary indices, known as 2i, which allow you to find a set of keys based on some mapping. When you create an object you specify the mappings to which it belongs. Two types are supported in Riak: bin and int. And two query types are supported: equal and range. For example, if you encoded the time as an int you could use a range query to find all those keys that occurred within a range of times.

Riak encodes the type of the index in the name. As an example, if you want to allow people to search by a field called "foo" which is a binary secondary index, you would name that index "foo_bin". In the Python Riak client, one sets an index with something like the following code:

obj.add_index('field1_bin', 'val1')
obj.add_index('field2_int', 100000)

In Riakc, the naming convention is hidden from the user. Instead, the the name the field will become is encoded in the value. The Python code looks like the following in Riakc:

let module R = Riakc.Robj in
let index1 =
  R.index_create
    ~k:"field1"
    ~v:(R.Index.String "val1")
in
let index2 =
  R.index_create
    ~k:"field2"
    ~v:(R.Index.Integer 10000)
in
R.set_content
  (R.Content.set_indices [index1; index2] content)
  robj

When the Robj.t is written to the DB, "field1" and "field2" will be transformed into their appropriate names.

Reading from Riak results in the same translation happening. If Riakc cannot determine the type of the value from the field name, for example if Riak gets a new index type, the field name maintains its precise name it got from Riak and the value is a Riakc.Robj.Index.Unknown string.

In this way, we are guaranteed at compile-time that the name of the field will always match its type.

2i Searching

With objects containing 2i entries, it is possible to search by values in those fields. Riak allows for searching fields by their exact value or ranges of values. While it's unclear from the Riak docs, Riakc enforces the two values in a range query are of the same type. Also, like in setting 2i values, the field name is generated from the type of the value. It is more verbose than the Python client but it enforces constraints.

Here is a Python 2i search followed by the equivalent search in Riakc.

results = client.index('mybucket', 'field1_bin', 'val1', 'val5').run()
Riakc.Conn.index_search
  conn
  ~b:"mybucket"
  ~index:"field1"
  (range_string
     ~min:"val1"
     ~max:"val2"
     ~return_terms:false)

Conclusion

It's a bit unfair comparing an Ocaml API to a Python one, but hopefully this has demonstrated that with a reasonable type system one can express safe and powerful APIs without being inconvenient.

Thursday, July 4, 2013

Riakc In Five Minutes

This is a simple example using Riakc to PUT a key into a Riak database. It assumes that you already have a Riak database up and running.

First you need to install riakc. Simply do: opam install riakc. As of this writing, the latest version of riakc is 2.0.0 and the code given depends on that version.

Now, the code. The following is a complete CLI tool that will PUT a key and print back the result from Riak. It handles all errors that the library can generate as well as outputting siblings correctly.

(*
 * This example is valid for version 2.0.0, and possibly later
 *)
open Core.Std
open Async.Std

(*
 * Take a string of bytes and convert them to hex string
 * representation
 *)
let hex_of_string =
  String.concat_map ~f:(fun c -> sprintf "%X" (Char.to_int c))

(*
 * An Robj can have multiple values in it, each one with its
 * own content type, encoding, and value.  This just prints
 * the value, which is a string blob
 *)
let print_contents contents =
  List.iter
    ~f:(fun content ->
      let module C = Riakc.Robj.Content in
      printf "VALUE: %s\n" (C.value content))
    contents

let fail s =
  printf "%s\n" s;
  shutdown 1

let exec () =
  let host = Sys.argv.(1) in
  let port = Int.of_string Sys.argv.(2) in
  (*
   * [with_conn] is a little helper function that will
   * establish a connection, run a function on the connection
   * and tear it down when done
   *)
  Riakc.Conn.with_conn
    ~host
    ~port
    (fun c ->
      let module R = Riakc.Robj in
      let content  = R.Content.create "some random data" in
      let robj     = R.create [] |> R.set_content content in
      (*
       * Put takes a bucket, a key, and an optional list of
       * options.  In this case we are setting the
       * [Return_body] option which returns what the key
       * looks like after the put.  It is possible that
       * siblings were created.
       *)
      Riakc.Conn.put
        c
        ~b:"test_bucket"
        ~k:"test_key"
        ~opts:[Riakc.Opts.Put.Return_body]
        robj)

let eval () =
  exec () >>| function
    | Ok (robj, key) -> begin
      (*
       * [put] returns a [Riakc.Robj.t] and a [string
       * option], which is the key if Riak had to generate
       * it
       *)
      let module R = Riakc.Robj in
      (*
       * Extract the vclock, if it exists, and convert it to
       * to something printable
       *)
      let vclock =
 Option.value
   ~default:"<none>"
   (Option.map ~f:hex_of_string (R.vclock robj))
      in
      let key = Option.value ~default:"<none>" key in
      printf "KEY: %s\n" key;
      printf "VCLOCK: %s\n" vclock;
      print_contents (R.contents robj);
      shutdown 0
    end
    (*
     * These are the various errors that can be returned.
     * Many of then come directly from the ProtoBuf layer
     * since there aren't really any more semantics to apply
     * to the data if it matches the PB frame.
     *)
    | Error `Bad_conn           -> fail "Bad_conn"
    | Error `Bad_payload        -> fail "Bad_payload"
    | Error `Incomplete_payload -> fail "Incomplete_payload"
    | Error `Notfound           -> fail "Notfound"
    | Error `Incomplete         -> fail "Incomplete"
    | Error `Overflow           -> fail "Overflow"
    | Error `Unknown_type       -> fail "Unknown_type"
    | Error `Wrong_type         -> fail "Wrong_type"

let () =
  ignore (eval ());
  never_returns (Scheduler.go ())

Now compile it:

ocamlfind ocamlopt -thread -I +camlp4 -package riakc -c demo.ml
ocamlfind ocamlopt -package riakc -thread -linkpkg \
-o demo.native demo.cmx

Finally, you can run it: ./demo.native hostname port

...And More Detail

The API for Riakc is broken up into two modules: Riakc.Robj and Riakc.Conn with Riakc.Opts being a third helper module. Below is in reference to version 2.0.0 of Riakc.

Riakc.Robj

Riakc.Robj defines a representation of an object stored in Riak. Robj is completely pure code. The API can be found here.

Riakc.Conn

This is the I/O layer. All interaction with the actual database happens through this module. Riakc.Conn is somewhat clever in that it has a compile-time requirement that you have called Riakc.Robj.set_content on any value you want to PUT. This guarantees you have resolved all siblings, somehow. Its API can be found here.

Riakc.Opts

Finally, various options are defined in Riakc.Opts. These are options that GET and PUT take. Not all of them are actually supported but support is planned. The API can be viewed here.

Hopefully Riakc has a fairly straight forward API. While the example code might be longer than other clients, it is complete and correct (I hope).

Saturday, May 25, 2013

Setting Up NixOps On Mac OS X With VirtualBox

Disclaimer

I am a new user of nixops, so I cannot guarantee these directions work for everyone. I have successfully set it up on two machines.

Preamble

The following directions describe how to setup nixops on a Mac OS X machine in VirtualBox. By the end of this you should be able to spawn as many NixOS instances in VirtualBox as your machine can handle. NixOps is similar to vagrant, except it deploys NixOS instances. It can deploy them locally, using VirtualBox, or remotely using EC2. It allows you to deploy clusters of machines, automatically allowing them to communicate with each other. At a high-level, nixops deploys an instance by doing the following:

  1. It builds the environment you ask for on another NixOS instance. This could be your local machine or a build server.
  2. It creates a VM on the service or system you defined (VirtualBox, EC2, etc).
  3. It uploads the environment you've defined to the machine.

The main problem is that nixops must build the environment on the same OS and arch it is deploying. NixOS is a linux distro, that means you cannot built the environment on your Mac. The minor problem is that, by default, the OS X filesystem that everyone gets is case insensitive and that doesn't play well with nix, the package manager.

This post will accomplish the following:

  1. Install and setup VirtualBox.
  2. If your OS X file system is case insensitive (assume it is if you haven't done anything to change it), we will create a loopback mount to install nix on.
  3. Install nix on OS X.
  4. An initial NixOS VirtualBox instance will be created to bootstrap the process and act as a distributed build server.
  5. Create a user on the build system.
  6. Setup up signing keys, so we can copy environments between build server, host, and deployed VM.
  7. Setup local nix to use this VM as a build server.
  8. Deploy a VM.

1. Install VirtualBox

Download VirtualBox and install it. Just follow the directions. The only interesting thing you have to do is make sure you have the vboxnet0 adapter setup in networking. To do this:

  1. Start VirtualBox.
  2. Go to preferences (Cmd-,).
  3. Click on Network.
  4. If vboxnet0 is not present, add it by clicking the green +.
  5. Edit vboxnet0 and make sure DHCP Server is turned on. The settings I use are below.
  • Server Address: 192.168.56.100
  • Server Mask: 255.255.255.0
  • Lower Address Bound: 192.168.56.101
  • Upper Address Bound: 192.168.56.254

2. Creating a case-sensitive file system

Unless you have explicitly changed it, your OS X machine likely has a case insensitive file system. This means nix build some packages. The method I have chosen to get around this is to create a loopback filesystem and mount that.

  1. Create a image. I have been using one 5GB successfully, but if you plan on being a heavy user of nix, you should make it larger.
    hdiutil create ~/nix-loopback -megabytes 5000 -ov -type UDIF
  2. Load it but do not mount:
    hdiutil attach -imagekey diskimage-class=CRawDiskImage -nomount ~/nix-loopback.dmg
  3. Determine which disk and partition your newly created image corresponds to. Specifically you want to find the image that corresponds to the Apple_HFS entry you just created. It will probably be something like disk2s2, but could be anything.
    diskutil list
  4. Create a case-sensitive file system on this partition:
    newfs_hfs -s /dev/disk2s2
  5. Make the mountpoint:
    sudo mkdir /nix
  6. Mount it:
    sudo mount -t hfs /dev/disk2s2 /nix

At this point if you run mount you should see something mounted on /nix.

NOTE: I don't know how to make this point on reboot, which you will need to do if you want to use nix after restarting your system.

3. Install Nix

  1. Download the binary nix darwin package from nixos.org.
  2. Go to root:
    cd /
  3. Untar nix:
    sudo tar -jxvf /path/to/nix-1.5.2-x86_64-darwin.tar.bz2
  4. Chown it to your user:
    sudo chown -R your-user /nix
  5. Finish the install:
    nix-finish-install
  6. nix-finish-install will print out some instructions, you should copy the 'source' to your ~/.profile and run it in your current shell (and any other shell you plan on not restarting but using nix in).
  7. Delete the installer:
    sudo rm /usr/bin/nix-finish-install

4. Setup Nix

  1. Add the nixos channel:
    nix-channel --add http://nixos.org/releases/nixos/channels/nixos-unstable
  2. Update:
    nix-channel --update
  3. Install something:
    nix-env -i tmux

5. Install NixOps

  1. Set NIX_PATH:
    export NIX_PATH=/nix/var/nix/profiles/per-user/`whoami`/channels/nixos
  2. Get nixops:
    git clone git://github.com/NixOS/nixops.git
  3. cd nixops
  4. Install:
    nix-env -f . -i nixops
  5. Verify it is installed:
    nixops --version

5. Setup Distributed Builds

When deploying an instance, nixops needs to build the environment somewhere then it will transfer it to the instance. In order to do this, it needs an already existing NixOS instance to build on. If you were running NixOS already, this would be the machine you are deploying from. To accomplish this, you need a a NixOS running in a VM. Eventually nixops will probably accomplish this for you, but for now it needs to be done manually. Luckily, installing NixOS on VirtualBox is pretty straight forward.

  1. Install a NixOS on VirtualBox from the directions here. This doesn't need any special settings, just SSH.
  2. Setup a port forward so you can SSH into the machine. I'll assume this port forward is 3223.
  3. Make a user called 'nix' on the VM. This is the user that we will SSH through for building. The name of the user doesn't matter, but these directions will assume its name is 'nix'.
  4. On OS X, create two pairs of passwordless SSH keys. One pair will be the login for the nix user. The other will be signing keys.
  5. Install the login public key.
  6. On OS X, create /etc/nix/ (mkdir /etc/nix)
  7. Copy the private signing key to /etc/nix/signing-key.sec. Make sure this is owned by the user you'll be running nixops as and is readable only by that user.
  8. Create a public signing key from your private signing key using openssl. This needs to be in whatever format openssl produces which is not the same as what ssh-keygen created. This output should be in /etc/nix/signing-key.pub. The owner and permissions don't matter as long as the user you'll run nixops as can read it.
    openssl rsa -in /etc/nix/signing-key.sec -pubout > /etc/nix/signing-key.pub
  9. Copy the signing keys to the build server, putting them in the same location. Make sure the nix user owns the private key and is the only one that can read it.
  10. Tell nix to do distributed builds:
    export NIX_BUILD_HOOK=$HOME/.nix-profile/libexec/nix/build-remote.pl
  11. Tell the distributed builder where to store load content:
    export NIX_CURRENT_LOAD=/tmp/current-load
    mkdir /tmp/current-load
  12. Go into a directory you can create files in:
    cat <<EOF > remote-systems.conf
    nix@nix-build-server x86_64-linux /Users/`whoami`/.ssh/id_rsa 1 1
    EOF
  13. Tell the remote builder where to find machine information:
    export NIX_REMOTE_SYSTEMS=$PWD/remote-systems.conf
  14. Add an entry to ~/.ssh/config the fake host 'nix-build-server' turns into your actual VM:
    Host nix-build-server
        HostName localhost
        Port 3223

6. Start An Instance

  1. Create your machine's nix expression:
    cat <<EOF > test-vbox.nix
    {
      test = 
        { config, pkgs, ... }:
        { deployment.targetEnv = "virtualbox";
          deployment.virtualbox.memorySize = 512; # megabytes
        };
    }
    EOF
  2. Create a machine instance named test:
    nixops create ./test-vbox.nix --name test
  3. Deploy it:
    nixops deploy -d test

This could take awhile, and at some points it might not seem like it's doing anything because it's waiting for a build or a transfer. It will push around a fair amount of data. After all is said and done you should be able to do nixops ssh -d test test to connect to it.

Troubleshooting

  • I do a deploy and it sits forever waiting for SSH - You probably forgot to setup your vboxnet0 adapter properly. See Section 1.
  • It dies while building saying a store isn't signed - Only root an import unsigned stores, this means your signing keys aren't stup properly. Double check your permissions.

Other problems? Post them in the comments and I'll add them to the list.

Known Bugs

  • nixops stop -d test never returns - I've only experienced this on one of my installations. It is okay, though. Wait a bit and exit out of the command, then you can do any command as if stop succeeded
  • My Mac grey-screens of death! - This has happened to be once. I update my version of VirtualBox and installed any updates from Apple and I have not experienced it again.

Further Reading

Sunday, March 17, 2013

[ANN] Riakc 0.0.0

Note, since writing this post, Riakc 1.0.0 has already been released and merged into opam. It fixes the below issue of Links (there is a typo in the release notes, 'not' should be 'now'. The source code can be found here. The 1.0.0 version number does not imply any stability or completeness of the library, just that it is not backwards compatible with 0.0.0.

Riakc is a Riak Protobuf client for Ocaml. Riakc uses Jane St Core/Async for concurrency. Riakc is in early development and so far supports a subset of the Riak API. The supported methods are:

  • ping
  • client_id
  • server_info
  • list_buckets
  • list_keys
  • bucket_props
  • get
  • put
  • delete

A note on GET

Links are currently dropped all together in the implementation, so if you read a value with links and write it back, you will have lost them. This will be fixed in the very near future.

As with anything, please feel free to submit issues and pull requests.

The source code can be found here. Riakc is in opam and you can install it by doing opam install riakc.

Usage

There are two API modules in Riakc. Examples of all existing API functions can be found here.

Riakc.Conn

Riakc.Conn provides the API for performing actions on the database. The module interface can be read here.

Riakc.Robj

Riakc.Robj provides the API for objects stored in Riak. The module interface can be read here. Riakc.Conn.get returns a Riakc.Robj.t and Riakc.Conn.put takes one. Robj.t supports representing siblings, however Riakc.Conn.put cannot PUT objects with siblings, this is enforced using phantom types. A value of Riakc.Robj.t that might have siblings is converted to one that doesn't using Riakc.Robj.set_content.

[ANN] Protobuf 0.0.2

Protobuf is an Ocaml library for communicating with Google's protobuf format. It provides a method for writing parsers and builders. There is no protoc support, yet and writing it is not a top goal right now. Protobuf is meant to be fairly lightweight and straight forward to use. The only other Protobuf support for Ocaml I am aware of is through piqi, however that was too heavy for my needs.

Protobuf is meant to be very low level, mostly dealing with representation of values and not semantics. For example, the fixed32 and sfixed32 values are both parsed as Int32.t's. Dealing with being signed or not is left up to the user.

The source code can be viewed here. Protobuf is in opam, to install it opam install protobuf.

The hope is that parsers and builders look reasonably close to the .proto files such that translation is straight forward, at least until protoc support is added. This is an early release and, without a doubt, has bugs in it please submit pull requests and issues.

https://github.com/orbitz/ocaml-protobuf/tree/0.0.2/

Examples

The best collection of examples right now is the tests. An example from the file:

let simple =
  P.int32 1 >>= P.return

let complex =
  P.int32 1           >>= fun num ->
  P.string 2          >>= fun s ->
  P.embd_msg 3 simple >>= fun emsg ->
  P.return (num, s, emsg)

let run_complex str =
  let open Result.Monad_infix in
  P.State.create (Bitstring.bitstring_of_string str)
  >>= fun s ->
  P.run complex s

The builder for this message looks like:

let build_simple i =
  let open Result.Monad_infix in
  let b = B.create () in
  B.int32 b 1 i >>= fun () ->
  Ok (B.to_string b)

let build_complex (i1, s, i2) =
  let open Result.Monad_infix in
  let b = B.create () in
  B.int32 b 1 i1                 >>= fun () ->
  B.string b 2 s                 >>= fun () ->
  B.embd_msg b 3 i2 build_simple >>= fun () ->
  Ok (B.to_string b)

Thursday, February 7, 2013

[ANN] ocaml-vclock - 0.0.0

I ported some Erlang vector clock code to Ocaml for fun and learning. It's not well tested and it hasn't any performance optimizations. I'm not ready yet but I have some projects in mind to use it so it will likely get fleshed out more.

Vector clocks are a system for determining the partial ordering of events in a distributed environment. You can determine if one value is the ancestor of another, equal, or was concurrently updated. It is one mechanism that distributed databases, such as Riak, use to automatically resolve some conflicts in data while maintaining availability.

The vector clock implementation allows for user defined site id type. It also allows metadata to be encoded in the site id, which is useful if you want your vector clock to be prunable by encoding timestamps in it.

The repo can be found here. If you'd like to learn more about vector clocks read the wikipedia page here. The Riak website also has some content on vector clocks here.

Friday, January 4, 2013

Deconstructing Zed's K&R2 Deconstruction

I recently stumbled upon Zed Shaw's deconstruction of K&R2. The post is well intended but, in my opinion, flawed. I believe Zed fails to make a valid argument and also fails to provide a valid solution to the issue he raises.

The chapter is clearly not finished, so this rebuttal might not be valid at the time of reading.

The Argument

The primary argument is that K&R2 is not an appropriate tool for learning C in our modern age. The example given is a function called copy which is effectively strcpy. Zed points out that if the function is not given a valid string, as C defines it, the behaviour of the function is undefined.

This provides a formal proof that the function is defective because there are possible inputs that causes the while-loop to run forever or overflow the target.

When presented with the rebuttal that the cases where it fails are not valid C strings, the response is that it doesn't matter:

... but I'm saying the function is defective because most of the possible inputs cause it to crash the software.

The problem with this mindset is there's no way to confirm that a C string is valid.

Also:

Another argument in favor of this copy() function is when the proponents of K&RC state that you are "just supposed to not use bad strings". Despite the mountains of empirical evidence that this is impossible in C code...

To reiterate, the problem with copy is that:

  1. It depends on valid C strings to operate correctly
  2. C strings are impossible to validate at run-time
  3. The behaviour of copy is undefined for most values that are possible to be put into a char*

Proposed Solution

The solution is a function called safercopy which takes the lengths of the storages as input, allegedly guaranteeing the termination of safercopy:

In every case the for-loop variant with string length given as arguments will terminate no matter what.

What's Wrong With This

We can write what is wrong with safercopy using the exact same criteria Zed used for copy:

  1. It depends on valid lengths to operate correctly
  2. The lengths are impossible to validate at run-time
  3. The behaviour of safercopy is undefined for most values that are possible to be put into a size_t (I am presuming that the lengths would be a size_t)

Additionally, Zed instills a false confidence in his safercopy. The function is no more guaranteed to terminate than copy when given bad input. Specifically, if the lengths are wrong causing the copy loop to go out of bounds of its storage it could easily overwrite the value of anything, including the lengths and pointer values in the loop its in. It could blow up, it could loop forever, who knows. It's undefined.

Finally, if it is hard to properly handle C strings, why should we think it is any easier to track the length of a string separately? Remember, in C the length of a string is encoded in the string itself by the location of the '\0'. The solution provided by Zed takes the length of the strings as separate input. But he provides no reason to believe developers will get this correct. If the solution proposed had been implementing a safe string library, I might be able to agree.

And that is the crux of the problem. It's not if K&R2 is any good or not, it's that the solution given isn't any better. It doesn't address the faults of C strings. There are known way safely handle C strings, the problem tends to be that its tedious so people get it wrong. Many C strings issues have to do with lack of proper allocation rather than forgetting the '\0'-terminator. In what way does the solution solve this problem?

If the solution given is no better than the problem it's solving, then it isn't a very good solution.

K&R2

Is K&R2 not suitable for teaching people C in this day? It has plenty of faults in it but I don't think this particular article, as it exists now, makes a compelling argument. Nor does it provide anything better than what it's critiquing.

Experiences using Result.t vs Exceptions in Ocaml

Disclaimer: I have not compiled any of the example code in this post. Mostly because they are snippets meant to illustrate a point rather than be complete on their own. If they have any errors then apologies.

Previously I gave an introduction to return values vs exceptions in Ocaml. But a lot of ideas in software engineering sound good, how does this particular one work out in real software?

I have used this style in two projects. The first is a project that was originally written using exceptions and I have converted most of it to using return values. The second is one that was written from the start using return values. They can be found here and here. I make no guarantees about the quality of the code, in fact I believe some of it to be junk. These are just my subjective opinions in writing software with a particular attribute.

The Good

Expected Result

The whole system worked as expected. I get compile-time errors for all failure cases I do not handle. This has helped me catch some failure cases I had forgotten about previously, some of which would require an unlikely chain of events to hit, which would have made finding in a test harder, but obviously not impossible. In particular, ParaMugsy is (although the current rewrite does not cover this yet) meant to run in a distributed environment, which increases the cost of errors. Both in debugging and reproducing. In the case of opass, writing the DB is important to get right. Missing handling a failure here can mean the users database of passwords can be lost, a tragic event.

Not Cumbersome

In the Introduction I showed that for a simple program, return-values are no more cumbersome than exceptions. In these larger projects the same holds. This shouldn't really be a surprise though, as the monadic operators actually simulate the exact flow of exception code. But the 'not cumbersome' is half of a lie, which is explained more below.

Refactoring Easier

Ocaml is a great language when it comes to refactoring. Simply make the change you want and iterate on compiler errors. This style has made it even easier for me. I can add new failures to my functions and work through the compiler errors to make sure the change is handled in every location.

Works No Matter The Concurrent Framework

The original implementation of ParaMugsy used Lwt. In the rewrite I decided to use Core's Async library. Both are monadic. And both handle exceptions quite differently. Porting functions over that did return-values was much easier because they didn't rely on the framework to handle and propagate failures. Exceptions are tricky in a concurrent framework and concurrency is purely library based in Ocaml rather than being part of the language, which means libraries can choose incompatible ways to handle them. Return-values give one less thing to worry about when porting code or trying to get code to work in multiple frameworks.

The Bad

Prototyping Easier With Exceptions

The whole idea is to make it hard to miss an error case. But that can be annoying when you just want to get something running. Often times we write software in such a way that the success path is the first thing we write and we handle the errors after that. I don't think there is necessarily a good reason for this other than it's much more satisfying to see the results of the hard work sooner rather than later. In this case, my solution is to relax the ban on exceptions temporarily. Any place that I will return an Error I instead write failwith "not yet implemented". That way there is an easily grepable string to ensure I have replaced all exceptions with Error's when I am done. This is an annoyance but thankfully with a fairly simple solution.

Cannot Express All Invariants In Type System

Sometimes there are sections of code where I know something is true, but it is not expressible in the type system. For example, perhaps I have a data structure that updates multiple pieces of information together. I know when I access one piece of information it will be in the other place. Or perhaps I have a pattern match that I need to handle due to exhaustiveness but I know that it cannot happen given some invariants I have established earlier. In the case where I am looking up data that I know will exist, I will use a lookup function that can throw an exception if it is easiest. In the case where I have a pattern match that I know will never happen, I use assert. But note, these are cases where I have metaphysical certitude that such events will not happen. Not cases where I'm just pretty sure they work.

Many Useful Libraries Throw Exceptions

Obviously a lot of libraries throw exceptions. Luckily the primary library I use is Jane St's Core Suite, where they share roughly the same aversion of exceptions. Some functions still do throw exceptions though, most notably In_channel.with_file and Out_channel.with_file. This can be solved by wrapping those functions in return-value ones. The problem comes in: what happens when the function being wrapped is poorly documented or at some point can throw more exceptional cases than when it was originally wrapped. One option is to always catch _ and turn it into a fairly generic variant type. Or maybe a function only has a few logical failure conditions so collapsing them to a few variant types makes sense. I'm not aware of any really good solution here.

A Few Examples

There are a few transformations that come up often when converting exception code to return-value code. Here are some in detail.

Building Things

It's common to want to do some work and then construct a value from it. In exception-land that is as simple, just something like Constructor (thing_that_may_throw_exception ()). This doesn't work with return-values. Instead we have to do what we did in the Introduction post. Here is an example:

let f () =
  let open Result.Monad_infix in
  thing_that_may_fail () >>= fun v ->
  Ok (Constructor v)

Looping

Some loops cannot be written in their most obvious style. Consider an implementation of map that expects the function passed to it to use Result.t to signal failures. The very naive implementation of map is:

let map f = function
  | []    -> []
  | x::xs -> (f x)::(map xs)

There are two ways to write this. The first requires two passes over the elements. The first pass applies the function and the second one checks which value each function returned or the first error that was hit.

let map f l =
  Result.all (List.map f l)

Result.all has the type ('a, 'b) Core.Std.Result.t list -> ('a list, 'b) Core.Std.Result.t

The above is simple but could be inefficient. The entire map is preformed regardless of failure and then walked again. If the function being applied is expensive this could be a problem. The other solution is a pretty standard pattern in Ocaml of using an accumulator and reversing it on output. The monadic operator could be replaced by a match in this example, I just prefer the operator.

let map f l =
  let rec map' f acc = function
    | []    -> Ok (List.rev acc)
    | x::xs -> begin
      let open Result.Monad_infix in
      f x >>= fun v ->
      map' f (v::acc) xs
    end
  in
  map' f [] l

I'm sure someone cleverer in Ocaml probably has a superior solution but this has worked well for me.

try/with

A lot of exception code looks like the following.

let () =
  try
    thing1 ();
    thing2 ();
    thing3 ()
  with
    | Error1 -> handle_error1 ()
    | Error2 -> handle_error2 ()
    | Error3 -> handle_error3 ()

The scheme I use would break this into two functions. The one inside the try and the one handling its result. This might sound heavy but the syntax to define a new function in Ocaml is very light. In my experience this hasn't been a problem.

let do_things () =
  let open Result.Monad_infix in
  thing1 () >>= fun () ->
  thing2 () >>= fun () ->
  thing3

let () =
  match do_things () with
    | Ok _ -> ()
    | Error Error1 -> handle_error1 ()
    | Error Error2 -> handle_error2 ()
    | Error Error3 -> handle_error3 ()

Conclusion

Using return-values instead of exceptions in my Ocaml projects has had nearly the exact output I anticipated. I have compile-time guarantees for handling failure cases and the cost to my code has been minimal. Any difficulties I've run into have had straight forward solutions. In some cases it's simply a matter of thinking about the problems from a new perspective and the solution is clear. I plan on continuing to develop code with these principles and creating larger projects. I believe that this style scales well in larger projects and actually becomes less cumbersome as the project increases since the guarantees can help make it easier to reason about the project.

Thursday, January 3, 2013

Introduction to Result.t vs Exceptions in Ocaml

This post uses Jane St's Core suite. Specifically the Result module. It assumes some basic knowledge of Ocaml. Please check out Ocaml.org for more Ocaml reading material.

There are several articles and blog posts out there arguing for or against return values over exceptions. I'll add to the discussion with my reasons for using return values in the place of exceptions in Ocaml.

What's the difference?

Why does the debate even exist? Because each side has decent arguments for why their preference is superior when it comes to writing reliable software. Pro-return-value developers, for example, argue that their code is easier identify if the code is wrong simply by reading it (if it isn't handling a return value of a function, it's wrong), while exception based code requires understanding all of the functions called to determine if and how they will fail. Pro-exception developers argue that it is much harder to get their program into an undefined state because an exception has to be handled or else the program fails, where in return based code one can simply forget to check a function's return value and the program continues on in an undefined state.

I believe that Ocaml has several features that make return values the preferable way to handle errors. Specifically variants, polymorphic variants, exhaustive pattern matching, and a powerful static type system make return values attractive.

This debate is only worth your time if you are really passionate about writing software that has fairly strong guarantees about its quality in the face of errors. For a majority of software, it doesn't matter which paradigm you choose. Most errors will be stumbled upon during debugging and fairly soon after going into production or through writing unit and integration tests. But, tests cannot catch everything. And in distributed and concurrent code rare errors can now become common errors and it can be near impossible to reconstruct the conditions that caused it. But in some cases it is possible to make whole classes of errors either impossible or catchable at compile-time with some discipline. Ocaml is at least one language that makes this possible.

Checked exceptions

A quick aside on checked exceptions, as in Java. Checked exceptions provide some of the functionality I claim is valuable, the main problem with how checked exceptions are implemented in Java (the only language I have any experience in that uses them), is they have a very heavy syntax, to the point where using them can seem too burdensome.

The Claim

The claim is that if one cares about ensuring they are handling all failure cases in their software, return-values are superior to exceptions because, with the help of a good type system, their handling can be validated at compile-time. Ocaml provides a fairly light, non intrusive, syntax to make this feasible.

Good Returns

The goal of a good return value based error handling system is to make sure that all errors are handled at compile-time. This is because there is no way to enforce this at run-time, as an exception does. This is a good reason to prefer exceptions in a dynamically typed language like Python or Ruby, your static analyzers are few and far between.

In C this is generally accomplished by using a linting tool that will report an error if a function's return value is ignored in a call. This is why you might see printf casted to void in some code, to make it clear the return value is meant to be ignored. But a problem with this solution is that it only enforces that the developer handles the return value, not all possible errors. For example, POSIX functions return a value saying the function failed and put the actual failure in errno. How, then, to enforce that all of the possible failures are handled? Without encoding all of that information in a linting tool, the options in C (and most languages) are pretty weak. Linting tools are also separate from the compiler and vary in quality. Writing code that takes proper advantage of a linting tool, in C, is a skill all of its own as well.

Better Returns

Ocaml supports exceptions but the compiler provides no guarantees that the exceptions are actually handled anywhere in the code. So what happens if the documentation of a function is incomplete or a dependent function is changed to add a new exception being thrown? The compiler won't help you.

But Ocaml's rich type system, combined with some discipline, gives you more power than a C linter. The primary strength is that Ocaml lets you encode information in your types. For example, in POSIX many functions return an integer to indicate error. But an int has no interesting meaning to the compiler other than it holds values between INT_MIN and INT_MAX. In Ocaml, we can instead create a type to represent the errors a function can return and the compiler can enforce that all possible errors are handled in some way thanks to exhaustive pattern matching.

An Example

What does all of this look like? Below a contrived example. The goal is to provide a function, called parse_person that takes a string and turns it into a person record. The requirements of the code is that if a valid person cannot be parsed out, the part of the string that failed is specified in the error message.

Here is a version using exceptions, ex1.ml:

open Core.Std

exception Int_of_string of string

exception Bad_line of string
exception Bad_name of string
exception Bad_age of string
exception Bad_zip of string

type person = { name : (string * string)
              ; age  : Int.t
              ; zip  : string
              }

(* A little helper function *)
let int_of_string s =
  try
    Int.of_string s
  with
    | Failure _ ->
      raise (Int_of_string s)

let parse_name name =
  match String.lsplit2 ~on:' ' name with
    | Some (first_name, last_name) ->
      (first_name, last_name)
    | None ->
      raise (Bad_name name)

let parse_age age =
  try
    int_of_string age
  with
    | Int_of_string _ ->
      raise (Bad_age age)

let parse_zip zip =
  try
    ignore (int_of_string zip);
    if String.length zip = 5 then
      zip
    else
      raise (Bad_zip zip)
  with
    | Int_of_string _ ->
      raise (Bad_zip zip)

let parse_person s =
  match String.split ~on:'\t' s with
    | [name; age; zip] ->
      { name = parse_name name
      ; age  = parse_age age
      ; zip  = parse_zip zip
      }
    | _ ->
      raise (Bad_line s)

let () =
  (* Pretend input came from user *)
  let input = "Joe Mama\t25\t11425" in
  try
    let person = parse_person input in
    printf "Name: %s %s\nAge: %d\nZip: %s\n"
      (fst person.name)
      (snd person.name)
      person.age
      person.zip
  with
    | Bad_line l ->
      printf "Bad line: '%s'\n" l
    | Bad_name name ->
      printf "Bad name: '%s'\n" name
    | Bad_age age ->
      printf "Bad age: '%s'\n" age
    | Bad_zip zip ->
      printf "Bad zip: '%s'\n" zip

ex2.ml is a basic translation of the above but using variants. The benefit is that the type system will ensure that all failure case are handled. The problem is the code is painful to read and modify. Every function that can fail has its own variant type to represent success and error. Composing the functions is painful since every thing returns a different type. We have to create a type that can represent all of the failures the other functions returned. It would be nice if each function could return an error and we could use that value instead. It would also be nice if everything read as a series of steps, rather than pattern matching on a tuple which makes it hard to read.

ex3.ml introduces Core's Result.t type. The useful addition is that we only need to define a type for parse_person. Every other function only has one error condition so we can just encode the error in the Error variant. This is still hard to read, though. The helper functions aren't so bad but the main function is still painful.

While the previous solutions have solved the problem of ensuring that all errors are handled, they introduced the problem of being painful to develop with. The main problem is that nothing composes. The helpers have their own error types and for every call to them we have to check their return and then encompass their error in any function above it. What would be nice is if the compiler could automatically union all of the error codes we want to return from itself and any function it called. Enter polymorphic variants.

ex4.ml Shows the version with polymorphic variants. The nice bit of refactoring we were able to do is in parse_person. Rather than an ugly match, the calls to the helper functions can be sequenced:

let parse_person s =
  match String.split ~on:'\t' s with
    | [name; age; zip] ->
      let open Result.Monad_infix in
      parse_name name >>= fun name ->
      parse_age  age  >>= fun age  ->
      parse_zip  zip  >>= fun zip  ->
      Ok { name; age; zip }
    | _ ->
      Error (`Bad_line s)

Don't worry about the monad syntax, it's really just to avoid the nesting to make the sequencing easier on the eyes. Except for the >>=, this looks a lot like code using exceptions. There is a nice linear flow and only the success path is shown. But! The compiler will ensure that all failures are handled.

The final version of the code is ex5.ml. This takes ex4 and rewrites portions of it to be prettier. As a disclaimer, I'm sure someone else would consider writing this differently even with the same restrictions I put on it, I might even write it different on a different day, but this version of the code demonstrates the points I am making.

A few points of comparison between ex1 and ex5:

  • The body of parse_person is definitely simpler and easier to read in the exception code. It is short and concise.
  • The rest of the helper functions are a bit of a toss-up between the exception and return-value code. I think one could argue either direction.
  • The return-value code has fulfilled my requirements in terms of handling failures. The compiler will complain if any failure parse_person could return is not handled. If I add another error type the code will not compile. It also fulfilled the requirements without bloating the code. The return-value code and exception code are roughly the same number of lines. Their flows are roughly equal. But the return-value code is much safer.

Two Points

It's not all sunshine and lollipops. There are two issues to consider:

  • Performance - Exceptions in Ocaml are really, really, fast. Like any performance issue, I suggest altering code only when needed based on measurements and encapsulating those changes as well as possible. This also means if you want to provide a safe and an exception version of a function, you should probably implement the safe version in terms of the exception verson.
  • Discipline - I referred to discipline a few times above. This whole scheme is very easy to mess up with a single mistake: pattern matching on anything (_). The power of exhaustive pattern matching means you need to match on every error individually. This is effectively for the same reason catching the exception base class in other languages is such a bad idea, you lose a lot of information.

Conclusion

The example given demonstrates an important point: code can become much safer at compile time without detriment to its length or readability. The cost is low and the benefit is high. This is a strong reason to prefer a return-value based solution over exceptions in Ocaml.