functional orbitz

Gen_server in Ocaml

2013-12-23T18:17:00.000-05:00

Note, this post is written against the 2.0.1 version of gen_server

Erlang comes with a rich set of small concurrency primitives to make handling and manipulating state easier. The most generic of the frameworks is the gen_server, which is also the most commonly used. A gen_server provides a way to control state over multiple requests. It serializes operations and handles both synchronous and asynchronous communication with clients. The strength of a gen_server is the ability to create multiple lightweight servers inside an application: operations inside each server run serially, while the servers themselves run concurrently with one another.

While it is not possible to provide all of the Erlang semantics in Ocaml, we can create something roughly analogous. We can also get some properties that Erlang cannot give us. In particular, the implementation of gen_server provided here:

Does not have the concept of a process or a process id. A gen_server is an abstract type that is parameterized by a message type.
Uses queues to communicate messages between clients and servers.
Is typesafe: only messages that a gen_server can handle can be sent to it.
Can only communicate with gen_servers in your own process; there is no concept of location transparency.
Only provides an asynchronous communication function, called send, that has pushback. That means a send completes when the gen_server accepts the message, but it will not wait for the gen_server to finish processing the message.
Has the concept of process linking, however it is not preemptive. When a gen_server stops, for any reason, any calls to send will return an error stating the gen_server has closed itself. This will not force the termination of any other gen_servers in Ocaml, but the termination can at least be detected.
Any thrown exceptions are handled by the gen_server framework and result in the gen_server being gracefully terminated.

Relative to Erlang the Ocaml version isn't very impressive; however, it's still a useful technique for encapsulating state in a concurrent environment.

This implementation of gen_server is on top of Jane St's Async. What does it look like? The primary interface looks like this:

val start  :
    'i ->
    ('i, 's, 'm, 'ie, 'he) Server.t ->
    ('m t, [> 'ie init_ret ]) Deferred.Result.t

val stop   :
    'm t ->
    (unit, [> `Closed ]) Deferred.Result.t

val send   :
    'm t ->
    'm ->
    ('m, [> send_ret ]) Deferred.Result.t

The interface is only three functions: start, stop and send.

The start function is a bit hairy-looking, but don't be put off by the server type parameterized on five type variables. It takes two parameters: the first is the initial argument to pass to the gen_server, and the second is the callbacks of the gen_server.
stop takes a gen_server and returns Ok () on success and Error `Closed if the gen_server is not running.
send takes a gen_server and a message. The message must be the same type the gen_server accepts. It returns Ok msg on success and Error `Closed if the gen_server is not running.
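
To make the stop and send behaviour concrete, here is a minimal sketch in plain OCaml with no Async. It is not the gen_server library: Mini_server, its start signature, the describe helper, and the msg type are all invented for illustration, and messages are handled synchronously instead of going through a queue. It only models three things described above: state is hidden behind the server, send returns Ok msg once the message is accepted, and after a stop or an exception every call returns Error `Closed.

```ocaml
(* Hypothetical model of the semantics, NOT the real Gen_server module. *)
module Mini_server : sig
  type 'm t
  val start : init:'s -> handle:('s -> 'm -> 's) -> 'm t
  val send : 'm t -> 'm -> ('m, [> `Closed ]) result
  val stop : 'm t -> (unit, [> `Closed ]) result
end = struct
  (* [None] means the server has closed itself. *)
  type 'm t = { mutable deliver : ('m -> unit) option }

  let start ~init ~handle =
    let state = ref init in
    let t = { deliver = None } in
    t.deliver <- Some (fun msg ->
      (* An exception in the handler closes the server. *)
      try state := handle !state msg with _ -> t.deliver <- None);
    t

  let send t msg =
    match t.deliver with
    | None -> Error `Closed
    | Some deliver -> deliver msg; Ok msg

  let stop t =
    match t.deliver with
    | None -> Error `Closed
    | Some _ -> t.deliver <- None; Ok ()
end

type msg = Incr | Boom

let server =
  Mini_server.start ~init:0 ~handle:(fun count msg ->
    match msg with
    | Incr -> count + 1
    | Boom -> failwith "blowin' up")

let describe = function
  | Ok _ -> "accepted"
  | Error `Closed -> "closed"

let first = describe (Mini_server.send server Incr)  (* accepted *)
let second = describe (Mini_server.send server Boom) (* accepted, then the handler raises *)
let third = describe (Mini_server.send server Incr)  (* the server has closed itself *)
```

Only values of type msg can be passed to send here, which is the type safety the real interface gets from 'm t.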

The most confusing part is probably the ('i, 's, 'm, 'ie, 'he) Server.t. This is the type that the implementer of the gen_server writes. It is three callbacks: init, handle_call and terminate. Let's break down the type variables:

'i - This is the type of the variable that you pass to start and will be given to the init callback.
's - This is the type of the state that the gen_server will encapsulate. This will be passed to handle_call and terminate. The handle_call callback will manipulate the state and return a new one.
'm - This is the message type that the gen_server will accept.
'ie - This is the type of error that the init callback can return.
'he - This is the type of error that the handle_call callback can return.

While the server type looks complicated, as you can see the variables together carry all of the type information needed to understand a gen_server. So what does a server look like? While the types are big it's actually not too bad. Below is an example of a call to start. The full source code can be found here.

(* Package the callbacks *)
let callbacks =
  { Gen_server.Server.init; handle_call; terminate }

let start () =
  Gen_server.start () callbacks

And what do the callbacks look like? Below is a simplified version of what a set of callbacks could look like, with comments.

module Resp = Gen_server.Response

module Gs = Gen_server

(* Callbacks *)
let init self init =
  Deferred.return (Ok ())

let handle_call self state = function
  | Msg.Msg1 ->
    (* Success *)
    Deferred.return (Resp.Ok state)
  | Msg.Msg2 ->
    (* Error *)
    Deferred.return (Resp.Error (reason, state))
  | Msg.Msg3 ->
    (* Exceptions can be thrown too *)
    failwith "blowin' up"

(* Exceptions thrown from terminate are silently ignored *)
let terminate reason state =
   match reason with
     | Gs.Server.Normal ->
       (* Things exited normally *)
       Deferred.unit
     | Gs.Server.Exn exn ->
       (* An exception was thrown *)
       Deferred.unit
     | Gs.Server.Error err ->
       (* User returned an error *)
       Deferred.unit

There isn't much more to it than that.

A functor implementation is also provided. I prefer the non-functor version because I think it's a bit less verbose and easier to work with, but some people like functors.

How To Get It?

You can install gen_server through opam, simply: opam install gen_server

The source can be found here. Only the tags should be trusted as working.

There are a few examples here.

Enjoy.

Experimenting in API Design: Riakc

2013-07-09T14:37:00.000-04:00

Disclaimer: Riakc's API is in flux so not all of the code here is guaranteed to work by the time you read this post. However the general principles should hold.

While not perfect, Riakc attempts to provide an API that is very hard to use incorrectly, and hopefully easy to use correctly. The idea being that using Riakc incorrectly will result in a compile-time error. Riakc derives its strength from being written in Ocaml, a language with a very expressive type system. Here are some examples of where I think Riakc is successful.

Siblings

In Riak, when you perform a GET you can get back multiple values associated with a single key. These are known as siblings. A PUT, however, can only associate one value with a key. It is still convenient to use the same object type for both GET and PUT; in the case of Riakc, that is a Riakc.Robj.t. But what happens if you create a Robj.t with siblings and try to PUT it? In the Python client you will get a runtime error. Riakc solves this by using phantom types. A Robj.t isn't actually just that, it's a 'a Robj.t. The API requires that 'a be something specific at different parts of the code. Here is the simplified type for GET:

val get :
  t ->
  b:string ->
  string ->
  ([ `Maybe_siblings ] Robj.t, error) Deferred.Result.t

And here is the simplified type for PUT:

val put :
  t ->
  b:string ->
  ?k:string ->
  [ `No_siblings ] Robj.t ->
  (([ `Maybe_siblings ] Robj.t * key), error) Deferred.Result.t

The important part of the API is that GET returns a [ `Maybe_siblings ] Robj.t and PUT takes a [ `No_siblings ] Robj.t. How does one convert something that might have siblings to something that definitely doesn't? With Riakc.Robj.set_content:

val set_content  : Content.t -> 'a t -> [ `No_siblings ] t

set_content takes any kind of Robj.t and a single Content.t and produces a [ `No_siblings ] Robj.t, because if you set the contents to one value you obviously cannot have siblings. Now the type system can ensure that any call to PUT must have a set_content prior to it.
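
The phantom type trick can be shown in a few lines of plain OCaml. This is a sketch under assumptions, not Riakc's implementation: Robj_sketch, of_siblings, and the string-based put are hypothetical stand-ins, and the contents are plain strings instead of Content.t values.

```ocaml
module Robj_sketch : sig
  (* ['a] never appears in the representation; it only tags the value. *)
  type 'a t
  val of_siblings : string list -> [ `Maybe_siblings ] t
  val set_content : string -> 'a t -> [ `No_siblings ] t
  val put : [ `No_siblings ] t -> string
end = struct
  type 'a t = string list
  let of_siblings values = values
  let set_content value _ = [ value ]
  (* Stand-in for the real I/O: just render what would be written. *)
  let put contents = "PUT " ^ String.concat "," contents
end

(* What a GET might hand back: two conflicting values. *)
let fetched = Robj_sketch.of_siblings [ "v1"; "v2" ]

(* Robj_sketch.put fetched   <- rejected by the type checker *)

(* Resolving to a single value changes the tag, so put is allowed. *)
let resolved = Robj_sketch.set_content "v1" fetched
let stored = Robj_sketch.put resolved
```

Because the type is abstract outside the module, the only way to obtain a [ `No_siblings ] value is to go through set_content.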

Setting 2i

If you use the LevelDB backend you can use secondary indices, known as 2i, which allow you to find a set of keys based on some mapping. When you create an object you specify the mappings to which it belongs. Two types are supported in Riak: bin and int. And two query types are supported: equal and range. For example, if you encoded the time as an int you could use a range query to find all those keys that occurred within a range of times.

Riak encodes the type of the index in the name. As an example, if you want to allow people to search by a field called "foo" which is a binary secondary index, you would name that index "foo_bin". In the Python Riak client, one sets an index with something like the following code:

obj.add_index('field1_bin', 'val1')
obj.add_index('field2_int', 100000)

In Riakc, the naming convention is hidden from the user. Instead, the name the field will become is encoded in the value. The Python code looks like the following in Riakc:

let module R = Riakc.Robj in
let index1 =
  R.index_create
    ~k:"field1"
    ~v:(R.Index.String "val1")
in
let index2 =
  R.index_create
    ~k:"field2"
    ~v:(R.Index.Integer 100000)
in
R.set_content
  (R.Content.set_indices [index1; index2] content)
  robj

When the Robj.t is written to the DB, "field1" and "field2" will be transformed into their appropriate names.

Reading from Riak results in the same translation happening in reverse. If Riakc cannot determine the type of the value from the field name, for example if Riak gets a new index type, the field keeps the exact name it had in Riak and the value is a Riakc.Robj.Index.Unknown string.

In this way, we are guaranteed at compile-time that the name of the field will always match its type.
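
As a rough illustration of deriving the name from the value, here is a self-contained sketch. The Index module below is hypothetical and only borrows the constructor names String, Integer and Unknown from the text; wire_name, of_wire and the helper functions are invented, and treating an unparseable integer as Unknown is an assumption of this sketch.

```ocaml
module Index = struct
  type value =
    | String of string
    | Integer of int
    | Unknown of string

  let ends_with ~suffix s =
    let n = String.length s and m = String.length suffix in
    n >= m && String.sub s (n - m) m = suffix

  let chop ~suffix s =
    String.sub s 0 (String.length s - String.length suffix)

  (* Writing: the suffix is chosen by the constructor, never by the user. *)
  let wire_name key = function
    | String _ -> key ^ "_bin"
    | Integer _ -> key ^ "_int"
    | Unknown _ -> key

  (* Reading: strip a known suffix, otherwise keep the name untouched. *)
  let of_wire name raw =
    if ends_with ~suffix:"_bin" name then
      (chop ~suffix:"_bin" name, String raw)
    else if ends_with ~suffix:"_int" name then
      (match int_of_string_opt raw with
       | Some i -> (chop ~suffix:"_int" name, Integer i)
       | None -> (name, Unknown raw))
    else (name, Unknown raw)
end

let written = Index.wire_name "field1" (Index.String "val1")
let read_back = Index.of_wire "field2_int" "100000"
```

Since the suffix is computed from the constructor, there is no way to pair an "_int" name with a string value.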

2i Searching

With objects containing 2i entries, it is possible to search by values in those fields. Riak allows for searching fields by their exact value or ranges of values. While it's unclear from the Riak docs, Riakc enforces the two values in a range query are of the same type. Also, like in setting 2i values, the field name is generated from the type of the value. It is more verbose than the Python client but it enforces constraints.

Here is a Python 2i search followed by the equivalent search in Riakc.

results = client.index('mybucket', 'field1_bin', 'val1', 'val5').run()

Riakc.Conn.index_search
  conn
  ~b:"mybucket"
  ~index:"field1"
  (range_string
     ~min:"val1"
     ~max:"val5"
     ~return_terms:false)
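
One way to see how the type of a query can force both ends of a range to agree is the sketch below. The query type and index_name function are hypothetical and are not Riakc's actual definitions; they only illustrate the constraint.

```ocaml
(* Each range constructor fixes one type for both bounds. *)
type query =
  | Eq_string of string
  | Eq_int of int
  | Range_string of { min : string; max : string }
  | Range_int of { min : int; max : int }

(* As with setting 2i values, the index name follows from the query. *)
let index_name field = function
  | Eq_string _ | Range_string _ -> field ^ "_bin"
  | Eq_int _ | Range_int _ -> field ^ "_int"

let q = Range_string { min = "val1"; max = "val5" }
let name = index_name "field1" q

(* Range_string { min = "val1"; max = 5 }   <- does not type check *)
```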

Conclusion

It's a bit unfair comparing an Ocaml API to a Python one, but hopefully this has demonstrated that with a reasonable type system one can express safe and powerful APIs without being inconvenient.

Riakc In Five Minutes

2013-07-04T13:01:00.000-04:00

This is a simple example using Riakc to PUT a key into a Riak database. It assumes that you already have a Riak database up and running.

First you need to install riakc. Simply do: opam install riakc. As of this writing, the latest version of riakc is 2.0.0 and the code given depends on that version.

Now, the code. The following is a complete CLI tool that will PUT a key and print back the result from Riak. It handles all errors that the library can generate as well as outputting siblings correctly.

(*
 * This example is valid for version 2.0.0, and possibly later
 *)
open Core.Std
open Async.Std

(*
 * Take a string of bytes and convert them to hex string
 * representation
 *)
let hex_of_string =
  String.concat_map ~f:(fun c -> sprintf "%02X" (Char.to_int c))

(*
 * An Robj can have multiple values in it, each one with its
 * own content type, encoding, and value.  This just prints
 * the value, which is a string blob
 *)
let print_contents contents =
  List.iter
    ~f:(fun content ->
      let module C = Riakc.Robj.Content in
      printf "VALUE: %s\n" (C.value content))
    contents

let fail s =
  printf "%s\n" s;
  shutdown 1

let exec () =
  let host = Sys.argv.(1) in
  let port = Int.of_string Sys.argv.(2) in
  (*
   * [with_conn] is a little helper function that will
   * establish a connection, run a function on the connection
   * and tear it down when done
   *)
  Riakc.Conn.with_conn
    ~host
    ~port
    (fun c ->
      let module R = Riakc.Robj in
      let content  = R.Content.create "some random data" in
      let robj     = R.create [] |> R.set_content content in
      (*
       * Put takes a bucket, a key, and an optional list of
       * options.  In this case we are setting the
       * [Return_body] option which returns what the key
       * looks like after the put.  It is possible that
       * siblings were created.
       *)
      Riakc.Conn.put
        c
        ~b:"test_bucket"
        ~k:"test_key"
        ~opts:[Riakc.Opts.Put.Return_body]
        robj)

let eval () =
  exec () >>| function
    | Ok (robj, key) -> begin
      (*
       * [put] returns a [Riakc.Robj.t] and a [string
       * option], which is the key if Riak had to generate
       * it
       *)
      let module R = Riakc.Robj in
      (*
       * Extract the vclock, if it exists, and convert it
       * to something printable
       *)
      let vclock =
        Option.value
          ~default:"<none>"
          (Option.map ~f:hex_of_string (R.vclock robj))
      in
      let key = Option.value ~default:"<none>" key in
      printf "KEY: %s\n" key;
      printf "VCLOCK: %s\n" vclock;
      print_contents (R.contents robj);
      shutdown 0
    end
    (*
     * These are the various errors that can be returned.
     * Many of them come directly from the ProtoBuf layer
     * since there aren't really any more semantics to apply
     * to the data if it matches the PB frame.
     *)
    | Error `Bad_conn           -> fail "Bad_conn"
    | Error `Bad_payload        -> fail "Bad_payload"
    | Error `Incomplete_payload -> fail "Incomplete_payload"
    | Error `Notfound           -> fail "Notfound"
    | Error `Incomplete         -> fail "Incomplete"
    | Error `Overflow           -> fail "Overflow"
    | Error `Unknown_type       -> fail "Unknown_type"
    | Error `Wrong_type         -> fail "Wrong_type"

let () =
  ignore (eval ());
  never_returns (Scheduler.go ())

Now compile it:

ocamlfind ocamlopt -thread -I +camlp4 -package riakc -c demo.ml
ocamlfind ocamlopt -package riakc -thread -linkpkg \
-o demo.native demo.cmx

Finally, you can run it: ./demo.native hostname port

...And More Detail

The API for Riakc is broken up into two modules: Riakc.Robj and Riakc.Conn with Riakc.Opts being a third helper module. Below is in reference to version 2.0.0 of Riakc.

Riakc.Robj

Riakc.Robj defines a representation of an object stored in Riak. Robj is completely pure code. The API can be found here.

Riakc.Conn

This is the I/O layer. All interaction with the actual database happens through this module. Riakc.Conn is somewhat clever in that it has a compile-time requirement that you have called Riakc.Robj.set_content on any value you want to PUT. This guarantees you have resolved all siblings, somehow. Its API can be found here.

Riakc.Opts

Finally, various options are defined in Riakc.Opts. These are options that GET and PUT take. Not all of them are actually supported but support is planned. The API can be viewed here.

Hopefully Riakc has a fairly straightforward API. While the example code might be longer than other clients, it is complete and correct (I hope).

Setting Up NixOps On Mac OS X With VirtualBox

2013-05-25T13:40:00.000-04:00

Disclaimer

I am a new user of nixops, so I cannot guarantee these directions work for everyone. I have successfully set it up on two machines.

Preamble

The following directions describe how to set up nixops on a Mac OS X machine in VirtualBox. By the end of this you should be able to spawn as many NixOS instances in VirtualBox as your machine can handle. NixOps is similar to Vagrant, except it deploys NixOS instances. It can deploy them locally, using VirtualBox, or remotely using EC2. It allows you to deploy clusters of machines, automatically allowing them to communicate with each other. At a high level, nixops deploys an instance by doing the following:

It builds the environment you ask for on another NixOS instance. This could be your local machine or a build server.
It creates a VM on the service or system you defined (VirtualBox, EC2, etc).
It uploads the environment you've defined to the machine.

The main problem is that nixops must build the environment on the same OS and arch it is deploying to. NixOS is a Linux distro, which means you cannot build the environment on your Mac. The minor problem is that, by default, the OS X filesystem that everyone gets is case insensitive, and that doesn't play well with nix, the package manager.

This post will accomplish the following:

Install and setup VirtualBox.
If your OS X file system is case insensitive (assume it is if you haven't done anything to change it), we will create a loopback mount to install nix on.
Install nix on OS X.
An initial NixOS VirtualBox instance will be created to bootstrap the process and act as a distributed build server.
Create a user on the build system.
Set up signing keys, so we can copy environments between build server, host, and deployed VM.
Set up local nix to use this VM as a build server.
Deploy a VM.

1. Install VirtualBox

Download VirtualBox and install it. Just follow the directions. The only interesting thing you have to do is make sure you have the vboxnet0 adapter set up in networking. To do this:

Start VirtualBox.
Go to preferences (Cmd-,).
Click on Network.
If vboxnet0 is not present, add it by clicking the green +.
Edit vboxnet0 and make sure DHCP Server is turned on. The settings I use are below.

Server Address: 192.168.56.100
Server Mask: 255.255.255.0
Lower Address Bound: 192.168.56.101
Upper Address Bound: 192.168.56.254

2. Creating a case-sensitive file system

Unless you have explicitly changed it, your OS X machine likely has a case insensitive file system. This means nix cannot build some packages. The method I have chosen to get around this is to create a loopback filesystem and mount that.

Create an image. I have been using a 5GB one successfully, but if you plan on being a heavy user of nix, you should make it larger.
hdiutil create ~/nix-loopback -megabytes 5000 -ov -type UDIF
Load it but do not mount:
hdiutil attach -imagekey diskimage-class=CRawDiskImage -nomount ~/nix-loopback.dmg
Determine which disk and partition your newly created image corresponds to. Specifically you want to find the image that corresponds to the Apple_HFS entry you just created. It will probably be something like disk2s2, but could be anything.
diskutil list
Create a case-sensitive file system on this partition:
newfs_hfs -s /dev/disk2s2
Make the mountpoint:
sudo mkdir /nix
Mount it:
sudo mount -t hfs /dev/disk2s2 /nix

At this point if you run mount you should see something mounted on /nix.

NOTE: I don't know how to make this mount automatically on reboot, so you will need to attach and mount the image again by hand if you want to use nix after restarting your system.

3. Install Nix

Download the binary nix darwin package from nixos.org.
Go to root:
cd /
Untar nix:
sudo tar -jxvf /path/to/nix-1.5.2-x86_64-darwin.tar.bz2
Chown it to your user:
sudo chown -R your-user /nix
Finish the install:
nix-finish-install
nix-finish-install will print out some instructions; you should copy the 'source' line to your ~/.profile and run it in your current shell (and in any other open shell you plan to use nix in without restarting it).
Delete the installer:
sudo rm /usr/bin/nix-finish-install

4. Setup Nix

Add the nixos channel:
nix-channel --add http://nixos.org/releases/nixos/channels/nixos-unstable
Update:
nix-channel --update
Install something:
nix-env -i tmux

5. Install NixOps

Set NIX_PATH:
export NIX_PATH=/nix/var/nix/profiles/per-user/`whoami`/channels/nixos
Get nixops:
git clone git://github.com/NixOS/nixops.git
cd nixops
Install:
nix-env -f . -i nixops
Verify it is installed:
nixops --version

6. Setup Distributed Builds

When deploying an instance, nixops needs to build the environment somewhere, then it will transfer it to the instance. In order to do this, it needs an already existing NixOS instance to build on. If you were running NixOS already, this would be the machine you are deploying from. To accomplish this, you need a NixOS instance running in a VM. Eventually nixops will probably accomplish this for you, but for now it needs to be done manually. Luckily, installing NixOS on VirtualBox is pretty straightforward.

Install a NixOS on VirtualBox from the directions here. This doesn't need any special settings, just SSH.
Setup a port forward so you can SSH into the machine. I'll assume this port forward is 3223.
Make a user called 'nix' on the VM. This is the user that we will SSH through for building. The name of the user doesn't matter, but these directions will assume its name is 'nix'.
On OS X, create two pairs of passwordless SSH keys. One pair will be the login for the nix user. The other will be signing keys.
Install the login public key.
On OS X, create /etc/nix/ (mkdir /etc/nix)
Copy the private signing key to /etc/nix/signing-key.sec. Make sure this is owned by the user you'll be running nixops as and is readable only by that user.
Create a public signing key from your private signing key using openssl. This needs to be in whatever format openssl produces which is not the same as what ssh-keygen created. This output should be in /etc/nix/signing-key.pub. The owner and permissions don't matter as long as the user you'll run nixops as can read it.
openssl rsa -in /etc/nix/signing-key.sec -pubout > /etc/nix/signing-key.pub
Copy the signing keys to the build server, putting them in the same location. Make sure the nix user owns the private key and is the only one that can read it.
Tell nix to do distributed builds:
export NIX_BUILD_HOOK=$HOME/.nix-profile/libexec/nix/build-remote.pl
Tell the distributed builder where to store load content:
export NIX_CURRENT_LOAD=/tmp/current-load
mkdir /tmp/current-load

Go into a directory you can create files in and write the list of remote build machines:

cat <<EOF > remote-systems.conf
nix@nix-build-server x86_64-linux /Users/`whoami`/.ssh/id_rsa 1 1
EOF

Tell the remote builder where to find machine information:
export NIX_REMOTE_SYSTEMS=$PWD/remote-systems.conf
Add an entry to ~/.ssh/config so the fake host 'nix-build-server' resolves to your actual VM:
```
Host nix-build-server
    HostName localhost
    Port 3223
```

7. Start An Instance

Create your machine's nix expression:

cat <<EOF > test-vbox.nix
{
  test = 
    { config, pkgs, ... }:
    { deployment.targetEnv = "virtualbox";
      deployment.virtualbox.memorySize = 512; # megabytes
    };
}
EOF

Create a machine instance named test:
nixops create ./test-vbox.nix --name test
Deploy it:
nixops deploy -d test

This could take a while, and at some points it might not seem like it's doing anything because it's waiting for a build or a transfer. It will push around a fair amount of data. After all is said and done you should be able to do nixops ssh -d test test to connect to it.

Troubleshooting

I do a deploy and it sits forever waiting for SSH - You probably forgot to set up your vboxnet0 adapter properly. See Section 1.
It dies while building saying a store isn't signed - Only root can import unsigned stores; this means your signing keys aren't set up properly. Double check your permissions.

Other problems? Post them in the comments and I'll add them to the list.

Known Bugs

nixops stop -d test never returns - I've only experienced this on one of my installations. It is okay, though. Wait a bit and exit out of the command, then you can run any command as if stop succeeded.
My Mac grey-screens of death! - This has happened to me once. I updated my version of VirtualBox and installed any updates from Apple, and I have not experienced it again.

[ANN] Riakc 0.0.0

2013-03-17T10:42:00.001-04:00

Note, since writing this post, Riakc 1.0.0 has already been released and merged into opam. It fixes the below issue of Links (there is a typo in the release notes, 'not' should be 'now'). The source code can be found here. The 1.0.0 version number does not imply any stability or completeness of the library, just that it is not backwards compatible with 0.0.0.

Riakc is a Riak Protobuf client for Ocaml. Riakc uses Jane St Core/Async for concurrency. Riakc is in early development and so far supports a subset of the Riak API. The supported methods are:

ping
client_id
server_info
list_buckets
list_keys
bucket_props
get
put
delete

A note on GET

Links are currently dropped altogether in the implementation, so if you read a value with links and write it back, you will have lost them. This will be fixed in the very near future.

As with anything, please feel free to submit issues and pull requests.

The source code can be found here. Riakc is in opam and you can install it by doing opam install riakc.

Usage

There are two API modules in Riakc. Examples of all existing API functions can be found here.

Riakc.Conn

Riakc.Conn provides the API for performing actions on the database. The module interface can be read here.

Riakc.Robj

Riakc.Robj provides the API for objects stored in Riak. The module interface can be read here. Riakc.Conn.get returns a Riakc.Robj.t and Riakc.Conn.put takes one. Robj.t supports representing siblings; however, Riakc.Conn.put cannot PUT objects with siblings, and this is enforced using phantom types. A value of Riakc.Robj.t that might have siblings is converted to one that doesn't using Riakc.Robj.set_content.

[ANN] Protobuf 0.0.2

2013-03-17T10:21:00.000-04:00

Protobuf is an Ocaml library for communicating with Google's protobuf format. It provides a method for writing parsers and builders. There is no protoc support yet, and writing it is not a top goal right now. Protobuf is meant to be fairly lightweight and straightforward to use. The only other Protobuf support for Ocaml I am aware of is through piqi; however, that was too heavy for my needs.

Protobuf is meant to be very low level, mostly dealing with representation of values and not semantics. For example, the fixed32 and sfixed32 values are both parsed as Int32.t's. Dealing with being signed or not is left up to the user.
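
To illustrate what "left up to the user" can look like, here is a small sketch using only the standard library. The helper unsigned_of_int32 is hypothetical and not part of the Protobuf library; it reinterprets the 32 bits of an Int32.t as an unsigned value by widening to Int64.t.

```ocaml
(* Hypothetical helper: read the same 32 bits as an unsigned number. *)
let unsigned_of_int32 (x : Int32.t) : Int64.t =
  Int64.logand (Int64.of_int32 x) 0xFFFF_FFFFL

(* All 32 bits set, as a parser would hand it back for either field type. *)
let raw = -1l

(* Treated as sfixed32 the value is used as-is. *)
let as_sfixed32 = Int32.to_string raw

(* Treated as fixed32 the caller has to reinterpret it. *)
let as_fixed32 = Int64.to_string (unsigned_of_int32 raw)
```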

The source code can be viewed here. Protobuf is in opam; to install it, run opam install protobuf.

The hope is that parsers and builders look reasonably close to the .proto files such that translation is straightforward, at least until protoc support is added. This is an early release and, without a doubt, has bugs in it, so please submit pull requests and issues.

https://github.com/orbitz/ocaml-protobuf/tree/0.0.2/

Examples

The best collection of examples right now is the tests. An example from the file:

let simple =
  P.int32 1 >>= P.return

let complex =
  P.int32 1           >>= fun num ->
  P.string 2          >>= fun s ->
  P.embd_msg 3 simple >>= fun emsg ->
  P.return (num, s, emsg)

let run_complex str =
  let open Result.Monad_infix in
  P.State.create (Bitstring.bitstring_of_string str)
  >>= fun s ->
  P.run complex s

The builder for this message looks like:

let build_simple i =
  let open Result.Monad_infix in
  let b = B.create () in
  B.int32 b 1 i >>= fun () ->
  Ok (B.to_string b)

let build_complex (i1, s, i2) =
  let open Result.Monad_infix in
  let b = B.create () in
  B.int32 b 1 i1                 >>= fun () ->
  B.string b 2 s                 >>= fun () ->
  B.embd_msg b 3 i2 build_simple >>= fun () ->
  Ok (B.to_string b)

[ANN] ocaml-vclock - 0.0.0

2013-02-07T16:52:00.000-05:00

I ported some Erlang vector clock code to Ocaml for fun and learning. It's not well tested and it hasn't had any performance optimizations. It's not ready yet, but I have some projects in mind to use it, so it will likely get fleshed out more.

Vector clocks are a system for determining the partial ordering of events in a distributed environment. You can determine if one value is the ancestor of another, equal, or was concurrently updated. It is one mechanism that distributed databases, such as Riak, use to automatically resolve some conflicts in data while maintaining availability.
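
The ordering check can be sketched in a few lines. This is not the ocaml-vclock API: the names Site_map, increment, leq and compare_clocks are invented for illustration, site ids are assumed to be plain strings, and there is no metadata or pruning.

```ocaml
module Site_map = Map.Make (String)

type order = Equal | Ancestor | Descendant | Concurrent

(* A missing site counts as zero. *)
let count site clock =
  match Site_map.find_opt site clock with
  | Some n -> n
  | None -> 0

let increment site clock = Site_map.add site (count site clock + 1) clock

(* [leq a b]: every counter in [a] is at most the matching counter in [b]. *)
let leq a b = Site_map.for_all (fun site n -> n <= count site b) a

let compare_clocks a b =
  match leq a b, leq b a with
  | true, true -> Equal
  | true, false -> Ancestor     (* a happened before b *)
  | false, true -> Descendant   (* b happened before a *)
  | false, false -> Concurrent  (* neither saw the other's update *)

let base = increment "a" Site_map.empty
let left = increment "a" base   (* site a updates again *)
let right = increment "b" base  (* site b updates from the same base *)
```

Here base is an ancestor of both left and right, while left and right are concurrent with each other, which is the case a database would surface as a conflict.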

The vector clock implementation allows for a user-defined site id type. It also allows metadata to be encoded in the site id, which is useful if you want your vector clock to be prunable by encoding timestamps in it.

The repo can be found here. If you'd like to learn more about vector clocks read the Wikipedia page here. The Riak website also has some content on vector clocks here.

Deconstructing Zed's K&R2 Deconstruction

2013-01-04T19:26:00.000-05:00

I recently stumbled upon Zed Shaw's deconstruction of K&R2. The post is well intended but, in my opinion, flawed. I believe Zed fails to make a valid argument and also fails to provide a valid solution to the issue he raises.

The chapter is clearly not finished, so this rebuttal might not be valid at the time of reading.

The Argument

The primary argument is that K&R2 is not an appropriate tool for learning C in our modern age. The example given is a function called copy which is effectively strcpy. Zed points out that if the function is not given a valid string, as C defines it, the behaviour of the function is undefined.

This provides a formal proof that the function is defective because there are possible inputs that causes the while-loop to run forever or overflow the target.

When presented with the rebuttal that the cases where it fails are not valid C strings, the response is that it doesn't matter:

... but I'm saying the function is defective because most of the possible inputs cause it to crash the software.

The problem with this mindset is there's no way to confirm that a C string is valid.

Also:

Another argument in favor of this copy() function is when the proponents of K&RC state that you are "just supposed to not use bad strings". Despite the mountains of empirical evidence that this is impossible in C code...

To reiterate, the problem with copy is that:

It depends on valid C strings to operate correctly
C strings are impossible to validate at run-time
The behaviour of copy is undefined for most values that are possible to be put into a char*

Proposed Solution

The solution is a function called safercopy which takes the lengths of the storages as input, allegedly guaranteeing the termination of safercopy:

In every case the for-loop variant with string length given as arguments will terminate no matter what.

What's Wrong With This

We can write what is wrong with safercopy using the exact same criteria Zed used for copy:

It depends on valid lengths to operate correctly
The lengths are impossible to validate at run-time
The behaviour of safercopy is undefined for most values that are possible to be put into a size_t (I am presuming that the lengths would be a size_t)

Additionally, Zed instills a false confidence in his safercopy. The function is no more guaranteed to terminate than copy when given bad input. Specifically, if the lengths are wrong, causing the copy loop to go out of bounds of its storage, it could easily overwrite the value of anything, including the lengths and pointer values in the loop it's in. It could blow up, it could loop forever, who knows. It's undefined.

Finally, if it is hard to properly handle C strings, why should we think it is any easier to track the length of a string separately? Remember, in C the length of a string is encoded in the string itself by the location of the '\0'. The solution provided by Zed takes the length of the strings as separate input. But he provides no reason to believe developers will get this correct. If the solution proposed had been implementing a safe string library, I might be able to agree.

And that is the crux of the problem. It's not whether K&R2 is any good or not, it's that the solution given isn't any better. It doesn't address the faults of C strings. There are known ways to safely handle C strings; the problem tends to be that it's tedious, so people get it wrong. Many C string issues have to do with lack of proper allocation rather than forgetting the '\0'-terminator. In what way does the solution solve this problem?

If the solution given is no better than the problem it's solving, then it isn't a very good solution.

K&R2

Is K&R2 not suitable for teaching people C in this day? It has plenty of faults in it but I don't think this particular article, as it exists now, makes a compelling argument. Nor does it provide anything better than what it's critiquing.

Experiences using Result.t vs Exceptions in Ocaml

2013-01-04T15:37:00.000-05:00

Disclaimer: I have not compiled any of the example code in this post. Mostly because they are snippets meant to illustrate a point rather than be complete on their own. If they have any errors then apologies.

Previously I gave an introduction to return values vs exceptions in Ocaml. But a lot of ideas in software engineering sound good, how does this particular one work out in real software?

I have used this style in two projects. The first is a project that was originally written using exceptions and I have converted most of it to using return values. The second is one that was written from the start using return values. They can be found here and here. I make no guarantees about the quality of the code, in fact I believe some of it to be junk. These are just my subjective opinions in writing software with a particular attribute.

The Good

Expected Result

The whole system worked as expected. I get compile-time errors for all failure cases I do not handle. This has helped me catch some failure cases I had forgotten about previously, some of which would require an unlikely chain of events to hit, which would have made finding them in a test harder, but obviously not impossible. In particular, ParaMugsy is (although the current rewrite does not cover this yet) meant to run in a distributed environment, which increases the cost of errors, both in debugging and reproducing. In the case of opass, writing the DB is important to get right. Failing to handle a failure here can mean the user's database of passwords is lost, a tragic event.

Not Cumbersome

In the Introduction I showed that for a simple program, return-values are no more cumbersome than exceptions. In these larger projects the same holds. This shouldn't really be a surprise though, as the monadic operators actually simulate the exact flow of exception code. But the 'not cumbersome' is half of a lie, which is explained more below.

Refactoring Easier

Ocaml is a great language when it comes to refactoring. Simply make the change you want and iterate on compiler errors. This style has made it even easier for me. I can add new failures to my functions and work through the compiler errors to make sure the change is handled in every location.

Works No Matter The Concurrent Framework

The original implementation of ParaMugsy used Lwt. In the rewrite I decided to use Core's Async library. Both are monadic. And both handle exceptions quite differently. Porting functions over that did return-values was much easier because they didn't rely on the framework to handle and propagate failures. Exceptions are tricky in a concurrent framework and concurrency is purely library based in Ocaml rather than being part of the language, which means libraries can choose incompatible ways to handle them. Return-values give one less thing to worry about when porting code or trying to get code to work in multiple frameworks.

The Bad

Prototyping Easier With Exceptions

The whole idea is to make it hard to miss an error case. But that can be annoying when you just want to get something running. Often times we write software in such a way that the success path is the first thing we write and we handle the errors after that. I don't think there is necessarily a good reason for this other than it's much more satisfying to see the results of the hard work sooner rather than later. In this case, my solution is to relax the ban on exceptions temporarily. Any place that I will return an Error I instead write failwith "not yet implemented". That way there is an easily grepable string to ensure I have replaced all exceptions with Errors when I am done. This is an annoyance but thankfully with a fairly simple solution.

Cannot Express All Invariants In Type System

Sometimes there are sections of code where I know something is true, but it is not expressible in the type system. For example, perhaps I have a data structure that updates multiple pieces of information together. I know when I access one piece of information it will be in the other place. Or perhaps I have a pattern match that I need to handle due to exhaustiveness but I know that it cannot happen given some invariants I have established earlier. In the case where I am looking up data that I know will exist, I will use a lookup function that can throw an exception if it is easiest. In the case where I have a pattern match that I know will never happen, I use assert. But note, these are cases where I have metaphysical certitude that such events will not happen. Not cases where I'm just pretty sure they work.

Many Useful Libraries Throw Exceptions

Obviously a lot of libraries throw exceptions. Luckily the primary library I use is Jane St's Core Suite, where they share roughly the same aversion of exceptions. Some functions still do throw exceptions though, most notably In_channel.with_file and Out_channel.with_file. This can be solved by wrapping those functions in return-value ones. The problem comes in: what happens when the function being wrapped is poorly documented or at some point can throw more exceptional cases than when it was originally wrapped. One option is to always catch _ and turn it into a fairly generic variant type. Or maybe a function only has a few logical failure conditions so collapsing them to a few variant types makes sense. I'm not aware of any really good solution here.
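As a minimal sketch of the wrapping approach, here is a version that assumes only the OCaml standard library rather than Core. The function read_first_line and the read_error type are hypothetical names made up for illustration; they wrap the exception-raising open_in and input_line and collapse their failures into a few variants.

```ocaml
(* Hypothetical error type: the few logical failures we care about. *)
type read_error =
  | File_error of string   (* message carried by Sys_error *)
  | Empty_file

(* Wrap the exception-raising stdlib functions in a return-value one. *)
let read_first_line path : (string, read_error) result =
  match open_in path with
  | exception Sys_error msg -> Error (File_error msg)
  | ic ->
    let r =
      match input_line ic with
      | line -> Ok line
      | exception End_of_file -> Error Empty_file
      | exception Sys_error msg -> Error (File_error msg)
    in
    close_in ic;
    r

let () =
  match read_first_line "/no/such/file" with
  | Ok line -> print_endline line
  | Error (File_error msg) -> prerr_endline ("cannot read: " ^ msg)
  | Error Empty_file -> prerr_endline "file is empty"
```

The wrapper only catches the exceptions it knows about, which is exactly the weakness described above: if the wrapped functions start raising something new, it escapes the wrapper unnoticed.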

A Few Examples

There are a few transformations that come up often when converting exception code to return-value code. Here are some in detail.

Building Things

It's common to want to do some work and then construct a value from it. In exception-land that is simple, just something like Constructor (thing_that_may_throw_exception ()). This doesn't work with return-values. Instead we have to do what we did in the Introduction post. Here is an example:

let f () =
  let open Result.Monad_infix in
  thing_that_may_fail () >>= fun v ->
  Ok (Constructor v)

Looping

Some loops cannot be written in their most obvious style. Consider an implementation of map that expects the function passed to it to use Result.t to signal failures. The very naive implementation of map is:

let rec map f = function
  | []    -> []
  | x::xs -> (f x)::(map f xs)

There are two ways to write this with Result.t. The first requires two passes over the elements: the first pass applies the function and the second collects the results, returning either every value or the first error that was hit.

let map f l =
  Result.all (List.map f l)

Result.all has the type ('a, 'b) Core.Std.Result.t list -> ('a list, 'b) Core.Std.Result.t

The above is simple but could be inefficient. The entire map is performed regardless of failure and then walked again. If the function being applied is expensive this could be a problem. The other solution is a pretty standard pattern in Ocaml of using an accumulator and reversing it on output. The monadic operator could be replaced by a match in this example, I just prefer the operator.

let map f l =
  let rec map' f acc = function
    | []    -> Ok (List.rev acc)
    | x::xs -> begin
      let open Result.Monad_infix in
      f x >>= fun v ->
      map' f (v::acc) xs
    end
  in
  map' f [] l

I'm sure someone cleverer in Ocaml probably has a superior solution but this has worked well for me.
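To see the short-circuiting behaviour concretely, here is a sketch of the same accumulator pattern using only the standard library's result type and a plain match instead of Core's operator. The parse function and the calls counter are hypothetical helpers added to count how often the function is applied.

```ocaml
(* Stdlib-only restatement of the accumulator version of map. *)
let map f l =
  let rec go acc = function
    | [] -> Ok (List.rev acc)
    | x :: xs ->
      (match f x with
       | Ok v -> go (v :: acc) xs
       | Error e -> Error e)   (* stop at the first failure *)
  in
  go [] l

(* Hypothetical helper: counts how many times it is applied. *)
let calls = ref 0

let parse s =
  incr calls;
  match int_of_string_opt s with
  | Some n -> Ok n
  | None -> Error ("not an int: " ^ s)

let ok = map parse ["1"; "2"; "3"]

let () = calls := 0
let bad = map parse ["1"; "x"; "3"]
let calls_bad = !calls   (* "3" is never parsed *)
```

With the second input the function is applied to "1" and "x" only; the traversal stops as soon as the first Error is produced.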

try/with

A lot of exception code looks like the following.

let () =
  try
    thing1 ();
    thing2 ();
    thing3 ()
  with
    | Error1 -> handle_error1 ()
    | Error2 -> handle_error2 ()
    | Error3 -> handle_error3 ()

The scheme I use would break this into two functions. The one inside the try and the one handling its result. This might sound heavy but the syntax to define a new function in Ocaml is very light. In my experience this hasn't been a problem.

let do_things () =
  let open Result.Monad_infix in
  thing1 () >>= fun () ->
  thing2 () >>= fun () ->
  thing3 ()

let () =
  match do_things () with
    | Ok _ -> ()
    | Error Error1 -> handle_error1 ()
    | Error Error2 -> handle_error2 ()
    | Error Error3 -> handle_error3 ()

Conclusion

Using return-values instead of exceptions in my Ocaml projects has had nearly the exact outcome I anticipated. I have compile-time guarantees for handling failure cases and the cost to my code has been minimal. Any difficulties I've run into have had straightforward solutions. In some cases it's simply a matter of thinking about the problems from a new perspective and the solution is clear. I plan on continuing to develop code with these principles and creating larger projects. I believe that this style scales well in larger projects and actually becomes less cumbersome as the project increases since the guarantees can help make it easier to reason about the project.

Introduction to Result.t vs Exceptions in Ocaml

2013-01-03T17:55:00.000-05:00

This post uses Jane St's Core suite. Specifically the Result module. It assumes some basic knowledge of Ocaml. Please check out Ocaml.org for more Ocaml reading material.

There are several articles and blog posts out there arguing for or against return values over exceptions. I'll add to the discussion with my reasons for using return values in the place of exceptions in Ocaml.

What's the difference?

Why does the debate even exist? Because each side has decent arguments for why their preference is superior when it comes to writing reliable software. Pro-return-value developers, for example, argue that it is easier to tell whether their code is wrong simply by reading it (if it isn't handling a return value of a function, it's wrong), while exception based code requires understanding all of the functions called to determine if and how they will fail. Pro-exception developers argue that it is much harder to get their program into an undefined state because an exception has to be handled or else the program fails, where in return based code one can simply forget to check a function's return value and the program continues on in an undefined state.

I believe that Ocaml has several features that make return values the preferable way to handle errors. Specifically variants, polymorphic variants, exhaustive pattern matching, and a powerful static type system make return values attractive.

This debate is only worth your time if you are really passionate about writing software that has fairly strong guarantees about its quality in the face of errors. For a majority of software, it doesn't matter which paradigm you choose. Most errors will be stumbled upon during debugging and fairly soon after going into production or through writing unit and integration tests. But, tests cannot catch everything. And in distributed and concurrent code rare errors can now become common errors and it can be near impossible to reconstruct the conditions that caused it. But in some cases it is possible to make whole classes of errors either impossible or catchable at compile-time with some discipline. Ocaml is at least one language that makes this possible.

Checked exceptions

A quick aside on checked exceptions, as in Java. Checked exceptions provide some of the functionality I claim is valuable. The main problem with how checked exceptions are implemented in Java (the only language I have any experience in that uses them) is that they have a very heavy syntax, to the point where using them can seem too burdensome.

The Claim

The claim is that if one cares about ensuring they are handling all failure cases in their software, return-values are superior to exceptions because, with the help of a good type system, their handling can be validated at compile-time. Ocaml provides a fairly light, non intrusive, syntax to make this feasible.

Good Returns

The goal of a good return value based error handling system is to make sure that all errors are handled at compile-time. This is because a return value cannot enforce this at run-time the way an exception does. This is a good reason to prefer exceptions in a dynamically typed language like Python or Ruby, where static analyzers are few and far between.

In C this is generally accomplished by using a linting tool that will report an error if a function's return value is ignored in a call. This is why you might see printf cast to void in some code, to make it clear the return value is meant to be ignored. But a problem with this solution is that it only enforces that the developer handles the return value, not all possible errors. For example, POSIX functions return a value saying the function failed and put the actual failure in errno. How, then, to enforce that all of the possible failures are handled? Without encoding all of that information in a linting tool, the options in C (and most languages) are pretty weak. Linting tools are also separate from the compiler and vary in quality. Writing code that takes proper advantage of a linting tool, in C, is a skill all of its own as well.

Better Returns

Ocaml supports exceptions but the compiler provides no guarantees that the exceptions are actually handled anywhere in the code. So what happens if the documentation of a function is incomplete or a dependent function is changed to add a new exception being thrown? The compiler won't help you.

But Ocaml's rich type system, combined with some discipline, gives you more power than a C linter. The primary strength is that Ocaml lets you encode information in your types. For example, in POSIX many functions return an integer to indicate error. But an int has no interesting meaning to the compiler other than it holds values between INT_MIN and INT_MAX. In Ocaml, we can instead create a type to represent the errors a function can return and the compiler can enforce that all possible errors are handled in some way thanks to exhaustive pattern matching.
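A small sketch of that idea, using only the standard library. Everything here is hypothetical: open_error, open_config and describe are invented names, and open_config merely inspects its argument so the example is self-contained rather than calling into the operating system.

```ocaml
(* Hypothetical error type: each failure is a named constructor
   instead of an integer code. *)
type open_error =
  | No_such_file
  | Permission_denied
  | Too_many_open_files

(* Stand-in for a real system call; it only looks at the path. *)
let open_config path : (string, open_error) result =
  match path with
  | "" -> Error No_such_file
  | "/root/secret" -> Error Permission_denied
  | p -> Ok ("contents of " ^ p)

(* Every constructor is matched by name, with no catch-all case. *)
let describe path =
  match open_config path with
  | Ok contents -> contents
  | Error No_such_file -> "no such file"
  | Error Permission_denied -> "permission denied"
  | Error Too_many_open_files -> "too many open files"
```

If one of the Error cases in describe is deleted, or a new constructor is added to open_error, the compiler reports the match as non-exhaustive (warning 8, which many build setups promote to an error).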

An Example

What does all of this look like? Below is a contrived example. The goal is to provide a function, called parse_person, that takes a string and turns it into a person record. The requirement of the code is that if a valid person cannot be parsed out, the part of the string that failed is specified in the error message.

Here is a version using exceptions, ex1.ml:

open Core.Std

exception Int_of_string of string

exception Bad_line of string
exception Bad_name of string
exception Bad_age of string
exception Bad_zip of string

type person = { name : (string * string)
              ; age  : Int.t
              ; zip  : string
              }

(* A little helper function *)
let int_of_string s =
  try
    Int.of_string s
  with
    | Failure _ ->
      raise (Int_of_string s)

let parse_name name =
  match String.lsplit2 ~on:' ' name with
    | Some (first_name, last_name) ->
      (first_name, last_name)
    | None ->
      raise (Bad_name name)

let parse_age age =
  try
    int_of_string age
  with
    | Int_of_string _ ->
      raise (Bad_age age)

let parse_zip zip =
  try
    ignore (int_of_string zip);
    if String.length zip = 5 then
      zip
    else
      raise (Bad_zip zip)
  with
    | Int_of_string _ ->
      raise (Bad_zip zip)

let parse_person s =
  match String.split ~on:'\t' s with
    | [name; age; zip] ->
      { name = parse_name name
      ; age  = parse_age age
      ; zip  = parse_zip zip
      }
    | _ ->
      raise (Bad_line s)

let () =
  (* Pretend input came from user *)
  let input = "Joe Mama\t25\t11425" in
  try
    let person = parse_person input in
    printf "Name: %s %s\nAge: %d\nZip: %s\n"
      (fst person.name)
      (snd person.name)
      person.age
      person.zip
  with
    | Bad_line l ->
      printf "Bad line: '%s'\n" l
    | Bad_name name ->
      printf "Bad name: '%s'\n" name
    | Bad_age age ->
      printf "Bad age: '%s'\n" age
    | Bad_zip zip ->
      printf "Bad zip: '%s'\n" zip

ex2.ml is a basic translation of the above but using variants. The benefit is that the type system will ensure that all failure cases are handled. The problem is the code is painful to read and modify. Every function that can fail has its own variant type to represent success and error. Composing the functions is painful since everything returns a different type. We have to create a type that can represent all of the failures the other functions returned. It would be nice if each function could return an error and we could use that value instead. It would also be nice if everything read as a series of steps, rather than pattern matching on a tuple which makes it hard to read.

ex3.ml introduces Core's Result.t type. The useful addition is that we only need to define a type for parse_person. Every other function only has one error condition so we can just encode the error in the Error variant. This is still hard to read, though. The helper functions aren't so bad but the main function is still painful.

While the previous solutions have solved the problem of ensuring that all errors are handled, they introduced the problem of being painful to develop with. The main problem is that nothing composes. The helpers have their own error types and for every call to them we have to check their return and then encompass their error in any function above it. What would be nice is if the compiler could automatically union all of the error codes we want to return from itself and any function it called. Enter polymorphic variants.

ex4.ml shows the version with polymorphic variants. The nice bit of refactoring we were able to do is in parse_person. Rather than an ugly match, the calls to the helper functions can be sequenced:

let parse_person s =
  match String.split ~on:'\t' s with
    | [name; age; zip] ->
      let open Result.Monad_infix in
      parse_name name >>= fun name ->
      parse_age  age  >>= fun age  ->
      parse_zip  zip  >>= fun zip  ->
      Ok { name; age; zip }
    | _ ->
      Error (`Bad_line s)

Don't worry about the monad syntax, it's really just to avoid the nesting to make the sequencing easier on the eyes. Except for the >>=, this looks a lot like code using exceptions. There is a nice linear flow and only the success path is shown. But! The compiler will ensure that all failures are handled.
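For readers who want something they can compile without Core, the following is a rough standard-library sketch of the same shape. It is not the ex4.ml file itself: the >>= operator is defined by hand, the helpers are reimplemented with stdlib functions, and describe is a hypothetical function standing in for the main program.

```ocaml
(* Hand-written bind for the stdlib result type (an assumption;
   the post uses Core's Result.Monad_infix instead). *)
let ( >>= ) r f =
  match r with
  | Ok v -> f v
  | Error e -> Error e

type person = { name : string * string; age : int; zip : string }

let parse_name name =
  match String.index_opt name ' ' with
  | Some i ->
    Ok (String.sub name 0 i,
        String.sub name (i + 1) (String.length name - i - 1))
  | None -> Error (`Bad_name name)

let parse_age age =
  match int_of_string_opt age with
  | Some n -> Ok n
  | None -> Error (`Bad_age age)

let parse_zip zip =
  match int_of_string_opt zip with
  | Some _ when String.length zip = 5 -> Ok zip
  | Some _ | None -> Error (`Bad_zip zip)

(* The error type is the union of the helpers' polymorphic variants,
   inferred by the compiler. *)
let parse_person s =
  match String.split_on_char '\t' s with
  | [name; age; zip] ->
    parse_name name >>= fun name ->
    parse_age age >>= fun age ->
    parse_zip zip >>= fun zip ->
    Ok { name; age; zip }
  | _ -> Error (`Bad_line s)

(* Hypothetical caller: each failure is matched individually. *)
let describe s =
  match parse_person s with
  | Ok p ->
    Printf.sprintf "%s %s, %d, %s" (fst p.name) (snd p.name) p.age p.zip
  | Error (`Bad_line l) -> "Bad line: " ^ l
  | Error (`Bad_name n) -> "Bad name: " ^ n
  | Error (`Bad_age a) -> "Bad age: " ^ a
  | Error (`Bad_zip z) -> "Bad zip: " ^ z
```

No error type is declared anywhere; the compiler works out which tags parse_person can return, and describe has to account for each of them.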

The final version of the code is ex5.ml. This takes ex4 and rewrites portions of it to be prettier. As a disclaimer, I'm sure someone else would consider writing this differently even with the same restrictions I put on it, I might even write it differently on a different day, but this version of the code demonstrates the points I am making.

A few points of comparison between ex1 and ex5:

The body of parse_person is definitely simpler and easier to read in the exception code. It is short and concise.
The rest of the helper functions are a bit of a toss-up between the exception and return-value code. I think one could argue either direction.
The return-value code has fulfilled my requirements in terms of handling failures. The compiler will complain if any failure parse_person could return is not handled. If I add another error type the code will not compile. It also fulfilled the requirements without bloating the code. The return-value code and exception code are roughly the same number of lines. Their flows are roughly equal. But the return-value code is much safer.

Two Points

It's not all sunshine and lollipops. There are two issues to consider:

Performance - Exceptions in Ocaml are really, really, fast. Like any performance issue, I suggest altering code only when needed based on measurements and encapsulating those changes as well as possible. This also means if you want to provide a safe and an exception version of a function, you should probably implement the safe version in terms of the exception version.
Discipline - I referred to discipline a few times above. This whole scheme is very easy to mess up with a single mistake: pattern matching on anything (_). The power of exhaustive pattern matching means you need to match on every error individually. This is effectively for the same reason catching the exception base class in other languages is such a bad idea, you lose a lot of information.
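Both points can be sketched briefly with the standard library. The names find_exn, find, lazy_report and careful_report are hypothetical, as is the two-constructor error type; the sketch only illustrates the shape of the advice.

```ocaml
(* Exception version: raises Not_found on a miss. *)
let find_exn key table = List.assoc key table

(* Safe version implemented in terms of the exception version. *)
let find key table =
  match find_exn key table with
  | v -> Ok v
  | exception Not_found -> Error (`Key_not_found key)

(* Hypothetical error type for the discipline point. *)
type error = Bad_name | Bad_age

(* Matching on anything: keeps compiling silently when a new
   constructor is added to error. *)
let lazy_report = function
  | Ok _ -> "ok"
  | Error _ -> "something failed"

(* Matching each error: adding a constructor to error makes this
   match non-exhaustive, so the compiler points at it. *)
let careful_report = function
  | Ok _ -> "ok"
  | Error Bad_name -> "bad name"
  | Error Bad_age -> "bad age"
```

The lazy version is shorter, but it gives up the very guarantee the rest of the scheme exists to provide.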

Conclusion

The example given demonstrates an important point: code can become much safer at compile time without detriment to its length or readability. The cost is low and the benefit is high. This is a strong reason to prefer a return-value based solution over exceptions in Ocaml.

C++11 is unsafe

2012-08-03T12:53:00.000-04:00

With all due respect for Mr. Sutter, his claim that C++11 is "as clean and safe as any other modern language, and still the king of fast", is simply false. Clean and fast are up to the particular developer and benchmark but safe is more objective, and C++11 is not safe.

It is important for C++ to maintain backwards compatibility with previous versions, so C++11 supports C++03. C++03 is not safe. So, by definition, C++11 is also not safe. For those who are new to the term, safety generally comes down to how bad my program can screw up. Even a wrong Java application will not segfault the VM. C++11 adds some tools to make causing this behavior harder, but it is by no means impossible. C++11 still has pointers. It still has pointer arithmetic.
"But", you say, "we are talking about just the features C++11 added to the language". Still false. Take std::array, which was added in C++11. This code is unsafe: std::array<int, 1> arr; arr[2] = 1;. And consider a lambda that captures a reference to a variable that goes out of scope. Perfectly valid C++11. Perfectly unsafe.
Threading in C++11 also allows you to do unsafe things. Just try modifying the same non-atomic variable from two threads concurrently. You have no guarantees of what will happen.
"BUT", you yell, "he said 'modern language' so...." Indeed, so? I'm not sure what Herb Sutter considers a modern language but let's just take some languages that are somewhat popular today:
- Java, C# - Considered safe.
- Clojure, Scala - On the JVM, so one would consider them safe.
- Python, Ruby, Perl - These are all considered safe languages. You cannot, without effort, access memory you should not.
- Ada - Hah!
- F# - Runs on .Net, safe.
- Ocaml, Haskell - The languages themselves are considered safe but you can do whatever you want if you drop down to C.
So which languages, exactly, is Mr. Sutter referring to when he says C++11 is as safe as any modern language? I have no idea.

The problem, though, is that it's OK to be honest about C++'s lack of safety. That is the compromise I am agreeing to when I use C++. I want the benefits of it and I understand that I am making a sacrifice. I don't want to be told C++ is something that it is not. This "C++11 is safe" talk is nonsense and not a reflection of reality.

You can follow the discussion on reddit: http://www.reddit.com/r/programming/comments/xml97/c11_is_unsafe/

The Erl Next Door

2012-07-09T03:33:00.000-04:00

The Erl Next Door, also known as TEND, is a project created for SpawnFest2012 by MononcQc and me. The hope of TEND is to make playing with Erlang easier.

Ever wanted to show someone your cool Erlang hack or teach them something about Erlang, but getting their project set up was too complicated? With TEND, you can now provide them a URL that will set up all of the dependencies.

TEND takes three kinds of URLs:

An HTML document. The document can link to other document types through LINK or A tags by setting REL to "erlang-tend".
A raw .erl file.
A zip of an OTP application. This will be compiled for you, assuming it has a Makefile, rebar, or Emakefile. You can link to the ZIP link in Github.

The project page has all of the details, but the basic idea is that in the shell you can run tend:load with a URL and everything will be loaded. In fact, this blog post can be loaded with TEND. This post links to the calc example from LYSE. Once it's loaded you can do calc:rpn("10 10 + 2 /"). Just do:

tend:load("http://functional-orbitz.blogspot.se/2012/07/erl-next-door.html").

The official github repo is here: https://github.com/ferd/tend

We have also made a small demonstration site: http://ferd.ca/tend/

The SpawnFest2012 repo: https://github.com/Spawnfest2012/tend

So Pythonistas, you want to get rid of the GIL...

2012-07-06T02:43:00.001-04:00

There is no shortage of hate for the GIL. There is a slight problem, though. The GIL might be the cause of Python's single-core-only utilization, but it's not the root reason.

The origin of the GIL is to keep the interpreter internals sane when running with multiple kernel threads. When it comes to problems involving parallelism, the easy solution is simple: serialize everything. So you get the GIL. Now, if that were the end of it, getting rid of the GIL would probably not be that challenging. But, for better or for worse, the GIL also made a number of operations atomic that would not be in other languages. The Python FAQ has this example. Python programmers made use of these benefits in CPython, regardless of whether the language designers actually guaranteed them. But at this point, it doesn't matter. The amount of code that depends on this behavior is large.

Greg Stein attempted to remove the GIL in Python 1.5, but programs ran about 2x slower than with the GIL. The reason being: in order to give people the guarantees they have grown accustomed to, as described in the previous paragraph, you need to do the locking on those operations for them. Where there was once a single lock, you now have a lock per object. And it is difficult to determine if an object will be accessed by multiple threads so the naive solution is to lock the object every time it's accessed in a way that needs to be atomic. This kind of fine-grained locking is expensive.

So this attempt didn't work. And it wasn't really a big deal. Multicore CPUs weren't that ubiquitous and people weren't doing things that would benefit that much from multiple cores. But now multicore is the rage and people believe that their Python programs will benefit from it. The common advice is simply to use multiprocessing, but people tend to find this inadequate.

PyPy is trying to solve this using STM, and blog their progress. PyPy doesn't seem to be a solution for a lot of people yet and it's unclear how successful the STM approach will be.

If you really think multicore support is important to Python, then you don't want to pitch a fit about the GIL. What you want to do is convince the Python designers that you are OK with giving up those guarantees you have been taking advantage of over the years. You will rewrite your code to not make use of the guarantees. Then they can get rid of the fine-grained locking.

But... before you say "sure", make sure you know what you're getting into. If L1.pop() is no longer thread-safe, then what does it mean if two threads access L1 in parallel? I don't know much about threaded memory models but it could get pretty complicated. You might not be able to define all states of the program at that point.

In the future, before you pour too much hate on the GIL, remember: it is really just a symptom. The actual problem is that, for simplicity, Python makes a number of guarantees that make executing performant code harder without the GIL than with it. And also, not everyone hates the GIL; some people are fond of the guarantees. Like this guy.

My Mental Evolution In Making A Language

2012-06-06T07:17:00.000-04:00

Over the past year I've been thinking about how to make my own language. It's a pretty big undertaking and I've been too busy with other projects to put any serious effort into it, so I just think about it when I go for a walk. Mostly I try to convince myself to not make a language but it sounds like a lot of fun. With the explosion of JavaScript, and so few people wanting to write JavaScript, it seems like other people think writing a language sounds like fun too. We have CoffeeScript, IcedCoffeeScript, Roy, JSX, Amber, Dart, and many others. And that's just a list of recent languages that compile to JavaScript. Many more languages have come out, relatively recently, such as Go, Fancy, Elixir and Loop. Most of these languages will be minor dents in the history of programming languages, but that is OK. Not everything has to be important to be worth doing. But they have gotten me thinking. Should I create a language? What would it have to offer? Is the effort worth it? Below are thoughts that have steered my decision process for when I get some time to start hacking on a language. The thoughts are targeted at me, someone who has little experience building a language, not a professional.

Why?

I think there are three reasons I should consider making a language:

Just for fun - Every hobby project should be fun! If I built a language just for fun, though, I think I would prefer to implement someone else's language. Depending on what language I chose, I would learn a lot about language design without having to make a lot of mistakes myself; I could learn from others' mistakes. I would also have other implementations of it to compare to my own.
Experiment with semantics - This is the biggest reason for me. I have some unoriginal semantic ideas I'd like to understand better and I think creating them is the best way to go about that.
Experiment with syntax - C'mon now, there is no reason to experiment with syntax any more. God already gave it to us. But seriously, I find this the least compelling reason. I haven't seen a recent language that does something with syntax I consider all that important. I think it will take a very clever person to change syntax enough to really matter. Removing semicolons doesn't really matter much to me. I can represent the semantics I'm interested in just fine in an existing syntax.

Can I stand on the shoulders of giants?

What languages already exist that have most of the semantics I care about? That way I can just extend it with the new semantics I care about. The clear upside is, even if the community for that language is small, they already know it so they just have to learn the extensions I added. It makes evaluating the new ideas easier. There are also a lot of language options that I'm likely to mess up or just not be interested in. I might as well go with whatever the professionals went with unless I have a strong opinion. Objective C, AliceML and Vala are examples of what I mean. In each case, the language either took an existing language and extended it or was heavily inspired by an existing language.

The other side of this is deciding what backend to use. Should I build an interpreter or should I target a VM, like the JVM? Maybe JavaScript? Or should I target making native binaries? Maybe produce C. Or just write LLVM-IR myself? Or build the optimizer and backend myself? It depends on the real reason I want to make the language. Do I want people to use it? Or do I just want to learn the entire stack of a compiler? If I implement an existing language, for example SML, maybe the twist I could add is having it target LLVM-IR. Just because I'd be implementing someone else's language doesn't mean there isn't any room for some innovation.

Disirregardlessly, a lot of smart people have thought very hard about building languages. I should take knowledge from them whenever possible (so basically, always). If I think I'm being original in an idea, I'm probably not. ALGOL probably has it...it always does. And I do not mean that in a defeatist way, new combinations of old ideas is progress. What I mean is a lot of these ideas have been thought out already, we don't know about them because they failed, and as a creator I should be aware of that.

Sometimes it's better to stay silent

Anders Hejlsberg was interviewed about the lack of checked exceptions in C# and said:

I'm a strong believer that if you don't have anything right to say, or anything that moves the art forward, then you'd better just be completely silent and neutral, as opposed to trying to lay out a framework.

I like this idea. Language design is often about trade-offs. If I am going to introduce a new concept because I don't like an old one, I better be sure it doesn't just swap one set of problems for another. At the very least, I've created a new complexity for people who want to evaluate my language to learn, and if it doesn't move the language forward then there isn't any benefit to the new complexity.

Keep It Simple

I think we should all take a lesson from Niklaus Wirth. When he was designing Oberon he took Modula-2 and greatly simplified it. The syntax is minimal and the language definition is tiny. Maybe it's too small, I don't know. But the language is very easy to think about and implement because of its minimalism. Complexity is a burden. It's really hard to avoid complexity too, just look at the results of some trivial operators in JavaScript. JavaScript is simple in many ways but those ways interact to make complex results. These little interactions, harmless at first, cause painful bugs later on. On top of that, too much complexity makes for a more cumbersome implementation, which takes longer to create. A simple language can get me something to play with sooner.

REPL

I don't actually mean "should my language have a REPL" here (one would be nice), but I mean every few weeks I take this list of questions and integrate them with what I've learned and evaluate if my previous conclusions still apply. Since I'm so new to designing a language the results tend to change. I think anyone interested in making their own language should do this. I've come up with ideas I thought were new and cool only to do some research and find out they are silly and absurd. Unfortunately, that's how I feel when I see a lot of these new toy languages being created. With a little bit of research the author could have made something much more impressive and easier to use. But then, I also feel a similar way about C++. I think a lot of people believe you can just cobble together a language and you'll get something great. It's hard to design a good, consistent, simple, extendable language. Language design is a profession and should be respected as such.

Phantom type examples in Ocaml

2012-05-17T15:55:00.000-04:00

What are phantom types

Phantom types are a way for multiple types to have the exact same underlying representation but still be seen as distinct types by the compiler. The common example is units of measurement. Feet and meters are both distances and are best represented as an int or float, but it makes no sense to add a foot to a meter. Phantom types allow you to solve this and still represent a foot and a meter exactly the same. One can think of this as the opposite of duck typing. In duck typing, if you only care whether the current value you are working with has four legs, a cat, a dog, and a table are effectively the same thing. In phantom types you care about what something actually represents and that means they are different, even though underneath they are represented with the same type.

The trick to phantom types is the module system. In Ocaml you have a module and that module defines the interface it exposes to the world. For phantom types part of that interface is providing a type, for example rstring, but not giving a concrete definition of it, leaving it as an abstract concept. Then the module definition is explicit that rstring is actually a string and it can use a type of rstring just like a type of string. So outside the module the compiler just knows some type exists by a name but no more, and inside the module the compiler knows that the same type is actually a string (for example) and all the string APIs are valid on it.

The repo for the examples is located here.

Examples

estring - Encoded strings

This code is my paraphrasing of Martin Jambon's example of phantom typing located here.

When working with strings you might want to make sure you don't mix and match strings of incompatible encodings. You probably don't want to append a utf8 string to a utf16 string. But underneath it all, both of these can be represented the same way and a bunch of operations are going to be the same. Concatenating two utf8 strings is the same operation as concatenating two utf16 strings. The estring module lets you express this. For example, we can write code like this:

let () =
  let s1 = Estring.make_utf8 "hello" in
  let s2 = Estring.make_utf8 "world" in
  let s3 = Estring.concat (Estring.make_utf8 " ") [s1; s2] in
  Printf.printf "%s\n" (Estring.str s3)

This says that we want to make two utf8 strings from a regular string, and then concatenate them together with " " as the separator. Then we convert that back to a string for output. This works because everything is the same estring type. The following code will not compile because things aren't the same type:

let () =
  let s1 = Estring.make_utf8 "hello" in
  let s2 = Estring.make_utf16 "world" in
  let s3 = Estring.concat (Estring.make_utf8 " ") [s1; s2] in
  Printf.printf "%s\n" (Estring.str s3)

One weakness of estring is that it is not extendible. Remember, it is only inside the estring module that the compiler knows it is a string. That means other modules cannot define a latin1 encoding type and use the estring module to create a latin1 estring type.
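
The repo holds the real implementation. As a rough cross-language illustration, the same phantom-parameter trick can be sketched with Python type hints. Everything here (the Estring class, the Utf8 and Utf16 marker classes, the helper names) is hypothetical and only mirrors the calls used above. Python does not enforce hints at runtime, so a mixed-encoding call would only be flagged by a static checker such as mypy, not by the interpreter.

```python
from typing import Generic, List, TypeVar


class Utf8:
    """Marker type: never instantiated, it exists only as a type parameter."""


class Utf16:
    """Marker type for a second, incompatible encoding."""


Enc = TypeVar("Enc")


class Estring(Generic[Enc]):
    """A string tagged with an encoding that has no runtime representation."""

    def __init__(self, raw: str) -> None:
        self._raw = raw


def make_utf8(s: str) -> Estring[Utf8]:
    return Estring(s)


def make_utf16(s: str) -> Estring[Utf16]:
    return Estring(s)


def concat(sep: Estring[Enc], parts: List[Estring[Enc]]) -> Estring[Enc]:
    # The shared Enc forces the separator and every part to carry the same tag.
    return Estring(sep._raw.join(p._raw for p in parts))


def to_str(e: Estring[Enc]) -> str:
    return e._raw


s3 = concat(make_utf8(" "), [make_utf8("hello"), make_utf8("world")])
print(to_str(s3))
```

The tag costs nothing at runtime: both encodings are the same Estring object wrapping a str, and only the type parameter tells them apart.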

rstring - Read only string

Another use of phantom types is to provide read-only interfaces to mutable types. With rstring we provide a phantom type called rstring and then define a subset of the string API that we want to expose. The major downside to rstring is that it copies the string when you turn a string into an rstring and vice versa. This is for obvious reasons: you can't ensure that nobody else has a reference to the string unless you know you made the version you're referencing. Maybe something like linear types would work well here. In a real system I would consider offering an unsafe_make and an unsafe_str set of functions that don't copy. That way you could avoid the copy if that is important (it probably isn't) but at least you know you're polluting your code with unsafe calls. Code for rstring is pretty straightforward; usage looks like:

let () =
  let s1 = Rstring.make "hello" in
  let s2 = Rstring.make "world" in
  let s3 = Rstring.concat " " [s1; s2] in
  Printf.printf "%s\n" (Rstring.str s3)

Other ideas

Another common example of using phantom types that I did not provide is something like a file or socket interface. You can encode information, such as if you opened the file for reading or writing or both and then make it so it is a compile-time error if you try to write to a file opened for reading.
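
A minimal sketch of that idea, again with Python type hints standing in for the module system. The Handle class, the CanRead and CanWrite markers, and the helper functions are all made up for illustration, and in-memory streams replace real files so the snippet is self-contained. As before, only a static checker would reject the bad call; the interpreter would not.

```python
import io
from typing import Generic, TypeVar


class CanRead:
    """Marker: the handle was opened for reading."""


class CanWrite:
    """Marker: the handle was opened for writing."""


Mode = TypeVar("Mode")


class Handle(Generic[Mode]):
    """Wraps a stream; the Mode parameter exists only for the type checker."""

    def __init__(self, stream: io.StringIO) -> None:
        self._stream = stream


def open_for_reading(text: str) -> Handle[CanRead]:
    return Handle(io.StringIO(text))


def open_for_writing() -> Handle[CanWrite]:
    return Handle(io.StringIO())


def read_all(h: Handle[CanRead]) -> str:
    return h._stream.read()


def write(h: Handle[CanWrite], data: str) -> None:
    h._stream.write(data)


r = open_for_reading("contents")
print(read_all(r))
# write(r, "oops")  # a static checker would flag this: r is a Handle[CanRead]
```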

What is Covariance and Contravariance Anyways?

2011-10-29T12:48:00.000-04:00

If you have been following the talk around Google's latest language, Dart, then you might have heard things like "covariant arrays are widely regarded as a mistake" (src). But what is covariance? Why are covariant arrays considered a mistake? Who has covariant arrays? Why?

What is Covariance and Contravariance All About?

class X:
    pass

class Y(X):
    def zoom(self):
        print('Y.zoom')

class A:
    # Returns a value of type Y
    def foo(self):
        return Y()

class B(A):
    # Returns a value of type X
    def foo(self):
        return X()


# Takes a value of type A
def bar(a):
    y = a.foo()
    y.zoom()


# Call bar with a B
bar(B())

Is this code safe? Often, in an OO language, inheritance is used to create a type that looks just like another type, so that values of the new type can be passed to functions that expect the other type without breaking anything. So one might expect, looking at this code, that passing a B to bar would be safe. But it isn't. The problem is B.foo returns an X where A.foo returns a Y. bar then goes on to try to call a member function that only exists in Y, so the code will fail at runtime. B has violated the interface that A established. Python doesn't give us any tools to discover this error without executing the code, but statically typed languages like C++ and Java do. Through covariant and contravariant properties, these languages can define which types the member functions of a subclass may take and return in order to be valid. The concepts are larger than member functions but it's a good place to start.

Definitions

Before we can talk about covariance and contravariance we need to know what subtyping is. I'll restrict this to classes in an object-orientated language because that's what my knowledge is limited to, but these concepts go beyond that. An example of a subtype in an OO language would be a class that extends or inherits from another class, for example in C++ you would do class S : public T for S to extend T. The common way type theorists write this is S <: T. This can be read in a few ways: "any term of type S can safely be used in a context where a term of type T is expected" or "every value described by S is also described by T" (Pierce, 182). Thus, S is a more specific type than T, and conversely T is a more generic type than S. Given this, covariance and contravariance define when a more specific or more generic type is acceptable in a particular context.

The definition from the wiki article:

covariance and contravariance refers to the ordering of types from narrower to wider and their interchangeability or equivalence in certain situations

And their definitions:

Covariant - Given the types S and T specified above, covariance is when a more specific type, S, can be used when a more generic type, T, is specified. This applies to functions, a function that returns S can be used in the same context as a function that returns T.
Contravariant - When the more generic type, T, can be used where the more specific type, S, is specified. A function that takes a T can be used in the same context as a function that takes a S.
Invariant - The type specified is the only type that can be used.

My method for remembering the distinction: "contravariant" is the longer word, so it is the one that goes from narrower to wider.
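
To make the definitions concrete, here is a small sketch using plain functions instead of member functions. The classes mirror the X, Y naming used in this post, with Z as a further subclass of Y; the function names are invented for the example. Python checks nothing ahead of time, so the unsafe substitution only shows up as a runtime error.

```python
class X:
    def name(self):
        return "X"


class Y(X):
    def zoom(self):
        return "zoom"


class Z(Y):
    def extra(self):
        return "extra"


def use_handler(handler):
    """Expects a handler that takes a Y and returns a Y."""
    result = handler(Y())
    return result.zoom()  # relies only on what every Y provides


def wider_in_narrower_out(x):
    # Accepts any X (contravariant in the argument) and returns a Z
    # (covariant in the result), so it can stand in for a Y -> Y handler.
    x.name()
    return Z()


def narrower_in(z):
    # Demands a Z, so it cannot stand in for a handler that is handed a Y.
    z.extra()
    return Y()


print(use_handler(wider_in_narrower_out))
try:
    use_handler(narrower_in)
except AttributeError as e:
    print("unsafe substitution:", e)
```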

Covariant Example

Consider this example from C++:

class X {};
class Y : public X {};
class Z : public Y {};

class A {
public:
  virtual Y *foo() { return new Y(); }
};

class B : public A {
public:
  virtual Z *foo() { return new Z(); }
};

Here we have three classes X, Y, and Z which we will return from a virtual function in classes A and B. This code is valid because B::foo is returning a narrower type than A::foo because Z is a subtype of Y. But what happens if we make B::foo return a wider type?

class X {};
class Y : public X {};
class Z : public Y {};

class A {
public:
  virtual Y *foo() { return new Y(); }
};

class B : public A {
public:
  virtual X *foo() { return new X(); }
};

$ g++ -W -Wall -ansi -pedantic -c foo.cc
foo.cc:12: error: invalid covariant return type for ‘virtual X* B::foo()’
foo.cc:7: error:   overriding ‘virtual Y* A::foo()’

Contravariant Example

As we saw above, one can be covariant on return types, but what about on parameters to a member function? One has to be the opposite actually: contravariant. Unfortunately, gcc (or the version I'm using at least) doesn't give us an error about contravariance, instead we have to add a little bit of code using B::foo to see the problem:

class X {};
class Y : public X {};
class Z : public Y {};

class A {
public:
  virtual void foo(Y &y) { }
};

class B : public A {
public:
  virtual void foo(X &x) { }
};


int main() {
  B b;
  Y y;
  Z z;
  
  b.foo(y);
  b.foo(z);
  return 0;
}

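This behaves as we would expect. We can call B::foo with an X, Y, or Z, since B::foo takes the superclass of all of them (X). (Strictly speaking, C++ requires an overriding function's parameter types to match exactly, so B::foo(X&) hides A::foo(Y&) rather than overriding it; the example still shows which argument types are safe to accept.) But what if we modify it to take Z: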

class X {};
class Y : public X {};
class Z : public Y {};

class A {
public:
  virtual void foo(Y &y) { }
};

class B : public A {
public:
  virtual void foo(Z &z) { }
};


int main() {
  B b;
  Y y;
  Z z;
  
  b.foo(y);
  b.foo(z);
  return 0;
}

$ g++ -W -Wall -ansi -pedantic  foo.cc
foo.cc: In function ‘int main()’:
foo.cc:24: error: no matching function for call to ‘B::foo(Y&)’
foo.cc:14: note: candidates are: virtual void B::foo(Z&)

The error we get is about B::foo(Y&) not existing. Because Z is a narrower type than Y, we have broken the interface of A, so B is no longer a valid substitute for A.

But You Already Knew This

Intuitively, this is the way that makes the most sense. The use case for inheritance is generally when a function takes a certain type as input but you want to pass your own type. In order to ensure that the function, bar, can successfully use the type you pass it, all of the methods that bar calls on your type need to take types that bar can pass to it and return types that bar knows how to use. Take this code:

class X {};
class Y : public X { /* ... */ };
class Z : public Y { /* ... */ };

class A {
public:
  virtual Y *foo(Y &y) { /* ... */ }
};

void bar(A &a) {
  Y y;
  Y *y_ptr = a.foo(y);
}

If you were to pass your own type, B, which is a subtype of A (that is, B <: A), the only way we can implement B::foo such that our function bar still works is if B::foo takes as input a supertype of Y (or Y itself) and returns a subtype of Y (or Y itself). Consider if it were valid for B::foo to want a Z. B::foo could try to call a member function that exists only in Z. But since bar is passing an object of type Y, which we can't guarantee has a member function that exists only in Z, our code would be unsafe. A similar argument can be made for the return type: if B::foo returned an X, bar could not rely on it having everything a Y has.

What About Arrays?

Now that we get what these terms mean, how do they apply when it comes to arrays of objects? In C++, arrays are not covariant, however in Java they are:

public class Test {
    public static void foo() {
        Test[] tests = new Test[10];
        Object[] objects = tests;
    }
}

This code looks innocent enough, but there is a problem: what if I modify objects? For example:

public class Test {
    public static void foo() {
        Test[] tests = new Test[10];
        Object[] objects = tests;
        objects[0] = new Integer(1);
    }
}

This code compiles fine, but executing it gives the following error (changed foo to main in order to run it):

Exception in thread "main" java.lang.ArrayStoreException: java.lang.Integer
 at Test.main(Test.java:5)

This is what people mean when they refer to covariant arrays being a mistake. A compile-time check has been moved to runtime, which means you don't know if your code is safe without running it. Note, though, that what makes covariant arrays a problem in Java is modifying the array, not reading it. If we could somehow ensure that the objects array was immutable, covariant arrays in Java would be safe. Why does Java have covariant arrays then? Originally Java did not have parametric polymorphism (generics). Without covariant arrays it would be impossible to write a generic function that copies one array to another. Instead one would have to write a copy function for every type of array they wish to copy (Pierce, 188). While the behavior is not ideal, at the least the JVM can provide some level of safety by performing type checks at runtime.
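
The runtime check described above can be imitated in a few lines of Python. This is only an analogy, not how the JVM is implemented: the CheckedArray class is made up for the example, and it simply remembers its element type and verifies every store, which is roughly the job ArrayStoreException does for Java arrays.

```python
class CheckedArray:
    """A fixed-size array that remembers its element type and checks every store."""

    def __init__(self, elem_type, size):
        self.elem_type = elem_type
        self._items = [None] * size

    def __getitem__(self, i):
        # Reads need no check: whatever is stored already passed one.
        return self._items[i]

    def __setitem__(self, i, value):
        if not isinstance(value, self.elem_type):
            raise TypeError(
                "cannot store %s in array of %s"
                % (type(value).__name__, self.elem_type.__name__)
            )
        self._items[i] = value


class Test:
    pass


tests = CheckedArray(Test, 10)
objects = tests        # treated as "an array of objects", but it is the same array
objects[0] = Test()    # a store of the right type passes the check
try:
    objects[0] = 1     # the wrong type is only caught when the store happens
except TypeError as e:
    print(e)
```

Reading through objects is always fine, which matches the point that an immutable view would make covariance safe.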

Unfortunately, I cannot shed any light on why Dart has covariant arrays. It has done everything from befuddle to enrage type theorists (null pointers, in 2011, why?!). Dart appears to support parametric polymorphism, the lack of which led to Java having covariant arrays. It is a shame that Dart seems to be repeating what some would classify as mistakes for no obvious benefit. Dart is not the only one sinning though, C# also has covariant arrays and seems to have simply copied them from Java.

Conclusion

I'm no type theorist, this post went through several iterations with comments and corrections from friends smarter than I, so this post only scratches the surface of covariance and contravariance. Hopefully it is enough to understand what people are referring to when the terms come up in casual Reddit conversation. If you're interested in more, Benjamin Pierce's book, Types and Programming Languages, has an entire chapter dedicated to subtyping. While I haven't finished the book yet, what I have read is excellent. Even if you skip the math that looks complicated you will come out better.

Thanks

Special thanks to @dklee. Much of this post is based off of emails we have exchanged. Thanks to @j2labs, and @apgwoz, as well, for help in making this post.

Your Favorite Language is Probably Terrible at Concurrency too

2011-10-02T22:41:00.002-04:00

The internet has been ablaze with posts on NodeJS, to some people's joy and to others' chagrin. Some have claimed that Node solves a long standing problem in concurrency, saying:

People are starting to build more on Node.js because it’s superior and it solves these problems that have always existed. I/O has been done wrong for the last 30 years

In my opinion, Node is bad at concurrency, and guess what? Your language probably isn't any better. But let's make sure we're on the same page first.

Language/Framework - Most languages do not have concurrency as a first class citizen. So when I say "your language is bad at concurrency", what I really mean is "the options available for doing concurrent things in your language are bad". The former just rolls off your tongue better.
Concurrency - What do I mean by concurrency? I mean a model by which you can define actions that can happen at the same time. That could mean running multiple pieces of code in parallel or interleaving them. Specifically in this post I am concerned with solving problems where the number of things you want to do concurrently is significantly larger than the number of cores you have.

There are a lot of options for concurrency out there. You may have heard of things like Pi calculus, Join calculus, Communicating Sequential Processes, Event-loops and Coroutines. Your language probably has an implementation of one of these, or a conceptual subset. NodeJS and Twisted implement an event-loop. Coroutines is the path Python's Gevent has taken, as well as libraries for Ruby, C, and C++. Go has chosen Communicating Sequential Processes. But all these distinctions aren't important unless I can say what I consider a good solution to concurrency.

Ideally, a good solution should have the following properties:

Scaling - If you are writing concurrent software you've already decided handling one thing at a time is not a scalable solution, so now you want to handle multiple things at a time. An ideal solution should scale to the limits of the machine. That means making use of multiple cores, if available.
Reasoning - It should be easy for a reader of your code to reason about what it does. Edge cases and gotchas should be limited. Preferably one shouldn't even be aware of the concurrent aspects of the code unless they need to be.
Debugging - Debugging should not be painful. Standard tools like stacktraces should be meaningful. Tracing the path a piece of code takes shouldn't be harder than launching the space shuttle.

My claim is that very few concurrent solutions meet these criteria. But let me be clear, I'm not saying this is the only way you should judge selecting a solution. There is a Python library that does basically everything you think you need and it will be really hard to re-implement that functionality in another language? Well, maybe dealing with Python's concurrency shortcomings is less work than rewriting the library.

Scaling

Most languages were built for writing serial code. Memory is accessible by any piece of code in the process and it is assumed that nothing interesting happens between two function calls. But modern computers are not fast enough to do all the work programmers want them to do in serial and these languages have a lot of momentum behind them. For valid reasons, it is challenging to just move to another solution. Instead, we duct tape concurrency on top of these serial languages. One problem is that some of these languages can't even run code in parallel (that is, have two functions running at the same exact time) even if they wanted to. Python and Ocaml have a global lock that restricts this. In other languages it's just too much coordination to do safely. In C and C++ it can be too hard and time consuming to coordinate distributing concurrent work over multiple threads. For this reason, many mainstream solutions to concurrency are limited to running on a single core. It's insane, right? I can buy a laptop with, ostensibly, 8 cores now, yet a program written in most mainstream languages cannot make use of more than one.

For this reason, most solutions fail to be scalable. Take NodeJS, Twisted, Ocaml/Lwt, and Gevent: from the point-of-view of a user of these frameworks, their code not only cannot run on multiple cores, it depends on running on just one. Consider some Twisted code that downloads N web pages and appends the result to a list:

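from twisted.internet import defer

# downloadUrlAsString is assumed to return a Deferred that fires with the page body.
def downloadUrls(urls):
    d = defer.Deferred()
    ret = []
    def _returnWhenDone(_):
        # Fire the outer Deferred once every download has reported in.
        if len(ret) == len(urls):
            d.callback(ret)
    for url in urls:
        downloadDefer = downloadUrlAsString(url)
        downloadDefer.addCallback(lambda s : ret.append(s))
        downloadDefer.addCallback(_returnWhenDone)
    return d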

Ignoring my failure to handle failures, this code is acceptable Twisted, and it could not work if Python suddenly got the ability to run code on multiple cores and Twisted used it. The reason being, there is no coordination around the ret.append(s) line. What if two threads were to try to append to ret at the same time? NodeJS and Gevent have the same idea in mind. Almost no data access is surrounded by a mechanism to coordinate multiple pieces of code accessing it at the same time. The result is, none of the code using these frameworks can be run on multiple cores. If CPython or V8 got multicore support it would take a rewrite of all of the code to make use of it.
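
For contrast, here is a rough sketch of what the same job looks like once real threads are involved and the shared list has to be coordinated explicitly. It uses only the standard library; fake_download is a made-up stand-in for the network call, and the lock is the kind of coordination the paragraph above says is missing from event-loop code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def fake_download(url):
    # Hypothetical stand-in for real network I/O.
    return "<html>%s</html>" % url


def download_urls(urls):
    ret = []
    lock = threading.Lock()

    def worker(url):
        body = fake_download(url)
        # Only one thread at a time may touch the shared list.
        with lock:
            ret.append(body)

    with ThreadPoolExecutor(max_workers=4) as pool:
        # Consume the iterator so every worker has finished before returning.
        list(pool.map(worker, urls))
    return ret


pages = download_urls(["a", "b", "c"])
print(sorted(pages))
```

Every piece of shared state would need the same treatment, which is why existing callback code could not simply be switched over to multiple cores.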

But, you say, who cares? "I can just spin up N instances of my program, where N is the number of cores on my machine. I can easily scale that way". You can't even get concurrency right and now you want to move into distributed programming? Who are you fooling? But seriously, the problem is your code now needs to be "location aware". If you want to do something with object X, you have to be aware of where object X lives. This adds another layer of complexity to your system. Without a good way of communicating between instances you are limited to solving embarrassingly parallel problems or pushing the concurrency to another software layer. Either way, you aren't actually solving the problem with your framework. Luckily, a lot of what people want concurrency for is serving webpages, which requires almost no interprocess communication right now.

Reasoning

No matter how you slice it, writing concurrent code is hard. When it comes to serial code, looking at it and knowing what it does is as simple as understanding how each function operates given the current state of the program. But with concurrent code, the state of the program is changing while a function runs. Understanding a concurrent program involves understanding how the concurrent components are interacting with each other. Some solutions make this easier than others.

Take the following piece of example NodeJS code:

var db = require('somedatabaseprovider');
app.get('/price', function(req, res) {
  db.openConnection('host', 12345, function(err, conn) {
    conn.query('select * from products where id=?', [req.param('product')], function(err, results) {
      conn.close();
      res.send(results[0]);
    });
  });
});

The amount of syntax is enormous. There is a huge amount of line noise for what should look, at worst, like this:

var db = require('somedatabaseprovider');
app.get('/price', function(req, res) {
    var conn = db.openConnection('host', 12345);
    var results = conn.query('select * from products where id=?', [req.param('product')]);
    conn.close();
    res.send(results[0]);
});

If you want to add proper error handling, the situation gets worse with callback code. Twisted has attempted to solve this by encapsulating code flow in an object called a Deferred, but the problem remains: a unit of work in callback-based code is not a function, like one is used to in serial code, it is work to do between events. As the above example showed, there isn't a function that connects to a db, does a query, and returns the result. There is a function to open a db connection, another function for when that is done and to do the db query, and another function to handle the result. You have defined three functions where you previously needed one. More importantly, you have to define functions not because it makes your code clearer but because the framework requires it.

Given how negatively this affects code, there are a lot of attempted solutions. Twisted, for example, allows one to use the defer.inlineCallbacks decorator so a function can use generators to express asynchronous code. Our previous NodeJS code might look like this:

@defer.inlineCallbacks
def handlePrice(req, res):
    conn = yield db.openConnection('host', 12345)
    results = yield conn.query('select * from products where id=?', [req.param('product')])
    yield conn.close()
    res.send(results[0])

app.get('/price', handlePrice)

In many ways this is an improvement but it does have its limitations.
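
To show roughly how a decorator like this can work, here is a toy driver written without Twisted. It is not Twisted's implementation: run, open_connection and query are invented for the example, and the callbacks fire immediately instead of waiting on real I/O. The generator yields "start this and call me back" functions, and the driver resumes the generator with each result.

```python
def open_connection(host, port, callback):
    # Hypothetical callback-style API; calls back at once for the demo.
    callback({"host": host, "port": port})


def query(conn, sql, params, callback):
    # Hypothetical query function returning one canned row.
    callback([{"id": params[0], "price": 42}])


def run(gen_func, on_done):
    """Drive a generator that yields functions expecting a callback."""
    gen = gen_func()

    def step(value=None):
        try:
            start = gen.send(value)  # resume the generator with the last result
        except StopIteration as stop:
            on_done(stop.value)      # the generator returned its final value
            return
        start(step)                  # begin the next operation, resume on completion

    step()


def handle_price():
    conn = yield lambda cb: open_connection("host", 12345, cb)
    results = yield lambda cb: query(
        conn, "select * from products where id=?", [7], cb
    )
    return results[0]


out = []
run(handle_price, out.append)
print(out)
```

The body of handle_price reads top to bottom like sequential code, yet every yield is a point where control goes back to the driver.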

The NodeJS community has been at work solving this problem for themselves too. One person added coroutines to V8, and gave it a C#-like syntax. OKCupid gave us TameJS. Both of these solutions have their problems which are deal breakers for many.

There are also less complete solutions, like Step. But library solutions like Step only give you access to a subset of the functionality you would get from the sequential code you really want to write. To get all of it you need a full CPS transformation (which is what TameJS gives you, at a cost of debugging). This is actually how the syntax extensions for Ocaml/Lwt work. The previous NodeJS code might look like this in Ocaml/Lwt (the relevant part is that lwt causes a CPS transformation to turn the code into the appropriate callback-based code):

let handle_price req res =
  lwt conn = DB.open_connection "host" 12345 in
  lwt results = DB.query conn (SQL.sprintf "select * from products where id=?" (req#param "product")) in
  DB.close conn;
  res#send results.(0)

App.get "/price" handle_price

This is one reason for Gevent/Eventlet's popularity in Python. Gevent uses coroutines to give you asynchronous code that looks sequential. The trick is, underneath the hood, some function calls actually result in all of the state for your current function call being saved, another one switched to, executed, rinse, repeat. Gevent has a cooperative scheduler that tries to intelligently decide which function to switch to.

Say you want to write the earlier NodeJS code in sequential Python, you might get:

def handlePrice(req, res):
    conn = db.openConnection('host', 12345)
    results = conn.query('select * from products where id=?', [req.param('product')])
    conn.close()
    res.send(results[0])

app.get('/price', handlePrice)

How would this look in Gevent? Exactly the same. The openConnection and query functions have an I/O call which actually jumps back to the Gevent scheduler so it can do something else while the I/O happens.

But Gevent is not without its cost when it comes to reasoning about code. Consider this:

def foo(data):
    print(data.bar)
    do_something()
    print(data.bar)

Looking at this code, will the same value be printed twice? The answer is: no idea. Even though do_something does not take data as input, it could do something that causes Gevent to context switch to another function, another function which also has access to data and modifies it. There is no way to tell, simply by looking at the code, if it will context switch or not.
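
The effect can be reproduced without Gevent by using generators as stand-in coroutines and a tiny round-robin scheduler. All names here are invented. The important difference from Gevent is that generators make every switch point visible as a yield, whereas Gevent hides the switch inside an ordinary-looking function call.

```python
class Data:
    def __init__(self):
        self.bar = 1


def do_something():
    # Stands in for an I/O call: hands control back to the scheduler.
    yield


def foo(data, seen):
    seen.append(data.bar)       # first "print"
    yield from do_something()   # other tasks may run here
    seen.append(data.bar)       # second "print"


def meddler(data):
    # Another task that happens to share the same object.
    data.bar = 2
    yield


def run_round_robin(tasks):
    tasks = list(tasks)
    while tasks:
        task = tasks.pop(0)
        try:
            next(task)          # run the task until its next switch point
            tasks.append(task)  # not finished yet, queue it again
        except StopIteration:
            pass                # task finished, drop it


data = Data()
seen = []
run_round_robin([foo(data, seen), meddler(data)])
print(seen)
```

foo never passes data to do_something, yet the two recorded values differ because meddler ran during the pause.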

Debugging

The previous Gevent code is printing out two different values for data.bar and you don't want this, how do you fix it? The first thing you might try, from your serial programming days, is a debugger. But that might not work very well. Why? You're in concurrent-land now, multiple things are happening at once! That means timing is important. If you set a breakpoint somewhere, you've disrupted the time at which things happen and your program could take a completely different path, not the one you want to debug.

If you're smart and you control access to data.bar through function calls, you can do some printf debugging. Perhaps print out a stacktrace when one modifies it. But let's say, even those prints are causing the timing of your program to change, so now data.bar is coming out as the same value at each print. What do you do?!

The point is, debugging concurrent code can be very hard. Event-loop code adds another problem to debugging: your code doesn't have a linear path. If you could visualize sequential code, it would be a line. You start at point A, you do the things in order to get to point B, and at any point if you have an error your callstack represents the path you took to get there. Event-loop code always needs to hit the event-loop for a blocking call though. The callstack you see is always limited to the path from the last event you got. A callstack in the code handling a database query may not contain how you got there. If that query is part of a piece of fairly generic code you don't have many leads to go on to track it down.
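
A toy event loop makes the truncated callstack easy to see. The loop and function names below are made up; the only library call is traceback.extract_stack from the standard library, which lists the frames active when the callback runs.

```python
import traceback
from collections import deque

events = deque()
captured = []


def on_query_done():
    # Record the names of the functions on the call stack at callback time.
    captured.extend(frame.name for frame in traceback.extract_stack())


def handle_request():
    # Schedules the callback and returns; this frame is gone before it runs.
    events.append(on_query_done)


def event_loop():
    while events:
        callback = events.popleft()
        callback()


handle_request()
event_loop()
print(captured)
```

The recorded stack includes event_loop and on_query_done but not handle_request, the function that actually caused the callback to exist.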

Who got it right then?

Three languages come to mind: Erlang, Oz, Haskell. There are more out there but I'm not omniscient. In my opinion, these languages are capable of the three properties I previously mentioned. Right now you are probably rolling your eyes and saying "I should have known, one of THOSE guys". But my argument is conservative: based on the properties that I believe are important for a concurrent solution to be good, these languages excel (or are capable of it) at them. Real world problems contain more than just concurrency issues though, so this does not mean you're wrong to use a language that doesn't meet my criteria, but it does mean you are sacrificing something. Perhaps that sacrifice is acceptable. But don't fool yourself into thinking your language is not terrible at concurrency, because it probably is.

C Gotcha Of The Day: Pointers aren't integers

2011-07-31T14:35:00.003-04:00

The C standard is clear that pointers are not required to be convertible to or from an integer.

Section 6.3.2.3.5-6 in the C99 draft

An integer may be converted to any pointer type. The result is implementation-defined, might not be properly aligned, and might not point to an entity of the referenced type.

Any pointer type may be converted to an integer type; the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.

Basically, you can't depend on the following code doing anything useful:

int *p = (int*)0xff;

The C standard does not define a machine with a flat memory model. Old Intel systems are an example of a non-flat memory model: they had a segmented memory model where a pointer needed a segment and an offset.

Conversion to a string is also implementation defined in C, from 7.19.6.1 fprintf, the section on the %p format specifier:

The argument shall be a pointer to void. The value of the pointer is converted to a sequence of printable characters, in an implementation-defined manner.

And from fscanf:

Matches an implementation-defined set of sequences, which should be the same as the set of sequences that may be produced by the %p conversion of the fprintf function. The corresponding argument shall be a pointer to a pointer to void. The interpretation of the input item is implementation-defined. If the input item is a value converted earlier during the same program execution, the pointer that results shall compare equal to that value; otherwise the behavior of the %p conversion is undefined.

It does appear that even though the result of %p is implementation defined, it is guaranteed that you can fprintf and fscanf pointers and get back the same result inside the same program execution.

C Gotcha Of The Day: ptrdiff_t

2011-07-29T13:26:00.000-04:00

Excerpt from C99 Draft (AFAIK this has not changed):

The size of the result is implementation-defined, and its type (a signed integer type) is ptrdiff_t defined in the <stddef.h> header. If the result is not representable in an object of that type, the behavior is undefined. In other words, if the expressions P and Q point to, respectively, the i-th and j-th elements of an array object, the expression (P) - (Q) has the value i−j provided the value fits in an object of type ptrdiff_t.

That means, while the type size_t is capable of expressing the size of any object, you cannot guarantee that the subtraction of two pointers inside your object will result in defined behavior. That is because ptrdiff_t is signed (so it can give you the direction of the difference) and size_t is unsigned. You can use the macros PTRDIFF_MAX and SIZE_MAX to determine if your subtraction is safe though.
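The arithmetic behind this can be sketched in a few lines. This is only an illustration: the 32-bit widths, the two's complement range, and the 3 GiB object are assumptions picked for the example, not something the standard guarantees, and difference_is_representable is a made-up helper name.

```python
# Assumed for illustration: 32-bit size_t and 32-bit two's complement ptrdiff_t.
BITS = 32
SIZE_MAX = 2**BITS - 1            # largest value of the unsigned size_t
PTRDIFF_MAX = 2**(BITS - 1) - 1   # largest value of the signed ptrdiff_t
PTRDIFF_MIN = -PTRDIFF_MAX - 1


def difference_is_representable(i, j):
    """True if the element-index difference i - j fits in the assumed ptrdiff_t."""
    return PTRDIFF_MIN <= i - j <= PTRDIFF_MAX


# A hypothetical 3 GiB char array: its size fits in size_t...
object_size = 3 * 2**30
size_fits = object_size <= SIZE_MAX

# ...but subtracting the start from one-past-the-end does not fit in ptrdiff_t.
whole_object_ok = difference_is_representable(object_size, 0)
small_span_ok = difference_is_representable(2**20, 0)
```

Under these assumptions size_fits and small_span_ok are true while whole_object_ok is false, which is exactly the gap between SIZE_MAX and PTRDIFF_MAX described above.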

Sometimes I forget the full degree to which Python's "scoping" is broken.

2011-05-03T13:31:00.000-04:00

>>> [s for s in [1, 2, 3]]
[1, 2, 3]
>>> s
3
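The session above is from a Python 2 interpreter. As a hedged footnote: Python 3 gives list comprehensions their own scope, so the name no longer leaks, but a plain for loop still leaves its variable behind. The sketch below checks both; the function names are made up for the example.

```python
def comprehension_leaks():
    """Report whether a list comprehension variable is visible afterwards."""
    [s for s in [1, 2, 3]]
    try:
        leaked = s  # NameError on Python 3, bound to 3 on Python 2
    except NameError:
        return False
    return True


def for_loop_leaks():
    """A for loop variable stays bound after the loop on Python 2 and 3."""
    for t in [1, 2, 3]:
        pass
    return t
```

On Python 3, comprehension_leaks() returns False, while for_loop_leaks() still returns the last value of the loop variable.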

[JC] Efficient Parallel Programming in Poly/ML and Isabelle/ML

2011-04-27T22:53:00.003-04:00

Authors: David C.J. Matthews and Makarius Wenzel
URL: http://www4.in.tum.de/~wenzelm/papers/parallel-ml.pdf
Year: 2010

I had not heard of Poly/ML or Isabelle/ML prior to reading this paper. I tried to do a bit of background research to get an idea for it, but I may have gotten some details wrong. If I made any mistakes below I will be happy to correct them.

The Paper

Poly/ML is a full implementation of Standard ML originally developed by David Matthews. Larry Paulson was an early adopter and implemented the Isabelle theorem prover. Isabelle/ML is the implementation of Isabelle on Poly/ML. Poly/ML was originally implemented with a single threaded run time system (RTS). With the ubiquity of multicore machines the RTS was modified to support threading, the garbage collector was modified and threading APIs were introduced. Finally, the modifications were tested in Isabelle. This paper is relevant to the frequent debates that occur on the Ocaml mailing list. Ocaml currently has no support for parallelism and this deficiency is frequently brought up as a negative.

The primary motivation for adding parallelism to Poly/ML is Isabelle/ML, where proofs can take hours to days to execute. Poly/ML and Isabelle/ML have been developed for many years, which restricted the modifications: they could not break existing code or significantly slow down single threaded applications. The authors decided to start with a fairly low level, pthreads-like, implementation and build layers on top of it, each more abstract than the previous. The majority of the changes took place in the RTS, Poly/ML base library, and Isabelle/ML library. In the end, no user's Isabelle code need be modified to take advantage of the performance benefits.

As stated before, Poly/ML is an implementation of Standard ML. The original definition of SML included an asynchronous exception called Interrupt. This was removed from the 1997 definition, however Poly/ML has kept it for interrupting threads. This exception can be triggered by one thread in another thread to stop a computation. Interrupt is a useful mechanism to get a thread's attention.

The RTS is written in C++, provides memory management and access to the underlying OS, and required the most modifications. The RTS provides a one-to-one relationship between OS threads and ML threads. The OS is in charge of scheduling threads on cores. Synchronization primitives are not implemented as direct calls to the underlying OS's primitives, however. Each thread is given a single condition variable which the RTS can signal for the unlocking of a mutex, or signaling of a condition variable, in the ML code. A single threaded stop-the-world GC is used. While it is known that this is a likely bottleneck as the number of cores increases, it is shown that it is not an issue up to 8 cores.

The base libraries were hardly altered, beyond adding the necessary threading APIs. The low-level APIs provided are:

Threads:

fork: (unit -> unit) * attribute list -> thread
interrupt: thread -> unit
setAttributes: attribute list -> unit
getAttributes: unit -> attribute list

Mutexes:

mutex: unit -> mutex
lock: mutex -> unit
unlock: mutex -> unit

Condition Variables:

condVar: unit -> condVar
wait: condVar * mutex -> unit
signal: condVar -> unit
broadcast: condVar -> unit

The only one that might need explaining is interrupt. This raises the asynchronous exception Interrupt in the provided thread. A few other locations of the standard library were modified, such as the I/O module which had locks added to each operation. The authors do point out:

... this overhead can be almost entirely avoided by using the functional IO layer of the ML library in which reading a character returns the character and a new stream since in this case a lock is only needed when the buffer is empty and needs to be refilled.

Abstractions over these were added to the Isabelle/ML library. The first is a synchronized variable with the following definition:

type 'a var
val var: 'a -> 'a var
val value: 'a var -> 'a
val guarded_access: 'a var -> ('a -> ('b * 'a) option) -> 'b

guarded_access takes a synchronized variable and a function. The function takes the current value of the synchronized variable as input and returns an option containing a tuple of the value to return and the new value to put in the synchronized variable. The combinator combines the idea of a mutex and condition variable. In the code guarded_access v f, f will be applied to the value stored in v. If f returns NONE then guarded_access waits for the next signal on that variable and applies f again. If f returns SOME (a, b), then the value b will be put back into the synchronized variable and the value a will be returned.
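To make the control flow concrete, here is a rough Python analogue built on threading.Condition. It is a sketch of the idea, not the Isabelle/ML implementation: the Var class, the use of None for NONE and a (result, new_value) tuple for SOME, and the counter example are all assumptions made for illustration.

```python
import threading


class Var:
    """A synchronized variable, loosely modelled on the interface above."""

    def __init__(self, value):
        self._value = value
        self._cond = threading.Condition()

    def value(self):
        with self._cond:
            return self._value

    def guarded_access(self, f):
        """Apply f to the current value while holding the lock.

        f returns None to wait for the next change (the NONE case), or a
        (result, new_value) tuple to update and return (the SOME case).
        """
        with self._cond:
            while True:
                outcome = f(self._value)
                if outcome is not None:
                    result, self._value = outcome
                    self._cond.notify_all()  # wake waiters so they retry f
                    return result
                self._cond.wait()


# Hypothetical usage: a decrement that blocks until the counter is positive.
counter = Var(0)
results = []


def increment():
    counter.guarded_access(lambda n: (None, n + 1))


def decrement():
    return counter.guarded_access(lambda n: (n, n - 1) if n > 0 else None)


worker = threading.Thread(target=lambda: results.append(decrement()))
worker.start()
increment()  # lets the blocked decrement proceed
worker.join(timeout=5)
```

The worker either finds the counter already incremented or waits on the condition until increment signals it; in both cases it observes the value 1 and puts 0 back.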

This simple construct can be used to implement variables that can be safely shared, and updated, between threads. An mvar is a variable where a thread waits for a value if none is there when getting, and waits for the value to be removed when putting. It can be easily implemented with this:

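type 'a mvar = 'a option var
fun mvar () = var NONE
(* take: wait until the mvar is full, return its value and leave it empty *)
fun take v = guarded_access v
               (fn NONE => NONE | SOME x => SOME (x, NONE))
(* put: wait until the mvar is empty, then store x *)
fun put v x = guarded_access v
               (fn SOME _ => NONE | NONE => SOME ((), SOME x))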

The other abstraction presented is futures, which are a way to represent the result of a computation before it is completed. The futures interface looks a lot like a typical lazy evaluation interface, except the value will always be computed unless it is canceled. Futures are an attractive way to represent parallel computations. Consider this contrived example:

val x = future some_expensive_computation
val y = future some_other_expensive_computation
val z = join x + join y

We create two futures, x and y which evaluate to integers. Then we create z which will be the sum of x and y. Assuming you have enough cores, x and y will be computed in parallel. The join function waits for the future and evaluates to the value of the future. Here is the interface for futures in Isabelle/ML:

type 'a future
val future: (unit -> 'a) -> 'a future
val join: 'a future -> 'a
val cancel: 'a future -> unit

If a future is cancelled or produces an exception, that exception will be reraised when join is called. Isabelle/ML also contains a few future combinators, such as future groups which create a hierarchy of execution that kills all futures in the hierarchy if one fails.

The implied implementation of futures is a thread pool where the future function queues work for the thread pool. A scheduler will then decide which futures get work or are cancelled.
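The same shape exists in Python's standard library, which makes for a convenient way to play with the idea. The sketch below uses concurrent.futures.ThreadPoolExecutor, where submit plays the role of future and result plays the role of join. The three worker functions are placeholders invented for the example, and in standard CPython builds with the GIL the threads will not actually run CPU-bound work in parallel, so this only illustrates the interface and the exception behaviour.

```python
from concurrent.futures import ThreadPoolExecutor


def some_expensive_computation():
    return sum(range(1000))  # placeholder work


def some_other_expensive_computation():
    return sum(range(2000))  # placeholder work


def failing_computation():
    raise ValueError("boom")


with ThreadPoolExecutor(max_workers=2) as pool:
    x = pool.submit(some_expensive_computation)        # like: val x = future ...
    y = pool.submit(some_other_expensive_computation)  # like: val y = future ...
    z = x.result() + y.result()                        # like: join x + join y

    bad = pool.submit(failing_computation)
    try:
        bad.result()  # the worker's exception is re-raised at the join point
        reraised = False
    except ValueError:
        reraised = True
```

As with join, the failure in the worker only surfaces when the result is demanded.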

That is the end of additions to Poly/ML and Isabelle/ML. How well does it work? While subjective, performance improvements were more than acceptable with tests plateauing at around 8 cores. The tests were performed using large Isabelle/Isar proofs available in the standard distribution. Isabelle/Isar is a non-computational language and has a modular nature that makes it possible to exploit implicit parallelism. Four Isabelle applications were used for testing, all of which are considered reasonably large. The results showed a maximum of 3.2x speedup for 4 cores and 6.5x on 8 cores. Past 8 cores the performance increase plateaus. Two bottlenecks were investigated further. The first is garbage collection. As stated earlier Poly/ML has a single threaded stop-the-world GC. At 16 cores GC becomes about 15-30% of the run time. The second bottleneck is insufficient parallelization. Of the applications chosen, there was an insufficient amount of work to keep all of the cores busy.

While there are many implementations for functional languages out there, Poly/ML remains one of the few that supports true multicore threads in a stable state and is used in realistic applications. The modifications to Poly/ML took about 1 person year of work. Future work includes parallelizing the garbage collector.

The Discussion

Every few months the question of whether Ocaml will get multicore support comes up. The answer is generally "no" (although it's coming closer to "yes" thanks to OcamlPro), with the reason being that it is too hard to provide a proper implementation, garbage collection being given as the primary obstacle. I think that the results of Poly/ML show that even a single threaded stop-the-world GC can provide a sufficient performance advantage while also providing more powerful abstractions for parallelism. A parallel garbage collector can be worked in later, if needed, without affecting existing code.

I liked that the authors chose to add low-level threading support and build layers on top. It gives developers options if the provided abstractions are not meeting their needs. Futures appear to be a very appealing abstraction but I worry somewhat that they are insufficient. In order to maximize parallelism, should every function consume and produce futures? That seems a bit overkill and the overhead would likely be enormous. But it is easy to think of situations where either decision hurts you, either because of the overhead or because you aren't utilizing all of the cores effectively. I do like futures as an option though and hope to see them if/when Ocaml gets multicore support.

This is my first exposure to Poly/ML and the authors have done excellent work. I hope it provides motivation for bringing Ocaml into the multicore world.

A little gotcha in overloading comparison methods in Python

2011-04-14T10:50:00.011-04:00

Python supports chaining relational operators, so you can write 2 < 3 < 4 and get True. It looks like this is implemented as a little compiler trick that actually creates the expression "2 < 3 and 3 < 4". This can be confusing if you try to do something insane like implement Haskell's (>>=) in Python using (>=). Something like:

Just(10) >= (lambda x : Just(x + 1)) >= (lambda x : Just(x / 2))

Will actually become:

Just(10) >= (lambda x : Just(x + 1)) and \
    (lambda x : Just(x + 1)) >= (lambda x : Just(x / 2))

This seems to apply only to the relational operators. Choosing to implement (>>=) with (>>) in Python seems to work fine. Here is a Gist showing the problem from @apgwoz: https://gist.github.com/916132.
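A small self-contained reproduction, in case the Gist disappears. The Just class here is a stand-in written for this example, not the code from the Gist, and it uses integer division so the result is the same on every Python version.

```python
class Just:
    """A tiny Maybe-like wrapper; hypothetical, for illustration only."""

    def __init__(self, value):
        self.value = value

    def __ge__(self, f):      # bind spelled with >=
        return f(self.value)

    def __rshift__(self, f):  # bind spelled with >>
        return f(self.value)


inc = lambda x: Just(x + 1)
half = lambda x: Just(x // 2)

# >> is an ordinary left-associative binary operator, so this parses as
# (Just(10) >> inc) >> half.
shifted = Just(10) >> inc >> half

# >= is chained, so this becomes: Just(10) >= inc and inc >= half.
# The second comparison is between two functions, which Python 3 rejects
# with a TypeError (Python 2 would quietly compare them instead).
try:
    Just(10) >= inc >= half
    chained_failed = False
except TypeError:
    chained_failed = True
```

With (>>) the chain binds through both functions, while the (>=) version never applies half to the intermediate Just at all.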

Unicorn is Unix, What?

2011-04-12T10:51:00.000-04:00

I'm late to this blog post but @apgwoz just sent it to me. I found it pretty silly. Apparently we should like Unicorn because it uses a lot of Unix system calls...

There’s another problem with Unix programming in Ruby that I’ll just touch on briefly: Java people and Windows people. They’re going to tell you that fork(2) is bad because they don’t have it on their platform, or it sucks on their platform, or whatever, but it’s cool, you know, because they have native threads, and threads are like, way better anyways.

Fuck that.

Don’t ever let anyone tell you that fork(2) is bad. Thirty years from now, there will still be a fork(2) and a pipe(2) and a exec(2) and smart people will still be using them to solve hard problems reliably and predictably, just like they were thirty years ago.

False dichotomies are the best form of logic.

Summary of CUFP 2010

2010-10-05T22:34:00.009-04:00

This year was my first attending CUFP and I had a great time. I was pleasantly surprised at how strong of a showing the OCaml community had. I knew Jane Street would be there but I ran into several other people working in OCaml. The star of the show was definitely F# in my opinion. The weakest part of the conference was the lack of outlets. My laptop battery ran out by the second session of the first day and it was really quite difficult to find an outlet to charge it.

Day 1

The first day was broken into two sessions, each in a tutorial style. For the first session I was in Building Robust Servers Using Erlang, presented by Martin Logan from Orbitz. This stumbled a bit at the beginning; I think Martin was hoping people would be more familiar with Erlang as a language so he could delve into how to build a robust server. It picked up in the end though and I think he successfully drove his message home. The people I talked to after the session expected it to be a basic description on how to write Erlang but were impressed by the power of OTP, especially the supervisor model. A few people remarked that Erlang seemed great for anything that needed to be long running, so I think Martin was successful.

I jumped between all of the presentations in the second session.

F# - This was interesting, I hadn't seen F# much before. The presenter was teaching it through an ant simulation and had a contest with prizes.

Camlp4 and Template Haskell - I was a bit let down by what I saw of this one. It didn't seem like the presenters really gave a good introduction to templating languages. They presented a problem and let everyone work on it and would go around answering any questions. I wish my laptop battery was working so I could have taken a shot at playing with Camlp4. To their credit they were very helpful when asked but the initial presentation seemed lacking to me. Perhaps it was just too far over my head at this time.

Scala and Lift - This was the presentation that I had the least interest in but I think was the most well done. David's presentation was interactive and had no slides. He simply wrote code with you and explained what it did and I think that worked well. Everyone I talked to after seemed impressed by what Lift was capable of accomplishing so easily.

Day 2

Day 2 was all talks done in serial. I enjoyed most of the talks quite a bit. Yaron Minsky from Jane Street started out by saying something I think was important and easy to forget if you are heavily in the FP community. Despite the clear progress FP seems to be making (F# in Visual Studio, Real World Haskell, FP's in several big companies), we really aren't growing like we'd like to think. For most people management either says no to a functional language or it has to be snuck in through the back door. That is why they chose the keynote to be about F#. Microsoft including it in Visual Studio is a big leap and probably the biggest news in terms of FP going mainstream. But is it enough? We'll find out in the coming years.

F# - This was the keynote presented by Luke Hoban from Microsoft and he painted a really great picture of F#. His talk spanned how they introduce F# to non-functional programmers, a demo of F#, and some experiences in productizing it. The integration with Visual Studio was topnotch. Luke showed off how easy it is to create a GUI, handle events, and run asynchronous code. It almost made me wish I was running Windows, it looked so nice. The power of F#, to me, was making GUIs. The language looked like it had to be weakened a bit in order to successfully exist in the .Net ecosystem but if I ever find myself working on Windows I will gladly use F#. How much will F# be adopted by mainstream programmers? Who knows, I'm hoping quite a bit though.

Scaling Scala at Twitter - I knew Twitter used Scala but I did not realize they were such a large Scala shop. Scala was another language that seemed to have good representation at CUFP. I am still not sold on it but people seem to be doing great things with it. This talk was mostly about experiences in building the geolocation in Twitter. It was impressive that geolocation was built very quickly by two engineers who had no Scala or Java experience. There were two takeaways from this talk. The first is that the data center is the new computer. When you are designing a distributed application you really need to think differently about it than you would a non distributed application. This should not be a surprise if you really think about it but the emphasis seemed to be that in many cases people don't realize there is a difference. The second was that we should be honest about GC and realize it is a leaky abstraction. It would be nice if the application could get information back from the GC. The application really knows best how to handle working under heavy load and it would be nice if it could query the GC to figure out what kind of load it is under. I am not quite sure how much I buy the second one, couldn't the application monitor itself based on some metric relevant to its operations and modify its behavior based on that?

Cryptol, a DSL for Cryptographic Algorithms - This was from the people at Galois. I don't know much about Galois other than dons works there and they do Haskell, but it looks like they get nice government contracts too. I had no idea how complex the world of cryptology is. I knew the algorithms were sophisticated but not the rest of it. Cryptol seems powerful but much of the talk was over my head.

Naïveté vs. Experience - or, How We Thought We Could Use Scala and Clojure, and How We Actually Did - This talk was by Michael Fogus and my favorite. Michael was entertaining and insightful. Most of this talk was about Scala and it included why they moved to Scala from Java, what they expected to use in Scala, what they actually did use in Scala, and the problems with Scala. Michael talked a lot about how he convinced his team to move to Scala as well. The experiences were positive but it did take a lot of convincing. The slides to his talk can be found here.

Reactive Extensions (Rx): Curing Your Asynchronous Programming Blues - Sadly Erik Meijer was unable to present this. I forgot to write down the name of who did present it, Wes something, but he did a great job. Rx looks really cool. I don't know how it scales up in writing an application but Wes was able to throw together some interesting programs very quickly using Rx. Rx is a Reactive Programming library for .Net. In short, it treats events like a collection and you simply iterate over the collection to get events (you can even use LINQ). This makes event-driven software easier to think about and makes events easier to compose. All of his examples were in C# but, because Rx is on .Net, it can be used seamlessly with F# (is the impression that I got).

Eden: An F#/WPF framework for building GUI tools - Eden is built by the Credit Suisse guys so they were unable to actually show Eden, however a subset of its functionality was built for the talk. This showed off more of how pretty GUIs can be created with F#. WPF has really great graphics and looked easy to produce. The portion of Eden shown was using a graph-based layout to calculate output on demand. It is difficult to explain succinctly but this talk showed off GUIs in F# as well as how easy it is to create asynchronous code. F#'s two strongest points seem to be OCaml's two weakest points.

Functional Language Compiler Experiences at Intel - The speaker couldn't talk too much about what they were working on (apparently Intel is making a functional language designed to be used for their processors with many many cores) but they did have some interesting meta-things to say. The first was, even in the FP world, sometimes you just want impurity. The second was, if your FP language is going to allow you to write code imperatively, don't make the syntax terrible. In this case they were writing SML. Finally, it is harder to teach FP to someone with programming experience than to someone completely fresh. In their case they were looking at 8 - 12 months before really getting a return from the people they were training.

Riak Core: Building Distributed Applications Without Shared State - This talk was great. Rusty from Basho gave a great look at the important functionality in Riak. Riak is broken into three components: Riak Core - a core library for building robust distributed applications in Erlang, Riak KV - A key-value store using Riak Core, and Riak Search - a full text search engine using Riak Core. The message here, again, was the data center is the computer. I thought Riak's usage of virtual nodes was interesting too, and it seemed obvious in hindsight. Rather than break your distributed application up by physical nodes, create a ton of virtual nodes (more than you'll ever have of physical nodes) and then map those to physical nodes. Take sharding, for example, if you map to physical nodes, once you add a new physical node you'll have to repartition your shards all over again. But if you have a few hundred virtual nodes, adding a physical node just means you have to remap some data to it and point the new virtual nodes at it, but your upstream code doesn't need to change at all. Riak Core helps take care of the virtual node mapping for you as well as how to push data around when you add or take away physical nodes.
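The virtual node idea is easy to sketch. The code below is not Riak Core's actual ring or claim algorithm, just a minimal illustration under assumptions: the vnode count of 64, the SHA-1 based key hashing, the round-robin initial assignment, and all function names are made up for the example.

```python
import hashlib

NUM_VNODES = 64  # fixed up front, more than the physical nodes we ever expect


def vnode_for(key):
    """Map a key to a virtual node; this never changes as the cluster grows."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_VNODES


def assign(physical_nodes):
    """Spread the virtual nodes round-robin over the physical nodes."""
    return {v: physical_nodes[v % len(physical_nodes)] for v in range(NUM_VNODES)}


def add_node(assignment, new_node):
    """Give the new node a fair share of vnodes, taken from the busiest owners."""
    assignment = dict(assignment)
    nodes = set(assignment.values()) | {new_node}
    share = NUM_VNODES // len(nodes)
    for _ in range(share):
        counts = {}
        for owner in assignment.values():
            counts[owner] = counts.get(owner, 0) + 1
        busiest = max(sorted(counts), key=counts.get)
        victim = min(v for v, owner in assignment.items() if owner == busiest)
        assignment[victim] = new_node
    return assignment


before = assign(["a", "b", "c"])
after = add_node(before, "d")
moved = [v for v in before if before[v] != after[v]]
```

Upstream code only ever computes vnode_for(key), which is unaffected by the change; only the vnodes listed in moved need their data shipped to the new machine.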

Functional Programming at Freebase - I was excited for this talk but sadly let down. This involved rewriting Freebase's query language parser and executor from Python to OCaml. It looked more like mental masturbation to me though. Several times I found myself simply wondering why some choices were made. Many of the choices came off as wishing he were writing Haskell. In the end the speaker got a 10x speed up, which was pointed out to not be very good, and it looked like they had to go through a lot of headaches along the way.

ACL2: Eating One's Own Dogfood - I was unable to attend this talk.

I enjoyed CUFP quite a bit. It was great to meet the people I read papers from or see on videos about my favorite languages. In terms of being mainstream, Scala seemed to be making the fastest gains, most likely because it is so close to Java that it is an easy switch. In many ways I felt like we are all slowly catching up to Haskell. Many of the technical ideas presented here have already existed in Haskell for quite a while and I could almost see the frustration on faces of the Haskell people wondering why the rest of us haven't figured out that we should be writing it. I'm hoping that next year the number of companies adopting functional languages continues to grow so we can see more examples of FP in industry at the next CUFP.

Learn You Some Erlang renamed Learn You Some Scala

2010-03-31T15:19:00.004-04:00

I'm excited to announce that Frederic Trottier-Hebert has decided to change the name of Learn You Some Erlang to Learn You Some Scala! This should come as no surprise to most of us. As I demonstrated here Scala and Erlang are really the same language. With the growing popularity of Scala it only makes sense to target the Scala audience (whom we can thank for Erlang's actors). I got the chance to talk to Frederic about the change. When asked what finally prompted the change he said:

<MononcQc> Well yeah, I mean I was there when you first were talking with Virding about the migration of Erlang to the JVM. I'm quoted in that blog post

<MononcQc> that discovery was pretty much a shock to me too, and so it's why I've pondered this and discussed the whole issue over #erlang on the course of the last few weeks

<MononcQc> I picked up one of the many great books about Scala and realized that 'damn, they're the same stuff!'

<MononcQc> Scala being bigger with the JVM being stress tested in production environment (sometimes claiming 9 nines of uptime)

<MononcQc> I decided to do the switch.

<MononcQc> So LYSE becomes LYSS

<MononcQc> It's much more marketable anyway

Some of the changes he has told me are upcoming:

OTP In Scala - How to work with some of the Scala specific OTP libraries to get better soft real time guarantees and performance

Mnesia and Scala - Mnesia is written in Erlang/Scala so moving your databases should Just Work. There should be a pretty big performance increase due to the JIT too (performance improvements have been shown to be about 20%-25.4%) I'm pretty excited about this one.

JVM Performance tuning - When to use -client and when to use -server will play a big part in this chapter. Frederic plans on really covering the nitty-gritty details of JVM tuning. Frederic admits that he hasn't done much work with the JVM but given the similarity to beam doesn't foresee that being a problem.

Java interop - No more need to use jinterface, Java interop is much easier when running on the JVM!

What does Frederic have to say about possible backlash from the Erlang community about the name change? "I see none. I'm moving for the best". There you have it folks. Frederic said the rebranding is still a work in progress but he hopes to have the entire book moved over to Scala terminology in a few weeks.
