Tuesday, July 9, 2013

Experimenting in API Design: Riakc

Disclaimer: Riakc's API is in flux so not all of the code here is guaranteed to work by the time you read this post. However the general principles should hold.

While not perfect, Riakc attempts to provide an API that is very hard to use incorrectly, and hopefully easy to use correctly. The idea being that using Riakc incorrectly will result in a compile-time error. Riakc derives its strength from being written in Ocaml, a language with a very expressive type system. Here are some examples of where I think Riakc is successful.

Siblings

In Riak, when you perform a GET you can get back multiple values associated with the a single key. This is known as siblings. However, a PUT can only associate one value with a key. However, it is convenient to use the same object type for both GET and PUT. In the case of Riakc, that is a Riakc.Robj.t. But, what to do if you create a Robj.t with siblings and try to PUT? In the Ptyhon client you will get a runtime error. Riakc solves this by using phantom types. A Robj.t isn't actually just that, it's a 'a Robj.t. The API requires that 'a to be something specific at different parts of the code. Here is the simplified type for GET:

val get :
  t ->
  b:string ->
  string ->
  ([ `Maybe_siblings ] Robj.t, error) Deferred.Result.t

And here is the simplified type for PUT:

val put :
  t ->
  b:string ->
  ?k:string ->
  [ `No_siblings ] Robj.t ->
  (([ `Maybe_siblings ] Robj.t * key), error) Deferred.Result.t

The important part of the API is that GET returns a [ `Maybe_siblings ] Riak.t and PUT takes a [ `No_siblings ] Riak.t. How does one convert something that might have siblings to something that definitely doesn't? With Riakc.Robj.set_content

val set_content  : Content.t -> 'a t -> [ `No_siblings ] t

set_content takes any kind of Robj.t, and a single Content.t and produces a [ `No_siblings ] Riak.t, because if you set contents to one value obviously you cannot have siblings. Now the type system can ensure that any call to PUT must have a set_content prior to it.

Setting 2i

If you use the LevelDB backend you can use secondary indices, known as 2i, which allow you to find a set of keys based on some mapping. When you create an object you specify the mappings to which it belongs. Two types are supported in Riak: bin and int. And two query types are supported: equal and range. For example, if you encoded the time as an int you could use a range query to find all those keys that occurred within a range of times.

Riak encodes the type of the index in the name. As an example, if you want to allow people to search by a field called "foo" which is a binary secondary index, you would name that index "foo_bin". In the Python Riak client, one sets an index with something like the following code:

obj.add_index('field1_bin', 'val1')
obj.add_index('field2_int', 100000)

In Riakc, the naming convention is hidden from the user. Instead, the the name the field will become is encoded in the value. The Python code looks like the following in Riakc:

let module R = Riakc.Robj in
let index1 =
  R.index_create
    ~k:"field1"
    ~v:(R.Index.String "val1")
in
let index2 =
  R.index_create
    ~k:"field2"
    ~v:(R.Index.Integer 10000)
in
R.set_content
  (R.Content.set_indices [index1; index2] content)
  robj

When the Robj.t is written to the DB, "field1" and "field2" will be transformed into their appropriate names.

Reading from Riak results in the same translation happening. If Riakc cannot determine the type of the value from the field name, for example if Riak gets a new index type, the field name maintains its precise name it got from Riak and the value is a Riakc.Robj.Index.Unknown string.

In this way, we are guaranteed at compile-time that the name of the field will always match its type.

2i Searching

With objects containing 2i entries, it is possible to search by values in those fields. Riak allows for searching fields by their exact value or ranges of values. While it's unclear from the Riak docs, Riakc enforces the two values in a range query are of the same type. Also, like in setting 2i values, the field name is generated from the type of the value. It is more verbose than the Python client but it enforces constraints.

Here is a Python 2i search followed by the equivalent search in Riakc.

results = client.index('mybucket', 'field1_bin', 'val1', 'val5').run()
Riakc.Conn.index_search
  conn
  ~b:"mybucket"
  ~index:"field1"
  (range_string
     ~min:"val1"
     ~max:"val2"
     ~return_terms:false)

Conclusion

It's a bit unfair comparing an Ocaml API to a Python one, but hopefully this has demonstrated that with a reasonable type system one can express safe and powerful APIs without being inconvenient.

No comments:

Post a Comment