Tuesday, July 9, 2013

Experimenting in API Design: Riakc

Disclaimer: Riakc's API is in flux so not all of the code here is guaranteed to work by the time you read this post. However the general principles should hold.

While not perfect, Riakc attempts to provide an API that is very hard to use incorrectly, and hopefully easy to use correctly. The idea being that using Riakc incorrectly will result in a compile-time error. Riakc derives its strength from being written in Ocaml, a language with a very expressive type system. Here are some examples of where I think Riakc is successful.

Siblings

In Riak, when you perform a GET you can get back multiple values associated with the a single key. This is known as siblings. However, a PUT can only associate one value with a key. However, it is convenient to use the same object type for both GET and PUT. In the case of Riakc, that is a Riakc.Robj.t. But, what to do if you create a Robj.t with siblings and try to PUT? In the Ptyhon client you will get a runtime error. Riakc solves this by using phantom types. A Robj.t isn't actually just that, it's a 'a Robj.t. The API requires that 'a to be something specific at different parts of the code. Here is the simplified type for GET:

val get :
  t ->
  b:string ->
  string ->
  ([ `Maybe_siblings ] Robj.t, error) Deferred.Result.t

And here is the simplified type for PUT:

val put :
  t ->
  b:string ->
  ?k:string ->
  [ `No_siblings ] Robj.t ->
  (([ `Maybe_siblings ] Robj.t * key), error) Deferred.Result.t

The important part of the API is that GET returns a [ `Maybe_siblings ] Riak.t and PUT takes a [ `No_siblings ] Riak.t. How does one convert something that might have siblings to something that definitely doesn't? With Riakc.Robj.set_content

val set_content  : Content.t -> 'a t -> [ `No_siblings ] t

set_content takes any kind of Robj.t, and a single Content.t and produces a [ `No_siblings ] Riak.t, because if you set contents to one value obviously you cannot have siblings. Now the type system can ensure that any call to PUT must have a set_content prior to it.

Setting 2i

If you use the LevelDB backend you can use secondary indices, known as 2i, which allow you to find a set of keys based on some mapping. When you create an object you specify the mappings to which it belongs. Two types are supported in Riak: bin and int. And two query types are supported: equal and range. For example, if you encoded the time as an int you could use a range query to find all those keys that occurred within a range of times.

Riak encodes the type of the index in the name. As an example, if you want to allow people to search by a field called "foo" which is a binary secondary index, you would name that index "foo_bin". In the Python Riak client, one sets an index with something like the following code:

obj.add_index('field1_bin', 'val1')
obj.add_index('field2_int', 100000)

In Riakc, the naming convention is hidden from the user. Instead, the the name the field will become is encoded in the value. The Python code looks like the following in Riakc:

let module R = Riakc.Robj in
let index1 =
  R.index_create
    ~k:"field1"
    ~v:(R.Index.String "val1")
in
let index2 =
  R.index_create
    ~k:"field2"
    ~v:(R.Index.Integer 10000)
in
R.set_content
  (R.Content.set_indices [index1; index2] content)
  robj

When the Robj.t is written to the DB, "field1" and "field2" will be transformed into their appropriate names.

Reading from Riak results in the same translation happening. If Riakc cannot determine the type of the value from the field name, for example if Riak gets a new index type, the field name maintains its precise name it got from Riak and the value is a Riakc.Robj.Index.Unknown string.

In this way, we are guaranteed at compile-time that the name of the field will always match its type.

2i Searching

With objects containing 2i entries, it is possible to search by values in those fields. Riak allows for searching fields by their exact value or ranges of values. While it's unclear from the Riak docs, Riakc enforces the two values in a range query are of the same type. Also, like in setting 2i values, the field name is generated from the type of the value. It is more verbose than the Python client but it enforces constraints.

Here is a Python 2i search followed by the equivalent search in Riakc.

results = client.index('mybucket', 'field1_bin', 'val1', 'val5').run()
Riakc.Conn.index_search
  conn
  ~b:"mybucket"
  ~index:"field1"
  (range_string
     ~min:"val1"
     ~max:"val2"
     ~return_terms:false)

Conclusion

It's a bit unfair comparing an Ocaml API to a Python one, but hopefully this has demonstrated that with a reasonable type system one can express safe and powerful APIs without being inconvenient.

Thursday, July 4, 2013

Riakc In Five Minutes

This is a simple example using Riakc to PUT a key into a Riak database. It assumes that you already have a Riak database up and running.

First you need to install riakc. Simply do: opam install riakc. As of this writing, the latest version of riakc is 2.0.0 and the code given depends on that version.

Now, the code. The following is a complete CLI tool that will PUT a key and print back the result from Riak. It handles all errors that the library can generate as well as outputting siblings correctly.

(*
 * This example is valid for version 2.0.0, and possibly later
 *)
open Core.Std
open Async.Std

(*
 * Take a string of bytes and convert them to hex string
 * representation
 *)
let hex_of_string =
  String.concat_map ~f:(fun c -> sprintf "%X" (Char.to_int c))

(*
 * An Robj can have multiple values in it, each one with its
 * own content type, encoding, and value.  This just prints
 * the value, which is a string blob
 *)
let print_contents contents =
  List.iter
    ~f:(fun content ->
      let module C = Riakc.Robj.Content in
      printf "VALUE: %s\n" (C.value content))
    contents

let fail s =
  printf "%s\n" s;
  shutdown 1

let exec () =
  let host = Sys.argv.(1) in
  let port = Int.of_string Sys.argv.(2) in
  (*
   * [with_conn] is a little helper function that will
   * establish a connection, run a function on the connection
   * and tear it down when done
   *)
  Riakc.Conn.with_conn
    ~host
    ~port
    (fun c ->
      let module R = Riakc.Robj in
      let content  = R.Content.create "some random data" in
      let robj     = R.create [] |> R.set_content content in
      (*
       * Put takes a bucket, a key, and an optional list of
       * options.  In this case we are setting the
       * [Return_body] option which returns what the key
       * looks like after the put.  It is possible that
       * siblings were created.
       *)
      Riakc.Conn.put
        c
        ~b:"test_bucket"
        ~k:"test_key"
        ~opts:[Riakc.Opts.Put.Return_body]
        robj)

let eval () =
  exec () >>| function
    | Ok (robj, key) -> begin
      (*
       * [put] returns a [Riakc.Robj.t] and a [string
       * option], which is the key if Riak had to generate
       * it
       *)
      let module R = Riakc.Robj in
      (*
       * Extract the vclock, if it exists, and convert it to
       * to something printable
       *)
      let vclock =
 Option.value
   ~default:"<none>"
   (Option.map ~f:hex_of_string (R.vclock robj))
      in
      let key = Option.value ~default:"<none>" key in
      printf "KEY: %s\n" key;
      printf "VCLOCK: %s\n" vclock;
      print_contents (R.contents robj);
      shutdown 0
    end
    (*
     * These are the various errors that can be returned.
     * Many of then come directly from the ProtoBuf layer
     * since there aren't really any more semantics to apply
     * to the data if it matches the PB frame.
     *)
    | Error `Bad_conn           -> fail "Bad_conn"
    | Error `Bad_payload        -> fail "Bad_payload"
    | Error `Incomplete_payload -> fail "Incomplete_payload"
    | Error `Notfound           -> fail "Notfound"
    | Error `Incomplete         -> fail "Incomplete"
    | Error `Overflow           -> fail "Overflow"
    | Error `Unknown_type       -> fail "Unknown_type"
    | Error `Wrong_type         -> fail "Wrong_type"

let () =
  ignore (eval ());
  never_returns (Scheduler.go ())

Now compile it:

ocamlfind ocamlopt -thread -I +camlp4 -package riakc -c demo.ml
ocamlfind ocamlopt -package riakc -thread -linkpkg \
-o demo.native demo.cmx

Finally, you can run it: ./demo.native hostname port

...And More Detail

The API for Riakc is broken up into two modules: Riakc.Robj and Riakc.Conn with Riakc.Opts being a third helper module. Below is in reference to version 2.0.0 of Riakc.

Riakc.Robj

Riakc.Robj defines a representation of an object stored in Riak. Robj is completely pure code. The API can be found here.

Riakc.Conn

This is the I/O layer. All interaction with the actual database happens through this module. Riakc.Conn is somewhat clever in that it has a compile-time requirement that you have called Riakc.Robj.set_content on any value you want to PUT. This guarantees you have resolved all siblings, somehow. Its API can be found here.

Riakc.Opts

Finally, various options are defined in Riakc.Opts. These are options that GET and PUT take. Not all of them are actually supported but support is planned. The API can be viewed here.

Hopefully Riakc has a fairly straight forward API. While the example code might be longer than other clients, it is complete and correct (I hope).