tag:blogger.com,1999:blog-43485197413583441232024-03-08T14:01:26.418-05:00functional orbitzUnknownnoreply@blogger.comBlogger65125tag:blogger.com,1999:blog-4348519741358344123.post-79185542698724931212013-12-23T18:17:00.000-05:002013-12-23T18:23:43.078-05:00Gen_server in Ocaml<p>
<i>Note, this post is written against the 2.0.1 version of <code>gen_server</code></i>
</p>
<p>
Erlang comes with a rich set of small concurrency primitives to make handling and manipulating state easier. The most generic of the frameworks is the <code>gen_server</code> which is also the most commonly used. A <code>gen_server</code> provides a way to control state over multiple requests. It serializes operations and handles both synchronous and asynchronous communication with clients. The strength of a <code>gen_server</code> is the ability to create multiple, lightweight, servers inside an application where each operation inside of it runs in serial but individually the <code>gen_server</code>s run concurrently.
</p>
<p>
While it is not possible to provide all of the Erlang semantics in Ocaml, we can create something roughly analogous. We can also get some properties that Erlang can not give us. In particular, the implementation of <code>gen_server</code> provided here:
</p>
<a name='more'></a>
<p>
<ul>
<li>Does not have the concept of a process or a process id. A <code>gen_server</code> is an abstract type that is parameterized by a message type.</li>
<li>Uses queues to communicate messages between clients and servers.</li>
<li><code>gen_server</code>s are typesafe, only messages that they can handle can be sent to them.</li>
<li>You can only communicate with <code>gen_server</code>s in your own process, there is no concept of location ignorance.</li>
<li>Only provides an asynchronous communication function, called <code>send</code> that has pushback. That means a <code>send</code> will be evaluated when the <code>gen_server</code> accepts the message but will not wait for the <code>gen_server</code> to complete the processing of the message.</li>
<li>Has the concept of process linking, however it is not preemptive. When a <code>gen_server</code> stops, for any reason, any calls to <code>send</code> will return an error stating the <code>gen_server</code> has closed itself. This will not force the termination of any other <code>gen_server</code>s in Ocaml, but the termination can at least be detected.</li>
<li>Any thrown exceptions are handled by the <code>gen_server</code> framework and result in the <code>gen_server</code> being gracefully terminated.</li>
</ul>
</p>
<p>
Relative to Erlang the Ocaml version isn't very impressive, however it's still a useful technique for encapsulating state in a concurrent environment.
</p>
<p>
This implementation of <code>gen_server</code> is on top of Jane St's Async. What does it look like? The primary interface looks like this:
</p>
<pre><code><b><font color="#0000FF">val</font></b> start <font color="#990000">:</font>
'i <font color="#990000">-></font>
<font color="#990000">(</font>'i<font color="#990000">,</font> 's<font color="#990000">,</font> 'm<font color="#990000">,</font> 'ie<font color="#990000">,</font> 'he<font color="#990000">)</font> <b><font color="#000080">Server</font></b><font color="#990000">.</font>t <font color="#990000">-></font>
<font color="#990000">(</font>'m t<font color="#990000">,</font> <font color="#990000">[></font> 'ie init_ret <font color="#990000">])</font> <b><font color="#000080">Deferred</font></b><font color="#990000">.</font><b><font color="#000080">Result</font></b><font color="#990000">.</font>t
<b><font color="#0000FF">val</font></b> stop <font color="#990000">:</font>
'm t <font color="#990000">-></font>
<font color="#990000">(</font><font color="#009900">unit</font><font color="#990000">,</font> <font color="#990000">[></font> `<font color="#009900">Closed</font> <font color="#990000">])</font> <b><font color="#000080">Deferred</font></b><font color="#990000">.</font><b><font color="#000080">Result</font></b><font color="#990000">.</font>t
<b><font color="#0000FF">val</font></b> send <font color="#990000">:</font>
'm t <font color="#990000">-></font>
'm <font color="#990000">-></font>
<font color="#990000">(</font>'m<font color="#990000">,</font> <font color="#990000">[></font> send_ret <font color="#990000">])</font> <b><font color="#000080">Deferred</font></b><font color="#990000">.</font><b><font color="#000080">Result</font></b><font color="#990000">.</font>t
</code></pre>
<p>
The interface is only three functions: <code>start</code>, <code>stop</code> and <code>send</code>.
</p>
<p>
<ul>
<li>The <code>start</code> function is a bit harry looking but don't be put off by the server type parameterized on five type variables. The <code>start</code> function takes two parameters, the first is the initial parameters to pass to the <code>gen_server</code>, the second is the callbacks of the <code>gen_server</code>.</li>
<li><code>stop</code> takes a <code>gen_server</code> and returns <code>Ok ()</code> on success and <code>Error `Closed</code> if the <code>gen_server</code> is not running.</li>
<li><code>send</code> takes a <code>gen_server</code> and a message. The message must be the same type the <code>gen_server</code> accepts. It returns <code>Ok msg</code> on success and <code>Error `Closed</code> if the <code>gen_server</code> is not running.</li>
</ul>
</p>
<p>
The most confusion part is probably the <code>('i, 's, 'm, 'ie, 'he) Server.t</code>. This is the type that the implementer of the <code>gen_server</code> writes. It is three callbacks: <code>init</code>, <code>handle_call</code> and <code>terminate</code>. Let's breakdown the type variables:
</p>
<p>
<ul>
<li>'i - This is the type of the variable that you pass to <code>start</code> and will be given to the <code>init</code> callback.</li>
<li>'s - This is the type of the state that the <code>gen_server</code> will encapsulate. This will be passed to <code>handle_call</code> and <code>terminate</code>. The <code>handle_call</code> callback will manipulate the state and return a new one.</li>
<li>'m - This is the message type that the <code>gen_server</code> will accept.</li>
<li>'ie - This is the type of error that the <code>init</code> callback can return.</li>
<li>'he - This is the type of error that the <code>handle_call</code> callback can return.</li>
</ul>
</p>
<p>
While the server type looks complicated, as you can see each variable corresponds to all of the type information needed to understand a <code>gen_server</code>. So what does a server look like? While the types are big it's actually not too bad. Below is an example of a call to <code>start</code>. The full source code can be found <a href="https://github.com/orbitz/gen_server/blob/master/examples/simple.ml">here</a>.
</p>
<pre><code><i><font color="#9A1900">(* Package the callbacks *)</font></i>
<b><font color="#0000FF">let</font></b> callbacks <font color="#990000">=</font>
<font color="#FF0000">{</font> <b><font color="#000080">Gen_server</font></b><font color="#990000">.</font><b><font color="#000080">Server</font></b><font color="#990000">.</font>init<font color="#990000">;</font> handle_call<font color="#990000">;</font> terminate <font color="#FF0000">}</font>
<b><font color="#0000FF">let</font></b> start <font color="#990000">()</font> <font color="#990000">=</font>
<b><font color="#000080">Gen_server</font></b><font color="#990000">.</font>start <font color="#990000">()</font> callbacks
</code></pre>
<p>
And what do the callbacks look like? Below is a simplified version of what a set of callbacks could look like, with comments.
</p>
<pre><code><b><font color="#0000FF">module</font></b> <font color="#009900">Resp</font> <font color="#990000">=</font> <b><font color="#000080">Gen_server</font></b><font color="#990000">.</font><font color="#009900">Response</font>
<b><font color="#0000FF">module</font></b> <font color="#009900">Gs</font> <font color="#990000">=</font> <font color="#009900">Gen_server</font>
<i><font color="#9A1900">(* Callbacks *)</font></i>
<b><font color="#0000FF">let</font></b> init self init <font color="#990000">=</font>
<b><font color="#000080">Deferred</font></b><font color="#990000">.</font>return <font color="#990000">(</font><font color="#009900">Ok</font> <font color="#990000">())</font>
<b><font color="#0000FF">let</font></b> handle_call self state <font color="#990000">=</font> <b><font color="#0000FF">function</font></b>
<font color="#990000">|</font> <b><font color="#000080">Msg</font></b><font color="#990000">.</font><font color="#009900">Msg1</font> <font color="#990000">-></font>
<i><font color="#9A1900">(* Success *)</font></i>
<b><font color="#000080">Deferred</font></b><font color="#990000">.</font>return <font color="#990000">(</font><b><font color="#000080">Resp</font></b><font color="#990000">.</font><font color="#009900">Ok</font> state<font color="#990000">)</font>
<font color="#990000">|</font> <b><font color="#000080">Msg</font></b><font color="#990000">.</font><font color="#009900">Msg2</font> <font color="#990000">-></font>
<i><font color="#9A1900">(* Error *)</font></i>
<b><font color="#000080">Deferred</font></b><font color="#990000">.</font>return <font color="#990000">(</font><b><font color="#000080">Resp</font></b><font color="#990000">.</font><font color="#009900">Error</font> <font color="#990000">(</font>reason<font color="#990000">,</font> state<font color="#990000">))</font>
<font color="#990000">|</font> <b><font color="#000080">Msg</font></b><font color="#990000">.</font><font color="#009900">Msg3</font> <font color="#990000">-></font>
<i><font color="#9A1900">(* Exceptions can be thrown too *)</font></i>
failwith <font color="#FF0000">"blowin' up"</font>
<i><font color="#9A1900">(* Exceptions thrown from terminate are silently ignored *)</font></i>
<b><font color="#0000FF">let</font></b> terminate reason state <font color="#990000">=</font>
<b><font color="#0000FF">match</font></b> reason <b><font color="#0000FF">with</font></b>
<font color="#990000">|</font> <b><font color="#000080">Gs</font></b><font color="#990000">.</font><b><font color="#000080">Server</font></b><font color="#990000">.</font><font color="#009900">Normal</font> <font color="#990000">-></font>
<i><font color="#9A1900">(* Things exited normally *)</font></i>
<b><font color="#000080">Deferred</font></b><font color="#990000">.</font><font color="#009900">unit</font>
<font color="#990000">|</font> <b><font color="#000080">Gs</font></b><font color="#990000">.</font><b><font color="#000080">Server</font></b><font color="#990000">.</font><font color="#009900">Exn</font> <font color="#009900">exn</font> <font color="#990000">-></font>
<i><font color="#9A1900">(* An exception was thrown *)</font></i>
<b><font color="#000080">Deferred</font></b><font color="#990000">.</font><font color="#009900">unit</font>
<font color="#990000">|</font> <b><font color="#000080">Gs</font></b><font color="#990000">.</font><b><font color="#000080">Server</font></b><font color="#990000">.</font><font color="#009900">Error</font> err <font color="#990000">-></font>
<i><font color="#9A1900">(* User returned an error *)</font></i>
<b><font color="#000080">Deferred</font></b><font color="#990000">.</font><font color="#009900">unit</font>
</code></pre>
<p>
There isn't much more to it than that.
</p>
<p>
A functor implementation is also provided. I prefer the non-functor version, I think it's a bit less verbose and easier to work with, but some people like them.
</p>
<h1>How To Get It?</h1>
<p>
You can install <code>gen_server</code> through <code>opam</code>, simply: <code>opam install gen_server</code>
</p>
<p>
The source can be found <a href="https://github.com/orbitz/gen_server">here</a>. Only the tags should be trusted as working.
</p>
<p>
There are a few examples <a href="https://github.com/orbitz/gen_server/tree/master/examples">here</a>.
</p>
<p>
Enjoy.
</p>
Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-4348519741358344123.post-35059152533338677442013-07-09T14:37:00.000-04:002013-07-09T14:38:48.608-04:00Experimenting in API Design: Riakc<p>
<i>
Disclaimer: Riakc's API is in flux so not all of the code here is guaranteed to work by the time you read this post. However the general principles should hold.
</i>
</p>
<p>
While not perfect, Riakc attempts to provide an API that is very hard to use incorrectly, and hopefully easy to use correctly. The idea being that using Riakc incorrectly will result in a compile-time error. Riakc derives its strength from being written in Ocaml, a language with a very expressive type system. Here are some examples of where I think Riakc is successful.
</p>
<h1>Siblings</h1>
<p>
In Riak, when you perform a <code>GET</code> you can get back multiple values associated with the a single key. This is known as siblings. However, a <code>PUT</code> can only associate one value with a key. However, it is convenient to use the same object type for both <code>GET</code> and <code>PUT</code>. In the case of Riakc, that is a <code>Riakc.Robj.t</code>. But, what to do if you create a <code>Robj.t</code> with siblings and try to <code>PUT</code>? In the Ptyhon client you will get a runtime error. Riakc solves this by using phantom types. A <code>Robj.t</code> isn't actually just that, it's a <code>'a Robj.t</code>. The API requires that <code>'a</code> to be something specific at different parts of the code. Here is the simplified type for <code>GET</code>:
</p>
<pre><code><b><font color="#0000FF">val</font></b> get <font color="#990000">:</font>
t <font color="#990000">-></font>
b<font color="#990000">:</font><font color="#009900">string</font> <font color="#990000">-></font>
<font color="#009900">string</font> <font color="#990000">-></font>
<font color="#990000">([</font> `<font color="#009900">Maybe_siblings</font> <font color="#990000">]</font> <b><font color="#000080">Robj</font></b><font color="#990000">.</font>t<font color="#990000">,</font> error<font color="#990000">)</font> <b><font color="#000080">Deferred</font></b><font color="#990000">.</font><b><font color="#000080">Result</font></b><font color="#990000">.</font>t
</code></pre>
<p>
And here is the simplified type for <code>PUT</code>:
</p>
<pre><code><b><font color="#0000FF">val</font></b> put <font color="#990000">:</font>
t <font color="#990000">-></font>
b<font color="#990000">:</font><font color="#009900">string</font> <font color="#990000">-></font>
<font color="#990000">?</font>k<font color="#990000">:</font><font color="#009900">string</font> <font color="#990000">-></font>
<font color="#990000">[</font> `<font color="#009900">No_siblings</font> <font color="#990000">]</font> <b><font color="#000080">Robj</font></b><font color="#990000">.</font>t <font color="#990000">-></font>
<font color="#990000">(([</font> `<font color="#009900">Maybe_siblings</font> <font color="#990000">]</font> <b><font color="#000080">Robj</font></b><font color="#990000">.</font>t <font color="#990000">*</font> key<font color="#990000">),</font> error<font color="#990000">)</font> <b><font color="#000080">Deferred</font></b><font color="#990000">.</font><b><font color="#000080">Result</font></b><font color="#990000">.</font>t
</code></pre>
<p>
The important part of the API is that <code>GET</code> returns a <code>[ `Maybe_siblings ] Riak.t</code> and <code>PUT</code> takes a <code>[ `No_siblings ] Riak.t</code>. How does one convert something that might have siblings to something that definitely doesn't? With <code>Riakc.Robj.set_content</code>
</p>
<pre><code><b><font color="#0000FF">val</font></b> set_content <font color="#990000">:</font> <b><font color="#000080">Content</font></b><font color="#990000">.</font>t <font color="#990000">-></font> 'a t <font color="#990000">-></font> <font color="#990000">[</font> `<font color="#009900">No_siblings</font> <font color="#990000">]</font> t
</code></pre>
<p>
<code>set_content</code> takes any kind of <code>Robj.t</code>, and a single <code>Content.t</code> and produces a <code>[ `No_siblings ] Riak.t</code>, because if you set contents to one value obviously you cannot have siblings. Now the type system can ensure that any call to <code>PUT</code> must have a <code>set_content</code> prior to it.
</p>
<h1>Setting 2i</h1>
<p>
If you use the LevelDB backend you can use secondary indices, known as 2i, which allow you to find a set of keys based on some mapping. When you create an object you specify the mappings to which it belongs. Two types are supported in Riak: bin and int. And two query types are supported: equal and range. For example, if you encoded the time as an int you could use a range query to find all those keys that occurred within a range of times.
</p>
<p>
Riak encodes the type of the index in the name. As an example, if you want to allow people to search by a field called "foo" which is a binary secondary index, you would name that index "foo_bin". In the Python Riak client, one sets an index with something like the following code:
</p>
<pre><code>obj<font color="#990000">.</font><b><font color="#000000">add_index</font></b><font color="#990000">(</font><font color="#FF0000">'field1_bin'</font><font color="#990000">,</font> <font color="#FF0000">'val1'</font><font color="#990000">)</font>
obj<font color="#990000">.</font><b><font color="#000000">add_index</font></b><font color="#990000">(</font><font color="#FF0000">'field2_int'</font><font color="#990000">,</font> <font color="#993399">100000</font><font color="#990000">)</font>
</code></pre>
<p>
In Riakc, the naming convention is hidden from the user. Instead, the the name the field will become is encoded in the value. The Python code looks like the following in Riakc:
</p>
<pre><code><b><font color="#0000FF">let</font></b> <b><font color="#0000FF">module</font></b> <font color="#009900">R</font> <font color="#990000">=</font> <b><font color="#000080">Riakc</font></b><font color="#990000">.</font><font color="#009900">Robj</font> <b><font color="#0000FF">in</font></b>
<b><font color="#0000FF">let</font></b> index1 <font color="#990000">=</font>
<b><font color="#000080">R</font></b><font color="#990000">.</font>index_create
<font color="#990000">~</font>k<font color="#990000">:</font><font color="#FF0000">"field1"</font>
<font color="#990000">~</font>v<font color="#990000">:(</font><b><font color="#000080">R</font></b><font color="#990000">.</font><b><font color="#000080">Index</font></b><font color="#990000">.</font><font color="#009900">String</font> <font color="#FF0000">"val1"</font><font color="#990000">)</font>
<b><font color="#0000FF">in</font></b>
<b><font color="#0000FF">let</font></b> index2 <font color="#990000">=</font>
<b><font color="#000080">R</font></b><font color="#990000">.</font>index_create
<font color="#990000">~</font>k<font color="#990000">:</font><font color="#FF0000">"field2"</font>
<font color="#990000">~</font>v<font color="#990000">:(</font><b><font color="#000080">R</font></b><font color="#990000">.</font><b><font color="#000080">Index</font></b><font color="#990000">.</font><font color="#009900">Integer</font> <font color="#993399">10000</font><font color="#990000">)</font>
<b><font color="#0000FF">in</font></b>
<b><font color="#000080">R</font></b><font color="#990000">.</font>set_content
<font color="#990000">(</font><b><font color="#000080">R</font></b><font color="#990000">.</font><b><font color="#000080">Content</font></b><font color="#990000">.</font>set_indices <font color="#990000">[</font>index1<font color="#990000">;</font> index2<font color="#990000">]</font> content<font color="#990000">)</font>
robj
</code></pre>
<p>
When the <code>Robj.t</code> is written to the DB, "field1" and "field2" will be transformed into their appropriate names.
</p>
<p>
Reading from Riak results in the same translation happening. If Riakc cannot determine the type of the value from the field name, for example if Riak gets a new index type, the field name maintains its precise name it got from Riak and the value is a <code>Riakc.Robj.Index.Unknown string</code>.
</p>
<p>
In this way, we are guaranteed at compile-time that the name of the field will always match its type.
</p>
<h1>2i Searching</h1>
<p>
With objects containing 2i entries, it is possible to search by values in those fields. Riak allows for searching fields by their exact value or ranges of values. While it's unclear from the Riak docs, Riakc enforces the two values in a range query are of the same type. Also, like in setting 2i values, the field name is generated from the type of the value. It is more verbose than the Python client but it enforces constraints.
</p>
<p>
Here is a Python 2i search followed by the equivalent search in Riakc.
</p>
<pre><code>results <font color="#990000">=</font> client<font color="#990000">.</font><b><font color="#000000">index</font></b><font color="#990000">(</font><font color="#FF0000">'mybucket'</font><font color="#990000">,</font> <font color="#FF0000">'field1_bin'</font><font color="#990000">,</font> <font color="#FF0000">'val1'</font><font color="#990000">,</font> <font color="#FF0000">'val5'</font><font color="#990000">).</font><b><font color="#000000">run</font></b><font color="#990000">()</font>
</code></pre>
<pre><code><b><font color="#000080">Riakc</font></b><font color="#990000">.</font><b><font color="#000080">Conn</font></b><font color="#990000">.</font>index_search
conn
<font color="#990000">~</font>b<font color="#990000">:</font><font color="#FF0000">"mybucket"</font>
<font color="#990000">~</font>index<font color="#990000">:</font><font color="#FF0000">"field1"</font>
<font color="#990000">(</font>range_string
<font color="#990000">~</font>min<font color="#990000">:</font><font color="#FF0000">"val1"</font>
<font color="#990000">~</font>max<font color="#990000">:</font><font color="#FF0000">"val2"</font>
<font color="#990000">~</font>return_terms<font color="#990000">:</font><b><font color="#0000FF">false</font></b><font color="#990000">)</font>
</code></pre>
<h1>Conclusion</h1>
<p>
It's a bit unfair comparing an Ocaml API to a Python one, but hopefully this has demonstrated that with a reasonable type system one can express safe and powerful APIs without being inconvenient.
</p>
Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4348519741358344123.post-77736687346727538642013-07-04T13:01:00.000-04:002013-07-04T13:18:35.954-04:00Riakc In Five Minutes<p>
This is a simple example using Riakc to PUT a key into a Riak database. It assumes that you already have a Riak database up and running.
</p>
<p>
First you need to install riakc. Simply do: <code>opam install riakc</code>. As of this writing, the latest version of riakc is 2.0.0 and the code given depends on that version.
</p>
<p>
Now, the code. The following is a complete CLI tool that will PUT a key and print back the result from Riak. It handles all errors that the library can generate as well as outputting siblings correctly.
</p>
<pre><code><i><font color="#9A1900">(*</font></i>
<i><font color="#9A1900"> * This example is valid for version 2.0.0, and possibly later</font></i>
<i><font color="#9A1900"> *)</font></i>
<b><font color="#000080">open</font></b> <b><font color="#000080">Core</font></b><font color="#990000">.</font><font color="#009900">Std</font>
<b><font color="#000080">open</font></b> <b><font color="#000080">Async</font></b><font color="#990000">.</font><font color="#009900">Std</font>
<i><font color="#9A1900">(*</font></i>
<i><font color="#9A1900"> * Take a string of bytes and convert them to hex string</font></i>
<i><font color="#9A1900"> * representation</font></i>
<i><font color="#9A1900"> *)</font></i>
<b><font color="#0000FF">let</font></b> hex_of_string <font color="#990000">=</font>
<b><font color="#000080">String</font></b><font color="#990000">.</font>concat_map <font color="#990000">~</font>f<font color="#990000">:(</font><b><font color="#0000FF">fun</font></b> c <font color="#990000">-></font> sprintf <font color="#FF0000">"%X"</font> <font color="#990000">(</font><b><font color="#000080">Char</font></b><font color="#990000">.</font>to_int c<font color="#990000">))</font>
<i><font color="#9A1900">(*</font></i>
<i><font color="#9A1900"> * An Robj can have multiple values in it, each one with its</font></i>
<i><font color="#9A1900"> * own content type, encoding, and value. This just prints</font></i>
<i><font color="#9A1900"> * the value, which is a string blob</font></i>
<i><font color="#9A1900"> *)</font></i>
<b><font color="#0000FF">let</font></b> print_contents contents <font color="#990000">=</font>
<b><font color="#000080">List</font></b><font color="#990000">.</font>iter
<font color="#990000">~</font>f<font color="#990000">:(</font><b><font color="#0000FF">fun</font></b> content <font color="#990000">-></font>
<b><font color="#0000FF">let</font></b> <b><font color="#0000FF">module</font></b> <font color="#009900">C</font> <font color="#990000">=</font> <b><font color="#000080">Riakc</font></b><font color="#990000">.</font><b><font color="#000080">Robj</font></b><font color="#990000">.</font><font color="#009900">Content</font> <b><font color="#0000FF">in</font></b>
printf <font color="#FF0000">"VALUE: %s\n"</font> <font color="#990000">(</font><b><font color="#000080">C</font></b><font color="#990000">.</font>value content<font color="#990000">))</font>
contents
<b><font color="#0000FF">let</font></b> fail s <font color="#990000">=</font>
printf <font color="#FF0000">"%s\n"</font> s<font color="#990000">;</font>
shutdown <font color="#993399">1</font>
<b><font color="#0000FF">let</font></b> exec <font color="#990000">()</font> <font color="#990000">=</font>
<b><font color="#0000FF">let</font></b> host <font color="#990000">=</font> <b><font color="#000080">Sys</font></b><font color="#990000">.</font>argv<font color="#990000">.(</font><font color="#993399">1</font><font color="#990000">)</font> <b><font color="#0000FF">in</font></b>
<b><font color="#0000FF">let</font></b> port <font color="#990000">=</font> <b><font color="#000080">Int</font></b><font color="#990000">.</font>of_string <b><font color="#000080">Sys</font></b><font color="#990000">.</font>argv<font color="#990000">.(</font><font color="#993399">2</font><font color="#990000">)</font> <b><font color="#0000FF">in</font></b>
<i><font color="#9A1900">(*</font></i>
<i><font color="#9A1900"> * [with_conn] is a little helper function that will</font></i>
<i><font color="#9A1900"> * establish a connection, run a function on the connection</font></i>
<i><font color="#9A1900"> * and tear it down when done</font></i>
<i><font color="#9A1900"> *)</font></i>
<b><font color="#000080">Riakc</font></b><font color="#990000">.</font><b><font color="#000080">Conn</font></b><font color="#990000">.</font>with_conn
<font color="#990000">~</font>host
<font color="#990000">~</font>port
<font color="#990000">(</font><b><font color="#0000FF">fun</font></b> c <font color="#990000">-></font>
<b><font color="#0000FF">let</font></b> <b><font color="#0000FF">module</font></b> <font color="#009900">R</font> <font color="#990000">=</font> <b><font color="#000080">Riakc</font></b><font color="#990000">.</font><font color="#009900">Robj</font> <b><font color="#0000FF">in</font></b>
<b><font color="#0000FF">let</font></b> content <font color="#990000">=</font> <b><font color="#000080">R</font></b><font color="#990000">.</font><b><font color="#000080">Content</font></b><font color="#990000">.</font>create <font color="#FF0000">"some random data"</font> <b><font color="#0000FF">in</font></b>
<b><font color="#0000FF">let</font></b> robj <font color="#990000">=</font> <b><font color="#000080">R</font></b><font color="#990000">.</font>create <font color="#990000">[]</font> <font color="#990000">|></font> <b><font color="#000080">R</font></b><font color="#990000">.</font>set_content content <b><font color="#0000FF">in</font></b>
<i><font color="#9A1900">(*</font></i>
<i><font color="#9A1900"> * Put takes a bucket, a key, and an optional list of</font></i>
<i><font color="#9A1900"> * options. In this case we are setting the</font></i>
<i><font color="#9A1900"> * [Return_body] option which returns what the key</font></i>
<i><font color="#9A1900"> * looks like after the put. It is possible that</font></i>
<i><font color="#9A1900"> * siblings were created.</font></i>
<i><font color="#9A1900"> *)</font></i>
<b><font color="#000080">Riakc</font></b><font color="#990000">.</font><b><font color="#000080">Conn</font></b><font color="#990000">.</font>put
c
<font color="#990000">~</font>b<font color="#990000">:</font><font color="#FF0000">"test_bucket"</font>
<font color="#990000">~</font>k<font color="#990000">:</font><font color="#FF0000">"test_key"</font>
<font color="#990000">~</font>opts<font color="#990000">:[</font><b><font color="#000080">Riakc</font></b><font color="#990000">.</font><b><font color="#000080">Opts</font></b><font color="#990000">.</font><b><font color="#000080">Put</font></b><font color="#990000">.</font><font color="#009900">Return_body</font><font color="#990000">]</font>
robj<font color="#990000">)</font>
<b><font color="#0000FF">let</font></b> eval <font color="#990000">()</font> <font color="#990000">=</font>
exec <font color="#990000">()</font> <font color="#990000">>>|</font> <b><font color="#0000FF">function</font></b>
<font color="#990000">|</font> <font color="#009900">Ok</font> <font color="#990000">(</font>robj<font color="#990000">,</font> key<font color="#990000">)</font> <font color="#990000">-></font> <b><font color="#0000FF">begin</font></b>
<i><font color="#9A1900">(*</font></i>
<i><font color="#9A1900"> * [put] returns a [Riakc.Robj.t] and a [string</font></i>
<i><font color="#9A1900"> * option], which is the key if Riak had to generate</font></i>
<i><font color="#9A1900"> * it</font></i>
<i><font color="#9A1900"> *)</font></i>
<b><font color="#0000FF">let</font></b> <b><font color="#0000FF">module</font></b> <font color="#009900">R</font> <font color="#990000">=</font> <b><font color="#000080">Riakc</font></b><font color="#990000">.</font><font color="#009900">Robj</font> <b><font color="#0000FF">in</font></b>
<i><font color="#9A1900">(*</font></i>
<i><font color="#9A1900"> * Extract the vclock, if it exists, and convert it to</font></i>
<i><font color="#9A1900"> * to something printable</font></i>
<i><font color="#9A1900"> *)</font></i>
<b><font color="#0000FF">let</font></b> vclock <font color="#990000">=</font>
<b><font color="#000080">Option</font></b><font color="#990000">.</font>value
<font color="#990000">~</font>default<font color="#990000">:</font><font color="#FF0000">"<none>"</font>
<font color="#990000">(</font><b><font color="#000080">Option</font></b><font color="#990000">.</font>map <font color="#990000">~</font>f<font color="#990000">:</font>hex_of_string <font color="#990000">(</font><b><font color="#000080">R</font></b><font color="#990000">.</font>vclock robj<font color="#990000">))</font>
<b><font color="#0000FF">in</font></b>
<b><font color="#0000FF">let</font></b> key <font color="#990000">=</font> <b><font color="#000080">Option</font></b><font color="#990000">.</font>value <font color="#990000">~</font>default<font color="#990000">:</font><font color="#FF0000">"<none>"</font> key <b><font color="#0000FF">in</font></b>
printf <font color="#FF0000">"KEY: %s\n"</font> key<font color="#990000">;</font>
printf <font color="#FF0000">"VCLOCK: %s\n"</font> vclock<font color="#990000">;</font>
print_contents <font color="#990000">(</font><b><font color="#000080">R</font></b><font color="#990000">.</font>contents robj<font color="#990000">);</font>
shutdown <font color="#993399">0</font>
<b><font color="#0000FF">end</font></b>
<i><font color="#9A1900">(*</font></i>
<i><font color="#9A1900"> * These are the various errors that can be returned.</font></i>
<i><font color="#9A1900"> * Many of then come directly from the ProtoBuf layer</font></i>
<i><font color="#9A1900"> * since there aren't really any more semantics to apply</font></i>
<i><font color="#9A1900"> * to the data if it matches the PB frame.</font></i>
<i><font color="#9A1900"> *)</font></i>
<font color="#990000">|</font> <font color="#009900">Error</font> `<font color="#009900">Bad_conn</font> <font color="#990000">-></font> fail <font color="#FF0000">"Bad_conn"</font>
<font color="#990000">|</font> <font color="#009900">Error</font> `<font color="#009900">Bad_payload</font> <font color="#990000">-></font> fail <font color="#FF0000">"Bad_payload"</font>
<font color="#990000">|</font> <font color="#009900">Error</font> `<font color="#009900">Incomplete_payload</font> <font color="#990000">-></font> fail <font color="#FF0000">"Incomplete_payload"</font>
<font color="#990000">|</font> <font color="#009900">Error</font> `<font color="#009900">Notfound</font> <font color="#990000">-></font> fail <font color="#FF0000">"Notfound"</font>
<font color="#990000">|</font> <font color="#009900">Error</font> `<font color="#009900">Incomplete</font> <font color="#990000">-></font> fail <font color="#FF0000">"Incomplete"</font>
<font color="#990000">|</font> <font color="#009900">Error</font> `<font color="#009900">Overflow</font> <font color="#990000">-></font> fail <font color="#FF0000">"Overflow"</font>
<font color="#990000">|</font> <font color="#009900">Error</font> `<font color="#009900">Unknown_type</font> <font color="#990000">-></font> fail <font color="#FF0000">"Unknown_type"</font>
<font color="#990000">|</font> <font color="#009900">Error</font> `<font color="#009900">Wrong_type</font> <font color="#990000">-></font> fail <font color="#FF0000">"Wrong_type"</font>
<b><font color="#0000FF">let</font></b> <font color="#990000">()</font> <font color="#990000">=</font>
ignore <font color="#990000">(</font>eval <font color="#990000">());</font>
never_returns <font color="#990000">(</font><b><font color="#000080">Scheduler</font></b><font color="#990000">.</font>go <font color="#990000">())</font>
</code></pre>
<p>
Now compile it:
</p>
<pre><code>ocamlfind ocamlopt -thread -I +camlp4 -package riakc -c demo.ml
ocamlfind ocamlopt -package riakc -thread -linkpkg \
-o demo.native demo.cmx
</code></pre>
<p>
Finally, you can run it: <code>./demo.native <i>hostname</i> <i>port</i></code>
</p>
<h1>...And More Detail</h1>
<p>
The API for Riakc is broken up into two modules: <code>Riakc.Robj</code> and <code>Riakc.Conn</code> with <code>Riakc.Opts</code> being a third helper module. Below is in reference to version 2.0.0 of Riakc.
</p>
<h2>Riakc.Robj</h2>
<p>
<code>Riakc.Robj</code> defines a representation of an object stored in Riak. <code>Robj</code> is completely pure code. The API can be found <a href="https://github.com/orbitz/ocaml-riakc/blob/2.0.0/lib/riakc/robj.mli">here</a>.
</p>
<h2>Riakc.Conn</h2>
<p>
This is the I/O layer. All interaction with the actual database happens through this module. <code>Riakc.Conn</code> is somewhat clever in that it has a compile-time requirement that you have called <code>Riakc.Robj.set_content</code> on any value you want to PUT. This guarantees you have resolved all siblings, somehow. Its API can be found <a href="https://github.com/orbitz/ocaml-riakc/blob/2.0.0/lib/riakc/conn.mli">here</a>.
</p>
<h2>Riakc.Opts</h2>
<p>
Finally, various options are defined in <code>Riakc.Opts</code>. These are options that GET and PUT take. Not all of them are actually supported but support is planned. The API can be viewed <a href="https://github.com/orbitz/ocaml-riakc/blob/2.0.0/lib/riakc/opts.mli">here</a>.
</p>
<p>
Hopefully Riakc has a fairly straight forward API. While the example code might be longer than other clients, it is complete and correct (I hope).
</p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4348519741358344123.post-69113895398159482932013-05-25T13:40:00.000-04:002013-05-25T13:58:50.423-04:00Setting Up NixOps On Mac OS X With VirtualBox<h2>Disclaimer</h2>
<p>
I am a new user of nixops, so I cannot guarantee these directions work for everyone. I have successfully set it up on two machines.
</p>
<h1>Preamble</h1>
<p>
The following directions describe how to setup nixops on a Mac OS X machine in VirtualBox. By the end of this you should be able to spawn as many NixOS instances in VirtualBox as your machine can handle. NixOps is similar to vagrant, except it deploys NixOS instances. It can deploy them locally, using VirtualBox, or remotely using EC2. It allows you to deploy clusters of machines, automatically allowing them to communicate with each other. At a high-level, nixops deploys an instance by doing the following:
</p>
<ol>
<li>It builds the environment you ask for on another NixOS instance. This could be your local machine or a build server.</li>
<li>It creates a VM on the service or system you defined (VirtualBox, EC2, etc).</li>
<li>It uploads the environment you've defined to the machine.</li>
</ol>
<p>
The main problem is that nixops must build the environment on the same OS and arch it is deploying. NixOS is a linux distro, that means you cannot built the environment on your Mac. The minor problem is that, by default, the OS X filesystem that everyone gets is case insensitive and that doesn't play well with nix, the package manager.
</p>
<p>
This post will accomplish the following:
</p>
<ol>
<li>Install and setup VirtualBox.</li>
<li>If your OS X file system is case insensitive (assume it is if you haven't done anything to change it), we will create a loopback mount to install nix on.</li>
<li>Install nix on OS X.</li>
<li>An initial NixOS VirtualBox instance will be created to bootstrap the process and act as a distributed build server.</li>
<li>Create a user on the build system.</li>
<li>Setup up signing keys, so we can copy environments between build server, host, and deployed VM.</li>
<li>Setup local nix to use this VM as a build server.</li>
<li>Deploy a VM.</li>
</ol>
<h1>1. Install VirtualBox</h1>
<p>
Download VirtualBox and install it. Just follow the directions. The only interesting thing you have to do is make sure you have the <b>vboxnet0</b> adapter setup in networking. To do this:
</p>
<ol>
<li>Start VirtualBox.</li>
<li>Go to preferences (Cmd-,).</li>
<li>Click on Network.</li>
<li>If <b>vboxnet0</b> is not present, add it by clicking the green +.</li>
<li>Edit <b>vboxnet0</b> and make sure DHCP Server is turned on. The settings I use are below.</li>
</ol>
<ul>
<li><b>Server Address:</b> 192.168.56.100</li>
<li><b>Server Mask:</b> 255.255.255.0</li>
<li><b>Lower Address Bound:</b> 192.168.56.101</li>
<li><b>Upper Address Bound:</b> 192.168.56.254</li>
</ul>
<h1>2. Creating a case-sensitive file system</h1>
<p>
Unless you have explicitly changed it, your OS X machine likely has a case insensitive file system. This means nix build some packages. The method I have chosen to get around this is to create a loopback filesystem and mount that.
</p>
<ol>
<li>Create a image. I have been using one 5GB successfully, but if you plan on being a heavy user of nix, you should make it larger.<br/><i>hdiutil create ~/nix-loopback -megabytes 5000 -ov -type UDIF</i></li>
<li>Load it but do not mount:<br/><i>hdiutil attach -imagekey diskimage-class=CRawDiskImage -nomount ~/nix-loopback.dmg</i></li>
<li>Determine which disk and partition your newly created image corresponds to. Specifically you want to find the image that corresponds to the Apple_HFS entry you just created. It will probably be something like disk2s2, but could be anything.<br/><i>diskutil list</i></li>
<li>Create a case-sensitive file system on this partition:<br/><i>newfs_hfs -s /dev/disk2s2</i></li>
<li>Make the mountpoint:<br/><i>sudo mkdir /nix</i></li>
<li>Mount it:<br/><i>sudo mount -t hfs /dev/disk2s2 /nix</i></li>
</ol>
<p>
At this point if you run <i>mount</i> you should see something mounted on /nix.
</p>
<p>
<b>NOTE:</b> I don't know how to make this point on reboot, which you will need to do if you want to use nix after restarting your system.
</p>
<h1>3. Install Nix</h1>
<ol>
<li>Download the binary nix darwin package from <a href="http://nixos.org">nixos.org</a>.</li>
<li>Go to root:<br/><i>cd /</i></li>
<li>Untar nix:<br/><i>sudo tar -jxvf /path/to/nix-1.5.2-x86_64-darwin.tar.bz2</i></li>
<li>Chown it to your user:<br/><i>sudo chown -R your-user /nix</i></li>
<li>Finish the install:<br/><i>nix-finish-install</i></li>
<li><i>nix-finish-install</i> will print out some instructions, you should copy the 'source' to your ~/.profile and run it in your current shell (and any other shell you plan on not restarting but using nix in).</li>
<li>Delete the installer:<br/><i>sudo rm /usr/bin/nix-finish-install</i></li>
</ol>
<h1>4. Setup Nix</h1>
<ol>
<li>Add the nixos channel:<br/><i>nix-channel --add http://nixos.org/releases/nixos/channels/nixos-unstable</i></li>
<li>Update:<br/><i>nix-channel --update</i></li>
<li>Install something:<br/><i>nix-env -i tmux</i>
</ol>
<h1>5. Install NixOps</h1>
<ol>
<li>Set NIX_PATH:<br/><i>export NIX_PATH=/nix/var/nix/profiles/per-user/`whoami`/channels/nixos</i></li>
<li>Get nixops:<br/><i>git clone git://github.com/NixOS/nixops.git</i></li>
<li><i>cd nixops</i></li>
<li>Install:<br/><i>nix-env -f . -i nixops</i></li>
<li>Verify it is installed:<br/><i>nixops --version</i></li>
</ol>
<h1>5. Setup Distributed Builds</h1>
<p>
When deploying an instance, nixops needs to build the environment somewhere then it will transfer it to the instance. In order to do this, it needs an already existing NixOS instance to build on. If you were running NixOS already, this would be the machine you are deploying from. To accomplish this, you need a a NixOS running in a VM. Eventually nixops will probably accomplish this for you, but for now it needs to be done manually. Luckily, installing NixOS on VirtualBox is pretty straight forward.
</p>
<ol>
<li>Install a NixOS on VirtualBox from the directions <a href="http://nixos.org/wiki/Installing_NixOS_in_a_VirtualBox_guest">here</a>. This doesn't need any special settings, just SSH.</li>
<li>Setup a port forward so you can SSH into the machine. I'll assume this port forward is 3223.</li>
<li>Make a user called 'nix' on the VM. This is the user that we will SSH through for building. The name of the user doesn't matter, but these directions will assume its name is 'nix'.</li>
<li>On OS X, create two pairs of passwordless SSH keys. One pair will be the login for the nix user. The other will be signing keys.</li>
<li>Install the login public key.</li>
<li>On OS X, create /etc/nix/ (<i>mkdir /etc/nix</i>)</li>
<li>Copy the private signing key to /etc/nix/signing-key.sec. Make sure this is owned by the user you'll be running nixops as and is readable only by that user.</li>
<li>Create a public signing key from your private signing key using openssl. This needs to be in whatever format openssl produces which is not the same as what ssh-keygen created. This output should be in /etc/nix/signing-key.pub. The owner and permissions don't matter as long as the user you'll run nixops as can read it.<br/><i>openssl rsa -in /etc/nix/signing-key.sec -pubout > /etc/nix/signing-key.pub</i></li>
<li>Copy the signing keys to the build server, putting them in the same location. Make sure the nix user owns the private key and is the only one that can read it.</li>
<li>Tell nix to do distributed builds:<br/><i>export NIX_BUILD_HOOK=$HOME/.nix-profile/libexec/nix/build-remote.pl</i></li>
<li>Tell the distributed builder where to store load content:<br/><i>export NIX_CURRENT_LOAD=/tmp/current-load</i><br/><i>mkdir /tmp/current-load</i></li>
<li>Go into a directory you can create files in:<br/>
<pre><i>cat <<EOF > remote-systems.conf
nix@nix-build-server x86_64-linux /Users/`whoami`/.ssh/id_rsa 1 1
EOF</i></pre></li>
<li>Tell the remote builder where to find machine information:<br/><i>export NIX_REMOTE_SYSTEMS=$PWD/remote-systems.conf</i></li>
<li>Add an entry to ~/.ssh/config the fake host 'nix-build-server' turns into your actual VM:<br/>
<pre><i>Host nix-build-server
HostName localhost
Port 3223</i></pre></li>
</ol>
<h1>6. Start An Instance</h1>
<ol>
<li>Create your machine's nix expression:<br/>
<pre><i>cat <<EOF > test-vbox.nix
{
test =
{ config, pkgs, ... }:
{ deployment.targetEnv = "virtualbox";
deployment.virtualbox.memorySize = 512; # megabytes
};
}
EOF</i></pre></li>
<li>Create a machine instance named <i>test</i>:<br/><i>nixops create ./test-vbox.nix --name test</i></li>
<li>Deploy it:<br/><i>nixops deploy -d test</i></li>
</ol>
<p>
This could take awhile, and at some points it might not seem like it's doing anything because it's waiting for a build or a transfer. It will push around a fair amount of data. After all is said and done you should be able to do <i>nixops ssh -d test test</i> to connect to it.
</p>
<h1>Troubleshooting</h1>
<ul>
<li><b>I do a deploy and it sits forever waiting for SSH</b> - You probably forgot to setup your vboxnet0 adapter properly. See Section 1.</li>
<li><b>It dies while building saying a store isn't signed</b> - Only root an import unsigned stores, this means your signing keys aren't stup properly. Double check your permissions.</li>
</ul>
<p>
Other problems? Post them in the comments and I'll add them to the list.
</p>
<h1>Known Bugs</h1>
<ul>
<li><b><i>nixops stop -d test</i> never returns</b> - I've only experienced this on one of my installations. It is okay, though. Wait a bit and exit out of the command, then you can do any command as if stop succeeded</li>
<li><b>My Mac grey-screens of death!</b> - This has happened to be once. I update my version of VirtualBox and installed any updates from Apple and I have not experienced it again.</li>
</ul>
<h1>Further Reading</h1>
<ul>
<li><a href="http://hydra.nixos.org/build/4961057/download/2/nixops/manual.html">NixOps Manual</a></li>
</ul>Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-4348519741358344123.post-33638101011270756632013-03-17T10:42:00.001-04:002013-03-18T16:35:20.861-04:00[ANN] Riakc 0.0.0<p><i>
Note, since writing this post, Riakc 1.0.0 has already been released and merged into opam. It fixes the below issue of Links (there is a typo in the release notes, 'not' should be 'now'. The source code can be found <a href="https://github.com/orbitz/ocaml-riakc/tree/1.0.0">here</a>. The 1.0.0 version number does not imply any stability or completeness of the library, just that it is not backwards compatible with 0.0.0.
</i></p>
<p>
Riakc is a Riak Protobuf client for Ocaml. Riakc uses Jane St Core/Async for concurrency. Riakc is in early development and so far supports a subset of the Riak API. The supported methods are:
</p>
<p>
<ul>
<li>ping</li>
<li>client_id</li>
<li>server_info</li>
<li>list_buckets</li>
<li>list_keys</li>
<li>bucket_props</li>
<li>get</li>
<li>put</li>
<li>delete</li>
</ul>
</p>
<h2>A note on GET</h2>
<p>
Links are currently dropped all together in the implementation, so if you read a value with links and write it back, you will have lost them. This will be fixed in the very near future.
</p>
<p>
As with anything, please feel free to submit issues and pull requests.
</p>
<p>
The source code can be found <a href="https://github.com/orbitz/ocaml-riakc/tree/0.0.0">here</a>. Riakc is in opam and you can install it by doing <code>opam install riakc</code>.
</p>
<h1>Usage</h1>
<p>
There are two API modules in Riakc. Examples of all existing API functions can be found <a href="https://github.com/orbitz/ocaml-riakc/tree/0.0.0/examples">here</a>.
</p>
<h2>Riakc.Conn</h2>
<p>
<code>Riakc.Conn</code> provides the API for performing actions on the database. The module interface can be read <a href="https://github.com/orbitz/ocaml-riakc/blob/0.0.0/lib/riakc/conn.mli">here</a>.
</p>
<h2>Riakc.Robj</h2>
<p>
<code>Riakc.Robj</code> provides the API for objects stored in Riak. The module interface can be read <a href="https://github.com/orbitz/ocaml-riakc/blob/0.0.0/lib/riakc/robj.mli">here</a>. <code>Riakc.Conn.get</code> returns a <code>Riakc.Robj.t</code> and <code>Riakc.Conn.put</code> takes one. <code>Robj.t</code> supports representing siblings, however <code>Riakc.Conn.put</code> cannot PUT objects with siblings, this is enforced using phantom types. A value of <code>Riakc.Robj.t</code> that might have siblings is converted to one that doesn't using <code>Riakc.Robj.set_content</code>.
</p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4348519741358344123.post-40090132793605451212013-03-17T10:21:00.000-04:002013-03-17T10:21:25.013-04:00[ANN] Protobuf 0.0.2<p>
Protobuf is an Ocaml library for communicating with Google's protobuf format. It provides a method for writing parsers and builders. There is no protoc support, yet and writing it is not a top goal right now. Protobuf is meant to be fairly lightweight and straight forward to use. The only other Protobuf support for Ocaml I am aware of is through <a href="http://piqi.org/">piqi</a>, however that was too heavy for my needs.
</p>
<p>
Protobuf is meant to be very low level, mostly dealing with representation of values and not semantics. For example, the <code>fixed32</code> and <code>sfixed32</code> values are both parsed as <code>Int32.t</code>'s. Dealing with being signed or not is left up to the user.
</p>
<p>
The source code can be viewed <a href="https://github.com/orbitz/ocaml-protobuf/tree/0.0.2">here</a>. Protobuf is in opam, to install it <code>opam install protobuf</code>.
</p>
<p>
The hope is that parsers and builders look reasonably close to the <code>.proto</code> files such that translation is straight forward, at least until protoc support is added. This is an early release and, without a doubt, has bugs in it please submit pull requests and issues.
</p>
<p>
<a href="https://github.com/orbitz/ocaml-protobuf/tree/0.0.2/">https://github.com/orbitz/ocaml-protobuf/tree/0.0.2/</a>
</p>
<h1>Examples</h1>
<p>
The best collection of examples right now is the <a href="https://github.com/orbitz/ocaml-protobuf/blob/0.0.2/lib/protobuf/protobuf_test.ml">tests</a>. An example from the file:
</p>
<p>
<pre><code><b><font color="#0000FF">let</font></b> simple <font color="#990000">=</font>
<b><font color="#000080">P</font></b><font color="#990000">.</font><font color="#009900">int32</font> <font color="#993399">1</font> <font color="#990000">>>=</font> <b><font color="#000080">P</font></b><font color="#990000">.</font>return
<b><font color="#0000FF">let</font></b> complex <font color="#990000">=</font>
<b><font color="#000080">P</font></b><font color="#990000">.</font><font color="#009900">int32</font> <font color="#993399">1</font> <font color="#990000">>>=</font> <b><font color="#0000FF">fun</font></b> num <font color="#990000">-></font>
<b><font color="#000080">P</font></b><font color="#990000">.</font><font color="#009900">string</font> <font color="#993399">2</font> <font color="#990000">>>=</font> <b><font color="#0000FF">fun</font></b> s <font color="#990000">-></font>
<b><font color="#000080">P</font></b><font color="#990000">.</font>embd_msg <font color="#993399">3</font> simple <font color="#990000">>>=</font> <b><font color="#0000FF">fun</font></b> emsg <font color="#990000">-></font>
<b><font color="#000080">P</font></b><font color="#990000">.</font>return <font color="#990000">(</font>num<font color="#990000">,</font> s<font color="#990000">,</font> emsg<font color="#990000">)</font>
<b><font color="#0000FF">let</font></b> run_complex str <font color="#990000">=</font>
<b><font color="#0000FF">let</font></b> <b><font color="#000080">open</font></b> <b><font color="#000080">Result</font></b><font color="#990000">.</font><font color="#009900">Monad_infix</font> <b><font color="#0000FF">in</font></b>
<b><font color="#000080">P</font></b><font color="#990000">.</font><b><font color="#000080">State</font></b><font color="#990000">.</font>create <font color="#990000">(</font><b><font color="#000080">Bitstring</font></b><font color="#990000">.</font>bitstring_of_string str<font color="#990000">)</font>
<font color="#990000">>>=</font> <b><font color="#0000FF">fun</font></b> s <font color="#990000">-></font>
<b><font color="#000080">P</font></b><font color="#990000">.</font>run complex s
</code></pre>
</p>
<p>
The builder for this message looks like:
</p>
<p>
<pre><code><b><font color="#0000FF">let</font></b> build_simple i <font color="#990000">=</font>
<b><font color="#0000FF">let</font></b> <b><font color="#000080">open</font></b> <b><font color="#000080">Result</font></b><font color="#990000">.</font><font color="#009900">Monad_infix</font> <b><font color="#0000FF">in</font></b>
<b><font color="#0000FF">let</font></b> b <font color="#990000">=</font> <b><font color="#000080">B</font></b><font color="#990000">.</font>create <font color="#990000">()</font> <b><font color="#0000FF">in</font></b>
<b><font color="#000080">B</font></b><font color="#990000">.</font><font color="#009900">int32</font> b <font color="#993399">1</font> i <font color="#990000">>>=</font> <b><font color="#0000FF">fun</font></b> <font color="#990000">()</font> <font color="#990000">-></font>
<font color="#009900">Ok</font> <font color="#990000">(</font><b><font color="#000080">B</font></b><font color="#990000">.</font>to_string b<font color="#990000">)</font>
<b><font color="#0000FF">let</font></b> build_complex <font color="#990000">(</font>i1<font color="#990000">,</font> s<font color="#990000">,</font> i2<font color="#990000">)</font> <font color="#990000">=</font>
<b><font color="#0000FF">let</font></b> <b><font color="#000080">open</font></b> <b><font color="#000080">Result</font></b><font color="#990000">.</font><font color="#009900">Monad_infix</font> <b><font color="#0000FF">in</font></b>
<b><font color="#0000FF">let</font></b> b <font color="#990000">=</font> <b><font color="#000080">B</font></b><font color="#990000">.</font>create <font color="#990000">()</font> <b><font color="#0000FF">in</font></b>
<b><font color="#000080">B</font></b><font color="#990000">.</font><font color="#009900">int32</font> b <font color="#993399">1</font> i1 <font color="#990000">>>=</font> <b><font color="#0000FF">fun</font></b> <font color="#990000">()</font> <font color="#990000">-></font>
<b><font color="#000080">B</font></b><font color="#990000">.</font><font color="#009900">string</font> b <font color="#993399">2</font> s <font color="#990000">>>=</font> <b><font color="#0000FF">fun</font></b> <font color="#990000">()</font> <font color="#990000">-></font>
<b><font color="#000080">B</font></b><font color="#990000">.</font>embd_msg b <font color="#993399">3</font> i2 build_simple <font color="#990000">>>=</font> <b><font color="#0000FF">fun</font></b> <font color="#990000">()</font> <font color="#990000">-></font>
<font color="#009900">Ok</font> <font color="#990000">(</font><b><font color="#000080">B</font></b><font color="#990000">.</font>to_string b<font color="#990000">)</font>
</code></pre>
</p>Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-4348519741358344123.post-81765748911531016032013-02-07T16:52:00.000-05:002013-02-07T16:53:25.685-05:00[ANN] ocaml-vclock - 0.0.0<p>
I ported some Erlang vector clock code to Ocaml for fun and learning. It's not well tested and it hasn't any performance optimizations. I'm not ready yet but I have some projects in mind to use it so it will likely get fleshed out more.
</p>
<p>
Vector clocks are a system for determining the partial ordering of events in a distributed environment. You can determine if one value is the ancestor of another, equal, or was concurrently updated. It is one mechanism that distributed databases, such as Riak, use to automatically resolve some conflicts in data while maintaining availability.
</p>
<p>
The vector clock implementation allows for user defined site id type. It also allows metadata to be encoded in the site id, which is useful if you want your vector clock to be prunable by encoding timestamps in it.
</p>
<p>
The repo can be found <a href="https://github.com/orbitz/ocaml-vclock/tree/0.0.0">here</a>. If you'd like to learn more about vector clocks read the wikipedia page <a href="http://en.wikipedia.org/wiki/Vector_clock">here</a>. The Riak website also has some content on vector clocks <a href="http://docs.basho.com/riak/latest/references/appendices/concepts/Vector-Clocks/">here</a>.
</p>Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-4348519741358344123.post-14493541163749943092013-01-04T19:26:00.000-05:002013-01-04T19:26:13.930-05:00Deconstructing Zed's K&R2 Deconstruction<p>
I recently stumbled upon Zed Shaw's <a href="http://c.learncodethehardway.org/book/krcritique.html">deconstruction of K&R2</a>. The post is well intended but, in my opinion, flawed. I believe Zed fails to make a valid argument and also fails to provide a valid solution to the issue he raises.
</p>
<p>
The chapter is clearly not finished, so this rebuttal might not be valid at the time of reading.
</p>
<h1>The Argument</h1>
<p>
The primary argument is that K&R2 is not an appropriate tool for learning C in our modern age. The example given is a function called <code>copy</code> which is effectively <code>strcpy</code>. Zed points out that if the function is not given a valid string, as C defines it, the behaviour of the function is undefined.
</p>
<blockquote>
<p>
This provides a formal proof that the function is defective because there are possible inputs that causes the while-loop to run forever or overflow the target.
</p>
</blockquote>
<p>
When presented with the rebuttal that the cases where it fails are not valid C strings, the response is that it doesn't matter:
</p>
<blockquote>
<p>
... but I'm saying the function is defective because most of the possible inputs cause it to crash the software.
</p>
<p>
The problem with this mindset is there's no way to confirm that a C string is valid.</p>
</blockquote>
<p>
Also:
</p>
<blockquote>
<p>
Another argument in favor of this copy() function is when the proponents of K&RC state that you are "just supposed to not use bad strings". Despite the mountains of empirical evidence that this is impossible in C code...
</p>
</blockquote>
<p>
To reiterate, the problem with <code>copy</code> is that:
</p>
<p>
<ol>
<li>It depends on valid C strings to operate correctly</li>
<li>C strings are impossible to validate at run-time</li>
<li>The behaviour of <code>copy</code> is undefined for most values that are possible to be put into a <code>char*</code></li>
</ol>
</p>
<h1>Proposed Solution</h1>
<p>
The solution is a function called <code>safercopy</code> which takes the lengths of the storages as input, allegedly guaranteeing the termination of <code>safercopy</code>:
</p>
<blockquote>
<p>
In every case the for-loop variant with string length given as arguments will terminate no matter what.
</p>
</blockquote>
<h1>What's Wrong With This</h1>
<p>
We can write what is wrong with <code>safercopy</code> using the exact same criteria Zed used for <code>copy</code>:
</p>
<p>
<ol>
<li>It depends on valid lengths to operate correctly</li>
<li>The lengths are impossible to validate at run-time</li>
<li>The behaviour of <code>safercopy</code> is undefined for most values that are possible to be put into a <code>size_t</code> (I am presuming that the lengths would be a <code>size_t</code>)</li>
</ol>
</p>
<p>
Additionally, Zed instills a false confidence in his <code>safercopy</code>. The function is no more guaranteed to terminate than <code>copy</code> when given bad input. Specifically, if the lengths are wrong causing the copy loop to go out of bounds of its storage it could easily overwrite the value of anything, including the lengths and pointer values in the loop its in. It could blow up, it could loop forever, who knows. It's undefined.
</p>
<p>
Finally, if it is hard to properly handle C strings, why should we think it is any easier to track the length of a string separately? Remember, in C the length of a string is encoded in the string itself by the location of the '\0'. The solution provided by Zed takes the length of the strings as separate input. But he provides no reason to believe developers will get this correct. If the solution proposed had been implementing a safe string library, I might be able to agree.
</p>
<p>
And that is the crux of the problem. It's not if K&R2 is any good or not, it's that the solution given isn't any better. It doesn't address the faults of C strings. There are known way safely handle C strings, the problem tends to be that its tedious so people get it wrong. Many C strings issues have to do with lack of proper allocation rather than forgetting the '\0'-terminator. In what way does the solution solve this problem?
</p>
<p>
If the solution given is no better than the problem it's solving, then it isn't a very good solution.
</p>
<h1>K&R2</h1>
<p>
Is K&R2 not suitable for teaching people C in this day? It has plenty of faults in it but I don't think this particular article, as it exists now, makes a compelling argument. Nor does it provide anything better than what it's critiquing.
</p>Unknownnoreply@blogger.com4tag:blogger.com,1999:blog-4348519741358344123.post-57454119651723281452013-01-04T15:37:00.000-05:002013-01-04T15:37:21.088-05:00Experiences using Result.t vs Exceptions in Ocaml<p>
<i>Disclaimer: I have not compiled any of the example code in this post. Mostly because they are snippets meant to illustrate a point rather than be complete on their own. If they have any errors then apologies.</i>
</p>
<p>
Previously I gave an <a href="http://functional-orbitz.blogspot.se/2013/01/introduction-to-resultt-vs-exceptions.html">introduction to return values vs exceptions</a> in Ocaml. But a lot of ideas in software engineering sound good, how does this particular one work out in real software?
</p>
<p>
I have used this style in two projects. The first is a project that was originally written using exceptions and I have converted most of it to using return values. The second is one that was written from the start using return values. They can be found <a href="http://code.google.com/p/para-mugsy/">here</a> and <a href="https://github.com/orbitz/opass">here</a>. <i>I make no guarantees about the quality of the code, in fact I believe some of it to be junk. These are just my subjective opinions in writing software with a particular attribute</i>.
</p>
<h1>The Good</h1>
<h3>Expected Result</h3>
<p>
The whole system worked as expected. I get compile-time errors for all failure cases I do not handle. This has helped me catch some failure cases I had forgotten about previously, some of which would require an unlikely chain of events to hit, which would have made finding in a test harder, but obviously not impossible. In particular, ParaMugsy is (although the current rewrite does not cover this yet) meant to run in a distributed environment, which increases the cost of errors. Both in debugging and reproducing. In the case of opass, writing the DB is important to get right. Missing handling a failure here can mean the users database of passwords can be lost, a tragic event.
</p>
<h3>Not Cumbersome</h3>
<p>
In the Introduction I showed that for a simple program, return-values are no more cumbersome than exceptions. In these larger projects the same holds. This shouldn't really be a surprise though, as the monadic operators actually simulate the exact flow of exception code. But the 'not cumbersome' is half of a lie, which is explained more below.
</p>
<h3>Refactoring Easier</h3>
<p>
Ocaml is a great language when it comes to refactoring. Simply make the change you want and iterate on compiler errors. This style has made it even easier for me. I can add new failures to my functions and work through the compiler errors to make sure the change is handled in every location.
</p>
<h3>Works No Matter The Concurrent Framework</h3>
<p>
The original implementation of ParaMugsy used Lwt. In the rewrite I decided to use Core's Async library. Both are monadic. And both handle exceptions quite differently. Porting functions over that did return-values was much easier because they didn't rely on the framework to handle and propagate failures. Exceptions are tricky in a concurrent framework and concurrency is purely library based in Ocaml rather than being part of the language, which means libraries can choose incompatible ways to handle them. Return-values give one less thing to worry about when porting code or trying to get code to work in multiple frameworks.
</p>
<h1>The Bad</h1>
<h3>Prototyping Easier With Exceptions</h3>
<p>
The whole idea is to make it hard to miss an error case. But that can be annoying when you just want to get something running. Often times we write software in such a way that the success path is the first thing we write and we handle the errors after that. I don't think there is necessarily a good reason for this other than it's much more satisfying to see the results of the hard work sooner rather than later. In this case, my solution is to relax the ban on exceptions temporarily. Any place that I will return an <code>Error</code> I instead write <code>failwith "not yet implemented"</code>. That way there is an easily grepable string to ensure I have replaced all exceptions with <code>Error</code>'s when I am done. This is an annoyance but thankfully with a fairly simple solution.
</p>
<h3>Cannot Express All Invariants In Type System</h3>
<p>
Sometimes there are sections of code where I know something is true, but it is not expressible in the type system. For example, perhaps I have a data structure that updates multiple pieces of information together. I know when I access one piece of information it will be in the other place. Or perhaps I have a pattern match that I need to handle due to exhaustiveness but I know that it cannot happen given some invariants I have established earlier. In the case where I am looking up data that I know will exist, I will use a lookup function that can throw an exception if it is easiest. In the case where I have a pattern match that I know will never happen, I use <code>assert</code>. But note, these are cases where I have metaphysical certitude that such events will not happen. Not cases where I'm just pretty sure they work.
</p>
<h3>Many Useful Libraries Throw Exceptions</h3>
<p>
Obviously a lot of libraries throw exceptions. Luckily the primary library I use is Jane St's Core Suite, where they share roughly the same aversion of exceptions. Some functions still do throw exceptions though, most notably <code>In_channel.with_file</code> and <code>Out_channel.with_file</code>. This can be solved by wrapping those functions in return-value ones. The problem comes in: what happens when the function being wrapped is poorly documented or at some point can throw more exceptional cases than when it was originally wrapped. One option is to always catch <code>_</code> and turn it into a fairly generic variant type. Or maybe a function only has a few logical failure conditions so collapsing them to a few variant types makes sense. I'm not aware of any really good solution here.
</p>
<h1>A Few Examples</h1>
<p>
There are a few transformations that come up often when converting exception code to return-value code. Here are some in detail.
</p>
<h3>Building Things</h3>
<p>
It's common to want to do some work and then construct a value from it. In exception-land that is as simple, just something like <code>Constructor (thing_that_may_throw_exception ())</code>. This doesn't work with return-values. Instead we have to do what we did in the Introduction post. Here is an example:
</p>
<pre><code><b><font color="#0000FF">let</font></b> f <font color="#990000">()</font> <font color="#990000">=</font>
<b><font color="#0000FF">let</font></b> <b><font color="#000080">open</font></b> <b><font color="#000080">Result</font></b><font color="#990000">.</font><font color="#009900">Monad_infix</font> <b><font color="#0000FF">in</font></b>
thing_that_may_fail <font color="#990000">()</font> <font color="#990000">>>=</font> <b><font color="#0000FF">fun</font></b> v <font color="#990000">-></font>
<font color="#009900">Ok</font> <font color="#990000">(</font><font color="#009900">Constructor</font> v<font color="#990000">)</font>
</code></pre>
<h3>Looping</h3>
<p>
Some loops cannot be written in their most obvious style. Consider an implementation of <code>map</code> that expects the function passed to it to use <code>Result.t</code> to signal failures. The very naive implementation of <code>map</code> is:
</p>
<pre><code><b><font color="#0000FF">let</font></b> map f <font color="#990000">=</font> <b><font color="#0000FF">function</font></b>
<font color="#990000">|</font> <font color="#990000">[]</font> <font color="#990000">-></font> <font color="#990000">[]</font>
<font color="#990000">|</font> x<font color="#990000">::</font>xs <font color="#990000">-></font> <font color="#990000">(</font>f x<font color="#990000">)::(</font>map xs<font color="#990000">)</font>
</code></pre>
<p>
There are two ways to write this. The first requires two passes over the elements. The first pass applies the function and the second one checks which value each function returned or the first error that was hit.
</p>
<pre><code><b><font color="#0000FF">let</font></b> map f l <font color="#990000">=</font>
<b><font color="#000080">Result</font></b><font color="#990000">.</font>all <font color="#990000">(</font><b><font color="#000080">List</font></b><font color="#990000">.</font>map f l<font color="#990000">)</font>
</code></pre>
<p>
<code>Result.all</code> has the type <code>('a, 'b) Core.Std.Result.t list -> ('a list, 'b) Core.Std.Result.t</code></code>
</p>
<p>
The above is simple but could be inefficient. The entire map is preformed regardless of failure and then walked again. If the function being applied is expensive this could be a problem. The other solution is a pretty standard pattern in Ocaml of using an accumulator and reversing it on output. The monadic operator could be replaced by a <code>match</code> in this example, I just prefer the operator.
</p>
<pre><code><b><font color="#0000FF">let</font></b> map f l <font color="#990000">=</font>
<b><font color="#0000FF">let</font></b> <b><font color="#0000FF">rec</font></b> map' f acc <font color="#990000">=</font> <b><font color="#0000FF">function</font></b>
<font color="#990000">|</font> <font color="#990000">[]</font> <font color="#990000">-></font> <font color="#009900">Ok</font> <font color="#990000">(</font><b><font color="#000080">List</font></b><font color="#990000">.</font>rev acc<font color="#990000">)</font>
<font color="#990000">|</font> x<font color="#990000">::</font>xs <font color="#990000">-></font> <b><font color="#0000FF">begin</font></b>
<b><font color="#0000FF">let</font></b> <b><font color="#000080">open</font></b> <b><font color="#000080">Result</font></b><font color="#990000">.</font><font color="#009900">Monad_infix</font> <b><font color="#0000FF">in</font></b>
f x <font color="#990000">>>=</font> <b><font color="#0000FF">fun</font></b> v <font color="#990000">-></font>
map' f <font color="#990000">(</font>v<font color="#990000">::</font>acc<font color="#990000">)</font> xs
<b><font color="#0000FF">end</font></b>
<b><font color="#0000FF">in</font></b>
map' f <font color="#990000">[]</font> l
</code></pre>
<p>
I'm sure someone cleverer in Ocaml probably has a superior solution but this has worked well for me.
</p>
<h3>try/with</h3>
<p>
A lot of exception code looks like the following.
</p>
<pre><code><b><font color="#0000FF">let</font></b> <font color="#990000">()</font> <font color="#990000">=</font>
<b><font color="#0000FF">try</font></b>
thing1 <font color="#990000">();</font>
thing2 <font color="#990000">();</font>
thing3 <font color="#990000">()</font>
<b><font color="#0000FF">with</font></b>
<font color="#990000">|</font> <font color="#009900">Error1</font> <font color="#990000">-></font> handle_error1 <font color="#990000">()</font>
<font color="#990000">|</font> <font color="#009900">Error2</font> <font color="#990000">-></font> handle_error2 <font color="#990000">()</font>
<font color="#990000">|</font> <font color="#009900">Error3</font> <font color="#990000">-></font> handle_error3 <font color="#990000">()</font>
</code></pre>
<p>
The scheme I use would break this into two functions. The one inside the try and the one handling its result. This might sound heavy but the syntax to define a new function in Ocaml is very light. In my experience this hasn't been a problem.
</p>
<pre><code><b><font color="#0000FF">let</font></b> do_things <font color="#990000">()</font> <font color="#990000">=</font>
<b><font color="#0000FF">let</font></b> <b><font color="#000080">open</font></b> <b><font color="#000080">Result</font></b><font color="#990000">.</font><font color="#009900">Monad_infix</font> <b><font color="#0000FF">in</font></b>
thing1 <font color="#990000">()</font> <font color="#990000">>>=</font> <b><font color="#0000FF">fun</font></b> <font color="#990000">()</font> <font color="#990000">-></font>
thing2 <font color="#990000">()</font> <font color="#990000">>>=</font> <b><font color="#0000FF">fun</font></b> <font color="#990000">()</font> <font color="#990000">-></font>
thing3
<b><font color="#0000FF">let</font></b> <font color="#990000">()</font> <font color="#990000">=</font>
<b><font color="#0000FF">match</font></b> do_things <font color="#990000">()</font> <b><font color="#0000FF">with</font></b>
<font color="#990000">|</font> <font color="#009900">Ok</font> _ <font color="#990000">-></font> <font color="#990000">()</font>
<font color="#990000">|</font> <font color="#009900">Error</font> <font color="#009900">Error1</font> <font color="#990000">-></font> handle_error1 <font color="#990000">()</font>
<font color="#990000">|</font> <font color="#009900">Error</font> <font color="#009900">Error2</font> <font color="#990000">-></font> handle_error2 <font color="#990000">()</font>
<font color="#990000">|</font> <font color="#009900">Error</font> <font color="#009900">Error3</font> <font color="#990000">-></font> handle_error3 <font color="#990000">()</font>
</code></pre>
<h1>Conclusion</h1>
<p>
Using return-values instead of exceptions in my Ocaml projects has had nearly the exact output I anticipated. I have compile-time guarantees for handling failure cases and the cost to my code has been minimal. Any difficulties I've run into have had straight forward solutions. In some cases it's simply a matter of thinking about the problems from a new perspective and the solution is clear. I plan on continuing to develop code with these principles and creating larger projects. I believe that this style scales well in larger projects and actually becomes less cumbersome as the project increases since the guarantees can help make it easier to reason about the project.
</p>Unknownnoreply@blogger.com7tag:blogger.com,1999:blog-4348519741358344123.post-91597224978120329602013-01-03T17:55:00.000-05:002013-01-05T07:15:26.270-05:00Introduction to Result.t vs Exceptions in Ocaml<p>
<i>This post uses Jane St's Core suite. Specifically the <code>Result</code> module. It assumes some basic knowledge of Ocaml. Please check out <a href="http://ocaml.org">Ocaml.org</a> for more Ocaml reading material.</i>
</p>
<p>
There are several articles and blog posts out there arguing for or against return values over exceptions. I'll add to the discussion with my reasons for using return values in the place of exceptions in Ocaml.
</p>
<h3>What's the difference?</h3>
<p>
Why does the debate even exist? Because each side has decent arguments for why their preference is superior when it comes to writing reliable software. Pro-return-value developers, for example, argue that their code is easier identify if the code is wrong simply by reading it (if it isn't handling a return value of a function, it's wrong), while exception based code requires understanding all of the functions called to determine if and how they will fail. Pro-exception developers argue that it is much harder to get their program into an undefined state because an exception has to be handled or else the program fails, where in return based code one can simply forget to check a function's return value and the program continues on in an undefined state.
</p>
<p>
I believe that Ocaml has several features that make return values the preferable way to handle errors. Specifically variants, polymorphic variants, exhaustive pattern matching, and a powerful static type system make return values attractive.
</p>
<p>
This debate is only worth your time if you are really passionate about writing software that has fairly strong guarantees about its quality in the face of errors. For a majority of software, it doesn't matter which paradigm you choose. Most errors will be stumbled upon during debugging and fairly soon after going into production or through writing unit and integration tests. But, tests cannot catch everything. And in distributed and concurrent code rare errors can now become common errors and it can be near impossible to reconstruct the conditions that caused it. But in some cases it is possible to make whole classes of errors either impossible or catchable at compile-time with some discipline. Ocaml is at least one language that makes this possible.
</p>
<h3>Checked exceptions</h3>
<p>
A quick aside on checked exceptions, as in Java. Checked exceptions provide some of the functionality I claim is valuable, the main problem with how checked exceptions are implemented in Java (the only language I have any experience in that uses them), is they have a very heavy syntax, to the point where using them can seem too burdensome.
</p>
<h3>The Claim</h3>
<p>
The claim is that if one cares about ensuring they are handling all failure cases in their software, return-values are superior to exceptions because, with the help of a good type system, their handling can be validated at compile-time. Ocaml provides a fairly light, non intrusive, syntax to make this feasible.
</p>
<h3>Good Returns</h3>
<p>
The goal of a good return value based error handling system is to make sure that all errors are handled at compile-time. This is because there is no way to enforce this at run-time, as an exception does. This is a good reason to prefer exceptions in a dynamically typed language like Python or Ruby, your static analyzers are few and far between.
</p>
<p>
In C this is generally accomplished by using a linting tool that will report an error if a function's return value is ignored in a call. This is why you might see <code>printf</code> casted to <code>void</code> in some code, to make it clear the return value is meant to be ignored. But a problem with this solution is that it only enforces that the developer handles the return value, not all possible errors. For example, POSIX functions return a value saying the function failed and put the actual failure in <code>errno</code>. How, then, to enforce that all of the possible failures are handled? Without encoding all of that information in a linting tool, the options in C (and most languages) are pretty weak. Linting tools are also separate from the compiler and vary in quality. Writing code that takes proper advantage of a linting tool, in C, is a skill all of its own as well.
</p>
<h3>Better Returns</h3>
<p>
Ocaml supports exceptions but the compiler provides no guarantees that the exceptions are actually handled anywhere in the code. So what happens if the documentation of a function is incomplete or a dependent function is changed to add a new exception being thrown? The compiler won't help you.
</p>
<p>
But Ocaml's rich type system, combined with some discipline, gives you more power than a C linter. The primary strength is that Ocaml lets you encode information in your types. For example, in POSIX many functions return an integer to indicate error. But an <code>int</code> has no interesting meaning to the compiler other than it holds values between <code>INT_MIN</code> and <code>INT_MAX</code>. In Ocaml, we can instead create a type to represent the errors a function can return and the compiler can enforce that all possible errors are handled in some way thanks to exhaustive pattern matching.
</p>
<h3>An Example</h3>
<p>
What does all of this look like? Below a contrived example. The goal is to provide a function, called <code>parse_person</code> that takes a string and turns it into a <code>person</code> record. The requirements of the code is that if a valid person cannot be parsed out, the part of the string that failed is specified in the error message.
</p>
<p>
Here is a version using exceptions, <a href="https://github.com/orbitz/blog_post_src/blob/master/intro_return_t/ex1.ml">ex1.ml</a>:
</p>
<pre><code><b><font color="#000080">open</font></b> <b><font color="#000080">Core</font></b><font color="#990000">.</font><font color="#009900">Std</font>
<b><font color="#0000FF">exception</font></b> <font color="#009900">Int_of_string</font> <b><font color="#0000FF">of</font></b> <font color="#009900">string</font>
<b><font color="#0000FF">exception</font></b> <font color="#009900">Bad_line</font> <b><font color="#0000FF">of</font></b> <font color="#009900">string</font>
<b><font color="#0000FF">exception</font></b> <font color="#009900">Bad_name</font> <b><font color="#0000FF">of</font></b> <font color="#009900">string</font>
<b><font color="#0000FF">exception</font></b> <font color="#009900">Bad_age</font> <b><font color="#0000FF">of</font></b> <font color="#009900">string</font>
<b><font color="#0000FF">exception</font></b> <font color="#009900">Bad_zip</font> <b><font color="#0000FF">of</font></b> <font color="#009900">string</font>
<b><font color="#0000FF">type</font></b> person <font color="#990000">=</font> <font color="#FF0000">{</font> name <font color="#990000">:</font> <font color="#990000">(</font><font color="#009900">string</font> <font color="#990000">*</font> <font color="#009900">string</font><font color="#990000">)</font>
<font color="#990000">;</font> age <font color="#990000">:</font> <b><font color="#000080">Int</font></b><font color="#990000">.</font>t
<font color="#990000">;</font> zip <font color="#990000">:</font> <font color="#009900">string</font>
<font color="#FF0000">}</font>
<i><font color="#9A1900">(* A little helper function *)</font></i>
<b><font color="#0000FF">let</font></b> int_of_string s <font color="#990000">=</font>
<b><font color="#0000FF">try</font></b>
<b><font color="#000080">Int</font></b><font color="#990000">.</font>of_string s
<b><font color="#0000FF">with</font></b>
<font color="#990000">|</font> <font color="#009900">Failure</font> _ <font color="#990000">-></font>
raise <font color="#990000">(</font><font color="#009900">Int_of_string</font> s<font color="#990000">)</font>
<b><font color="#0000FF">let</font></b> parse_name name <font color="#990000">=</font>
<b><font color="#0000FF">match</font></b> <b><font color="#000080">String</font></b><font color="#990000">.</font>lsplit2 <font color="#990000">~</font>on<font color="#990000">:</font>' ' name <b><font color="#0000FF">with</font></b>
<font color="#990000">|</font> <font color="#009900">Some</font> <font color="#990000">(</font>first_name<font color="#990000">,</font> last_name<font color="#990000">)</font> <font color="#990000">-></font>
<font color="#990000">(</font>first_name<font color="#990000">,</font> last_name<font color="#990000">)</font>
<font color="#990000">|</font> <font color="#009900">None</font> <font color="#990000">-></font>
raise <font color="#990000">(</font><font color="#009900">Bad_name</font> name<font color="#990000">)</font>
<b><font color="#0000FF">let</font></b> parse_age age <font color="#990000">=</font>
<b><font color="#0000FF">try</font></b>
int_of_string age
<b><font color="#0000FF">with</font></b>
<font color="#990000">|</font> <font color="#009900">Int_of_string</font> _ <font color="#990000">-></font>
raise <font color="#990000">(</font><font color="#009900">Bad_age</font> age<font color="#990000">)</font>
<b><font color="#0000FF">let</font></b> parse_zip zip <font color="#990000">=</font>
<b><font color="#0000FF">try</font></b>
ignore <font color="#990000">(</font>int_of_string zip<font color="#990000">);</font>
<b><font color="#0000FF">if</font></b> <b><font color="#000080">String</font></b><font color="#990000">.</font>length zip <font color="#990000">=</font> <font color="#993399">5</font> <b><font color="#0000FF">then</font></b>
zip
<b><font color="#0000FF">else</font></b>
raise <font color="#990000">(</font><font color="#009900">Bad_zip</font> zip<font color="#990000">)</font>
<b><font color="#0000FF">with</font></b>
<font color="#990000">|</font> <font color="#009900">Int_of_string</font> _ <font color="#990000">-></font>
raise <font color="#990000">(</font><font color="#009900">Bad_zip</font> zip<font color="#990000">)</font>
<b><font color="#0000FF">let</font></b> parse_person s <font color="#990000">=</font>
<b><font color="#0000FF">match</font></b> <b><font color="#000080">String</font></b><font color="#990000">.</font>split <font color="#990000">~</font>on<font color="#990000">:</font>'<font color="#990000">\</font>t' s <b><font color="#0000FF">with</font></b>
<font color="#990000">|</font> <font color="#990000">[</font>name<font color="#990000">;</font> age<font color="#990000">;</font> zip<font color="#990000">]</font> <font color="#990000">-></font>
<font color="#FF0000">{</font> name <font color="#990000">=</font> parse_name name
<font color="#990000">;</font> age <font color="#990000">=</font> parse_age age
<font color="#990000">;</font> zip <font color="#990000">=</font> parse_zip zip
<font color="#FF0000">}</font>
<font color="#990000">|</font> _ <font color="#990000">-></font>
raise <font color="#990000">(</font><font color="#009900">Bad_line</font> s<font color="#990000">)</font>
<b><font color="#0000FF">let</font></b> <font color="#990000">()</font> <font color="#990000">=</font>
<i><font color="#9A1900">(* Pretend input came from user *)</font></i>
<b><font color="#0000FF">let</font></b> input <font color="#990000">=</font> <font color="#FF0000">"Joe Mama\t25\t11425"</font> <b><font color="#0000FF">in</font></b>
<b><font color="#0000FF">try</font></b>
<b><font color="#0000FF">let</font></b> person <font color="#990000">=</font> parse_person input <b><font color="#0000FF">in</font></b>
printf <font color="#FF0000">"Name: %s %s\nAge: %d\nZip: %s\n"</font>
<font color="#990000">(</font>fst person<font color="#990000">.</font>name<font color="#990000">)</font>
<font color="#990000">(</font>snd person<font color="#990000">.</font>name<font color="#990000">)</font>
person<font color="#990000">.</font>age
person<font color="#990000">.</font>zip
<b><font color="#0000FF">with</font></b>
<font color="#990000">|</font> <font color="#009900">Bad_line</font> l <font color="#990000">-></font>
printf <font color="#FF0000">"Bad line: '%s'\n"</font> l
<font color="#990000">|</font> <font color="#009900">Bad_name</font> name <font color="#990000">-></font>
printf <font color="#FF0000">"Bad name: '%s'\n"</font> name
<font color="#990000">|</font> <font color="#009900">Bad_age</font> age <font color="#990000">-></font>
printf <font color="#FF0000">"Bad age: '%s'\n"</font> age
<font color="#990000">|</font> <font color="#009900">Bad_zip</font> zip <font color="#990000">-></font>
printf <font color="#FF0000">"Bad zip: '%s'\n"</font> zip
</code></pre>
<p>
<a href="https://github.com/orbitz/blog_post_src/blob/master/intro_return_t/ex2.ml">ex2.ml</a> is a basic translation of the above but using variants. The benefit is that the type system will ensure that all failure case are handled. The problem is the code is painful to read and modify. Every function that can fail has its own variant type to represent success and error. Composing the functions is painful since every thing returns a different type. We have to create a type that can represent all of the failures the other functions returned. It would be nice if each function could return an error and we could use that value instead. It would also be nice if everything read as a series of steps, rather than pattern matching on a tuple which makes it hard to read.
</p>
<p>
<a href="https://github.com/orbitz/blog_post_src/blob/master/intro_return_t/ex3.ml">ex3.ml</a> introduces Core's <code>Result.t</code> type. The useful addition is that we only need to define a type for <code>parse_person</code>. Every other function only has one error condition so we can just encode the error in the <code>Error</code> variant. This is still hard to read, though. The helper functions aren't so bad but the main function is still painful.
</p>
<p>
While the previous solutions have solved the problem of ensuring that all errors are handled, they introduced the problem of being painful to develop with. The main problem is that nothing composes. The helpers have their own error types and for every call to them we have to check their return and then encompass their error in any function above it. What would be nice is if the compiler could automatically union all of the error codes we want to return from itself and any function it called. Enter polymorphic variants.
</p>
<p>
<a href="https://github.com/orbitz/blog_post_src/blob/master/intro_return_t/ex4.ml">ex4.ml</a> Shows the version with polymorphic variants. The nice bit of refactoring we were able to do is in <code>parse_person</code>. Rather than an ugly match, the calls to the helper functions can be sequenced:
</p>
<pre><code><b><font color="#0000FF">let</font></b> parse_person s <font color="#990000">=</font>
<b><font color="#0000FF">match</font></b> <b><font color="#000080">String</font></b><font color="#990000">.</font>split <font color="#990000">~</font>on<font color="#990000">:</font>'<font color="#990000">\</font>t' s <b><font color="#0000FF">with</font></b>
<font color="#990000">|</font> <font color="#990000">[</font>name<font color="#990000">;</font> age<font color="#990000">;</font> zip<font color="#990000">]</font> <font color="#990000">-></font>
<b><font color="#0000FF">let</font></b> <b><font color="#000080">open</font></b> <b><font color="#000080">Result</font></b><font color="#990000">.</font><font color="#009900">Monad_infix</font> <b><font color="#0000FF">in</font></b>
parse_name name <font color="#990000">>>=</font> <b><font color="#0000FF">fun</font></b> name <font color="#990000">-></font>
parse_age age <font color="#990000">>>=</font> <b><font color="#0000FF">fun</font></b> age <font color="#990000">-></font>
parse_zip zip <font color="#990000">>>=</font> <b><font color="#0000FF">fun</font></b> zip <font color="#990000">-></font>
<font color="#009900">Ok</font> <font color="#FF0000">{</font> name<font color="#990000">;</font> age<font color="#990000">;</font> zip <font color="#FF0000">}</font>
<font color="#990000">|</font> _ <font color="#990000">-></font>
<font color="#009900">Error</font> <font color="#990000">(</font>`<font color="#009900">Bad_line</font> s<font color="#990000">)</font>
</code></pre>
<p>
Don't worry about the monad syntax, it's really just to avoid the nesting to make the sequencing easier on the eyes. Except for the <code>>>=</code>, this looks a lot like code using exceptions. There is a nice linear flow and only the success path is shown. But! The compiler will ensure that all failures are handled.
</p>
<p>
The final version of the code is <a href="https://github.com/orbitz/blog_post_src/blob/master/intro_return_t/ex5.ml">ex5.ml</a>. This takes ex4 and rewrites portions of it to be prettier. As a disclaimer, I'm sure someone else would consider writing this differently even with the same restrictions I put on it, I might even write it different on a different day, but this version of the code demonstrates the points I am making.
</p>
<p>
A few points of comparison between ex1 and ex5:
</p>
<p>
<ul>
<li>The body of <code>parse_person</code> is definitely simpler and easier to read in the exception code. It is short and concise.</li>
<li>The rest of the helper functions are a bit of a toss-up between the exception and return-value code. I think one could argue either direction.</li>
<li>The return-value code has fulfilled my requirements in terms of handling failures. The compiler will complain if any failure <code>parse_person</code> could return is not handled. If I add another error type the code will not compile. It also fulfilled the requirements without bloating the code. The return-value code and exception code are roughly the same number of lines. Their flows are roughly equal. But the return-value code is much safer.</li>
</ul>
</p>
<h3>Two Points</h3>
<p>
It's not all sunshine and lollipops. There are two issues to consider:
</p>
<p>
<ul>
<li><b>Performance</b> - Exceptions in Ocaml are really, really, fast. Like any performance issue, I suggest altering code only when needed based on measurements and encapsulating those changes as well as possible. This also means if you want to provide a safe and an exception version of a function, you should probably implement the safe version in terms of the exception verson.</li>
<li><b>Discipline</b> - I referred to discipline a few times above. This whole scheme is very easy to mess up with a single mistake: pattern matching on anything (<code>_</code>). The power of exhaustive pattern matching means you need to match on every error individually. This is effectively for the same reason catching the exception base class in other languages is such a bad idea, you lose a lot of information.</li>
</ul>
</p>
<h3>Conclusion</h3>
<p>
The example given demonstrates an important point: code can become much safer at compile time without detriment to its length or readability. The cost is low and the benefit is high. This is a strong reason to prefer a return-value based solution over exceptions in Ocaml.
</p>
Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-4348519741358344123.post-12969460690566896402012-08-03T12:53:00.000-04:002012-08-04T06:53:03.041-04:00C++11 is unsafe<p>
With all due respect for Mr. Sutter, his claim that C++11 is <a href="http://herbsutter.com/2012/03/29/interview-ca-language-for-modern-times/">"as clean and safe as any other modern language, and still the king of fast"</a>, is simply false. Clean and fast are up to the particular developer and benchmark but safe is more objective, and C++11 is not safe.
</p>
<ol>
<li>It is important for C++ to maintain backwards compatibility with previous versions, so C++11 supports C++03. C++03 is not safe. So, by definition, C++11 is also not safe. For those who are new to the term, safety generally comes down to how bad my program can screw up. Even a wrong Java application will not segfault the VM. C++11 adds some tools to make causing this behavior harder, but it is by no means impossible. C++11 still has pointers. It still has pointer arithmetic.</li>
<li>"But", you say, "we are talking about just the features C++11 added to the language". Still false. Take <code>std::array</code>, which was added in C++11. This code is unsafe: <code>std::array<int 1> arr; arr[2] = 1;</code>. And consider a lambda that captures a reference to a variable that goes out of scope. Perfectly valid C++11. Perfectly unsafe.</li>
<li>Threading in C++11 also allows you to do unsafe things. Just try modifying two non-atomic variables concurrently. You have no guarantees of what will happen.</li>
<li>"BUT", you yell, "he said 'modern language' so...." Indeed, so? I'm not sure what Herb Sutter considers a modern language but let's just take some languages that are somewhat popular today:
<ul>
<li>Java, C# - Considered safe.</li>
<li>Clojure, Scala - On the JVM, so one would consider them safe.</li>
<li>Python, Ruby, Perl - These are all considered safe languages. You cannot, without effort, access memory you should not.</li>
<li>Ada - Hah!</li>
<li>F# - Runs on .Net, safe.</li>
<li>Ocaml, Haskell - The languages themselves are considered safe but you can do whatever you want if you drop down to C.</li>
</ul>
So which languages, exactly, is Mr. Sutter referring to when he says C++11 is as safe as any modern language? I have no idea.
</li>
</ol>
<p>
The problem, though, is that it's OK to be honest about C++'s lack of safety. That is the compromise I am agreeing to when I use C++. I want the benefits of it and I understand that I am making a sacrifice. I don't want to be told C++ is something that it is not. This "C++11 is safe" talk is nonsense and not a reflection of reality.
</p>
<p>
You can follow the discussion on reddit: <a href="http://www.reddit.com/r/programming/comments/xml97/c11_is_unsafe/">http://www.reddit.com/r/programming/comments/xml97/c11_is_unsafe/</a>
</p>Unknownnoreply@blogger.com3tag:blogger.com,1999:blog-4348519741358344123.post-53602810626723167322012-07-09T03:33:00.000-04:002012-07-09T03:38:31.738-04:00The Erl Next Door<p>
The Erl Next Door, also known as TEND, is a project created for SpawnFest2012 by MononcQc and I. The hope of TEND is to make playing with Erlang easier.
</p>
<p>
Ever wanted to show someone your cool Erlang hack or teach them a something about Erlang, but getting their project setup was too complicated? With TEND, you can now provide them a URL that will setup all of the dependencies.
</p>
<a name='more'></a>
<p>
TEND takes three kinds of URLs:
<ol>
<li>An HTML document. The document can link to other document types through LINK or A tags but setting REL to "erlang-tend".</li>
<li>A raw .erl file.</li>
<li>A zip of an OTP application. This will be compiled for you, assuming it has a Makefile, rebar, or Emakfile. You can link to the ZIP link in Github.</li>
</ol>
</p>
<p>
The project page has all of the details, but the basic idea is in the shell you can run <code>tend:load</code> with a URL and everything will be loaded. In fact, this blog post can be loaded with TEND. This post links to the <code>calc</code> example from LYSE. Once it's loaded you can do <code>calc:rpn("10 10 + 2 /"). </code>. Just do:
</p>
<p>
<code><pre>tend:load("http://functional-orbitz.blogspot.se/2012/07/erl-next-door.html").</pre></code>
</p>
<p>
The official github repo is here: <a href="https://github.com/ferd/tend">https://github.com/ferd/tend</a>
</p>
<p>
We have also made a small demonstration site: <a href="http://ferd.ca/tend/">http://ferd.ca/tend/</a>
</p>
<p>
The SpawnFest2012 repo: <a href="https://github.com/Spawnfest2012/tend">https://github.com/Spawnfest2012/tend</a>
</p>
<a rel="erlang-tend" href="http://learnyousomeerlang.com/static/erlang/calc.erl"></a>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4348519741358344123.post-2647995968673983502012-07-06T02:43:00.001-04:002012-07-06T02:43:40.315-04:00So Pythonistas, you want to get rid of the GIL...<p>
<a href="http://www.reddit.com/r/programming/comments/aqq8q/presentation_on_the_new_python_gil/c0ix071">There</a> is <a href="http://jacobian.org/writing/hate-python/">no</a> <a href="http://teddziuba.com/post/26426290981/python-3s-marketing-problem">shortage</a> of <a href="http://blog.bitsrc.org/2009/05/python-threading-and-gil.html">hate</a> for the <a href="http://www.grouplens.org/node/244">GIL</a>. There is a slight problem, though. The GIL might be the cause of Python's single-core-only utilization, but it's not the root reason.
</p>
<a name='more'></a>
<p>
The origin of the GIL is to keep the interpreter internals sane when running with multiple kernel threads. When it comes to problems involving parallelism, the easy solution is simple: serialize everything. So you get the GIL. Now, if that were the end of it, getting rid of the GIL would probably not that challenging. But, for better or for worse, the GIL also made a number of operations atomic that would not be in other languages. The Python FAQ has this <a href="http://docs.python.org/faq/library#what-kinds-of-global-value-mutation-are-thread-safe">example</a>. Python programmers made use of these benefits in CPython, regardless of if the language designers actually guaranteed them. But at this point, it doesn't matter. The amount of code that depends on this behavior is large.
</p>
<p>
Greg Stein attempted to remove the GIL in Python 1.5, but programs ran about 2x slower than with the GIL. The reason being: in order to give people the guarantees they have grown accustom to in the previous paragraph, you need to do the locking on those operations for them. Where there was one a single lock, you now have a lock per object. And it is difficult to determine if an object will be accessed by multiple threads so the naive solution is to lock the object every time it's accessed in a way that needs to be atomic. This kind of fine-grained locking is expensive.
</p>
<p>
So this attempt didn't work. And it wasn't really a big deal. Multicore CPUs weren't that ubiquitous and people weren't doing things that would benefit that much from multiple cores. But now multicore is the rage and people believe that their Python programs will benefit from it. The common advice is simply to use <code>multiprocessing</code>, but people tend to find this inadequate.
</p>
<p>
PyPy is trying to solve this using STM, and blog their <a href="http://morepypy.blogspot.se/2012/06/stm-with-threads.html">progress</a>. PyPy doesn't seem to be a solution for a lot of people yet and it's unclear how successful the STM approach will be.
</p>
<p>
If you really think multicore support is important to Python, then you don't want to pitch a fit about the GIL. What you want to do is convince the Python designers that you are OK with giving up those guarantees you have been taking advantage of over the years. You will rewrite your code to not make use of the guarantees. Then they can get rid of the fine-grained locking.
</p>
<p>
But... before you say "sure", make sure you know what you're getting into. If <code>L1.pop()</code> is no longer thread-safe, then what does it mean if two threads access <code>L1</code> in parallel? I don't know much about threaded memory models but it could get pretty complicated. You might not be able to define all states of the program at that point.
</p>
<p>
In the future, before you pour too much hate on the GIL, remember: really just a symptom. The actual problem is, for simplicity, Python makes a number of guarantees that make executing performant code harder without the GIL than with it. And also, not everyone hates the GIL, some people are fond of the guarantees. Like this <a href="http://aboutsimon.com/2012/03/09/python-global-interpreter-lock-gil/">guy</a>.
</p>Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-4348519741358344123.post-60261556678461311702012-06-06T07:17:00.000-04:002012-07-05T16:31:23.005-04:00My Mental Evolution In Making A Language<p>
Over the past year I've been thinking about how to make my own language. It's a pretty big undertaking and I've been too busy with other projects to put any serious effort into it, so I just think about it when I go for a walk. Mostly I try to convince myself to not make a language but it sounds like a lot of fun. With the explosion of JavaScript, and so few people wanting to write JavaScript, it seems like other people think writing a language sounds like fun too. We have <a href="http://coffeescript.org/">CoffeeScript</a>, <a href="http://maxtaco.github.com/coffee-script/">IcedCoffeeScript</a>, <a href="http://roy.brianmckenna.org/">Roy</a>, <a href="http://jsx.github.com/">JSX</a>, <a href="http://amber-lang.net/">Amber</a>, <a href="http://www.dartlang.org/">Dart</a>, and many <a href="http://altjs.org/">others</a>. And that's just a list of recent languages that compile to JavaScript. Many more languages have come out, relatively recently, such as <a href="http://golang.org/">Go</a>, <a href="http://www.fancy-lang.org/">Fancy</a>, <a href="http://elixir-lang.org/">Elixir</a> and <a href="http://looplang.org/">Loop</a>. Most of these languages will be minor dents in the history of programming languages, but that is OK. Not everything has to be important to be worth doing. But they have gotten me thinking. Should I create a language? What would it have to offer? Is the effort worth it? Below are thoughts that have steered my decision process for when I get some time to start hacking on a language. The thoughts are targeted at me, someone who has little experience building a language, not a professional.
</p>
<a name='more'></a>
<h1>Why?</h1>
<p>
I think there are three reasons I should consider making a language:
<ul>
<li><b>Just for fun</b> - Every hobby project should be fun! If I built a language just for fun, though, I think I would prefer to implement someone else's language. Depending on what language I chose, I would learn a lot about language design without having to make a lot of mistakes myself, I could learn from others mistakes. I would also have other implementations of it to compare to my own.</li>
<li><b>Experiment with semantics</b> - This is the biggest reason for me. I have some unoriginal semantic ideas I'd like to understand better and I think creating them is the best way to go about that.</li>
<li><b>Experiment with syntax</b> - C'mon now, there is no reason to experiment with syntax any more. God already gave <a href="http://en.wikipedia.org/wiki/Standard_ml">it</a> to us. But seriously, I find this the least compelling reason. I haven't seen recent language that does something with syntax I consider all that important. I think it will take a very clever person to change syntax enough to really matter. Removing semicolons doesn't really matter much to me. I can represent the semantics I'm interested in just fine in an existing syntax.</li>
</ul>
</p>
<h1>Can I stand on the shoulders of giants?</h1>
<p>
What languages already exist that have most of the semantics I care about? That way I can just extend it with the new semantics I care about. The clear upside is, even if the community for that language is small, they already know it so they just have to learn the extensions I added. It makes evaluating the new ideas easier. There are also a lot of language options that I'm likely to mess up or just not be interested in. I might as well go with whatever the professionals went with unless I have a strong opinion. <a href="https://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/ObjectiveC/Introduction/introObjectiveC.html">Objective C</a>, <a href="http://www.ps.uni-saarland.de/alice/">AliceML</a> and <a href="https://live.gnome.org/Vala">Vala</a> are examples of what I mean. In each case, the languages either took an existing language and extended it or was heavily inspired by an existing language.
</p>
<p>
The other side of this is deciding what backend to use. Should I build an interpreter or should I target a VM, like the JVM? Maybe JavaScript? Or should I target making native binaries? Maybe produce C. Or just write LLVM-IR myself? Or build the optimizer and backend myself? It depends on the real reason I want to make the language. Do I want people to use it? Or do I just want learn the entire stack on a compiler? If I implement an existing language, for example SML, maybe the twist I could add is having it target LLVM-IR. Just because I'd be implementing someone else's language doesn't mean there isn't any room for some innovation.
</p>
<p>
Disirregardlessly, a lot of smart people have thought very hard about building languages. I should take knowledge from them whenever possible (so basically, always). If I think I'm being original in an idea, I'm probably not. <a href="http://en.wikipedia.org/wiki/ALGOL">ALGOL</a> probably has it...it always does. And I do not mean that in a defeatist way, new combinations of old ideas is progress. What I mean is a lot of these ideas have been thought out already, we don't know about them because they failed, and as a creator I should be aware of that.
</p>
<h1>Sometimes it's better to stay silent</h1>
<p>
Anders Hejlsberg was <a href="http://www.artima.com/intv/handcuffs.html">interviewed</a> about the lack of checked exceptions in C# and said:
<blockquote>I'm a strong believer that if you don't have anything right to say, or anything that moves the art forward, then you'd better just be completely silent and neutral, as opposed to trying to lay out a framework.</blockquote>
I like this idea. Language design is often about trade-offs. If I am going to introduce a new concept because I don't like an old one, I better be sure it doesn't just swap one set of problems for another. At the very least, I've created a new complexity for people who want to evaluate my language to learn, and if it doesn't move the language forward then there isn't any benefit to the new complexity.
</p>
<h1>Keep It Simple</h1>
<p>
I think we should all take a lesson from <a href="http://en.wikipedia.org/wiki/Niklaus_Wirth">Niklaus Wirth</a>. When he was designing <a href="http://en.wikipedia.org/wiki/Oberon_programming_language">Oberon</a> he took Modula-2 and greatly simplified it. The syntax is minimal and the language definition is tiny. Maybe it's too small, I don't know. But the language is very easy to think about and implement because of it's minimalism. Complexity is a burden. It's really hard to avoid complexity too, just look at the <a href="http://zero.milosz.ca/">results</a> of some trivial operators in JavaScript. JavaScript is simple in many ways but those ways interact to make complex results. These little, harmless at first, interactions cause painful bugs later on. On top of that, too much complexity makes for a more cumbersome implementation, which makes for taking longer to create. A simple language can get me something to play with sooner.
</p>
<h1>REPL</h1>
<p>
I don't actually mean "should my language have a REPL" here (one would be nice), but I mean every few weeks I take this list of questions and integrate them with what I've learned and evaluate if my previous conclusions still apply. Since I'm so new to designing a language the results tend to change. I think anyone interested in making their own language should do this. I've come up with ideas I thought were new and cool only to do some research and find out their are silly and absurd. Unfortunately, that's how I feel when I see a lot of these new toy languages being created. With a little bit of research the author could have made something much more impressive and easier to use. But then, I also feel a similar way about C++. I think a lot of people believe you can just cobble together a language and you'll get something great. It's hard to design a good, consistent, simple, extendable language. Language design is a profession and should be respected as such.
</p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4348519741358344123.post-63626695454773267332012-05-17T15:55:00.000-04:002012-07-05T16:24:16.143-04:00Phantom type examples in Ocaml<h1>
What are phantom types
</h1>
<p>
Phantom types are a way for multiple types to have the same exact underlying representation but still be seen as a distinct type by the compiler. The common example is units of measurements. Feet and meters are both distances and are best represented as an <code>int</code> or <code>float</code>, but it makes no sense to add a foot to a meter. Phantom types allow you to solve this and still represent a foot and a meter exactly the same. One can think of this as the opposite of duck typing. In duck typing, if you only care if the current value you are working with has four legs, a cat a dog and a table are the effectively the same thing. In phantom types you care about what something actually represents and that means they are different, even though underneath they are represented with the same type.
</p>
<a name='more'></a>
<p>
The trick to phantom types is the module system. In Ocaml you have a module and that module defines the interface it exposes to the world. For phantom types part of that interface is providing a type, for example <code>rstring</code>, but not giving a concrete definition of it, leaving it as an abstract concept. Then the module definition is explicit that <code>rstring</code> is actually a <code>string</code> and it can use a type of <code>rstring</code> just like a type of <code>string</code>. So outside the module the compiler just knows some type exists by a name but no more, and inside the module the compiler knows that the same type is actually a <code>string</code> (for example) and all the string APIs are valid on it.
</p>
<p>
The repo for the examples is located <a href="https://github.com/orbitz/phantom_types">here</a>.
</p>
<h1>
Examples
</h1>
<h2>
estring - Encoded strings
</h2>
<p>
This code is my paraphrasing of Martin Jambon's example of phantom typing located <a href="http://www.quora.com/Martin-Jambon/answers/Functional-Programming">here</a>.
</p>
<p>
When working with strings you might want to make sure you don't mix and match strings of incompatible encodings. You probably don't want to append a utf8 string to a utf16 string. But underneath it all, both of these can be represented the same way and a bunch of operations are going to be the same. Concatenating two utf8 strings is the same operation as concatenating two utf16 strings. The <code>estring</code> module lets you express this. For example, we can write code like this:
</p>
<p>
<code>
<pre><tt><b><font color="#0000FF">let</font></b> <font color="#990000">()</font> <font color="#990000">=</font>
<b><font color="#0000FF">let</font></b> s1 <font color="#990000">=</font> <b><font color="#000080">Estring</font></b><font color="#990000">.</font>make_utf8 <font color="#FF0000">"hello"</font> <b><font color="#0000FF">in</font></b>
<b><font color="#0000FF">let</font></b> s2 <font color="#990000">=</font> <b><font color="#000080">Estring</font></b><font color="#990000">.</font>make_utf8 <font color="#FF0000">"world"</font> <b><font color="#0000FF">in</font></b>
<b><font color="#0000FF">let</font></b> s3 <font color="#990000">=</font> <b><font color="#000080">Estring</font></b><font color="#990000">.</font>concat <font color="#990000">(</font><b><font color="#000080">Estring</font></b><font color="#990000">.</font>make_utf8 <font color="#FF0000">" "</font><font color="#990000">)</font> <font color="#990000">[</font>s1<font color="#990000">;</font> s2<font color="#990000">]</font> <b><font color="#0000FF">in</font></b>
<b><font color="#000080">Printf</font></b><font color="#990000">.</font>printf <font color="#FF0000">"%s\n"</font> <font color="#990000">(</font><b><font color="#000080">Estring</font></b><font color="#990000">.</font>str s3<font color="#990000">)</font>
</tt>
</pre></code>
</p>
<p>
This says that we want to make two utf8 strings from a regular string, and then concatenate them together with <code>" "</code> as the separator. Then we convert that back to a string for output. This works because everything is the same <code>estring</code> type. The following code will not compile because things aren't the same type:
</p>
<p>
<code><pre><tt><b><font color="#0000FF">let</font></b> <font color="#990000">()</font> <font color="#990000">=</font>
<b><font color="#0000FF">let</font></b> s1 <font color="#990000">=</font> <b><font color="#000080">Estring</font></b><font color="#990000">.</font>make_utf8 <font color="#FF0000">"hello"</font> <b><font color="#0000FF">in</font></b>
<b><font color="#0000FF">let</font></b> s2 <font color="#990000">=</font> <b><font color="#000080">Estring</font></b><font color="#990000">.</font>make_utf16 <font color="#FF0000">"world"</font> <b><font color="#0000FF">in</font></b>
<b><font color="#0000FF">let</font></b> s3 <font color="#990000">=</font> <b><font color="#000080">Estring</font></b><font color="#990000">.</font>concat <font color="#990000">(</font><b><font color="#000080">Estring</font></b><font color="#990000">.</font>make_utf8 <font color="#FF0000">" "</font><font color="#990000">)</font> <font color="#990000">[</font>s1<font color="#990000">;</font> s2<font color="#990000">]</font> <b><font color="#0000FF">in</font></b>
<b><font color="#000080">Printf</font></b><font color="#990000">.</font>printf <font color="#FF0000">"%s\n"</font> <font color="#990000">(</font><b><font color="#000080">Estring</font></b><font color="#990000">.</font>str s3<font color="#990000">)</font>
</tt></pre></code>
</p>
<p>
One weakness of <code>estring</code> is that it is not extendible. Remember, it is only inside the <code>estring</code> module that the compiler knows it is a <code>string</code>. That means other modules cannot define a <code>latin1</code> encoding type and use the <code>estring</code> module to create a <code>latin1 estring</code> type.
</p>
<h2>
rstring - Read only string
</h2>
<p>
Another use of phantom types is to provide read-only interfaces to mutable types. With <code>rstring</code> we provide a phantom type called <code>rstring</code> and then define a subset of the string API that we want to expose. The major downside to <code>rstring</code> is that it copies the string when you turn a <code>string</code> to an <code>rstring</code> and vice versa. This is for obvious reasons: you can't ensure that nobody else has a reference to the <code>string</code> unless you know you made the version you're referencing. Maybe something like <a href="http://en.wikipedia.org/wiki/Linear_types">Linear types</a> would work well here. In a real system I would consider offering an <code>unsafe_make</code> and an <code>unsafe_str</code> set of functions that don't copy. That way you could avoid the copy if that is important (it probably isn't) but at least you know you're polluting your code with <code>unsafe</code>'s. Code for <code>rstring</code> is pretty straight forward, usage looks like:
</p>
<p>
<code><pre><tt><b><font color="#0000FF">let</font></b> <font color="#990000">()</font> <font color="#990000">=</font>
<b><font color="#0000FF">let</font></b> s1 <font color="#990000">=</font> <b><font color="#000080">Rstring</font></b><font color="#990000">.</font>make <font color="#FF0000">"hello"</font> <b><font color="#0000FF">in</font></b>
<b><font color="#0000FF">let</font></b> s2 <font color="#990000">=</font> <b><font color="#000080">Rstring</font></b><font color="#990000">.</font>make <font color="#FF0000">"world"</font> <b><font color="#0000FF">in</font></b>
<b><font color="#0000FF">let</font></b> s3 <font color="#990000">=</font> <b><font color="#000080">Rstring</font></b><font color="#990000">.</font>concat <font color="#FF0000">" "</font> <font color="#990000">[</font>s1<font color="#990000">;</font> s2<font color="#990000">]</font> <b><font color="#0000FF">in</font></b>
<b><font color="#000080">Printf</font></b><font color="#990000">.</font>printf <font color="#FF0000">"%s\n"</font> <font color="#990000">(</font><b><font color="#000080">Rstring</font></b><font color="#990000">.</font>str s3<font color="#990000">)</font>
</tt></pre></code>
</p>
<h2>
Other ideas
</h2>
<p>
Another common example of using phantom types that I did not provide is something like a file or socket interface. You can encode information, such as if you opened the file for reading or writing or both and then make it so it is a compile-time error if you try to write to a file opened for reading.
</p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4348519741358344123.post-44935970708427076352011-10-29T12:48:00.000-04:002012-07-05T16:32:47.173-04:00What is Covariance and Contravariance Anyways?<p>If you have been following the talk around Google's latest language, <a href="http://www.dartlang.org/">Dart</a>, then you might have heard things like "covariant arrays are widely regarded as a mistake"
(<a href="http://www.reddit.com/r/programming/comments/lkpil/wrapping_my_head_around_optional_typing/c2tixy6">src</a>). But what is covariance? Why are covariant arrays considered a mistake? Who has covariant arrays? Why?
</p>
<a name='more'></a>
<h3>What is Covariance and Contravariance All About?</h3>
<p>
<pre><code><b><font color="#0000FF">class</font></b> X<font color="#990000">:</font>
<b><font color="#0000FF">pass</font></b>
<b><font color="#0000FF">class</font></b> <b><font color="#000000">Y</font></b><font color="#990000">(</font>X<font color="#990000">):</font>
<b><font color="#0000FF">def</font></b> <b><font color="#000000">zoom</font></b><font color="#990000">(</font>self<font color="#990000">):</font>
<b><font color="#0000FF">print</font></b> <font color="#FF0000">'Y.zoom'</font>
<b><font color="#0000FF">class</font></b> A<font color="#990000">:</font>
<i><font color="#9A1900"># Returns a value of type Y</font></i>
<b><font color="#0000FF">def</font></b> <b><font color="#000000">foo</font></b><font color="#990000">(</font>self<font color="#990000">):</font>
<b><font color="#0000FF">return</font></b> <b><font color="#000000">Y</font></b><font color="#990000">()</font>
<b><font color="#0000FF">class</font></b> <b><font color="#000000">B</font></b><font color="#990000">(</font>A<font color="#990000">):</font>
<i><font color="#9A1900"># Returns a value of type X</font></i>
<b><font color="#0000FF">def</font></b> <b><font color="#000000">foo</font></b><font color="#990000">(</font>self<font color="#990000">):</font>
<b><font color="#0000FF">return</font></b> <b><font color="#000000">X</font></b><font color="#990000">()</font>
<i><font color="#9A1900"># Takes a value of type A</font></i>
<b><font color="#0000FF">def</font></b> <b><font color="#000000">bar</font></b><font color="#990000">(</font>a<font color="#990000">):</font>
y <font color="#990000">=</font> a<font color="#990000">.</font><b><font color="#000000">foo</font></b><font color="#990000">()</font>
y<font color="#990000">.</font><b><font color="#000000">zoom</font></b><font color="#990000">()</font>
<i><font color="#9A1900"># Call bar with a B</font></i>
<b><font color="#000000">bar</font></b><font color="#990000">(</font><b><font color="#000000">B</font></b><font color="#990000">())</font>
</code></pre>
Is this code safe? Often, in an OO language, inheritance is used to create a type that looks just like another type, and pass values of this new type to functions that expect the other type and without breaking anything. So one might expect, looking at this code, that passing a <code>B</code> to <code>bar</code> would be safe. But it isn't. The problem is <code>B.foo</code> returns an <code>X</code> where <code>A.foo</code> returns a <code>Y</code>. <code>bar</code> then goes on to try to call a member function that only exists in <code>Y</code>, so the code will fail at runtime. <code>B</code> has violated the interface that <code>A</code> established. Python doesn't give us any tools to discover this error without executing the code, but statically typed languages like C++ and Java do. These languages can define the types that member functions of a subclass can take, and return, in order to be valid through covariant and contravariant properties. The concepts are larger than member functions but it's a good place to start.
</p>
<h3>Definitions</h3>
<p>
Before we can talk about covariance and contravariance we need to know what subtyping is. I'll restrict this to classes in an object-orientated language because that's what my knowledge is limited to, but these concepts go beyond that. An example of a subtype in an OO language would be a class that extends or inherits from another class, for example in C++ you would do <code>class S : public T</code> for <code>S</code> to extend <code>T</code>. The common way type theorists write this is <code>S <: T</code>. This can be read in a few ways: "any term of type <code>S</code> can safely be used in a context where a term of type <code>T</code> is expected" or "every value described by <code>S</code> is also described by <code>T</code>" (<a href="http://www.amazon.com/Types-Programming-Languages-Benjamin-Pierce/dp/0262162091/ref=sr_1_1?ie=UTF8&qid=1319515999&sr=8-1">Pierce</a>, 182). Thus, <code>S</code> is a more specific type than <code>T</code>, and conversely <code>T</code> is a more generic type than <code>S</code>. Given this, covariance and contravariance define when a more specific or more generic type is acceptable in a particular context.
</p>
<p>
The definition from the wiki <a href="http://en.wikipedia.org/wiki/Covariance_and_contravariance_(computer_science)#Overview_of_covariance.2Fcontravariance_in_some_programming_languages"/>article</a>:
<blockquote>covariance and contravariance refers to the ordering of types from narrower to wider and their interchangeability or equivalence in certain situations</blockquote>
And their definitions:
<ul>
<li><b>Covariant</b> - Given the types <code>S</code> and <code>T</code> specified above, covariance is when a more specific type, <code>S</code>, can be used when a more generic type, <code>T</code>, is specified. This applies to functions, a function that returns <code>S</code> can be used in the same context as a function that returns <code>T</code>.</li>
<li><b>Contravariant</b> - When the more generic type, <code>T</code>, can be used where the more specific type, <code>S</code>, is specified. A function that takes a <code>T</code> can be used in the same context as a function that takes a <code>S</code>.</li>
<li><b>Invariant</b> - The type specified is the only type that can be used.</li>
</ul>
My method for remembering the distinction is contravariant is larger than covariant, so it means narrower to wider.
</p>
<h3>Covariant Example</h3>
<p>
Consider this example from C++:
<pre><code><b><font color="#0000FF">class</font></b> <font color="#008080">X</font> <font color="#FF0000">{}</font><font color="#990000">;</font>
<b><font color="#0000FF">class</font></b> <font color="#008080">Y</font> <font color="#990000">:</font> <b><font color="#0000FF">public</font></b> X <font color="#FF0000">{}</font><font color="#990000">;</font>
<b><font color="#0000FF">class</font></b> <font color="#008080">Z</font> <font color="#990000">:</font> <b><font color="#0000FF">public</font></b> Y <font color="#FF0000">{}</font><font color="#990000">;</font>
<b><font color="#0000FF">class</font></b> <font color="#008080">A</font> <font color="#FF0000">{</font>
<b><font color="#0000FF">public</font></b><font color="#990000">:</font>
<b><font color="#0000FF">virtual</font></b> <font color="#008080">Y</font> <font color="#990000">*</font><b><font color="#000000">foo</font></b><font color="#990000">()</font> <font color="#FF0000">{</font> <b><font color="#0000FF">return</font></b> <b><font color="#0000FF">new</font></b> <b><font color="#000000">Y</font></b><font color="#990000">();</font> <font color="#FF0000">}</font>
<font color="#FF0000">}</font><font color="#990000">;</font>
<b><font color="#0000FF">class</font></b> <font color="#008080">B</font> <font color="#990000">:</font> <b><font color="#0000FF">public</font></b> A <font color="#FF0000">{</font>
<b><font color="#0000FF">public</font></b><font color="#990000">:</font>
<b><font color="#0000FF">virtual</font></b> <font color="#008080">Z</font> <font color="#990000">*</font><b><font color="#000000">foo</font></b><font color="#990000">()</font> <font color="#FF0000">{</font> <b><font color="#0000FF">return</font></b> <b><font color="#0000FF">new</font></b> <b><font color="#000000">Z</font></b><font color="#990000">();</font> <font color="#FF0000">}</font>
<font color="#FF0000">}</font><font color="#990000">;</font>
</code></pre>
Here we have three classes <code>X</code>, <code>Y</code>, and <code>Z</code> which we will return from a virtual function in classes <code>A</code> and <code>B</code>. This code is valid because <code>B::foo</code> is returning a narrower type than <code>A::foo</code> because <code>Z</code> is a subtype of <code>Y</code>. But what happens if we make <code>B::foo</code> return a wider type?
<pre><code><b><font color="#0000FF">class</font></b> <font color="#008080">X</font> <font color="#FF0000">{}</font><font color="#990000">;</font>
<b><font color="#0000FF">class</font></b> <font color="#008080">Y</font> <font color="#990000">:</font> <b><font color="#0000FF">public</font></b> X <font color="#FF0000">{}</font><font color="#990000">;</font>
<b><font color="#0000FF">class</font></b> <font color="#008080">Z</font> <font color="#990000">:</font> <b><font color="#0000FF">public</font></b> Y <font color="#FF0000">{}</font><font color="#990000">;</font>
<b><font color="#0000FF">class</font></b> <font color="#008080">A</font> <font color="#FF0000">{</font>
<b><font color="#0000FF">public</font></b><font color="#990000">:</font>
<b><font color="#0000FF">virtual</font></b> <font color="#008080">Y</font> <font color="#990000">*</font><b><font color="#000000">foo</font></b><font color="#990000">()</font> <font color="#FF0000">{</font> <b><font color="#0000FF">return</font></b> <b><font color="#0000FF">new</font></b> <b><font color="#000000">Y</font></b><font color="#990000">();</font> <font color="#FF0000">}</font>
<font color="#FF0000">}</font><font color="#990000">;</font>
<b><font color="#0000FF">class</font></b> <font color="#008080">B</font> <font color="#990000">:</font> <b><font color="#0000FF">public</font></b> A <font color="#FF0000">{</font>
<b><font color="#0000FF">public</font></b><font color="#990000">:</font>
<b><font color="#0000FF">virtual</font></b> <font color="#008080">X</font> <font color="#990000">*</font><b><font color="#000000">foo</font></b><font color="#990000">()</font> <font color="#FF0000">{</font> <b><font color="#0000FF">return</font></b> <b><font color="#0000FF">new</font></b> <b><font color="#000000">X</font></b><font color="#990000">();</font> <font color="#FF0000">}</font>
<font color="#FF0000">}</font><font color="#990000">;</font>
</code></pre>
<pre><blockquote>$ g++ -W -Wall -ansi -pedantic -c foo.cc
foo.cc:12: error: invalid covariant return type for ‘virtual X* B::foo()'
foo.cc:7: error: overriding ‘virtual Y* A::foo()’</blockquote></pre>
</p>
<h3>Contravariant Example</h3>
<p>
As we saw above, one can be covariant on return types, but what about on parameters to a member function? One has to be the opposite actually: contravariant. Unfortunately, gcc (or the version I'm using at least) doesn't give us an error about contravariance, instead we have to add a little bit of code using <code>B::foo</code> to see the problem:
<pre><code><b><font color="#0000FF">class</font></b> <font color="#008080">X</font> <font color="#FF0000">{}</font><font color="#990000">;</font>
<b><font color="#0000FF">class</font></b> <font color="#008080">Y</font> <font color="#990000">:</font> <b><font color="#0000FF">public</font></b> X <font color="#FF0000">{}</font><font color="#990000">;</font>
<b><font color="#0000FF">class</font></b> <font color="#008080">Z</font> <font color="#990000">:</font> <b><font color="#0000FF">public</font></b> Y <font color="#FF0000">{}</font><font color="#990000">;</font>
<b><font color="#0000FF">class</font></b> <font color="#008080">A</font> <font color="#FF0000">{</font>
<b><font color="#0000FF">public</font></b><font color="#990000">:</font>
<b><font color="#0000FF">virtual</font></b> <font color="#009900">void</font> <b><font color="#000000">foo</font></b><font color="#990000">(</font><font color="#008080">Y</font> <font color="#990000">&</font>y<font color="#990000">)</font> <font color="#FF0000">{ }</font>
<font color="#FF0000">}</font><font color="#990000">;</font>
<b><font color="#0000FF">class</font></b> <font color="#008080">B</font> <font color="#990000">:</font> <b><font color="#0000FF">public</font></b> A <font color="#FF0000">{</font>
<b><font color="#0000FF">public</font></b><font color="#990000">:</font>
<b><font color="#0000FF">virtual</font></b> <font color="#009900">void</font> <b><font color="#000000">foo</font></b><font color="#990000">(</font><font color="#008080">X</font> <font color="#990000">&</font>x<font color="#990000">)</font> <font color="#FF0000">{ }</font>
<font color="#FF0000">}</font><font color="#990000">;</font>
<font color="#009900">int</font> <b><font color="#000000">main</font></b><font color="#990000">()</font> <font color="#FF0000">{</font>
<font color="#008080">B</font> b<font color="#990000">;</font>
<font color="#008080">Y</font> y<font color="#990000">;</font>
<font color="#008080">Z</font> z<font color="#990000">;</font>
b<font color="#990000">.</font><b><font color="#000000">foo</font></b><font color="#990000">(</font>y<font color="#990000">);</font>
b<font color="#990000">.</font><b><font color="#000000">foo</font></b><font color="#990000">(</font>z<font color="#990000">);</font>
<b><font color="#0000FF">return</font></b> <font color="#993399">0</font><font color="#990000">;</font>
<font color="#FF0000">}</font>
</code></pre>
This behaves as we would expect. We can call <code>B::foo</code> with an <code>X</code>, <code>Y</code>, or <code>Z</code>, since <code>B::foo</code> takes the superclass to all of them (<code>X</code>). But what if we modify it to take <code>Z</code>:
<pre><code><b><font color="#0000FF">class</font></b> <font color="#008080">X</font> <font color="#FF0000">{}</font><font color="#990000">;</font>
<b><font color="#0000FF">class</font></b> <font color="#008080">Y</font> <font color="#990000">:</font> <b><font color="#0000FF">public</font></b> X <font color="#FF0000">{}</font><font color="#990000">;</font>
<b><font color="#0000FF">class</font></b> <font color="#008080">Z</font> <font color="#990000">:</font> <b><font color="#0000FF">public</font></b> Y <font color="#FF0000">{}</font><font color="#990000">;</font>
<b><font color="#0000FF">class</font></b> <font color="#008080">A</font> <font color="#FF0000">{</font>
<b><font color="#0000FF">public</font></b><font color="#990000">:</font>
<b><font color="#0000FF">virtual</font></b> <font color="#009900">void</font> <b><font color="#000000">foo</font></b><font color="#990000">(</font><font color="#008080">Y</font> <font color="#990000">&</font>y<font color="#990000">)</font> <font color="#FF0000">{ }</font>
<font color="#FF0000">}</font><font color="#990000">;</font>
<b><font color="#0000FF">class</font></b> <font color="#008080">B</font> <font color="#990000">:</font> <b><font color="#0000FF">public</font></b> A <font color="#FF0000">{</font>
<b><font color="#0000FF">public</font></b><font color="#990000">:</font>
<b><font color="#0000FF">virtual</font></b> <font color="#009900">void</font> <b><font color="#000000">foo</font></b><font color="#990000">(</font><font color="#008080">Z</font> <font color="#990000">&</font>z<font color="#990000">)</font> <font color="#FF0000">{ }</font>
<font color="#FF0000">}</font><font color="#990000">;</font>
<font color="#009900">int</font> <b><font color="#000000">main</font></b><font color="#990000">()</font> <font color="#FF0000">{</font>
<font color="#008080">B</font> b<font color="#990000">;</font>
<font color="#008080">Y</font> y<font color="#990000">;</font>
<font color="#008080">Z</font> z<font color="#990000">;</font>
b<font color="#990000">.</font><b><font color="#000000">foo</font></b><font color="#990000">(</font>y<font color="#990000">);</font>
b<font color="#990000">.</font><b><font color="#000000">foo</font></b><font color="#990000">(</font>z<font color="#990000">);</font>
<b><font color="#0000FF">return</font></b> <font color="#993399">0</font><font color="#990000">;</font>
<font color="#FF0000">}</font>
</code></pre>
<pre><blockquote>$ g++ -W -Wall -ansi -pedantic foo.cc
foo.cc: In function ‘int main()’:
foo.cc:24: error: no matching function for call to ‘B::foo(Y&)’
foo.cc:14: note: candidates are: virtual void B::foo(Z&)</pre></blockquote>
The error we get is about <code>B::foo(Y&)</code> not existing. Because <code>Z</code> is a narrower type than <code>Y</code>, we have broken the interface of <code>A</code>, so <code>B</code> is no longer a valid substitute for <code>A</code>.
</p>
<h3>But You Already Knew This</h3>
<p>
Intuitively, this is the way that makes the most sense. The use case for inheritance is generally when a function takes a certain type as input but you want to pass your own type. In order to ensure that the function, <code>bar</code>, can successfully use the type you pass it, all of the methods that <code>bar</code> calls on your type need to take types that <code>bar</code> can pass to it and return types that <code>bar</code> knows how to use. Take this code:
<pre><code><b><font color="#0000FF">class</font></b> <font color="#008080">X</font> <font color="#FF0000">{}</font><font color="#990000">;</font>
<b><font color="#0000FF">class</font></b> <font color="#008080">Y</font> <font color="#990000">:</font> <b><font color="#0000FF">public</font></b> X <font color="#FF0000">{</font> <i><font color="#9A1900">/* ... */</font></i> <font color="#FF0000">}</font><font color="#990000">;</font>
<b><font color="#0000FF">class</font></b> <font color="#008080">Z</font> <font color="#990000">:</font> <b><font color="#0000FF">public</font></b> Y <font color="#FF0000">{</font> <i><font color="#9A1900">/* ... */</font></i> <font color="#FF0000">}</font><font color="#990000">;</font>
<b><font color="#0000FF">class</font></b> <font color="#008080">A</font> <font color="#FF0000">{</font>
<b><font color="#0000FF">public</font></b><font color="#990000">:</font>
<b><font color="#0000FF">virtual</font></b> <font color="#008080">Y</font> <font color="#990000">*</font><b><font color="#000000">foo</font></b><font color="#990000">(</font><font color="#008080">Y</font> <font color="#990000">&</font>y<font color="#990000">)</font> <font color="#FF0000">{</font> <i><font color="#9A1900">/* ... */</font></i> <font color="#FF0000">}</font>
<font color="#FF0000">}</font><font color="#990000">;</font>
<font color="#009900">void</font> <b><font color="#000000">bar</font></b><font color="#990000">(</font><font color="#008080">A</font> <font color="#990000">&</font>a<font color="#990000">)</font> <font color="#FF0000">{</font>
<font color="#008080">Y</font> y<font color="#990000">;</font>
<font color="#008080">Y</font> <font color="#990000">*</font>y_ptr <font color="#990000">=</font> a<font color="#990000">.</font><b><font color="#000000">foo</font></b><font color="#990000">(</font>y<font color="#990000">);</font>
<font color="#FF0000">}</font>
</code></pre>
If you were to pass your own type, <code>B</code>, which is a subtype of <code>A</code> (that is, <code>B <: A</code>) the only way we can implement <code>B::foo</code> such that our function <code>bar</code> still works is if <code>B::foo</code> takes an object as input that is a supertype to <code>Y</code> and returns a type that is a subtype of <code>Y</code>. Consider if it were valid for <code>B::foo</code> to want a <code>Z</code>. <code>B::foo</code> could try to call a member function that exists only in <code>Z</code>. But since <code>bar</code> is passing an object of type <code>X</code>, which we can't guarantee has member function that exists only in <code>Z</code>, our code would be unsafe. A similar argument can be made with the return type of <code>Y</code> and <code>X</code>.
</p>
<h3>What About Arrays?</h3>
<p>
Now that we get what these terms mean, how do they apply when it comes to arrays of objects? In C++, arrays are not covariant, however in Java they are:
<pre><code><b><font color="#0000FF">public</font></b> <b><font color="#0000FF">class</font></b> <font color="#008080">Test</font> <font color="#FF0000">{</font>
<b><font color="#0000FF">public</font></b> <b><font color="#0000FF">static</font></b> <font color="#009900">void</font> <b><font color="#000000">foo</font></b><font color="#990000">()</font> <font color="#FF0000">{</font>
Test<font color="#990000">[]</font> tests <font color="#990000">=</font> <b><font color="#0000FF">new</font></b> Test<font color="#990000">[</font><font color="#993399">10</font><font color="#990000">];</font>
Object<font color="#990000">[]</font> objects <font color="#990000">=</font> tests<font color="#990000">;</font>
<font color="#FF0000">}</font>
<font color="#FF0000">}</font>
</code></pre>
This code looks innocent enough, but there is a problem: what if I modify <code>objects</code>? For example:
<pre><code><b><font color="#0000FF">public</font></b> <b><font color="#0000FF">class</font></b> <font color="#008080">Test</font> <font color="#FF0000">{</font>
<b><font color="#0000FF">public</font></b> <b><font color="#0000FF">static</font></b> <font color="#009900">void</font> <b><font color="#000000">foo</font></b><font color="#990000">()</font> <font color="#FF0000">{</font>
Test<font color="#990000">[]</font> tests <font color="#990000">=</font> <b><font color="#0000FF">new</font></b> Test<font color="#990000">[</font><font color="#993399">10</font><font color="#990000">];</font>
Object<font color="#990000">[]</font> objects <font color="#990000">=</font> tests<font color="#990000">;</font>
objects<font color="#990000">[</font><font color="#993399">0</font><font color="#990000">]</font> <font color="#990000">=</font> <b><font color="#0000FF">new</font></b> <b><font color="#000000">Integer</font></b><font color="#990000">(</font><font color="#993399">1</font><font color="#990000">);</font>
<font color="#FF0000">}</font>
<font color="#FF0000">}</font>
</code></pre>
This code compiles fine, but executing it gives the following error (changed <code>foo</code> to <code>main</code> in order to run it):
<pre><blockquote>Exception in thread "main" java.lang.ArrayStoreException: java.lang.Integer
at Test.main(Test.java:5)</blockquote></pre>
This is what people mean when they refer to covariant arrays being a mistake. A compile-time check has been moved to runtime, which means you don't know if your code is safe without running it. Note, though, that what makes covariant arrays a problem in Java is modifying the array, not reading it. If we could somehow ensure that the <code>objects</code> array was immutable, covariant arrays in Java would be safe. Why does Java have covariant arrays then? Originally Java did not have parametric polymorphism (generics), without covariant arrays it would be impossible to write a generic function that copies one array to another. Instead one would have to write a copy function for every type of array they wish to copy (<a href="http://www.amazon.com/Types-Programming-Languages-Benjamin-Pierce/dp/0262162091/ref=sr_1_1?ie=UTF8&qid=1319515999&sr=8-1">Pierce</a>, 188). While the behavior is not ideal, at the least the JVM can provide some level of safety by performing type checks at runtime.
</p>
<p>
Unfortunately, I cannot shed any light on why Dart has covariant arrays. It has done everything from befuddle to enrage type theorists (null pointers, in 2011, why?!). Dart appears to support parametric polymorphism, the lack of which lead to Java having covariant arrays. It is a shame that Dart seems to be repeating, what some, would classify as mistakes for no obvious benefit. Dart is not the only one sinning though, C# also has covariant arrays and seems to have simply copied them from Java.
</p>
<h3>Conclusion</h3>
<p>
I'm no type theorist, this post went through several iterations with comments and corrections from friends smarter than I, so this post only scratches the surface of covariance and contravariance. Hopefully it is enough to understand what people are referring to when the terms come up in casual Reddit conversation. If you're interested in more, Benjamin Pierce's book, <i><a href="http://www.amazon.com/Types-Programming-Languages-Benjamin-Pierce/dp/0262162091/ref=sr_1_1?ie=UTF8&qid=1319515999&sr=8-1">Types and Programming Languages</a></i>, has an entire chapter dedicated to subtyping. While I haven't finished the book yet, what I have read is excellent. Even if you skip the math that looks complicated you will come out better.
</p>
<h3>Thanks</h3>
<p>
Special thanks to <a href="http://twitter.com/#!/dklee">@dklee</a>. Much of this post is based off of emails we have exchanged. Thanks to <a href="http://twitter.com/#!/j2labs">@j2labs</a>, and <a href="http://twitter.com/#!/apgwoz">@apgwoz</a>, as well, for help in making this post.
</p>Unknownnoreply@blogger.com3tag:blogger.com,1999:blog-4348519741358344123.post-16754380793572662182011-10-02T22:41:00.002-04:002012-07-05T16:30:16.050-04:00Your Favorite Language is Probably Terrible at Concurrency too<p>
The internet has been ablaze with posts on NodeJS, to some people's joy and to others chagrin. Some have claimed that Node solves a long standing problem in concurrency, <a href="http://www.betabeat.com/2011/04/20/nodejitsu-raises-750k-from-east-and-west-coast-vcs/">saying</a>:
</p>
<p>
<blockquote>People are starting to build more on Node.js because it’s superior and it solves these problems that have always existed. I/O has been done wrong for the last 30 years</blockquote><br />
In my opinion, Node is bad at concurrency, and guess what? Your language probably isn't any better. But let's make sure we're on the same page first.
</p>
<a name='more'></a>
<p>
<ul><li><i>Language/Framework</i> - Most languages do not have concurrency as a first class citizen. So when I say "your language is bad at concurrency", what I really mean is "the options available for doing concurrent things in your language are bad". The former just rolls off your tongue better.</li>
<li><i>Concurrency</i> - What do I mean by concurrency? I mean a model by which you can define actions that can happen at the same time. That could mean running multiple pieces of code in parallel or interleaving them. Specifically in this post I am concerned with solving problems where the number of things you want to do concurrently is significantly larger than the number of cores you have.</li>
</ul>
</p>
<p>
There are a lot of options for concurrency out there. You may have heard of things like <a href="http://en.wikipedia.org/wiki/Pi_calculus">Pi calculus</a>, <a href="http://en.wikipedia.org/wiki/Join_calculus">Join calculus</a>, <a href="http://en.wikipedia.org/wiki/Communicating_sequential_processes">Communicating Sequential Processes</a>, <a href="http://en.wikipedia.org/wiki/Event_loop">Event-loops</a> and <a href="http://en.wikipedia.org/wiki/Coroutines">Coroutines</a>. Your language probably has an implementation of one of these, or a conceptual subset. NodeJS and Twisted implement an event-loop. Coroutines is the path Python's Gevent has taken, as well as libraries for Ruby, C, and C++. Go has chosen Communicating Sequential Processes. But all these distinctions aren't important unless I can say what I consider a good solution to concurrency.
</p>
<p>
Ideally, a good solution should have the following properties:<br />
<ul><li><i>Scaling</i> - If you are writing concurrent software you've already decided handling one thing at a time is not a scalable solution, so now you want to handle multiple things at a time. An ideal solution should scale to the limits of the machine. That means making use of multiple cores, if available.</li>
<li><i>Reasoning</i> - It should be easy for a reader of your code to reason about what it does. Edge cases and gotcha's should be limited. Preferably one shouldn't even be aware of the concurrent aspects of the code unless they need to be.</li>
<li><i>Debugging</i> - Debugging should not be painful. Standard tools like stacktraces should be meaningful. Tracing the path a piece of code takes shouldn't be harder than launching the space shuttle.</li>
</ul>
</p>
<p>
My claim is that very few concurrent solutions meet these criteria. But let me be clear, I'm not saying this is the only way you should judge selecting a solution. There is a Python library that does basically everything you think you need and it will be really hard to re-implement that functionality in another language? Well, maybe dealing with Python's concurrency shortcomings is less work than rewriting the library.</p>
<h2>Scaling</h2>
<p>
Most languages were built for writing serial code. Memory is accessible by any piece of code in the process and it is assumed that nothing interesting happens between two function calls. But modern computers are not fast enough to do all the work programmers want them to do in serial and these languages have a lot of momentum behind them. For valid reasons, it is challenging to just move to another solution. Instead, we duct tape concurrency on top of these serial languages. One problem is that some of these languages can't even run code in parallel (that is, have two functions running at the same exact time) even if they wanted to. Python and Ocaml have a global lock that restricts this. In other languages it's just too much coordination to do safely. In C and C++ it can be too hard and time consuming to coordinate distributing concurrent work over multiple threads. For this reason, many mainstream solutions to concurrency are limited to running on a single core. It's insane, right? I can buy a laptop with, ostensibly, 8 cores now, yet a program written in most mainstream languages cannot make use of more than one.
</p>
<p>
For this reason, most solutions fail to be <i>scalable</i>. For example, NodeJS, Twisted, Ocaml/Lwt, and Gevent: from the point-of-view of a user of these frameworks, their code not only cannot run on multiple cores, but it depends on it. Consider some Twisted code that downloads N web pages and appends the result to a list:
</p>
<p>
<pre><code><b><font color="#0000FF">def</font></b> <b><font color="#000000">downloadUrls</font></b><font color="#990000">(</font>urls<font color="#990000">):</font>
d <font color="#990000">=</font> defer<font color="#990000">.</font><b><font color="#000000">Deferred</font></b><font color="#990000">()</font>
ret <font color="#990000">=</font> <font color="#990000">[]</font>
<b><font color="#0000FF">def</font></b> <b><font color="#000000">_returnWhenDone</font></b><font color="#990000">(</font>_<font color="#990000">):</font>
<b><font color="#0000FF">if</font></b> <b><font color="#000000">len</font></b><font color="#990000">(</font>ret<font color="#990000">)</font> <font color="#990000">==</font> <b><font color="#000000">len</font></b><font color="#990000">(</font>urls<font color="#990000">):</font>
d<font color="#990000">.</font><b><font color="#000000">callback</font></b><font color="#990000">(</font>ret<font color="#990000">)</font>
<b><font color="#0000FF">for</font></b> url <b><font color="#0000FF">in</font></b> urls<font color="#990000">:</font>
downloadDefer <font color="#990000">=</font> <b><font color="#000000">downloadUrlAsString</font></b><font color="#990000">(</font>url<font color="#990000">)</font>
downloadDefer<font color="#990000">.</font><b><font color="#000000">addCallback</font></b><font color="#990000">(</font><b><font color="#0000FF">lambda</font></b> s <font color="#990000">:</font> ret<font color="#990000">.</font><b><font color="#000000">append</font></b><font color="#990000">(</font>s<font color="#990000">))</font>
downloadDefer<font color="#990000">.</font><b><font color="#000000">addCallback</font></b><font color="#990000">(</font>_returnWhenDone<font color="#990000">)</font>
<b><font color="#0000FF">return</font></b> d
</code></pre>
</p>
<p>
Ignoring my failure to handle failures, this code is acceptable Twisted, and it could not work if Python suddenly got the ability to run code on multiple cores and Twisted used it. The reason being, there is no coordination around the <code>ret.append(s)</code> line. What if two threads were to try to <code>append</code> to <code>ret</code> at the same time? NodeJS and Gevent have the same idea in mind. Almost no data access is surrounded by a mechanism to coordinate multiple pieces of code accessing it at the same time. The result is, none of the code using these frameworks can be run on multiple cores. If CPython or V8 got multicore support it would take a rewrite of all of the code to make use of it.
</p>
<p>
But, you say, who cares? "I can just spin up N instances of my program, where N is the number of cores on my machine. I can easily scale that way". You can't even get concurrency right and now you want to move into distributed programming? Who are you fooling? But seriously, the problem is your code now needs to be "location aware". If you want to do something with object X, you have to be aware of where object X lives. This adds another layer of complexity to your system. Without a good way of communicating between instances you are limited to solving embarrassingly parallel problems or pushing the concurrency to another software layer. Either way, you aren't actually solving the problem with your framework. Luckily, a lot of what people want concurrency for is serving webpages, which requires almost no interprocess communication right now.
</p>
<h2>Reasoning</h2>
<p>
No matter how you slice it, writing concurrent code is hard. When it comes to serial code, looking at it and knowing what it does is as simple as understanding how each function operates given the current state of the program. But with concurrent code, the state of the program is changing while a function runs. Understanding a concurrent program involves understanding how the concurrent components are interacting with each other. Some solutions make this easier than others.
</p>
<p>
Take the <a href="https://github.com/koush/node/wiki/%22async%22-support-in-node.js">following</a> piece of example NodeJS code:
</p>
<p>
<pre><code><b><font color="#0000FF">var</font></b> db <font color="#990000">=</font> <b><font color="#000000">require</font></b><font color="#990000">(</font><font color="#FF0000">'somedatabaseprovider'</font><font color="#990000">);</font>
app<font color="#990000">.</font><b><font color="#000000">get</font></b><font color="#990000">(</font><font color="#FF0000">'/price'</font><font color="#990000">,</font> <b><font color="#0000FF">function</font></b><font color="#990000">(</font>req<font color="#990000">,</font> res<font color="#990000">)</font> <font color="#FF0000">{</font>
db<font color="#990000">.</font><b><font color="#000000">openConnection</font></b><font color="#990000">(</font><font color="#FF0000">'host'</font><font color="#990000">,</font> <font color="#993399">12345</font><font color="#990000">,</font> <b><font color="#0000FF">function</font></b><font color="#990000">(</font>err<font color="#990000">,</font> conn<font color="#990000">)</font> <font color="#FF0000">{</font>
conn<font color="#990000">.</font><b><font color="#000000">query</font></b><font color="#990000">(</font><font color="#FF0000">'select * from products where id=?'</font><font color="#990000">,</font> <font color="#990000">[</font>req<font color="#990000">.</font><b><font color="#000000">param</font></b><font color="#990000">(</font><font color="#FF0000">'product'</font><font color="#990000">)],</font> <b><font color="#0000FF">function</font></b><font color="#990000">(</font>err<font color="#990000">,</font> results<font color="#990000">)</font> <font color="#FF0000">{</font>
conn<font color="#990000">.</font><b><font color="#000000">close</font></b><font color="#990000">();</font>
res<font color="#990000">.</font><b><font color="#000000">send</font></b><font color="#990000">(</font>results<font color="#990000">[</font><font color="#993399">0</font><font color="#990000">]);</font>
<font color="#FF0000">}</font><font color="#990000">);</font>
<font color="#FF0000">}</font><font color="#990000">);</font>
<font color="#FF0000">}</font><font color="#990000">);</font>
</code></pre>
</p>
<p>
The amount of syntax is enormous. There is a huge amount of line noise for what should look, at worst, like this:
</p>
<p>
<pre><code><b><font color="#0000FF">var</font></b> db <font color="#990000">=</font> <b><font color="#000000">require</font></b><font color="#990000">(</font><font color="#FF0000">'somedatabaseprovider'</font><font color="#990000">);</font>
app<font color="#990000">.</font><b><font color="#000000">get</font></b><font color="#990000">(</font><font color="#FF0000">'/price'</font><font color="#990000">,</font> <b><font color="#0000FF">function</font></b><font color="#990000">(</font>req<font color="#990000">,</font> res<font color="#990000">)</font> <font color="#FF0000">{</font>
<b><font color="#0000FF">var</font></b> conn <font color="#990000">=</font> db<font color="#990000">.</font><b><font color="#000000">openConnection</font></b><font color="#990000">(</font><font color="#FF0000">'host'</font><font color="#990000">,</font> <font color="#993399">12345</font><font color="#990000">)</font>
<b><font color="#0000FF">var</font></b> result <font color="#990000">=</font> conn<font color="#990000">.</font><b><font color="#000000">query</font></b><font color="#990000">(</font><font color="#FF0000">'select * from products where id=?'</font><font color="#990000">,</font> <font color="#990000">[</font>req<font color="#990000">.</font><b><font color="#000000">param</font></b><font color="#990000">(</font><font color="#FF0000">'product'</font><font color="#990000">)])</font>
conn<font color="#990000">.</font><b><font color="#000000">close</font></b><font color="#990000">();</font>
res<font color="#990000">.</font><b><font color="#000000">send</font></b><font color="#990000">(</font>results<font color="#990000">[</font><font color="#993399">0</font><font color="#990000">]);</font>
<font color="#FF0000">}</font><font color="#990000">);</font>
</code></pre>
</p>
<p>
If you want to add proper error handling, the situation gets worse with callback code. Twisted has attempted to solve this by encapsulating code flow in an object called a <code>Deferred</code>, but the problem remains: a unit of work in callback-based code is not a function, like one is used to in serial code, it is work to do between events. Like the above example code showed, there isn't a function that connects to a db, does a query, and returns the result. There is a function to open a db connection, another function for when that is done and to do the db query, and another function to handle the result. You have defined three functions where you previously needed one. More importantly, you have to define functions not because it makes your code clearer but because the framework requires it.
</p>
<p>
Given how negatively this affects code, there are a lot of attempted solutions. Twisted, for example, allows one to use the <code>defer.inlineCallbacks</code> decorator so a function can use generators to express asynchronous code. Our previous NodeJS code might look like this:
</p>
<p>
<pre><code>@defer<font color="#990000">.</font>inlineCallbacks
<b><font color="#0000FF">def</font></b> <b><font color="#000000">handlePrice</font></b><font color="#990000">(</font>req<font color="#990000">,</font> res<font color="#990000">):</font>
conn <font color="#990000">=</font> yield db<font color="#990000">.</font><b><font color="#000000">openConnection</font></b><font color="#990000">(</font><font color="#FF0000">'host'</font><font color="#990000">,</font> <font color="#993399">12345</font><font color="#990000">)</font>
result <font color="#990000">=</font> yield conn<font color="#990000">.</font><b><font color="#000000">query</font></b><font color="#990000">(</font><font color="#FF0000">'select * from products where id=?'</font><font color="#990000">,</font> <font color="#990000">[</font>req<font color="#990000">.</font><b><font color="#000000">param</font></b><font color="#990000">(</font><font color="#FF0000">'product'</font><font color="#990000">)])</font>
yield conn<font color="#990000">.</font><b><font color="#000000">close</font></b><font color="#990000">()</font>
res<font color="#990000">.</font><b><font color="#000000">send</font></b><font color="#990000">(</font>results<font color="#990000">[</font><font color="#993399">0</font><font color="#990000">])</font>
app<font color="#990000">.</font><b><font color="#000000">get</font></b><font color="#990000">(</font><font color="#FF0000">'/price'</font><font color="#990000">,</font> handlePrice<font color="#990000">)</font>
</code></pre>
</p>
<p>
In many ways this is an improvement but it does have its limitations.
</p>
<p>
The NodeJS community has been at work solving this problem for themselves too. One person <a href="https://github.com/koush/node/wiki/%22async%22-support-in-node.js">added</a> coroutines to V8, and gave it a C#-like syntax. OKCupid gave us <a href="http://tamejs.org/">TameJS</a>. Both of these solutions have their problems which are deal breakers for many.
</p>
<p>
There are also, less complete, solutions like <a href="https://github.com/creationix/step">Step</a>. But library solutions, like Step, only give you access to a subset of functionality you would get from the sequential code you really want to write. To do that you need a full <a href="http://en.wikipedia.org/wiki/Continuation_passing_style">CPS</a> transformation (which is what TameJS gives you, at a cost of debugging). This is actually how the syntax extensions for Ocaml/Lwt work. The previous NodeJS code might look like this in Ocaml/Lwt (the relevant part is that <code>lwt</code> causes a CPS transformation to turn the code into the appropriate callback-based code):
</p>
<p>
<pre><code><b><font color="#0000FF">let</font></b> handle_price req res <font color="#990000">=</font>
lwt conn <font color="#990000">=</font> <b><font color="#000080">DB</font></b><font color="#990000">.</font>open_connection <font color="#FF0000">"host"</font> <font color="#993399">12345</font> <b><font color="#0000FF">in</font></b>
lwt result <font color="#990000">=</font> <b><font color="#000080">DB</font></b><font color="#990000">.</font>query conn <font color="#990000">(</font><b><font color="#000080">SQL</font></b><font color="#990000">.</font>sprintf <font color="#FF0000">"select * from products where id=?"</font> <font color="#990000">(</font>req#param <font color="#FF0000">"product"</font><font color="#990000">))</font> <b><font color="#0000FF">in</font></b>
<b><font color="#000080">DB</font></b><font color="#990000">.</font>close conn<font color="#990000">;</font>
res#send results<font color="#990000">.[</font><font color="#993399">0</font><font color="#990000">]</font>
<b><font color="#000080">App</font></b><font color="#990000">.</font>get <font color="#FF0000">"/price"</font> handle_price
</code></pre>
</p>
<p>
This is one reason for Gevent/Eventlet's popularity in Python. Gevent uses coroutines to give you asynchronous code that looks sequential. The trick is, underneath the hood, some function calls actually result in all of the state for your current function call being saved, another one switched to, executed, rinse, repeat. Gevent has a cooperative scheduler that tries to intelligently decide which function to switch to.</p>
<p>
Say you want to write the earlier NodeJS code in sequential Python, you might get:
</p>
<p>
<pre><code><b><font color="#0000FF">def</font></b> <b><font color="#000000">handlePrice</font></b><font color="#990000">(</font>req<font color="#990000">,</font> res<font color="#990000">):</font>
conn <font color="#990000">=</font> db<font color="#990000">.</font><b><font color="#000000">openConnection</font></b><font color="#990000">(</font><font color="#FF0000">'host'</font><font color="#990000">,</font> <font color="#993399">12345</font><font color="#990000">)</font>
result <font color="#990000">=</font> conn<font color="#990000">.</font><b><font color="#000000">query</font></b><font color="#990000">(</font><font color="#FF0000">'select * from products where id=?'</font><font color="#990000">,</font> <font color="#990000">[</font>req<font color="#990000">.</font><b><font color="#000000">param</font></b><font color="#990000">(</font><font color="#FF0000">'product'</font><font color="#990000">)])</font>
conn<font color="#990000">.</font><b><font color="#000000">close</font></b><font color="#990000">()</font>
res<font color="#990000">.</font><b><font color="#000000">send</font></b><font color="#990000">(</font>results<font color="#990000">[</font><font color="#993399">0</font><font color="#990000">])</font>
app<font color="#990000">.</font><b><font color="#000000">get</font></b><font color="#990000">(</font><font color="#FF0000">'/price'</font><font color="#990000">,</font> handlePrice<font color="#990000">)</font>
</code></pre>
</p>
<p>
How would this look in Gevent? Exactly the same. The <code>openConnection</code> and <code>query</code> functions have an I/O call which actually jumps back to the Gevent scheduler so it can do something else while the I/O happens.<br />
<br />
But Gevent is not without its cost when it comes to reasoning about code. Consider this:
</p>
<p>
<pre><code><b><font color="#0000FF">def</font></b> <b><font color="#000000">foo</font></b><font color="#990000">(</font>data<font color="#990000">):</font>
<b><font color="#0000FF">print</font></b> data<font color="#990000">.</font>bar
<b><font color="#000000">do_something</font></b><font color="#990000">()</font>
<b><font color="#0000FF">print</font></b> data<font color="#990000">.</font>bar
</code></pre>
</p>
<p>
Looking at this code, will the same value be printed twice? The answer is: no idea. Even though <code>do_something</code> does not take <code>data</code> as input, it could do something that causes Gevent to context switch to another function, another function which also has access to <code>data</code> and modifies it. There is no way to tell, simply by looking at the code, if it will context switch or not.
</p>
<h2>Debugging</h2>
<p>
The previous Gevent code is printing out two different values for <code>data.bar</code> and you don't want this, how do you fix it? The first thing you might try, from your serial programming days, is a debugger. But that might not work very well. Why? You're in concurrent-land now, multiple things are happening at once! That means timing is important. If you set a break point somewhere, you've disrupted the time things happen and your program could take a completely different path, not the one you want to debug.
</p>
<p>
If you're smart and you control access to <code>data.bar</code> through function calls, you can do some printf debugging. Perhaps print out a stacktrace when one modifies it. But let's say, even those prints are causing the timing of your program to change, so now <code>data.bar</code> is coming out as the same value at each <code>print</code>. What do you do?!
</p>
<p>
The point is, debugging concurrent code can be very hard. Event-loop code adds another problem to debugging: your code doesn't have a linear path. If you could visualize sequential code, it would be a line. You start at point A, you do the things in order to get to point B, at any point if you have an error your callstack represents the path you took to get there. Event-loop code always needs to hit the event-loop for a blocking call though. The callstack you see is always limited to the path from the last event you got. A callstack in the code handling a database query may not contain the how you got there. If that query is part of a piece of fairly generic code you don't have many leads to go on to track it down.
</p>
<h3>Who got it right then?</h3>
<p>
Three languages come to mind: Erlang, Oz, Haskell. There are more out there but I'm not omnipotent. In my opinion, these languages are capable of the three properties I previously mentioned. Right now you are probably rolling your eyes and saying "I should have known, one of THOSE guys". But my argument is conservative: based on the properties that I believe are important for concurrent solution to be good, these languages excel (or are capable of it) at them. Real world problems contain more than just concurrency issues though, so this does not mean you're wrong to use a language that doesn't meet my criteria, but it does mean you are sacrificing something. Perhaps that sacrifice is acceptable. But don't fool yourself into thinking your language is not terrible at concurrency, because it probably is.
</p>Unknownnoreply@blogger.com21tag:blogger.com,1999:blog-4348519741358344123.post-33306570007943874612011-07-31T14:35:00.003-04:002012-07-05T16:33:28.164-04:00C Gotcha Of The Day: Pointers aren't integersThe C standard is clear that pointers are not required to be convertible to or from an integer.<br />
<br />
<a name='more'></a>
Section 6.3.2.3.5-6 in the C99 draft<br />
<br />
<blockquote>An integer may be converted to any pointer type. The result is implementation-defined, might not be properly aligned, and might not point to an entity of the referenced type.)<br />
<br />
Any pointer type may be converted to an integer type; the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.</blockquote><br />
Basically, you can't depend on the following code doing anything useful:<br />
<br />
<pre><code><tt><font color="#009900">int</font> <font color="#990000">*</font>p <font color="#990000">=</font> <font color="#990000">(</font><font color="#009900">int</font><font color="#990000">*)</font><font color="#993399">0xff</font><font color="#990000">;</font>
</tt></code></pre><br />
The C standard does not define a machine with a flat memory model. Old Intel systems are an example of a non flat memory model, they had a segmented memory model where a pointer needed a segment and offset.<br />
<br />
Conversion to a string is also implementation defined in C, from 7.19.6.1 <code>fprintf</code>, the section on the <code>%p</code> format specifier:<br />
<br />
<blockquote>The argument shall be a pointer to void. The value of the pointer is converted to a sequence of printable characters, in an implementation-defined manner.</blockquote><br />
And from <code>fscanf</code>:<br />
<br />
<blockquote>Matches an implementation-defined set of sequences, which should be the same as the set of sequences that may be produced by the %p conversion of the fprintf function. The corresponding argument shall be a pointer to a pointer to void. The interpretation of the input item is implementation-defined. If the input item is a value converted earlier during the same program execution, the pointer that results shall compare equal to that value; otherwise the behavior of the %p conversion is undefined. </blockquote><br />
It does appear that even though the result of <code>%p</code> is implementation defined, it is guaranteed that you can <code>fprintf</code> and <code>fscanf</code> pointers and get back the same result inside the same program execution.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4348519741358344123.post-30388880749996240862011-07-29T13:26:00.000-04:002011-07-29T13:26:18.857-04:00C Gotcha Of The Day: ptrdiff_tExcerpt from C99 Draft (AFAIK this has not changed):<br />
<br />
<blockquote>The size of the result is implementation-defined, and its type (a signed integer type) is <code>ptrdiff_t</code> defined in the <code><stddef.h></code> header. If the result is not representable in an object of that type, the behavior is undefined. In other words, if the expressions <code>P</code> and <code>Q</code> point to, respectively, the i-th and j-th elements of an array object, the expression <code>(P) - (Q)</code> has the value i−j provided the value fits in an object of type <code>ptrdiff_t</code>.</blockquote><br />
That means, while the type <code>size_t</code> is capable of expressing the size of any object, you cannot guarantee that the subtraction of two pointers inside your object will result in defined behavior. That is because <code>ptrdiff_t</code> is signed (so it can give you the direction of the difference) and <code>size_t</code> is unsigned. You can use the macros <code>PTRDIFF_MAX</code> and <code>SIZE_MAX</code> to determine if your subtraction is safe though.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4348519741358344123.post-75745859448599582602011-05-03T13:31:00.000-04:002011-05-03T13:31:32.030-04:00Sometimes I forget the full degree in which Python's "scoping" is broken.<blockquote>>>> [s for s in [1, 2,3]]<br />
[1, 2, 3]<br />
>>> s<br />
3</blockquote>Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-4348519741358344123.post-42909072136305684192011-04-27T22:53:00.003-04:002012-07-05T16:30:38.785-04:00[JC] Efficient Parallel Programming in Poly/ML and Isabelle/MLAuthors: David C.J. Matthews and Makarius Wenzel<br />
URL: <a href="http://www4.in.tum.de/~wenzelm/papers/parallel-ml.pdf">http://www4.in.tum.de/~wenzelm/papers/parallel-ml.pdf</a><br />
Year: 2010<br />
<br />
I had not heard of Poly/ML or Isabelle/ML prior to reading this paper. I tried to do a bit of background research to get an idea for it, but I may have gotten some details wrong. If I made any mistakes below I will be happy to correct them.<br />
<br />
<h1>The Paper</h1><br />
Poly/ML is a full implementation of Standard ML originally developed by David Matthews. Larry Paulson was an early adopter and implemented the Isabelle theorem prover. Isabelle/ML is the implementation of Isabelle on Poly/ML. Poly/ML was originally implemented with a single threaded run time system (RTS). With the ubiquity of multicore machines the RTS was modified to support threading, the garbage collector was modified and threading APIs were introduced. Finally, the modifications were tested in Isabelle. This paper is relevant to the frequent debates that occur on the Ocaml mailing list. Ocaml currently has no support for parallelism and this deficiency is frequently brought up as a negative.<br />
<br />
<a name='more'></a>
The primary motivation for adding parallelism to Poly/ML is Isabelle/ML, where proofs can take hours to days to execute. Poly/ML and Isabelle/ML have been developed for many years which restricted the modifications to not break existing code and not grossly negatively effect single threaded applications. The authors decided to start with a fairly low level, pthreads-like, implementation and build layers on top of it, each more abstract than the previous. The majority of the changes took place in the RTS, Poly/ML base library, and Isabelle/ML library. In the end, no user's Isabelle code need be modified to take advantage of the performance benefits.<br />
<br />
As stated before, Poly/ML is an implementation of Standard ML. The original definition of SML included an asynchronous exception called <code>Interrupt</code>. This was removed from the 1997 definition however Poly/ML has kept in for interrupting threads. This exception can be triggered by one thread in another thread to stop a computation. <code>Interrupt</code> is a useful mechanism to get a threads attention.<br />
<br />
The RTS is written in C++, provides memory management, access to the underlying OS and required the most modifications. The RTS provides a one-to-one relationship between OS threads and ML threads. The OS is in charge of scheduling threads on cores. Synchronization primitives are not implement as a direct call to the underlying OS's, however. Each thread is given a single condition variable which the RTS can signal for the unlocking of a mutex, or signaling of a condition variable, in the ML code. A single threaded stop-the-world GC is used. While it is known that this is a likely bottleneck as the number of cores increase, it is shown that it not an issue up to 8 cores.<br />
<br />
The base libraries were hardly altered, beyond adding the necessary threading APIs. The low-level APIs provided are: <br />
<br />
<b>Threads:</b><br />
<pre><code><tt>fork<font color="#990000">:</font> <font color="#990000">(</font>unit <font color="#990000">-></font> unit<font color="#990000">)</font> <font color="#990000">*</font> attribute list <font color="#990000">-></font> thread
interrupt<font color="#990000">:</font> thread <font color="#990000">-></font> unit
setAttributes<font color="#990000">:</font> attribute list <font color="#990000">-></font> unit
getAttributes<font color="#990000">:</font> unit <font color="#990000">-></font> attribute list</tt></code></pre><b>Mutexes:</b><br />
<pre><code><tt>mutex<font color="#990000">:</font> unit <font color="#990000">-></font> mutex
lock<font color="#990000">:</font> mutex <font color="#990000">-></font> unit
unlock<font color="#990000">:</font> mutex <font color="#990000">-></font> unit</tt></code></pre><b>Condition Variables:</b><br />
<pre><code><tt>condVar<font color="#990000">:</font> unit <font color="#990000">-></font> condVar
wait<font color="#990000">:</font> condVar <font color="#990000">*</font> mutex <font color="#990000">-></font> unit
signal<font color="#990000">:</font> condVar <font color="#990000">-></font> unit
boradcast<font color="#990000">:</font> condVar <font color="#990000">-></font> unit</tt></code></pre><br />
The only one that might need explaining is <code>interrupt</code>. This raises the asynchronous exception <code>Interrupt</code> in the provided thread. A few other locations of the standard library were modified, such as the I/O module which had locks added to each operation. The authors do point out:<br />
<blockquote>... this overhead can be almost entirely avoided by using the functional IO layer of the ML library in which reading a character returns the character and a new stream since in this case a lock is only needed when the buffer is empty and needs to be refilled.</blockquote><br />
Abstractions over these were added to the Isabelle/ML library. The first is a synchronized variable with the following definition:<br />
<pre><code><tt><b><font color="#0000FF">type</font></b> 'a var
<b><font color="#0000FF">val</font></b> var<font color="#990000">:</font> 'a <font color="#990000">-></font> 'a var
<b><font color="#0000FF">val</font></b> value<font color="#990000">:</font> 'a var <font color="#990000">-></font> 'a
<b><font color="#0000FF">val</font></b> guarded_access<font color="#990000">:</font> 'a var <font color="#990000">-></font> <font color="#990000">(</font>'a <font color="#990000">-></font> <font color="#990000">(</font>'b <font color="#990000">*</font> 'a<font color="#990000">)</font> option<font color="#990000">)</font> <font color="#990000">-></font> 'b</tt></code></pre><br />
<code>guarded_access</code> takes a synchronized variable, and a function that takes the value of the synchronized variable as input and returns an option containing a tuple of the new value to put in the synchronized variable and the value to return. The combinator combines the idea of a mutex and condition variable. In the code <code>guarded_access v f</code>, <code>f</code> will be applied to the value stored in <code>v</code>. If <code>f</code> returns <code>NONE</code> then <code>guarded_access</code> waits for the next signal on that variable and applies <code>f</code> again. If <code>f</code> returns <code>Some (a, b)</code>, then the value <code>b</code> will be put back into the synchronized variable and the value <code>a</code> will be returned.<br />
<br />
This simple construct can be used to implement variables that can be safely shared, and updated, between threads. mvar's, variables where a thread waits for a value if none is there when getting and waits for the value to be removed when putting, can be easily implemented with this:<br />
<pre><code><tt><b><font color="#0000FF">type</font></b> 'a mvar <font color="#990000">=</font> 'a option var
<b><font color="#0000FF">fun</font></b> mvar <font color="#990000">()</font> <font color="#990000">=</font> var NONE
<b><font color="#0000FF">fun</font></b> take v <font color="#990000">=</font> guarded_access v
<font color="#990000">(</font><b><font color="#0000FF">fn</font></b> NONE <font color="#990000">=></font> NONE <font color="#990000">|</font> Some x <font color="#990000">=></font> Some <font color="#990000">(</font>x<font color="#990000">,</font> NONE<font color="#990000">))</font>
<b><font color="#0000FF">fun</font></b> put v x <font color="#990000">=</font> guarded_access v
<font color="#990000">(</font><b><font color="#0000FF">fn</font></b> SOME _ <font color="#990000">=></font> NONE <font color="#990000">|</font> NONE <font color="#990000">=></font> <font color="#990000">(</font>SOME <font color="#990000">((),</font> SOME x<font color="#990000">)))</font></tt></code></pre><br />
The other abstraction presented is futures, which are a way to represent the result of a computation before it is completed. The futures interfaces looks a lot like a typical lazy evaluation interface, except the value will always be computed unless it is canceled. Futures are an attractive way to represent parallel computations. Consider this contrived example:<br />
<pre><code><tt><b><font color="#0000FF">val</font></b> x <font color="#990000">=</font> future some_expensive_computation
<b><font color="#0000FF">val</font></b> y <font color="#990000">=</font> future some_other_expensive_computation
<b><font color="#0000FF">val</font></b> z <font color="#990000">=</font> join x <font color="#990000">+</font> join y</tt></code></pre><br />
We create two futures, <code>x</code> and <code>y</code> which evaluate to integers. Then we create <code>z</code> which will be the sum of <code>x</code> and <code>y</code>. Assuming you have enough cores, <code>x</code> and <code>y</code> will be computed in parallel. The <code>join</code> function waits for the future and evaluates to the value of the future. Here is the interface for futures in Isabelle/ML:<br />
<pre><code><tt><b><font color="#0000FF">type</font></b> 'a future
<b><font color="#0000FF">val</font></b> future<font color="#990000">:</font> <font color="#990000">(</font>unit <font color="#990000">-></font> 'a<font color="#990000">)</font> <font color="#990000">-></font> 'a future
<b><font color="#0000FF">val</font></b> join<font color="#990000">:</font> 'a future <font color="#990000">-></font> 'a
<b><font color="#0000FF">val</font></b> cancel<font color="#990000">:</font> 'a future <font color="#990000">-></font> unit</tt></code></pre><br />
If a future is cancelled or produces an exception, that exception will be reraised when <code>join</code> is called. Isabelle/ML also contains a few future combinators, such as future groups which create a hierarchy of execution that kills all futures in the hierarchy if one fails.<br />
<br />
The implied implementation of futures is a thread pool where the <code>future</code> function queues work for the thread pool. A scheduler will then decide which futures get work or are cancelled.<br />
<br />
That is the end of additions to Poly/ML and Isabelle/ML. How well does it work? While subjective, performance improvements were more than acceptable with tests plateauing at around 8 cores. The tests were performed using large Isabelle/Isar proofs available in the standard distribution. Isabelle/Isar is non computational language and has a modular nature that makes it possible to exploit implicit parallelism. Four Isabelle applications were used for testing, all of which are considered reasonably large. The results showed a maximum of 3.2x speedup for 4 cores and 6.5x on 8 cores. Past 8 cores the performance increase plateaus. Two bottlenecks were investigated further. The first is garbage collection. As stated earlier Poly/ML has a single threaded stop-the-world GC. At 16 cores GC becomes about 15-30% of the run time. The second bottleneck is insufficient parallelization. Of the applications chosen, there was an insufficient amount of work to keep all of the cores busy.<br />
<br />
While there are many implementations for functional languages out there, Poly/ML remains one of the few that supports true multicore threads in a stable state and used in realistic applications. The modifications to Poly/ML took about 1 person year of work. Future work includes parallelizing the garbage collector.<br />
<br />
<h1>The Discussion</h1>Every few months the question of if Ocaml will get multicore support comes up. The answer is generally "no" (although it's coming closer to "yes" thanks to OcamlPro) with the reason being it is too hard to provide a proper implementation, garbage collection being given as the primary reason. I think that the results of Poly/ML show that even a single threaded stop-the-world GC can provide a sufficient performance advantage while also providing more powerful abstractions for parallelism. A parallel garbage collector can be worked in later, if needed, without affecting existing code.<br />
<br />
I liked that the authors chose to add low-level threading support and build layers on top. It gives developers options if the provided abstractions are not meeting their needs. Futures appear to be a very appealing abstraction but I worry somewhat that they are insufficient. In order to maximize parallelism, should every function consume and produce futures? That seems a bit overkill and the overhead would likely be enormous. But it is easy to think of situations where either decision hurts you, either because of the overhead or because you aren't utilizing all of the cores effectively. I do like futures as an option though and hope to see them if/when Ocaml gets multicore support.<br />
<br />
This is my first exposure to Poly/ML and the authors have done excellent work. I hope it provides motivation for bringing Ocaml into the multicore world.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4348519741358344123.post-81953216086171756362011-04-14T10:50:00.011-04:002011-04-17T19:45:26.389-04:00A little gotcha in overloading comparison methods in PythonPython supports chaining relational operators so you can express 2 < 3 < 4 and get true. It looks like this is implemented as a little compiler trick that actually creates the expression "2 < 3 and 3 < 4". This can be confusing if you try to do something insane like implement Haskell's (>>=) in Python using (>=). Something like: <br />
<pre><code><tt><b><font color="#000000">Just</font></b><font color="#990000">(</font><font color="#993399">10</font><font color="#990000">)</font> <font color="#990000">>=</font> <font color="#990000">(</font><b><font color="#0000FF">lambda</font></b> x <font color="#990000">:</font> <b><font color="#000000">Just</font></b><font color="#990000">(</font>x <font color="#990000">+</font> <font color="#993399">1</font><font color="#990000">))</font> <font color="#990000">>=</font> <font color="#990000">(</font><b><font color="#0000FF">lambda</font></b> x <font color="#990000">:</font> <b><font color="#000000">Just</font></b><font color="#990000">(</font>x <font color="#990000">/</font> <font color="#993399">2</font><font color="#990000">))</font>
</tt></code></pre>Will actually become:<br />
<pre><code><tt><b><font color="#000000">Just</font></b><font color="#990000">(</font><font color="#993399">10</font><font color="#990000">)</font> <font color="#990000">>=</font> <font color="#990000">(</font><b><font color="#0000FF">lambda</font></b> x <font color="#990000">:</font> <b><font color="#000000">Just</font></b><font color="#990000">(</font>x <font color="#990000">+</font> <font color="#993399">1</font><font color="#990000">))</font> <b><font color="#0000FF">and</font></b> <font color="#990000">\</font>
<font color="#990000">(</font><b><font color="#0000FF">lambda</font></b> x <font color="#990000">:</font> <b><font color="#000000">Just</font></b><font color="#990000">(</font>x <font color="#990000">+</font> <font color="#993399">1</font><font color="#990000">))</font> <font color="#990000">>=</font> <font color="#990000">(</font><b><font color="#0000FF">lambda</font></b> x <font color="#990000">:</font> <b><font color="#000000">Just</font></b><font color="#990000">(</font>x <font color="#990000">/</font> <font color="#993399">2</font><font color="#990000">))</font></tt></code></pre>This only seems to only apply to the relational operators, choosing to implement (>>=) with (>>) in Python seems to work fine. Here is a Gist showing the problem from <a href="http://apgwoz.com">@apgwoz</a>: <a href="https://gist.github.com/916132">https://gist.github.com/916132</a>.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4348519741358344123.post-18467040654742822292011-04-12T10:51:00.000-04:002011-04-12T10:51:11.234-04:00Unicorn is Unix, What?I'm late to <a href="http://tomayko.com/writings/unicorn-is-unix">this</a> blog post but <a href="http://twitter.com/apgwoz">@apgwoz</a> just sent it to me. I found it pretty silly. Apparently we should like Unicorn because it uses a lot of Unix system calls...<br />
<blockquote>There’s another problem with Unix programming in Ruby that I’ll just touch on briefly: Java people and Windows people. They’re going to tell you that fork(2) is bad because they don’t have it on their platform, or it sucks on their platform, or whatever, but it’s cool, you know, because they have native threads, and threads are like, way better anyways.<br />
<br />
Fuck that.<br />
<br />
Don’t ever let anyone tell you that fork(2) is bad. Thirty years from now, there will still be a fork(2) and a pipe(2) and a exec(2) and smart people will still be using them to solve hard problems reliably and predictably, just like they were thirty years ago.</blockquote>False dichotomies are the best form of logic.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-4348519741358344123.post-74263603747900861332010-10-05T22:34:00.009-04:002010-10-06T11:33:18.262-04:00Summary of CUFP 2010This year was my first attending CUFP and I had a great time. I was pleasantly surprised at how strong of a showing the OCaml community had. I knew Jane Street would be there but I ran into several other people working in OCaml. The star of the show was definitely F# in my opinion. The weakest part of the conference was the lack of outlets. My laptop battery ran out by the second session of the first day and it was really quite difficult to find an outlet to charge it.<br /><br /><h1>Day 1</h1><br />The first day was broken into two session, each in a tutorial style. For the first session I was in the <i>Building Robust Servers Using Erlang</i> presented by Martin Logan from Orbitz. This stumbled a bit at the beginning, I think Martin was hoping people would be more familiar with Erlang as a language so he could delve into how to build a robust server. It picked up in the end though and I think he successfully drove his message home. The people I talked to after the session expected it to be a basic description on how to write Erlang but were impressed by the power of OTP, especially the supervisor model. A few people remarked that Erlang seemed great for anything that needed to be long running, so I think Martin was successful.<br /><br />I jumped between all of the presentations in the second session.<br /><ul><br /><li><b>F#</b> - This was interesting, I hadn't seen F# much before. The presenter was teaching it through an ant simulation and had a contest with prizes.</li><br /><li><b>Camlp4 and Template Haskell</b> - I was a bit let down by what I saw of this one. It didn't seem like the presenters really gave a good introduction to templating languages. They presented a problem and let everyone work on it and would go around answering any questions. I wish my laptop battery was working so I could have taken a shot at playing with Camlp4. To their credit they were very helpful when asked but the initial presentation seemed lacking to me. Perhaps it was just too far over my head at this time.</li><br /><li><b>Scala and Lift</b> - This was the presentation that I had the least interest in but I think was the most well done. David's presentation was interactive and had no slides. He simply wrote code with you and explained what it did and I think that worked well. Everyone I talked to after seemed impressed by what Lift was capable of accomplishing so easily.</li><br /></ul><br /><br /><h1>Day 2</h1><br />Day 2 was all talks done in serial. I enjoyed most of the talks quite a bit. Yaron Minksy from Jane Street started out by saying something I think was important and easy to forget if you are heavily in the FP community. Despite the clear progress FP seems to be making (F# in Visual Studio, Real World Haskell, FP's in several big companies), we really <b>aren't</b> growing like we'd like to think. For most people management either says no to a functional language or it has to be snuck in through the back door. That is why they chose the keynote to be about F#. Microsoft including it in Visual Studio is a big leap and probably the biggest news in terms of FP going mainstream. But is it enough? We'll find out in the coming years.<br /><br /><ul><br /><li><b>F#</b> - This was the keynote presented by Luke Hoban from Microsoft and he painted a really great picture of F#. His talk spanned how they introduce F# to non-functional programmers, a demo of F#, and some experiences in productizing it. The integration with Visual Studio was topnotch. Luke showed off how easy it is to create a GUI, handle events, and run asynchronous code. It almost made me wish I was running Windows, it looked so nice. The power of F#, to me, was making GUIs. The language looked like it had to be weakened a bit in order to successfully exist in the .Net ecosystem but if I ever find myself working on Windows I will gladly use F#. How much will F# be adopted by mainstream programmers? Who knows, I'm hoping quite a bit though.</li><br /><li><b>Scaling Scala at Twitter</b> - I knew Twitter used Scala but I did not realize they were such a large Scala shop. Scala was another language that seemed to have good representation at CUFP. I am still not sold on it but people seem to be doing great things with it. This talk was mostly about experiences in building the geolocation in Twitter. It was impressive that geolocation was built very quickly by two engineers who had no Scala or Java experience. There were two takeaways from this talk. The first is that the data center is the new computer. When you are designing a distributed application you really need to think differently about it than you would a non distributed application. This should not be a surprise if you really think about it but the emphasis seemed to be that in many cases people don't realize there is a difference. The second was that we should be honest about GC and realize it is a leaky abstraction. It would be nice if the application could get information back from the GC. The application really knows best how to handle working under heavy load and it would be nice if it could query the GC to figure out what kind of load it is under. I am not quite sure how much I buy the second one, couldn't the application monitor itself based on some metric relevant to its operations and modify its behavior based on that?</li><br /><li><b>Cryptol, a DSL for Cryptographic Algorithms</b> - This was from the people at Galois. I don't know much about Galois other than dons works there and they do Haskell, but it looks like they get nice government contracts too. I had no idea how complex the world of cryptology is. I knew the algorithms were sophisticated but not the rest of it. Cryptol seems powerful but much of the talk was over my head.</li><br /><li><b>Naïveté vs. Experience - or, How We Thought We Could Use Scala and Clojure, and How We Actually Did</b> - This talk was by Michael Fogus and my favorite. MIchael was entertaining and insightful. Most of this talk was about Scala and it included why they moved to Scala from Java, what they expected to use in Scala, what they actually did use in Scala, and the problems with Scala. Michael talked a lot about how he convinced his team to move to Scala as well. The experiences were positive but it did take a lot of convincing. The slides to his talk can be found <a href="http://blog.fogus.me/2010/10/02/naivete-vs-experience/">here</a>.</li><br /><li><b>Reactive Extensions (Rx): Curing Your Asynchronous Programming Blues</b> - Sadly Erik Meijer was unable to present this. I forgot to write down the name of who did present it, Wes something, but he did a great job. Rx looks really cool. I don't know how it scales up in writing an application but Wes was able to throw together some interesting programs very quickly using Rx. Rx is a Reactive Programming library for .Net. In short, it treats events like a collection and you simply iterate over the collection to get events (you can even use LINQ). This makes writing even driven software easier to think about and easier to compose events together. All of his examples were in C# but, because Rx is on .Net, it can be used seamlessly with F# (is the impression that I got).</li><br /><li><b>Eden: An F#/WPF framework for building GUI tools</b> - Eden is built by the Credit Suisse guys so they were unable to actually show Eden, however a subset of its functionality was built for the talk. This showed off more of how pretty GUIs can be created with F#. WPF has really great graphics and looked easy to produce. The portion of Eden shown was using a graph-based layout to calculate output on demand. It is difficult to explain succinctly but this talk showed off GUIs in F# as well as how easy it is to create asynchronous code. F#'s two strongest points seem to be OCaml's two weakest points.</li><br /><li><b>Functional Language Compiler Experiences at Intel</b> - The speaker couldn't talk too much about what they were working on (apparently Intel is making a functional language designed to be used for their processors with many many cores) they they did have some interesting meta-things to say. The first was, even in the FP world, sometimes you just want impurity. The second was, if your FP language is going to allow you to write code imperatively, don't make the syntax terrible. In this case they were writing SML. Finally, it is harder to teach someone FP if they have programming experience than something completely fresh. In their case they were looking at 8 - 12 months before really getting a return from the people they were training.</li><br /><li><b>Riak Core: Building Distributed Applications Without Shared State</b> - This talk was great. Rusty from Basho gave a great look at the important functionality in Riak. Riak is broken into three components: Riak Core - a core library for building robust distributed applications in Erlang, Riak KV - A key-value store using Riak Core, and Riak Search - a full text search engine using Riak Core. The message here, again, was the data center is the computer. I thought Riak's usage of virtual nodes was interesting too, and it seemed obvious in hindsight. Rather than break your distributed application up by physical nodes, create a ton of virtual nodes (more than you'll ever have of physical nodes) and then map those to physical nodes. Take sharding, for example, if you map to physical nodes, once you add a new physical node you'll have to repartition your shards all over again. But if you have a few hundred virtual nodes, adding a physical node just means you have to remap some data to it and point the new virtual nodes at it, but your upstream code doesn't need to change at all. Riak Core helps take care of the virtual node mapping for you as well as how to push data around when you add or take away physical nodes.</li><br /><li><b>Functional Programming at Freebase</b> - I was excited for this talk but sadly let down. This involved rewriting Freebase's query language parser and executor from Python to OCaml. It looked more like mental masturbation to me though. Several times I found myself simply wonder why some choices were made. Many of the choices came off as wishing he were writing Haskell. In the end the speaker got a 10x speed up, which was pointed out to not be very good, and it looked like they had to go through a lot of headaches a long the way.</li><br /><li><b>ACL2: Eating One's Own Dogfood</b> - I was unable to attend this talk.<br /></ul><br /><br />I enjoyed CUFP quite a bit. It was great to meet the people I read papers from or see on videos about my favorite languages. In terms of being mainstream, Scala seemed to be making the fastest gains, most likely because it is so close to Java, it is an easy switch. In many ways I felt like we are all slowly catching up to Haskell. Many of the technical ideas presented here have already existed in Haskell for quite awhile and I could almost see the frustration on faces of the Haskell people wondering why the rest of us haven't figured out that we should be writing it. I'm hoping that next year the number of companies adopting functional languages continues to grow so we can see more examples of FP in industry at the next CUFP.Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-4348519741358344123.post-69047999031442459902010-03-31T15:19:00.004-04:002010-04-05T00:43:49.546-04:00Learn You Some Erlang renamed Learn You Some Scala<p><br />I'm excited to <a href="http://twitter.com/mononcqc/status/11395039195">announce</a> that Frederic Trottier-Hebert has decided to change the name of Learn You Some Erlang to Learn You Some Scala! This should come as no surprise to most of us. As I demonstrated <a href="http://functional-orbitz.blogspot.com/2010/03/how-much-has-scala-affected-erlang.html">here</a> Scala and Erlang are really the same language. With the growing popularity of Scala it only makes sense to target the Scala audience (whom we can thank for Erlang's actors). I got the chance to talk to Frederic about the change. When asked what finally prompted the change he said:<br /></p><br /><p><br /><MononcQc> Well yeah, I mean I was there when you first were talking with Virding about the migration of Erlang to the JVM. I'm quoted in that blog post<br/><br /><MononcQc> that discovery was pretty much a shock to me too, and so it's why I've pondered this and discussed the whole issue over #erlang on the course of the last few weeks<br/><br /><MononcQc> I picked up one of the many great books about Scala and realized that 'damn, they're the same stuff!'<br/><br /><MononcQc> Scala being bigger with the JVM being stress tested in production environment (sometimes claiming 9 nines of uptime)<br/><br /><MononcQc> I decided to do the switch.<br/><br /><MononcQc> So LYSE becomes LYSS<br/><br /><MononcQc> It's much more marketable anyway<br/><br /></p><br /><p><br />Some of the changes he has told me are upcoming:<br/><br /><ul><br /><li>OTP In Scala - How to work with some of the Scala specific OTP libraries to get better soft real time guarantees and performance</li><br /><li>Mnesia and Scala - Mnesia is written in Erlang/Scala so moving your databases should Just Work. There should be a pretty big performance increase due to the JIT too (performance improvements have been shown to be about 20%-25.4%) I'm pretty excited about this one.</li><br /><li>JVM Performance tuning - When to use <i>-client</i> and when to use <i>-server</i> will play a big part in this chapter. Frederic plans on really covering the nitty details of JVM tuning. Frederic admits that he hasn't done much work with the JVM but given the similarity to beam doesn't forsee that being a problem</li><br /><li>Java interop - No more need to use jinterface, Java interop is much easier when running on the JVM!</li><br /></ul><br /></p><br /><p><br />What does Frederic have to say about possible backlash from the Erlang community about the name change? <i>"I see none. I'm moving for the best"</i>. There you have it folks. Frederic said the rebranding is still a work in progress but he hopes to have the entire book moved over to Scala terminology in a few weeks.<br /></p>Unknownnoreply@blogger.com3