Sunday, October 18, 2009

Sneaking Complexity - Being Asynchronous


I went out to lunch with some coworkers last week and we got to talking about the project they are working on. They had done a great job of implementing the app, which pretty much makes up the backbone of our company. They rewrote it in Java and have gotten orders of magnitude better performance. The app needs quick access to various pieces of persistent data, and they have created several data-clusters that perform amazingly. And the code is very clean and easy to read. The problem they are running into, though, is the client code for the data-clusters.




One of the main reasons that Java was selected as the language to write in is the simplicity of reading and writing it. It is fairly easy to jump into a method and, with a good IDE, figure out what it is doing. It is simple. The problem they are running into, though, is that doing things Teh Java Way isn't working out for the volume and latency they are trying to reach. Specifically, they are hitting issues when they query the data-cluster. Generally, when they decide they need to hit the data-cluster they do a query, and if they don't get a response back within some timeout period they go on as if they have a new record. In this case, being correct about the data all the time is not very important. But they are seeing more timed-out queries than they feel comfortable with (there is also a concern that the timeouts in the library they are using are not working properly).

What they would like to do is handle hitting the data-cluster asynchronously. When they think they are going to need data, they send their query in, go do some more processing, then check to see if the data is there, and continue on as if it were a new record if the timeout is reached. The problem is, this really sucks so far in Java. Being asynchronous is hard if you haven't designed your application around it in the first place. You pretty much have to fire up a new thread for every one of these queries you want to do. When considering a single case, this doesn't sound too bad, but the JVM uses OS threads, so if you are already pushing your app to the limit, doubling or tripling the number of threads it needs in the worst case is not going to help.

You have also increased the complexity of your application. Most of the threading in this application doesn't really need to share any data; each request is off doing its own thing and rarely hitting shared objects. But in this case, getting the result of the query back to the request thread means sharing some data. It may not be much, but it is added complexity. On top of that, there might be a performance hit in terms of GC. I'm not an expert in the JVM GC, but shared memory means you might have to walk the entire heap in order to clean up the objects created by the query thread.
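To make the shape of the problem concrete, here is a rough sketch of the "send the query, do other work, then check with a timeout" pattern using java.util.concurrent. This is not my coworkers' code: queryCluster, doOtherWork, the NEW_RECORD placeholder, and the pool size are all made up for illustration, and the "cluster" is just a sleep. It uses a small fixed thread pool rather than a new thread per query, which is one way to cap the thread count, though the query still ties up a thread while it waits.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class AsyncLookup {
    // Placeholder meaning "carry on as if this were a new record" (assumption).
    static final String NEW_RECORD = "new-record";

    // Hypothetical stand-in for the real data-cluster client; it only sleeps.
    static String queryCluster(String key, long simulatedLatencyMs) throws InterruptedException {
        Thread.sleep(simulatedLatencyMs);
        return "record-for-" + key;
    }

    // Hypothetical stand-in for the processing done while the query is in flight.
    static void doOtherWork() {
        // request processing would go here
    }

    // Send the query early, keep working, then wait at most timeoutMs for the answer.
    static String lookup(ExecutorService pool, String key, long simulatedLatencyMs, long timeoutMs) {
        Callable<String> query = () -> queryCluster(key, simulatedLatencyMs);
        Future<String> pending = pool.submit(query); // runs on a pool thread
        doOtherWork();                               // overlaps with the query
        try {
            return pending.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            pending.cancel(true); // interrupt the query thread so it is freed
            return NEW_RECORD;
        } catch (ExecutionException e) {
            return NEW_RECORD;    // the query itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return NEW_RECORD;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            System.out.println(lookup(pool, "alice", 10, 500));  // answers well inside the timeout
            System.out.println(lookup(pool, "bob", 2000, 50));   // slower than the timeout
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Even in this toy version you can see the added moving parts: a pool to size, a Future shared between two threads, and three separate failure paths that all have to fall back to the new-record case.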




This brings me to something that is so great about Erlang. Accomplishing this is trivial. Since messages in Erlang are asynchronous, you simply send your message to query the data-cluster at the beginning of your block, then do whatever work you'd like, and receive it when you are good and ready. A receive block can take a timeout so you don't have any issues there. Doing things asynchronously is how Erlang works pretty much from the get-go. Erlang processes also have their own heap so cleaning up after one is fairly light on the GC, just throw away that process's heap.




To be fair, I am certain that had they implemented this in Erlang they would have run into their own fair share of issues. No language is perfect and Erlang certainly is no exception. But the application in question is basically what Erlang was designed for and is good at. There are also other issues we talked about, which I have not covered here, where they would have benefited from using Erlang. But this is a pattern that seems common in my short career as a developer. People look at the problem they are solving, then pick whatever tools solve all the easy problems fast but in the end leave them in a worse state when it comes to solving the harder problems. Those tools that get you 70% of the way there have forced you to design your app so that it is even harder to solve the remaining 30%. This happened at my previous employer. A framework was chosen that implemented all the simple things they wanted to solve, but they had then inherited a framework whose design was not made to solve the more complex problems. In the end they had to rewrite a bunch of the framework to get what they wanted. I'm sure that my coworkers are going to be able to solve this problem in Java (they have no choice), and perhaps there is a Java way that this should have been done, and I am sure that had they implemented this in Erlang there would still be problems being discussed over lunch. But I feel confident that they would be frustrated with Erlang records, or syntax, or strings, not with the sneaking complexity of trying to get it to do things asynchronously (and don't even get me started on fault-tolerance).

2 comments:

  1. "They rewrote it in Java and have gotten orders of magnitude better performance."

    what was the first version written in?

  2. In PHP actually, so a fair increase was expected.
