Wednesday, September 21, 2005

The Twisted Matrix

It has been a long road from where I started to using Erlang. I certainly don't plan on stopping with Erlang but so far I am quite the fan of it. On this road, which finally lead me here, I have used C, C++, Java, and Python, to name a few. I still have code for a multithreaded http I was writing in C hanging around.
I still use Python quite a bit, and one of the best tools in Python I have used is the Twisted Matrix framework. Prior to discovering Erlang (well more of introduced to it by noss, which i am deeply in debt for), I felt that Python was really *the* language for socket programming. Twisted is what made me think this.
Now, just to be clear, Erlang certainly does not make Python obsolete. Don't confuse enthusiasm for Erlang with feeling every other language is useless. I introduced Python and Twisted into our development cycle at work and I am fairly pleased with the results.
For those that don't know what Twisted is, it is a framework for developing software. What makes Twisted special is they have put a lot of work into making everything work in one thread. Making a server or client that handles multiple network connections is fairly trivial in Twisted.
So what makes Twisted work how it does? A uniform eventloop. In their case it's called a reactor. There are a bunch of different kinds of reactors in Twisted. One needs to be made for whichever kind of event loop you are using. There is a select reactor (default), gtkreactor, gtk2reactor, wxwidgets reactor, and some sort of win32 reactor (I think still under development). The good thing is, most of your code can generally be written without caring about what reactor is being used. That is, atleast, the portions that don't depend on gtk or win32 or wx.
Five years ago, when the Twisted project was first created, the authors feel that Python was the best language for the job. My question is, if Twisted was started today, would Python still be the best choice. I address part of the problem in a previous post. Part of the problem that I point out is, you have to do a lot of work to force everything into the event loop. That makes code reuse more difficult, and near impossible if you need to use a proprietary library that you can't make non-blocking.
An obvious example of this is the adbapi module. adbapi module can use any DB-API 2.0 compliant module. This is to make adbapi useful. There are quite a few DB-API 2.0 compliant modules for Python so the best thing is to make use of them. adbapi's API is not asynchronous or non-blocking though. Doing queries in the Twisted reactor thread means the entire application will stall until the query is finished. To solve this, Twisted does the queries in a thread. Twisted has to fall-back on the very thing it is trying to avoid in order to work. Python is particularly bad at threading too so Twisted tries to keep the number of threads to a reasonable amount. This means the number of queries you can have going concurrently is dependent on this.
A native implementation of the postgresql protocol has been implemented. It is called pgasync. I beleive the author has commented on how much quicker a large number of queries runs in his implementation, although I cannot find the quote. But has the same problem I pointed out before. Because Twisted is all in one thread, everything has to conform to its event loop making the implementation of the pgsql protocol useless. It had to be rewritten to work well in Twisted. This implementation is also fairly useless anywhere except Twisted. I cannot use this in an application that uses an event loop other than Twisted. Wouldn't it be nice if I could use code between projects that arn't dependent on Twisted? In a COL this is a non-issue. Simply run the database code in a process. Processes are part of the language so they are part of any program.
So, to get back to my question, would Python still be the choice of Twisted today, I don't know. I asked the people of Twisted if they still would. For the most part, it seems like they didn't quite know Erlang well enough to say if they would choose it. A few developers said one of the main reasons they chose Python was for its large standard library. In particular the 'os' module was pointed out as an important module that makes Python a good choice. However, after some research, most of the 'os' module is in Erlang's standard library as well. In my opinion, I think the quality of the Erlang standard library and OTP is quite a bit better than Pythons. Python's standard library seems to have suffered from a time period where anyone put anything they wanted into it. As a result, it is large, but a number of modules are fairly poor in quality.
Even if Python has a number of modules in its standard library that are nice for building Twisted, they still need to do a lot of work to force all of this code into the Twisted event loop. If you choose a COL, then you don't have to do any work in forcing code into a single event loop. Instead, you have to do work to build the standard library that Python has. Is that a lot of work? Yes.
One could also argue, "There are a lot more Python projects than Erlang ones so there is a lot of code being written for Python that Twisted can make use of". I think there is little contest in the point that there are more Python projects and programmers. But can Twisted really make use of this code? I think, in general, no, not without work. A lot of interesting things are probably going to involve some blocking somewhere which either needs to be pushed to a thread or rewritten not to block, such as the example I have already given.
The choice to use Python is a difficult one, I think. In my opinion, Python really does not offer much that Erlang doesn't. I think, considering the services Twisted is trying to offer, a COL would be an excellent choice for it. Twisted does a lot of work trying to make asynchronous programming less confusing but writing concurrent programs is more natural. It feels more natural atleast.
I have had a lot of success using Twisted. I don't think I'd write a socket program in Python without it. I also think I would generally not write a socket program in Python these days. Twisted and Erlang seem to be trying to accomplish much the same thing, although going about things in very different ways. In the end I think something like Erlang will win out. A COL seems to be saying, the world is concurrent so lets try to let people write in this natural way. Whereas, Twisted is forcing people to write in a rather artifical and unnatural way. Asynchronous programming works once you grasp it but even still the flow of the programs is rather awkward and hard to grasp I think.
I feel that, today, something like Twisted would be better off written in a COL. Let the work go into making powerful tools instead of massaging prewritten code into the event loop.

2 comments:

  1. Interesting post. I use Python extensively but have stayed away from Twisted thus far because I believe writing concurrent programs is easier than the asynchronous counterpart. The topic of events vs threads is certainly interesting. Here is a link to a paper that tries to dispel some of the arguments used against the threading model which I found interesting:

    http://www.usenix.org/events/hotos03/tech/full_papers/vonbehren/vonbehren_html/

    Regarding Erlang vs Python in general, I've been happy with the Python standard library (I am curious which ones you thought were on the weak side). OTP does seem to contain a large library as well, but I am unable to vouch for its quality as I haven't used much of it because I'm still in the process of trying to learn erlang.

    Some things that I've personally had a hard time acclimating to is the lack of iteration (where is my 'for' statement), the lack of unicode and the lack of a real string type, and the supposed poor performance of strings. I do a lot of text processing (regexps) and this could be an issue for me as the data is not passed opaquely.

    Thanks for the intersting blog!

    ReplyDelete
  2. petekaz,

    Please read that article carefully. The kind of thread concurrency it is using the word "threads" to refer to is not possible with python. It is possible with erlang, oddly enough :) but in Python, twisted+stackless is closer to it than the 'thread' module.

    ReplyDelete