Monday, November 21, 2005

Port Drivers

Does anyone have any good tutorials on writing ports? Google doesn't seem to return much on initial tries.

Thursday, November 17, 2005

Single exit point

Many programmers insist on only having the flow of control in their function end at a single point. I got wondering for a few minutes if any languages enforce this style of programming. Then I realized. Erlang has no 'return' operator, so doesn't Erlang enforce this? That sounds about right as far as I can see. So an Erlang function can only have 1 exit point. That should make them feel good.


But as was just pointed out to me in the middle of writing this, Erlang has 'throw', and a function can contain many of those. So perhaps Erlang doesn't have 1 exit point after all.
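To make the two cases concrete, here is a small sketch. The module and function names (exit_points, sign, first_negative) are made up for illustration. The first function has no 'return' anywhere: its value is simply the last expression of whichever clause matched. The second uses throw to leave a traversal early, which is the extra exit point mentioned above.

```erlang
-module(exit_points).
-export([sign/1, first_negative/1]).

%% No 'return' operator: each clause evaluates to its final expression.
sign(N) when N < 0 -> negative;
sign(0) -> zero;
sign(_) -> positive.

%% throw/1 acts as an early exit from the middle of lists:foreach/2.
%% If nothing is thrown, the try body finishes and evaluates to 'none'.
first_negative(List) ->
    try
        lists:foreach(fun(N) when N < 0 -> throw({found, N});
                         (_) -> ok
                      end, List),
        none
    catch
        throw:{found, Value} -> {ok, Value}
    end.
```

Calling exit_points:first_negative/1 on a list with no negative numbers falls through to 'none'; otherwise control leaves the foreach at the first negative element.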

On a final note:
Does having 1 exit point even matter? Who cares? The beauty of Erlang (in my opinion) is it almost forces you to write short concise functions. Every single one of my erlang functions fits on 1 screen easily, so does having multiple exit locations from there matter? In my opinion, no. I don't need to go searching through pages of code for that function to find it, I can see it, so who cares?

Wednesday, September 21, 2005

The Twisted Matrix

It has been a long road from where I started to using Erlang. I certainly don't plan on stopping with Erlang, but so far I am quite the fan of it. On this road, which finally led me here, I have used C, C++, Java, and Python, to name a few. I still have code for a multithreaded http I was writing in C hanging around.
I still use Python quite a bit, and one of the best tools in Python I have used is the Twisted Matrix framework. Prior to discovering Erlang (well, more being introduced to it by noss, to whom I am deeply indebted), I felt that Python was really *the* language for socket programming. Twisted is what made me think this.
Now, just to be clear, Erlang certainly does not make Python obsolete. Don't confuse enthusiasm for Erlang with feeling every other language is useless. I introduced Python and Twisted into our development cycle at work and I am fairly pleased with the results.
For those that don't know what Twisted is, it is a framework for developing software. What makes Twisted special is they have put a lot of work into making everything work in one thread. Making a server or client that handles multiple network connections is fairly trivial in Twisted.
So what makes Twisted work how it does? A uniform event loop. In their case it's called a reactor. There are a bunch of different kinds of reactors in Twisted. One needs to be made for whichever kind of event loop you are using. There is a select reactor (default), gtkreactor, gtk2reactor, wxwidgets reactor, and some sort of win32 reactor (I think still under development). The good thing is, most of your code can generally be written without caring about what reactor is being used. That is, at least, the portions that don't depend on gtk or win32 or wx.
Five years ago, when the Twisted project was first created, the authors felt that Python was the best language for the job. My question is, if Twisted were started today, would Python still be the best choice? I address part of the problem in a previous post. Part of the problem I point out is that you have to do a lot of work to force everything into the event loop. That makes code reuse more difficult, and nearly impossible if you need to use a proprietary library that you can't make non-blocking.
An obvious example of this is the adbapi module. The adbapi module can use any DB-API 2.0 compliant module. This is to make adbapi useful: there are quite a few DB-API 2.0 compliant modules for Python, so the best thing is to make use of them. The DB-API modules it wraps are not asynchronous or non-blocking, though. Doing queries in the Twisted reactor thread means the entire application will stall until the query is finished. To solve this, Twisted does the queries in a thread. Twisted has to fall back on the very thing it is trying to avoid in order to work. Python is particularly bad at threading too, so Twisted tries to keep the number of threads to a reasonable amount. This means the number of queries you can have going concurrently is limited by the number of threads.
A native implementation of the postgresql protocol has been written for Twisted. It is called pgasync. I believe the author has commented on how much quicker a large number of queries runs in his implementation, although I cannot find the quote. But it has the same problem I pointed out before. Because Twisted is all in one thread, everything has to conform to its event loop, making existing implementations of the pgsql protocol useless. The protocol had to be rewritten to work well in Twisted. This implementation is also fairly useless anywhere except Twisted. I cannot use it in an application that uses an event loop other than Twisted's. Wouldn't it be nice if I could share code between projects that aren't dependent on Twisted? In a COL this is a non-issue. Simply run the database code in a process. Processes are part of the language so they are part of any program.
So, to get back to my question, would Python still be the choice for Twisted today? I don't know. I asked the people of Twisted if they still would choose it. For the most part, it seems like they didn't quite know Erlang well enough to say if they would choose it. A few developers said one of the main reasons they chose Python was for its large standard library. In particular the 'os' module was pointed out as an important module that makes Python a good choice. However, after some research, most of the 'os' module is in Erlang's standard library as well. In my opinion, the quality of the Erlang standard library and OTP is quite a bit better than Python's. Python's standard library seems to have suffered from a time period where anyone put anything they wanted into it. As a result, it is large, but a number of modules are fairly poor in quality.
Even if Python has a number of modules in its standard library that are nice for building Twisted, they still need to do a lot of work to force all of this code into the Twisted event loop. If you choose a COL, then you don't have to do any work in forcing code into a single event loop. Instead, you have to do work to build the standard library that Python has. Is that a lot of work? Yes.
One could also argue, "There are a lot more Python projects than Erlang ones so there is a lot of code being written for Python that Twisted can make use of". I think there is little contest in the point that there are more Python projects and programmers. But can Twisted really make use of this code? I think, in general, no, not without work. A lot of interesting things are probably going to involve some blocking somewhere which either needs to be pushed to a thread or rewritten not to block, such as the example I have already given.
The choice to use Python is a difficult one, I think. In my opinion, Python really does not offer much that Erlang doesn't. I think, considering the services Twisted is trying to offer, a COL would be an excellent choice for it. Twisted does a lot of work trying to make asynchronous programming less confusing, but writing concurrent programs is more natural. It feels more natural at least.
I have had a lot of success using Twisted. I don't think I'd write a socket program in Python without it. I also think I would generally not write a socket program in Python these days. Twisted and Erlang seem to be trying to accomplish much the same thing, although going about things in very different ways. In the end I think something like Erlang will win out. A COL seems to be saying, the world is concurrent so let's try to let people write in this natural way. Whereas Twisted is forcing people to write in a rather artificial and unnatural way. Asynchronous programming works once you grasp it, but even then the flow of the programs is rather awkward and hard to follow, I think.
I feel that, today, something like Twisted would be better off written in a COL. Let the work go into making powerful tools instead of massaging prewritten code into the event loop.

Tuesday, September 20, 2005

Java And Threads (Jetty)

noss in #erlang on freenode recently brought to my attention: Jetty Continuations

This is an interesting blog entry. The basic idea is that, instead of using 1 thread per connection, since connections can last a while, they use 1 thread per request that a connection has. The hope is that a connection will idle most of the time and only send requests once in a while. The problem they ran into is that a piece of software is using a request timeout to poll for data. So requests are now sticking around for a long time, and they have all these active threads that they don't want. To deal with this, they use a concept of continuations so the thread can die but the request can still hang around, and then once it's ready to be processed a thread is created again and the request is handled. So having all these requests hanging around that aren't doing anything is no longer a problem.
Well, this raises the question: why are you using a dynamic number of threads in the first place if you are going to have to limit how many you can even make? If the problem, in the first place, is that they have too many threads running, then their solution works only for idle threads, doesn't it? Being forced to push some of the requests to a continuation means they have applied some artificial limit to the number of threads which can be run. What happens, then, when the number of valid active requests exceeds this limit? Push active requests to a continuation and get to them when you have time? Simply don't let the new requests get handled? If they want to use threads to solve their problem, then putting a limit on them seems to make the choice of threads not a good one. To poorly paraphrase Joe Armstrong, are they also going to put a limit on the number of objects they can use? If threads are integral to solving your problem, then it seems as though you are limiting how well you can solve the problem.

This also got me thinking about other issues involving threading in languages that are not concurrency oriented. Using a COL (Concurrency Oriented Language) all the time would be nice (and I hope that is what the future holds for us). But today, I don't think it is always practical. We can't use Erlang or Mozart or Concurrent ML for every problem due to various limiting factors. But by the same token, using threads in a non-COL sometimes makes the solution to a problem a bit easier to work with. At the very least, making use of multiple processors sounds like a decent argument. But writing code in, say, Java, as if it were Erlang does not work out. I think the best one can hope to do is a static number of threads. Spawning and destroying threads dynamically in a non-COL can be fairly expensive in the long run, and you have to avoid situations where you start up too many threads. I think having a static number of threads in a pool, or with each doing a specific task, is somewhat the "best of both worlds". You get your concurrency and you, hopefully, avoid situations like the one Jetty is running into. As far as communication between the threads is concerned, I think message passing is the best one can hope for. The main reason I think one should use message passing in these non-COLs is that it forces all of the synchronization to happen in one localized place. You can, hopefully, avoid deadlocks this way. And if there is an error in your synchronization, you can fix it in one spot and it is fixed everywhere, as opposed to having things synchronized all over the code, where god knows where you may have made an error.
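The shape of "a fixed pool of workers that only talk through messages" can be sketched in a few lines. The sketch is in Erlang, to stay consistent with the other code on this blog, rather than in Java, so treat it as an illustration of the structure and not of how a Java thread pool would be written. The module name static_pool and its functions are hypothetical.

```erlang
-module(static_pool).
-export([start/1, run/3, stop/1, worker/0]).

%% Start exactly N workers up front. The pool never grows or shrinks.
start(N) when is_integer(N), N > 0 ->
    [spawn(?MODULE, worker, []) || _ <- lists:seq(1, N)].

%% Hand each item to a worker (round robin) and collect the results
%% in the original order. All coordination happens in this one function.
run(Workers, Fun, Items) ->
    N = length(Workers),
    Indexed = lists:zip(lists:seq(1, length(Items)), Items),
    lists:foreach(
      fun({I, Item}) ->
              Worker = lists:nth((I rem N) + 1, Workers),
              Worker ! {job, self(), I, Fun, Item}
      end, Indexed),
    [receive {done, I, Result} -> Result end || {I, _} <- Indexed].

%% Ask every worker to finish.
stop(Workers) ->
    lists:foreach(fun(W) -> W ! stop end, Workers).

%% A worker shares no state with anyone: it waits for a job message,
%% does the work, and sends the answer back as another message.
worker() ->
    receive
        {job, From, I, Fun, Item} ->
            From ! {done, I, Fun(Item)},
            worker();
        stop ->
            ok
    end.
```

For example, static_pool:run(static_pool:start(3), fun(X) -> X * 2 end, [1, 2, 3, 4, 5]) spreads five jobs over three workers. If the hand-off logic is wrong, run/3 is the only place that needs fixing.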

I think this post most likely opened up a can of worms. I lightly touched on a lot of issues and most likely did not explain things in full. Perhaps this will raise some interesting questions.

Sunday, September 11, 2005

Initial impressions of yaws

If any of this post is incorrect, please feel free to post a comment and correct me.

I have recently installed yaws and I am looking at it to create a small website. It will be fairly simple. My previous web experience has been using Nevow with Twisted. Nevow allows you to create xml files, and then the template engine extracts information from the xml file which is translated to function calls on an object representing the page. I like this solution. It keeps the code separate from presentation, which many people seem to suggest is a good idea. I see it as a positive purely based on working with web designers. I dislike doing web frontends, so allowing someone to make that and giving them the bare minimum needed to let them call functions in my code seems rather nice. From my experiences with yaws, it seems to use a style that reminds me more of PHP. The major difference is that you can use Erlang terms to be transformed to HTML. Nevow has something similar to this called stan. Even with this though, the .yaws file is still not purely Erlang code, but requires escaping html with a tag. This style of making web applications seems fairly error prone and difficult to work with in a team.
I do think Erlang would make a fairly good web development language. The applications could easily be distributed over several nodes, not requiring one to use some sort of load balancer. A web app could scale quite well. But what can be done about the interface for programming? Some sort of XML system might work but I'm not convinced of that. There must be some more intuitive means of mixing code and presentation. I wouldn't be surprised if someone has attempted to tackle this already; maybe something exists on google about it.

Wednesday, September 7, 2005

Parallel Project

I have a study in Parallel Programming this semester. However, the teacher is fairly lenient as far as projects are concerned. I am looking for some interesting project relating to Erlang. The leading idea right now is to create some sort of distributed computing framework. The idea would be to use Erlang to communicate between nodes and then have each node communicate with a local process which does the actual computing (assuming Erlang could not handle the calculations). The major implementation would be a BLAST algorithm, since that seems fairly easy to distribute and could possibly have some use at my college. Other ideas include some sort of fault tolerance framework or application which handles nodes going down well. That would seem fairly easy in Erlang, so it probably would not provide a semester's worth of interesting work. Another possibility is a peer-to-peer chat application where each user contributes to the number of possible people to host.
Hrmm, hopefully I'll get a better idea soon.

Monday, September 5, 2005

Bookmarks

My archive of bookmarks is slowly growing. It contains an Erlang section, along with a number of other languages as well as lots of other URLs.

http://ortdotlove.net/bookmarks.html

Saturday, August 20, 2005

Where concurrency shines

I think that it can almost be stated as a fact that concurrency in languages that weren't designed with concurrency in mind tends to be poor. The languages I have in mind here are Python, C, C++ and similar. I have come across a few people who disagree with this statement; however, after questioning I have found that they have either A) not done much of anything complex with threads or B) never used a concurrency oriented language. Needless to say, I don't take their opinion very seriously. Now, obviously, you can take the time to write an application that uses threads and works well. But, with enough time, you can do just about anything, and one can definitely develop an equivalent program faster in a concurrency oriented (CO) language.
Because writing decent threaded applications in other languages is so difficult, there are a number of frameworks available that try to make it easier to write applications in a single thread. One of the reasons I think these frameworks will fail in comparison to a language such as Erlang for most developers is the amount of work it takes to integrate other libraries into them. Anyone that has used a framework such as Twisted has probably run into a situation where they have a third party library they want to use, but the problem is that it blocks. For an asynchronous framework this is murder. So one has two choices. The first is to run the third party library in its own thread. Obviously this is generally not what we want to do, since the whole point of using the framework is to avoid threads. The other solution is to rewrite the library to integrate it into the framework's event loop. Depending on the situation, this might be acceptable, but it sure is a pain to have to do extra work to use this library. Now, a language which supports concurrency does not have this problem so much. The first solution, of running the library in a thread, works perfectly fine. You probably have 300 or 400 threads going already so it is no big deal. This makes it easy to distribute libraries for the particular language.
For a simple example, imagine you make a really great http client in Python. You can't really make a general http client because you need to take into account the various networking frameworks people might be using. If they are using Twisted then it needs to integrate into the Twisted event loop to be really useful. If they are using asyncore it needs to integrate into the asyncore event loop, and so on and so forth. Now take the same situation in Erlang. Just throw the client in a process and you are all set. You don't have to rewrite anything. The obvious benefit of this is increased development speed.
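As a rough sketch of "just throw the client in a process": the blocking call runs in its own process and mails the result back, so the caller carries on and collects the answer when it wants it. Everything here is hypothetical. The module name blocking_call is made up, and slow_fetch/1 merely sleeps to stand in for a blocking third party library; it does not perform a real http request.

```erlang
-module(blocking_call).
-export([async/1, await/2, slow_fetch/1]).

%% Run Fun in a fresh process. The caller gets a reference back
%% immediately and is not blocked while Fun runs.
async(Fun) ->
    Parent = self(),
    Ref = make_ref(),
    spawn(fun() -> Parent ! {Ref, Fun()} end),
    Ref.

%% Collect the result for Ref, giving up after Timeout milliseconds.
await(Ref, Timeout) ->
    receive
        {Ref, Result} -> {ok, Result}
    after Timeout ->
        timeout
    end.

%% Stand-in for a blocking library call (assumption: no real I/O here).
slow_fetch(Url) ->
    timer:sleep(200),
    {fetched, Url}.
```

The library itself did not have to know anything about the caller's event loop, because there is none to integrate with.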

I think it seems pretty clear that our processors and applications are moving towards more concurrent environments. Languages that can take advantage of this environment are most likely going to be the ones that make it. However I'm no fortune teller, so there is a good chance I could be wrong.

I think I tried to put too much into this one post so it might not make sense. Hopefully I got my ideas across.

Friday, August 19, 2005

Developing on the go

One benefit of using Erlang compared to more traditional languages is the development cycle. Generally you write code, compile, debug, write, compile, debug, until you have something you want to use. In between compile and debug you run the program. In traditional languages you shut down the application then restart it to debug again. In Erlang we can skip the 'restart' and simply load the new code in. This assumes that no bug in the code caused the entire application to crash horribly.
So basically, what is going on is this: you have a logic problem or whatnot in your application that you wish to fix. Outside of your running application, you edit the appropriate .erl files and recompile. For example, say your application consists of a.erl and b.erl, and in b.erl you have a function called mogwai.
In our example, a.erl calls b:mogwai, and b:mogwai has some sort of error in it that does not cause the application to crash but that you want to fix regardless. You fix b:mogwai's error and want to load the new code into your running application. Recompile b.erl, then in the shell attached to your application simply do:

nl(b).

This loads the new version of b into your application and now calls to b:mogwai will use the current version of the function. For certain applications this certainly provides a more elegant development cycle. I think this style also alters the structure of one's code. For instance, if one writes an application knowing that errors in the code can be fixed on the fly, the application no longer necessarily has to exit nicely when something goes wrong. Rather, the application can provide a means of restarting the portions that have crashed. The application can continue working without the crashed process/code or stall until the required portion can be brought back. I'm under the impression this is a feature supervision trees offer you. A supervisor simply restarts a process if it crashes, and reports it. If a fixed version of the module has been loaded in the meantime, the restarted process runs the new code.
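Here is a minimal sketch of that supervisor behaviour, under some assumptions: the module name mogwai_sup, the registered name mogwai_worker, and the 'crash' message are all invented for the example, and a real application would normally put the worker in its own module (often a gen_server). The supervisor starts one permanent child and restarts it whenever it dies.

```erlang
-module(mogwai_sup).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/0, worker_loop/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% One child, restarted on its own (one_for_one) whenever it exits.
%% Give up if it crashes more than 5 times within 10 seconds.
init([]) ->
    Child = {mogwai_worker,                  % child id
             {?MODULE, start_worker, []},    % how to start it
             permanent, 5000, worker, [?MODULE]},
    {ok, {{one_for_one, 5, 10}, [Child]}}.

%% Called by the supervisor; must link the child and return {ok, Pid}.
start_worker() ->
    Pid = spawn_link(?MODULE, worker_loop, []),
    register(mogwai_worker, Pid),
    {ok, Pid}.

worker_loop() ->
    receive
        crash ->
            exit(deliberate_crash);          % simulate a bug
        {ping, From} ->
            From ! pong,
            ?MODULE:worker_loop()
    end.
```

After mogwai_sup:start_link(), sending mogwai_worker ! crash kills the worker, and a moment later whereis(mogwai_worker) should show a different pid: the supervisor has started a replacement.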

One final note on how code replacement works. A running process only switches to the new version of a module when it makes a fully qualified call, of the form module:function. For instance, if you have a process in a loop something like:


loop() ->
    receive
        Something ->
            loop()
    end.

If you reload the module that this loop is defined in, the local call to 'loop()' will not pick up the new code. The common idiom is something along the lines of:

loop() ->
    receive
        restart ->
            ?MODULE:loop();
        Something ->
            loop()
    end.

This will switch to the new code when the process receives a 'restart' message. On a final note, any function called as module:function must be exported.
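The same idiom works when the loop carries state: the state is simply passed along in the fully qualified call. A small sketch, with a made-up module name (reload_demo) and made-up messages (bump, upgrade, stop):

```erlang
-module(reload_demo).
-export([start/0, loop/1]).

start() ->
    spawn(?MODULE, loop, [0]).

%% loop/1 must be exported because it is called as ?MODULE:loop/1 below.
loop(Count) ->
    receive
        {bump, From} ->
            From ! {count, Count + 1},
            loop(Count + 1);         % local call: stays in the running version
        upgrade ->
            ?MODULE:loop(Count);     % qualified call: newest loaded version,
                                     % with the counter carried over
        stop ->
            ok
    end.
```

The workflow would then be: edit and recompile reload_demo.erl, load it with l(reload_demo). in the shell, and send Pid ! upgrade. The process keeps its count and continues in the new code.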
Erlang certainly provides some interesting features. Something like code replacement is certainly possible in other languages, such as Python (which provides a reload function to reload a module), but I think Erlang provides a more elegant solution. For instance, I don't think Python provides a means for a module to reload itself, especially in the middle of an event loop.
However, this is not a contest between code replacement in various languages. I have modified my irc bot to allow code replacement more seamlessly; however, I have not provided a decent means of bringing an irc bot back if there is an error which causes a crash. I'll have the new code online later if anyone is interested.

Tuesday, August 16, 2005

Erlang and strings

A lot of people complain about strings in Erlang. I am one of them. Right now I am under the impression a string type needs to be added to Erlang. Joe Armstrong thinks that instead of a string we simply need a character type. My complaint with that is there is no decent container for the character type. In Erlang we have tuples, binaries, and lists. Tuples are meant to store a fixed number of objects and do not come with operations such as iterating through them. The element/2 function allows you to access an index of a tuple but that isn't very useful for iterating through. Binaries might be nice, but there are no functions to nicely deal with binaries as strings. Lists are what we currently have and I am not pleased. Using a linked list to store a string certainly seems unreasonable in any other language. You have to store a character value and a link to the next node. This is a lot of memory for a string. The other problem with these containers is that none of them allow O(1) access to indices as far as I know. Am I wrong here? I suppose the question then is, is that a problem? I am under the impression that one generally wants O(1) access. For instance, if you have an index in the string and need to access it and surrounding indices repeatedly.

http://schemecookbook.org/view/Erlang/StringBasics has a quote supposedly from the sendmail people:
But Erlang's treatment of strings as lists of bytes is as elegant as it is impractical. The factor-of-eight storage expansion of text, as well as the copying that occurs during message-passing, cripples Erlang for all but the most performance-insensitive text-processing applications.

This is in reference to their load balancing software. Is this true? I am inclined to think that it certainly uses up a lot of memory, but most text-processing is going to require touching every character in a string anyway, won't it? What exactly is performance intensive text-processing? Does anyone have any ideas? If one is going to be iterating through the string, one can use a binary to store the byte values. The problem with this is that none of the string functions work on binaries. I'm under the impression a string container type will solve some problems. Using integers as the character values seems like a fine idea to me; as people like to point out, it makes dealing with unicode slightly easier.
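For anyone who has not looked at the two representations side by side, here is a small sketch. The module name string_repr is made up. It uses binary:at/2, which lives in the binary module of current Erlang/OTP releases and may not be available in older ones, so treat that call as an assumption about your installation.

```erlang
-module(string_repr).
-export([demo/0]).

demo() ->
    Str = "hello",
    %% A string literal is nothing more than a list of integers.
    true = (Str =:= [104, 101, 108, 108, 111]),
    %% The same bytes packed into a binary.
    Bin = list_to_binary(Str),
    5 = length(Str),                 % walks the whole list
    5 = byte_size(Bin),
    %% Indexing: lists:nth/2 is 1-based and follows the links,
    %% binary:at/2 is 0-based and reads the byte directly.
    $l = lists:nth(4, Str),
    $l = binary:at(Bin, 3),
    %% Converting back in order to use the list-based string functions.
    "HELLO" = string:to_upper(binary_to_list(Bin)),
    ok.
```

The last line shows the awkward part complained about above: to use the string functions on a binary you first have to turn it back into a list.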

What problems would having a character type solve? Maybe in the morning I'll be able to think of something.

Identd as done as I care

The identd is as done as I'm interested in making it right now. It works at least. Instead of worrying about how to figure out which user a port actually belongs to, I just return a random ident value for any input. However, the function it uses to generate this is passed to the server as an argument, so it is not very difficult to give it a different function. To run it, simply compile all the files then do
identd:start(SomePort, {random_identd, random}).

There is no clean shutdown, just kill your shell.

You can download it here.

Monday, August 15, 2005

Identd

I think the basic design for my identd is going to be:
  1. Start a process which takes a port and a function.
  2. On a new connection, start a process to handle the connection with the function given.
  3. Go back to waiting.
  4. The new process will read in the port numbers, parse them out, then call the function given with the port information
  5. The function does what it needs to and returns the information
  6. The process responds on the socket with the correct information.
I think this would work well with the gen_server behavior; unfortunately I don't quite understand supervision trees that well. I will write it the ad-hoc way first, then once I figure out the correct method, rewrite it that way.
This framework shouldn't be too hard; making a correct identd function which actually gets the user from the ports might be. A basic one will just return a random ident.
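The six steps above could look roughly like the following ad-hoc sketch. It is not the identd from this blog: the module name identd_sketch, the use of a two-argument fun as the pluggable function, the returned port number, and the fixed "UNIX" operating system field are all assumptions made for the example. The request and reply formats follow the ident protocol (RFC 1413).

```erlang
-module(identd_sketch).
-export([start/2, accept_loop/2, random_ident/2]).

%% Steps 1-3: listen on Port and hand every connection to its own process.
%% Fun(ServerPort, ClientPort) must return the user id as a string.
%% Returns the port actually bound (useful when Port is 0).
start(Port, Fun) ->
    {ok, Listen} = gen_tcp:listen(Port, [list, {packet, line},
                                         {active, false}, {reuseaddr, true}]),
    {ok, ActualPort} = inet:port(Listen),
    Acceptor = spawn(?MODULE, accept_loop, [Listen, Fun]),
    ok = gen_tcp:controlling_process(Listen, Acceptor),
    {ok, ActualPort}.

accept_loop(Listen, Fun) ->
    {ok, Socket} = gen_tcp:accept(Listen),
    Handler = spawn(fun() -> handle(Socket, Fun) end),
    %% The handler may already be done, so the result is ignored.
    _ = gen_tcp:controlling_process(Socket, Handler),
    accept_loop(Listen, Fun).

%% Steps 4-6: read one request line, build the reply, send it, close.
handle(Socket, Fun) ->
    case gen_tcp:recv(Socket, 0, 5000) of
        {ok, Line} -> gen_tcp:send(Socket, reply(Line, Fun));
        {error, _} -> ok
    end,
    gen_tcp:close(Socket).

%% A request looks like "6193, 23\r\n".
reply(Line, Fun) ->
    case string:tokens(Line, ", \r\n") of
        [S, C] ->
            try {list_to_integer(S), list_to_integer(C)} of
                {SP, CP} ->
                    io_lib:format("~w, ~w : USERID : UNIX : ~s\r\n",
                                  [SP, CP, Fun(SP, CP)])
            catch
                error:badarg -> "0, 0 : ERROR : INVALID-PORT\r\n"
            end;
        _ ->
            "0, 0 : ERROR : INVALID-PORT\r\n"
    end.

%% The "basic one": ignores the ports and makes something up.
random_ident(_ServerPort, _ClientPort) ->
    "user" ++ integer_to_list(rand:uniform(9999)).
```

Something like identd_sketch:start(1130, fun identd_sketch:random_ident/2) would start it on an unprivileged port; the real ident port, 113, normally needs elevated privileges.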

IRC Bot

Here is the code for the latest revision of my IRC Bot. It isn't made to be user friendly right now, so don't expect it to work right off the bat.

I think to get it started you'll need to do the following:
  1. Compile everything, be sure to add the inc directory to your include path.
  2. Start it with a node name and set a mnesia directory.
  3. Call p1_db:start(). Then p1_db:create_tables().
  4. Then call irc_bot:add_bot (Maybe it's addbot?). It takes a tuple, see code to figure it out.
  5. p1_main:start()
  6. When you are finished: bot_server ! stop. The beauty of erlang lets you do this from another node on another machine too, if you so desire.
The code can be found here.

The only other Erlang irc bot I've found is manderlbot, which can be found on freshmeat. I think it is a bit better designed than mine and is written with other people using it in mind, whereas mine is more of me just playing around. If there is an interest in it perhaps I will do more with it.

Initial Post

I will be posting my various Erlang accomplishments here. My intended goal is to write an smtpd in Erlang. I will have various posts on that in the future. So far I would not describe myself as a very good programmer but I am working on that. I have written portions of an Erlang irc bot. It does not do very much, although it has the ability to relay chats between channels on multiple networks and supports some concept of factoids. My next mini project is a pluggable identd. This will be fairly small and simply provide the ability to give it a function that creates a response.
Feel free to post responses to my post, I don't mind constructive criticism. The hope is to make my projects better.