Saturday, May 6, 2006

Pastebin design - Mnesia tables

I have started to design the tables for this. The main goal is simple: make it easily extendable. That is to say I want to make it so I can easily develope it incrementally and add things to it that I didn't think of before. But that should be obvious to any developer. Given the spec of my previous post, here is the beginning of the database.

paste - This is the table that will hold the actual pastes.
pid | text | annotation | language | date

pid - unique id to identify a paste
text - the actual text
annotation - any annotation the paster wants to include
language - Plaintext/C++/Erlang, etc
date - when they pasted it

highlight - cache of pastes that have been put through the source highlighter
pid | text | last_viewed

pid - the same as
text - result of being put through the highlighter
last_viewed - to keep track of when the entry should be removed

threads - paste threads
tid | [pid]

tid - the unique id for a thread
[pid] - list of pid's

I decided to have the thread table just contain an id and list so you don't have to do a bunch of queries to get every paste in a thread. This way a paste can be part of multiple threads (Although I don't see when this will ever happen). The paste table contains, I think, the minimalist amount of data in order to be useful. Using a highlight table means we can store, if we want, every single paste with a highlighted copy of the text, or just a few and use it as a cache style system. Later on we can also add the ability to associate a paste with an irc channel or a user if we wish to track that sort of information.

I'm not sure how to generate unique id's in mnesia. In a SQL database I would use a serial data type but I am unsure if mnesia supports this functionality. If not I suppose I could use newref maybe? I need to be able to convert it to a string to make url's, and also be valid between restarts.

As I think about it, I guess serials are just implemented as a table and store the integer in there, I can implement that in mnesia I suppose. It just needs to have some sort of get_and_increment functionality so two proceses don't get the same idea. If anyone has any ideas on how to implement this let me know. Bear in mind I have not looked at the mnesia documentation yet and am just brainstorming, so it is quite possible the solution is incredibly simple.

Next step - determine what pages pages will be needed.


  1. One common way of getting unique IDs in a database is to have a sequence table. Just a table with one row and one column, a number. Whenever you need a new unique sequence, grab it out of the table and increment the value. Just make sure you get the locking right when you grab ids.

  2. I and Joe Armstrong are working along similar lines. We should compare notes down the road.

    As for global IDs, one simple way of generating globally unique ids in Erlang is {node(), erlang:now()}. For my own blogging-forum backend (which is slowly taking shape), I've decided to keep a numeric prefix per erlang node in the cluster (stored as a user property on the schema table).

    I then use mnesia:dirty_update_counter() and check out a new prefix when the counter reaches a ceiling. It's a lot more work than the {node(),erlang:now()} approach, but the IDs become much more compact and web-friendly.

    /Ulf W

  3. I believe Ulf was complementing anonymous's comment. I was just in the #erlang channel on Freenode and people were helping me out.

    Create a separate table, with just one column, which is an integer. Then call mnesia:dirty_update_counter/3 and it will increment the integer and return the new value. Then use the new value for inserts into your other tables. It should also be noted that it is convenient for use as a Globally Unique ID.