Comments on functional orbitz: Cryptogram solver

http://www.blisstonia.com/software/Decrypto/

2008-04-02T16:52:00.000-04:00

http://www.blisstonia.com/software/Decrypto/

I think the statistical model breaks down a bit in...

2008-03-24T01:22:00.000-04:00

I think the statistical model breaks down a bit in that situation. But it depends on what you mean by 'irregular'. If you define an irregular series of words as those that contain letter counts that do not fit what you would statistically expect, then sure there is an issue. If you define 'irregular' as a series of words that people just don't use very often, the letter frequency there might still be in the range you expect.

How would a statistical model work with a high num...

2008-03-23T21:35:00.000-04:00

How would a statistical model work with a high number of irregular words?

The target text that I'm trying to solve are t...

2006-11-24T10:55:00.000-05:00

The target texts I'm trying to solve are the cryptograms that appear in most Sunday American newspapers. I'm not sure a statistical model would work so well for those, since the input is generally only a sentence or two.

My solver has a similar problem: it outputs a lot of candidate solutions in which every word is valid but the sentence makes no sense. Perhaps, instead of understanding the grammar, one could have a hierarchy of which matches seem better than others. For instance, a solution reached through the long words that have few matches in your dictionary is probably better than one reached through words that have lots of choices. So if solving those words first leads to every remaining word being in the dictionary, perhaps that is a better match, but then that runs into the issue of the text using some odd word that isn't in the dictionary.
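One way to picture that ordering is the rough sketch below. It is not taken from either solver in this thread: the names word_pattern and solve_order and the tiny word list are made up for illustration. It sorts cipher words so that the ones with the fewest same-shaped dictionary words come first, and pushes words with no match at all to the end, since those may simply be missing from the dictionary.

```python
from collections import Counter

def word_pattern(word):
    """Letter-repetition shape of a word, e.g. 'test' -> 'abca'."""
    seen = {}
    for ch in word.lower():
        seen.setdefault(ch, chr(ord("a") + len(seen)))
    return "".join(seen[ch] for ch in word.lower())

def solve_order(cipher_words, dictionary):
    """Cipher words with the fewest same-shaped dictionary words first.

    Words with no match at all go last (they may be absent from the
    dictionary); ties are broken in favour of longer words.
    """
    shape_counts = Counter(word_pattern(w) for w in dictionary)

    def cost(word):
        n = shape_counts[word_pattern(word)]
        return (n == 0, n, -len(word))

    return sorted(cipher_words, key=cost)

# Hypothetical toy dictionary; a real solver would load a full word list.
DICTIONARY = ["sentence", "test", "that", "tent", "this",
              "to", "is", "it", "at", "a"]
CIPHER = "wjko ko f weow oerwerce wh grocxfsbme".split()
print(solve_order(CIPHER, DICTIONARY))
```

With this toy dictionary the long word "oerwerce" comes first and "grocxfsbme", which matches nothing, comes last.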

Of course, the traditional method without brute fo...

2006-11-21T12:38:00.000-05:00

Of course, the traditional method without brute-force lookups is to use letter frequencies. In English the most common letters are roughly ETAONIRSHDLU (there are variants, and you can easily build your own frequencies from open source text).
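As a minimal sketch of building such a frequency ordering yourself (the function name letter_order is just an illustrative choice), the following counts letters in any text and lists them most frequent first. Run on a reference corpus it gives the expected order; run on a ciphertext it gives the observed order to compare against.

```python
from collections import Counter

def letter_order(text):
    """Letters of `text`, most frequent first; ties broken alphabetically."""
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    return "".join(sorted(counts, key=lambda ch: (-counts[ch], ch)))

cipher = "Wjko ko f weow oerwerce wh grocxfsbme"
print(letter_order(cipher))  # -> eowrcfkbghjmsx
```

On a sample this short the top three letters are tied at five occurrences each, so the ordering on its own says little.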

The manual technique also involves recognizing patterns. For example, in Dustin's phrase the two-letter word "ko" is also the ending of "Wjko", so "This is" could be a good guess to get things started.
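That kind of pattern can be spotted mechanically. A small sketch, assuming whitespace-separated words with no punctuation (shared_endings is a made-up helper name), lists every cipher word that is also the ending of a longer cipher word:

```python
def shared_endings(ciphertext):
    """Pairs (short, long) where cipher word `short` is the ending of `long`."""
    words = sorted(set(ciphertext.lower().split()))
    return [(short, long_word)
            for short in words
            for long_word in words
            if short != long_word and long_word.endswith(short)]

print(shared_endings("Wjko ko f weow oerwerce wh grocxfsbme"))
# -> [('ko', 'wjko')]
```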

The method was generally:

1) Count letter frequencies
2) Count two-letter frequencies
3) Note any subsequences that are also words
4) Apply any constraints of #2 and #3 that are unambiguous or only have 2 or 3 choices.
5) Try letters using the frequency expected vs the frequency observed, replacing all occurrences
6) Fill in words with 1 or 2 or 3 letters missing.
7) Backtrack if you get stuck.
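Step 2 is the easiest one to get wrong by hand, so here is a rough sketch of it, assuming that pairs are only counted inside a word and never across a space. The name bigram_counts is illustrative.

```python
from collections import Counter

def bigram_counts(ciphertext):
    """Count adjacent letter pairs within each word of the ciphertext."""
    counts = Counter()
    for word in ciphertext.lower().split():
        letters = [ch for ch in word if ch.isalpha()]
        counts.update(a + b for a, b in zip(letters, letters[1:]))
    return counts

counts = bigram_counts("Wjko ko f weow oerwerce wh grocxfsbme")
# Pairs that occur more than once are the interesting ones.
print({pair: n for pair, n in counts.items() if n > 1})
```

In Dustin's phrase the repeated pairs are "ko", "we" and "er", each seen twice.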

I did a similar project in ocaml. I didn't do...

2006-11-21T02:21:00.000-05:00

I did a similar project in OCaml. I didn't do the ``abstract word'' thing until later in development. Instead I started by counting the letter frequencies and sorting the words by cost to isolate the most valuable word to solve first, then worked out the additional mappings from there.

For example, given the input:

Wjko ko f weow oerwerce wh grocxfsbme

It'd decide to process ``oerwerce'' first because it has five unique letters, many of which also appear in other words. There were also only 5 matches for my word classification (abcdbceb) once I added that.
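The classification itself is simple to compute. The sketch below is in Python rather than OCaml and is only a guess at the idea, not the code from that project: each new letter gets the next label, starting from "a", and repeated letters reuse their label.

```python
def word_pattern(word):
    """Replace each distinct letter by a, b, c, ... in order of first use."""
    labels = {}
    out = []
    for ch in word.lower():
        if ch not in labels:
            labels[ch] = chr(ord("a") + len(labels))
        out.append(labels[ch])
    return "".join(out)

print(word_pattern("oerwerce"))  # -> abcdbceb
print(word_pattern("sentence"))  # -> abcdbceb
```

Any two words with the same classification, such as the cipher word and the English word "sentence" here, can be mapped onto each other letter for letter.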

Then, of course, for the rest of the words, I just have to fill in whatever letters are left and check all the remaining words against the dictionaries I loaded by classification.
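A rough sketch of that last step, again in Python and with made-up names (index_by_pattern, candidates) and a toy word list standing in for the real dictionaries: group the dictionary by classification, then keep only the words that agree with the letters already mapped.

```python
from collections import defaultdict

def word_pattern(word):
    """Letter-repetition shape of a word, e.g. 'test' -> 'abca'."""
    seen = {}
    for ch in word.lower():
        seen.setdefault(ch, chr(ord("a") + len(seen)))
    return "".join(seen[ch] for ch in word.lower())

def index_by_pattern(words):
    """Group dictionary words by their classification."""
    index = defaultdict(list)
    for w in words:
        index[word_pattern(w)].append(w)
    return index

def candidates(cipher_word, index, mapping):
    """Dictionary words that fit cipher_word without contradicting
    `mapping` (cipher letter -> plain letter)."""
    cipher_word = cipher_word.lower()
    used = set(mapping.values())
    fits = []
    for plain in index.get(word_pattern(cipher_word), []):
        ok = True
        for c, p in zip(cipher_word, plain):
            if c in mapping:
                ok = mapping[c] == p      # must agree with known letters
            else:
                ok = p not in used        # plain letter already taken
            if not ok:
                break
        if ok:
            fits.append(plain)
    return fits

# Hypothetical toy dictionary and the mapping learned from "oerwerce".
INDEX = index_by_pattern(["sentence", "test", "that", "tent", "this", "to", "is"])
MAPPING = dict(zip("oerwerce", "sentence"))
print(candidates("weow", INDEX, {}))       # -> ['test', 'that', 'tent']
print(candidates("weow", INDEX, MAPPING))  # -> ['test']
```

When a word comes back with no candidates, that is the cue to backtrack to the previous choice.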