Karl Hans Janke Kollaborativ
Heute die Welt, morgen das Sonnensystem!

Command line flags for Haskell scripts, Really Cheap & Simple (TM)

How often do you find yourself in the following situation?

Having written a nice little program, you just want to add one little command line option. Typical example: -dprint debugging output. So all you want to say (in code) is something like this:

If we were given the '-d' flag on the command line,
... [print/enable debug output]

But what you actually have to do is usually this:

Seem a bit much? Not that there is anything wrong with the above in principle. These libraries are very useful for writing compilers and faithful reproductions of historic UNIX utilities. But for your everyday script? I think it's annoying.

So, speaking of returns to simplicity. These are for pasting into your next Haskell script:

import System.Environment

-- pesco's really cheap and simple flags and options (tm)
clparts = getArgs >>= return . (\(a,b) -> (a,drop 1 b)) . break (=="--")
getargs = clparts >>= \(a,b)-> return ([h:t| h:t<-a, h/='-' || null t] ++ b)
getflags = clparts >>= \(a,_)-> return (concat [t| '-':t <- a])
getflag x = getflags >>= return . elem x
getenv f v x = catch (getEnv v >>= return . f) (\_ -> return x)

Here, have some type signatures, too:

getargs  :: IO [String]
getflags :: IO [Char]
getflag  :: Char -> IO Bool
getenv   :: (String -> a) -> String -> a -> IO a

Usage of the functions is pretty apparent from the types. Don't look anything up. Please!

From five lines you get:

You buy simplicity at the expense of comprehensiveness. What you don't get:

PS. I did make an elaborate command line parsing library a few years ago. Suitable for building big compilers and stuff! You might still find it in the caches of the intarweb if you're interested. Reading the original abstract, the goal was similar then: Make it really easy to get command line options into programs. Alas, using that library still suffered from problems outlined at the top of this post: Find it, look through the docs, add a big bunch of code to your project, use an elaborate API. I think it was a very good replacement for the standard GetOpt; considerably nicer, with many fancy features. But probably overkill for everyday use in scripts or small programs.

A command-line organizer for getting things done

In case anyone hasn't noticed: I have a thing for simple solutions.

So here is the third iteration of my personal to-do tracker. Previous versions held each task in a separate file, possibly along with a verbose description and several metadata fields.

This one is a return to simplicity. As a special note, I am particularly proud of defaulting to the venerable .plan file to hold the task list. I've always wished I had a proper use for it!

From the README:

basic operation

  • keep a list of tasks in ~/.plan, one per line. example:
     do something
     do something else
    
  • annotate them with due dates as appropriate
    12.7.2010! do something
    do something else
    
  • arrange roughly in order of priority
  • group tasks by context, i.e. what could be done together
    do laundry
    scrub floor
    
    visit parents
    do something on way to parents house
    do something at parents' house
    
  • mark current tasks / next actions
    > laundromat
    scrub floor
    > read chapter on poly types
    
  • stall tasks until a certain date
    2010-08-20> bday present for dad
    2010-08-25> bday present for sis
    
  • use hashtags to name projects
    code up first prototype of new #watnu
    blog about #watnu on #khjk
    
  • watnu will warn when a project has no tasks active or scheduled.
  • use generic tasks to keep projects on plan:
    keep blogging on #khjk
    keep coding: #watnu #bitlbeeotr #noooo
    #home #friends #school
    
  • multiple date formats allowed. examples (all 1st of may):
    2010-05-01! iso date
    1.5.2010!   german
    5/1/2010!   us
    
    1.5.! german w/o year
    5/1!  us w/o year
    
  • delegate tasks to other people, schedule activation as reminder
    10.7.> [mom] do laundry
    [timmy] scrub floor
    
  • call watnu to get your current todo list. order and grouping will be preserved from input.
  • set PLAN environment variable to use a different input file
  • set PLAN="-" to read from stdin

The concepts are my take on ideas from David Allen's GTD. These are some data points on the design:

The program is just short of 180 lines of Haskell code. See the README file for build instructions.

NB. I have found it a surprisingly nice routine to print a fresh todo list on a PocketMod each morning, so I'm throwing in a dirty shell script as a free bonus.

Get it here: http://code.khjk.org/watnu/

Fibers for Ruby 1.8 in 42 lines feat. call/cc

Actually, stripped of comments and stuff, it's even less than 42 lines.

Fibers are a big new thing with Ruby 1.9. The name is supposed to suggest that they're the thinner things inside a thread. You create them with a block to execute but they won't run in parallel. Instead they are explicitly entered and left via resume and yield. The first call to resume enters at the top of the block. Calling yield from inside the block jumps back out and the next resume jumps back inside. Repeat as often as you like.

f = Fiber.new {
    ...
    Fiber.yield 23  # returns 5
    ...
}
f.resume            # start it up; returns 23
...                 # control transfers back here after "yield"
f.resume 5          # run the rest

The two can pass arguments to each other like regular procedure calls. So it's indeed like threads as there are independent control flows. But execution switches only explicitly and intercommunication is much more direct.

Today I found out what they are really useful for: To separate some very sequential business logic (do A, do B, do C, finish) from the event-centric tangle of a GUI toolkit. A fiber does A, B, and C in sequence. Where does means it starts the job in the background, registers an event handler, and immediately yields back to the GUI event loop. When the job finishes, the handler resumes the fiber and it goes on to the next step.

Ruby 1.9 adds fibers as a primitive, but they are also easily implemented in terms of call/cc - call with current continuation, which Ruby has had for no idea how long. So here goes, a drop-in substitute for 1.9's Fiber class:

class Fibr
    @@fs = []   # a stack of fibers corresponding to calls of 'resume'

    def initialize(&block)
        @k = lambda(&block)         # lambda makes 'return' work as expected
    end

    def resume(*xs)
        @@fs.push(self)
        jump(xs)                    # jumping into fiber
    end

    def self.current
        @@fs.last
    end

    def self.yield(*xs)
        f = @@fs.pop
        f && f.send(:jump, xs)      # jumping out of fiber
    end

    private
    def jump(xs)
        callcc { |k|
            destination = @k
            @k = k
            destination.call(*xs)
            @@fs.pop
            @k.call                 # return from the last 'resume'
        }
    end
end
Fiber = Fibr if RUBY_VERSION<"1.9"

Download: fibr.rb

Really simple distributed DNS

There is a project, originating within the CCC, to link hackerspaces around the world into a common VPN. Within this, we would like to provide DNS; probably under a fake TLD, say .hack.

Well, you know, we like decentralized systems…

This post is about a prototype distributed system that, give or take a few design iterations, could actually be useful for decentralized name resolution.

Desirable properties:

At the conceptual level, the network's nodes represent any kind of communications medium. I.e. anything that messages can be sent onto with the expectation that all entities also on the medium will receive them.

Concretely there are two kinds of nodes:

Messages also come in two flavors:

When a station receives a message on a low-level link, it starts unwrapping the tunneling headers and propagates the message to any neighbors it has within the respective nodes.

Given the above behaviour, it is possible to construct networks of nodes interconnected in rather arbitrary ways. For the purpose of emulating DNS, and probably many others, it is useful to think of nodes as either representing hosts or being formed by joining (federating) a number of subnodes. The federations would correspond to DNS domains.

Determining how each station's table of node neighbor connections should look is left as an exercise to the reader. ;) The attached code contains a number of such context tables for an example network, pictured below.

rsddns-proto.klein.jpg
Diagram of a hypothetical network showing federations (circles with green labels), hosts (squares), and low-level links (lines). Shown in red is the path of an example message from within a hamburg subnode to a host in cologne.

To watch the example in action, run each of the following commands in a separate terminal:

$ ghc rsddns-proto.hs -e 'mainloop 2001 laptop_ct'
$ ghc rsddns-proto.hs -e 'mainloop 2002 daddel_ct'
$ ghc rsddns-proto.hs -e 'mainloop 2003 router_ct'
$ ghc rsddns-proto.hs -e 'mainloop 2004 turing_ct'
$ ghc rsddns-proto.hs -e 'mainloop 2005 b1_ct'
$ ghc rsddns-proto.hs -e 'mainloop 2006 b2_ct'
$ ghc rsddns-proto.hs -e 'mainloop 2007 k1_ct'

Then inject the test message from an interpreter prompt:

$ ghci rsddns-proto.hs
> s <- testsocket 5555
> testsend s "127.0.0.1" 2001 $ Mtun "pesco.hh.ccc.hack" $
      Mtun "hh.ccc.hack" $ Mtun "ccc.hack" $ Mtun "koeln.ccc.hack" $
      Mtun "k1.koeln.ccc.hack" $ Mstr "hello world!"

Note the DNS-like hierarchical node IDs and how the nesting of messages reflects the path through this hierarchy from pesco.hh.ccc.hack to k1.koeln.ccc.hack.

There is a 1s delay after each line of output so it's easy to watch the messages propagate. The destination node only prints out the message in this instance, but it would be trivial to include a return address and have it send back a useful answer (e.g. the station's IP address).

You will see a lot of backscatter of messages. This is due to the fact that the stations only perform a minimal check whether they have already seen a particular message. A proper implementation should avoid more.

Appendix: implementation prototype (Haskell): rsddns-proto.hs

Closed algebraic data types for Ruby

Today, a little detour off the path to world domination. Or maybe not; who knows.

I use Ruby at work and I guess it's pointless to deny that I regularly cringe at it for its rude failure at being just like Haskell. Even though, I have to admit that it is rather flexible:

Algebraic data types in Haskell look like this:

data Expr = Lam String Expr
          | App Expr Expr
          | Var String

It says that there are three kinds of expressions, called lambdas, applications and variables, consisting of the given data fields (e.g. a variable name and a body for lambdas, and so on). Operations on such a type are defined by giving cases for each kind of argument.

Off the shelf, there are no algebraic data types in Ruby. Google didn't find anything, so I thought about it. With the right plumbing, it's possible to do the following:

class Expr
    ctor :lam, :var => String, :body => Expr
    ctor :app, :op => Expr, :arg => Expr
    ctor :var, :name => String
end

Not much longer and only slightly uglier! \o/ And with type checks, too.

The result is a class Expr with three singleton constructor methods, similar to the usual new. Each takes a single hash as its argument that assigns values to the relevant fields. All fields must be provided and their types must match the definition. For example:

id = Expr.lam(:var=>"x", :body=>Expr.var(:name=>"x"))   # (\x.x)

The ctor method defines accessors to the constructor tag and each field. A typical method definition on Exprs would look like this:

class Expr
    def eval(env={})
        if is_lam? then
            self
        elsif is_app? then
            f = op.eval(env)
            if f.is_lam?
                x = arg.eval(env)
                f.body.subst(f.var,x).eval
            else
                raise "operator is not a function"
            end
        elsif is_var? then
            env[name] || self
        end
    end
end

Obviously, this is in contrast to the object-oriented way of distributing operations on different kinds of things over respective (sub-) class definitions. Aside from being a matter of taste, I prefer the above in this case, because the definitions of evaluation for different kinds of expressions have to fit together. Their interaction is easier to view when the definition is in one place instead of scattered across multiple classes.

To the plumbing: Everything happens in ctor, which is singleton method of Class, so it can be used in a class definition (similar to convenience methods like attr_reader and such). The routine looks like this:

class Class
    def ctor(ctorname, argtypes)
        # add singleton constructor method to self
        self_ = (class << self; self; end)
        self_.module_eval do
            define_method(ctorname) do |argvalues|
                x = allocate

                # check argument types
                # [...]

                # fill instance variables of x
                x.instance_eval do
                    @ctor = ctorname
                    @args = argvalues
                end

                x
            end # constructor
        end # singleton methods of our ADT

        # inspection methods
        attr_reader :ctor
        attr_reader :args

        # define accessors and convenience methods on self
        # [...]
    end
end

The full source code along with a complete lambda evaluator is in the file attached below. The evaluator also supports destructive in-place updates, effectively yielding graph reduction.

Note: The copyright notice states isc license, meaning do whatever you want as long as you leave the notice intact.

Attachment: adt.rb

PS. Happy Birthday, you know who you are.