Karl Hans Janke Kollaborativ
Heute die Welt, morgen das Sonnensystem!

dec 2009

Closed algebraic data types for Ruby

Today, a little detour off the path to world domination. Or maybe not; who knows.

I use Ruby at work and I guess it's pointless to deny that I regularly cringe at it for its rude failure at being just like Haskell. Even though, I have to admit that it is rather flexible:

Algebraic data types in Haskell look like this:

data Expr = Lam String Expr
          | App Expr Expr
          | Var String

It says that there are three kinds of expressions, called lambdas, applications and variables, consisting of the given data fields (e.g. a variable name and a body for lambdas, and so on). Operations on such a type are defined by giving cases for each kind of argument.

Off the shelf, there are no algebraic data types in Ruby. Google didn't find anything, so I thought about it. With the right plumbing, it's possible to do the following:

class Expr
    ctor :lam, :var => String, :body => Expr
    ctor :app, :op => Expr, :arg => Expr
    ctor :var, :name => String
end

Not much longer and only slightly uglier! \o/ And with type checks, too.

The result is a class Expr with three singleton constructor methods, similar to the usual new. Each takes a single hash as its argument that assigns values to the relevant fields. All fields must be provided and their types must match the definition. For example:

id = Expr.lam(:var=>"x", :body=>Expr.var(:name=>"x"))   # (\x.x)

The ctor method defines accessors to the constructor tag and each field. A typical method definition on Exprs would look like this:

class Expr
    def eval(env={})
        if is_lam? then
            self
        elsif is_app? then
            f = op.eval(env)
            if f.is_lam?
                x = arg.eval(env)
                f.body.subst(f.var,x).eval
            else
                raise "operator is not a function"
            end
        elsif is_var? then
            env[name] || self
        end
    end
end

Obviously, this is in contrast to the object-oriented way of distributing operations on different kinds of things over respective (sub-) class definitions. Aside from being a matter of taste, I prefer the above in this case, because the definitions of evaluation for different kinds of expressions have to fit together. Their interaction is easier to view when the definition is in one place instead of scattered across multiple classes.

To the plumbing: Everything happens in ctor, which is singleton method of Class, so it can be used in a class definition (similar to convenience methods like attr_reader and such). The routine looks like this:

class Class
    def ctor(ctorname, argtypes)
        # add singleton constructor method to self
        self_ = (class << self; self; end)
        self_.module_eval do
            define_method(ctorname) do |argvalues|
                x = allocate

                # check argument types
                # [...]

                # fill instance variables of x
                x.instance_eval do
                    @ctor = ctorname
                    @args = argvalues
                end

                x
            end # constructor
        end # singleton methods of our ADT

        # inspection methods
        attr_reader :ctor
        attr_reader :args

        # define accessors and convenience methods on self
        # [...]
    end
end

The full source code along with a complete lambda evaluator is in the file attached below. The evaluator also supports destructive in-place updates, effectively yielding graph reduction.

Note: The copyright notice states isc license, meaning do whatever you want as long as you leave the notice intact.

Attachment: adt.rb

PS. Happy Birthday, you know who you are.