25.7.2010, pesco
tags: code haskell getflag cmdline watnu
How often do you find yourself in the following situation?
Having written a nice little program,
you just want to add one little command line option.
Typical example:
-d —
print debugging output
.
So all you want to say (in code) is something like this:
If we were given the '-d' flag on the command line,
... [print/enable debug output]
But what you
actually have to do is usually this:
- Get a hold of your favourite
getopt
library.
If it's not a standard library,
make sure to note that dependency somewhere!
- Find the documentation and look up basic usage.
- Specify to the library exactly which options you want to accept,
whether they take any arguments,
and provide a usage note for the autogenerated help text.
- Make a loop in your main routine for
parsing all options
.
- Finally, in that loop, state that -d enables debugging output.
Seem a bit much?
Not that there is anything wrong with the above in principle.
These libraries are very useful for writing compilers and
faithful reproductions of historic UNIX utilities.
But for your everyday script?
I think it's annoying.
So, speaking of returns to simplicity.
These are for pasting into your next Haskell script:
import System.Environment
-- pesco's really cheap and simple flags and options (tm)
clparts = getArgs >>= return . (\(a,b) -> (a,drop 1 b)) . break (=="--")
getargs = clparts >>= \(a,b)-> return ([h:t| h:t<-a, h/='-' || null t] ++ b)
getflags = clparts >>= \(a,_)-> return (concat [t| '-':t <- a])
getflag x = getflags >>= return . elem x
getenv f v x = catch (getEnv v >>= return . f) (\_ -> return x)
Here, have some type signatures, too:
getargs :: IO [String]
getflags :: IO [Char]
getflag :: Char -> IO Bool
getenv :: (String -> a) -> String -> a -> IO a
Usage of the functions is pretty apparent from the types.
Don't look anything up. Please!
From five lines you get:
- Boolean flags: Anywhere in your (IO) code, ask whether a certain flag
has been specified:
d <- getflag 'd'
when d $ putStrLn "debug on!"
Alternatively, get the list of flags once and pass it around.
flags <- getflags
when all (`elem` flags) "nlp" $ putStrLn "my favourite options!"
- Condensed flags: Throw any number of flag characters behind a single
dash
-
:
$ myps -aux
- Options with defaults: This might feel strange, but seriously:
Use environment variables.
To query, complete with reader/parser and default value:
n <- getenv read "n" 10 -- think "tail -n"
c <- getenv read "C" 0 -- think "grep -C"
-- default comes in from another IO action
today <- getenv parsedate "today" =<< getcurdate
-- cascading defaults
plan <- getenv id "PLAN"
=<< getenv (++ "/.plan") "HOME"
=<< return "-"
BTW, don't think that setting an env var is more work than a command
line argument! Compare:
$ today=2010-07-26 watnu
vs.
$ watnu --today=2010-07-26
The so-called keyword parameter
syntax of the former has been
supported since the original Bourne Shell.
- Retrieve any non-flag arguments via the getargs routine.
- Arguments that begin with a dash:
Anything after an argument of
--
is not a flag.
A single dash -
is also not a flag.
You buy simplicity at the expense of comprehensiveness.
What you don't get:
- Long flags (e.g.
--debug
).
You have 52 latin letters (upper- and lower-case) available.
Plus any digits and special characters you can sensibly make use of.
When you need more, use a real command line option parser.
You're writing a goddamn compiler or something.
- Complaints about unrecognized flags or options.
Defining the following function is left as the proverbial
exercise to the reader:
allowflags :: [Char] -> IO ()
- Command line options with arguments.
Use environment variables.
- Autogenerated help and usage messages.
Type them yourself,
they're going to be so much prettier.
Come on, add some ASCII art.
When it really is too bothersome,
feel free to define the following function:
helpscreen :: String -- textual command description
-> [(Char, String)] -- recognized flags with desc.
-> [(String, String)] -- recognized env vars with desc.
-> String
- Elaborate error messages.
Obviously there's nothing like
command line option --foo requires Frob argument, but got Twizzle
in those five lines.
Then again, nothing is to stop you from putting arbitrarily complicated
error checking in the reader you pass to getenv or
validate the order and number of things in the result of getflags,
etc..
PS.
I did make an elaborate command line parsing library a few years ago.
Suitable for building big compilers and stuff!
You might still find it in the caches of the intarweb if you're interested.
Reading the original abstract, the goal was similar then:
Make it
really easy to get command line options into programs.
Alas, using that library still suffered from problems outlined at the top of
this post: Find it, look through the docs, add a big bunch of code to your
project, use an elaborate API. I think it was a very good replacement
for the standard
GetOpt; considerably nicer, with many fancy features.
But probably overkill for everyday use in scripts or small programs.
6.7.2010 23:36, pesco
tags: haskell code watnu gtd
In case anyone hasn't noticed:
I have a thing for simple solutions.
So here is the third iteration of my personal to-do tracker.
Previous versions held
each task in a separate file,
possibly along with a verbose description and
several metadata fields.
This one is a return to simplicity.
As a special note, I am particularly proud of defaulting to the venerable
.plan
file to hold the task list.
I've always wished I had a proper use for it!
From the
README:
basic operation
- keep a list of tasks in ~/.plan, one per line. example:
do something
do something else
- annotate them with due dates as appropriate
12.7.2010! do something
do something else
- arrange roughly in order of priority
- group tasks by context, i.e. what could be done together
do laundry
scrub floor
visit parents
do something on way to parents house
do something at parents' house
- mark current tasks /
next actions
> laundromat
scrub floor
> read chapter on poly types
- stall tasks until a certain date
2010-08-20> bday present for dad
2010-08-25> bday present for sis
- use hashtags to name projects
code up first prototype of new #watnu
blog about #watnu on #khjk
- watnu will warn when a project has no tasks active or scheduled.
- use generic tasks to keep projects
on plan
:
keep blogging on #khjk
keep coding: #watnu #bitlbeeotr #noooo
#home #friends #school
- multiple date formats allowed. examples (all 1st of may):
2010-05-01! iso date
1.5.2010! german
5/1/2010! us
1.5.! german w/o year
5/1! us w/o year
- delegate tasks to other people, schedule activation as reminder
10.7.> [mom] do laundry
[timmy] scrub floor
- call watnu to get your current todo list.
order and grouping will be preserved from input.
- set PLAN environment variable to use a different input file
- set PLAN="-" to read from stdin
The concepts are my take on ideas from David Allen's
GTD
.
These are some data points on the design:
- Contexts never really worked out for me.
Instead of ordering tasks by priority,
GTD advocates grouping them by what can be done together.
While I agree with that,
trying a priori to name a good set of contexts
was annoyingly cumbersome.
- It was hard to keep an overview when tasks were scattered around their
separate files. This made the weekly review unpleasant.
- Long task descriptions are superfluous.
Sometimes there would be a note,
e.g. opening hours of a store,
but the vast majority of tasks are one-liners.
Tack-on notes can be added as a separate feature later.
- Only some things have hard due dates.
In general, it is more important when I plan to start something
rather than until when I must have completed it.
- Tasks of the
someday/maybe
category need not be tracked.
Keep them in a different file.
- There are always tasks that are definately
to do
but should not appear
on my radar, yet.
They might be scheduled for later or simply buried by
more pressing matters.
The program is just short of 180 lines of Haskell code.
See the
README file for build instructions.
NB.
I have found it
a surprisingly nice routine
to print a fresh todo list on a
PocketMod each morning,
so I'm throwing in a dirty shell script as a free bonus.
Get it here:
http://code.khjk.org/watnu/
7.6.2010 22:15, pesco
tags: code ruby fibers callcc
Actually, stripped of comments and stuff, it's even less than 42 lines.
Fibers
are a big new thing with Ruby 1.9. The name is supposed to suggest
that they're the thinner things inside a thread.
You create them with a block to execute but they won't run in parallel.
Instead they are explicitly entered and left via
resume and
yield.
The first call to
resume enters at the top of the block.
Calling
yield from inside the block jumps back out and the next
resume
jumps back inside. Repeat as often as you like.
f = Fiber.new {
...
Fiber.yield 23 # returns 5
...
}
f.resume # start it up; returns 23
... # control transfers back here after "yield"
f.resume 5 # run the rest
The two can pass arguments to each other like regular procedure calls.
So it's indeed like threads as there are independent control flows.
But execution switches only explicitly and intercommunication is much more
direct.
Today I found out what they are really useful for:
To separate some very sequential business logic (do A, do B, do C, finish)
from the event-centric tangle of a GUI toolkit.
A fiber does A, B, and C in sequence.
Where
does
means it starts the job in the background,
registers an event handler,
and immediately yields back to the GUI event loop.
When the job finishes,
the handler resumes the fiber and it goes on to the next step.
Ruby 1.9 adds fibers as a primitive, but they are also easily implemented
in terms of
call/cc -
call with current continuation
, which Ruby has had for no idea how long.
So here goes, a drop-in substitute for 1.9's
Fiber class:
class Fibr
@@fs = [] # a stack of fibers corresponding to calls of 'resume'
def initialize(&block)
@k = lambda(&block) # lambda makes 'return' work as expected
end
def resume(*xs)
@@fs.push(self)
jump(xs) # jumping into fiber
end
def self.current
@@fs.last
end
def self.yield(*xs)
f = @@fs.pop
f && f.send(:jump, xs) # jumping out of fiber
end
private
def jump(xs)
callcc { |k|
destination = @k
@k = k
destination.call(*xs)
@@fs.pop
@k.call # return from the last 'resume'
}
end
end
Fiber = Fibr if RUBY_VERSION<"1.9"
Download:
fibr.rb
12.2.2010 18:50, pesco
There is a project, originating within the CCC, to link hackerspaces around the
world into a
common VPN.
Within this,
we would like to provide DNS; probably under a fake TLD, say
.hack.
Well, you know, we like decentralized systems…
This post is about a prototype distributed system that,
give or take a few design iterations,
could actually be useful for decentralized name resolution.
Desirable properties:
- require only a minimum of local knowledge to configure the peers
- distributed self-organization instead of central administration
- avoid central points of failure
At the conceptual level, the network's
nodes
represent any kind of communications medium.
I.e. anything that messages can be sent
onto
with the expectation
that all entities also
on the medium
will receive them.
Concretely there are two kinds of nodes:
- Low-level network links.
IP-port pairs or other kinds of peer-to-peer addresses.
The idea is that these could also be Ethernet interfaces and thelike.
They carry no global name and are thought of as only locally known to a
specific station (host system).
Mesh
nodes (for lack of a better name).
Formed by connecting stations in a mesh.
Has a globally unique ID.
For each node it participates in,
a station knows how to reach its neighbors within that node.
Any other node known to the station can act as the medium
for these connections.
Putting other meshes in this list effectively creates a tunnel.
Messages also come in two flavors:
- Payload. In the example code, these consist of a single String containing
a message to be printed.
- Tunnel messages.
When a message is sent onto a mesh node,
the station wraps it with the ID of that node before sending it
down the neighbor links (i.e. nodes).
When a station receives a message on a low-level link,
it starts
unwrapping
the tunneling headers
and propagates the message to any neighbors it has within the
respective nodes.
Given the above behaviour, it is possible to construct networks of
nodes interconnected in rather arbitrary ways. For the purpose of emulating
DNS, and probably many others, it is useful to think of nodes as either
representing hosts or being formed by joining (
federating
) a number of
subnodes. The federations would correspond to DNS domains.
Determining how each station's table of node neighbor connections should look
is left as an exercise to the reader. ;)
The attached code contains a number of such
context tables
for an example network, pictured below.
Diagram of a hypothetical network showing
federations (circles with green labels), hosts (squares), and
low-level links (lines).
Shown in red is the path of an example message
from within a hamburg subnode to a host in cologne.
To watch the example in action, run each of the following commands in a
separate terminal:
$ ghc rsddns-proto.hs -e 'mainloop 2001 laptop_ct'
$ ghc rsddns-proto.hs -e 'mainloop 2002 daddel_ct'
$ ghc rsddns-proto.hs -e 'mainloop 2003 router_ct'
$ ghc rsddns-proto.hs -e 'mainloop 2004 turing_ct'
$ ghc rsddns-proto.hs -e 'mainloop 2005 b1_ct'
$ ghc rsddns-proto.hs -e 'mainloop 2006 b2_ct'
$ ghc rsddns-proto.hs -e 'mainloop 2007 k1_ct'
Then inject the test message from an interpreter prompt:
$ ghci rsddns-proto.hs
> s <- testsocket 5555
> testsend s "127.0.0.1" 2001 $ Mtun "pesco.hh.ccc.hack" $
Mtun "hh.ccc.hack" $ Mtun "ccc.hack" $ Mtun "koeln.ccc.hack" $
Mtun "k1.koeln.ccc.hack" $ Mstr "hello world!"
Note the DNS-like hierarchical node IDs and how the nesting of messages
reflects the path through this hierarchy from
pesco.hh.ccc.hack to
k1.koeln.ccc.hack.
There is a 1s delay after each line of output so it's easy to watch the messages
propagate.
The destination node only prints out the message in this instance, but it would
be trivial to include a return address and have it send back a
useful answer (e.g. the station's IP address).
You will see a lot of backscatter of messages.
This is due to the fact that the stations only perform a minimal check
whether they have already seen a particular message.
A proper implementation should avoid more.
Appendix:
implementation prototype (Haskell):
rsddns-proto.hs
11.12.2009 13:37, pesco
Today, a little detour off the path to world domination.
Or maybe not; who knows.
I use
Ruby at work
and I guess it's pointless to deny that I regularly
cringe at it for its
rude failure at being just like Haskell.
Even though, I have to admit that it is rather flexible:
Algebraic data types in Haskell look like this:
data Expr = Lam String Expr
| App Expr Expr
| Var String
It says that there are three kinds of expressions, called lambdas,
applications and variables, consisting of the given data fields
(e.g. a variable name and a body for lambdas, and so on).
Operations on such a type are defined by giving cases for each
kind of argument.
Off the shelf, there are no algebraic data types in Ruby.
Google didn't find anything, so I thought about it.
With the right plumbing, it's possible to do the following:
class Expr
ctor :lam, :var => String, :body => Expr
ctor :app, :op => Expr, :arg => Expr
ctor :var, :name => String
end
Not much longer and only slightly uglier! \o/
And with type checks, too.
The result is a class
Expr with three singleton constructor methods,
similar to the usual
new.
Each takes a single hash as its argument that assigns
values to the relevant fields.
All fields must be provided and their types must match the definition.
For example:
id = Expr.lam(:var=>"x", :body=>Expr.var(:name=>"x")) # (\x.x)
The
ctor method defines accessors to the constructor
tag
and each field.
A typical method definition on
Exprs would look like this:
class Expr
def eval(env={})
if is_lam? then
self
elsif is_app? then
f = op.eval(env)
if f.is_lam?
x = arg.eval(env)
f.body.subst(f.var,x).eval
else
raise "operator is not a function"
end
elsif is_var? then
env[name] || self
end
end
end
Obviously, this is in contrast to the
object-oriented
way of distributing
operations on different kinds of things over respective (sub-) class
definitions.
Aside from being a matter of taste,
I prefer the above in this case,
because the definitions of evaluation for different kinds of expressions have
to fit together.
Their interaction is easier to view when the definition is in one place instead
of scattered across multiple classes.
To the plumbing:
Everything happens in
ctor, which is singleton method of
Class,
so it can be used in a class definition
(similar to convenience methods like
attr_reader and such).
The routine looks like this:
class Class
def ctor(ctorname, argtypes)
# add singleton constructor method to self
self_ = (class << self; self; end)
self_.module_eval do
define_method(ctorname) do |argvalues|
x = allocate
# check argument types
# [...]
# fill instance variables of x
x.instance_eval do
@ctor = ctorname
@args = argvalues
end
x
end # constructor
end # singleton methods of our ADT
# inspection methods
attr_reader :ctor
attr_reader :args
# define accessors and convenience methods on self
# [...]
end
end
The full source code along with a complete lambda evaluator is in the
file attached below.
The evaluator also supports destructive in-place updates,
effectively yielding graph reduction.
Note: The copyright notice states
isc license
,
meaning do whatever you want as long as you leave the notice intact.
Attachment:
adt.rb
PS.
Happy Birthday, you know who you are.