In my projects, I tend to start with a simple analysis of a limited dataset,
then incrementally expand on it with more data and deeper analyses. This means
that each time I update the data (e.g. add another species' protein sequences)
or add another step to the analysis pipeline, the whole thing gets re-run from
scratch -- even though only a small part of the pipeline actually changed.
This is a common problem in bioinformatics:
http://biostar.stackexchange.com/questions/79/how-to-organize-a-pipeline-of-small-scripts-together
How can we automate a pipeline like this, so that only the out-of-date parts
are re-run each time? This is the same problem faced when compiling large
programs, and there it has been solved fairly well by build tools such as make.
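As a sketch of the idea, here is a toy Makefile for a hypothetical two-step
pipeline; the file names and commands are made up for illustration. make
compares file modification times, so after touching one input, only the
targets downstream of it are rebuilt:

```
# Toy pipeline: pool per-species protein sequences, then run an
# all-vs-all BLAST search. File names and commands are illustrative.
# (Recipe lines must be indented with a tab character.)

SPECIES_FASTA = human.fasta mouse.fasta yeast.fasta

all: blast_results.tsv

# Re-run only when one of the per-species files has changed.
all_proteins.fasta: $(SPECIES_FASTA)
	cat $(SPECIES_FASTA) > $@

# Re-run only when the pooled file is newer than the results.
blast_results.tsv: all_proteins.fasta
	blastp -query $< -subject $< -outfmt 6 > $@
```

Editing mouse.fasta and running make re-runs both steps, while adding a fourth
species is just one more name in SPECIES_FASTA; nothing is re-run unless its
inputs have changed.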