A faster (but incomplete) implementation of SCons on top of doit

19 07 2011

Motivation

doit is an automation tool. It is kind of build-tool but more generic…

My motivation was to demonstrate how to create a specialized interface for defining tasks. doit can be used for many different purposes, so its default interface can be quite verbose if compared to tools created to solve one specific problem.

Instead of creating an interface myself I decided to use an existing interface. I picked SCons. So the goal was to be able to build C/C++ project using an existing SConstruct file without any modification. And of course it should be as good as SCons on dependency tracking, ensuring always a correct result.

A secondary goal was to make it fast.

Note that this implementation is very far from complete. I only implemented the bare minimum to get some benchmarks running.

Implementation

I wont go into the gory details… Just a few notes. You can check the code here.

In docons.py there is an implementation of the API available in SConstrcut files. When a “Builder Method” (like Program or Object) is executed a reference to the builder is saved in a global variable. These “Builder Methods” are actually implemented as class that can generate dictionaries representing doit tasks.

In doit the configuration file that define your tasks is called dodo.py. In this case the end user wont edit this file directly. dodo.py will import the SCons API namespace from docons, than it will execfile the SConstruct file and collect the tasks from the “Builder Methods”.

Creating tasks for compile/link is straightforward. The hard part is automatically finding out the dependencies in the source code and mapping it into account on your tasks. To find out the dependencies (the #include for C code) in the source I am using the same C preprocessor as used by SCons.

SCons uses the concept of a “Scanner” function associated with a Builder. In doit the implicit dependencies are “calculated” in a separate task. The dependencies are than put into the build (compile/link) tasks through calc_dep (calculated dependencies).

A faster implementation

It seems SCons creates the whole dependency graph before starting to execute the tasks/builders. Because of this it doesn’t scale so well when the number of files increase.

doit creates the dependency graph dynamically during task execution. But even on a no-op build it will end-up with a complete graph built because it will check all tasks dependencies.

tup is a build-tool that saves the dependency graph in a SQLite database. I decided to give it a try to its approach.  So I created “dup” – sorry for the name :) . It will still read the build configuration from SConstrcut files but it will keep a SQLite database mapping the targets to all of its dependencies. This enables a much faster no-op and incremental builds. The  underlying graph dependency of a target will only be built if required.

Benchmarks

I did some very basic benchmarks from SCons and my two SCons implementations. The benchmarks were created using the gen-bench from wonderbuild benchmarks.

I used gen-bench script was run with the arguments “50 100 15 5”.  This generates 10000 tiny interdependent C++ source and header files, to be built into 50 static libs.

All benchmarks were run on a intel i3 550 quad-core (3.20GHz), running Ubuntu 10.10, python2.6.6, doit 0.13.0, SCons 2.0.0. All benchmarks were run using a single process.

The graph doesn’t include full build time. It was SCons=266 seconds, docons=249 seconds, dup=253 seconds.

  • no-op 1 lib -> no-operation build when all files are up-to-date and only one of the 50 libraries were selected to be built (scons build-scons/lib_0/lib0.a)
  • no-op -> no-operation build
  • partial build – cpp file -> only the file lib_1/class_0.cpp was modified. rebuilt 1 object file and 1 lib.
  • partial build – hpp file -> only the file lib_0/class_0.hpp was modified. rebuild 30 object files and 14 libs.

Analysis

  1. comparing SCons and docons on no-op build you can see how doit is considerably faster than SCons on creating the dependency graph and checking what to build.
  2. comparing “no-op 1 lib” with “no-op” you can see how both SCons and docons have a performance degradation from creating the dependency graph (5.2 and 3.6 times slower respectivelly). And how dup shows little influence on no-op build relative to the size of the dependency graph.
  3. all 3 solutions have almost no difference between a no-op build and a partial/incremental build where a single source file is modified.
  4. as the number of built objects increase the advantage of dup over docons is reduced because it requires the dependency graph from the affected tasks to be built.

Should you stop using SCons?

Probably not. The implementation is very incomplete and probably buggy in many ways. This code was written just as proof-of-concept (and for fun) to check how powerful, flexible and fast doit can be.

I have personally no interest in developing a C/C++ build tool and if I would build one I would create a different interface from the one used by SCons.





doit – a build-tool tale

14 04 2008

doit is a built-tool like written in python. In this post explain my motivation for writting yet another buil tool. If you just want to use it. Please check the website

Build-tool

I started working on a web project… As a good TDD disciple I have lots of tests spawning everywhere. There are plain python unittest, twisted’s trial tests and Django specific unit tests. That’s all for python, but I also have unit tests for javascript (using a home grown unit test framework) and regression tests using Selenium. Running lint tools (JavaScriptLint and PyFlakes) are as important.

So I have seven tools to help me keeping the project healthy. But I need one more to control the seven tools! Actually there are more. I am not counting the javascript compression tool, the documentation generator…

I am not looking for a continuous integration (at least right now). I want to execute the tests in a efficient way and get problems before committing the code to a VCS.

– What tool do we use to automate running tasks?
– GNU Make. Or any other build tool.

SCons

I had the misfortune to (try to) debug some Makefile‘s before. XML based was never really an option to me. Since I work with python SCons looked like a good bet.

SCons. Writing the rules/tasks in python helps a lot. But the configuration (construct) file is not as simple and intuitive as I would expect. Maybe too powerful for my needs. Thats ok I don’t have to write new “Builders” that often.

Things went ok for a while… but things started to get too slow. Normal python tests are fast enough not to bother about it. But Django tests using postgres execution time do bother. The javascript tests run on the browser. So it needs to start the server, launch the browser, load and execute the tests… uuoooohhhh.

Most of the time i really need to execute just a subset of tests/tasks. The whole point of build tools is to keep track of dependencies and re-build only what is necessary, right? The problem with tests is that actually i am not building anything. I am executing tasks(in this case tests). Building something is a “task” with a “target” file(s), but running a test is a “task” with no “target”. The problem is that build tools were designed to keep track of target/file dependencies not task dependencies. Yes I know you can use hacks to pretend that every task has a target file. But I was not really willing to do this…

I was not using any of the great SCons features. Actually at some point I easily substitute it to a simple (but lengthy) python script using the subprocess module. Of course this didn’t solve the speed problem.

doit

doit. I want a tool to automatically execute any kind of tasks, having a target or not. It must keep track of the dependencies and re-do (or re-execute) tasks only if necessary, (like every build tool do for target files). And of course it shouldn’t get on my way while specifying the tasks.

Requirements:

. keep track of dependencies. but they must be specified by user, no automatic dependency analysis. (i.e. nearly every build tool supports this)
. easy to create new task rules. (i.e. write them in python)
. get out of your way, avoid boiler-plate code. (i.e. something like what nose does to unittest)
. dependencies by tasks not on files/targets.

The only distinctive requirement is item 4. I guess any tool that implements dependency on targets could support dependency on tasks with not so much effort.
You just need to save the signature of the dependent files on successful completion of the task. If none of the dependencies changes the signature the task doesn’t need to be executed again. Since it is not required to have a target the tasks needs to be uniquely identified. But thats an implementation detail…

So how does it look like?

look at the tutorial: