Open Source and Software Estimation

8 01 2008

Software is hard, and estimation is even harder (when it is possible at all). There are many other articles on the subject. I am not qualified to explain how to do it, and I will not try.

I will just share some insights from my own experience in software development, especially about adopting open source projects within a project and the unavoidable uncertainty that comes from relying on third-party software.

Open source

I first understood the importance of open source at a time when I barely knew anything about it. In my young arrogance, I was trying to build a simple 2D CAD system as a side project in my free time. At that time, all my knowledge about graphics libraries was the canvas component from Delphi (lol). While debugging my application I hit a black box. The bug was not in my code but inside “that” compiled library that I could not look inside. I had no way to solve the problem. Maybe I was just using it the wrong way, but without looking inside that black box I could never figure out what was wrong…

I will not discuss whether an organization should use open source or not. But I, as a developer, will always choose to work with open source.

I will use the term “black box” for a program whose inner workings the user does not see, and “white box” for a program whose inner workings the user does see.

White box

What I really liked about the engineering software projects I worked on during my college days was that there were almost no external dependencies. Everything was in C or C++, and most of the projects dealt with mathematical models.

Starting a system from scratch, not bound to any framework and free from learning how to use a library, is great. Using a handy lib that does what you want is good. But programming a system that doesn’t use any lib is more fun. Of course, when you do everything by yourself, what you can do is quite limited.

I was happy to work with an open source lib in my first real programming job. My first task was something like this: this library can do “this”, and we need “this” plus “that” extra bit.

At first I did some trial and error. I mean, I did not really understand what was going on “inside”. Soon it became clear to me that I wouldn’t achieve anything this way, so I dove into the code. But I was completely lost. I was still in college and my biggest project had 4k lines of code, and this was much bigger. I also lacked knowledge of the subject and, as with most open source projects, documentation was not one of its strengths.

So I spent 10 (or 20) times longer understanding that huge amount of undocumented code than I spent coding. Ok, the time I took to understand a small fraction of it is insignificant compared to the time I would have taken to write it (if I had been able to write it at all), so not that bad.

When you need to use something that will not act as a black box for you, you need time not only to learn how to use it but also to understand how it works. This is obvious. I am not claiming I made a great discovery, but a lot of people forget to take this into account.

So the first thing that needs to be estimated is how long it will take to learn library or framework X. Things are really hard to estimate here because most of the time you want to learn just enough to execute your task. The important factors are the size of the library and the quality of its documentation and code. But before really reading the code and documentation, how can you know their quality? So how can you even estimate the time you need to learn it?

And only after learning it can you have a reasonable idea of how long it will take to accomplish a certain task. In practice you will start doing your task and learning framework X at the same time. Often the result is a prototype.

Black box

When you decide to use a library as a black box, user documentation becomes even more important. Finding out whether library X will suit all your needs is a really tricky task. Even if it has good documentation, it is very hard. And it gets worse: some libraries claim to do things that they don’t really do, or only plan to do. And not all of them make a practice of documenting their bugs.

So a small task or bug involving library X can easily grow from one hour to one week. The transition from a black box to a white box, where you have to understand the internals, is very time-consuming.

Most of the time, assuming that you will need a white-box understanding of all the parts involved makes the project infeasible. The user base and stability of the library play a major role here. Also very important is how innovative the thing you are trying to do is. If you are pushing the library to new limits, you will most probably be the first to see some bugs. More on this later…

I guess the best you can do here is to assume that some black-box-to-white-box transitions will happen.
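
One rough way to put a number on that assumption is to give each library-related task a small chance of escalating from a black-box fix into a white-box investigation, and to estimate with the weighted average. This is only a sketch: the function name, the 10% escalation chance and the 40-hour working week are made-up values for illustration, not measurements. Only the “one hour to one week” jump comes from the text above.

```python
def expected_hours(black_box_hours, white_box_hours, p_escalation):
    """Weighted-average duration of a task that may escalate to white-box work.

    All three inputs are guesses made by the estimator; nothing here is measured.
    """
    if not 0.0 <= p_escalation <= 1.0:
        raise ValueError("p_escalation must be between 0 and 1")
    return (1.0 - p_escalation) * black_box_hours + p_escalation * white_box_hours


# A one-hour task that becomes one working week (assumed: 40 hours)
# if you have to open the box, with an assumed 10% chance of that happening.
estimate = expected_hours(1.0, 40.0, 0.10)
print(f"{estimate:.1f} hours")
```

With these invented numbers the “one hour” task is worth almost five hours in the plan. The exact figure matters less than the fact that the rare one-week case dominates the average.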

Frameworks

From Wikipedia: “A software framework is a re-usable design for a software system”. A framework is the result of extracting the common parts of a well-known system pattern. Frameworks are great at doing what they were planned to do. They give a huge boost in productivity, and you can do those amazing screencasts in “20 minutes”. But of course they don’t do magic. I guess they give non-developers the idea that everything can be done easily. So remember that whatever needs to be done that your framework does not provide will take much longer than the things the framework helps you with.

Another great advantage of using frameworks is that they give you the basic design of the application, and that usually means a successful design. Whenever you build an application without relying on a framework, you need to design it yourself. I guess no software developer would be arrogant enough to think that they can get the design of anything but a trivial system right on the first attempt.

Most probably your basic design will need to be written and rewritten several times while you learn about the problem, until you reach a good enough design. So plan to write it more than once. The complexity of your system is the key point here: a good design is the result of planning and a lot of work. Who has never heard the quote “System design is one percent inspiration, ninety-nine percent perspiration”?

Not exactly this

And the worst can happen. After working with X for a while, you can find out that it is just not suitable for your requirements. And what happens if there is no Y to put in place of X? You get one more project to develop, for FREE! I hope you didn’t burn half your schedule to get to this point. Well, that’s why checkpoints are there. In a case like this you have to think again about whether developing the project is still worth it, and make the situation clear to all stakeholders.

This most probably happens when you are developing something new, pushing the state of the art of your domain or just the limits of the library you are using.

Conclusion

Every new library or framework X that you incorporate into your development stack is like a small R&D project. Research and development are harder to estimate, but it is easy to understand that they consume more effort than a “normal” project.

Open source is great. It makes it possible for small teams to compete with big companies, but wait, big companies also use open source… So don’t underestimate development time (too much; everybody always does anyway). Open source does not solve all your problems like magic.