Using Open Source Code

When it comes to using open source code in my projects, I hesitate. Even when the code I am about to write has already been written by someone else and is sitting right in front of me, I will often ignore it and write my own from scratch. I am aware that we would never get anywhere without standing on the shoulders of others, but I frequently just can't bring myself to do it.

Part of it is that I enjoy solving problems, but the real issues for me are quality control and future-proofing.

If I write something, I know how it has been written and how it has been tested. If I haven't developed the code myself, it can be difficult to test it fully. If I miss a bug, data could be lost, users could get annoyed, and the reputation of both me and my project would be damaged.

I am also in control of the future of my own code. If open source only takes me halfway and I have to do a lot of work to customise and extend it, then when the original authors release a new version I either have to fork their project and port their bug fixes across, or go back and redo all of my changes.

Don't get me wrong - I'm not saying that I'm the best programmer out there and that everything everybody else writes is inferior. I know that there is going to be rock-solid code out there written by the best in the field, but there's also going to be bug-ridden, barely tested code written by incompetent amateurs. And it's not easy to tell the two apart - I've got it wrong in the past, and I've been stung many times.

In essence, if I use something written by somebody else, I am making a leap of faith by trusting that a stranger has got it right. But if it goes wrong, it's my problem - I'm the one my customers will see as the incompetent amateur. I see the lack of accountability and responsibility in open source development as a very real price, and one I don't want to end up paying.

I see this as a long-standing problem with open source, and I'm sure I can't be the only person who thinks so. I find myself wondering what, if anything, should be done about accountability in open source. Should there be a more recognised and accepted method of ensuring the reliability of open source code? Perhaps more standardised and publicised internal testing and peer review? Maybe SourceForge should have a mechanism for raising bounties to fix bugs and add features, financially rewarding developers and keeping projects active?

I'll be interested to hear your comments, but for now, back to my closed-source ActionScript :)

Comments

There is a fundamental problem with project forking, and when you take on a project built on top of an open source project, you are pretty much forking that project. The main problem is that if a bug is discovered in one branch, in code common to all of them, there doesn't appear to be a good way of passing information about the fix to every branch that contains that code. Essentially, when you fork an open source project, you're taking on the development of all the existing code underneath as well as what is new. As you say, you either port bug fixes and patches from the original version, or you assume that the version you forked is perfect. Obviously the second of these isn't true, so it has to be the first.

A good compromise would be to port all the patches that relate to the particular version of the project you forked (1.4.1, for example). If there's a major new release (1.5 or 2.0), you either ignore it or try to slot it in underneath what you've written and hope it doesn't break anything. Sure, you'll miss out on functionality that newer versions offer, but as long as the old version is fairly stable, it shouldn't require much changing.
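To illustrate what porting those patches involves, Python's standard `difflib` can turn an upstream point release into a unified diff. You would then review that diff and apply it to your fork. This is a minimal sketch: the file name, its contents and the 1.4.1/1.4.2 labels are hypothetical.

```python
import difflib

# Hypothetical contents of the same upstream file in two point releases.
upstream_1_4_1 = [
    "def parse_date(text):\n",
    "    parts = text.split('-')\n",
    "    return parts\n",
]
upstream_1_4_2 = [
    "def parse_date(text):\n",
    "    parts = text.strip().split('-')\n",  # the upstream bug fix
    "    return parts\n",
]

def upstream_fix(old_lines, new_lines, name="parser.py"):
    """Return the change between two upstream releases as unified-diff text."""
    diff = difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile=f"1.4.1/{name}",
        tofile=f"1.4.2/{name}",
    )
    return "".join(diff)

patch = upstream_fix(upstream_1_4_1, upstream_1_4_2)
print(patch)
```

A tool such as GNU `patch` can apply the resulting text to your fork. It applies cleanly only where the context lines in your fork still match upstream's. Otherwise you are back to porting the change by hand.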

I know what you mean overall. I got offered the chance to take on a project that involves coding over the top of something open source today. I turned it down, because it wouldn't have been interesting enough.

You're already building on top of open source though - you use Perl, PHP, Apache and Linux, all of which live outside your control.

I've always based my selection of open source components on the health of the community around the product. If there's an active mailing list with lots of other people using it successfully, it's probably a safe bet.

For code that is reused at a higher level, the key thing I look for is a clearly defined API with a promise of backwards compatibility in future releases. That minimises the chance of problems after updating to a later release.
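One common way a library keeps that kind of promise is to leave the old entry point in place as a thin wrapper that emits a `DeprecationWarning`. Existing callers keep working while they migrate. Below is a minimal sketch using Python's standard `warnings` module. The names `fetch_feed` and `get_feed` are hypothetical and do not come from any particular library.

```python
import warnings

def fetch_feed(url, timeout=10):
    """Current API (hypothetical): the name and signature callers should use."""
    # A real library would do network I/O here; this sketch just echoes its inputs.
    return {"url": url, "timeout": timeout}

def get_feed(url):
    """Old API, kept for backwards compatibility; delegates to fetch_feed."""
    warnings.warn(
        "get_feed() is deprecated; use fetch_feed() instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this wrapper
    )
    return fetch_feed(url)

# An existing caller still gets the same result, plus a warning it can act on.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = get_feed("http://example.com/rss")

print(result, caught[0].category.__name__)
```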

These days, I tend to look for projects with a comprehensive set of unit tests that I can run myself - although I have to admit I haven't yet become disciplined enough to use them for all of my own development. The 2000 unit tests that come with Mark Pilgrim's feedparser library (http://feedparser.org/) are a fantastic indication of programming quality and of a commitment to API stability in the future.

Andrew: It's a difficult one. You could diff your changes against the original and try to apply them to future versions, but if the original changes too much you'd have to go through and make your changes again manually. I started writing an application to keep track of these changes, but then realised that the changes to the original would simply be too large to pick up on without some intelligence. Unless the original project's author can foresee every possible modification and offer a hooking mechanism for your changes, it seems like an impossible problem to solve.
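The hooking mechanism mentioned above can be sketched as a small registry. Upstream code calls named hook points, and your customisations register functions against those points instead of editing the upstream source. This is a minimal sketch that assumes the original author provides something like it. The names `register`, `run_hooks`, `save_document` and `"before_save"` are hypothetical.

```python
from collections import defaultdict

# Registry mapping hook name -> list of callbacks, in registration order.
_hooks = defaultdict(list)

def register(hook_name, func):
    """Attach func to a named hook point (called from downstream code)."""
    _hooks[hook_name].append(func)

def run_hooks(hook_name, value):
    """Pass value through every callback registered for hook_name.

    Each callback receives the current value and returns the (possibly
    modified) value, so several customisations can be chained.
    """
    for func in _hooks[hook_name]:
        value = func(value)
    return value

# --- upstream code: exposes a hook instead of hard-coding behaviour ---
def save_document(text):
    text = run_hooks("before_save", text)
    return text  # a real project would write to disk or a database here

# --- your code: customises behaviour without touching save_document ---
register("before_save", str.strip)
register("before_save", lambda t: t.replace("\r\n", "\n"))

print(repr(save_document("  Hello\r\nworld  ")))
```

Because your changes live in your own module, a new upstream release only breaks them if the hook points themselves change.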

Simon: Things like Perl and Apache have large enough communities that I feel any bugs will be noticed and fixed, but even so, I am judging the quality of the project by its popularity, not its development procedures. For large projects with a massive following this method of QA is not a problem, but perhaps it is something that smaller projects should address, and that the people who use them should consider.

More importantly, though, I am only using their source code as a platform to build upon, and we are separated by a well-defined API, so I don't worry so much about that. My main concern is when I'm going to use open source code as part of a project itself; in fact, what prompted this entry was something you linked to, the Dojo WYSIWYG HTML editor.

A decent editor is a significant part of the new CMS framework I'm working on - in fact, it will probably be the biggest change from my current CMS that my existing tech-illiterate clients will see. As a result I've looked at what's around and haven't found anything suitable, so I have been developing my own editor on and (mostly) off for a few months now. The Dojo editor does at least some of what I want my editor to do, but in the minute or two that I played with it, I spotted some significant bugs, which suggests a lack of attention to detail in the development process.

The other reason I won't be using it is that it's part of a larger system that I don't need or want to use. I'd have to do a lot of work to get it integrated into my system, and I'd need to extend it to get all of the functionality I will require. Although they've got a good API that may well not change, I'd need to extensively rewrite their internal code, so you could say I'd be making a fork. However, with the amount of work that I'd probably have to do, I may as well just write my own editor from scratch - at least that way I'd know how it works, what to test, and where to go to fix it. Actually this leads me on to another question, but I'll ask that in another entry when I have more time.

The feedparser tests are fantastic - more of this! I've never seen anything like that on an open source project, but it's a perfect example of what I was thinking was needed. I'd feel confident using that in whatever I needed it for. I'd like to see more projects documenting their testing, even if it's just a table to say what they tested. Although for some things, any documentation would be a start :)

Mark (Swannie)

Over 2000! Holy fricking crap!

I need to look at that to see how he's organised them.

I seem to be repeating myself *a lot* in my unit tests. It looks like I might be writing about 150 unit tests for my current component at work. Unit tests are great, but I seem to be testing the same things for quite a few methods. And it's not just me who has noticed; some more senior members of my team have too.
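One way to cut that repetition is to write the shared check once and loop it over the methods it applies to. The `subTest` context manager in Python's standard `unittest` keeps each case reporting separately. This is a minimal sketch: the `TextTools` class and its "reject empty input" rule are hypothetical stand-ins for whatever check is being repeated.

```python
import unittest

class TextTools:
    """Hypothetical component: every method rejects empty input."""

    def word_count(self, text):
        if not text:
            raise ValueError("empty input")
        return len(text.split())

    def first_word(self, text):
        if not text:
            raise ValueError("empty input")
        return text.split()[0]

    def shout(self, text):
        if not text:
            raise ValueError("empty input")
        return text.upper()

class TestTextTools(unittest.TestCase):
    # The shared rule is written once and applied to each method by name.
    METHODS = ["word_count", "first_word", "shout"]

    def setUp(self):
        self.tools = TextTools()

    def test_rejects_empty_input(self):
        for name in self.METHODS:
            with self.subTest(method=name):
                with self.assertRaises(ValueError):
                    getattr(self.tools, name)("")

    def test_known_outputs(self):
        # Table of (method, argument, expected result) instead of one test each.
        cases = [
            ("word_count", "a b c", 3),
            ("first_word", "hello world", "hello"),
            ("shout", "hi", "HI"),
        ]
        for name, arg, expected in cases:
            with self.subTest(method=name, arg=arg):
                self.assertEqual(getattr(self.tools, name)(arg), expected)
```

Running `python -m unittest` on the module reports a failing method by name through its `subTest` parameters. Adding a new method to the shared rule is then a one-line change.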

Test-driven development rocks! More TDD in open source, for sure.

Tests are for wusses who doubt their genius at coding.

:p
