NOTE: After I posted this I realised I may not have been clear enough about what I meant by "data independence". I am referring to data independence between software programs, not the established field of research dealing with data in DBMSs. Whoopsie ;)

One issue (amongst others) that many people would no doubt know I feel strongly about after reading this blog is data independence.

From what I understand, data independence deals with how easy it is to move your data from one software program to another while maintaining its integrity. Whether you are moving to another program written by the same company or a different one altogether, moving your data should not only be possible but also painless; in a perfect world of course.
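To make that a little more concrete, here is a minimal sketch in Python of what "moving your data while maintaining its integrity" could look like: export to a plain, open format (JSON here), import it back, and check nothing was lost. The bookmark records, the `format_version` field and the function names are all made up for illustration; they don't belong to any real product.

```python
import json

def export_bookmarks(bookmarks):
    """Serialise bookmarks to a JSON string that any other program could parse."""
    # "format_version" is a hypothetical field so a future importer knows what it is reading.
    payload = {"format_version": 1, "bookmarks": bookmarks}
    return json.dumps(payload, ensure_ascii=False, indent=2)

def import_bookmarks(text):
    """Read bookmarks back out of an exported JSON string."""
    payload = json.loads(text)
    return payload["bookmarks"]

def round_trip_ok(bookmarks):
    """True if exporting and re-importing gives back exactly the same data."""
    return import_bookmarks(export_bookmarks(bookmarks)) == bookmarks

# Hypothetical sample data, including a non-ASCII title to exercise the encoding.
sample = [
    {"url": "http://example.com/", "title": "Example", "tags": ["demo", "test"]},
    {"url": "http://example.org/", "title": "Caf\u00e9 notes", "tags": []},
]

if __name__ == "__main__":
    print(round_trip_ok(sample))
```

The round-trip check is the important bit: if what comes back differs from what went in, the export isn't really giving you independence, just a file.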

Data independence is important for several reasons; the most obvious is probably the fostering of competition in the software market. Competition, as with any industry in a market-based economic system, forces software producers to improve their products, lower production costs and in turn not get too greedy in charging for that software. If people can easily move their data from your software to a competitor's, then the onus is on the current producers to keep their software at the cutting edge to keep their customers.

Another reason, which is gaining more traction as the software industry matures, is the idea of future-proofing. It is dangerous to assume the software we'll be using years from now will be able to open all our files; the company creating our software may cease to exist, or may decide not to make future versions backward compatible, and both are chilling possibilities. By ensuring from the beginning that you are using software that allows easy transfer of your data, you have more leverage to use that data with other systems now, and are more likely to be able to in the future.

There is also ample evidence that data independence fosters collaboration. The web itself is built on (X)HTML and its variants, served over HTTP, which allow one coded page to be used on a multitude of devices, operating system platforms and software. No matter whether the website you generated was created in DreamWeaver, Frontpage, GoLive or… nano… other programs can easily access that information. The recent explosion of the RSS and Atom XML standards is another example of this.

Of course freedom for the consumer is rarely what corporations want! By using open standards and allowing for data independence, many companies believe they will have to work harder to keep existing customers; the ability of consumers to move from their system to another is too scary. The Microsoft Office suite (sorry, the Microsoft Office System) and its proprietary "standards" have been and still are a classic example of this paranoia: if they allowed native support of other formats and more data independence, they would have to work hard to make their software not suck, and their market share would no longer be enough to defend their position. Screw that!

However, I am of the belief that promoting data independence can actually help your bottom line; del.icio.us, for example, still leads the social bookmarking space despite letting users export their bookmarks from very early on so they could move to another service. Steve Gillmor, on a 2004 episode of The Gillmor Gang, argued that data independence can actually be a value-added feature in itself, and that the security people feel when using such services would keep them coming back for more.

This then puts open source software at a tremendous advantage; but that's for another post ;).

I'd be really interested to hear what James Ross and Dave Winer would have to say about this issue, or anyone else. Have I got the gist of this? How easy is it to implement data independence measures in your programs? Is it really economically feasible?
