Monday, December 1, 2008

Problems of Data Portability and Data Storage

When I was in junior high, I had a teacher who kept the following quotation posted on the bulletin board:

"The journey of a thousand miles begins with a single step." (Confucius)

For months I have been thinking..."today is the day I start blogging"...but I never quite found the time. What was my catalyst? A good post by Tim O'Reilly on O'Reilly Radar about why he likes Twitter. One line in particular captivated me, because it echoes something I have believed for a long time, ever since reading a very interesting book called "The Humane Interface" by Jef Raskin. The basic premise is this: software should not be packaged as a predefined group of functions that never changes. We live in the age of web APIs, which essentially harkens back to one of the original motivations of UNIX...the pipe. UNIX programs (on Linux and Mac today as well) can send the output of one program to another program as input (assuming the data arrives in the format the receiving program expects).
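
To make that concrete, here is a minimal sketch of the pipe idea driven from Python. It recreates the shell pipeline `sort names.txt | uniq -c`; the file name names.txt is made up for illustration, and sort and uniq are the standard UNIX tools:

```python
import subprocess

# The output of the first program becomes the input of the second,
# exactly as in the shell pipeline `sort names.txt | uniq -c`.
sort_proc = subprocess.Popen(["sort", "names.txt"],
                             stdout=subprocess.PIPE)
uniq_proc = subprocess.Popen(["uniq", "-c"],
                             stdin=sort_proc.stdout,
                             stdout=subprocess.PIPE)
sort_proc.stdout.close()  # so sort gets SIGPIPE if uniq exits early
output, _ = uniq_proc.communicate()
print(output.decode())
```

The programs know nothing about each other; they only agree on a data format (lines of text). That agreement is the whole contract, which is exactly why the format matters so much.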

You may be asking...ok, that's nice, but how does that affect me? Good question. Ever since we started using technology to manipulate data, keeping that data alive after the technology dies or becomes obsolete has been a very serious problem. I remember watching something on PBS many years ago about exactly this. Think about the evolution of data storage (in no formal order): old records were stored on microfilm, then came punch cards, tape drives, hard drives, compact discs, solid state memory (such as a flash drive), and so on. On top of physically storing the data, you also have to be able to run the software that understands the format the data was saved in. This problem is better known as data portability. Many people faced it early on when moving between video game consoles (i.e. "my favorite game will not play unless I keep my old console") or with old DOS programs that saved data in some odd binary format that is difficult to reconstruct with another program.
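
The antidote to those odd binary formats is saving data in an open, self-describing form. Here is a small sketch of the idea in Python; the record, the file name, and the choice of JSON are just one illustration, not a prescription:

```python
import json

# A hypothetical address-book record. Saved as an opaque binary blob,
# only the original program can read it; saved as plain JSON, any
# future tool can reconstruct it.
record = {"name": "Ada Lovelace", "phone": "555-0100"}

with open("contacts.json", "w") as f:
    json.dump([record], f, indent=2)

# Years later, a completely different program can still recover it:
with open("contacts.json") as f:
    contacts = json.load(f)
print(contacts[0]["name"])
```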

Again, so what's the point? Data needs to be free from the program that created it. Functions need to be free from the software application. The Internet is at a point where you need to commit to a platform in order to share and manipulate data. This is most visible with social networking sites like Facebook and proprietary business systems like Salesforce. There are ways to move data from one software application to another, but you really need to be a programmer to do that well (see the sketch below). The question is-- do you have control of your data, to manipulate it, share it, and present it the way you want? Another question is-- do you have control over access to your data? One of the secrets of the social Internet is that you have to sacrifice privacy in order to participate in this new world AND you have to trust the people who hold that data. To make an interesting (but maybe not totally appropriate) analogy: imagine all of your belongings were kept not in the closets and drawers of your home but in lockers and garages at a public storage facility. If you knew how to manage all of this yourself, would you really want someone else doing it for you?
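
For instance, here is roughly what "being a programmer" buys you: a few lines that pull your records out of one platform's web API and keep a portable local copy. The URL, token, and endpoint below are entirely hypothetical, made up to show the shape of the task:

```python
import json
import urllib.request

# Hypothetical endpoint and token, for illustration only.
url = "https://api.example.com/v1/me/contacts?token=YOUR_TOKEN"
with urllib.request.urlopen(url) as response:
    contacts = json.loads(response.read().decode("utf-8"))

# Once the data sits in an open format on your own disk, any other
# program (or platform) can pick it up from here.
with open("my_contacts.json", "w") as f:
    json.dump(contacts, f, indent=2)
```

If you cannot write something like this yourself, your data stays wherever the platform decides to keep it.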

I am in the process of creating an open-source peer-to-peer database and programming environment called Kaleidoscope. I feel that it is going to be a first step toward solving this problem. How does this solution get put together and become useful to you...well...you will have to wait for the next post :)