Tuesday, January 6, 2009

Associative and Reactive Data

Last post...which was a long time ago...talked about the importance of data portability by media and by platform. This post talks about how Kaleidoscope can put everything together...hopefully explained in fewer words than the last post (unless you love essays).

Kaleidoscope is designed to put data and function together. People generally express their ideas in language, which implies a text representation. Additionally, things (ideas, objects, etc.) need to be unique within a certain scope, and these things have properties and can be categorized. Think of a building on a city block: it has a street number, and the building can have properties (height=1000 feet, doors=20, color=silver, etc.). Finally, things can relate to other things through some sort of association. To take a different example, Alice and Bill can be related through a "friend" association, where the friend association maps a thing in a person category to another thing in a person category. Mix in a touch of web services to allow for the construction of small programs consisting of small, distributed functions. To the more attuned reader, what I just described is basically an informal description of the semantic web. I used to like using the term "semantic web" until people would either just look at me blankly or check whether I met the criteria for the formal "Semantic Web". I am starting to become concerned that the term "semantic web" is the new "artificial intelligence". Making a long story short...both terms have tremendous hype surrounding them, and anything that falls short of that hype seems like a failure.
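To make the building and Alice/Bill examples a bit more concrete, here is a minimal Python sketch of things, categories, properties, and associations. This is not how Kaleidoscope is implemented; the names `Thing`, `Association`, and `related`, and the ID "block-12/100", are made up purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Thing:
    """An item with an ID that is unique within some scope."""
    uid: str
    categories: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Association:
    """A named, directed link from one thing to another (by ID)."""
    name: str
    source: str
    target: str


# Hypothetical data mirroring the examples in the text.
building = Thing(
    "block-12/100",  # invented ID standing in for a street number
    {"building"},
    {"height_ft": 1000, "doors": 20, "color": "silver"},
)
alice = Thing("alice", {"person"})
bill = Thing("bill", {"person"})

things = {t.uid: t for t in (building, alice, bill)}
associations = [Association("friend", "alice", "bill")]


def related(uid, name):
    """Return the things that `uid` points to through association `name`."""
    return [
        things[a.target]
        for a in associations
        if a.source == uid and a.name == name
    ]


friends_of_alice = [t.uid for t in related("alice", "friend")]
```

In this sketch the "friend" link is one-directional (Alice to Bill); a symmetric friendship would need a second association going the other way.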

You may ask again, why is this necessary? Well, everyone, whether they know it or not, needs a database. Files on your computer are essentially a hierarchical organization of "flat files". It is hierarchical because you have folders of folders that contain files. The files are "flat" because they mostly lack structure (excluding XML files or HTML files). Have you ever noticed how many files you now have on your computer and how it is becoming increasingly difficult to find things? Furthermore, what if you wanted to mix data from individual files together to make something different? Cut and paste is so 1980s. Active objects tied to a platform are so 1990s. People need databases...except not traditional relational databases. Traditional relational databases rely on knowing structure from the start. It's kind of like interconnecting a bunch of columns in different worksheets. If there is a change to the structure of the worksheet, then things can get a little messed up. Newer column-oriented stores like BigTable and HBase (which runs on Hadoop) are one way to deal with large, loosely structured data. The old way of having a row with a unique number and then tying columns to a particular worksheet/table is not going to be as useful as before. The new way (something like Freebase and other RDF stuff) will be to create a unique ID and tag it with categories and property-value tags. In this way, one could have the following types of queries:
  • Find all the items in a category.
  • Find all the items having a property and filter according to a range of its values.
  • Find all properties of a given item.
  • Find all categories in which an item belongs.
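The four queries above can be sketched in a few lines of Python. This is only a toy in-memory store under my own assumptions, not Freebase, RDF, or Kaleidoscope itself; the class `TagStore`, its method names, and the sample items are all hypothetical.

```python
from collections import defaultdict


class TagStore:
    """Toy store: each unique ID is tagged with categories and property values."""

    def __init__(self):
        self.categories = defaultdict(set)   # item ID -> set of categories
        self.properties = defaultdict(dict)  # item ID -> {property: value}

    def tag(self, item, *categories, **props):
        """Attach categories and property-value tags to an item ID."""
        self.categories[item].update(categories)
        self.properties[item].update(props)

    def items_in(self, category):
        """Find all the items in a category."""
        return sorted(i for i, cats in self.categories.items() if category in cats)

    def items_where(self, prop, low, high):
        """Find items having a property whose value lies in [low, high]."""
        return sorted(
            i for i, p in self.properties.items()
            if prop in p and low <= p[prop] <= high
        )

    def properties_of(self, item):
        """Find all properties of a given item."""
        return dict(self.properties.get(item, {}))

    def categories_of(self, item):
        """Find all categories to which an item belongs."""
        return set(self.categories.get(item, set()))


store = TagStore()
store.tag("tower", "building", height_ft=1000, doors=20, color="silver")
store.tag("shed", "building", height_ft=12, doors=1)
store.tag("alice", "person")

buildings = store.items_in("building")
tall = store.items_where("height_ft", 500, 2000)
```

Notice that nothing here requires declaring a table structure up front: a new property or category is just another tag on an ID.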
Another interesting thing can be found here...did you ever notice that I could be describing a programming language? When someone writes a program, one creates objects of a certain category/type and assigns values to different properties of the object, and those values are themselves objects of a certain category/type. Taking the next step, what if your file system was really a giant database with innumerable views? Imagine a word processing document that has different views-- one for appearance/formatting, one for grammatical structure, one for just sequentially listing all of the words and punctuation, one for reviewing/annotating. Now imagine writing a program that can work with one or more of these views. What new information could you generate from the grammatical structure? How could you translate a screen appearance to a spoken word "appearance"? What if there are "database-like triggers" in the system that react when a value is changed or a value is generated? What if these triggers trigger other triggers in a distributed system?
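Here is a rough Python sketch of the "triggers that trigger other triggers" idea. It runs in a single process rather than across a distributed system, and `ReactiveStore`, `on_change`, and the keys "text" and "word_count" are invented for this example, not part of any real system.

```python
from collections import defaultdict


class ReactiveStore:
    """Toy key-value store that runs callbacks when a value changes."""

    def __init__(self):
        self.values = {}
        self.triggers = defaultdict(list)  # key -> list of callbacks

    def on_change(self, key, callback):
        """Register callback(store, new_value) to run when `key` changes."""
        self.triggers[key].append(callback)

    def set(self, key, value):
        # Skip unchanged values so chains of triggers cannot loop forever
        # on the same value.
        if key in self.values and self.values[key] == value:
            return
        self.values[key] = value
        for callback in list(self.triggers[key]):
            callback(self, value)


store = ReactiveStore()
messages = []

# First trigger: a change to the document text regenerates a derived value.
store.on_change("text", lambda s, text: s.set("word_count", len(text.split())))

# Second trigger: reacts to the value generated by the first trigger.
store.on_change(
    "word_count",
    lambda s, count: messages.append(f"document now has {count} words"),
)

store.set("text", "the quick brown fox")
store.set("text", "the quick brown fox")  # unchanged, so nothing fires
```

The second callback stands in for something like sending a text message or reading an e-mail aloud; in a distributed version each callback might be a call to a web service instead of a local function.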

This is the new world without walls. Information and function have no boundaries. The world becomes like a box full of Legos...just put the pieces together (for a similar idea, check out "The Playful World" by Mark Pesce). For example, one could use a word processor to write e-mail or extract the text from an e-mail into a photo or drawing without copy and paste. Imagine receiving a text message from your 3D modeling program saying that your rendering job is done and someone has commented on the final animation. What if your phone rang and a voice read a new e-mail to you?

You can do this now with lots of programming using web frameworks and web services...but what about a system for everyone else? Kaleidoscope can make this type of system accessible to people.