RuntimeDifferencing » RDD » Component Evolution Problem » RDD Data Table

RDD Data Table

Last modified by Samuel White on 2011/02/13 21:27

The RDD Data Table is the fundamental flexible unit of application functionality and data storage. For simpler design problems, name and value pairs are sufficient, but for more complex data storage needs, tables are the answer. The RDD Resource Format is much simpler and more compact than XML and uses about the simplest syntax possible for specifying table data. The standard argument against the usage of tables is their awkward support for nesting structures, such as navigation menus or document structure. But those types of problems can be solved by using giving the sub-entities fuller independent definition and using table relationships to create the parent child relation. For more on this, see Unrolling Nested Structures. Although such an implementation is not as natural as XML tag nesting, you get to retain the virtues of tables.

Tables are especially advantageous when doing data overlay where you want to have a component modify data in a base component in a minimally invasive way. The overlay can be done with a table of the same name, but specifying only the primary key value and the values of the columns you want to modify leaving blank the columns you do not want to modify. An equivalent overlay structure for XML is quite awkward, especially with no simple way to designate the matching criteria with out using a complex and awkward looking XML query language. The table overlay implementation also is quick to execute in code because the rows can be stored in a hash table using the primary key as a look up key. With an XML solution, there would be no implicit way to optimize the overlay if there were thousands of rows of data and thousands of rows of overlaid data.

Tables have other natural advantages. Since they use a repetitive structure, data storage can be much more compact. The querying language is much more straightforward and it is easy to optimize simple queries using indexes on the columns of the tables. This is true even if the table data is stored entirely in memory and consists of just a few thousand rows of data. Code that processes tables is much more straightforward to understand and maintain as well. Code that processes nesting models usually gets involved in complex recursive method calls that can be hard to understand, debug, audit, and troubleshoot. If you use tables then certain types of customizations can be taken out of the hands of sophisticated developers and put into the hands of administrators of a software product.

Tables also have a natural way of creating associations between different tables by using joins on columns. There are no natural such constructions in XML to allow clean association descriptions between two different XML data files. The nesting constructs allowed by XML can make processing and specifying such associations ambiguous and complex.

There is one other virtue to using tables as the natural atomic data unit in your code. All the databases that have any real market presence store all their data in tables. By using tables objects in code and doing table manipulations, you can avoid a lot of the messy code that translates from tables to other container structures. The database is no longer a storage solution that you work around but is a first class citizen in your code and the design of your database is considered to be part of the top level design of your application. This approach is especially fruitful when you need to scale your application up to millions of records. You need to follow fairly strict rules when creating queries in such an application and if you application code is not aware of these requirements, it can be tricky to guarantee fast performance in your data retrieval code.

The following is an editorial comment based on opinion and years of experience but its validity is yet to be truly tested. I believe that the reason why all database vendors with large market presence use tables as their primary storage format is not because of good marketing or a failure of implementation by those advocating other approaches. I believe tables are provably the best data storage structure much like the inverse square law for attractive force is provably the best approach for creating stable orbits of planets. Tables along with hash maps are the fundamental building blocks for any data oriented application. It is wrong to write code (or a user interface) that obscures or hides this basic fact. Sometimes there is an over emphasis on business object modeling that looks great on paper but creates inflexible and over engineered solutions.

Tags: