Gyassa Software Blog

Mar 19 2012

Global Reference Tables

This article is about whether certain types of data should be in version control systems and treated as source code or whether they should be in database tables and treated as data. Let us say you are writing a web application which will allow users to input recipes for food. As part of your web application you decide that you need a list of the different types of spices that can be put on food. This table and its related tables would list not only the spices, but also the degree of hotness, a little snippet of the history of the spice, the plant that it comes from, methods of extraction from the plant, the general popularity of the spice, the typical cost of the spice, the regions that favor the spice, the amount normally used for spicing a standard size dish, and tips for usage in creating dishes. The entries in this table and its companion children tables would be referenced by recipes, by users reporting about the usage of spices in message boards, by functionality in the web site that determines the hotness of a dish based on the amount of various spices in the dish, by recommendations of spices for different types of dishes, by lists of spices for different regions of the world, and so on. In other words, the table and its associated children tables are an example of a Global Reference Table. I use the singular when referencing a global reference table because the children tables are in some sense owned or subsumed by the root table.

The problem is this: where is this global reference table stored? The developers writing the application don't know much about spices and do not believe that they have the expertise to create an authoritative list of spices, so they know they need input from experts who are knowledgeable about spices. There are two solutions. The first is to put the spice list into database tables and create a user interface to allow administrators or trusted users (the so-called experts) to add new spice entries and all the information associated with a spice. The second is to hardwire the choices into tables in a source code file (preferably RDD formatted tables).

The natural urge is to go with the first approach and put the data into database tables because it allows the developers to punt on the general problem of coming up with the list. I believe this is wrong. As counter-intuitive as it might sound, I believe it is better to hardwire the data into source code controlled text files.

Source code control systems (such as “git”) give you more than you might first think. For example, suppose you wanted to add some new spices to support Vietnamese specialties which you were going to feature on your food web site with a new Vietnamese recipe designer mobile app. You would like the new spices and the new functionality for Vietnamese specialties to go out at the same time on your web site. If both were in source code, it would happen naturally as part of your deployment process. Source code versioning systems give you natural large scale synchronized distributed transactions on the behavior of your web site. In sophisticated software development environments, the whole build and deployment process leverages this feature of a source code control system to create targeted builds and deployments that control precisely what you want to have appear in your application.

On the other hand, if the spices were in a database that could be changed by administrators and privileged users, then you would lose the ability to have simple synchronized deployment of new features. Also, if you have demo, test, development, and staging versions of your application which use their own databases, every new spice that had associated code functionality would need to have controlled propagation to all these environments. For example, both the Component Evolution and the Staging to Production problems that create such pain for most mature applications would require quite complex additions to your application to solve. In addition, you could get into serious trouble if some of the administrators and/or trusted users added new spices that anticipated the changes coming from the development group. In that case, you would have colliding definitions and difficulties resolving conflicts. That is another thing version control systems do very well: resolving conflicts from multiple developers.

Suppose you buy the argument that the spice tables should go into a version controlled text file that is bundled with the deployed application. What if you want to let administrators or trusted users add new spices or modify existing entries? My answer is to turn these contributors into developers. Not all version controlled development activity needs to be done using a text editor or involve writing logical behavior. This is well understood for the artists who create the images and styling of your web pages. I believe a similar approach should be used for controlling the content of your global reference tables.

There is no reason why a web interface that would normally target a database table cannot have proposed modifications captured into a version controlled “diff file”. You compare the table as it is in source code and the proposed new table as desired by a contributor and capture the difference, but unlike with regular file differencing, you capture differences at the table cell level and not by comparing lines of text. When a user saves their changes, they are saved into a “diff” file and committed to a source code control system automatically by the web server that displays the page. Using the RDD model, the “diff files” would be runtime overlays of the existing global reference data. Generally, a group of administrators or trusted users would have a common diff file that they would be editing so that you would not have a proliferation of such files. In some cases, if you wanted to isolate certain changes from the main branch of changes, you could have some administrators and trusted users creating a separate new diff file that was overlaid on top of both the base source file and all the other active diff files. If the contributors wanted to have a coordination of propagating the changes in multiple different global reference tables, they could put diff files being applied to different tables into a single component.
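The cell-level differencing described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for the example: the in-memory table layout (a dict of rows keyed by spice name), the function names, and the sample values are hypothetical and are not taken from the RDD format itself.

```python
# Hypothetical layout: a table is {row_key: {column: value}}.
# REMOVED marks a cell that exists in the base table but not in the proposal.
REMOVED = object()

def diff_tables(base, proposed):
    """Return {(row_key, column): new_value} for every cell that differs."""
    changes = {}
    for row_key in base.keys() | proposed.keys():
        old_row = base.get(row_key, {})
        new_row = proposed.get(row_key, {})
        for column in old_row.keys() | new_row.keys():
            old = old_row.get(column, REMOVED)
            new = new_row.get(column, REMOVED)
            if old != new:
                changes[(row_key, column)] = new
    return changes

def apply_overlays(base, *overlays):
    """Return a new table with each overlay applied in order; base is untouched."""
    result = {key: dict(row) for key, row in base.items()}
    for overlay in overlays:
        for (row_key, column), value in overlay.items():
            if value is REMOVED:
                row = result.get(row_key)
                if row is not None:
                    row.pop(column, None)
                    if not row:  # drop rows that have no cells left
                        del result[row_key]
            else:
                result.setdefault(row_key, {})[column] = value
    return result

# Illustrative values only, not real spice data.
base = {"cumin": {"hotness": 1, "popularity": 7}, "cayenne": {"hotness": 8}}
proposed = {"cumin": {"hotness": 2, "popularity": 7},
            "cayenne": {"hotness": 8},
            "star anise": {"hotness": 1}}

overlay = diff_tables(base, proposed)
merged = apply_overlays(base, overlay)
```

In this sketch the overlay holds only the changed cells, so it is the piece that would be serialized and committed as the "diff file", while `apply_overlays` stands in for the runtime step that layers one or more such files over the base table.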

If, after a certain period of time, the changes by the administrators and trusted users gained widespread acceptance, the core group of developers could roll the diff files into the main source file for the reference data, much in the same way they would merge in the changes from an offshoot branch of their main development tree. If a bundle of diffs happened to be captured into a single component, it would make the merging process simpler and easier to understand and control.

In a sophisticated version of this solution, the creators of these “diff” files could view and analyze the history of the changes in the data much as they would in a Wiki site where a user can view the history and changes in the content of a Wiki page. In theory, they would even be free to write suggestions for proposed changes or write comments about the current table content. A very sophisticated solution might let the users edit the tables using an Excel spreadsheet that they can download and upload as they desire, just as it is common for users to copy the contents of their Word documents into the edit areas of a Wiki page when editing online page content. Admittedly it would be a major effort to create this functionality, because the files themselves would still be simple text files under the control of a standard developer-oriented version control system such as “git”. Mapping this functionality to a Wiki page metaphor would require some real labor.

Oct 23 2011

New Job at Redbrick Health

I am now gainfully employed at Redbrick Health. I started a few weeks ago and most of that time has been spent getting familiar with their main software product. For those who are not familiar with Redbrick, it is a company that tries to reduce the health costs of large "self-insured" companies by improving the health behavior of their employees. A "self-insured" company is one that pays for the health costs of its employees as they occur and uses a large health company (such as Blue Cross Blue Shield) only for administration and accounting purposes. Because they pay for the costs directly, these companies have an incentive to reduce the health costs of their employees.

Redbrick helps reduce their costs by using software and coaching programs that encourage employees to do things like eat better, exercise more, take their medications, and see the doctor for needed checkups. What is novel about Redbrick is in how the software works. It takes its cue from some of the social gaming software (Farmville comes to mind) that is out there and tries to make improving one's health a bit of a game where you can earn points and achieve goals on a smaller time interval than is typical for many health programs.

The development environment is Grails and Groovy. It makes for quite a different development approach than I have seen before. Java opened the door on using "reflection" to automatically bind parts of your application together. With Grails and Groovy, reflection is how even the simplest numeric operation is performed and calling a method using reflection is as simple as just making the method call or "pseudo-accessing" an attribute. In Grails, a lot of application implementation happens by Groovy implied reflection "magic". The other big thing in Groovy is "closures" which I have found radically changes the design of most of the software I write.

Because of my new job, my work on ratings is currently on hold. There is quite a bit of new code in it and it is much more configurable than before. For those who want to see this code in its current raw state please download the zip file.

Aug 30 2011

Evaluating Rating Systems

In my prior blog I suggested that head to head comparisons should be used to create numeric “chess-like” ratings, using the Elo Rating Model as a theoretical basis for computing ratings for the various goods and services being evaluated. But this brings up a question. How do you determine whether such a rating system is truly better?

In the multiplayer game simulation I wrote, it is quite easy to determine how well the rating solution is performing. The software knows the “true skill” of each player and can tell you precisely how well the differences in numeric ratings of two opponents reflect their true skill differences.

But in a real game, you do not know the actual skill of the players involved, so you need to find some other way to determine how well the rating system is working. If this were a simple problem in sampling a random variable, then typically one would calculate the variance of a computed estimate as compared to the observed values by taking the difference between the estimated value and an observed value, squaring it, and then taking the mean of these squares. If we were to do this for a rating system, we would take the rating difference between the two players (or goods or services being compared) and use that to predict the percentage chance of the first player winning. That would be the estimated value for the result of the match. The match would then have a result of 1.0 if the first player won, 0.0 if the second player won, and 0.5 if there was a draw. We would then take the estimated value for the result of the match, subtract the actual result (1.0, 0.0, or 0.5), square the difference, and take the average of all the squares computed this way.

But this turns out not to work well. The rating difference for the players is only an estimated difference in ratings; their true difference can only be known if the true skill values of the players are known. This estimate of the rating difference would behave like a standard statistical random variable, and standard variance computations could be used for judging its accuracy, if we had a way of sampling the actual rating differences. But the estimated value for the winning percentage is a complicated function of the rating difference, which means the random variable for the winning percentage with respect to the rating difference does not follow a standard probability distribution. How do we fix this?

It turns out that Jeff Sonas has already done this work and you can find the formula at http://www.kaggle.com/c/ChessRatings2/Details/Evaluation. The formula for a variance term coming from a single resolved match (the equivalent of one of the squares in the standard variance computation) is:

 -[Y*LOG10(E) + (1-Y)*LOG10(1-E)]

where LOG10 is the logarithm base 10, E is the estimated winning percentage, and Y is 1.0, 0.0, or 0.5 depending on who actually won the match. The actual variance value is the mean of the variance terms computed for all the matches played.
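As a minimal sketch of the computation just described, the Python below evaluates the per-match term and its mean. Two things are assumptions added for the example: the conversion from a rating difference to the estimate E uses the standard Elo logistic curve with a 400-point scale, and the small `EPSILON` clamp is a safeguard so that the logarithm stays finite. Function and variable names are hypothetical.

```python
import math

EPSILON = 1e-12  # assumed safeguard: keeps E away from exactly 0 or 1

def expected_score(rating_a, rating_b):
    """Standard Elo estimate of the first player's winning percentage (assumed here)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

def match_term(estimate, result):
    """-[Y*LOG10(E) + (1-Y)*LOG10(1-E)] for a single resolved match."""
    e = min(max(estimate, EPSILON), 1.0 - EPSILON)
    return -(result * math.log10(e) + (1.0 - result) * math.log10(1.0 - e))

def rating_variance(matches):
    """Mean of the per-match terms.

    matches is a list of (rating_a, rating_b, result) tuples, where result
    is 1.0, 0.5, or 0.0 from the first player's point of view.
    """
    terms = [match_term(expected_score(a, b), y) for a, b, y in matches]
    return sum(terms) / len(terms)

# Illustrative ratings and results only.
example = [(1600, 1500, 1.0), (1500, 1500, 0.5), (1450, 1700, 0.0)]
value = rating_variance(example)
```

Note how a confident but wrong prediction is penalized heavily: an estimate of 0.9 for a player who then loses contributes a term of about 1.0, while the same estimate for a player who wins contributes less than 0.05.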

I go a little further than this in my use of this variance computation. I compute the best possible variance assuming that for each match, the result is one that minimizes the variance term. So if the first player actually won the match, but the variance term would be smaller if the opponent won, then the computation for best possible variance would use that value instead. To compute an absolute variance value I take the variance value on actual results as proposed by Jeff Sonas and subtract this best possible value. In the simulations I have performed, a value of 0.1 or less indicates a well functioning rating system.
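One way to read the "absolute variance" described above is sketched below in Python. It is an interpretation, not the code used in the simulations: for every match it picks whichever of the three possible results gives the smallest term, averages those minimums, and subtracts that average from the variance on the actual results. The function names are hypothetical, and estimates are assumed to lie strictly between 0 and 1.

```python
import math

POSSIBLE_RESULTS = (1.0, 0.5, 0.0)  # win, draw, loss for the first player

def match_term(estimate, result):
    """-[Y*LOG10(E) + (1-Y)*LOG10(1-E)]; estimate must be strictly inside (0, 1)."""
    return -(result * math.log10(estimate)
             + (1.0 - result) * math.log10(1.0 - estimate))

def absolute_variance(predictions):
    """Variance on actual results minus the best possible variance.

    predictions is a list of (estimate, result) pairs.
    """
    count = len(predictions)
    actual = sum(match_term(e, y) for e, y in predictions) / count
    best = sum(min(match_term(e, y) for y in POSSIBLE_RESULTS)
               for e, _ in predictions) / count
    return actual - best

# Illustrative estimates and results only.
score = absolute_variance([(0.9, 1.0), (0.6, 0.0), (0.5, 0.5)])
```

With this reading the result can never be negative, and it is zero only when every match went the way the ratings favored (or the estimate was exactly 0.5).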

So how do we use this formula for other rating systems besides games? Take the example of ranking the best 100 films of the last decade, and suppose a group of experts is consulted as the basis for creating this ranking. You evaluate the ranking by randomly selecting two films from the best 100 films and asking one of the experts to judge which is better. You do this many times, with all the experts getting an equal number of head-to-head film pairs to judge. The current ranking predicts which film should be judged better, and the percentage chance that the expert will confirm this prediction can be estimated from how far apart the two films are on the top 100 list. For instance, each difference of one position could be declared to be 10 Elo rating points, and this conversion can be varied until the variance computation is minimized. The chess evaluation variance described above can then be used to rate the quality of the top 100 list.
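The tuning step in the film example can be sketched as a small search over the number of Elo points assigned to each position on the list. This is a hedged illustration only: the standard Elo logistic curve, the candidate range of 1 to 50 points per position, and all function names are assumptions made for the example.

```python
import math

def expected_from_ranks(rank_a, rank_b, points_per_rank):
    """Chance that film a is judged better; rank 1 is the top of the list."""
    rating_diff = (rank_b - rank_a) * points_per_rank
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

def list_variance(judgements, points_per_rank):
    """Mean of -[Y*LOG10(E) + (1-Y)*LOG10(1-E)] over all expert judgements.

    judgements is a list of (rank_a, rank_b, result) tuples, where result is
    1.0 if the expert preferred film a and 0.0 if the expert preferred film b.
    """
    total = 0.0
    for rank_a, rank_b, y in judgements:
        e = expected_from_ranks(rank_a, rank_b, points_per_rank)
        total += -(y * math.log10(e) + (1.0 - y) * math.log10(1.0 - e))
    return total / len(judgements)

def best_points_per_rank(judgements, candidates=range(1, 51)):
    """Pick the candidate conversion factor that minimizes the variance."""
    return min(candidates, key=lambda p: list_variance(judgements, p))

# Illustrative judgements only: (rank of film a, rank of film b, result).
sample = [(3, 40, 1.0), (12, 15, 0.0), (70, 20, 0.0), (5, 6, 1.0)]
tuned = best_points_per_rank(sample)
```

The minimized variance for one top 100 list can then be compared with the minimized variance for a competing list judged by the same experts.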

For those who don't like an idea unless it can make money, I believe there is some untapped potential here as well. The same approach could be used to evaluate stock pickers (predict which of two stocks will do better), rating agencies for bonds (predict which of two bonds has a higher chance of default), and formulas for evaluating derivatives (which of two derivatives is more accurately priced). Having a single metric that judges the quality of contributions of various financial experts could have great monetary value.

Aug 23 2011

Wonders of Ratings

Please read the earlier blog article for more on ratings.

I have been talking about my work on simulating competitive games and rating systems to anybody who might be interested. From these conversations, I have been getting a growing conviction that ratings as a general mechanism for evaluating skill in competitive environments have a much greater potential than I think most people realize.

For example, I have now heard a number of anecdotal stories about internet games that lost their user base because their competitors had rating systems and they did not. It is clear that rating systems give you competitive advantages for attracting players. Ratings are also used in more subtle ways. For example, Slashdot lets you rate stories and even lets you rate how other people rated stories. Google and Wikipedia both now have mechanisms for rating the quality of the content they show to the user.

Ratings are everywhere. Employees are rated by their coworkers and bosses. Teachers use tests to give ratings (grades) to their students. You can get ratings for colleges, cities, police departments, hotels, restaurants, and practically any activity performed by humans where there are competitors. Ratings are also used in finance. Bonds are rated by rating agencies, companies have market capitalizations, currencies have their current exchange rates, and so on.

This brings me to my main point. Most rating systems out there are far from optimal. One of the big problems is that they ask an authority to give absolute ratings. This is a problem for two reasons. The first is that there is a limited supply of authorities and authorities can have their own biases. The second, and larger, issue is that it is hard to assign an absolute rating without reference to the competitors. Criteria and tests can help, but they still can fail to adequately discriminate between two different competitors.

The movie “The Social Network” portrays the creation of a web site by Mark Zuckerberg where students can rate which girl they think is better looking than another. In the movie, the actor portraying Mark makes an argument that asking students for an absolute rating will fail because it is so difficult for students to come up with consistently applied rating systems. Instead, Mark uses an Elo chess based rating solution where two girls are selected and shown and the viewing student is asked to choose which is better looking. It appears that this was a highly successful approach.

I believe that this idea has untapped potential. For example, when evaluating films and choosing which movie should get the Oscar for best movie, a “head-to-head” based approach could be used where the members of the Actor's guild could be asked to judge which somewhat randomly selected film is better than another randomly selected film and asked to do about 50 such evaluations each. When making the judgement you could also choose between “slightly better”, “clearly better”, “far superior” and that could be used to determine the K-factor when applying rating adjustments. I believe that this would produce a more accurate consensus pick for best film than the current approach.
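To make the idea of a judgement-dependent K-factor concrete, here is a minimal Python sketch of a single Elo-style adjustment. The specific K values, the starting ratings, and the function names are hypothetical choices for illustration; the expected-score formula is the standard Elo logistic curve.

```python
# Hypothetical K-factors: a stronger judgement moves the ratings further.
K_BY_MARGIN = {
    "slightly better": 8.0,
    "clearly better": 16.0,
    "far superior": 32.0,
}

def expected_score(rating_a, rating_b):
    """Standard Elo estimate of the chance that a is judged better than b."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

def apply_judgement(ratings, winner, loser, margin):
    """Adjust both ratings in place after one head-to-head judgement."""
    k = K_BY_MARGIN[margin]
    surprise = 1.0 - expected_score(ratings[winner], ratings[loser])
    ratings[winner] += k * surprise
    ratings[loser] -= k * surprise

# Illustrative starting ratings only.
ratings = {"Film A": 1500.0, "Film B": 1500.0, "Film C": 1500.0}
apply_judgement(ratings, "Film A", "Film B", "far superior")
apply_judgement(ratings, "Film C", "Film A", "slightly better")
```

Because the winner gains exactly what the loser gives up, the total number of rating points stays constant, and an upset (a lower-rated film beating a higher-rated one) moves the ratings more than an expected result does.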

As another example, suppose Slashdot replaced their current moderating system with the following. Instead of the normal “absolute” approach, Slashdot would present you with two randomly selected user comments and ask you which comment was better, how much better, and what made it better. A similar thing could be done for Wikipedia content and for practically any other web site that offered user generated rating systems. You could be asked which restaurant is better, which hotel is better, which plumber is better, and so on. In many of these cases, the potential selections that are shown for judgement would have to be limited to selections with which the judger was familiar.

Of course, if the judgers tend to be familiar with only one or two of the potential selections, then this approach will not work and a more traditional approach has to be used. In that case, this approach can be used to evaluate the quality of the evaluations made by the judgers. If the judgers have to give written justifications for their judgement, these justifications can be judged in “head-to-head” competitive fashion.

An interesting thing occurs if you use a “head-to-head” competitive solution for producing your ratings. You get numeric ratings for all the content being judged not just a judgement of the relative ranking. For example, if this was done for hotels, it might turn out that the top three hotels might be very close in ratings, while the next tier has a large drop off in ratings. If I were looking to book a good hotel, I might judge that the top three are close enough in rating that other criteria such as price, location, convenience might become more important.

Jun 22 2011

Rated Multiplayer Competitions

In my idle time, I have been playing various computer games. Some of these games (such as World of Warcraft) feature randomly assembled teams that compete against each other. After playing these team player vs. player (PVP) games for a while, I became frustrated at the total randomness in the quality and skill of the players that were on a team. Elite players dominated and bad players could profit by doing nothing and still be on a winning team.

This frustration brought back memories of other team sports that I have participated in which had similar issues. Years ago I dreamed that such issues might be solved by creating a rating system similar to the one used for chess and applying it to members of team competitions. It would be great to go to a pick up game of touch football or Ultimate Frisbee and know that your teammates were of similar skill and aptitude. But I knew such dreams were fantasies because even if a rating system were possible, there was no way it would ever be applied to real sports. The overhead of maintaining information on all the players (and making players use it) would be quite cumbersome and most players would not play enough games so that a proper rating could be given to them.

However, online computer games do not have these problems. It is easy to track the results of players and it is even possible to track performance data so that it is possible to determine which player contributed more to the team effort. Also, dedicated players of computer games tend to play a lot more team matches, usually thousands of team matches a year. In this case, a real multiplayer rating system (using the Elo system that is used for chess as the base) might be quite possible. How could I demonstrate this? One way would be to create a simulation of players and team matches and then show that a rating system would correctly determine the relative skill of the players to some degree of accuracy. This turned out to be a large effort and the results produced many interesting facts, many of them unrelated to the number of players on a team. Because of this, I decided to write an extensive document about what I had done.

Feb 03 2011

Runtime Data Differencing

When a team of developers writes code for a fairly large project, they typically use a version control system for their source code. One of the standard uses is to create branches and migrate changes in one branch to another branch. In sophisticated version control solutions such as git, recent changes to one branch can be captured as a package of diffs and then applied to other branches of the code being developed by other groups of users.

The idea behind Runtime Data Differencing is to apply the diffs at the time the code is executed and not at the time the application is built from source code. This brings some of the advantages of source code control to the problems of complicated application deployments. In particular it can help solve the Component Evolution and the Staging to Production problems. It can also allow an administrator who performs complex administrative tasks for an application to get some of the advantages of a source control system for the administrator's changes.

The focus of the solution is on tables and overlaying tables from a Component on top of the configuration or runtime data of an application. This approach forces a change in how the application is designed and written, but with great advantages gained as the applications evolve over the years.

Sep 12 2010

Why Gyassa?

When we were trying to choose a name for our new company we found that it is quite difficult to find a novel and interesting name that is not already taken. When we looked through the names that were already taken, we found that nautical names and terms seemed especially appealing because they were not overly specific but still had implications of "going places" and "doing useful things". 

In our search we found an attractive picture of a sailing vessel that traveled on the Nile called the Gyassa (or alternately Gaiassa). The boat has lateen sails and is used mostly for cargo. We believe that it sends the right message for our company. We want to create software that is useful and depended on for daily work activities. ...

Sep 02 2010

First Post

This is the first post for Gyassa Software, a newly formed entity owned by Samuel White and Eva Cordes. Sam and Eva are the founding developers for the Stellent "Content Management System" (CMS). Three years ago Stellent was acquired by Oracle and the product became the Oracle "Universal Content Management" (UCM) product. As of July 2010, Sam and Eva are independently employed and have started their new company.

Here are some more links on Sam, Eva, and the CMS product.

The Book on the Stellent CMS Product

Documentation on the Workflow Implemented by Eva

Sam White's First Mathematics Article

This website is being run by XWiki, an open source Java software application that lets you create Wiki and Blog web sites. It is our intention to extend and enhance XWiki to do many more things. In particular, we hope to develop a solution for creating independently developed add-on components that can be easily integrated with each other. For more on this please see Runtime Data Differencing. ...

This wiki is licensed under a Creative Commons 2.0 license