Please read the earlier blog article for more on ratings.
I have been talking about my work on simulating competitive games and rating systems with anybody who might listen. These conversations have strengthened my conviction that ratings, as a general mechanism for evaluating skill in competitive environments, have far greater potential than most people realize.
For example, I have now heard a number of anecdotal stories about internet games that lost their user base because their competitors had rating systems and they did not. It is clear that rating systems give a competitive advantage in attracting players. Ratings are also used in more subtle ways. For example, Slashdot lets you rate stories and even lets you rate how other people rated stories. Google and Wikipedia both now have mechanisms for rating the quality of the content they show to the user.
Ratings are everywhere. Employees are rated by their coworkers and bosses. Teachers use tests to give ratings (grades) to their students. You can get ratings for colleges, cities, police departments, hotels, restaurants, and practically any activity performed by humans where there are competitors. Ratings are also used in finance. Bonds are rated by rating agencies, companies have market capitalizations, currencies have their current exchange rates, and so on.
This brings me to my main point. Most rating systems out there are far from optimal. One of the big problems is that they ask an authority to give absolute ratings. This is problematic for two reasons. The first is that authorities are in limited supply and can have their own biases. The second, and larger, issue is that it is hard to assign an absolute rating without reference to the competitors. Criteria and tests can help, but they can still fail to adequately discriminate between two competitors.
The movie “The Social Network” portrays Mark Zuckerberg creating a web site where students rate which of two girls they think is better looking. In the movie, the actor portraying Mark argues that asking students for absolute ratings will fail because it is so difficult for students to apply a rating scale consistently. Instead, Mark uses an Elo-style chess rating solution: two girls are shown side by side and the viewing student is asked to choose which is better looking. It appears that this was a highly successful approach.
I believe that this idea has untapped potential. For example, when choosing which movie should get the Oscar for best picture, a “head-to-head” approach could be used: members of the Actors' Guild could each be asked to judge, for about 50 randomly selected pairs of films, which film in the pair is better. When making the judgement, you could also choose between “slightly better”, “clearly better”, and “far superior”, and that choice could determine the K-factor used when applying rating adjustments. I believe that this would produce a more accurate consensus pick for best film than the current approach.
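The Elo-style update with a margin-dependent K-factor could be sketched as follows. This is a minimal illustration, not a proposal for the exact formulas: the 400-point logistic scale is the standard chess convention, and the specific K-factor values attached to each judgement strength are hypothetical.

```python
# K-factor grows with the strength of the judgement.
# These particular values are hypothetical, chosen only for illustration.
K_FACTORS = {"slightly better": 16, "clearly better": 32, "far superior": 48}

def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(rating_winner: float, rating_loser: float, margin: str):
    """Return the new (winner, loser) ratings after one head-to-head judgement."""
    k = K_FACTORS[margin]
    e = expected_score(rating_winner, rating_loser)
    delta = k * (1.0 - e)
    return rating_winner + delta, rating_loser - delta

# Two films start at the same rating; one is judged "clearly better".
a, b = update(1500.0, 1500.0, "clearly better")
print(a, b)  # 1516.0 1484.0
```

Note that the update is zero-sum: whatever the winner gains, the loser gives up, so the average rating of the pool stays fixed.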
As another example, suppose Slashdot replaced its current moderation system with the following. Instead of the normal “absolute” approach, Slashdot would present you with two randomly selected user comments and ask you which comment is better, how much better, and what makes it better. The same could be done for Wikipedia content and for practically any other web site that offers user-generated ratings. You could be asked which restaurant is better, which hotel is better, which plumber is better, and so on. In many of these cases, the potential selections shown for judgement would have to be limited to ones with which the judge was familiar.
Of course, if the judges tend to be familiar with only one or two of the potential selections, then this approach will not work and a more traditional approach has to be used. Even in that case, the head-to-head idea can be used to evaluate the quality of the evaluations themselves: if the judges have to give written justifications for their judgements, those justifications can be judged in “head-to-head” competitive fashion.
An interesting thing happens if you use a “head-to-head” competitive solution to produce your ratings: you get numeric ratings for all the content being judged, not just a relative ranking. For example, if this were done for hotels, it might turn out that the top three hotels are very close in rating, while the next tier shows a large drop-off. If I were looking to book a good hotel, I might judge that the top three are close enough in rating that other criteria, such as price, location, and convenience, become more important.
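To see how numeric tiers emerge from nothing but pairwise judgements, here is a small simulation sketch. The hotel names, hidden “quality” numbers, judge noise model, and K-factor are all made up for illustration; the point is only that the Elo ratings recover the tier structure without anyone assigning an absolute score.

```python
import random

random.seed(0)

# Hidden "true quality" that judges can only perceive noisily.
# Hotels A, B, C form a close top tier; D and E trail well behind.
quality = {"A": 90, "B": 89, "C": 88, "D": 70, "E": 60}
ratings = {hotel: 1500.0 for hotel in quality}

def expected(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

for _ in range(5000):
    a, b = random.sample(list(quality), 2)
    # A crude noise model: the judge prefers the higher-quality hotel
    # with probability that grows with the quality gap.
    p_a_wins = 0.5 + (quality[a] - quality[b]) / 100.0
    winner, loser = (a, b) if random.random() < p_a_wins else (b, a)
    e = expected(ratings[winner], ratings[loser])
    delta = 24 * (1.0 - e)
    ratings[winner] += delta
    ratings[loser] -= delta

# The printed ratings should show three hotels bunched together at the
# top, then a clear drop-off, mirroring the hidden quality values.
for hotel, r in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(hotel, round(r))
```

A ranking alone would only say A, B, C, D, E; the numeric ratings additionally reveal that the gap between C and D is far larger than the gaps within the top three, which is exactly the information a hotel booker would want.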