A Closer Look at Wiki Authorship

Feb 5, 2009   //   by Mike McMurray   //   Code, Interesting Stuff

Jeff Atwood takes an interesting look at the history of changes to wiki pages and the balance between opinion and fact. The larger wikis (e.g. Wikipedia) hold a huge amount of data about page edits, and Jeff’s article also highlights an IBM study on how the more popular Wikipedia content evolves over time.

There’s also a comment about one of my favourite subjects: reading too much into statistics. Apparently Jimmy Wales (Wikipedia co-founder) looked into who was responsible for most of the article changes and found that 0.7% of users were responsible for over 50% of all edits. But an “edit” may be a spelling correction rather than new content or a change to the facts or meaning of an article. As it turns out, the data suggests these hyper-active users are mostly making that kind of small fix, cleaning up after everyone else.

Kisimi uses a basic string comparison function called simple_text() to show the relative difference between two versions of a page. We could also use the Levenshtein function, which gives the minimum number of single-character changes needed to go from string A to string B, but a raw count doesn’t always make much sense for larger content changes. A percentage is easier to read: if someone sees that two versions are 96% the same, it’s obvious that very little has changed.
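
To make the two measures concrete, here is a minimal Python sketch. It is not Kisimi's simple_text() function, whose internals aren't shown here. The percentage comes from the standard library's difflib.SequenceMatcher as a stand-in, and the names similarity_percent and levenshtein, along with the sample sentences, are made up for illustration.

```python
from difflib import SequenceMatcher


def similarity_percent(a: str, b: str) -> float:
    """Relative similarity of two strings as a percentage (0-100).

    Stand-in for a "percent the same" function; uses difflib's ratio,
    which is 2 * matching characters / total characters in both strings.
    """
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes and substitutions
    needed to turn string a into string b."""
    # previous[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


# Hypothetical page revisions: the second one fixes a single typo.
old = "Wikipedia is a free encylopedia that anyone can edit."
new = "Wikipedia is a free encyclopedia that anyone can edit."

print(f"{similarity_percent(old, new):.1f}% similar")
print(f"{levenshtein(old, new)} edit(s) apart")
```

For these two revisions the Levenshtein distance is 1 (one inserted letter), which is clear enough for a typo fix. For a rewritten paragraph the distance might be several hundred, a number that means little without knowing the page length, whereas the percentage stays readable either way.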