Wikis are becoming popular with business and academia as a way to harvest, archive, and manage knowledge. One of the most popular and well-known wikis is Wikipedia, the online encyclopedia started by Jimmy Wales and Larry Sanger in 2001. Since its inception, much has been written (both pro and con) about Wikipedia; however, Wikipedia is one of the most popular sites on the Internet today. As its popularity increases, more and more “net generation” students will be utilizing its articles as reference sources for academic work. This paper explores the emerging “wiki way” of Web 2.0 tools and highlights the good, the bad, and the management of Wikipedia as an academic reference.
Wikipedia is a rapidly growing phenomenon in the online world of collaborative (Web 2.0) activities. Since the advent of the public Internet, many types of shared activities have been evolving, with massive multi-player online games leading the list of popular activities that have stood the test of Internet “time.” However, a new type of collaborative activity is gaining momentum, the wiki. As per Wikipedia, the online encyclopedia, a wiki is “a web application designed to allow multiple authors to add, remove and edit content” (Wikipedia, 2007b, para. 1).
The word wiki has its roots in the Hawaiian language, deriving from the phrase “awiwi, wikiwiki,” which translates as quick or fast (Hawaiian Dictionary, 2007). Ward Cunningham is widely credited with pioneering the first wiki in 1995 by writing server software that allowed any web page to be edited by any user (Szybalski, 2005). Some wikis work like a library for a document in that users check out the document, modify the document, then check it back in for other users to read, edit, or modify. Thus, collected knowledge is contained within the document as well as archived through saving all previous editions.
College students in 2008 (the so-called Google generation or net generation) have grown up using web based resources and consider them to be a component of daily life (personal observation in the classroom of a 21st century college; Murley, 2008). As such, it is not surprising to see many reference lists stocked with web based articles, Internet sources and hyperlinks. However, is the Web the most authoritative source for academic work? Moreover, is Wikipedia an authoritative academic source given its dynamic nature? If society continues to digitize journals, magazines, and encyclopedias, management of online content and its usage in an academic arena will need to be addressed. This paper explores these ideas and pushes the reader into the world of Wikipedia as an academic reference.
Many academics have strong opinions about Wikipedia, many of them negative. In contrast, Randy Pausch of “The Last Lecture,” who had written for a traditional encyclopedia (World Book) and was familiar with Wikipedia, summed up his experience:
I have not bought the latest set of World Books. In fact, having been selected to be an author in the World Book, I now believe that Wikipedia is a perfectly fine source for your information, because I know what the quality control is for real encyclopedias. (Pausch, 2008, p. 42)
2. Literature Review
There have been many attempts to “measure” entries in Wikipedia for their value as authoritative academic sources. Korfiatis, Poulos and Bokos (2006) pose the metric of “article degree centrality” which is based on links to/from the article in question and has been used in social network analysis and search engine metrics. This quantitative metric “grades” an article in Wikipedia based upon the number of edges (a concept from graph theory, viewed as links in this construct) leading to or from an article. Brandes and Lerner (2007) also use a graph theoretic approach in obtaining a “who revises whom” visual representation of article edits in Wikipedia. This tracking metric attempts to identify vandalism and “edit wars” caused by differences of opinion. The idea of “opinion” being expressed in an article divides article types into two classifications:
scientific articles (e.g. linear equations, Shannon entropy, colony collapse disorder)
(Brandes and Lerner, 2007; Halavais and Lackaff, 2008)
Although opinion can play a part, scientific articles deal primarily in known facts. Controversial articles (see the Sarah Palin edits on Wikipedia, which resulted in Wikipedia personnel posting: “Edit warring / Content dispute: Hello! Please don't edit-war on our articles. It slows the servers down”) can express diametrically opposed viewpoints, so controversy and dispute arise in article creation and subsequent editing. The examples above illustrate that there is no clear dividing line between scientific articles and controversial articles, particularly with respect to emerging scientific theories such as colony collapse disorder (where worker bees suddenly disappear from the hive), the causes of which are currently being researched. Coverage of such emerging topics is one of the greatest assets of Wikipedia and a key finding in the paper by Black (2008): “The Wikipedia concept is a potential model for more rapid and reliable dissemination of scholarly knowledge” (p. 1). In addition, an often cited study by Nature in 2005 found that “Wikipedia comes close to Britannica in terms of the accuracy of its science entries” (Giles, 2005, p. 900).
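The “article degree centrality” idea attributed above to Korfiatis, Poulos and Bokos (2006) can be illustrated with a minimal sketch. It assumes the links between articles are available as (source, target) pairs and normalizes by the number of other articles; the function name, the normalization, and the sample links are hypothetical choices made for illustration, not the authors' published formulation.

```python
from collections import defaultdict


def degree_centrality(links):
    """Return {article: (in_degree + out_degree) / (n - 1)}.

    `links` is an iterable of (source, target) pairs, one per directed
    link between two articles. The n - 1 normalization is an assumption;
    because both directions are counted, a score can exceed 1.
    """
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    articles = set()
    for source, target in set(links):  # de-duplicate repeated links
        if source == target:
            continue  # ignore self-links
        out_deg[source] += 1
        in_deg[target] += 1
        articles.update((source, target))
    n = len(articles)
    if n < 2:
        return {article: 0.0 for article in articles}
    return {a: (in_deg[a] + out_deg[a]) / (n - 1) for a in articles}


# Hypothetical link data, for illustration only.
links = [
    ("Linear equation", "Quadratic equation"),
    ("Quadratic equation", "Linear equation"),
    ("Shannon entropy", "Linear equation"),
]
scores = degree_centrality(links)
```

Under these assumptions, the article with the most incoming and outgoing links receives the highest score, which is the sense in which the metric “grades” an article by its edges.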
Potthast, Stein, and Gerling (2008) also address vandalism and define it as a “classification task” based on a proposed metric that identifies edits that could be vandalism. One result of vandalism in Wikipedia was the creation of “protected pages” which cannot be edited by the general contributor to Wikipedia. The proposed edit must first be approved by a Wikipedia internal editor charged with quality control of the article. Errors and vandalism do appear to be corrected quickly, with some authors positing that vandalism in Wikipedia is usually repaired within minutes (Brandes and Lerner, 2007; Murley, 2008).
Utilizing a different approach, Stvilia, Twidale, Gasser and Smith (2005) introduce ten “information quality” problems that are qualitative in nature. The article quality metrics are: accessibility, accuracy, authority, completeness, complexity, consistency, informativeness, relevance, verifiability and volatility. Many of these metrics deal with the language of the article and the culture of the contributor and are subjective by nature, adding the possibility of controversy and dispute into articles. Two of the most studied metrics are accuracy and completeness, and these ideas permeate the literature and are central to increasing article quality in Wikipedia.
Ideas of accuracy and completeness have given rise to another thread in the literature concerning Wikipedia, that of article maturity and author credibility. Halavais and Lackaff (2008) assert that article quality can be measured against the number of edits occurring to the article, implying that more edits make an article “better” (more complete and accurate). They note that there are exceptions to this rule, some of which are covered in the vandalism discussion above (the most notable being edit wars). The paper by Snyder (2007) also addresses article quality and completeness and proposes continuous functional representations to measure when an article is complete and accurate. Both completeness and accuracy are functions of the number of edits and editors, and Snyder's paper illustrates that the balance point (an article being both complete and accurate), under this model, is an elusive target.
Vuong, Lim, Sun, Le, Lauw, and Chang (2008) address article quality in a different manner, that of controversy rank models. Controversy rank models attempt to determine which articles are generating disputes within the editing process of article generation. Vuong, et al. define a contributor to be controversial if they are likely to be involved in disputes with others. Their model involves graph theory (relationships between article editors) and a measure of the controversy of editors. These models are challenging due to:
the large number of articles in Wikipedia
diverse content among articles in Wikipedia
evolving content in articles in Wikipedia (Vuong, et al., 2008).
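To make the notion of ranking by controversy concrete, the following is a deliberately simplified sketch. It is not the model of Vuong, et al. (2008); it only assumes that disputes have already been identified and recorded as (article, editor, editor) triples, and it counts them per article and per editor. The record format, names, and sample data are hypothetical.

```python
from collections import Counter


def controversy_scores(disputes):
    """Count disputes per article and per editor.

    `disputes` is an iterable of (article, editor_a, editor_b) tuples,
    one per observed dispute. Plain counting is an illustrative
    stand-in for a real controversy rank model.
    """
    per_article = Counter()
    per_editor = Counter()
    for article, editor_a, editor_b in disputes:
        per_article[article] += 1
        per_editor[editor_a] += 1
        per_editor[editor_b] += 1
    return per_article, per_editor


# Hypothetical dispute records, for illustration only.
disputes = [
    ("Article X", "alice", "bob"),
    ("Article X", "bob", "carol"),
    ("Article Y", "alice", "bob"),
]
articles, editors = controversy_scores(disputes)
ranked = [name for name, _ in articles.most_common()]
```

Even this toy version hints at the challenges listed above: the counts must be recomputed as content evolves, and raw counts are not comparable across articles with very different content and editing activity.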
Dispute and controversy measurements can be placed on the author (editor) rather than the article as posited by Adler and de Alfaro (2006). In this technical report, the authors measure the “reputation” of article authors. This ranking would demonstrate how important an element of content addition is from certain (ranked) authors. This metric takes into account:
text longevity – how long text in an article is allowed to remain before being re-written or removed
edit longevity – how long before the entire edit is removed from an article.
The evaluation of the longevity of text and edits leads to another measure of the quality of the text added or edit made. The addition to the quality metric is that of author reputation. More reputable authors have better quality edits and text additions, thus more longevity to their edits. Lastly, the authors posit that the age of the text in an article can be considered as a trust metric, in that no edits have been made to the text in a specified time frame.
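The link between text longevity and author reputation can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of Adler and de Alfaro (2006): revisions are assumed to be (author, text) pairs in chronological order, text is compared word by word, and an author's reputation is taken to be the average fraction of their added words that survive in later revisions. Edit longevity is not modeled here. All names and sample data are hypothetical.

```python
def text_longevity(revisions):
    """Return [(author, survival)] for each revision that added words.

    `revisions` is a chronological list of (author, text) pairs.
    `survival` is the fraction of the words added by a revision that
    are still present, averaged over all later revisions. Revisions
    with no later revision are skipped (nothing to measure yet).
    """
    results = []
    previous_words = set()
    for i, (author, text) in enumerate(revisions):
        words = set(text.split())
        added = words - previous_words
        later = revisions[i + 1:]
        if added and later:
            surviving = sum(
                len(added & set(later_text.split())) / len(added)
                for _, later_text in later
            )
            results.append((author, surviving / len(later)))
        previous_words = words
    return results


def author_reputation(revisions):
    """Average the survival scores of each author's revisions."""
    by_author = {}
    for author, survival in text_longevity(revisions):
        by_author.setdefault(author, []).append(survival)
    return {a: sum(v) / len(v) for a, v in by_author.items()}


# Hypothetical revision history, for illustration only.
revisions = [
    ("alice", "bees pollinate crops"),
    ("bob", "bees pollinate crops aliens took them"),
    ("carol", "bees pollinate crops worldwide"),
]
reputation = author_reputation(revisions)
```

In this toy history the first author's text survives every later revision while the second author's addition is removed immediately, so the first author ends up with the higher score; the most recent author has no score yet because no later revision exists to judge the text.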
In addition to the concerns listed above, Wikipedia has received criticism due to the inherent untrustworthiness of a publication that can be edited by anyone, which brings into question the scope and balance of the articles (Chesney, 2006). Chesney’s study revealed that experts (individuals reading articles in their field) rated the accuracy of Wikipedia’s information as high, but that 13 percent of the articles did contain errors. On the other hand, a study by Giles (2005) found that “modest” differences exist in comparisons of articles in Wikipedia and Encyclopedia Britannica, while Murley (2008) points out that while Wikipedia articles can contain errors, the same is true of other secondary and tertiary resources. It is common practice for newspapers, magazines, print encyclopedias, and other print media to publish errata as soon as an error is discovered. Murley also notes that the greatest value of a Wikipedia article could be its links to relevant sources outside of Wikipedia (Murley, 2008), something that traditional encyclopedias lack.
Another follower of Wikipedia and an unlikely ally (Wikipedia’s print “competition”) is the international journal of science, Nature. The editors at Nature view Wikipedia with a skeptical eye, but openly advocate for its success. In an editorial dated December 2005 (during the initial “rise” of article contribution and public viewing of articles), the following statements are made: “backing up a claim with a peer-reviewed reference, for example, makes a world of difference” and “to push forward the grand experiment that is Wikipedia and to see how much it can improve” (p. 890). Finally, the editors push for involvement by the research community by stating “researchers should read Wikipedia cautiously and amend it enthusiastically” (p. 890), illustrating their support for doing what Nature does, publishing current scientific findings (Nature, 2005).
The complexity of Wikipedia, with its internal and external links, opens up the collective knowledge sets of any entity on the Internet. It is this opportunity that both fascinates and terrifies researchers as to the accuracy of the information. Nevertheless, “Online encyclopedias, unrestricted by weight, volume, and time spent flipping pages, hold out the promise of being truly comprehensive” (Halavais and Lackaff, 2008, p. 433). Wikipedia is emerging as a source that is broader than any other single source of knowledge in human history. Thus, to construct and complete this broadening of knowledge, a broad base of contributors must be utilized.
The wiki way is not a new notion; businesses have been trying to get the customer to aid in their workload since the advent of commerce. Many farms have “U-pick-em” areas and the buffet line is not a new concept. However, with the rise in popularity of the Internet and e-commerce, a new method of accomplishing the wiki way has emerged. Wikipedia is utilizing this vast online workforce to manage the inputting, modifying, updating, and verifying of its entries. Further, under correct management, Wikipedia could become one of the world’s largest repositories of digitized knowledge.
Below is a short list of wiki applications currently on the Internet.
Wikipedia – users building an encyclopedia
Second Life – users building their world
RateMyProfessor.com – students adding content about faculty
digg.com – users rating articles to be read
MySpace.com – users posting their own personal content
YouTube – users posting their own videos
Curriki.org – faculty exchanging curriculum ideas
These sites illustrate the growing popularity of Web 2.0 applications and the fact that users enjoy generating content for providers of web sites. As the net generation students enter higher education and the workforce, both arenas must manage these individuals and their ways of mining knowledge to best utilize their potential. As an illustration, the global top ten list on Alexa.com (Alexa, 2008) reveals that the net generation is interested in three main themes. These sites, their themes, and their rankings are given in the lists below:
This list illustrates that wiki applications are some of the most popular and that Web 2.0 applications have a huge fan base and show no indications of slowing in the future. Already, sites like MySpace and Facebook are being utilized in the hiring (and firing) arena to evaluate employees on items not contained in their resumes.
Wikipedia has been continually gaining in popularity and usage: it moved from number nine to number eight in the global web rankings in the past year (Alexa, 2007, 2008), and Figure 1 below gives its “reach,” the percentage of Internet users (who have the Alexa toolbar) visiting the site:
As can be seen from Figure 1, the reach of Wikipedia has been growing exponentially since its inception on January 15, 2001, with accelerated growth taking off in 2005 (Wikipedia, 2007c). In addition to the astounding growth rate of users accessing the Wikipedia site, the English version of Wikipedia boasts over 2.6 million entries with approximately one billion words, which is about 25 times the size of the Encyclopædia Britannica, the largest (print) English language encyclopedia. In addition, Wikipedia is growing internationally, boasting over 9.25 million articles in 250 languages (Wikipedia, 2008a).
In the academic realm, wikis are gaining popularity and usage in the classroom and libraries (Richardson, 2006; Stephens, 2006; Kamel-Boulos, Maramba and Wheeler, 2006), as well as in the research arena (Voss, 2006; Hill, Gaudiot, Hall, Marks, Prinetto, and Baglio, 2006). This trend can be seen in Figure 2, which graphs the number of papers appearing in conference proceedings and journals which address the topic of Wikipedia.
Figure 2. Academic papers relating to Wikipedia (Wikipedia, 2008b)
Figure 2 is obtained from data on Wikipedia’s site and is not exhaustive. In performing a Google Scholar search for the term “Wikipedia,” 20,200 hits were returned, illustrating the academic interest in the subject. In light of this interest in Wikipedia and the Web, these dynamic content sites will be appearing in literature and reference lists, forcing the academic community to manage the soundness of the citation and of the site. This verification of information will be a task for students, faculty, and other interested parties to undertake. Table 1 gives an idea of where Wikipedia is beginning to become established in the academic community.
Table 1. Users and uses of Wikipedia in the academic arena

| User | Use of Wikipedia |
| --- | --- |
| Efraim Turban, E-Commerce Textbook (Turban, et al., 2008) | Wikipedia referenced for problem assignments, definition of “online intermediaries,” Wikipedia used as an e-commerce application case |
| Douglas Samuelson – OR/MS Today Article | Definition of “Colony Collapse Disorder” for honey bee colonies |
| International Association for Computer Information Systems 2007 Call for Papers (IACIS, 2007) | Wikipedia definition for “globalization” |
| Paper appearing in IJKLO Vol. 3 (Parker and Chao, 2007) | Utilize Wikipedia as a teaching tool |
| Paper appearing in JISE Vol. 19(2) (White, Longenecker, McKell, and Harris, 2008) | Wikipedia definition of “assessment” |
Table 1 illustrates that Wikipedia is gaining momentum in the academic arena, and a recent survey by eWEEK (2009) of their readers revealed that blogs and wikis are the Web 2.0 applications that are most frequently appearing in industry. Almost half of the respondents reported that wikis are being deployed at their organizations. If these numbers hold, or grow, students need to be trained in the appropriate use of wikis, including Wikipedia.
For all the good of Wikipedia and other wikis, there is a dark side to publicly accessible, freely altered content. In a high profile case, a Nashville area resident changed the Wikipedia entry of John Seigenthaler, a one-time administrative assistant to Robert Kennedy, to read that Mr. Seigenthaler was involved in the Kennedy assassinations. The Nashville area resident claimed to have posted this to “fool” a colleague (Goodin, 2005; Said, 2005). While the article was eventually corrected, the personal damage was done. John Seigenthaler responded to the false content posted about him in an article in USA Today and explained how difficult it was to uncover who had posted the malicious information about him. His summary thought about Wikipedia was “I am interested in letting many people know that Wikipedia is a flawed and irresponsible research tool” (Seigenthaler, 2005, p.2). It should be noted that his critique is based on a sample of size one, and was exceedingly personal as it was his biography.
The John Seigenthaler case (there have been other high-profile misuses of Wikipedia) began the debate on policing Wikipedia, on rule changes for editing its entries, and on who is ultimately responsible and legally liable for content on a wiki space. (See Ken Meyers’ (2006) article for an informative legal treatment of the Seigenthaler case and of applying the Communications Decency Act to Wikipedia.)
The Seigenthaler article also points to one of the major dividing lines on Wikipedia content, that of controversial articles versus scientific articles. While it is relatively easy to post “opinion” about a subject such as an individual’s biography, and have this opinion escape examination, it is much more difficult to do the same in an article concerning scientific facts such as the quadratic equation. This discrepancy in article accuracy has been noted (Wikipedia, 2008e), and many attempts to identify faulty information, vandalism, dispute, author credibility, and controversy are currently being debated in the academic community (Vuong, et al., 2008; Brandes and Lerner, 2007; Wilkinson and Huberman, 2007; Potthast, et al., 2008; Braun and Schmidt, 2007). Wikipedia itself recognizes the issue of credibility and readily publicizes the issue in an attempt to solicit ideas (Web 2.0!) and procedures to correct these issues (Wikipedia, 2008e). Quoting Wikipedia, on their viewpoint toward using Wikipedia as an academic reference and/or teaching tool,
If you're a professor, teacher, or student within the college community, we encourage you to use Wikipedia and/or Wikiversity in your class to demonstrate how an open content website works (or doesn't). Many of these projects have resulted in both advancing the student's knowledge and useful content being added to Wikipedia. (Wikipedia, 2008f,
The key issue is the parenthetical acknowledgement that this entire experiment might not work! By expressing their own doubt, the Wikipedia management team is acknowledging that quality issues exist with the content and that they are concerned with improving their product.
3.1 Quality Control at Wikipedia
Wikipedia is attempting to make their articles accurate and complete. Wikipedia’s reliability is measured internally using the following set of criteria:
Accuracy of information provided within articles
Comprehensiveness, scope and coverage within articles and in the range of articles
Identification of reputable third-party sources as citations (Wikipedia, 2007d).
The criteria used by Wikipedia parallel the concerns of authors in this subject area, and three points are constantly reinforced throughout the literature: accuracy, completeness, and reputable third-party sources.
In a first attempt at an edited Web 2.0 encyclopedia, the Wikipedia: Version 1.0 Editorial Team/Assessment (Wikipedia, 2008c) project reviews articles and “grades” the articles based upon a consensus measure, which can be a subjective rating of the articles based on team member experience. The grading scheme is illustrated in Table 2. Observing the article’s grade is an option that a registered user of Wikipedia can enable under their “my preferences” menu, “gadgets” tab. The benefit to viewing an article’s grade is knowing where the article “stands” as far as usage as an academic reference.
The main focus in grading an article is how complete the article is, but content and language quality play important roles as well (Wikipedia, 2008c). If an article has attained FA- or FL-class status, it is considered complete and thus usable as a general academic reference, while an A-class article, although complete, would benefit from having more specific references added before being used as an academic source. Articles below A-class need external sources for verification and augmentation of the topic being discussed in order to be utilized as an academic source. That is, the article authors must “complete” the article's use as an academic reference by seeking verification of the information from other reputable sources. These other sources should also verify the accuracy of the content, thus building trust in the information. The expected “reader’s experience” for each graded article is given in Table 2.
Table 2. Reader experience of graded (ranked) articles in Wikipedia (Wikipedia, 2008c)

| Grade | Reader’s experience |
| --- | --- |
| FA or FL | Professional, outstanding, a definitive source for encyclopedic information. |
| A | Very useful. Fairly complete treatment of subject. Good for non-experts. |
| GA | Useful to general readers. Approaching the quality of a professional encyclopedia. |
| B | No reader should be left wanting, but content may not be complete for research. |
| C | Useful to a casual reader. Not a complete picture, not ready to be an academic source. |
| Start | Some meaningful content, but most readers will seek further information. |
| Stub | Little more than a dictionary definition of the topic. |
The bold border in Table 2 marks the dividing line between complete, accurate, reviewed articles with external sources (FA, FL, and A grade), which are suitable for use as an academic reference, and the other graded articles, which could use outside sources and further research to verify the information they contain.
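The dividing line described here can be expressed as a simple decision rule. The sketch below is a hypothetical helper, not a Wikipedia feature or API: it assumes the reader has already looked up an article's grade and treats any grade other than FA, FL, or A (including a missing or unrecognized one) as requiring outside verification.

```python
# Grades treated as reference-ready, following the dividing line in Table 2.
REFERENCE_READY = {"FA", "FL", "A"}


def reference_guidance(grade):
    """Return usage guidance for an article with the given grade.

    `grade` is the article's assessment grade as a string. Any grade
    outside REFERENCE_READY falls through to the cautious answer.
    """
    normalized = grade.strip().upper()
    if normalized in REFERENCE_READY:
        return "usable as an academic reference"
    return "verify against external sources before citing"
```

For example, `reference_guidance("FA")` yields the first message, while a lower or unknown grade yields the second, which errs on the side of further research.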