Searching For India

Wikipedia, the CIA and the U.S. State Department are the principal sources for a majority of people searching for information about India on the Web.

In his 1897 travelogue Following the Equator, Mark Twain waxed poetic: “So far as I am able to judge, nothing has been left undone, either by man or nature, to make India the most extraordinary country that the sun visits on his rounds. Nothing seems to have been forgotten, nothing overlooked.”

For many Americans at the end of the 19th century, the book was their first window to a distant and mysterious land.

 

A century later, Americans who type the word “India” into the Google search bar and click on the first result will be led to this bland and prosaic entry in Wikipedia: “India, officially the Republic of India, is a country in South Asia.”

 

That is what the wonderful world of the World Wide Web has brought us to.

Since the web has an English language bias, it is understandable that Dainik Jagran, the largest Indian newspaper, did not make the list of top 10 search results on Google. But nor did The Times of India, the world’s largest English language newspaper, nor India Today, the largest Indian magazine group, nor, for that matter, any of the ubiquitous Indian television channels, such as Zee, NDTV or IBN.

Instead, Wikipedia, the CIA’s World Factbook and the U.S. State Department’s India Background Note topped the results in the world’s leading search engine, which handles nearly two-thirds of all searches worldwide.

Ponder that for a moment — a less than authoritative, collaborative, user-edited public encyclopedia, the CIA and the U.S. government are the principal sources for a majority of people searching for information about India on the web. Not because any of them paid for sponsored placements, which businesses often do; in fact the “India” search does not throw up any sponsored links or ads at all. Rather, these are so-called “organic” search results, derived from Google’s much-touted algorithm, designed to generate, in the company’s own pretentious claim, near-perfect search results.

The particular significance of the top-ranked results lies in the fact that nearly half the searchers end up on just these three websites. Compete.com, which analyzes search data based on the web usage of its more than two million user-panelists in the United States, estimated that roughly 30 percent of the traffic from all web searches for the word “India” went to Wikipedia, 9 percent to the U.S. State Department and 5 percent to the CIA World Factbook. (Another 6 percent went to Facebook.)

 

Wikipedia also happens to be the top-ranked search result on Yahoo and Bing, two less prominent search engines, with 16 percent and 13 percent market share respectively. The Indian government portal ranked second, a relatively obscure Indian email service third, the Wall Street Journal’s India section fourth and the U.S. State Department fifth on both these search engines.

 

Wikipedia topped not just the result for the search term “India,” but also many other terms associated with India, including cities, historical events and prominent politicians and celebrities. Whether it is New Delhi or Nainital, Manmohan Singh, Sonia Gandhi or Mahatma Gandhi, Hinduism or Adivasis, Wikipedia is the go-to place, according to Google. Even Bollywood heartthrobs are not exempt from Wikipedia’s vacuous and understated reach. Search for Amitabh Bachchan, his son Abhishek Bachchan or his actor-rival Shah Rukh Khan on Google and Wikipedia pops up as the top-ranked search result. Amitabh Bachchan, the breathless lead sentence on Wikipedia tells us, is an “Indian film actor and producer.” So is his far less distinguished son Abhishek Bachchan, as is Shah Rukh Khan, except that Khan is also “a prominent Bollywood figure” and a television host. Aishwarya Rai’s official website pips Wikipedia on Google, but only by a hair, although Wikipedia bests her official website on both Yahoo and Bing. More importantly, according to Compete.com data, Wikipedia attracted more than twice as much traffic from searches for Rai as her official website did (21 percent to 9 percent).

It is hard to overstate the significance of the top “organic” results on search engines, on Google especially. The company controls almost two-thirds of the search market in the United States, 84 percent in Europe, 85 percent in India, 87 percent in Australia and 89 percent in Brazil. A May 2010 study of more than 8 million page impressions by Chitika, an online advertising network of 100,000 websites, found that about 34 percent of Google’s traffic went to the No. 1 result, 17 percent to No. 2 and 11 percent to the No. 3 listing (see chart). Earlier, an analysis of 9 million search results leaked by AOL in 2006 found that 42 percent of the traffic from a search went to the No. 1 result, 12 percent to No. 2 and 8.5 percent to No. 3. Typically, the top three ranked websites on a search attract well over half the resulting traffic, and the top 10 results, which are displayed on the first page of the search, account for roughly 90 percent of the traffic. Relatively few people wade into content that floats to the second or subsequent pages in a search, regardless of the merits of the websites. Indeed, one study found that when experimenters swapped the first and second results in a Google search, the clicks to the websites by users flipped too.

 

Compete.com data showed that 34 percent of the search traffic for Mahatma Gandhi, 51 percent of the traffic for New Delhi, 46 percent of the traffic for Taj Mahal, 45 percent of the search traffic for Amitabh Bachchan, 71 percent of the traffic for Sonia Gandhi and 33 percent of the traffic for Hinduism ended up on Wikipedia, which was the top search result on Google and other major search engines for each of the keywords.

 

Google’s Keyword Tool estimates the global monthly searches for just the keyword India at over 83 million, over 9 million of them in the United States, 3.3 million in the UK and 1.2 million each in Australia and Canada. As a result, tens of millions of users are steered to Wikipedia, the CIA and the U.S. State Department for their primary information about the country every year.

Wikipedia data reveals that its India entry is the 39th most popular page on its website, attracting 1.3 million pageviews monthly, or more than 15 million pageviews annually. At least seven other India-connected themes or people (see table) are among the top 1,000 Wikipedia articles, including Indian classical dance, which is ranked the 375th most viewed page with more than 384,000 views monthly.

What makes Wikipedia the single most valuable source of information on India on the Web? The explanation lies in a particular characteristic of Wikipedia, which is attuned to the single biggest consideration used by Google in ranking websites: the number of other sites that link to it. Search engine algorithms are designed such that the more sites that link to a particular site, the higher it is ranked, which seems like a fairly reasonable assumption of the site’s web popularity.
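The link-counting idea described above can be illustrated with a minimal sketch. It is not Google's actual algorithm, which weighs many more signals; it only shows how ranking by the number of distinct linking sites would work. The function name and the sample link graph are hypothetical, invented purely for illustration.

```python
from collections import Counter

def rank_by_inbound_links(links):
    """Rank sites by how many distinct other sites link to them.

    `links` is an iterable of (source_site, target_site) pairs.
    """
    # Deduplicate pairs and drop self-links, so each linking site counts once.
    distinct = {(src, dst) for src, dst in links if src != dst}
    inbound = Counter(dst for _, dst in distinct)
    # Sort by descending link count, breaking ties alphabetically.
    return sorted(inbound.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical link graph, invented for illustration only.
sample_links = [
    ("blog-a.example", "encyclopedia.example"),
    ("blog-b.example", "encyclopedia.example"),
    ("adfarm.example", "encyclopedia.example"),
    ("blog-a.example", "newspaper.example"),
    ("blog-a.example", "newspaper.example"),     # duplicate, counted once
    ("newspaper.example", "newspaper.example"),  # self-link, ignored
]

ranking = rank_by_inbound_links(sample_links)
for site, count in ranking:
    print(f"{site}: {count} inbound link(s)")
```

In this toy graph the site with three distinct linking sites outranks the one with a single linking site, whatever the quality of either site's content.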

 

Wikipedia happens to be the most scraped site on the Web, because it is public and allows, even encourages, users to freely reuse and repurpose its content. Content-hungry websites, many of which are just parking lots for Google ads, turn to Wikipedia for free material to bolster traffic and as filler material for pages to run ads from third-party ad delivery services, such as Google Ads. There is even evidence that some software bots simply crawl Wikipedia to populate web ad farms. Consequently, Wikipedia has an uncharacteristically large number of inbound links, not because its content is highly prized, but because it is freely available for reuse. Free reuse is Wikipedia’s core appeal. It exists not because Wikipedia is seeking to game Google’s search system, a practice the company vigilantly polices and penalizes, but simply because openness is in Wikipedia’s celebrated DNA.

 

Scholars and critics who have examined Wikipedia’s articles have found them uneven, at best. Some entries are quite comprehensive and reliable, shaped by large numbers of users in which the wisdom of the crowd polices content. The entry on India happens to be among the more heavily debated (yet unresolved) articles and is quite substantial in scope. An editorial note on Wikipedia’s India page points out that “The neutrality of this article is disputed” and prompts readers to review the discussion on its “Talk Page,” where one can review the contentious and often highly personal debate and disagreements, both big and petty. Few readers, of course, bother to review the Talk Page, whose content runs several times longer than the actual entry. But that hasn’t stopped Wikipedia from becoming the top source for India-related information.

As Robert McHenry, a former editor-in-chief of Encyclopædia Britannica, pointed out in his 2004 critique of Wikipedia, titled “The Faith-Based Encyclopedia”: “The user who visits Wikipedia to learn about some subject, to confirm some matter of fact, is rather in the position of a visitor to a public restroom. It may be obviously dirty, so that he knows to exercise great care, or it may seem fairly clean, so that he may be lulled into a false sense of security. What he certainly does not know is who has used the facilities before him.”

 

That is an independent area of far graver concern. WikiScanner, a tool developed in 2007 by Virgil Griffith, a graduate student at the California Institute of Technology, exposed anonymous edits to Wikipedia by people with a vested interest in the contents, by matching the internet protocol addresses with public sources. The CIA, Congressional staffers, the Democratic Congressional Campaign Committee, etc., were all found to have massaged Wikipedia content in self-serving ways, as the encyclopedia allows anyone to contribute content to the database, even anonymously. A Times columnist cautioned: “Critics of the web decry the medium as the cult of the amateur. Wikipedia is worse than that; it is the province of the covert lobby. The most constructive course is to stand on the sidelines and jeer at its pretensions.”

 

Whatever the merits and limitations of Wikipedia, and there are both, there can be no disagreement over the danger of its outsized role in Web search results. Its influence, as well as that of other publicly shared government sources, such as the CIA and the U.S. State Department, is destined only to grow as more and more commercial websites haul up the information bridges and bury their content behind pay walls, as the New York Times did at the end of March.

 

For the better part of the last century, people around the world were introduced to other peoples and cultures by chance encounters with literary masters, such as Twain’s witty 1897 travelogue referenced earlier. The world of India might have been opened up to some by R. K. Narayan’s Malgudi Days, or Rudyard Kipling’s The Jungle Book, or possibly Jawaharlal Nehru’s Discovery of India or Mahatma Gandhi’s The Story of My Experiments with Truth.

 

During the past few decades, their impressions might have been shaped by fleeting coverage in newspapers or magazines or television. The India to which they were exposed was no doubt colored by the biases, prejudices and limitations of the authors of the works. But it was, nevertheless, multidimensional, earthy and recognizably incomplete. More importantly, collectively, it was diverse, shaped by spotty encounters with a newspaper article here, a classic work of fiction or contemporary nonfiction, or a movie or documentary there.

The English novelist E. M. Forster, whose A Passage to India has been ranked among the 100 greatest works of English literature, famously wrote, “But nothing in India is identifiable, the mere asking of a question causes it to disappear or to merge in something else.”

He would scarcely have imagined where posing a query about India on the web would so definitively land him — “a country in South Asia.”

 

Percent Share of Traffic by Google Result

Google Result    Impressions    Percentage
 1               2,834,806      34.35%
 2               1,399,502      16.96%
 3                 942,706      11.42%
 4                 638,106       7.73%
 5                 510,721       6.19%
 6                 416,887       5.05%
 7                 331,500       4.02%
 8                 286,118       3.47%
 9                 235,197       2.85%
10                 223,320       2.71%
11                  91,978       1.11%
12                  69,778       0.85%
13                  57,952       0.70%
14                  46,822       0.57%
15                  39,635       0.48%
16                  32,168       0.39%
17                  26,933       0.33%
18                  23,131       0.28%
19                  22,027       0.27%
20                  23,953       0.29%

Numbers are based on a sample of 8,253,240 impressions across the Chitika advertising network in May 2010.
Source: Chitika
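
For readers who want to check the arithmetic behind the chart, the short sketch below adds up the impression counts and computes the share captured by the top three results and by the first page of ten. The impression figures are copied from the Chitika table; the function and variable names are illustrative assumptions, and the shares are relative to this sample, which covers only the first 20 positions.

```python
# Impressions per Google result position (1 through 20), copied from the Chitika table.
impressions = [
    2834806, 1399502, 942706, 638106, 510721,
    416887, 331500, 286118, 235197, 223320,
    91978, 69778, 57952, 46822, 39635,
    32168, 26933, 23131, 22027, 23953,
]

def cumulative_share(counts, top_n):
    """Return the percentage of all sampled impressions captured by the first `top_n` positions."""
    return 100.0 * sum(counts[:top_n]) / sum(counts)

total = sum(impressions)
top3_share = cumulative_share(impressions, 3)
top10_share = cumulative_share(impressions, 10)

print(f"Total impressions: {total:,}")
print(f"Top 3 share: {top3_share:.2f}%")
print(f"Top 10 share: {top10_share:.2f}%")
```

On these figures the top three positions take roughly 63 percent of the sampled impressions and the first page of ten results roughly 95 percent.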

 

 

 

Wikipedia article traffic statistics

Most viewed India-related articles in December 2010

Rank    Article                   Page views
 39     India                     1,308,393
375     Indian classical dance      384,326
542     Shahrukh Khan               324,168
574     Salman Khan                 312,492
583     Aishwarya Rai               310,293
598     Hinduism                    308,303
961     Taj Mahal                   242,855
979     Kama Sutra                  240,668

 
