Google’s Caching Feature: Spam Protection or Copyright Violation?

http://news.com.com/2100-1038_3-1024234.html

The above URL links to a 2003 article from CNET News about Google’s caching feature. When you run a search through Google, each result includes, besides the main link, a link labeled “cached,” which takes the user to Google’s stored copy of the page. Google introduced caching in 1997, mainly to guard against page-jacking and spam. Page-jacking occurs when someone copies a frequently accessed webpage, resubmits it to a search engine to reproduce the original page’s high rankings, and then alters the content of the page to suit their own interests (in other words, stealing surfers from the page they intended to visit and redirecting them to a copied page of altered, unwanted content). Caching helps counter this problem: since 1997, Google has archived webpages by making direct copies of them and offering those copies to web surfers, so clicking on a cached link gives the user a better chance of seeing the original content of the page they meant to view.

However, caching raises questions about accuracy of information and copyright infringement. The article points out that web surfers can use the cache to reach “dead” or removed pages, archived New York Times articles that are supposed to be members-only, and outdated information, because Google’s web crawlers cannot re-copy every page the moment it is updated. The article also notes that the caching feature could violate copyright law, since Google copies pages without consent from the page operators and offers access to these out-of-date or members-only copies.

This topic relates to what we’ve been learning in class about how search engines operate in two ways. First, the article sheds some light on how page-jackers exploit the system of page ranking by imitating pages with high authority scores. Second, the article notes that Google differs from other search engines by archiving full copies of pages, whereas many of its counterparts rely only on statistical records of page content for ranking. In class we discussed the drawbacks of using purely statistical methods to generate rankings, and this article shows how relying solely on statistics leaves plenty of room for spammers to imitate highly relevant pages (a toy illustration follows below). However, Google’s alternative may prove precarious in its own way.
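To make that last point concrete, here is a minimal sketch of the idea, not a description of how Google actually ranks pages: a purely content-statistical score (here, a simple cosine similarity over term counts) cannot distinguish an exact page-jacked copy from the original, whereas a snapshot stored at crawl time gives the engine or the user something to compare the live page against. All page contents and helper names below are invented for illustration.

```python
from collections import Counter
import math

def term_vector(text):
    """Bag-of-words term counts for a page's text."""
    return Counter(text.lower().split())

def cosine_score(query, page_text):
    """Toy content-statistical relevance: cosine similarity between
    the query's term counts and the page's term counts."""
    q, p = term_vector(query), term_vector(page_text)
    dot = sum(q[t] * p[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in p.values())))
    return dot / norm if norm else 0.0

# Hypothetical pages, invented for this example.
original = "cheap flights to rome book cheap flights and hotels in rome"
jacked_copy = original            # the page-jacker submits an exact copy...
jacked_live = "buy pills online"  # ...then swaps in unrelated content later

query = "cheap flights rome"
print(cosine_score(query, original))     # high score for the real page
print(cosine_score(query, jacked_copy))  # identical score: statistics alone can't tell them apart

# A cached snapshot taken at crawl time gives something to check against.
cached_snapshot = jacked_copy
if term_vector(cached_snapshot) != term_vector(jacked_live):
    print("page content changed since it was crawled; serve or flag the cached copy")
```

Again, this is only a caricature of “statistical” ranking; the point is simply that an exact copy is indistinguishable by content statistics alone, while a stored copy is what lets a user (via the “cached” link) see the page as it looked when it was ranked.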

