This is a supplemental blog for a course covering how the social, technological, and natural worlds are connected, and how the study of networks sheds light on these connections.


Opening Up the Keyword Based Advertising Market

If you’ve taken some basic economics, you know that the perfectly competitive market equilibrium economists love so much can be thrown off by a few things, including imperfect information. Keyword-based advertising, whose exploding popularity has been fueled by Google, is a relatively young and developing marketplace, with companies continuing to tweak their strategies and mechanisms as they try to get the auction system for connecting advertisers with ad space right. In class we’ve talked a bit about optimal advertiser/ad matchings, but ultimately it’s in the interest of the sellers (the search engines) to capture all of the consumer surplus (the consumers here being the advertisers). They can only do this, however, in the absence of competition.

Bill Wise, a marketing-industry blogger on internet search, wrote a post entitled Accountability Matters at the end of last year on accountability in the keyword-based advertising market. He discusses how companies like Google and Yahoo are using their market power to shift from displaying competitive bids and click statistics to “black-box” systems that hide this information. He argues that advertisers should hold on to the privilege of having that information in front of them. In many other advertising media, like television and print, prices are usually negotiated individually. Knowing the going rate can help prevent buyer’s remorse and keep a seller from price discriminating. The internet provides an opportunity for a perfectly competitive marketplace, if advertisers can seize it.
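To make the surplus point concrete, here is a minimal sketch (my own toy simulation with invented uniform valuations, not anything from Wise’s post) of a sealed-bid second-price auction. Bidding your true value is optimal in a second-price auction, so the winner’s surplus is the gap between the top two values, and that gap shrinks as competition grows:

    import random

    def second_price_surplus(num_bidders, trials=10000):
        """Average winner surplus in a second-price auction with
        valuations drawn uniformly from [0, 1]. Truthful bidding is
        optimal, so surplus = own value - second-highest bid."""
        total = 0.0
        for _ in range(trials):
            bids = sorted(random.random() for _ in range(num_bidders))
            total += bids[-1] - bids[-2]  # winner pays the runner-up's bid
        return total / trials

    for n in (2, 5, 20):
        print(f"{n} bidders: average winner surplus = {second_price_surplus(n):.3f}")

With two bidders the average surplus is about a third of the prize; with twenty it nearly vanishes. That is exactly why advertisers benefit from a transparent, competitive marketplace.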

Posted in Topics: Technology

No Comments

Freebase: A Semantic Web

An article over at O’Reilly Radar previews Metaweb’s Freebase, an upcoming product that aims to collect and organize the vast amounts of knowledge available on the web. Like Wikipedia, Freebase empowers users to view and edit its information archive directly. (In fact, Freebase has absorbed much of the content provided in openly available databases such as Wikipedia and MusicBrainz.)

What makes Freebase interesting, however, is the large amount of metadata that users may associate with each topic. The linked article uses the O’Reilly Media page as an example. This page is identified as a “Company” type, and every “Company” page has metadata associated with it, such as founding date, headquarters, industry, revenue, and ticker symbol. If we knew the founder of O’Reilly Media, we could add that as a property, and the founding person would also get a page, on which Metaweb would indicate that (s)he was the founder of O’Reilly Media. Herein lies the power of Freebase: not only is extensive metadata available for each entity, but it can also be intelligently and automatically searched and updated in response to related metadata changes.
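As a rough illustration (a toy Python sketch, not Metaweb’s actual data model), one can imagine typed entities whose relations are stored on both endpoints at once, so that adding a founder to the O’Reilly Media page automatically creates the reciprocal entry on the founder’s page:

    # A toy illustration (not Metaweb's real schema) of typed entities
    # whose relations are stored symmetrically on both endpoints.

    class Entity:
        def __init__(self, name, etype):
            self.name = name
            self.etype = etype          # e.g. "Company" or "Person"
            self.properties = {}        # property name -> set of entities

        def link(self, prop, other, inverse_prop):
            # Store the relation on both endpoints at once.
            self.properties.setdefault(prop, set()).add(other)
            other.properties.setdefault(inverse_prop, set()).add(self)

    oreilly = Entity("O'Reilly Media", "Company")
    tim = Entity("Tim O'Reilly", "Person")
    oreilly.link("founder", tim, inverse_prop="founder_of")

    print([e.name for e in tim.properties["founder_of"]])  # ["O'Reilly Media"]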

“Bare” web content has no useful metadata associated with it by default. This makes it harder to determine relevant connections between various kinds of web sites. Google’s approach to this problem is to examine how hyperlinks tie particular pages (or groups of pages) together; PageRank relies on heuristics regarding node in- and out-degrees. By contrast, Metaweb seeks to attach useful information to each node. This additional data can then (in theory) be used to construct more useful and accurate connections between nodes in a graph. For example, one might imagine several nodes all of type “Magazine.” Metaweb knows these nodes are related by the fact that they are magazines, but it may also be able to distinguish car magazines from sports magazines, based on other metadata stored in each of these nodes.
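A hedged sketch of that idea, with invented nodes and metadata: two pages of type “Magazine” can be judged related, and car magazines separated from sports magazines, without ever inspecting the hyperlinks between them.

    # Metadata-based relatedness on invented "Magazine" nodes; no link
    # analysis involved.

    magazines = [
        {"name": "Car and Driver",     "type": "Magazine", "subject": "cars"},
        {"name": "Motor Trend",        "type": "Magazine", "subject": "cars"},
        {"name": "Sports Illustrated", "type": "Magazine", "subject": "sports"},
    ]

    def related(a, b):
        # Two nodes are "closely related" if their type and subject
        # metadata match.
        return a["type"] == b["type"] and a["subject"] == b["subject"]

    print(related(magazines[0], magazines[1]))  # True: both car magazines
    print(related(magazines[0], magazines[2]))  # False: cars vs. sports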

A notable advantage of this scheme is that changes to information are reflected much faster through metadata than through PageRank statistics. If for some bizarre reason Bill Gates decided to leave Microsoft, then removing him from Microsoft’s metadata would automatically trigger the appropriate metadata adjustments throughout the graph. Web links, on the other hand, must be updated manually and thus won’t reflect changes as quickly (if ever, as can happen when a website is abandoned).

Posted in Topics: Education, Technology

No Comments

Link bombs: Clever Manipulation of Search Algorithms

 

Web search algorithms, such as PageRank or HITS, rely on hyperlink analysis to rank the relevance of pages on the world wide web. As mentioned in class, the principle these algorithms rely on is that an in-link to a page indicates human approval of its content. Consequently, studying the link structure of the web gives us a sense of which pages are important.

Link analysis, however, can be intentionally manipulated to give some pages higher rankings using link bombs, commonly referred to as Google bombs. A link bomb works by generating a significant number of pages with cleverly crafted links to a specific target page, thereby increasing its PageRank so that googling a specific keyword returns the target as the first hit. The nature of link bombs can range from silly and humorous to highly political.
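To see the mechanics, here is a minimal PageRank power iteration (a standard textbook formulation, not Google’s production algorithm) on an invented toy graph: adding a cluster of bomb pages that all link to one target inflates the target’s score. Real bombs also exploit anchor text, which this link-mass sketch doesn’t model.

    # Minimal PageRank power iteration (damping 0.85) on a toy graph.
    # Page names and graph structure are invented.

    def pagerank(links, d=0.85, iters=50):
        nodes = list(links)
        rank = {n: 1.0 / len(nodes) for n in nodes}
        for _ in range(iters):
            new = {n: (1 - d) / len(nodes) for n in nodes}
            for n, outs in links.items():
                for m in outs:
                    new[m] += d * rank[n] / len(outs)  # share rank over out-links
            rank = new
        return rank

    graph = {"target": ["normal"], "normal": ["target"]}
    # Add 20 bomb pages, all pointing at the target page.
    for i in range(20):
        graph[f"bomb{i}"] = ["target"]

    ranks = pagerank(graph)
    print(f"target: {ranks['target']:.3f}  normal: {ranks['normal']:.3f}")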

On January 25, 2007, Google announced a change to their search algorithm to combat link bombing. In their blog entry, Google mentions that “people assume that [bombed pages] are Google’s opinion” simply because PageRank ranks them highly. While this is not true, it is natural for Google to want to stop link bombing, since it can skew the integrity of their ranking algorithm. Instead of displaying link-bombed pages as the first hits when bombed keywords are queried, the modified Google algorithm now attempts to display pages about the link bomb rather than the bombed pages themselves. Prominent political link bombs felt the effect of the change immediately; however, it did little for more obscure bombs. The “miserable failure” bomb directed at President Bush’s biography page on the White House web site, for example, no longer works on Google, even though it can still be seen in other search engines such as Ask. On the other hand, a lesser-known bomb of “awful announcer” targeting Fox baseball announcer Tim McCarver is unaffected.

Tatum’s paper, a case study of the “miserable failure” bomb, argues that political link bombs can be seen as social movements. The “miserable failure” bomb is a response to disaffection with President Bush’s policies, while the subsequent counter-bombing (an attempt to redirect the bomb to liberal political figures such as Michael Moore, Jimmy Carter, and Hillary Clinton) is a protest in response to the bomb itself. While adaptive refinement of search engines might eventually make link bombing a thing of the past, the practice has served as a means of waging protests and political wars on the Internet.

Posted in Topics: Technology, social studies

No Comments

Privacy in the use of Keyword-Based Advertising

Google holds several patents on displaying advertisements alongside e-mails in Gmail, its e-mail service. Targeting advertisements based on the content of e-mails has drawn many complaints about violating users’ privacy. In fact, several of my friends have raised concerns about this feature, so I decided to look into the matter to see if there is any real reason for concern. Two articles, Tailoring Ads to Email Users, Google Has Some Poor Fits and Gmail Leads Way in Making Ads Relevant, as well as the FAQs on Google’s own website, have some information on the system and the relevance of its ads.

Advertisements delivered through e-mail are auctioned in the same manner as those seen on Google’s main site through keyword-based advertising, as discussed in lecture: advertisers are charged each time their ads are clicked, and slots are auctioned off for particular terms. These advertisements are the main source of revenue for Gmail, and Google argues that they are a small price to pay for the amount of space Gmail offers. Google co-founder Sergey Brin notes that scanning is automated (no human sees the messages), that no content relating to the ads is stored, and that no data goes out to advertisers. This can be seen as a directed edge in the information network. It is also important to note that all e-mail systems scan the content of e-mails, if not for ad-targeting keywords, then for viruses and spam. Perhaps a larger concern should be better parsing of e-mail content to eliminate irrelevant, poorly matched advertisements.
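As a rough sketch of the slot-auction mechanics discussed in lecture (a simplified generalized second-price auction with invented advertisers and bids; real systems also weigh ad quality), each slot winner pays, per click, the bid of the advertiser ranked just below:

    # Simplified generalized second-price (GSP) keyword auction.
    # Advertiser names and bids are invented.

    def gsp_auction(bids, num_slots):
        """bids: dict advertiser -> per-click bid. Each slot winner pays
        the bid of the advertiser ranked just below them, per click."""
        ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
        results = []
        for i in range(min(num_slots, len(ranked))):
            advertiser, _ = ranked[i]
            price = ranked[i + 1][1] if i + 1 < len(ranked) else 0.0
            results.append((advertiser, price))
        return results

    for adv, price in gsp_auction({"A": 0.50, "B": 0.40, "C": 0.10}, num_slots=2):
        print(f"{adv} wins a slot, pays ${price:.2f} per click")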

Interestingly enough, Google acts as an intermediary, or possibly (though it may be a stretch to say) a trader. As an intermediary, Gmail provides a platform connecting advertisers to e-mail users and consumers, allocating slots through a second-price-style auction. Gmail profits on clicks rather than on views alone, so the better matched the ads are, the better the profit. The analogy is imperfect in a few places: a buyer may click on an ad and make a transaction, but the actual buying is done directly with the seller on that webpage. Where it relates to trader networks is that Google makes a “commission,” or profit, from the click in transit to the purchase. A slight difference is that even if the e-mail user clicks an ad and does not complete the transaction, Google still gets paid by the advertiser. In this role between buyer and seller, Gmail is much like a real estate broker or other middleman. Why, then, is privacy such a large concern in one case and not in the other? Contrary to intuition, people feel more comfortable sharing their salaries, home lives, and problems with a real estate broker than with a machine that temporarily stores words. Online networks create problems because of the lack of built-up relationships: edges on the web are more often ranked by strength in numbers than by the quality of the connection, and as of now it is difficult to measure trust along edges. Perhaps if there were a way to build trust in online connections between machines, the fear of privacy invasion would not be so prominent.

 

Below are some examples of patents which are fun to read through briefly.

 

http://www.google.com/patents?vid=USPAT6199106&id=vZIGAAAAEBAJ&pg=PA3&dq=email+for+displaying+advertisements#PPP1,M1

(This is an older patent, but it provides the best example of the trader analogy with buyers and sellers. Page 8 of the drawings has an interesting setup with directed edges from servers, where the server system acts as an intermediary between the e-mail client and outside sources such as advertisers.)

http://www.google.com/patents?vid=USPAT6513052&id=yIcOAAAAEBAJ&pg=RA1-PR2&dq=email+for+displaying+advertisements#PRA1-PA65,M1

 

Posted in Topics: General

View Comment (1) »

The Truth Is Out There: A UFO/Paranormal Refined Search Engine

UFO Crawler: The New Paranormal Search Engine
http://www.bigmouthmedia.com/live/articles/ufo-crawler-the-new-paranormal-search-engine-.asp/3572/

After graduating from a high school whose mascot was a “grey ghost,” named after the spirit of a girl who, as occasionally reported by janitors and employees, haunts the old building, I have had a soft spot for people who make seemingly wild claims about seeing ghosts or UFOs. The television launch of The X-Files in 1993 brought a wave of obsession with paranormal phenomena into pop culture, as did popular movies like Independence Day. Twentieth-century US history is certainly not complete without mention of the 1947 Roswell incident, or of the now-declassified two-decade investigation of UFO phenomena conducted by the US Air Force between 1952 and 1969, known as Project Blue Book. Famous sightings are of course not limited to what may seem like a by-product of mid-20th-century American pop culture; rather, they transcend world history and civilizations, dating as far back as an Egyptian Pharaoh’s description of multiple “circles of fire” in the sky and as recently as the Tunguska impact event in Siberia in 1908. Ghosts are sighted even more frequently, especially since EVP (electronic voice phenomena) was popularized with methods of analyzing Gaussian white noise in digital recordings.

With millions of ghost and UFO sightings reported throughout history, what’s a paranormal enthusiast to do to navigate that vast network of specialized information? The article cited above describes a new site launched jointly by IBM and the Anomalies Network, known as UFO Crawler (www.ufocrawler.com), a dedicated search engine for paranormal phenomena. The Anomalies Network is currently the world’s largest online database of documents and files containing reports of UFO sightings and other abnormal mysteries. The article claims that paranormal sightings have recently been on the rise, so I guess they picked the perfect time to launch this specialty search engine. After all, what better time than now, when just last week in the United Kingdom there were reports of an object floating across the moon? Let’s not forget the incidents reported last month: one at Chicago’s O’Hare airport, where United Airlines pilots and mechanics witnessed a disc-shaped object hovering just below some dense clouds, and another in Phoenix, where police received reports from residents of four bright lights near the horizon.

Why, then, should there be dedicated search engines for things like this? The article goes on to claim that Google is an “excellent search engine on a mass scale, but finding specific information can be very difficult and requires fairly complex search queries.” When we discussed the page ranking algorithms used by these mass-scale search engines in class, we saw how hyperlinks to a page from other sites influence its ranking in a search query. When a paranormal enthusiast searches for “abduction” on Google, for example, it will likely catch several alien abduction sites, but it will undoubtedly mix those results with child abduction sites, which to the general public are of more immediate importance and are likely to be hyperlinked more often. If we refine our search to “alien abduction,” once again there are a few sites central to the topic, but now we find among the results highly linked articles from what’s generally perceived as a legitimate source (like the Harvard Gazette) explaining alien abduction away as an emotional response to trauma. A believer looking for the latest sightings, or trying to correlate historical ones, does not want to hear gibberish like this. A similar popular media discussion of the UFO Crawler notes that a search on “Area 51” returns thousands of irrelevant and inaccurate results. It is easy to see intuitively that a phrase like “Area 51,” which has made it to all corners of pop culture and has been referenced in countless songs, TV shows, and movies, would give a great deal of bad results. Finally, it’s not as though Google doesn’t already offer a solution to this kind of refined database search in its Enterprise search applications, but the article goes on to say that the UFO Crawler prefers the free OmniFind Yahoo! Edition to paying at least $2000 for Google’s enterprise offering. Offering a refined search engine for the famed Anomalies Network and digging into the vast network of paranormal stories is certainly a good step forward for the community of paranormal enthusiasts, but what does it mean for someone like me, just looking for an old high school legend? Alas, typing “grey ghost” into the UFO Crawler returns nothing relevant…

Posted in Topics: Technology, social studies

No Comments

Video PageRank?

Interview with Google’s Matt Cutts about Next-Generation Search

In a recent interview with Read/WriteWeb, Google employee (and head of the Webspam team) Matt Cutts talks briefly about applying Google’s PageRank concept to video search. Video search is becoming increasingly important to Google as it has acquired YouTube and begun integrating it with its homebrewed Google Video offering. Where web search can extract the links needed for a PageRank analysis just by parsing a page’s HTML, video search has no simple analog. Much time and many resources could be put into sophisticated tools that extract transcripts from uploaded video on the fly, but even for Google this would seem prohibitive. Instead, companies such as Google must rely on metadata to do video search.

In the interview, Cutts explains,

in the Web we have this notion of reputation - which is PageRank, it’s how many people link to your site and it’s also the quality of that incoming set of links. So it’s fun to think about things like reputation in video search - whether it be for Google Video or YouTube - because you don’t have links necessarily. You might have things that are somewhat similar to links, but you look at the quality of the users, the quality of the ratings. I think in lots of ways it gives Google good practice to think about the power of people, and the power of trust - and how to apply that in a lot of different areas.

Now, not much is known about Google’s video search algorithms (compared to web search PageRank), so I’ll speculate about how it could work:

Video search, in its current model, requires users to upload videos and describe them. The videos are then indexed into search databases and presented for public consumption. This leads to a complex graph with two different types of nodes: “user” nodes and “video” nodes. Each user is connected to the videos he or she uploaded, but also to the videos he or she watches, and these edges can carry weights based on metadata such as the rating given, the amount of the video watched, the number of times it was watched, the number of people the user shared it with, and so on. Video nodes can carry metadata such as text descriptions, uploading user, associated website, labels, titles, comments, date uploaded, length, format, and language; data such as which websites link to or embed the video can also be maintained. User nodes can carry metadata such as a list of videos uploaded, a list of videos viewed, a list of videos shared, and time spent on the site.
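Continuing the speculation (this sketch is my own guess at a data model, not Google’s actual system), the two node types and weighted edges might look something like this, with an invented formula folding rating, watch fraction, and shares into a single edge weight:

    # Speculative two-type graph: User nodes with weighted edges to
    # Video nodes. The weight formula and all data are invented.

    from dataclasses import dataclass, field

    @dataclass
    class Video:
        title: str
        uploader: str
        tags: list = field(default_factory=list)

    @dataclass
    class User:
        name: str
        # video title -> edge weight built from viewing metadata
        watched: dict = field(default_factory=dict)

        def watch(self, video, rating=0.0, fraction_watched=0.0, shares=0):
            # One invented way to fold metadata into an edge weight.
            weight = (0.5 * rating + 0.4 * fraction_watched
                      + 0.1 * min(shares, 10) / 10)
            self.watched[video.title] = max(weight,
                                            self.watched.get(video.title, 0.0))

    alice = User("alice")
    cats = Video("cat video", uploader="bob", tags=["cats", "funny"])
    alice.watch(cats, rating=0.8, fraction_watched=1.0, shares=2)
    print(alice.watched)  # {'cat video': 0.82}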

All of these aspects can be combined to determine the notion of “reputation” that is central to the PageRank method. A user who watches “good” videos will be seen as a “good” user; a video that is watched by “good” users will be seen as a “good” video. Ratings and comments can be used to help determine the “quality” of both users and videos. These should all afford a PageRank-type analysis where a video’s rank is a function of the number and quality of the users pointing to it, and a user’s rank is a function of the quality of the videos he or she points to.
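This circular definition, where good users are those who watch good videos and vice versa, is essentially the hub/authority recursion from HITS. A minimal sketch on an invented user/video graph (unweighted here, for brevity):

    # HITS-style mutual reinforcement between user and video scores.
    # The watch graph is invented.

    import math

    def rank_users_and_videos(watches, iters=30):
        """watches: dict user -> set of videos watched."""
        users = {u: 1.0 for u in watches}
        videos = {v: 1.0 for vs in watches.values() for v in vs}
        for _ in range(iters):
            # A video's score sums the scores of users who watch it...
            for v in videos:
                videos[v] = sum(users[u] for u, vs in watches.items() if v in vs)
            # ...and a user's score sums the scores of videos they watch.
            for u in users:
                users[u] = sum(videos[v] for v in watches[u])
            # Normalize so scores don't blow up across iterations.
            for d in (users, videos):
                norm = math.sqrt(sum(x * x for x in d.values())) or 1.0
                for k in d:
                    d[k] /= norm
        return users, videos

    users, videos = rank_users_and_videos({
        "alice": {"skate trick", "cat video"},
        "bob":   {"cat video"},
        "carol": {"cat video", "spam clip"},
    })
    print(sorted(videos.items(), key=lambda kv: -kv[1]))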

It should be possible to observe a variant of triadic closure if you superimpose a graph of user relationships on the user/video graph. That is, if a pair of users has a strong tie, and one of those users has a strong tie (by some definition) to a video, it should be likely (or at least likelier) that the other user has developed at least a weak tie to the video (perhaps having watched part of it once, or at least heard of it).
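A sketch of that closure heuristic, with invented ties: for every strongly tied pair of users, predict at least a weak user-video tie wherever one member has a strong tie to a video the other lacks.

    # Triadic-closure-style prediction over a superimposed friendship
    # graph and user->video ties. All data is invented.

    strong_friend_ties = {("alice", "bob"), ("bob", "carol")}
    strong_video_ties = {"alice": {"cat video"}, "carol": {"skate trick"}}

    def predicted_weak_ties(friend_ties, video_ties):
        predictions = set()
        for a, b in friend_ties:
            for u, v in ((a, b), (b, a)):  # ties are symmetric
                for video in video_ties.get(u, ()):
                    if video not in video_ties.get(v, ()):
                        predictions.add((v, video))
        return predictions

    print(predicted_weak_ties(strong_friend_ties, strong_video_ties))
    # e.g. {('bob', 'cat video'), ('bob', 'skate trick')}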

Posted in Topics: Technology

View Comment (1) »

An example of failed viral marketing

New Sony Viral Marketing Ploy Angers Consumers

The Dynamics of Viral Marketing

The idea of viral marketing was touched on earlier in this blog, but recent decisions by Sony Corporation provide a particularly interesting example of how viral marketing can sometimes hurt more than help. The first link is a blog post about Sony’s recent campaign to promote the PlayStation Portable. In an attempt to raise flagging sales of their handheld game console, Sony contracted a marketing firm to create a viral marketing campaign. Once the website was up, the public easily saw through the ruse, and the response was…interesting, to say the least. The website was dismantled within weeks of its initial deployment, due to increasing negative attention directed at Sony for misleading consumers.

The second link is a statistical analysis of buying patterns based on recommendations. It shows that inaccurately targeted viral marketing can produce, at best, indifference to the product: the purchase vs. recommendation curve shows a power-law falloff as the number of recommendations increases. This differs from the model discussed in class, in which the probability of adoption was assumed to increase steadily as edges between nodes (in this case, recommendations/purchases) were added. Also, instead of information flowing outward from well-connected nodes and affecting other groups, the paper shows that the greatest effect is felt within tight, localized groups responding to group likes and dislikes, with the effect falling off outside those groupings.
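As a toy contrast (invented parameters, not the paper’s fitted values), compare a model in which each added recommendation keeps raising purchase probability with one in which the marginal effect decays as a power law and the curve saturates:

    # Toy comparison of two adoption models; all constants are invented.

    def class_model(k, p=0.2):
        # Probability of purchase keeps climbing with k recommendations.
        return 1 - (1 - p) ** k

    def falloff_model(k, c=0.2, alpha=1.5):
        # The marginal effect of each extra recommendation decays as a
        # power law, so the curve saturates well below 1.
        return min(1.0, c * sum(i ** -alpha for i in range(1, k + 1)))

    for k in (1, 2, 5, 10, 50):
        print(f"k={k:2d}  class: {class_model(k):.2f}  falloff: {falloff_model(k):.2f}")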

In class, we discussed the role of social networks and triadic closure of links in a social network, and marketing is closely related to both. In any sort of marketplace, trust in the links between nodes is a key issue, both customer-company (to ensure that a company maintains a customer base) and customer-customer (to add customers to that base). Much as in Kossinets and Watts’ study of triadic closure in e-mail networks, similar buying patterns emerge in close social networks, where the recommendations of individuals (customer-customer ties) affect buying decisions. But the study shows that this is more common in tightly tied groups of smaller sizes, perhaps as a response to ubiquitous marketing.

Therefore, it might seem tempting to tailor your advertising to a specific subset of the network. However, even tightly focused advertising has to be of a compatible type, which comes back to the strength of the “trust” (or at least respect) ties between people and a company. At best, such advertising can increase purchases within a specific small subset and hopefully stir at least some word of mouth. At worst, it can create an extremely negative perception of both the company and the product as a whole, as in the example of Sony. There, the advertising was seen as patronizing, breaking the trust ties, and the negative response heightened as word spread. The Sony campaign also suggests that while positive recommendations appear to be limited to tight groupings, negative feedback is unfortunately not limited in the same way.

Posted in Topics: General, social studies

No Comments

How New Nodes Get Incorporated into the Brain

http://blog.sciam.com/index.php?title=where_new_neurons_go_to_work_1&more=1&c=1&tb=1&pb=1

The above link goes to a short series of posts on one of Scientific American’s seminar blogs. Contributors review and discuss an article by Kee et al. that appeared in Nature Neuroscience on Feb. 4, 2007, entitled “Preferential incorporation of adult-generated granule cells into spatial memory networks in the dentate gyrus.”

Doug Fields, a principal investigator at the NIH, explains that new neurons are born in particular parts of the brain throughout our lives, but it’s still unclear how these new neurons are incorporated into established neural networks. In fact, many scientists believe that the new neurons are functionless or, if anything, are used by the brain only as simple substitutes for worn-out or damaged neurons.

The Kee et al. (2007) article reviewed in the blog suggests that newly generated neurons are integrated into existing neural networks in the hippocampus and in fact play a distinct functional role in encoding new memories. The study was carried out on mice trained to master the Morris water maze, in which a mouse swims around to find an underwater platform that it can stand on to catch a breather. After doing the maze a few times, a normal mouse will remember where the platform is and swim right to it. If the mouse really has learned something new, certain memory-related genes in hippocampal neurons will be activated so that the neurons build proteins that reinforce the relevant synapses (edges). To identify newly generated neurons, Kee et al. injected the mice with a genetic marker that lights up in all the new neurons.

Kee et al. found that after four to six weeks, memory-related genes were indeed activated in the newly generated neurons. In fact, more new neurons than old ones were active in forming the new memory. While existing neurons were already embedded in established networks and circuits, the new neurons were apparently more flexible in encoding new memories and forming new edges.

In lecture we learned about network exchange theory, and how adding new nodes to an economic-exchange network will affect the network’s set of balanced outcomes. Analogously, adding new nodes to the hippocampal neural network changes the network dynamics and functionality. Just as with characterizing the evolution of economic networks, understanding the ways in which new nodes change neural network dynamics is a hot research topic.

Posted in Topics: Science

No Comments

Game Advertising

http://www.game-advertising-online.com/index.php

Advertising helped Google stay up and running, and now it’s helping other websites bring entertainment to the masses. Game-advertising-online.com (GAO from here on) helps internet games make more money by giving them exposure or an extra bit of income. The whole operation rests heavily on two principles we have discussed in Networks class: first, advertising to a target audience, or people within a certain network; and second, finding the best price for the advertising service.

Google links advertisements to keyword searches, so advertisers pay for a word that their future customers might search for. With games, it is actually even easier to find a target audience: you want to show your advertisement to someone who likes games, and a person who likes games will probably be playing them or reading about which ones to play. GAO pays sites that host browser-based games or discussions of games for space to advertise other games. It builds a network of gaming sites around itself and encourages them to point to each other, creating more links and letting gamers explore.

To find a good price, GAO uses a system based on cost per click (CPC), click-through ratio (CTR), and cost per thousand impressions (CPM) that is probably very similar to the system used by Google. The advertiser makes a banner and then places a bid for cost per click (typically around $0.05, but as high as $0.20). CTR is the percentage of impressions that result in a click; when an advertisement is new, its CTR is determined by showing the ad 10,000 times and counting how many times it gets clicked, which gives every advertisement a fair trial. An advertisement’s rating, or CPM (cost per thousand impressions), is the product of its CPC, its CTR as a percentage, and 10; this is approximately what the advertiser pays per thousand impressions. Advertisers can increase their ranking either by bidding more money or by making their banners more attractive, so the higher the bid and the more often the ad is clicked, the higher its rating. GAO shows highly rated banners most often because they lead to more clicks. Clicks mean that consumers are getting a new game to play (or are at least somewhat interested), the advertised game is getting more players, and GAO is making money and passing some on to the sites that display its advertisements. So maximizing the number of clicks maximizes the benefit to everyone involved.
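A small sketch of the rating arithmetic as described above (banner names, bids, and click counts are invented): CPM comes out as CPC times CTR-as-a-percentage times 10, and sorting by that rating decides which banners get shown most.

    # Effective-CPM rating as described in the post; all data invented.

    def rating(cpc, clicks, impressions=10000):
        ctr_percent = 100.0 * clicks / impressions
        return cpc * ctr_percent * 10  # approximate cost per 1000 impressions

    banners = {
        "banner_a": rating(cpc=0.05, clicks=200),  # $0.05 bid, 2% CTR
        "banner_b": rating(cpc=0.20, clicks=30),   # $0.20 bid, 0.3% CTR
    }
    # Higher-rated banners are shown more often.
    for name, cpm in sorted(banners.items(), key=lambda kv: -kv[1]):
        print(f"{name}: effective CPM = ${cpm:.2f}")

Note that the cheap, attractive banner_a outranks the expensive but rarely clicked banner_b, which is exactly the trade-off described above.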

Posted in Topics: Education

No Comments

Game Theory and Terrorism

“Game Theorist Describes Unintended Consequences of U.S. Counterterrorism Policies”

Link- http://nsf.gov/discoveries/disc_summ.jsp?cntn_id=100065&org=SBE

The above article from the National Science Foundation discusses research analyzing government strategies in responding to terrorist incidents. Although dated mid-2004, it is still worth a read. The particularly interesting part is that the researchers actually use game theory (along with econometric methods) to analyze trends and form their conclusions. The study is reminiscent of the Hawk-Dove game described in our first lecture on game theory, applying variations to the simple model we put together in class.
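For reference, here is the basic Hawk-Dove payoff structure from lecture as a small script (textbook payoff values, not anything from the researchers’ models). With a fight cost higher than the prize, each pure strategy is the best response to the other, which is what makes the game interesting to extend:

    # Hawk-Dove game with standard textbook payoffs, V = 2, C = 6.

    V, C = 2, 6  # value of the resource, cost of a fight

    payoffs = {
        ("Hawk", "Hawk"): ((V - C) / 2, (V - C) / 2),
        ("Hawk", "Dove"): (V, 0),
        ("Dove", "Hawk"): (0, V),
        ("Dove", "Dove"): (V / 2, V / 2),
    }

    def best_response(opponent_strategy):
        # Compare the row player's payoff against a fixed opponent strategy.
        return max(("Hawk", "Dove"),
                   key=lambda s: payoffs[(s, opponent_strategy)][0])

    print(best_response("Hawk"))  # Dove: avoid the costly fight
    print(best_response("Dove"))  # Hawk: grab the whole resource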

For instance, the project analyzed the best government response (such as deterrence) when the location of an attack, which is the terrorists’ choice, is uncertain. The researchers learned that terror actually increases as security measures are heightened. Imagine the resulting game that can be created.

Although the research started over 20 years ago, the area has become more and more valuable recently, especially after September 11th. The article is not very detailed, and it would be compelling to see exactly how the game-theoretic methods are applied. I would love to research this topic further for a future entry.

Posted in Topics: General, social studies

No Comments