Video PageRank?

Interview with Google’s Matt Cutts about Next-Generation Search

In a recent interview with Read/WriteWeb, Google’s Matt Cutts (head of the Webspam team) talks briefly about applying Google’s PageRank concept to video search. Video search is becoming increasingly important to Google now that it has acquired YouTube and begun integrating it with its homebrewed Google Video offering. Where web search allows for PageRank analysis simply by parsing the links out of an HTML page, video search has no simple analog. Much time and many resources could be poured into sophisticated tools that extract transcripts from uploaded video on the fly, but even for Google this would seem prohibitive. Instead, companies such as Google must rely on metadata to do video search.

In the interview, Cutts explains,

in the Web we have this notion of reputation - which is PageRank, it’s how many people link to your site and it’s also the quality of that incoming set of links. So it’s fun to think about things like reputation in video search - whether it be for Google Video or YouTube - because you don’t have links necessarily. You might have things that are somewhat similar to links, but you look at the quality of the users, the quality of the ratings. I think in lots of ways it gives Google good practice to think about the power of people, and the power of trust - and how to apply that in a lot of different areas.

Now, not much is known about Google’s video search algorithms (compared to web search PageRank), so I’ll speculate about how such a ranking could work:

Video search, in its current model, requires users to upload videos and describe them. The videos are then indexed into search databases and presented for public consumption. This leads to a complex graph with two different types of nodes: “user” nodes and “video” nodes. Each user node is connected to the videos that user uploaded. The user is also connected to the videos they watch, and these edges can carry weights based on metadata such as the rating given, the fraction of the video watched, the number of times it was watched, the number of people the user shared it with, and so on. Video nodes can carry metadata such as text descriptions, the uploading user, an associated website, labels, titles, comments, date uploaded, length, format, and language; data such as the websites that link to or embed the video can also be maintained. User nodes can carry metadata such as the list of videos uploaded, the list of videos viewed, the list of videos shared, and time spent on the site.
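To make the structure concrete, here is a minimal sketch of that two-node-type graph. The class and field names (Video, WatchEdge, the particular weight formula, and so on) are my own illustrative assumptions, not anything Google has published:

```python
# A toy model of the user/video graph described above: users, videos, and
# weighted "watch" edges carrying the kinds of metadata listed in the post.
from dataclasses import dataclass, field


@dataclass
class Video:
    video_id: str
    uploader_id: str
    title: str = ""
    description: str = ""
    labels: list[str] = field(default_factory=list)
    embedding_sites: list[str] = field(default_factory=list)  # pages that link to or embed the video


@dataclass
class WatchEdge:
    user_id: str
    video_id: str
    rating: float | None = None       # rating the user gave, if any
    fraction_watched: float = 0.0     # how much of the video was watched
    times_watched: int = 0
    times_shared: int = 0

    def weight(self) -> float:
        """Collapse the edge metadata into a single weight (one arbitrary choice)."""
        w = self.fraction_watched * self.times_watched + 0.5 * self.times_shared
        if self.rating is not None:
            w *= self.rating
        return w


@dataclass
class User:
    user_id: str
    uploaded: list[str] = field(default_factory=list)    # video ids this user uploaded
    watched: list[WatchEdge] = field(default_factory=list)
```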

All of these aspects can be combined to determine the notion of “reputation” that is central to the PageRank method. A user who watches “good” videos will be seen as a “good” user, and a video that is watched by “good” users will be seen as a “good” video. Ratings and comments can help determine the “quality” of both users and videos. Together these should afford a PageRank-style analysis in which a video’s rank is a function of the number and quality of users pointing to it, and a user’s rank is a function of the quality of the videos they point to.
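One simple way to realize this mutual reinforcement is a HITS-style iteration over the bipartite user/video graph. The edge weights, update rule, and normalization below are assumptions made for illustration, not Google’s actual algorithm:

```python
# Toy HITS-style iteration: a video's score is the weighted sum of the scores
# of the users who watched it, and a user's score is the weighted sum of the
# scores of the videos they watched.
import math

# edges[(user, video)] = weight derived from ratings, watch time, shares, ...
edges = {
    ("alice", "v1"): 2.0, ("alice", "v2"): 1.0,
    ("bob",   "v2"): 3.0, ("bob",   "v3"): 0.5,
    ("carol", "v1"): 1.5,
}

users = {u for u, _ in edges}
videos = {v for _, v in edges}
user_score = {u: 1.0 for u in users}
video_score = {v: 1.0 for v in videos}

for _ in range(50):
    # a video's score is the weighted sum of the scores of users pointing to it
    video_score = {v: sum(w * user_score[u] for (u, vv), w in edges.items() if vv == v)
                   for v in videos}
    # a user's score is the weighted sum of the scores of the videos they point to
    user_score = {u: sum(w * video_score[v] for (uu, v), w in edges.items() if uu == u)
                  for u in users}
    # normalize so the scores converge instead of blowing up
    vn = math.sqrt(sum(s * s for s in video_score.values()))
    un = math.sqrt(sum(s * s for s in user_score.values()))
    video_score = {v: s / vn for v, s in video_score.items()}
    user_score = {u: s / un for u, s in user_score.items()}

print(sorted(video_score.items(), key=lambda kv: -kv[1]))
```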

It should also be possible to observe a variant of triadic closure if you superimpose a graph of user relationships on the user/video graph. That is, if a pair of users shares a strong tie, and one of them has a strong tie (by some definition) to a video, it should be likely (or at least likelier) that the other user has developed at least a weak tie to that video: perhaps they have watched part of it once, or have at least heard of it.
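As a sketch of how that heuristic might be used for prediction, the snippet below takes made-up strong user–user and user–video ties and predicts where weak user–video ties are likely to form; the data and the all-or-nothing threshold are assumptions:

```python
# Triadic-closure heuristic: if users a and b share a strong tie, and a has a
# strong tie to video v, predict that b has (or will form) at least a weak tie to v.
strong_friends = {("alice", "bob"), ("bob", "carol")}
strong_user_video = {("alice", "v1"), ("carol", "v3")}

def predicted_weak_ties(friends, user_video):
    """Return (user, video) pairs likely to be at least weak ties."""
    predictions = set()
    for a, b in friends:
        for u, v in user_video:
            if u == a:
                predictions.add((b, v))
            elif u == b:
                predictions.add((a, v))
    return predictions - user_video

print(predicted_weak_ties(strong_friends, strong_user_video))
# e.g. {('bob', 'v1'), ('bob', 'v3')}  (set ordering may vary)
```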



One response to “Video PageRank?”

  1. Cornell Info 204 Digest » Blog Archive » Privacy, Multimedia Search, and Neural Networks Says:

    […] Another post by sithswine182 touched on the topic of multimedia search (e.g., searching for pictures or videos). One way to make advancements in this area is to use pre-labeled data (such as images annotated with keywords) to detect statistically meaningful features to feed back into the ranking algorithms. Unfortunately, our collection of annotated multimedia data is very noisy. To address this deficiency, Google developed its Image Labeler site. This site is played like a game where you and another anonymous user try to annotate a random image with as many keywords as possible. All keywords submitted by both you and your partner are scored. As an incentive, Google keeps track of the top scoring users and user pairs. […]


