The Future Demand for Effective Web Page Filtering

            I’ve never really been a big user of blogs (this post is only my second ever), so I’m not sure whether the system that has been set up for us has been successful by conventional standards.  One thing that does bother me about our NSDL resource, though, is that there doesn’t seem to be much discussion of the great ideas being thrown out there.  So this is my attempt to provoke some discussion about future means of web site identification, so that we can begin to digest one another’s thoughts rather than leaving that fun to the course instructors.

            Clearly, the volume and diversity of the internet’s web page population have made designing efficient and productive search engines quite challenging.  Even though I now have a better idea of how search engine algorithms work, I still can’t seem to find anything I’m looking for.  I must admit I don’t have a very elaborate background in computers or software development (Matlab is the only programming language I can use effectively), but I was wondering whether it would be possible to come up with some sort of system that would help filter information out of web pages.

            For example, if someone were interested in creating a page, how helpful would it be for some service, say a new type of search engine, to provide a type of ID tag to accompany the page?  This tag, most likely created by the web site’s creator, could contain the title of any articles on the page, any companies the page represents, any keywords appropriate to the page, and so on.  Search engines could then consult this information to better filter the results for certain queries.  Obvious problems would include applying the technology to the millions and millions of pages already out there, and dealing with falsified ID tag information, which would cause certain pages to show up in results for irrelevant queries.

            Though the hyperlink sorting and matching methods currently in use are quite sophisticated and can be effective for some users, I foresee a crippling problem in search engines’ inability to accommodate the demands of the ever-growing web page population.  Maybe it’s not a good idea to talk about this stuff so openly, since I’m sure vast amounts of money can be found in effective solutions, but I was just wondering whether anyone a little more internet-savvy agreed with my concerns or had any interesting ideas of their own.
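            To make the idea a little more concrete, here is a rough sketch in Python of what I have in mind.  The tag format, field names, and matching rules are all made up for illustration; nothing here is an existing standard or any particular search engine’s method.

```python
# A rough sketch of the "ID tag" idea: each page ships a small block of
# creator-supplied metadata, and a search engine filters results on it.
# All field names and the matching rule below are hypothetical.

from dataclasses import dataclass, field


@dataclass
class IDTag:
    """Creator-supplied metadata accompanying a web page."""
    url: str
    title: str
    companies: list = field(default_factory=list)   # organizations the page represents
    keywords: list = field(default_factory=list)    # creator-chosen descriptive terms


def matches(tag: IDTag, query: str) -> bool:
    """Return True if every word in the query appears somewhere in the tag."""
    haystack = " ".join([tag.title] + tag.companies + tag.keywords).lower()
    return all(word in haystack for word in query.lower().split())


def filter_results(tags: list, query: str) -> list:
    """Keep only the pages whose ID tags match the query."""
    return [tag.url for tag in tags if matches(tag, query)]


if __name__ == "__main__":
    pages = [
        IDTag("http://example.com/widgets", "Widget Review 2008",
              companies=["Acme Corp"], keywords=["widgets", "review"]),
        IDTag("http://example.com/spam", "Cheap Widgets!!!",
              keywords=["widgets", "free", "cheap"]),
    ]
    print(filter_results(pages, "acme widget review"))
    # -> ['http://example.com/widgets']
```

            The falsification problem shows up even in this toy example: nothing stops the second page from stuffing its keyword list with whatever terms happen to be popular, which is exactly why a search engine couldn’t take creator-supplied tags at face value.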

Posted in Topics: Education


3 Responses to “The Future Demand for Effective Web Page Filtering”

  1. Cornell Info 204 - Networks » Blog Archive » Social Networking Systems and Web Information Retrieval Says:

    […] As beefcake points out in a recent post, there is significant need for effective web page filtering. Due to the web’s monolithic size and the dynamic nature of its content, effective filtering becomes very difficult. In class, we focused on exploiting the hyperlink structure of the web to rank pages relevant to a query. Though this is a powerful, proven method used by Google and other search engines, it confines the user to using queries to explore the web. While this helps users locate relevant items, it is not particularly useful for discovery of things that they aren’t yet aware of but may find interesting. […]

  2. Cornell Info 204 Digest » Blog Archive » Search and Information Cascades Says:

    […] Also on the topic of information search, beefcake raised the question of whether it would be possible to design a system which can filter information from web pages. An interesting answer was provided by ahuckk. In class, we focused on the traditional query-based search engines. Some new social media sites, such as blogs, wikis, Digg and Flickr, rely on the users to actively create, evaluate and distribute information, thus providing an effective information filtering approach. For example, Digg allows users to submit, vote on, and discuss news stories they find on the web. The votes of users determine which stories are promoted to the front page. Interestingly, the voting system of Digg is also related to another topic we covered in class: information cascades. A study by Kristina Lerman shows that Digg users are more likely to be interested in the news stories their friends also find interesting, and the filtering may result in overrepresentation of an article. […]

  3. Cornell Info 204 - Networks » Blog Archive » Value and Growth of Networked Information: Aggregation, Search in a Semantic Web, and HCI on Graphs Says:

    […] As pointed out by beefcake [link], finding relevant content often limits users on the internet more than lack of content.  Of course, companies like Google have profited greatly from search: utilizing the link structure of the internet to help users navigate.  Indeed, web growth these days is very much dependent on the aggregation, search, and classification of cheap content.  The age of “Content is King” on the web has gone. […]


