Web Graph Project

 

Welcome to the web page of the Web Graph project!


About the Web Graph project:

  The Web Graph project is part of the Web Research Infrastructure Project, currently under way in the Computing and Information Science department at Cornell University. The Web Research Infrastructure Project is a joint venture between the Internet Archive and Cornell to provide data and computing facilities to researchers studying the World Wide Web. The Internet Archive provides the data, which consists of web crawls. The Cornell Theory Center provides the infrastructure and computing facilities for hosting and processing the data. The project is funded by the NSF.

What is the Web Graph?

  The Web Graph is a large directed graph that represents the link structure of the World Wide Web. The nodes in this graph are the URLs in the WWW, and each edge represents a hyperlink from one URL to another. The project aims to extract this graph from a web crawl and make it available to researchers who want to study it. The graph should provide new insights into the structure and evolution of the WWW. As you can imagine, the Web Graph has many practical uses as well!
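
  As a rough illustration of the structure described above, the following Python sketch builds a small directed graph from a list of (source URL, target URL) pairs. It is not the project's extraction code: the function name build_web_graph, the integer node IDs, and the example URLs are all assumptions made for this example.

```python
from collections import defaultdict


def build_web_graph(links):
    """Build a directed graph from (source_url, target_url) pairs.

    Returns a URL -> node ID mapping and a node ID -> successor list
    mapping. Both names are hypothetical, chosen for this sketch.
    """
    url_to_id = {}
    successors = defaultdict(list)

    def node_id(url):
        # Assign the next free integer ID the first time a URL is seen.
        if url not in url_to_id:
            url_to_id[url] = len(url_to_id)
        return url_to_id[url]

    for source, target in links:
        s = node_id(source)
        t = node_id(target)
        successors[s].append(t)  # one directed edge per hyperlink

    return url_to_id, dict(successors)


# Made-up links standing in for hyperlinks found in a crawl.
links = [
    ("http://a.example/", "http://b.example/"),
    ("http://a.example/", "http://c.example/"),
    ("http://b.example/", "http://c.example/"),
]
url_to_id, successors = build_web_graph(links)
```

  Replacing URLs with small integer IDs keeps each edge compact, which matters once the graph has billions of edges.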

  The project keeps the Web Graph in compressed form in the RAM of a server and will provide a network-based API for operating on it.

What are the Challenges?

  The Web Graph is a very large graph, one of the largest ever studied. The WWW has more than a billion URLs, each of which forms a node in the Web Graph. The number of edges is much larger still. The size and complexity of this graph present unique challenges in graph extraction, compression and storage. Fortunately, the graph is very sparse and has distinctive structural characteristics, so specialized algorithms can be used to compress it to a manageable size.
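
  This page does not describe the compression scheme the project uses. As one possible illustration of how sparsity can be exploited, the Python sketch below stores a node's sorted successor list as gaps between consecutive node IDs, with each gap written as a variable-length integer. The function names and the sample node IDs are assumptions made for this example.

```python
def encode_varint(n):
    """Encode a non-negative integer using 7 bits per byte, low bits first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def compress_successors(successors):
    """Gap-encode a successor list (sorted here; duplicate IDs are dropped)."""
    out = bytearray()
    previous = 0
    for node in sorted(set(successors)):
        out += encode_varint(node - previous)  # store the gap, not the ID
        previous = node
    return bytes(out)


def decompress_successors(data):
    """Rebuild the sorted successor list from its gap encoding."""
    nodes = []
    current = 0
    gap = 0
    shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # this gap continues in the next byte
        else:
            current += gap
            nodes.append(current)
            gap = 0
            shift = 0
    return nodes


# Made-up node IDs that are numerically close to one another.
succ = [1000000, 1000003, 1000004, 1000010]
packed = compress_successors(succ)
restored = decompress_successors(packed)
```

  In this example the four node IDs take 6 bytes in total, compared with 16 bytes as fixed 32-bit integers. The saving depends on successor IDs being close together, which is an assumption about how node IDs are assigned.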

 

 This project is currently being carried out by Jerrin Kallukalam and Karthik Jeyabalan under the guidance of Prof. Bill Arms.

 This project is being presented at BOOM 2005.

 

 More info coming soon.