Team for Research in Ubiquitous Secure Technology

A path-based approach for web page retrieval
Jian-qiang Li, Yu Zhao, Hector Garcia-Molina

Citation
Jian-qiang Li, Yu Zhao, Hector Garcia-Molina. "A path-based approach for web page retrieval". World Wide Web, 15(3):257-283, May 2012.

Abstract
Use of links to enhance page ranking has been widely studied. The underlying assumption is that links convey recommendations. Although this technique has been used successfully in global web search, it produces poor results for website search, because the majority of the links in a website are used to organize information and convey no recommendations. By distinguishing these two kinds of links, respectively for recommendation and information organization, this paper describes a path-based method for web page ranking. We define the Hierarchical Navigation Path (HNP) as a new resource for improving web search. HNP is composed of multi-step navigation information in visitors’ website browsing. It provides indications of the content of the destination page. We first classify the links inside a website. Then, the links for web page organization are exploited to construct the HNPs for each page. Finally, the PathRank algorithm is described for web page retrieval. The experiments show that our approach results in significant improvements over existing solutions.
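The abstract outlines three steps: classify links, build navigation paths from the organizational links, and rank pages using those paths. The sketch below illustrates only the general idea and is not the paper's PathRank algorithm. It assumes the link classification has already been done, and everything in it is hypothetical: the `org_links` structure, the `build_paths` and `path_score` helpers, the toy site, and the term-overlap score. Each page gets one shortest path from the home page, recorded as the anchor texts a visitor would click, and a page is scored by how many query terms appear along that path.

```python
from collections import deque


def build_paths(root, org_links):
    """Breadth-first walk over organizational links.

    org_links maps a page to a list of (anchor_text, child_page) pairs.
    Returns, for each reachable page, one shortest path from the root,
    stored as the list of anchor texts followed to get there.
    """
    paths = {root: []}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for anchor, child in org_links.get(page, []):
            if child not in paths:  # keep the first (shortest) path found
                paths[child] = paths[page] + [anchor]
                queue.append(child)
    return paths


def path_score(query, anchors):
    """Toy score: fraction of query terms found in the path's anchor texts."""
    terms = set(query.lower().split())
    if not terms:
        return 0.0
    path_terms = set(" ".join(anchors).lower().split())
    return len(terms & path_terms) / len(terms)


# Hypothetical site: only links that organize information are listed.
site = {
    "/": [("Research", "/research"), ("People", "/people")],
    "/research": [("Databases", "/research/db")],
    "/research/db": [("Publications", "/research/db/pubs")],
}

paths = build_paths("/", site)
query = "databases publications"
ranked = sorted(paths, key=lambda p: path_score(query, paths[p]), reverse=True)
```

In this toy site the page reached through "Research", "Databases", "Publications" matches both query terms even if its own text never mentions databases, which is the kind of indication about a destination page that the abstract attributes to navigation paths.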

Citation formats  
  • HTML
    Jian-qiang Li, Yu Zhao, Hector Garcia-Molina. <a
    href="http://www.truststc.org/pubs/896.html" >A
    path-based approach for web page retrieval</a>,
    <i>World Wide Web</i>, 15(3):257-283, May 2012.
  • Plain text
    Jian-qiang Li, Yu Zhao, Hector Garcia-Molina. "A
    path-based approach for web page retrieval".
    World Wide Web, 15(3):257-283, May 2012.
  • BibTeX
    @article{LiZhaoGarciaMolina12_PathbasedApproachForWebPageRetrieval,
        author = {Jian-qiang Li and Yu Zhao and Hector Garcia-Molina},
        title = {A path-based approach for web page retrieval},
        journal = {World Wide Web},
        volume = {15},
        number = {3},
        pages = {257--283},
        month = {May},
        year = {2012},
        abstract = {Use of links to enhance page ranking has been
                  widely studied. The underlying assumption is that
                  links convey recommendations. Although this
                  technique has been used successfully in global web
                  search, it produces poor results for website
                  search, because the majority of the links in a
                  website are used to organize information and
                  convey no recommendations. By distinguishing these
                  two kinds of links, respectively for
                  recommendation and information organization, this
                  paper describes a path-based method for web page
                  ranking. We define the Hierarchical Navigation
                  Path (HNP) as a new resource for improving web
                  search. HNP is composed of multi-step navigation
                  information in visitors’ website browsing. It
                  provides indications of the content of the
                  destination page. We first classify the links
                  inside a website. Then, the links for web page
                  organization are exploited to construct the HNPs
                  for each page. Finally, the PathRank algorithm is
                  described for web page retrieval. The experiments
                  show that our approach results in significant
                  improvements over existing solutions.},
        URL = {http://www.truststc.org/pubs/896.html}
    }
    

Posted by Mary Stewart on 4 Apr 2012.

Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright.