Web Page Segmentation with Structured Prediction and Its Application in Web Page Classification

, , , , and . Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, pages 767--776. New York, NY, USA, ACM, (2014)
DOI: 10.1145/2600428.2609630

Abstract

We propose a framework that performs Web page segmentation with a structured prediction approach. It formulates the segmentation task as a structured labeling problem on a transformed Web page segmentation graph (WPS-graph). The WPS-graph models the candidate segmentation boundaries of a page and the dependency relations among adjacent segmentation boundaries. Each labeling scheme on the WPS-graph corresponds to a possible segmentation of the page. The task of finding the optimal labeling of the WPS-graph is transformed into a binary Integer Linear Programming problem, which considers the entire WPS-graph as a whole to conduct structured prediction. A learning algorithm based on the structured output Support Vector Machine framework is developed to determine the feature weights; it is capable of considering the inter-dependency among candidate segmentation boundaries. Furthermore, we investigate the framework's efficacy in supporting the development of automatic Web page classification.
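
The correspondence between a binary labeling of candidate boundaries and a segmentation of the page can be illustrated with a toy sketch. It is not the paper's formulation: it assumes the candidate boundaries form a simple chain rather than a general WPS-graph, uses hand-picked scores in place of learned feature weights, and replaces the Integer Linear Programming solver with exhaustive enumeration. The names `best_labeling`, `segments`, `unary`, and `pairwise` are hypothetical.

```python
from itertools import product


def best_labeling(unary, pairwise):
    """Exhaustively search binary labelings of a chain of candidate boundaries.

    unary[i]    -- score gained if boundary i is labeled 1 (cut here)
    pairwise[i] -- score added if boundaries i and i + 1 are both labeled 1
    Enumeration is a stand-in for a binary ILP solver and only suits tiny inputs.
    """
    best_score, best_y = float("-inf"), None
    for y in product((0, 1), repeat=len(unary)):
        score = sum(u * yi for u, yi in zip(unary, y))
        score += sum(p * y[i] * y[i + 1] for i, p in enumerate(pairwise))
        if score > best_score:
            best_score, best_y = score, y
    return best_y, best_score


def segments(blocks, labels):
    """Cut the block sequence after blocks[i] wherever labels[i] == 1."""
    out, current = [], [blocks[0]]
    for block, cut in zip(blocks[1:], labels):
        if cut:
            out.append(current)
            current = []
        current.append(block)
    out.append(current)
    return out


# Hypothetical page: boundary i sits between blocks[i] and blocks[i + 1].
blocks = ["header", "nav", "article", "comments", "footer"]
unary = [0.5, 2.0, -1.0, 1.5]   # assumed per-boundary scores
pairwise = [-1.0, -0.5, -0.5]   # assumed penalties for cutting adjacent boundaries

labels, score = best_labeling(unary, pairwise)
print(labels, score)
print(segments(blocks, labels))
```

Each of the 16 possible labelings maps to exactly one segmentation of the five blocks. The pairwise terms are what make the prediction structured, because the best label for one boundary depends on the labels of its neighbours.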
