Web Page Segmentation with Structured Prediction and Its Application in Web Page Classification
L. Bing, R. Guo, W. Lam, Z. Niu, and H. Wang. Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, pages 767--776. New York, NY, USA, ACM, (2014)
DOI: 10.1145/2600428.2609630
Abstract
We propose a framework which can perform Web page segmentation with a structured prediction approach. It formulates the segmentation task as a structured labeling problem on a transformed Web page segmentation graph (WPS-graph). The WPS-graph models the candidate segmentation boundaries of a page and the dependency relation among the adjacent segmentation boundaries. Each labeling scheme on the WPS-graph corresponds to a possible segmentation of the page. The task of finding the optimal labeling of the WPS-graph is transformed into a binary Integer Linear Programming problem, which considers the entire WPS-graph as a whole to conduct structured prediction. A learning algorithm based on the structured output Support Vector Machine framework, which is capable of considering the inter-dependency among candidate segmentation boundaries, is developed to determine the feature weights. Furthermore, we investigate its efficacy in supporting the development of automatic Web page classification.
Description
Web page segmentation with structured prediction and its application in web page classification
%0 Conference Paper
%1 Bing:2014:WPS:2600428.2609630
%A Bing, Lidong
%A Guo, Rui
%A Lam, Wai
%A Niu, Zheng-Yu
%A Wang, Haifeng
%B Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval
%C New York, NY, USA
%D 2014
%I ACM
%K classification page prediction segmentation structured web
%P 767--776
%R 10.1145/2600428.2609630
%T Web Page Segmentation with Structured Prediction and Its Application in Web Page Classification
%U http://doi.acm.org/10.1145/2600428.2609630
%X We propose a framework which can perform Web page segmentation with a structured prediction approach. It formulates the segmentation task as a structured labeling problem on a transformed Web page segmentation graph (WPS-graph). The WPS-graph models the candidate segmentation boundaries of a page and the dependency relation among the adjacent segmentation boundaries. Each labeling scheme on the WPS-graph corresponds to a possible segmentation of the page. The task of finding the optimal labeling of the WPS-graph is transformed into a binary Integer Linear Programming problem, which considers the entire WPS-graph as a whole to conduct structured prediction. A learning algorithm based on the structured output Support Vector Machine framework, which is capable of considering the inter-dependency among candidate segmentation boundaries, is developed to determine the feature weights. Furthermore, we investigate its efficacy in supporting the development of automatic Web page classification.
%@ 978-1-4503-2257-7
@inproceedings{Bing:2014:WPS:2600428.2609630,
abstract = {We propose a framework which can perform Web page segmentation with a structured prediction approach. It formulates the segmentation task as a structured labeling problem on a transformed Web page segmentation graph (WPS-graph). The WPS-graph models the candidate segmentation boundaries of a page and the dependency relation among the adjacent segmentation boundaries. Each labeling scheme on the WPS-graph corresponds to a possible segmentation of the page. The task of finding the optimal labeling of the WPS-graph is transformed into a binary Integer Linear Programming problem, which considers the entire WPS-graph as a whole to conduct structured prediction. A learning algorithm based on the structured output Support Vector Machine framework, which is capable of considering the inter-dependency among candidate segmentation boundaries, is developed to determine the feature weights. Furthermore, we investigate its efficacy in supporting the development of automatic Web page classification.},
acmid = {2609630},
added-at = {2016-01-18T23:36:37.000+0100},
address = {New York, NY, USA},
author = {Bing, Lidong and Guo, Rui and Lam, Wai and Niu, Zheng-Yu and Wang, Haifeng},
biburl = {https://www.bibsonomy.org/bibtex/2a94669026484ed98d3b92e5dd7e60de6/nosebrain},
booktitle = {Proceedings of the 37th International ACM SIGIR Conference on Research \& Development in Information Retrieval},
description = {Web page segmentation with structured prediction and its application in web page classification},
doi = {10.1145/2600428.2609630},
interhash = {29b12aa71341250eba417b4f2b1fdefe},
intrahash = {a94669026484ed98d3b92e5dd7e60de6},
isbn = {978-1-4503-2257-7},
keywords = {classification page prediction segmentation structured web},
location = {Gold Coast, Queensland, Australia},
numpages = {10},
pages = {767--776},
publisher = {ACM},
series = {SIGIR '14},
timestamp = {2016-01-18T23:36:37.000+0100},
title = {Web Page Segmentation with Structured Prediction and Its Application in Web Page Classification},
url = {http://doi.acm.org/10.1145/2600428.2609630},
year = 2014
}