Jericho HTML Parser is a Java library for analysing and manipulating parts of an HTML document, including server-side tags, while reproducing any unrecognised or invalid HTML verbatim.
D. Mollá and B. Hutchinson. Proceedings of the EACL 2003 Workshop on Evaluation Initiatives in Natural Language Processing: are evaluation methods, metrics and resources reusable?, pages 43-50. Association for Computational Linguistics, (2003)