Abstract

We present a new approach to clustering based on the observation that "it is easier to criticize than to construct." Our approach, semi-supervised clustering, allows a user to iteratively provide feedback to a clustering algorithm. The feedback is incorporated in the form of constraints, which the clustering algorithm attempts to satisfy on future iterations. These constraints allow the user to guide the clusterer towards clusterings of the data that the user finds more useful. We demonstrate semi-supervised clustering with a system that learns to cluster news stories from a Reuters data set.

Introduction

Consider the following problem: you are given 100,000 text documents (e.g., papers, newsgroup articles, or web pages) and asked to group them into classes or into a hierarchy such that related documents are grouped together. You are not told what classes or hierarchy to use or what documents are related; you have some criteria in mind, but may not be able to say exactly w...
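The abstract does not say which clustering algorithm absorbs the user's feedback. As a minimal sketch of the general idea only (user critiques become pairwise constraints that later clustering passes try to satisfy), the block below uses constrained k-means in the style of COP-KMeans, with "must-link" and "cannot-link" pairs. This is a swapped-in, commonly used technique, not necessarily the authors' method, and the names `cop_kmeans`, `must`, and `cannot` are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Set, Tuple

Pair = Tuple[int, int]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def partner(i: int, pair: Pair) -> Optional[int]:
    """Return the other index in a constraint pair, or None if i is not in it."""
    a, b = pair
    if a == i:
        return b
    if b == i:
        return a
    return None


def violates(i: int, c: int, labels: List[int], must: Set[Pair], cannot: Set[Pair]) -> bool:
    """True if putting point i into cluster c breaks a constraint with an already-assigned point."""
    for pair in must:
        other = partner(i, pair)
        # must-link: an assigned partner has to be in the same cluster
        if other is not None and labels[other] not in (-1, c):
            return True
    for pair in cannot:
        other = partner(i, pair)
        # cannot-link: an assigned partner must not be in this cluster
        if other is not None and labels[other] == c:
            return True
    return False


def cop_kmeans(points, k, must, cannot, iters=20, seed=0) -> List[int]:
    """Hypothetical sketch of COP-KMeans-style constrained k-means.

    Each point goes to the nearest centroid that does not violate a constraint;
    raises ValueError if the greedy assignment finds no feasible cluster.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels = [-1] * len(points)
    for _ in range(iters):
        new_labels = [-1] * len(points)
        for i, p in enumerate(points):
            # try clusters from nearest to farthest
            order = sorted(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for c in order:
                if not violates(i, c, new_labels, must, cannot):
                    new_labels[i] = c
                    break
            else:
                raise ValueError(f"no feasible cluster for point {i}")
        # recompute each non-empty cluster's centroid as the mean of its members
        for c in range(k):
            members = [points[i] for i in range(len(points)) if new_labels[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels
```

A feedback round could then look like this: the user inspects a clustering, "criticizes" it by adding pairs, and the clusterer is rerun.

```python
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
must, cannot = set(), set()
labels = cop_kmeans(points, 2, must, cannot)   # first, unconstrained pass
must |= {(0, 1), (2, 3)}                        # user: "these belong together"
cannot |= {(0, 2)}                              # user: "these do not"
labels = cop_kmeans(points, 2, must, cannot)   # rerun honoring the feedback
```

Because the greedy assignment does not search over orderings, it can report infeasibility even when a valid clustering exists; it is meant to illustrate constraint-guided clustering, not to reproduce the system described in the paper.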

Description

CiteSeerX — Semi-supervised Clustering with User Feedback
