tag :: near duplicate

bookmarks (1)

Near-duplicates and shingling
… can now generate all pairs $i,j$ for which $x_i^\pi$ is present in both their sketches. From these we can compute, for each pair $i,j$ with non-zero sketch overlap, the number of $x_i^\pi$ values they have in common. Applying a preset threshold then tells us which pairs $i,j$ have heavily overlapping sketches. For instance, with a threshold of 80%, the count would need to be at least 160 for a pair $i,j$ to qualify (which implies sketches of 200 values each). As we identify such pairs, we run the union-find algorithm to group documents into near-duplicate "syntactic clusters". This is essentially a variant of the single-link clustering algorithm introduced in Section 17.2.
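The clustering step described in this excerpt can be sketched roughly as follows. This is a minimal illustration, not the book's implementation: the function name `syntactic_clusters`, the representation of sketches as Python sets keyed by document id, and the use of an absolute count as the threshold are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def syntactic_clusters(sketches, min_shared):
    """Group documents whose sketches share at least `min_shared` values.

    sketches   -- dict mapping a document id to a set of sketch values
                  (hypothetical input format chosen for this example)
    min_shared -- absolute overlap threshold, e.g. 160 for 80% of 200 values
    """
    # Invert the sketches: for each sketch value, list the documents holding it.
    postings = defaultdict(list)
    for doc, sketch in sketches.items():
        for value in sketch:
            postings[value].append(doc)

    # Count shared values only for pairs with non-zero sketch overlap.
    overlap = defaultdict(int)
    for docs in postings.values():
        for i, j in combinations(sorted(docs), 2):
            overlap[(i, j)] += 1

    # Union-find with path compression (path halving).
    parent = {doc: doc for doc in sketches}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge every pair whose overlap reaches the threshold.
    for (i, j), count in overlap.items():
        if count >= min_shared:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    # Collect the resulting clusters in a deterministic order.
    clusters = defaultdict(set)
    for doc in sketches:
        clusters[find(doc)].add(doc)
    return sorted(sorted(members) for members in clusters.values())
```

Because pairs are merged transitively, two documents can land in the same cluster even if their own sketches overlap below the threshold, as long as a chain of qualifying pairs connects them. That is the single-link behaviour the excerpt refers to.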
13 years ago by @stroeh
tags: duplicate, near, shingle, shingling

publications (9)

Wikipedia in the pocket: indexing technology for near-duplicate detection and high similarity search.
M. Potthast. SIGIR, page 909. ACM, (2007)
13 years ago by @stroeh
tags: detection, duplicate, near, similarity
Efficient similarity joins for near duplicate detection.
C. Xiao, W. Wang, X. Lin, and J. Yu. WWW, pages 131-140. ACM, (2008)
13 years ago by @stroeh
tags: detection, duplicate, near
Duplicate and Near Duplicate Documents Detection: A Review
J. Kumar and P. Govindarajulu. European Journal of Scientific Research, (2009)
13 years ago by @stroeh
tags: detection, duplicate, near
Lexicon randomization for near-duplicate detection with I-Match
A. Kolcz and A. Chowdhury. J. Supercomput., (September 2008)
13 years ago by @stroeh
tags: detection, duplicate, near
Achieving both high precision and high recall in near-duplicate detection.
L. Huang, L. Wang, and X. Li. CIKM, pages 63-72. ACM, (2008)
13 years ago by @stroeh
tags: detection, duplicate, near
Adaptive near-duplicate detection via similarity learning.
H. Hajishirzi, W.-t. Yih, and A. Kolcz. SIGIR, pages 419-426. ACM, (2010)
13 years ago by @stroeh
tags: detection, duplicate, near, similarity
Detecting Near-Duplicates in Large-Scale Short Text Databases.
C. Gong, Y. Huang, X. Cheng, and S. Bai. PAKDD, volume 5012 of Lecture Notes in Computer Science, pages 877-883. Springer, (2008)
13 years ago by @stroeh
tags: detection, duplicate, near
Finding similar files in large document repositories.
G. Forman, K. Eshghi, and S. Chiocchetti. KDD, pages 394-400. ACM, (2005)
13 years ago by @stroeh
tags: detection, document, duplicate, near, similar
Identifying and Filtering Near-Duplicate Documents.
A. Broder. CPM, volume 1848 of Lecture Notes in Computer Science, pages 1-10. Springer, (2000)
13 years ago by @stroeh
tags: detection, duplicate, near

BibSonomy
