High Quality High Performance Clustering
Leland McInnes, John Healy
- Audience level: Novice
- Category: Python Libraries
Description
Clustering data is a common problem in data science. In the absence of labelled data, having confidence in the results of clustering can be challenging. We present a high-performance library for scalable, high-quality clustering.
Abstract
Clustering data is a common problem in data science. In the absence of labelled data, having confidence in the results of clustering can be challenging. We present the [hdbscan](https://github.com/lmcinnes/hdbscan) library, a high-performance library for scalable, high-quality clustering. We will discuss the requirements for high-quality clustering and compare our implementation with other clustering algorithms available in Python, in terms of both clustering quality and performance.