Title: Cluster Analysis: A Primer Using R
Author: Lior Rokach
Publisher: World Scientific Publishing
Year: 2025
Pages: 303
Language: English
Format: pdf (true)
Size: 10.1 MB
Cluster analysis is a fundamental data analysis task that aims to group similar data points together, revealing the inherent structure and patterns within complex datasets. This book serves as a comprehensive and accessible guide, taking readers on a captivating journey through the foundational principles of cluster analysis.
The heart of the book is dedicated to a thorough examination of the various clustering algorithms, spanning partitioning methods, hierarchical methods, and more advanced techniques, such as mixture density-based clustering, graph clustering, and grid-based clustering. Each method is presented with a clear and concise explanation, accompanied by illustrative examples and hands-on implementations in the R programming language, a popular and powerful tool for data analysis and visualization.
Recognizing the importance of cluster validation and evaluation, the book devotes a chapter to a wide range of internal and external quality criteria, equipping readers with the tools needed to assess the performance of clustering algorithms. For those eager to stay at the forefront of the field, the book also presents Deep Learning-based clustering methods, showcasing the capabilities of neural networks in uncovering hidden structures within complex, high-dimensional data.
In this book, readers will become familiar with the foundational principles of cluster analysis, starting with an overview of data science and data mining, followed by a deep dive into the taxonomy of Machine Learning tasks. This groundwork sets the stage for crucial concepts such as similarity measures, which form the backbone of the clustering process.
R is a free software programming language widely used by data scientists to develop data mining algorithms. The extensive and diverse range of contributed packages available in the Comprehensive R Archive Network (CRAN) lets most Data Science tasks be completed efficiently with concise R scripts. The strength of R comes from the functions provided by these freely available packages, several of which implement clustering methods. The book's introductory R material focuses on the clustering methods that come with the stats package. stats is a base package that is loaded automatically at the start of an R session. It provides a broad range of statistical functionality, including implementations of the two most popular clustering algorithms: kmeans (k-means partitioning) and hclust (agglomerative hierarchical clustering).
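As a rough illustration of the two stats functions named above, the following minimal sketch runs both on the same data. It is not taken from the book: the built-in iris dataset, the choice of three clusters, the number of restarts, and the average-linkage method are all assumptions made for the example.

```r
# Minimal sketch: k-means and hierarchical clustering with the stats package.
# Assumptions (not from the book): the built-in iris data, k = 3,
# 25 random restarts, and average linkage.
set.seed(42)                      # k-means starts from random centers

x <- scale(iris[, 1:4])           # standardize the four numeric columns

# Partitioning method: k-means with 3 clusters, best of 25 random starts
km <- kmeans(x, centers = 3, nstart = 25)

# Hierarchical method: agglomerative clustering on Euclidean distances
hc <- hclust(dist(x), method = "average")
hc_labels <- cutree(hc, k = 3)    # cut the dendrogram into 3 clusters

# Cross-tabulate the two partitions to see where they agree
print(table(kmeans = km$cluster, hclust = hc_labels))
```

Standardizing the columns first keeps any single measurement from dominating the Euclidean distances; whether that is appropriate depends on the data at hand.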
Whether you are a student seeking to expand your knowledge, a data analyst looking to enhance your toolbox, or a researcher exploring the frontiers of data analysis, this book will provide you with a solid foundation in cluster analysis and empower you to tackle a wide range of data-driven problems.
Readership: Advanced undergraduate and graduate students, researchers and practitioners in the fields of Machine Learning, statistics, social sciences, data analysis, Data Science, data mining and bioinformatics.