Title: Introduction to Classifier Performance Analysis with R
Author: Sutaip L.C. Saw
Publisher: CRC Press
Series: Data Science Series
Year: 2025
Pages: 222
Language: English
Format: pdf (true), epub
Size: 10.1 MB
Classification problems are common in business, medicine, science, engineering and other sectors of the economy. Data scientists and Machine Learning professionals solve these problems through the use of classifiers. Choosing one of these data-driven classification algorithms for a given problem is a challenging task. An important aspect of this task is classifier performance analysis (CPA).
Introduction to Classifier Performance Analysis with R provides an introductory account of commonly used CPA techniques for binary and multiclass problems, and of the use of the R software system to carry out the analysis. Coverage draws on the extensive literature available on the subject, including descriptive and inferential approaches to CPA. Exercises are included at the end of each chapter to reinforce learning.
This book is for those who want a reasonably complete (at least at an introductory level) and up-to-date coverage of the analysis of classification algorithms through the use of performance measures and curves. It attempts to synthesize useful material from the vast published literature on the subject. Another motivation for the book is to show how R can be used to perform the required analysis. As computational software, R has already demonstrated its excellence to a large international community of users. Its appeal is further enhanced by recently developed packages and meta-packages for Data Science, Machine Learning, and classification performance analysis in particular.
Given the recent advances in computing technology and power, disciplines like Data Science (DS) and Machine Learning (ML) have become increasingly relevant, important, and useful. Classification problems are common in applications like fraud detection, target marketing, credit scoring, disease detection, customer churn prediction, spam filtering, and quality control (this list is not exhaustive). Data scientists and Machine Learning professionals solve these problems through the use of classifiers, i.e., data-driven classification models or algorithms that facilitate prediction of class labels and membership probabilities based on the features of cases (e.g., individuals or objects). Choosing the right classifier for a given problem is a challenging task. A critical part of that task is classifier performance analysis (CPA).
This book provides an introduction to CPA and the use of the R software system to accomplish the analysis. Much of what is known about CPA is scattered throughout the published literature, including books and journals in disciplines other than DS and ML. Having the relevant material in one book, with expanded introductory discussions and elementary theoretical support where necessary, and illustrated with the help of R, is certainly useful for those who have to engage in CPA, particularly those who have yet to master the techniques and conceptual foundations underlying the analysis.
Python is a programming language that is commonly used by DS and ML professionals and, judging by the publications and other media coverage the author has seen, it does a great job at what it was designed for. However, R and its superlative integrated development environment RStudio offer some appealing competitive advantages. R's excellence in serving the needs of analysts in DS, ML, and other disciplines engaged in computational statistics and data mining is unquestionable. In particular, it can be used to train a wide variety of classifiers and, for our purpose, it provides a powerful and well-integrated collection of tools for binary and multiclass CPA, given the availability of packages like yardstick and meta-packages like Tidyverse and Tidymodels. On a personal note, the author regards R as the best choice for students learning to solve problems in computational statistics, Data Science, Machine Learning, and related disciplines.
Key Features:
An introduction to binary and multiclass classification problems is provided, including some classifiers based on statistical, machine and ensemble learning.
Commonly used techniques for binary and multiclass CPA are covered, some from less well-known but useful points of view.
Coverage also includes important topics that have not received much attention in textbook accounts of CPA.
Limitations of some commonly used performance measures are highlighted.
Coverage includes performance parameters and inferential techniques for them.
Also covered are techniques for comparative analysis of competing classifiers.
A key contribution is the use of R meta-packages like Tidyverse and Tidymodels for CPA, particularly the very useful yardstick package.
This is a useful resource for upper-level undergraduate and master's-level students in Data Science, Machine Learning and related disciplines. Practitioners interested in learning how to use R to evaluate classifier performance may also benefit from the book. The material and references in the book can also serve the needs of researchers in CPA.