Title: The Data Science Handbook, 2nd Edition
Author: Field Cady
Publisher: Wiley
Year: 2025
Pages: 368
Language: English
Format: True/Retail EPUB, PDF
Size: 10.1 MB
Practical, accessible guide to becoming a data scientist, updated to include the latest advances in Data Science and related fields.
Becoming a data scientist is hard. The job focuses on mathematical tools, but also demands fluency with software engineering, understanding of a business situation, and deep understanding of the data itself. This book provides a crash course in data science, combining all the necessary skills into a unified discipline.
The focus of The Data Science Handbook is on practical applications and the ability to solve real problems, rather than theoretical formalisms that are rarely needed in practice.
Among its key points are:
• An emphasis on software engineering and coding skills, which play a significant role in most real Data Science problems.
• Extensive sample code, detailed discussions of important libraries, and a solid grounding in core concepts from Computer Science (computer architecture, runtime complexity, and programming paradigms).
• A broad overview of important mathematical tools, including classical techniques in statistics, stochastic modeling, regression, numerical optimization, and more.
• Extensive tips about the practical realities of working as a data scientist, including understanding related job functions, project life cycles, and the varying roles of data science in an organization.
• Exactly the right amount of theory. A solid conceptual foundation is required for fitting the right model to a business problem, understanding a tool's limitations, and reasoning about discoveries.
Data Science is a quickly evolving field, and this 2nd edition has been updated to reflect the latest developments, including the revolution in AI that has come from Large Language Models and the growth of ML Engineering as its own discipline. Much of data science has become a skillset that anybody can have, making this book not only for aspiring data scientists, but also for professionals in other fields who want to use analytics as a force multiplier in their organization.
As the discipline has expanded, the tools have also evolved, and I felt that a second edition was in order. By far the most important change I have made is more coverage of deep learning: previously I barely touched on RNNs, but now I continue up through topics such as encoder–decoder architectures, diffusion models, LLMs, and prompt engineering. AI tools are coming of age (perhaps AI is now where data science was 10 years ago) and a data scientist needs to be familiar with them. I have also updated my treatment of Spark to cover its new DataFrame interface, and reduced the emphasis on Hadoop since it is on the decline. Other changes include a reduced emphasis on Bayesian networks (which have waned in popularity with the rise of Deep Learning), a switch from Python 2 to Python 3, and numerous improvements to the prose.
The example code in this book is all in Python, except for a few domain‐specific languages such as SQL. My goal isn’t to push you to use Python; there are lots of good tools out there, and you can use whichever ones you want. However, I wanted to use one language for all of my examples, which lets readers follow the whole book while only knowing one language. Of the various languages available, there are two reasons why I chose Python:
• Python is without question the most popular language for data scientists. R is its only major competitor, at least when it comes to free tools. I have used both extensively, and I think that Python is flat‐out better (except for some obscure statistics packages that have been written in R and that are rarely needed anyway).
• I like to say that Python is the second‐best language for any task. It’s a jack‐of‐all‐trades. If you only need to worry about statistics, or numerical computation, or web parsing, then there are better options out there. But if you need to do all of these things within a single project, then Python is your best bet. Since Data Science is so inherently multidisciplinary, this makes it a perfect fit.