Title: Practical Lakehouse Architecture: Designing and Implementing Modern Data Platforms at Scale (Final Release)
Author: Gaurav Ashok Thalpati
Publisher: O’Reilly Media, Inc.
Year: 2024
Pages: 351
Language: English
Format: epub
Size: 10.1 MB
This concise yet comprehensive guide explains how to adopt a data lakehouse architecture to implement modern data platforms. It reviews the design considerations, challenges, and best practices for implementing a lakehouse and provides key insights into the ways that using a lakehouse can impact your data platform, from managing structured and unstructured data and supporting BI and AI/ML use cases to enabling more rigorous data governance and security measures.
Lakehouse architecture is a modern architectural pattern that has evolved over the last few years and has become a popular choice for data architects who are designing data platforms. In Chapter 1, I’ll introduce you to fundamental concepts related to data architecture, the data platform and its core components, and how data architecture helps build a data platform. Once you have understood these, I’ll explain why new architectural patterns like the lakehouse are needed, and then cover lakehouse fundamentals, its characteristics, and the benefits of implementing a data platform using lakehouse architecture. I’ll conclude the chapter with key takeaways that summarize everything we discuss and help you remember the key points while reading the subsequent chapters in this book.
This book is for all data practitioners who handle large volumes of data and are responsible for designing and implementing modern data platforms. For data architects, it is a comprehensive guide that can help them understand key considerations, establish design principles, and make critical decisions when implementing a data platform. For data engineers, it explains key concepts like open table formats, schema evolution, and time travel, which they can leverage when implementing data pipelines. Other data personas, like data analysts and data scientists, will learn about crucial topics like lakehouse data management, data discovery, access control, and sensitive data handling.