Title: Data Analytics in the AWS Cloud: Building a Data Platform for BI and Predictive Analytics on AWS
Author: Joe Minichino
Publisher: Wiley
Year: 2023
Pages: 411
Language: English
Format: pdf (true)
Size: 11.9 MB
A comprehensive and accessible roadmap to performing data analytics in the AWS cloud.
In Data Analytics in the AWS Cloud: Building a Data Platform for BI and Predictive Analytics on AWS, accomplished software engineer and data architect Joe Minichino delivers an expert blueprint for storing, processing, and analyzing data on the Amazon Web Services cloud platform. In the book, you'll explore every relevant aspect of data analytics (from data engineering to analysis, business intelligence, DevOps, and MLOps) as you discover how to integrate Machine Learning predictions with analytics engines and visualization tools.
A Data Lake is a centralized repository of structured, semi-structured, and unstructured data, upon which you can run insightful analytics. This is my ultra-short version of the definition. While in the past we referred to a data lake strictly as the facility where all of our data was stored, nowadays the definition has extended to include all of the possible data stores that can be linked to the centralized data storage, in a kind of hybrid data lake that comprises flat-file storage, data warehouses, and operational data stores.
There are other storage solutions in AWS, but apart from one that has some use cases (the EMR File System, or EMRFS), you should rely on S3. Note that EMRFS is itself based on S3. The main actors in analytics for big data and data lakes are undoubtedly S3, Athena, and Kinesis. EMR is useful for data preparation and transformation, and its output is generally data that is made available to Athena and QuickSight. Other tools, like AWS Glue and Lake Formation, are no less important (Glue in particular is vital to creating and maintaining an analytics pipeline), but they do not directly generate or perform analytics. MSK is AWS's fully managed version of Kafka, and we will take a quick look at it, but we will generally favor Kinesis, which performs a similar role in the stack. Choosing MSK or plain Kafka comes down to cost and performance considerations. CloudSearch is a search engine for websites and is therefore of limited interest in this context. Finally, SageMaker can be a valuable addition if you want to power your analytics with predictive models or any other Machine Learning/Artificial Intelligence (ML/AI) task.
You'll also find:
- Real-world use cases of AWS architectures that demystify the applications of data analytics
- Accessible introductions to data acquisition, importation, storage, visualization, and reporting
- Expert insights into serverless data engineering and how to use it to reduce overhead and costs, improve stability, and simplify maintenance

A can't-miss resource for data architects, analysts, engineers, and technical professionals, Data Analytics in the AWS Cloud will also earn a place on the bookshelves of business leaders seeking a better understanding of data analytics on the AWS cloud platform.