9 documents tagged with "data-engineering"

Data Drift

Data drift refers to the phenomenon where the statistical properties of a dataset used for machine learning or analysis change over time. This alteration can be due to various factors, such as shifts in data collection processes, changes in the underlying distribution of the data, or modifications in the environment from which the data originates. Detecting and addressing data drift is crucial to maintaining the performance and reliability of machine learning models and analytical systems.
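
One simple way to make "the statistical properties change over time" concrete is to compare a reference sample of a feature with a newer sample. The sketch below computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical distribution functions) using only the standard library. The function names and the 0.3 threshold are assumptions chosen for illustration, not something taken from the note; a real setup would pick a threshold or a significance test suited to its own data.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Largest gap between the empirical CDFs of two numeric samples.

    Returns 0.0 when the samples have identical distributions and 1.0
    when their value ranges do not overlap at all.
    """
    ref = sorted(reference)
    cur = sorted(current)
    if not ref or not cur:
        raise ValueError("both samples must be non-empty")

    gap = 0.0
    # The empirical CDFs only change at observed values, so checking
    # every observed value is enough to find the largest gap.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def has_drifted(reference, current, threshold=0.3):
    """Flag drift when the KS statistic exceeds a threshold.

    The default threshold is an arbitrary illustrative value.
    """
    return ks_statistic(reference, current) > threshold
```

For example, `has_drifted(training_values, last_week_values)` could be run per feature on a schedule, where both argument names are hypothetical lists of numbers.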

Data Engineering with dbt

This is a book about data engineering, with a sprinkle of dbt as well. It is not a book on dbt; it is most definitely a book on data engineering. It contains data engineering knowledge and ways of working.

Data Warehouse

The terms data warehouse, data lakehouse and data lake mostly refer to having a common place to store, manipulate and apply data.

Data Warehouse

The terms data warehouse, data lakehouse and data lake mostly refer to having a common place to store, manipulate and apply data.

DBT

dbt is an open-source command-line tool that enables data transformation and modeling in a structured and efficient manner. It allows data engineers and analysts to define and manage the data transformation pipeline using SQL queries. With dbt, you can write modular and reusable SQL code called "models," which define the transformations required to convert raw data into structured and analysis-ready data. These models can be organized, tested, and documented within the dbt framework. dbt leverages the power of SQL and provides a layer of abstraction on top of the data warehouse, making it easier to develop, test, and maintain complex data transformations. It promotes best practices such as version control, testing, and documentation, enabling collaborative and maintainable data modeling workflows. dbt integrates with various data warehouses and can be used in conjunction with other data tools and orchestration platforms to create a robust and reliable data pipeline.

Designing Machine Learning Systems

This book covers the fundamentals of designing machine learning systems. It goes through the entire lifecycle of a machine learning system and then discusses the ecosystem and the challenges and cases that need to be considered.

Feature Engineering

Dummy vs One-hot

Fluent Python

This is a more advanced book on Python that dives into the nitty-gritty of the language. It covers many of Python's core functionalities and how they work. Many internals, such as iterators, data objects, methods and functions, are discussed and analysed in detail.

Fundamentals of Data Engineering

This book covers the fundamentals of data engineering and how to solve data engineering problems without going too much into the details of programming. It introduces concepts such as data warehouses, Kafka, data pipelines and ETL.