10 docs tagged with "#data"

Data Drift

Data drift refers to the phenomenon where the statistical properties of a dataset used for machine learning or analysis change over time. This alteration can be due to various factors, such as shifts in data collection processes, changes in the underlying distribution of the data, or modifications in the environment from which the data originates. Detecting and addressing data drift is crucial to maintaining the performance and reliability of machine learning models and analytical systems.
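One way to make "statistical properties change over time" concrete is to compare a reference sample of a feature against a recent sample. The sketch below is only an illustration, not a prescribed method: it computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical distributions) using the standard library alone. The function names `ks_statistic` and `has_drifted` and the `threshold` default of 0.2 are assumptions chosen for the example.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Return the two-sample KS statistic: the largest gap between the
    empirical CDFs of the two samples (0.0 = identical, 1.0 = disjoint)."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref = sorted(reference)
    cur = sorted(current)
    max_gap = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to evaluate the gap at every distinct value in either sample.
    for x in set(ref) | set(cur):
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        max_gap = max(max_gap, abs(cdf_ref - cdf_cur))
    return max_gap


def has_drifted(reference, current, threshold=0.2):
    """Flag drift when the KS statistic exceeds a threshold.

    The default threshold is a hypothetical value for illustration;
    a real cut-off depends on sample size and tolerance for false alarms.
    """
    return ks_statistic(reference, current) > threshold
```

In practice the reference sample would typically be the training data and the current sample a recent window of production data, with one check per feature.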

Data Engineering with dbt

This is a book about data engineering, with a sprinkle of dbt as well. It is not a book on dbt; it most definitely is a book on data engineering. It contains data engineering knowledge and ways of working.

Data Science

What is data science? It is a collection of different jobs bundled together and given an AI label to make a company sound innovative.

Data Science the Hard Parts

This book dives into the difficult aspects of data science: the business value proposition, communication, and measuring impact. It discusses each of these topics and presents methods for handling them well.

Data Strategy

This book is about strategy, and data is the context in which strategy is discussed. Some things, like the McKinsey data maturity model, are discussed, but the main gist is the strategy. ‘Change is inevitable. … Change is constant.’ This is an important aspect of the entire book.

Database

In the context of business, everything is a database. Databases are the bedrock of how we design things nowadays.

DBT

dbt is an open-source command-line tool that enables data transformation and modeling in a structured and efficient manner. It allows data engineers and analysts to define and manage the data transformation pipeline using SQL queries. With dbt, you can write modular and reusable SQL code, called "models," which define the transformations required to convert raw data into structured and analysis-ready data. These models can be organized, tested, and documented within the dbt framework. dbt leverages the power of SQL and provides a layer of abstraction on top of the data warehouse, making it easier to develop, test, and maintain complex data transformations. It promotes best practices such as version control, testing, and documentation, enabling collaborative and maintainable data modeling workflows. dbt integrates with various data warehouses and can be used in conjunction with other data tools and orchestration platforms to create a robust and reliable data pipeline.

GCP

Firebase and Flutter are a very cool combo. There is great material on how to work with them, along with helpful docs. Together they make creating a simple CRUD app in Flutter super easy.

Getting Started with Streamlit for Data Science

This book is a welcoming introduction to a Python module that has seen rapid growth. It offers a brief overview of the application's capabilities and shows how its user-friendly nature makes it an inclusive tool for both new and experienced data scientists.

Hands-On Unsupervised Learning Using Python

This book is an introduction to unsupervised machine learning techniques and practices. It introduces methods of unsupervised learning for clustering, correlations and time series analysis. It analyses models and provides guidance on how to use them.