
7 docs tagged with "data/warehouse"


Data Engineering

Apache Spark is an open-source distributed computing system that provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. Spark extends the MapReduce model to efficiently cover more types of computation, including interactive queries and stream processing. One of Spark's key features is in-memory cluster computing, which increases an application's processing speed.
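
To make the MapReduce idea mentioned above concrete, here is a minimal sketch in plain Python. It is not Spark and does not use the Spark API: the helper names (`split_into_partitions`, `count_words`, `word_count`) are hypothetical, and a thread pool stands in for a cluster. It only illustrates the pattern of mapping over partitions held in memory and then reducing the partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_partitions(records, n):
    """Distribute records round-robin over n partitions (hypothetical helper)."""
    return [records[i::n] for i in range(n)]


def count_words(partition):
    """Map step: count the words within a single partition."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def word_count(lines, partitions=3):
    """Run the map step on every partition, then merge the partial counts."""
    parts = split_into_partitions(lines, partitions)
    # A thread pool is an assumption standing in for cluster workers.
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        partials = list(pool.map(count_words, parts))
    # Reduce step: combine per-partition counters into one result.
    return reduce(lambda a, b: a + b, partials, Counter())


lines = [
    "spark extends mapreduce",
    "spark keeps data in memory",
    "mapreduce writes to disk",
]
result = word_count(lines)
```

The caller never says which worker handles which line, which is roughly what "implicit data parallelism" refers to. A real Spark job would additionally handle scheduling across machines and recovery from failed workers.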

Data Warehouse

The terms data warehouse, data lakehouse, and data lake mostly refer to having a common place to store, manipulate, and apply data.

Data Warehouse

The terms data warehouse, data lakehouse, and data lake mostly refer to having a common place to store, manipulate, and apply data.

DBT

dbt is an open-source command-line tool for transforming and modeling data in a structured way. Data engineers and analysts use it to define and manage the transformation pipeline in SQL. You write modular, reusable SQL queries called "models", which define the transformations that turn raw data into structured, analysis-ready data, and you organize, test, and document those models within the dbt framework. dbt provides a layer of abstraction on top of the data warehouse, which makes complex transformations easier to develop, test, and maintain. It promotes practices such as version control, testing, and documentation, supporting collaborative and maintainable data modeling workflows. dbt integrates with various data warehouses and can be combined with other data tools and orchestration platforms to build a reliable data pipeline.

Fabric

In the Fabric section, I write as much as I can about the Microsoft data warehouse ecosystem, using Microsoft Fabric as the overarching theme. I will not cover Databricks or Snowflake on Azure here, as they have their own sections.

Snowflake

Snowflake is a company that provides a cloud data warehouse.

Snowflake

Snowflake is a company that provides a cloud data warehouse.