2 docs tagged with "data/data-engineering"

Data Engineering

Apache Spark is an open-source distributed computing system that provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. Spark extends the MapReduce model to efficiently cover more types of computation, including interactive queries and stream processing. One of its key features is in-memory cluster computing, which increases an application's processing speed.

Data Engineering with dbt

This is a book about data engineering, with a sprinkle of dbt. It is not a book on dbt; it is most definitely a book on data engineering, covering data engineering knowledge and ways of working.