dbt on DuckDB — from raw tables to mart models

In the previous post I described how to load Parquet exports into a local DuckDB database — a fast, free, file-based data warehouse you can have running in an afternoon. The raw tables are queryable straight away. But to turn them into something reliable, documented, and ready for an analytics... [Read More]
Tags: DuckDB, dbt, SQL, data modeling, data engineering, staging, marts

A personal data warehouse — free, fast, and local

Many business systems don’t offer a live database connection. They export data periodically — one Parquet file per table, dropped into a folder. That works fine for a one-off look. It becomes a problem the moment you want to join tables, apply consistent transformations, or connect the data to an... [Read More]
Tags: DuckDB, dbt, data warehouse, Parquet, SQL, data engineering

Geocoding made easy

From time to time I work with datasets that include physical addresses which need to be turned into longitude/latitude data. For example, when working with data on mass shootings in the US, the Gun Violence Archive (GVA) provides physical addresses only. [Read More]
Tags: R, tidygeocoder, geocoding, geo, map, visualization, viz

PFAS Map

I came across this topic in a recent LinkedIn post by investigative journalist Daniel Drepper. An international research network investigated the spread of PFAS (per- and polyfluoroalkyl substances) to unveil the scale of pollution. This group of chemicals is linked to various diseases such as cancer and infertility. As a... [Read More]
Tags: R, Shiny, map, visual analytics, visualization, viz, PFAS, journalism, interactive, exploration

Waffle charts in R

Displaying proportional data, i.e., subsets of data that contribute to a whole, can be done in various ways. However, two particularly suitable types of visualisation are isotypes and waffle charts. [Read More]
Tags: R, ggplot, visual analytics, visualization, viz, waffle, isotype, chart, CoViD-19