Probabilistic Linkage Using Splink
With Application to Crash Outcomes Data Evaluation System (CODES)

Presenters: Stephanie Jansson, Divyanka Pawaskar, Tara Sangal

Welcome to the workshop! If you're here, the linkage worked.
This workshop covers probabilistic record linkage using Splink, an open-source Python library built for linking records that don't share a unique identifier. We'll work through a hands-on example using CODES (Crash Outcomes Data Evaluation System) data, covering blocking strategies, model estimation, and match quality evaluation.
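To give a feel for the three steps named above before the notebook opens, here is a minimal plain-Python sketch of Fellegi-Sunter style scoring, the model family Splink builds on. It does not use the Splink API. The field names, the `m`/`u` values, the prior, and the sample records are all hypothetical and chosen only for illustration; in a real linkage the parameters would be estimated from the data rather than typed in.

```python
import math

# Hypothetical parameters (illustrative only, not estimated from CODES data).
# m = P(field agrees | the two records are a true match)
# u = P(field agrees | the two records are not a match)
PARAMS = {
    "last_name": {"m": 0.95, "u": 0.01},
    "birth_date": {"m": 0.90, "u": 0.005},
    "sex": {"m": 0.98, "u": 0.50},
}
PRIOR = 0.001  # assumed share of compared pairs that are true matches


def match_weight(a, b):
    """Sum log2 Bayes factors over fields, starting from the prior odds."""
    total = math.log2(PRIOR / (1 - PRIOR))
    for field, p in PARAMS.items():
        if a[field] == b[field]:
            total += math.log2(p["m"] / p["u"])
        else:
            total += math.log2((1 - p["m"]) / (1 - p["u"]))
    return total


def match_probability(weight):
    """Convert a log2 odds match weight into a probability."""
    odds = 2 ** weight
    return odds / (1 + odds)


def link(crash_records, hospital_records, block_field="crash_date"):
    """Blocking: only score pairs that share the same value of block_field."""
    by_block = {}
    for h in hospital_records:
        by_block.setdefault(h[block_field], []).append(h)
    scored = []
    for c in crash_records:
        for h in by_block.get(c[block_field], []):
            prob = match_probability(match_weight(c, h))
            scored.append((c["id"], h["id"], prob))
    return scored


# Made-up sample records standing in for a crash file and a hospital file.
crashes = [
    {"id": "C1", "crash_date": "2023-05-01", "last_name": "RIVERA",
     "birth_date": "1990-02-14", "sex": "F"},
]
hospital = [
    {"id": "H1", "crash_date": "2023-05-01", "last_name": "RIVERA",
     "birth_date": "1990-02-14", "sex": "F"},
    {"id": "H2", "crash_date": "2023-05-01", "last_name": "CHEN",
     "birth_date": "1975-11-30", "sex": "F"},
    {"id": "H3", "crash_date": "2023-06-09", "last_name": "RIVERA",
     "birth_date": "1990-02-14", "sex": "F"},
]

results = link(crashes, hospital)
for crash_id, hospital_id, prob in results:
    print(crash_id, hospital_id, round(prob, 4))
```

Note that H3 agrees with C1 on every scored field but is never compared, because it falls in a different block. That is the trade-off a blocking rule makes: fewer comparisons in exchange for the risk of missing true matches whose blocking field disagrees.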

Everything you need is on this page. Follow along at your own pace — no installation required.

🎯

Fun Activity

Warm up your record linkage instincts before we dive into the code. This activity gives you a feel for how probabilistic matching works — no Python required.

Open Activity

💻

Live Demo

Click below to launch the interactive workshop notebook in your browser. No installation, no account — just click and run. Works on any device.

Launch Notebook

⏳ First load takes ~1 minute. Keep the tab open once it's running.

📊

Workshop Slides

Follow along with the presentation slides. They will be posted here during or shortly after the workshop session.

View Slides

If it's not working, pretend it's a linkage problem and bring it to us. That's what we're here for.

Let's Stay Linked — No Blocking Rules Here