Certified Dogs and Cats!
Toronto's pet licensing data reveals an unexpected pattern: extreme breed concentration among cats (73.9% Domestic Shorthair) versus remarkable diversity among dogs (300+ breeds, top breed at 9.6%).
That gap, roughly 3 in 4 cats versus fewer than 1 in 10 dogs, points to very different ownership patterns: dog registrations spread across more than 300 breeds, while cat registrations converge on a single dominant breed.
This dashboard surfaces these patterns from 175,000+ licensing records. An automated pipeline processes Toronto's Open Data every day and publishes to the dashboard every Monday. The entire stack (ingestion, transformation, quality checks, and serving) runs for under $10/month while mapping 99.5%+ of records to canonical breed names and valid postal codes.
🔧 How it works
Every morning at 10 AM, a fully automated pipeline wakes up and does this:
Ingestion (Raw → Bronze)
A Python script connects to Toronto's Open Data CKAN API, downloads the latest pet licensing records,
and uploads them to AWS S3 as dated snapshots (raw/2025-11-02/).
This creates a permanent audit trail: I can replay any historical load if needed.
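A minimal sketch of how the dated snapshot keys could be built. Only the `raw/2025-11-02/` layout comes from the description above; the function names and the `licences.csv` filename are hypothetical. Because the key depends only on the run date, re-uploading the same day overwrites the same object instead of creating a new one.

```python
from datetime import date


def snapshot_prefix(run_date: date) -> str:
    """Return the S3 key prefix for one day's raw snapshot, e.g. raw/2025-11-02/."""
    return f"raw/{run_date.isoformat()}/"


def snapshot_key(run_date: date, filename: str) -> str:
    """Build the full object key for a file inside that day's snapshot."""
    return snapshot_prefix(run_date) + filename


# Hypothetical filename, used only for illustration.
print(snapshot_key(date(2025, 11, 2), "licences.csv"))
```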
The bronze layer performs basic validation: rejecting records with invalid animal types (must be DOG or CAT), duplicate IDs, null primary keys, or malformed postal codes. Clean records are partitioned by registration year and written to Delta tables. All operations are idempotent: rerunning the same date produces identical output.
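The four rejection rules can be sketched in plain Python. This is an illustration, not the actual notebook code: the field names (`id`, `animal_type`, `postal_code`) and the postal-code pattern are assumptions, and the pattern accepts either a bare three-character FSA or a full six-character code.

```python
import re

VALID_TYPES = {"DOG", "CAT"}
# Assumed pattern: FSA alone ("M4C") or a full postal code ("M4C 1A1" / "M4C1A1").
POSTAL_RE = re.compile(r"[A-Z]\d[A-Z]( ?\d[A-Z]\d)?")


def reject_reason(rec, seen_ids):
    """Return why a record is rejected, or None if it is clean."""
    rec_id = rec.get("id")
    if rec_id is None:
        return "null primary key"
    if rec_id in seen_ids:
        return "duplicate id"
    if rec.get("animal_type") not in VALID_TYPES:
        return "invalid animal type"
    if not POSTAL_RE.fullmatch(rec.get("postal_code") or ""):
        return "malformed postal code"
    return None


def validate(records):
    """Split records into (clean, rejected); rejected items carry their reason."""
    clean, rejected, seen_ids = [], [], set()
    for rec in records:
        reason = reject_reason(rec, seen_ids)
        if reason is None:
            clean.append(rec)
        else:
            rejected.append((rec, reason))
        if rec.get("id") is not None:
            seen_ids.add(rec["id"])  # later copies of this id count as duplicates
    return clean, rejected
```

The function keeps no state between calls, so running it twice on the same snapshot gives the same split, which is the property the idempotency claim relies on.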
Transformation (Silver → Gold)
The silver layer standardizes messy breed names using fuzzy matching against a reference list. "lab," "Labrador Retriever," and "LABRADOR RETRIEVER (YELLOW)" all become "Labrador Retriever." This normalization handles 300+ breed variants and achieves 99.5%+ mapping accuracy.
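One way this kind of normalization might look, using only the standard library. The reference list, the alias table, and the 0.85 cutoff are illustrative assumptions, not the pipeline's real configuration. Short nicknames like "lab" are too far from the canonical name for similarity matching alone, so this sketch resolves them through an explicit alias table before falling back to `difflib`.

```python
import difflib
import re

# Hypothetical reference data, for illustration only.
CANONICAL = ["Labrador Retriever", "Golden Retriever", "German Shepherd", "Domestic Shorthair"]
ALIASES = {"lab": "Labrador Retriever"}


def clean(raw):
    """Lowercase, drop parenthetical qualifiers such as "(YELLOW)", collapse whitespace."""
    text = re.sub(r"\([^)]*\)", " ", raw).lower()
    return " ".join(text.split())


def normalize_breed(raw, cutoff=0.85):
    """Map a raw breed string to a canonical name, or None if nothing is close enough."""
    key = clean(raw)
    if key in ALIASES:
        return ALIASES[key]
    lookup = {name.lower(): name for name in CANONICAL}
    match = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    return lookup[match[0]] if match else None


for raw in ["lab", "Labrador Retriever", "LABRADOR RETRIEVER (YELLOW)"]:
    print(raw, "->", normalize_breed(raw))
```

Returning `None` instead of a best guess keeps low-confidence matches out of the mapped set, so they can be counted and reviewed separately.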
The gold layer creates a curated source view joining all key attributes, then builds focused analytical views:
- totals_by_year_type – yearly registration counts
- breed_stats – breed frequencies and rankings
- top3_breed_by_fsa – geographic breed patterns
- daily_stats – ingestion volume for freshness monitoring
- gold_health – data quality metrics (mapped %, null rates)
Serving
Databricks runs a daily job that writes compact CSV and JSON from gold views to S3. Every Monday, a GitHub Actions workflow picks up the new files from S3, commits any changes, and triggers a Netlify deploy. The dashboard updates every Monday with pipeline-fresh data, served as static assets for sub-second load times.
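A rough sketch of the export step, assuming "compact" means JSON without extra whitespace and CSV with a single header row. The row shape is made up for illustration, and the S3 upload itself is left out.

```python
import csv
import io
import json


def to_compact_json(rows):
    """Serialize rows without the default spaces after ',' and ':'."""
    return json.dumps(rows, separators=(",", ":"))


def to_csv(rows, fieldnames):
    """Serialize rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical row, loosely shaped like totals_by_year_type.
rows = [{"year": 2025, "animal_type": "DOG", "count": 3}]
print(to_compact_json(rows))
print(to_csv(rows, ["year", "animal_type", "count"]))
```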
Orchestration & Cost
Databricks Workflows runs five notebooks sequentially every morning:
ingest → bronze → silver → gold → export
Each stage depends on the previous one's success. The entire stack (S3 storage + Databricks compute +
Netlify hosting) costs under $10/month thanks to single-node
spot clusters, auto-termination, and static serving.
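The sequential dependency chain could be expressed as a job specification along these lines. The field names (`task_key`, `depends_on`, `notebook_task`, `quartz_cron_expression`) follow the Databricks Jobs API as I understand it and should be checked against the current docs. The job name, notebook directory, and time zone are assumptions.

```python
STAGES = ["ingest", "bronze", "silver", "gold", "export"]


def build_job_spec(stages, notebook_dir="/Repos/pets/notebooks"):
    """Build a job spec in which each task depends on the one before it."""
    tasks = []
    for i, stage in enumerate(stages):
        task = {
            "task_key": stage,
            # Hypothetical notebook location.
            "notebook_task": {"notebook_path": f"{notebook_dir}/{stage}"},
        }
        if i > 0:
            # A task only starts after its upstream task succeeds.
            task["depends_on"] = [{"task_key": stages[i - 1]}]
        tasks.append(task)
    return {
        "name": "pet-licensing-daily",  # hypothetical job name
        "schedule": {
            "quartz_cron_expression": "0 0 10 * * ?",  # every day at 10:00
            "timezone_id": "America/Toronto",  # assumed time zone
        },
        "tasks": tasks,
    }


spec = build_job_spec(STAGES)
print([t["task_key"] for t in spec["tasks"]])
```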
Every commit, every transformation and every quality check is traceable from API to chart.
At a glance
Last 7 days of new record ingestions per animal type from daily_stats. Higher points mean busier intake days.
Key Findings
Cat breed concentration is extreme
73.9% of Toronto cats are Domestic Shorthair. The top breed dominates so heavily that "Other" barely registers. Cat owners overwhelmingly choose one breed.
Dog diversity is striking
The most popular dog breed (Labrador Retriever) represents just 9.6% of registrations. Dogs show massive breed diversity: over 300 breeds are represented, with a long tail of rare breeds.
Pet registrations dropped sharply in 2025
Both cats (-18.4%) and dogs (-18.8%) saw nearly identical declines. This could signal post-pandemic pet returns, economic pressures, or changes in licensing enforcement.
Cat breed rankings are volatile; dog breed rankings are stable
The top 10 cat breeds shifted significantly between 2023 and 2025, with multiple breeds swapping positions. Dog breed rankings remained remarkably stable. The cat volatility likely reflects small counts rather than real shifts in preference: the top 3 cat breeds account for 80%+ of registrations, so the remaining breeds each have few registrations and a handful of new licences can reorder them.
Note: Geographic patterns show no clear urban/suburban split for breed size preferences. Both small and large dog breeds are distributed relatively evenly across Toronto's neighborhoods.
Yearly scale and mix
Breed composition
Breed rank and momentum
Rank vs share
Each dot is a breed of licensed pets in 2025. X is the breed's popularity rank, Y is the breed's share of the animal type, dot size scales with count.
Excludes Domestic Shorthair for readability. The full cat composition is shown in the donut chart above.
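The rank and share plotted here can be derived from raw breed counts roughly as follows. This is a sketch of the arithmetic only: the input is a flat list of canonical breed names for one animal type, and the breed names in the usage line are placeholders.

```python
from collections import Counter


def breed_stats(breeds):
    """Return one row per breed with its count, rank (1 = most common) and share."""
    counts = Counter(breeds)
    total = sum(counts.values())
    # Sort by descending count; ties are broken alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        {"breed": breed, "count": count, "rank": rank, "share": count / total}
        for rank, (breed, count) in enumerate(ranked, start=1)
    ]


# Placeholder breed names, for illustration only.
for row in breed_stats(["A", "A", "B", "C", "A", "B"]):
    print(row)
```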
Health and pipeline quality
Data Quality: 99.5%+ Accuracy
Out of 175,000+ pets, 99.5%+ successfully mapped to canonical breed names and valid postal codes.
What about the other 0.5%? These are mostly rare or misspelled breeds (e.g., "Pomski" vs "Pomeranian-Husky Mix"), custom names, or invalid addresses that don't match the reference data. Unmapped records are flagged and quarantined so that they don't pollute the charts above.
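A simplified sketch of how a mapped percentage, null rates, and the quarantine split could be computed. The field names (`breed`, `canonical_breed`, `postal_code`) are assumptions, and this version checks only the breed mapping, not postal-code validity.

```python
def health_metrics(records, null_fields=("breed", "postal_code")):
    """Compute mapped % and per-field null rates (both as percentages)."""
    total = len(records)
    if total == 0:
        return {"total": 0, "mapped_pct": None, "null_rates": {}}
    mapped = sum(1 for r in records if r.get("canonical_breed"))
    null_rates = {
        field: round(100 * sum(1 for r in records if r.get(field) in (None, "")) / total, 2)
        for field in null_fields
    }
    return {
        "total": total,
        "mapped_pct": round(100 * mapped / total, 2),
        "null_rates": null_rates,
    }


def split_quarantine(records):
    """Separate mapped records from unmapped ones that should stay out of the charts."""
    good = [r for r in records if r.get("canonical_breed")]
    quarantined = [r for r in records if not r.get("canonical_breed")]
    return good, quarantined
```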
Choropleths by FSA
These maps show licensed pets by Forward Sortation Area using the latest gold view.
Hover to see totals and top breeds per FSA.
Boundaries come from Statistics Canada FSA shapes filtered to Toronto.
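The per-FSA top breeds shown on hover could be derived along these lines. A Forward Sortation Area is the first three characters of a Canadian postal code. The `(postal_code, breed)` input shape is an assumption, and this sketch ignores the year and animal-type dimensions.

```python
from collections import Counter, defaultdict


def fsa_of(postal_code):
    """Return the Forward Sortation Area: the first three characters, uppercased."""
    return postal_code.strip().upper()[:3]


def top3_breed_by_fsa(records):
    """Map each FSA to its three most common breeds, most common first."""
    by_fsa = defaultdict(Counter)
    for postal_code, breed in records:
        by_fsa[fsa_of(postal_code)][breed] += 1
    return {
        # Sort by descending count; ties are broken alphabetically.
        fsa: [b for b, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:3]]
        for fsa, counts in by_fsa.items()
    }
```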
Top 3 breeds by FSA (preview)
| Year | Type | FSA | Top 1 | Top 2 | Top 3 |
|---|---|---|---|---|---|
Source files: totals_by_year_type.csv, breed_stats.csv, breed_share_citywide_all_years.csv, daily_stats.csv, top3_breed_by_fsa.csv, gold_health.csv
What's Next?
This pipeline has been running reliably since October 2025, but there's always room to improve:
- → Email alerts when data quality drops below thresholds
- → Predictive model to forecast breed trends (will Corgis overtake Chihuahuas?)
- → Real-time updates (currently daily batch processing)
- → Integration with Animal Services data (lost/found pets)