Your HubSpot is a mess. Native dedup doesn't catch the things that actually matter.
Distill is a demo of an AI-assisted CRM data-cleanup tool: blocking, embeddings, HDBSCAN clustering, a human-review queue, and reversible writes. Production-shaped, not production-grade.
Blocking that scales
Blocking predicates on domain, phone, and name+company limit comparisons to records that share a key, avoiding the O(N²) all-pairs comparison entirely.
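As a rough illustration of the idea, here is a minimal blocking sketch. The field names (`email`, `phone`, `name`, `company`) and the helpers `block_keys` and `candidate_pairs` are hypothetical and not taken from Distill's code. Records are grouped by key, and only records that land in the same block become candidate pairs.

```python
from collections import defaultdict
from itertools import combinations
import re


def block_keys(record):
    """Yield blocking keys for one record (field names are assumptions)."""
    email = (record.get("email") or "").lower()
    if "@" in email:
        yield ("domain", email.rsplit("@", 1)[1])
    digits = re.sub(r"\D", "", record.get("phone") or "")
    if len(digits) >= 10:
        # Compare on the last 10 digits so a country prefix does not split a block.
        yield ("phone", digits[-10:])
    name = (record.get("name") or "").strip().lower()
    company = (record.get("company") or "").strip().lower()
    if name and company:
        yield ("name+company", f"{name}|{company}")


def candidate_pairs(records):
    """Return index pairs (i, j), i < j, for records sharing at least one key."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        for key in block_keys(rec):
            blocks[key].append(idx)
    pairs = set()
    for members in blocks.values():
        pairs.update(combinations(members, 2))
    return pairs
```

The cost is driven by block sizes rather than by N². A very common key (a free-mail domain, for instance) can still produce a large block, so a real pipeline would probably cap or skip such keys.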
Embeddings for the hard cases
text-embedding-3-small embeddings plus rapidfuzz edit distance. Catches matches such as j.doe@acme.com ≡ jane.doe@acme.com that rule-based dedup misses.
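One way the two signals could be combined is sketched below. This is an assumption, not Distill's actual scoring: the embedding vectors are taken as already computed, the standard library's `difflib.SequenceMatcher` stands in for rapidfuzz so the snippet has no dependencies, and the 50/50 weighting in the hypothetical `match_score` is purely illustrative.

```python
import math
from difflib import SequenceMatcher


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def string_similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1]; a stand-in for rapidfuzz."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(text_a, text_b, vec_a, vec_b, weight=0.5):
    """Blend embedding similarity and string similarity (weight is illustrative)."""
    return weight * cosine(vec_a, vec_b) + (1 - weight) * string_similarity(text_a, text_b)
```

Pairs scoring above some threshold would then go to clustering or to the review queue. Where that threshold sits is a tuning decision this sketch does not address.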
Reversible writes
Every merge stores the full pre-merge snapshot. Undo recreates the records via the HubSpot Associations API and surfaces the recreation caveat.
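The snapshot-then-undo pattern can be sketched with an in-memory store. Everything here is hypothetical: `MergeLog`, the plain `dict` standing in for the CRM, and the "primary wins" merge rule. No HubSpot calls are made.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class MergeLog:
    """Keeps a full pre-merge snapshot per merge so the merge can be reversed."""
    snapshots: dict = field(default_factory=dict)
    next_id: int = 1

    def merge(self, store, primary_id, duplicate_id):
        merge_id = self.next_id
        self.next_id += 1
        # Snapshot both records before touching anything.
        self.snapshots[merge_id] = {
            primary_id: copy.deepcopy(store[primary_id]),
            duplicate_id: copy.deepcopy(store[duplicate_id]),
        }
        # Primary's values win; fields only the duplicate has are carried over.
        store[primary_id] = {**store[duplicate_id], **store[primary_id]}
        del store[duplicate_id]
        return merge_id

    def undo(self, store, merge_id):
        # Restore both records exactly as they were before the merge.
        for record_id, record in self.snapshots.pop(merge_id).items():
            store[record_id] = record
```

In this sketch the restored record keeps its original id. Against a real CRM, a recreated record would most likely be assigned a new id, which is presumably the kind of caveat an undo flow has to surface.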
What rule-based dedup misses
- jane@acme.com ≡ jane.doe@acme.com ≡ j.doe@acme.com — same person, three records.
- Acme Inc ≡ Acme, Inc. ≡ ACME — same company, three records.
- +1-415-555-0100 ≡ 4155550100 ≡ (415) 555-0100 — same phone, three formats.
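The company and phone variants above collapse under simple normalization, as the sketch below suggests. The suffix list, the `+1` default country code, and both function names are assumptions made for illustration. The email variants are deliberately not covered: their local parts genuinely differ, so they call for similarity matching rather than normalization.

```python
import re

# Illustrative, deliberately incomplete list of legal suffixes.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}


def normalize_company(name):
    """Lowercase, strip punctuation, and drop trailing legal suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def normalize_phone(raw, default_country="1"):
    """Reduce to digits; assume a 10-digit number lacks the country code."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 10:
        digits = default_country + digits
    return "+" + digits
```

For the examples in the list, all three company names map to `acme` and all three phone strings map to `+14155550100`.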