Distill.

Your HubSpot is a mess. Native dedup doesn't catch the near-duplicates that actually matter: email variants, company-name spellings, phone formats.

Distill is the demo for an AI-assisted CRM data-cleanup tool — embeddings + blocking + HDBSCAN + a human-review queue + reversible writes. Production-shaped, not production-grade.

Blocking that scales

Domain, phone, and name+company predicates group records into candidate blocks, so only records that share a key are compared. This avoids the O(N²) all-pairs comparison entirely.
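A minimal sketch of what blocking on these three predicates could look like. The record field names (`id`, `email`, `phone`, `name`, `company`) and both helper functions are assumptions made for illustration, not Distill's actual code.

```python
import re
from collections import defaultdict
from itertools import combinations


def blocking_keys(record):
    """Yield blocking keys for one contact (field names are assumptions)."""
    email = (record.get("email") or "").strip().lower()
    if "@" in email:
        yield ("domain", email.rsplit("@", 1)[1])
    digits = re.sub(r"\D", "", record.get("phone") or "")
    if len(digits) >= 10:
        # Keep the last 10 digits so a leading country code does not split a block.
        yield ("phone", digits[-10:])
    name = (record.get("name") or "").strip().lower()
    company = (record.get("company") or "").strip().lower()
    if name and company:
        yield ("name+company", name + "|" + company)


def candidate_pairs(records):
    """Return the set of id pairs that share at least one blocking key."""
    blocks = defaultdict(list)
    for record in records:
        for key in blocking_keys(record):
            blocks[key].append(record["id"])
    pairs = set()
    for ids in blocks.values():
        # Pairwise comparison happens only inside a block, never across all records.
        for a, b in combinations(sorted(set(ids)), 2):
            pairs.add((a, b))
    return pairs


contacts = [
    {"id": 1, "email": "jane.doe@acme.com", "phone": "+1-415-555-0100"},
    {"id": 2, "email": "j.doe@acme.com", "phone": "(415) 555-0100"},
    {"id": 3, "email": "bob@globex.com", "phone": "212-555-0199"},
]
print(candidate_pairs(contacts))  # {(1, 2)}
```

Only the pair (1, 2) reaches the expensive comparison step; record 3 shares no key with the others. One design caveat: very common domains (free-mail providers, for example) produce large blocks, so a real pipeline would likely cap or skip them.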

Embeddings for the hard cases

text-embedding-3-small + rapidfuzz edit distance. Catches matches like j.doe@acme.com ≡ jane.doe@acme.com that rule-based dedup misses.
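One way the two signals might be combined into a single pair score, sketched under stated assumptions. To stay self-contained, a pure-Python Levenshtein distance stands in for rapidfuzz, and the embedding vectors are passed in as plain lists rather than fetched from text-embedding-3-small. The `match_score` function and its 50/50 `weight` are hypothetical; the source does not say how Distill blends the signals.

```python
import math


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def edit_similarity(a, b):
    """Edit distance rescaled to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def match_score(text_a, text_b, vec_a, vec_b, weight=0.5):
    """Blend embedding cosine with edit similarity (weight is an assumption)."""
    return weight * cosine(vec_a, vec_b) + (1 - weight) * edit_similarity(text_a, text_b)


print(levenshtein("j.doe@acme.com", "jane.doe@acme.com"))  # 3
```

The two addresses differ by three inserted characters, so the edit signal alone already rates them as close; the embedding term is there for pairs whose surface strings diverge more.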

Reversible writes

Every merge stores the full pre-merge snapshot. Undo recreates the records via the HubSpot Associations API and surfaces the recreation caveat.
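The snapshot-then-undo idea can be sketched without touching HubSpot at all. `MergeLog` below is a hypothetical in-memory stand-in for whatever store Distill uses, the record shape is assumed, and the caveat string is placeholder wording. No real HubSpot calls are made; `undo_plan` only returns what would need to be recreated.

```python
import copy
import time


class MergeLog:
    """In-memory stand-in for a snapshot store (hypothetical, not a HubSpot API)."""

    def __init__(self):
        self._entries = {}
        self._next_id = 1

    def record_merge(self, primary, duplicates):
        """Snapshot every record involved before the merge is written."""
        merge_id = self._next_id
        self._next_id += 1
        self._entries[merge_id] = {
            "merged_at": time.time(),
            "primary_id": primary["id"],
            # Deep copy so later mutation of the live records cannot alter history.
            "snapshot": copy.deepcopy([primary] + list(duplicates)),
        }
        return merge_id

    def undo_plan(self, merge_id):
        """Return the records to recreate, plus the caveat to show the reviewer."""
        entry = self._entries[merge_id]
        return {
            "recreate": copy.deepcopy(entry["snapshot"]),
            "caveat": "Records are recreated from the snapshot, not restored in place.",
        }


log = MergeLog()
primary = {"id": "101", "email": "jane.doe@acme.com", "phone": None}
duplicate = {"id": "102", "email": "j.doe@acme.com", "phone": "4155550100"}

merge_id = log.record_merge(primary, [duplicate])
primary["phone"] = duplicate["phone"]  # simulate the merge changing the survivor

plan = log.undo_plan(merge_id)
print(plan["recreate"][0]["phone"])  # None
```

The snapshot is taken before the write, so the undo plan still shows the survivor's original empty phone field even after the merge has overwritten it.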

What rule-based dedup misses

  • jane@acme.com / jane.doe@acme.com / j.doe@acme.com: same person, three records.
  • Acme Inc / Acme, Inc. / ACME: same company, three records.
  • +1-415-555-0100 / 4155550100 / (415) 555-0100: same phone, three formats.
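For the company and phone cases above, a little normalization is enough to map all three spellings to one key. This is a rough sketch with assumptions: the function names, the default country code of 1, and the short suffix list are illustrative only, and a production tool would more likely rely on a dedicated phone-parsing library.

```python
import re

# Illustrative only; real-world legal suffixes are far more varied.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}


def normalize_phone(raw, default_country="1"):
    """Strip formatting and prefix a country code to bare 10-digit numbers."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 10:
        digits = default_country + digits
    return "+" + digits


def normalize_company(raw):
    """Lowercase, drop punctuation, and trim trailing legal suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", raw.lower()).split()
    while len(words) > 1 and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)


phones = ["+1-415-555-0100", "4155550100", "(415) 555-0100"]
companies = ["Acme Inc", "Acme, Inc.", "ACME"]

print({normalize_phone(p) for p in phones})       # {'+14155550100'}
print({normalize_company(c) for c in companies})  # {'acme'}
```

The three email variants in the first bullet do not collapse to one key this way, which is why that case is left to fuzzy matching rather than normalization.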