In Phase 2, I thought I had the Data Redacted project ready to roll. I had my Docker containers, my Llama 3 mapping, and a list of 7,300 brokers. But as soon as I tried to move from a test environment to actual execution, the Senior Architect in me realized I had a massive gap in my logic.
The “numbers game” of 7,300 targets was a distraction. This week was about a hard refactor and a reality check on how data brokers actually hide their “Delete” buttons.
The Refactor: Finding the Tags
The original plan was to let the AI semantically map form fields. It failed. The AI couldn’t distinguish between a “First Name” box and a hidden tracking token or a mandatory security field.
I had to stop the “AI magic” and go back to Tag-Based Extraction. I spent the week rewriting the logic to target specific HTML attributes: id, name, and class. Without this, the engine is just guessing. With it, the automation actually sticks. This was the prerequisite for the next step: sourcing the targets that matter.
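To make the idea concrete, here is a minimal sketch of attribute-based field extraction using only Python's standard library. It is an illustration under assumptions, not the project's actual code: the names FieldCollector, extract_fields, and selector_for are hypothetical, and so is the choice to skip hidden, submit, and button inputs.

```python
from html.parser import HTMLParser


class FieldCollector(HTMLParser):
    """Collect fillable form controls along with their id/name/class attributes."""

    FILLABLE_TAGS = {"input", "select", "textarea"}
    # Assumption: these input types are never user-facing fields worth filling.
    SKIPPED_TYPES = {"hidden", "submit", "button"}

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.FILLABLE_TAGS:
            return
        attributes = dict(attrs)
        # A missing or valueless type attribute defaults to "text".
        input_type = (attributes.get("type") or "text").lower()
        if input_type in self.SKIPPED_TYPES:
            return  # e.g. hidden tracking tokens
        self.fields.append({
            "tag": tag,
            "id": attributes.get("id"),
            "name": attributes.get("name"),
            "classes": (attributes.get("class") or "").split(),
        })


def extract_fields(html):
    """Return the fillable fields found in an HTML string, in document order."""
    collector = FieldCollector()
    collector.feed(html)
    return collector.fields


def selector_for(field):
    """Build a CSS selector, preferring id, then name, then class."""
    # Note: ids containing special characters would need CSS escaping.
    if field["id"]:
        return "#" + field["id"]
    if field["name"]:
        return '{}[name="{}"]'.format(field["tag"], field["name"])
    if field["classes"]:
        return field["tag"] + "".join("." + c for c in field["classes"])
    return None
```

Given a form with a hidden csrf_token input and a text input whose id is first_name, extract_fields returns only the text input, and selector_for yields "#first_name" for it. The resulting selector is the kind of string that could then be handed to a browser automation tool.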
Sourcing the “Silent 10”
I didn’t need to automate 7,300 brokers today. I needed to automate the ones that are actually “silent.”
By merging the California, Texas, and Oregon registries, I isolated 10 national hubs. I call them the “Silent 10” because they’ve completely removed the easy, “loud” path of email opt-outs. They force you into proprietary webforms.
Sourcing these 10 wasn’t just a deduplication task; it was a way to find the targets that require the Playwright orchestration I’ve been building. If the engine can handle these 10, the other 7,000 are just noise.
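The merge-and-filter step might look roughly like the sketch below. Everything about the data shape is an assumption for illustration: the real registries have their own formats, and the record keys (name, website, opt_out_email, opt_out_url), the function names, and the "registered in at least two states" threshold are all hypothetical.

```python
from urllib.parse import urlparse


def domain_key(url):
    """Normalize a website URL to a bare host for deduplication.

    Assumes the URL includes a scheme (https://...); otherwise netloc is empty.
    """
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def merge_registries(registries):
    """Merge per-state broker lists into one entry per domain.

    registries: dict mapping a state code to a list of record dicts.
    """
    merged = {}
    for state, records in registries.items():
        for record in records:
            entry = merged.setdefault(domain_key(record["website"]), {
                "name": record["name"],
                "states": set(),
                "emails": set(),
                "form_urls": [],
            })
            entry["states"].add(state)
            if record.get("opt_out_email"):
                entry["emails"].add(record["opt_out_email"])
            url = record.get("opt_out_url")
            if url and url not in entry["form_urls"]:
                # Keep every distinct form URL; later ones can serve as fallbacks.
                entry["form_urls"].append(url)
    return merged


def webform_only(merged, min_states=2):
    """Keep brokers with no email opt-out in any registry and at least one webform."""
    return {
        domain: entry
        for domain, entry in merged.items()
        if not entry["emails"]
        and entry["form_urls"]
        and len(entry["states"]) >= min_states
    }
```

Keeping every distinct opt-out URL per broker, rather than just the first, is a deliberate choice in this sketch: it leaves a list of alternates available when one registry's link goes stale.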
The “Boss Fights”: Interstitials and 404s
Actually running against the Silent 10 revealed the real-world friction:
- Cookie Walls (Acxiom): You can’t just fill the form; you have to script the “Dismiss” click on a massive modal first.
- Dead Links (Epsilon): They move their URLs constantly. My system now detects a 404 and automatically pulls a secondary link from the other state registries.
- Multi-Step Forms (Oracle): Some of these targets require navigating three pages of “info” before you ever see an input field.
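The first two obstacles can be sketched with Playwright's Python API. This is a hedged illustration rather than the engine's real code: both helper names are hypothetical, the page argument is assumed to be a Playwright sync Page, and the button label "Dismiss" is a guess that would differ per site.

```python
from typing import Iterable, Optional


def open_first_live_url(page, urls: Iterable[str]) -> Optional[str]:
    """Navigate to the first candidate URL that does not answer 404.

    `urls` is assumed to be ordered by preference, e.g. the primary registry
    link followed by links taken from the other state registries.
    """
    for url in urls:
        response = page.goto(url)
        # goto() may return None for some navigations; treat that as usable.
        if response is None or response.status != 404:
            return url
    return None


def dismiss_modal(page, button_name: str = "Dismiss") -> bool:
    """Click a consent-modal button by its accessible name, if one is present.

    count() does not wait, so a modal that renders late would need an
    explicit wait before this check.
    """
    button = page.get_by_role("button", name=button_name)
    if button.count() == 0:
        return False
    button.first.click()
    return True
```

A caller would try open_first_live_url with the broker's list of form URLs, then call dismiss_modal before touching any input. Multi-step forms are left out here because the click path is different for every site.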
The Project Management Pivot
The project is no longer about “mass deletion.” It’s about Asymmetric Precision.
I’m focusing the engineering on these 10 high-fidelity targets because they represent the core of the data economy. This isn’t a bottleneck; it’s a scope decision. I’m building the engine to handle the hardest targets first, creating a Sovereign Compliance Guardrail that actually works in the messy, real-world web.