How Do You Build an Automated B2B Lead Generator for Technical Installation Companies?
A 6-stage pipeline architecture for automated B2B lead generation, from geo-targeted scraping to verified contact delivery, built for Dutch technical installation and engineering firms.
Jack van der Vall
Summary: An automated B2B lead generation pipeline scrapes companies from Google Maps, deduplicates against historical data, scores B2B fit with an LLM, enriches contacts through a 3-tier waterfall, verifies emails, and delivers CSV exports. Total cost per verified lead: $0.02 to $0.28. The average cost per lead in the construction sector is $227 (Sopro, 2025). For Dutch technical companies, this pipeline replaces manual prospecting at $97.66/hour with sub-$0.30 automation.
Last updated: March 3, 2026 · By Jack van der Vall, AI Engineer at Opusmatic
Related reading: see where AI automation saves time for technical installers, which business processes can be automated responsibly, and our AI systems integration services.
Why do generic lead generation tools fail for technical installation companies?
Most lead generation tools were not built for technical installation companies. Generic B2B databases underrepresent Dutch MKB (small and medium-sized enterprise) firms in HVAC, electrical, piping, and construction. According to Techniek Nederland and Wij Techniek (Trendfiles, Q4 2025), there are 9,538 employer-based technical installation companies in the Netherlands, with a broader count of approximately 46,000 including solo entrepreneurs (ZZP). These companies are classified under SBI codes 43.21 (electrical installation), 43.22 (plumbing/HVAC), and 42.21 (pipeline construction).
Global B2B databases like Apollo.io (275M contacts globally) and ZoomInfo (174M+ emails globally) typically achieve only a 40-60% hit rate for direct dials and verified emails in the Dutch SME market. Local providers leveraging direct KvK (Chamber of Commerce) feeds often achieve 80%+ coverage.
The solution: build a pipeline that starts where these companies actually exist (Google Maps) and works its way through intelligent filtering, enrichment, and verification to deliver qualified, verified contacts.
This guide breaks down the 6-stage architecture of an automated lead generation pipeline, with real cost benchmarks from production deployment across Dutch technical sectors.
Raw data available: B2B Lead Generation Pipeline Benchmarks
How does an automated lead generation pipeline work?
The pipeline operates in six sequential stages. Each stage filters or enriches the data, ensuring you only spend money on leads that matter.
graph TD
A["Geo-Query: e.g. 'HVAC Rotterdam' + 15km"] -->|"Google Maps Scraper"| B["Scraped Companies"]
B -->|"Domain matching"| C{"Seen before?"}
C -->|"Yes"| D["Skip: saves LLM cost"]
C -->|"No"| E["LLM Fit Score 0.0-1.0"]
E -->|"Below threshold"| F["Low fit: Skip"]
E -->|"Above threshold"| G["Waterfall Enrichment"]
G -->|"Tier 1: $0.01"| H{"Email found?"}
H -->|"Yes"| K["Verify Email"]
H -->|"No"| I["Tier 2: $0.03"]
I --> H2{"Email found?"}
H2 -->|"Yes"| K
H2 -->|"No"| J["Tier 3: $0.25"]
J --> H3{"Email found?"}
H3 -->|"Yes"| K
H3 -->|"No"| N["No contact found"]
K -->|"Valid"| L["Deliver Lead"]
K -->|"Invalid"| M["Reject"]
Why this order matters: most tutorials put scoring before deduplication. That is backwards. If 40-60% of scraped companies already exist in your database, you are burning LLM tokens scoring duplicates. Deduplicate first, score second.
Stage 1: How does geo-targeted ingestion work?
The pipeline starts with a geographic search query, just like typing into Google Maps yourself, but automated and at scale.
Defining a campaign
You define a campaign with:
- Search query: a descriptive term (e.g., “Loodgieter”, “Elektricien”, “Piping contractor”)
- Location: city or region center point (e.g., Rotterdam, Europoort)
- Radius: search area in kilometers
The scraper returns structured company data: name, address, website, phone number, business categories, and a unique place identifier.
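A campaign definition can be sketched as a small data structure plus a radius check. The names below (`Campaign`, `ScrapedCompany`, `within_radius`) are hypothetical and do not come from any scraper's API. The sketch also assumes the scraper returns coordinates for each listing, which the field list above does not state. The haversine formula is used here as one reasonable way to enforce the radius on returned results.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt
from typing import Optional

EARTH_RADIUS_KM = 6371.0


@dataclass(frozen=True)
class Campaign:
    query: str          # descriptive search term, e.g. "Loodgieter"
    center_lat: float   # latitude of the city or region center
    center_lon: float   # longitude of the city or region center
    radius_km: float    # search radius in kilometers


@dataclass(frozen=True)
class ScrapedCompany:
    name: str
    place_id: str                  # unique place identifier from the scraper
    lat: float                     # assumed to be part of the scraper output
    lon: float
    website: Optional[str] = None
    phone: Optional[str] = None


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometers."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def within_radius(campaign: Campaign, company: ScrapedCompany) -> bool:
    """True when the company lies inside the campaign's search circle."""
    distance = haversine_km(campaign.center_lat, campaign.center_lon, company.lat, company.lon)
    return distance <= campaign.radius_km
```

A post-filter like this is useful because map searches can return listings slightly outside the requested area.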
Why Google Maps?
For technical installation companies in the Netherlands, Google Maps is the richest public data source. These companies may not appear in Apollo, ZoomInfo, or LinkedIn Sales Navigator, but they almost always have a Google Business listing.
| Data Source | Coverage of Dutch MKB Technical Firms | Cost |
|---|---|---|
| Google Maps | Very high | Low (scraping cost) |
| LinkedIn Sales Navigator | Moderate | €80-150/month |
| Global B2B databases (Apollo, ZoomInfo) | Low for niche sectors (40-60% hit rate) | €200-500/month |
| KvK Handelsregister | High (official, 9,538 employer-based firms) | Per-query fees |
The Kamer van Koophandel (KvK) registry is the authoritative source for Dutch company data. Use SBI sector codes to validate scraped companies: 43.21 for electrical installation, 43.22 for plumbing/HVAC, 42.21 for pipeline construction. This adds a verification layer that generic scraping misses.
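The SBI check can be sketched as a simple lookup. How the SBI codes are obtained from the registry is not shown here; the function names and the input format (a list of code strings per company) are assumptions for illustration.

```python
# SBI codes named in this guide, with their sector labels.
TARGET_SBI_CODES = {
    "43.21": "electrical installation",
    "43.22": "plumbing/HVAC",
    "42.21": "pipeline construction",
}


def normalize_sbi(code: str) -> str:
    """Reduce inputs such as '4321', '43.21' or '43221' to the 'NN.NN' form."""
    digits = "".join(ch for ch in code if ch.isdigit())
    if len(digits) < 4:
        raise ValueError(f"SBI code too short: {code!r}")
    return f"{digits[:2]}.{digits[2:4]}"


def is_target_sector(sbi_codes: list) -> bool:
    """True when at least one of the company's SBI codes is a target sector."""
    return any(normalize_sbi(code) in TARGET_SBI_CODES for code in sbi_codes)
```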
Stage 2: Why is deduplication the most overlooked step?
This is where most DIY lead generation pipelines fail. Without deduplication, you score the same companies repeatedly, enrich contacts you already have, and annoy prospects with duplicate outreach.
B2B contact data decays rapidly. Industry benchmarks put the annual data decay rate at 22.5% to 30% (Digital Di / PGM Solutions, 2026), and separate sources report a monthly decay rate of 2.1% to 3.6% (RevenueBase / Landbase, 2024/2025). Without ongoing deduplication, stale records compound quickly.
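To compare annual and monthly decay figures, one option is to assume a constant monthly rate that compounds over twelve months. This is an assumption made for illustration; the cited benchmarks do not state how they relate the two ranges.

```python
def monthly_from_annual(annual_rate: float) -> float:
    """Constant monthly decay rate that compounds to the given annual rate."""
    return 1 - (1 - annual_rate) ** (1 / 12)


def annual_from_monthly(monthly_rate: float) -> float:
    """Annual decay rate implied by a constant monthly rate."""
    return 1 - (1 - monthly_rate) ** 12


if __name__ == "__main__":
    for annual in (0.225, 0.30):
        print(f"{annual:.1%} per year is about {monthly_from_annual(annual):.2%} per month")
    print(f"3.6% per month is about {annual_from_monthly(0.036):.1%} per year")
```

Under that assumption, 22.5% to 30% per year corresponds to roughly 2.1% to 2.9% per month, and 3.6% per month compounds to about 35.6% per year, so the two cited ranges overlap without being exact conversions of each other.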
Three-layer dedup strategy
graph LR
A["Scraped Company"] --> B{"Domain in history?"}
B -->|"Yes"| C["Skip"]
B -->|"Not in history"| E["New company: proceed to scoring"]
B -->|"No domain"| D{"Place ID in history?"}
D -->|"Yes"| C
D -->|"No"| E["New company: proceed to scoring"]
F["Enriched Contact"] --> G{"Email in email_history?"}
G -->|"Yes"| H["Skip: already contacted"]
G -->|"No"| I["New contact: proceed"]
| Dedup Layer | Key | Purpose |
|---|---|---|
| Domain-level | Company website URL | Primary, catches most duplicates |
| Place ID fallback | Google Maps identifier | For companies without websites |
| Email-level | Contact email address | Prevents re-contacting across campaigns |
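The three layers can be sketched with plain sets standing in for the history tables. The function names and the in-memory sets are hypothetical; a production pipeline would query a database instead.

```python
from typing import Optional
from urllib.parse import urlparse


def normalize_domain(website: Optional[str]) -> Optional[str]:
    """Reduce a website URL to a bare lowercase host without the 'www.' prefix."""
    if not website:
        return None
    url = website.strip().lower()
    if "://" not in url:
        url = "http://" + url  # urlparse only finds the host when a scheme is present
    host = urlparse(url).hostname
    if not host:
        return None
    return host[4:] if host.startswith("www.") else host


def is_new_company(website: Optional[str], place_id: str,
                   seen_domains: set, seen_place_ids: set) -> bool:
    """Layer 1 (domain) when a website exists, otherwise layer 2 (place ID)."""
    domain = normalize_domain(website)
    if domain is not None:
        return domain not in seen_domains
    return place_id not in seen_place_ids


def is_new_contact(email: str, email_history: set) -> bool:
    """Layer 3: skip contacts whose email was already delivered or contacted."""
    return email.strip().lower() not in email_history
```

Normalizing the domain first matters: without it, `https://www.example.nl/contact` and `example.nl` would be treated as two different companies.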
What is the cost impact of deduplication?
If you scrape 500 companies and 45% already exist in your database, deduplication saves you from scoring 225 companies with an LLM. At approximately $0.003 per scoring call (website crawl + LLM inference), that is $0.675 saved per campaign. Over 50 campaigns, dedup alone saves $33.75 in LLM costs, and more importantly, prevents duplicate outreach that damages your reputation.
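The arithmetic in this paragraph can be reproduced with a few lines. The function name is hypothetical, and the inputs are the figures quoted above (500 scraped, 45% duplicates, about $0.003 per scoring call, 50 campaigns).

```python
def dedup_savings(scraped: int, duplicate_share: float,
                  cost_per_score_usd: float, campaigns: int = 1):
    """Return (companies skipped, savings per campaign, savings over all campaigns)."""
    skipped = round(scraped * duplicate_share)
    per_campaign = skipped * cost_per_score_usd
    return skipped, per_campaign, per_campaign * campaigns


if __name__ == "__main__":
    skipped, per_campaign, total = dedup_savings(500, 0.45, 0.003, campaigns=50)
    print(f"{skipped} companies skipped, ${per_campaign:.3f} per campaign, ${total:.2f} in total")
```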
Deduplication before scoring is not just about cost savings. It is about data hygiene. Every duplicate lead that reaches a salesperson erodes trust in your pipeline. Zero-duplicate guarantees are what separates professional lead generation from spray-and-pray.
Stage 3: How does AI-powered fit scoring filter the right companies?
Not every company on Google Maps is a good prospect. A plumber in Rotterdam might be a sole proprietor doing residential work, not the 20-person B2B installation firm you are targeting. AI scoring separates signal from noise.
Scoring mechanics
- Crawl the company website to extract text content, service descriptions, and team size indicators
- Feed the content to an LLM along with your ideal customer profile (ICP) as context
- Receive a fit score from 0.0 (no fit) to 1.0 (perfect fit)
- Apply a threshold: companies scoring above it proceed to enrichment
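The steps above can be sketched without committing to a specific LLM provider. The prompt wording, the function names, and the choice to treat a score equal to the threshold as a pass are all assumptions; the actual model call is deliberately left out.

```python
import re
from typing import Optional


def build_scoring_prompt(icp: str, website_text: str, max_chars: int = 6000) -> str:
    """Combine the ideal customer profile and crawled text into one prompt."""
    return (
        "You are scoring B2B fit for a lead generation pipeline.\n"
        f"Ideal customer profile:\n{icp}\n\n"
        f"Website content:\n{website_text[:max_chars]}\n\n"
        "Reply with a single number between 0.0 (no fit) and 1.0 (perfect fit)."
    )


def parse_fit_score(raw_reply: str) -> Optional[float]:
    """Extract the first number from the model reply; None if missing or out of range."""
    match = re.search(r"-?\d+(?:\.\d+)?", raw_reply)
    if match is None:
        return None
    score = float(match.group())
    return score if 0.0 <= score <= 1.0 else None


def passes_threshold(score: Optional[float], threshold: float = 0.4) -> bool:
    """Unparseable replies never pass; a score equal to the threshold does (assumption)."""
    return score is not None and score >= threshold
```

Rejecting out-of-range or unparseable replies, rather than clamping them, keeps malformed model output from silently entering the enrichment stage.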
What does the LLM evaluate?
The scoring prompt instructs the model to assess:
- Company size indicators: team page, number of locations, fleet size
- B2B vs B2C orientation: does the company serve businesses or consumers?
- Service complexity: simple maintenance vs. complex project work
- Geographic scope: local handyman vs. regional contractor
- Digital maturity: a modern website suggests openness to automation tools
Scoring economics
Because you have already deduplicated, you are only scoring genuinely new companies. This is a critical cost optimization:
| Scenario | Companies Scored | LLM Cost (~$0.003/ea) |
|---|---|---|
| Without dedup | 500 | $1.50 |
| With dedup (45% reduction) | 275 | $0.83 |
| Savings | - | $0.67 per campaign (45%) |
The fit score threshold is a business decision, not a technical one. A lower threshold (e.g., 0.3) casts a wider net but increases enrichment costs. A higher threshold (e.g., 0.6) is more selective but may miss good prospects with poor websites. Start at 0.4 and adjust based on your sales team’s feedback on lead quality.
Stage 4: How does waterfall enrichment reduce costs by 71%?
Enrichment is where you find actual contact information (names, email addresses, job titles) for the companies that passed scoring. This is also the most expensive stage, which is why the waterfall pattern exists.
The waterfall principle
Stop at the first success. Try the cheapest provider first. Only escalate to more expensive providers when the cheaper option fails.
graph TD
A["Approved Company"] --> B["Tier 1: Web Scraping $0.01"]
B --> C{"Found email?"}
C -->|"Yes"| D["Done: total cost $0.01"]
C -->|"No"| E["Tier 2: Aggregated Sources $0.03"]
E --> F{"Found email?"}
F -->|"Yes"| G["Done: total cost $0.04"]
F -->|"No"| H["Tier 3: Decision-Maker Lookup $0.25"]
H --> I{"Found email?"}
I -->|"Yes"| J["Done: total cost $0.29"]
I -->|"No"| K["No contact found"]
Tier breakdown
| Tier | Method | Cost | Timeout | Hit Rate | Best For |
|---|---|---|---|---|---|
| 1 | Direct website scraping | $0.01 | 10 seconds | ~40-50% | Companies with contact pages |
| 2 | Aggregated data sources | $0.03 | 30 seconds | ~25-35% | Companies in business databases |
| 3 | Decision-maker lookup | $0.25 | 30 seconds | ~20-30% | Hard-to-find contacts, specific roles |
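A minimal sketch of the waterfall follows, assuming each provider is wrapped in a callable that takes a domain and a timeout and returns an email or `None`. The `Tier` and `EnrichmentResult` types are hypothetical, and the sketch assumes every attempt is billed, matching the cumulative totals in the diagram above.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Tier:
    name: str
    cost_usd: float
    timeout_s: float
    lookup: Callable[[str, float], Optional[str]]  # (domain, timeout) -> email or None


@dataclass(frozen=True)
class EnrichmentResult:
    email: Optional[str]
    source: Optional[str]
    cost_usd: float


def enrich(domain: str, tiers: Sequence[Tier]) -> EnrichmentResult:
    """Try tiers in order (cheapest first) and stop at the first email found."""
    spent = 0.0
    for tier in tiers:
        spent += tier.cost_usd  # assumes failed attempts are billed too
        try:
            email = tier.lookup(domain, tier.timeout_s)
        except Exception:
            email = None  # provider error or timeout: escalate to the next tier
        if email:
            return EnrichmentResult(email, tier.name, round(spent, 4))
    return EnrichmentResult(None, None, round(spent, 4))
```

Catching provider exceptions inside the loop means a failing tier degrades into an escalation instead of aborting the whole campaign.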
Why not just use tier 3 for everything?
Economics. If Tier 1 finds 45% of contacts at $0.01 each, and Tier 2 finds another 30% at $0.03, only 25% of companies ever need the $0.25 lookup. Your blended cost per enriched lead drops to approximately $0.07 instead of $0.25.
For a campaign of 200 approved companies:
| Strategy | Total Cost | Contacts Found |
|---|---|---|
| Tier 3 only | $50.00 | ~180 (90%) |
| Waterfall (1 then 2 then 3) | $14.50 | ~180 (90%) |
| Savings | $35.50 (71%) | Same result |
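The blended cost depends on how providers bill, so it is worth computing under explicit assumptions. The sketch below takes the tier prices and the 45% and 30% shares from this section; the 15% share for Tier 3 is an assumption chosen so the total matches the roughly 90% of contacts found in the table. The function name and the `bill_failed_attempts` flag are hypothetical.

```python
def expected_cost_per_company(tiers, bill_failed_attempts: bool = True) -> float:
    """Expected enrichment spend per company.

    tiers: list of (cost_usd, share_of_companies_resolved_at_this_tier),
           in waterfall order.
    """
    expected = 0.0
    unresolved = 1.0  # share of companies that reach the current tier
    for cost_usd, share_resolved in tiers:
        billed_share = unresolved if bill_failed_attempts else share_resolved
        expected += billed_share * cost_usd
        unresolved -= share_resolved
    return expected


TIERS = [(0.01, 0.45), (0.03, 0.30), (0.25, 0.15)]

if __name__ == "__main__":
    for flag in (True, False):
        per_company = expected_cost_per_company(TIERS, bill_failed_attempts=flag)
        print(f"bill_failed_attempts={flag}: ${per_company:.3f} per company, "
              f"${per_company * 200:.2f} for 200 companies")
```

With these inputs the sketch gives about $0.089 per company ($17.80 for 200) when every attempt is billed and about $0.051 ($10.20) when only successful lookups are billed. The table's $14.50 falls between the two, so the exact savings percentage should be treated as sensitive to real hit rates and billing terms.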
Compare this to industry benchmarks: the average cost per lead in the construction sector is $227, with multi-channel prospecting averaging $188 per lead (Sopro, 2025). Manual prospecting costs approximately $97.66 per hour (Sailes, 2024). The waterfall pipeline produces verified leads at $0.02 to $0.28, roughly 800 to 11,000 times cheaper than the industry average.
Many Dutch MKB installation companies have simple websites with a “Contact” page listing the owner’s email directly. This makes Tier 1 (web scraping) disproportionately effective for this sector compared to enterprise B2B where contacts are hidden behind forms.
Stage 5: Why is email verification non-negotiable?
An enriched email that bounces is worse than no email. It damages your sender reputation, wastes outreach effort, and can get your domain blacklisted. Verification is the quality gate.
How verification works
The verification service checks each email address against the recipient’s mail server:
| Verification Status | Meaning | Action |
|---|---|---|
| valid | Mailbox exists and accepts mail | Deliver lead |
| accept_all | Server accepts all addresses (catch-all) | Deliver lead (enterprise servers) |
| invalid | Mailbox does not exist | Reject |
| unknown | Server unreachable or inconclusive | Flag for manual review |
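The table maps directly onto a lookup. The status strings are taken from the table; routing any unrecognized status to manual review is an assumption, as are the names `Action` and `action_for`.

```python
from enum import Enum


class Action(Enum):
    DELIVER = "deliver"
    REJECT = "reject"
    MANUAL_REVIEW = "manual_review"


STATUS_TO_ACTION = {
    "valid": Action.DELIVER,
    "accept_all": Action.DELIVER,
    "invalid": Action.REJECT,
    "unknown": Action.MANUAL_REVIEW,
}


def action_for(status: str) -> Action:
    """Map a verifier status to a pipeline action.

    Statuses not listed in the table fall back to manual review (assumption).
    """
    return STATUS_TO_ACTION.get(status.strip().lower(), Action.MANUAL_REVIEW)
```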
The accept-all nuance
Large enterprises often configure “accept-all” (catch-all) mail servers, meaning the server accepts mail for any address @company.com. This does not confirm the specific person exists. However, for B2B outreach to technical companies, accept-all results are generally reliable because:
- The company domain is verified as active
- The email pattern (info@, contact@, firstname@) follows standard conventions
- Even when the domain is a catch-all, someone at the company typically reads that inbox
Verification economics
At approximately $0.005 per verification, checking 150 enriched contacts costs $0.75. A single bounced email to a cold prospect can damage your sender score for weeks. The ROI is clear.
Stage 6: How are leads delivered to your sales team?
The final stage packages verified leads into a format your sales team can actually use.

Delivered fields
| Field | Source | Example |
|---|---|---|
| Company name | Scraping (Stage 1) | “Van der Berg Installatietechniek” |
| Address | Scraping (Stage 1) | “Industrieweg 12, Rotterdam” |
| Website | Scraping (Stage 1) | “www.vdberginstallatie.nl” |
| Contact name | Enrichment (Stage 4) | “Pieter van der Berg” |
| Email | Enrichment (Stage 4) | “p.vanderberg@vdberginstallatie.nl” |
| Email verified | Verification (Stage 5) | Valid |
| Fit score | Scoring (Stage 3) | 0.72 |
| Enrichment source | Pipeline metadata | Tier 1 (web scraping) |
| Cost per lead | Pipeline metadata | $0.015 |
Delivery formats
- CSV export for import into any CRM (HubSpot, Salesforce, Pipedrive)
- Dashboard view for immediate browsing and outreach status tracking
- API endpoint for direct CRM integration in automated workflows
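The CSV export can be sketched with Python's standard `csv` module. The column names below are hypothetical snake_case versions of the delivered fields listed above; a real export would use whatever headers the target CRM expects.

```python
import csv
import io

FIELDS = [
    "company_name", "address", "website", "contact_name", "email",
    "email_verified", "fit_score", "enrichment_source", "cost_per_lead_usd",
]


def leads_to_csv(leads) -> str:
    """Serialize lead dicts to CSV text; unknown keys are ignored, missing ones left blank."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for lead in leads:
        writer.writerow(lead)
    return buffer.getvalue()
```

Using `csv.DictWriter` rather than manual string joins ensures that values containing commas, such as street addresses, are quoted correctly.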
What makes production pipelines fail, and how do you prevent it?
Building a prototype is the easy part. Running it reliably in production is where complexity compounds.
1. Data decay
B2B contact information degrades at 2.1% to 3.6% per month (RevenueBase / Landbase, 2024/2025), or 22.5% to 30% annually (Digital Di / PGM Solutions, 2026). Your pipeline needs continuous re-verification and a suppression list for stale contacts.
2. Provider reliability
Any single enrichment provider has downtime, rate limits, and coverage gaps. Your pipeline needs graceful fallback, retry logic, and timeout handling. When one provider is down, another needs to take over seamlessly, not crash the entire campaign.
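Retry logic for a flaky provider can be sketched as a small wrapper with exponential backoff. The function name, the default of three attempts, and the injectable `sleep` parameter (which makes the wrapper testable without real delays) are illustrative choices, not a description of the production pipeline.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(fn: Callable[[], T], attempts: int = 3, base_delay_s: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on any exception with delays of base, 2x base, 4x base, ..."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay_s * 2 ** attempt)  # back off before the next try
    raise last_error
```

Once the retries are exhausted the exception propagates, which is the point at which a waterfall would escalate to the next provider.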
3. AVG/GDPR compliance
Scraping publicly available B2B data can be defended under AVG Article 6(1)(f) (legitimate interest), but only within narrow limits. The Autoriteit Persoonsgegevens (AP) explicitly states that scraping personal data from the internet is “almost always illegal” under the AVG, so the legal basis has to be assessed and documented rather than assumed.
For B2B prospecting, you must satisfy the three-part test for legitimate interest:
- Purpose: the interest is legitimate (not against the law)
- Necessity: the processing is necessary to achieve the goal
- Balancing: the interest of the company outweighs the privacy rights of the individual
Critical distinction for Dutch outreach: the “opt-out” rule applies to legal entities (BV/NV), but the “opt-in” rule applies to natural persons (ZZP/VOF) unless there is an existing customer relationship. In practice, this means:
- Document your legal basis and legitimate interest assessment
- Only collect business contact data (never B2C personal data)
- Offer an opt-out mechanism in your first outreach email
- Maintain a suppression list of opt-outs
- Respond to data subject access requests within one month
The AP imposed a €30.5 million fine on Clearview AI in 2024 for building a database through web scraping without a valid legal basis. While that case concerned facial recognition, the ruling reinforced that scraping personal data for database building without lawful grounds is a severe violation.
4. LLM scoring drift
When OpenAI or another provider updates their model, your scoring behavior changes. A company that scored 0.45 last month might score 0.38 today and silently fall below your threshold. Monitor approval rates across campaigns and re-calibrate when they drift more than 10% from baseline.
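Approval-rate monitoring can be sketched in a few lines. The text does not say whether "10% from baseline" is relative or in percentage points; the sketch assumes relative drift, and both function names are hypothetical.

```python
def approval_rate(scores, threshold: float) -> float:
    """Share of scored companies at or above the fit threshold."""
    if not scores:
        raise ValueError("no scores to evaluate")
    return sum(1 for score in scores if score >= threshold) / len(scores)


def has_drifted(baseline_rate: float, current_rate: float, tolerance: float = 0.10) -> bool:
    """True when the approval rate moved more than `tolerance` relative to baseline."""
    if baseline_rate <= 0:
        raise ValueError("baseline rate must be positive")
    return abs(current_rate - baseline_rate) / baseline_rate > tolerance
```

For example, a baseline approval rate of 50% that drops to 44% is a 12% relative change and would trigger a re-calibration under this definition.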
5. Maintenance overhead
This system has 7+ external service dependencies (scraper, database, LLM, 3 enrichment providers, verifier). Each has its own API changes, pricing updates, and deprecation schedules. Someone needs to maintain this.
The build cost of an automated lead pipeline is 20% of the total cost of ownership. The other 80% is maintenance, monitoring, and optimization over time. This is the reality most tutorials do not mention.
How do the costs compare: build vs. manual vs. managed?
| Approach | Cost per Lead | Time Investment | Maintenance |
|---|---|---|---|
| Manual prospecting | €4-6 (labor at $97.66/hr) | 5-10 hrs/week | None (it is you) |
| Build your own pipeline | $0.02-$0.28 (API costs) | 80-120 hrs to build | 5-10 hrs/month ongoing |
| Industry average (construction) | $227 per lead (Sopro, 2025) | Varies | Vendor-dependent |
| Managed service | Fixed monthly rate | 0 hrs | Included |
The math for a mid-size installer
A mid-size technical installer spending 8 hours/week on manual prospecting at €45/hour administrative cost:
- Monthly manual cost: 8 x 4.3 x €45 = €1,548/month
- Monthly pipeline cost (200 leads): 200 x $0.07 avg = approximately €13/month in API costs
- True cost with hosting, monitoring, and maintenance: approximately €200-400/month
The automation saves roughly €1,150-1,350 per month (€1,548 minus €200-400) but requires significant upfront engineering to build and ongoing maintenance to operate.
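The monthly comparison can be reproduced as a short calculation. The inputs are the figures from this section (8 hours per week, 4.3 weeks per month, €45 per hour, €200-400 per month true pipeline cost); the function names are hypothetical.

```python
def monthly_manual_cost(hours_per_week: float, hourly_cost_eur: float,
                        weeks_per_month: float = 4.3) -> float:
    """Labor cost of manual prospecting per month in euros."""
    return hours_per_week * weeks_per_month * hourly_cost_eur


def monthly_savings_range(manual_cost_eur: float, pipeline_low_eur: float,
                          pipeline_high_eur: float):
    """Return (minimum savings, maximum savings) per month in euros."""
    return manual_cost_eur - pipeline_high_eur, manual_cost_eur - pipeline_low_eur


if __name__ == "__main__":
    manual = monthly_manual_cost(8, 45)
    low, high = monthly_savings_range(manual, 200, 400)
    print(f"Manual: EUR {manual:.0f}/month, savings: EUR {low:.0f} to EUR {high:.0f}/month")
```

With these inputs the manual cost is €1,548 per month and the savings land between about €1,148 and €1,348 per month, before accounting for the one-time build effort.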
What are your options going forward?
You have now seen the full architecture. Here is the honest assessment.
Option A: build it yourself
If you have a developer on your team and want full control, this guide gives you the blueprint. You need:
- A backend framework (Python/FastAPI recommended)
- A database (Supabase/Postgres)
- API accounts with a scraper, LLM provider, enrichment services, and email verifier
- 80-120 hours of development time
- Ongoing maintenance commitment
This makes sense if you have in-house engineering talent, you want to customize every aspect, and you are comfortable maintaining 7+ external integrations.
Option B: let Opusmatic handle it
We built this pipeline. We run it in production for Dutch technical companies. Two options:
Integration: we deploy the pipeline connected to your CRM and systems. You own the infrastructure, we build and maintain it. Fixed monthly SLA.
Managed delivery: we run campaigns on your behalf and deliver verified, scored leads weekly or monthly. You focus on closing deals, not building software.
Both options include AVG/GDPR compliance, zero-duplicate guarantees, and Dutch sector-specific targeting (SBI codes, KvK validation).
Frequently Asked Questions
How do I handle AVG/GDPR compliance when scraping B2B lead data in the Netherlands?
B2B prospecting from publicly available sources (Google Maps, company websites, KvK registry) can be defended under AVG Article 6(1)(f) (legitimate interest) for legal entities (BV/NV). For natural persons (ZZP/VOF), opt-in is required unless an existing customer relationship exists. You must:
- Document your legal basis and legitimate interest assessment
- Only collect business contact data (not personal/B2C)
- Include an opt-out mechanism in your first outreach
- Maintain a suppression list of opt-outs
- Respond to data subject access requests within one month
The Autoriteit Persoonsgegevens enforces these requirements. The AP imposed a €30.5 million fine on Clearview AI in 2024, reinforcing that scraping personal data without lawful grounds is treated as a severe violation. When in doubt, consult a Dutch privacy specialist.
Why does single-provider enrichment only find 60-70% of contacts?
No single data provider has complete coverage. Dutch MKB firms are especially underrepresented in global databases like Apollo.io (275M contacts globally) or ZoomInfo (174M+ emails globally), which typically achieve only a 40-60% hit rate for the Dutch SME market. A waterfall approach pushes combined hit rates to 85-95% while keeping average cost at approximately $0.07 per lead instead of $0.25.
What happens when the AI scoring model gives false positives or inconsistent results?
LLM scoring drift is a real operational concern. Model updates shift score distributions between campaigns. Monitor your approval rate (percentage of companies passing the fit threshold) across campaigns. If it drifts more than 10% from baseline, re-calibrate your prompt or adjust the threshold. Always human-review the first 50-100 leads of a new campaign.
How do I prevent sending duplicate leads across multiple campaigns?
Implement three-layer deduplication:
- Domain-level: check company website against history database
- Place ID fallback: for companies without websites, use the Google Maps identifier
- Email-level: separate table prevents re-contacting the same person across campaigns
This guarantees zero duplicates across the lifetime of your pipeline.
What is the average cost per B2B lead in construction?
According to Sopro’s 2025 B2B benchmarks, the average cost per lead in the construction sector is $227, with a range of $174 to $280. Multi-channel prospecting averages $188 per lead. Manual prospecting costs approximately $97.66 per hour (Sailes, 2024). An automated pipeline brings this down to $0.02 to $0.28 per verified lead.
Key Takeaways
- Automated lead generation follows 6 stages: ingest, dedup, score, enrich, verify, deliver
- Deduplicating before scoring saves 40-60% on LLM costs; most tutorials get this order wrong
- Waterfall enrichment reduces costs by 71% vs. using premium providers for every contact
- Total cost per verified lead: $0.02-$0.28 vs. the industry average of $227 (Sopro, 2025)
- AVG/GDPR compliance under Article 6(1)(f) permits B2B prospecting from public sources, but the opt-in rule applies to ZZP/VOF (natural persons)
- The Netherlands has 9,538 employer-based technical installation firms (Techniek Nederland, Q4 2025), most underrepresented in global B2B databases
- Building is 20% of total cost; maintenance is the other 80%
About the Author
Jack van der Vall is an AI Engineer at Opusmatic, specializing in automated B2B data pipelines and AI automation for technical installation companies and SMEs in South Holland. He builds systems that replace manual prospecting with automated, verifiable contact generation.