How Do You Build an Automated B2B Lead Generator for Technical Installation Companies?

A 6-stage pipeline architecture for automated B2B lead generation, from geo-targeted scraping to verified contact delivery, built for Dutch technical installation and engineering firms.

Jack van der Vall


Summary: An automated B2B lead generation pipeline scrapes companies from Google Maps, deduplicates against historical data, scores B2B fit with an LLM, enriches contacts through a 3-tier waterfall, verifies emails, and delivers CSV exports. Total cost per verified lead: $0.02 to $0.28. The average cost per lead in the construction sector is $227 (Sopro, 2025). For Dutch technical companies, this pipeline replaces manual prospecting at $97.66/hour with sub-$0.30 automation.

Last updated: March 3, 2026 · By Jack van der Vall, AI Engineer at Opusmatic

Related reading: see where AI automation saves time for technical installers, which business processes can be automated responsibly, and our AI systems integration services.

Why do generic lead generation tools fail for technical installation companies?

Most lead generation tools were not built for technical installation companies. Generic B2B databases underrepresent Dutch MKB firms (small and medium-sized enterprises) in HVAC, electrical, piping, and construction. According to Techniek Nederland and Wij Techniek (Trendfiles, Q4 2025), there are 9,538 employer-based technical installation companies in the Netherlands, with a broader count of approximately 46,000 including solo entrepreneurs (ZZP). These companies are classified under SBI codes 43.21 (electrical installation), 43.22 (plumbing/HVAC), and 42.21 (pipeline construction).

Global B2B databases like Apollo.io (275M contacts globally) and ZoomInfo (174M+ emails globally) typically achieve only a 40-60% hit rate for direct dials and verified emails in the Dutch SME market. Local providers leveraging direct KvK (Chamber of Commerce) feeds often achieve 80%+ coverage.

The solution: build a pipeline that starts where these companies are actually listed, on Google Maps, and works its way through intelligent filtering, enrichment, and verification to deliver qualified, verified contacts.

This guide breaks down the 6-stage architecture of an automated lead generation pipeline, with real cost benchmarks from production deployment across Dutch technical sectors.

Raw data available: B2B Lead Generation Pipeline Benchmarks


How does an automated lead generation pipeline work?

The pipeline operates in six sequential stages. Each stage filters or enriches the data, ensuring you only spend money on leads that matter.

graph TD
    A["Geo-Query: e.g. 'HVAC Rotterdam' + 15km"] -->|"Google Maps Scraper"| B["Scraped Companies"]
    B -->|"Domain matching"| C{"Seen before?"}
    C -->|"Yes"| D["Skip: saves LLM cost"]
    C -->|"No"| E["LLM Fit Score 0.0-1.0"]
    E -->|"Below threshold"| F["Low fit: Skip"]
    E -->|"Above threshold"| G["Waterfall Enrichment"]
    G -->|"Tier 1: $0.01"| H{"Email found?"}
    H -->|"Yes"| K["Verify Email"]
    H -->|"No"| I["Tier 2: $0.03"]
    I --> H2{"Email found?"}
    H2 -->|"Yes"| K
    H2 -->|"No"| J["Tier 3: $0.25"]
    J --> H3{"Email found?"}
    H3 -->|"Yes"| K
    H3 -->|"No"| N["No contact: Skip"]
    K -->|"Valid"| L["Deliver Lead"]
    K -->|"Invalid"| M["Reject"]

Why this order matters: most tutorials put scoring before deduplication. That is backwards. If 40-60% of scraped companies already exist in your database, you are burning LLM tokens scoring duplicates. Deduplicate first, score second.
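The ordering argument can be made concrete with a small orchestration sketch. `Company`, `run_pipeline` and the `score` callable are hypothetical names. The scorer stands in for the paid website crawl plus LLM call, and the 0.4 default mirrors the starting threshold suggested in Stage 3.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Set, Tuple


@dataclass
class Company:
    name: str
    domain: Optional[str]  # None when the Maps listing has no website
    place_id: str          # Google Maps place identifier


def run_pipeline(
    scraped: Iterable[Company],
    seen_domains: Set[str],
    seen_place_ids: Set[str],
    score: Callable[[Company], float],  # hypothetical crawl + LLM scorer (paid per call)
    threshold: float = 0.4,
) -> Tuple[List[Company], int]:
    """Deduplicate first, then score only genuinely new companies.

    Returns the approved companies and the number of paid scoring calls.
    The seen_* sets are updated in place, so repeats inside the same
    scrape are skipped as well.
    """
    approved: List[Company] = []
    scoring_calls = 0
    for company in scraped:
        # Stage 2: cheap set lookups happen before any paid LLM call.
        if company.domain:
            if company.domain in seen_domains:
                continue
        elif company.place_id in seen_place_ids:
            continue

        # Remember the company so it is never scored twice.
        if company.domain:
            seen_domains.add(company.domain)
        seen_place_ids.add(company.place_id)

        # Stage 3: only new companies reach the scorer.
        scoring_calls += 1
        if score(company) >= threshold:
            approved.append(company)
    return approved, scoring_calls
```

A history check is a set lookup and costs essentially nothing. Every call to `score` is billed. Tracking `scoring_calls` per campaign gives you the number to multiply by your per-call scoring cost.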


Stage 1: How does geo-targeted ingestion work?

The pipeline starts with a geographic search query, just like typing into Google Maps yourself, but automated and at scale.

Defining a campaign

You define a campaign with:

  • Search query: a descriptive term (e.g., “Loodgieter” for plumbers, “Elektricien” for electricians, “Piping contractor”)
  • Location: city or region center point (e.g., Rotterdam, Europoort)
  • Radius: search area in kilometers

The scraper returns structured company data: name, address, website, phone number, business categories, and a unique place identifier.

Why Google Maps?

For technical installation companies in the Netherlands, Google Maps is the richest public data source. These companies may not appear in Apollo, ZoomInfo, or LinkedIn Sales Navigator, but they almost always have a Google Business listing.

Data Source | Coverage of Dutch MKB Technical Firms | Cost
Google Maps | Very high | Low (scraping cost)
LinkedIn Sales Navigator | Moderate | €80-150/month
Global B2B databases (Apollo, ZoomInfo) | Low for niche sectors (40-60% hit rate) | €200-500/month
KvK Handelsregister | High (official, 9,538 employer-based firms) | Per-query fees

The Kamer van Koophandel (KvK) registry is the authoritative source for Dutch company data. Use SBI sector codes to validate scraped companies: 43.21 for electrical installation, 43.22 for plumbing/HVAC, 42.21 for pipeline construction. This adds a verification layer that generic scraping misses.
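Here is a minimal sketch of the SBI check. It assumes you already have a company's SBI codes from a KvK lookup; the lookup itself is not shown, and the helper names are hypothetical. Codes are compared by prefix, so a five-digit subclass code written as 43221 still matches the four-digit class 43.22.

```python
from typing import Iterable, Optional

# SBI classes named in the text; the labels are shorthand, not official titles.
TARGET_SBI = {
    "43.21": "electrical installation",
    "43.22": "plumbing/HVAC",
    "42.21": "pipeline construction",
}


def normalize_sbi(code: str) -> str:
    """Turn '4321', '43.21' or '43.22.1' into dotted form ('43.21', '43.221')."""
    digits = "".join(ch for ch in code if ch.isdigit())
    if len(digits) < 4:
        raise ValueError(f"SBI code too short to compare: {code!r}")
    return f"{digits[:2]}.{digits[2:]}"


def matching_sector(sbi_codes: Iterable[str]) -> Optional[str]:
    """Return the first target sector a company's SBI codes fall under, else None."""
    for code in sbi_codes:
        dotted = normalize_sbi(code)
        for prefix, label in TARGET_SBI.items():
            # Prefix match: a five-digit subclass still belongs to its four-digit class.
            if dotted.startswith(prefix):
                return label
    return None
```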


Stage 2: Why is deduplication the most overlooked step?

This is where most DIY lead generation pipelines fail. Without deduplication, you score the same companies repeatedly, enrich contacts you already have, and annoy prospects with duplicate outreach.

B2B contact data decays rapidly. According to industry benchmarks (Digital Di / PGM Solutions, 2026), the annual data decay rate ranges from 22.5% to 30%, while separate estimates put the monthly decay rate at 2.1% to 3.6% (RevenueBase / Landbase, 2024/2025). Without ongoing deduplication, stale records compound quickly.
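Monthly and annual decay rates are related by compounding, not by multiplying by 12. A quick sketch (the function names are illustrative) shows how the two quoted ranges line up:

* 2.1% per month compounds to about 22.5% per year.
* 3.6% per month compounds to about 36% per year.
* A 30% annual rate corresponds to roughly 2.9% per month.

```python
def annual_from_monthly(monthly_rate: float) -> float:
    """Share of records that go stale within 12 months at a constant monthly rate."""
    return 1 - (1 - monthly_rate) ** 12


def monthly_from_annual(annual_rate: float) -> float:
    """Constant monthly rate that produces the given annual decay."""
    return 1 - (1 - annual_rate) ** (1 / 12)
```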

Three-layer dedup strategy

graph LR
    A["Scraped Company"] --> B{"Domain in history?"}
    B -->|"Yes"| C["Skip"]
    B -->|"No domain"| D{"Place ID in history?"}
    D -->|"Yes"| C
    D -->|"No"| E["New company: proceed to scoring"]
    B -->|"No"| E
    F["Enriched Contact"] --> G{"Email in email_history?"}
    G -->|"Yes"| H["Skip: already contacted"]
    G -->|"No"| I["New contact: proceed"]
Dedup Layer | Key | Purpose
Domain-level | Company website URL | Primary; catches most duplicates
Place ID fallback | Google Maps identifier | For companies without websites
Email-level | Contact email address | Prevents re-contacting across campaigns
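The three layers map onto three lookup sets. This sketch keeps them in memory for clarity. In production they would be database tables; the `email_history` name in the diagram suggests one table per key.

`normalize_domain` and `DedupHistory` are hypothetical helpers. The normalization rules (lowercase, strip `www.`, ignore the path) are assumptions about what should count as the same company.

```python
from typing import Optional, Set
from urllib.parse import urlparse


def normalize_domain(url: Optional[str]) -> Optional[str]:
    """Reduce 'https://www.Example.nl/contact' to 'example.nl' (None if unusable)."""
    if not url or not url.strip():
        return None
    candidate = url.strip()
    if "://" not in candidate:
        candidate = "http://" + candidate  # urlparse only finds the host after a scheme
    host = urlparse(candidate).hostname  # .hostname is already lowercased
    if not host:
        return None
    return host.removeprefix("www.")


class DedupHistory:
    """In-memory stand-in for the domain, place ID, and email history tables."""

    def __init__(self) -> None:
        self.domains: Set[str] = set()
        self.place_ids: Set[str] = set()
        self.emails: Set[str] = set()

    def is_known_company(self, website: Optional[str], place_id: str) -> bool:
        domain = normalize_domain(website)
        if domain is not None:
            return domain in self.domains  # Layer 1: domain-level
        return place_id in self.place_ids  # Layer 2: place ID fallback

    def remember_company(self, website: Optional[str], place_id: str) -> None:
        domain = normalize_domain(website)
        if domain is not None:
            self.domains.add(domain)
        self.place_ids.add(place_id)

    @staticmethod
    def _normalize_email(email: str) -> str:
        # Case-insensitive comparison is an assumption; most mail providers behave this way.
        return email.strip().lower()

    def is_known_email(self, email: str) -> bool:
        return self._normalize_email(email) in self.emails  # Layer 3: email-level

    def remember_email(self, email: str) -> None:
        self.emails.add(self._normalize_email(email))
```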

What is the cost impact of deduplication?

If you scrape 500 companies and 45% already exist in your database, deduplication saves you from scoring 225 companies with an LLM. At approximately $0.003 per scoring call (website crawl + LLM inference), that is $0.675 saved per campaign. Over 50 campaigns, dedup alone saves $33.75 in LLM costs, and more importantly, prevents duplicate outreach that damages your reputation.

Deduplication before scoring is not just about cost savings. It is about data hygiene. Every duplicate lead that reaches a salesperson erodes trust in your pipeline. Zero-duplicate guarantees are what separates professional lead generation from spray-and-pray.


Stage 3: How does AI-powered fit scoring filter the right companies?

Not every company on Google Maps is a good prospect. A plumber in Rotterdam might be a sole proprietor doing residential work, not the 20-person B2B installation firm you are targeting. AI scoring separates signal from noise.

Scoring mechanics

  1. Crawl the company website to extract text content, service descriptions, and team size indicators
  2. Feed the content to an LLM along with your ideal customer profile (ICP) as context
  3. Receive a fit score from 0.0 (no fit) to 1.0 (perfect fit)
  4. Apply a threshold: companies scoring above it proceed to enrichment

What does the LLM evaluate?

The scoring prompt instructs the model to assess:

  • Company size indicators: team page, number of locations, fleet size
  • B2B vs B2C orientation: does the company serve businesses or consumers?
  • Service complexity: simple maintenance vs. complex project work
  • Geographic scope: local handyman vs. regional contractor
  • Digital maturity: a modern website suggests openness to automation tools
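Here is a sketch of the scoring step that covers both the mechanics and the criteria listed above. `call_llm` is a placeholder for whichever LLM client you use: it takes a prompt string and returns the model's text. The following are all assumptions, not a documented interface:

* the JSON reply format
* the 8,000-character truncation
* treating an unparseable reply as a score of 0.0

```python
import json
from typing import Callable, Tuple

# The criteria from the list above, passed to the model verbatim.
CRITERIA = [
    "Company size indicators: team page, number of locations, fleet size",
    "B2B vs B2C orientation: does the company serve businesses or consumers?",
    "Service complexity: simple maintenance vs. complex project work",
    "Geographic scope: local handyman vs. regional contractor",
    "Digital maturity: a modern website suggests openness to automation tools",
]


def build_prompt(icp: str, website_text: str, max_chars: int = 8000) -> str:
    """Combine ICP, criteria, and (truncated) website text into one prompt."""
    criteria = "\n".join(f"- {c}" for c in CRITERIA)
    return (
        "Score how well this company fits the ideal customer profile (ICP).\n"
        f"ICP:\n{icp}\n\n"
        f"Assess:\n{criteria}\n\n"
        'Reply with JSON only: {"score": <number from 0.0 to 1.0>, "reason": "<one sentence>"}\n\n'
        f"Website text:\n{website_text[:max_chars]}"
    )


def parse_score(raw: str) -> float:
    """Extract the score from the model reply and clamp it to [0.0, 1.0]."""
    score = float(json.loads(raw)["score"])
    return min(max(score, 0.0), 1.0)


def score_company(
    icp: str,
    website_text: str,
    call_llm: Callable[[str], str],  # placeholder: prompt in, model text out
    threshold: float = 0.4,
) -> Tuple[float, bool]:
    """Return (fit score, passes threshold)."""
    try:
        score = parse_score(call_llm(build_prompt(icp, website_text)))
    except (ValueError, KeyError, TypeError):
        # Unparseable reply: treated as no fit here (assumption; manual review also works).
        score = 0.0
    return score, score >= threshold
```

Asking for a short `reason` alongside the number costs little. It makes the human review of a new campaign's first leads much faster.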

Scoring economics

Because you have already deduplicated, you are only scoring genuinely new companies. This is a critical cost optimization:

Scenario | Companies Scored | LLM Cost (~$0.003 each)
Without dedup | 500 | $1.50
With dedup (45% reduction) | 275 | $0.825
Savings | 225 fewer calls | $0.675 per campaign (45%)

The fit score threshold is a business decision, not a technical one. A lower threshold (e.g., 0.3) casts a wider net but increases enrichment costs. A higher threshold (e.g., 0.6) is more selective but may miss good prospects with poor websites. Start at 0.4 and adjust based on your sales team’s feedback on lead quality.


Stage 4: How does waterfall enrichment reduce costs by 71%?

Enrichment is where you find actual contact information (names, email addresses, job titles) for the companies that passed scoring. It is also the most expensive stage, which is why the waterfall pattern exists.

The waterfall principle

Stop at the first success. Try the cheapest provider first. Only escalate to more expensive providers when the cheaper option fails.

graph TD
    A["Approved Company"] --> B["Tier 1: Web Scraping $0.01"]
    B --> C{"Found email?"}
    C -->|"Yes"| D["Done: total cost $0.01"]
    C -->|"No"| E["Tier 2: Aggregated Sources $0.03"]
    E --> F{"Found email?"}
    F -->|"Yes"| G["Done: total cost $0.04"]
    F -->|"No"| H["Tier 3: Decision-Maker Lookup $0.25"]
    H --> I{"Found email?"}
    I -->|"Yes"| J["Done: total cost $0.29"]
    I -->|"No"| K["No contact found"]

Tier breakdown

Tier | Method | Cost | Timeout | Hit Rate | Best For
1 | Direct website scraping | $0.01 | 10 seconds | ~40-50% | Companies with contact pages
2 | Aggregated data sources | $0.03 | 30 seconds | ~25-35% | Companies in business databases
3 | Decision-maker lookup | $0.25 | 30 seconds | ~20-30% | Hard-to-find contacts, specific roles
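The waterfall itself is a short loop that stops at the first success. In this sketch each tier wraps a hypothetical provider client. The client takes a domain and a timeout in seconds and returns an email or None. Costs and timeouts come from the table above.

Any exception, including a timeout or rate-limit error, is treated as a miss so the next tier gets a chance. The running cost assumes every attempt is billed. That is how the diagram's cumulative totals ($0.01, $0.04, $0.29) are computed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical provider signature: (domain, timeout_seconds) -> email or None.
Lookup = Callable[[str, float], Optional[str]]


@dataclass
class Tier:
    name: str
    cost: float     # USD per attempt
    timeout: float  # seconds, handed to the provider client
    lookup: Lookup


@dataclass
class EnrichmentResult:
    email: Optional[str]
    tier: Optional[str]
    cost: float


def enrich(domain: str, tiers: List[Tier]) -> EnrichmentResult:
    """Try tiers cheapest-first and stop at the first email found."""
    spent = 0.0
    for tier in tiers:
        spent += tier.cost  # assumes every attempt is billed, hit or miss
        try:
            email = tier.lookup(domain, tier.timeout)
        except Exception:
            # Timeouts, rate limits, outages: count as a miss and fall through.
            email = None
        if email:
            return EnrichmentResult(email, tier.name, round(spent, 4))
    return EnrichmentResult(None, None, round(spent, 4))
```

Recording `tier` and `cost` on every result is what populates the "Enrichment source" and "Cost per lead" fields delivered in Stage 6.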

Why not just use tier 3 for everything?

Economics. If Tier 1 finds 45% of contacts at $0.01 each, and Tier 2 finds another 30% at $0.03, only 25% of companies ever need the $0.25 lookup. Your blended cost per enriched lead drops to approximately $0.07 instead of $0.25.

For a campaign of 200 approved companies:

Strategy | Total Cost | Contacts Found
Tier 3 only | $50.00 | ~180 (90%)
Waterfall (1 then 2 then 3) | $14.50 | ~180 (90%)
Savings | $35.50 (71%) | Same result
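The blended figure is an expected value over the tiers. This sketch makes the billing assumption explicit; the function name and parameters are illustrative.

With the shares used above (45% / 30% / 25%), it gives about $0.089 per company if every attempt is billed, as in the diagram's cumulative totals. If providers only charge for a match, it gives about $0.076. Your real hit rates and provider pricing models decide where a campaign lands, so it is worth computing rather than assuming.

```python
from typing import Sequence


def blended_cost(
    hit_shares: Sequence[float],  # share of ALL companies resolved at each tier
    costs: Sequence[float],       # USD per lookup at each tier
    bill_misses: bool = True,
) -> float:
    """Expected enrichment spend per approved company in a cheapest-first waterfall.

    bill_misses=True: every attempt is charged (the diagram's cumulative totals).
    bill_misses=False: a provider only charges when it returns a match.
    """
    if len(hit_shares) != len(costs):
        raise ValueError("need one hit share per tier")
    expected = 0.0
    remaining = 1.0  # share of companies still unresolved when this tier runs
    for share, cost in zip(hit_shares, costs):
        expected += (remaining if bill_misses else share) * cost
        remaining -= share
    return expected
```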

Compare this to industry benchmarks: the average cost per lead in the construction sector is $227, with multi-channel prospecting averaging $188 per lead (Sopro, 2025). Manual prospecting costs approximately $97.66 per hour (Sailes, 2024). The waterfall pipeline produces verified leads at $0.02 to $0.28 in direct API costs, roughly 800 to 11,000 times cheaper than the industry average.

Many Dutch MKB installation companies have simple websites with a “Contact” page listing the owner’s email directly. This makes Tier 1 (web scraping) disproportionately effective for this sector compared to enterprise B2B where contacts are hidden behind forms.


Stage 5: Why is email verification non-negotiable?

An enriched email that bounces is worse than no email. It damages your sender reputation, wastes outreach effort, and can get your domain blacklisted. Verification is the quality gate.

How verification works

The verification service checks each email address against the recipient’s mail server:

Verification Status | Meaning | Action
valid | Mailbox exists and accepts mail | Deliver lead
accept_all | Server accepts all addresses (catch-all) | Deliver lead (enterprise servers)
invalid | Mailbox does not exist | Reject
unknown | Server unreachable or inconclusive | Flag for manual review
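Routing on verification status is a lookup table. The status strings follow the table above rather than any specific verification vendor's API, so map your vendor's values onto them. The `email_status` key on each lead is a hypothetical field name. Sending unrecognized statuses to manual review is an assumption that errs on the side of protecting sender reputation.

```python
from enum import Enum
from typing import Dict, Iterable, List


class Action(Enum):
    DELIVER = "deliver"
    REJECT = "reject"
    REVIEW = "manual_review"


# Status strings as in the table above, not a specific vendor's API values.
STATUS_ACTIONS: Dict[str, Action] = {
    "valid": Action.DELIVER,
    "accept_all": Action.DELIVER,
    "invalid": Action.REJECT,
    "unknown": Action.REVIEW,
}


def route(status: str) -> Action:
    """Map a verification status to an action; unrecognized statuses go to review."""
    return STATUS_ACTIONS.get(status.strip().lower(), Action.REVIEW)


def partition(leads: Iterable[dict]) -> Dict[Action, List[dict]]:
    """Split leads into deliver / reject / review buckets by their 'email_status' key."""
    buckets: Dict[Action, List[dict]] = {action: [] for action in Action}
    for lead in leads:
        buckets[route(lead["email_status"])].append(lead)
    return buckets
```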

The accept-all nuance

Large enterprises often configure “accept-all” (catch-all) mail servers, which accept mail for any address @company.com. A catch-all result therefore confirms that the domain receives mail, but not that the specific person exists. However, for B2B outreach to technical companies, accept-all results are generally reliable because:

  1. The company domain is verified as active
  2. The email pattern (info@, contact@, firstname@) follows standard conventions
  3. Even on a catch-all server, mail to a standard address usually lands in an inbox that someone at the company reads

Verification economics

At approximately $0.005 per verification, checking 150 enriched contacts costs $0.75. A single bounced email to a cold prospect can damage your sender score for weeks. The ROI is clear.


Stage 6: How are leads delivered to your sales team?

The final stage packages verified leads into a format your sales team can actually use.


Delivered fields

Field | Source | Example
Company name | Scraping (Stage 1) | “Van der Berg Installatietechniek”
Address | Scraping (Stage 1) | “Industrieweg 12, Rotterdam”
Website | Scraping (Stage 1) | www.vdberginstallatie.nl
Contact name | Enrichment (Stage 4) | “Pieter van der Berg”
Email | Enrichment (Stage 4) | p.vanderberg@vdberginstallatie.nl
Email verified | Verification (Stage 5) | Valid
Fit score | Scoring (Stage 3) | 0.72
Enrichment source | Pipeline metadata | Tier 1 (web scraping)
Cost per lead | Pipeline metadata | $0.015

Delivery formats

  • CSV export for import into any CRM (HubSpot, Salesforce, Pipedrive)
  • Dashboard view for immediate browsing and outreach status tracking
  • API endpoint for direct CRM integration in automated workflows
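For the CSV export, Python's standard `csv` module is enough. The snake_case column names below are hypothetical renderings of the fields table. `extrasaction="ignore"` drops internal keys that should not reach the sales team, and missing fields are written as empty cells.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column names mirroring the delivered-fields table.
FIELDS = [
    "company_name", "address", "website", "contact_name", "email",
    "email_verified", "fit_score", "enrichment_source", "cost_per_lead",
]


def leads_to_csv(leads: List[Dict]) -> str:
    """Serialize verified leads to CSV text that any CRM importer can read."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for lead in leads:
        writer.writerow(lead)  # values with commas (addresses) are quoted automatically
    return buffer.getvalue()
```

When writing to a file instead of a string, open it with `newline=""`, as the `csv` documentation recommends. `encoding="utf-8-sig"` helps Excel recognize non-ASCII characters in company names.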

What makes production pipelines fail, and how do you prevent it?

Building a prototype is the easy part. Running it reliably in production is where complexity compounds.

1. Data decay

B2B contact information degrades at 2.1% to 3.6% per month (RevenueBase / Landbase, 2024/2025); annual estimates range from 22.5% to 30% (Digital Di / PGM Solutions, 2026). Your pipeline needs continuous re-verification and a suppression list for stale contacts.

2. Provider reliability

Any single enrichment provider has downtime, rate limits, and coverage gaps. Your pipeline needs graceful fallback, retry logic, and timeout handling. When one provider is down, another needs to take over seamlessly, not crash the entire campaign.
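A common way to get graceful fallback is bounded retries with exponential backoff and jitter. Once the retries run out, the failure propagates so the waterfall can move to the next tier.

In this sketch `TransientError` is a hypothetical stand-in for whatever your HTTP client raises on rate limits, 5xx responses, or timeouts. The three attempts and 0.5-second base delay are illustrative defaults.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (HTTP 429/5xx, timeouts)."""


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,  # injectable for tests
) -> T:
    """Run `call`, retrying transient failures with exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller (e.g. the waterfall) move on
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))  # random wait spreads out simultaneous retries
    raise RuntimeError("unreachable")
```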

3. AVG/GDPR compliance

Scraping publicly available B2B data can be permissible under AVG Article 6(1)(f) (legitimate interest), but only within narrow limits: the Autoriteit Persoonsgegevens (AP) explicitly states that scraping personal data from the internet is “almost always illegal” under the AVG.

For B2B prospecting, you must satisfy the three-part test for legitimate interest:

  1. Purpose: the interest is legitimate (not against the law)
  2. Necessity: the processing is necessary to achieve the goal
  3. Balancing: your legitimate interest is not overridden by the individual’s privacy rights and interests

Critical distinction for Dutch outreach: under the Dutch Telecommunicatiewet rules on unsolicited e-mail, the “opt-out” rule applies to legal entities (BV/NV), but the “opt-in” rule applies to natural persons (ZZP/VOF) unless there is an existing customer relationship. In practice, this means:

  • Document your legal basis and legitimate interest assessment
  • Only collect business contact data (never B2C personal data)
  • Offer an opt-out mechanism in your first outreach email
  • Maintain a suppression list of opt-outs
  • Respond to data subject access requests within one month (AVG Article 12(3))
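The outreach rules above can be enforced as a gate in front of the send step. This minimal sketch only encodes the checklist as described in this section.

The legal-form sets, the `may_email` name and holding back unknown legal forms are all assumptions. The legitimate interest assessment itself remains a documented, manual step.

```python
from typing import Set

# Legal forms as described in this section; extend with the KvK forms you actually see.
LEGAL_ENTITY_FORMS = {"BV", "NV"}      # opt-out regime
NATURAL_PERSON_FORMS = {"ZZP", "VOF"}  # opt-in regime


def may_email(
    email: str,
    legal_form: str,
    suppression: Set[str],  # opted-out addresses, stored lowercased
    has_opt_in: bool = False,
    has_customer_relationship: bool = False,
) -> bool:
    """Gate a first outreach email on the suppression list and the recipient's legal form."""
    if email.strip().lower() in suppression:
        return False  # opted out: never contact again
    form = legal_form.strip().upper()
    if form in LEGAL_ENTITY_FORMS:
        return True  # allowed, provided the email itself offers an opt-out
    if form in NATURAL_PERSON_FORMS:
        return has_opt_in or has_customer_relationship
    return False  # unknown legal form: hold back until someone reviews it (assumption)
```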

The AP imposed a €30.5 million fine on Clearview AI in 2024 for building a database through web scraping without a valid legal basis. While that case concerned facial recognition, the ruling reinforced that scraping personal data for database building without lawful grounds is a severe violation.

4. LLM scoring drift

When OpenAI or another provider updates their model, your scoring behavior changes. A company that scored 0.45 last month might score 0.38 today and silently fall below your threshold. Monitor approval rates across campaigns and re-calibrate when they drift more than 10% from baseline.
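Drift monitoring needs two numbers per campaign: the approval rate and a baseline to compare it with. This sketch reads “more than 10% from baseline” as a relative change, so going from 0.80 to 0.70 is a 12.5% drop. If you mean percentage points, compare the absolute difference instead. The function names are illustrative.

```python
from typing import Sequence


def approval_rate(scores: Sequence[float], threshold: float) -> float:
    """Share of scored companies that pass the fit threshold."""
    if not scores:
        raise ValueError("no scores to evaluate")
    return sum(s >= threshold for s in scores) / len(scores)


def drifted(baseline_rate: float, current_rate: float, tolerance: float = 0.10) -> bool:
    """True when the approval rate moved more than `tolerance` relative to baseline."""
    if baseline_rate <= 0:
        raise ValueError("baseline approval rate must be positive")
    return abs(current_rate - baseline_rate) / baseline_rate > tolerance
```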

5. Maintenance overhead

This system has 7+ external service dependencies (scraper, database, LLM, 3 enrichment providers, verifier). Each has its own API changes, pricing updates, and deprecation schedules. Someone needs to maintain this.

The build cost of an automated lead pipeline is 20% of the total cost of ownership. The other 80% is maintenance, monitoring, and optimization over time. This is the reality most tutorials do not mention.


How do the costs compare: build vs. manual vs. managed?

Approach | Cost per Lead | Time Investment | Maintenance
Manual prospecting | €4-6 (labor at $97.66/hr) | 5-10 hrs/week | None (it is you)
Build your own pipeline | $0.02-$0.28 (API costs) | 80-120 hrs to build | 5-10 hrs/month ongoing
Industry average (construction) | $227 per lead (Sopro, 2025) | Varies | Vendor-dependent
Managed service | Fixed monthly rate | 0 hrs | Included

The math for a mid-size installer

A mid-size technical installer spending 8 hours/week on manual prospecting at €45/hour administrative cost:

  • Monthly manual cost: 8 x 4.3 x €45 = €1,548/month
  • Monthly pipeline cost (200 leads): 200 x $0.07 avg = approximately €13/month in API costs
  • True cost with hosting, monitoring, and maintenance: approximately €200-400/month

The automation saves roughly €1,150-1,350 per month (€1,548 minus €200-400) but requires significant upfront engineering to build and ongoing maintenance to operate.
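Here is the mid-size installer comparison written out so you can plug in your own hours and rates. The defaults are the figures above: 8 hours/week, 4.3 weeks/month, €45/hour, and an all-in pipeline cost of €200-400/month. The function name is hypothetical.

With these inputs it gives €1,548 of manual cost per month. The monthly savings come to €1,148 to €1,348, before counting the upfront build time.

```python
from typing import Tuple


def monthly_savings(
    hours_per_week: float = 8,
    weeks_per_month: float = 4.3,
    hourly_rate_eur: float = 45.0,
    pipeline_cost_eur: Tuple[float, float] = (200.0, 400.0),  # all-in low/high estimate
) -> Tuple[float, float, float]:
    """Return (manual cost, lowest savings, highest savings) per month in euros."""
    manual = hours_per_week * weeks_per_month * hourly_rate_eur
    low_cost, high_cost = pipeline_cost_eur
    return manual, manual - high_cost, manual - low_cost
```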


What are your options going forward?

You have now seen the full architecture. Here is the honest assessment.

Option A: build it yourself

If you have a developer on your team and want full control, this guide gives you the blueprint. You need:

  • A backend framework (Python/FastAPI recommended)
  • A database (Supabase/Postgres)
  • API accounts with a scraper, LLM provider, enrichment services, and email verifier
  • 80-120 hours of development time
  • Ongoing maintenance commitment

This makes sense if you have in-house engineering talent, you want to customize every aspect, and you are comfortable maintaining 7+ external integrations.

Option B: let Opusmatic handle it

We built this pipeline. We run it in production for Dutch technical companies. Two options:

Integration: we deploy the pipeline connected to your CRM and systems. You own the infrastructure, we build and maintain it. Fixed monthly SLA.

Managed delivery: we run campaigns on your behalf and deliver verified, scored leads weekly or monthly. You focus on closing deals, not building software.

Both options include AVG/GDPR compliance, zero-duplicate guarantees, and Dutch sector-specific targeting (SBI codes, KvK validation).



Frequently Asked Questions

How do I handle AVG/GDPR compliance when scraping B2B lead data in the Netherlands?

B2B prospecting from publicly available sources (Google Maps, company websites, KvK registry) is permissible under AVG Article 6(1)(f), legitimate interest, for legal entities (BV/NV). For natural persons (ZZP/VOF), opt-in is required unless an existing customer relationship exists. You must:

  1. Document your legal basis and legitimate interest assessment
  2. Only collect business contact data (not personal/B2C)
  3. Include an opt-out mechanism in your first outreach
  4. Maintain a suppression list of opt-outs
  5. Respond to data subject access requests within one month

The Autoriteit Persoonsgegevens enforces these requirements. The AP imposed a €30.5 million fine on Clearview AI in 2024, reinforcing that scraping personal data without lawful grounds is treated as a severe violation. When in doubt, consult a Dutch privacy specialist.

Why does single-provider enrichment only find 60-70% of contacts?

No single data provider has complete coverage. Dutch MKB firms are especially underrepresented in global databases like Apollo.io (275M contacts globally) or ZoomInfo (174M+ emails globally), which typically achieve only a 40-60% hit rate for the Dutch SME market. A waterfall approach pushes combined hit rates to 85-95% while keeping average cost at approximately $0.07 per lead instead of $0.25.

What happens when the AI scoring model gives false positives or inconsistent results?

LLM scoring drift is a real operational concern. Model updates shift score distributions between campaigns. Monitor your approval rate (percentage of companies passing the fit threshold) across campaigns. If it drifts more than 10% from baseline, re-calibrate your prompt or adjust the threshold. Always human-review the first 50-100 leads of a new campaign.

How do I prevent sending duplicate leads across multiple campaigns?

Implement three-layer deduplication:

  1. Domain-level: check company website against history database
  2. Place ID fallback: for companies without websites, use the Google Maps identifier
  3. Email-level: separate table prevents re-contacting the same person across campaigns

This guarantees zero duplicates across the lifetime of your pipeline.

What is the average cost per B2B lead in construction?

According to Sopro’s 2025 B2B benchmarks, the average cost per lead in the construction sector is $227, with a range of $174 to $280. Multi-channel prospecting averages $188 per lead. Manual prospecting costs approximately $97.66 per hour (Sailes, 2024). An automated pipeline brings this down to $0.02 to $0.28 per verified lead.


Key Takeaways

  • Automated lead generation follows 6 stages: ingest, dedup, score, enrich, verify, deliver
  • Deduplicating before scoring saves 40-60% on LLM costs when 40-60% of scraped companies are already known; most tutorials get this order wrong
  • Waterfall enrichment reduces costs by 71% vs. using premium providers for every contact
  • Total cost per verified lead: $0.02-$0.28 vs. the industry average of $227 (Sopro, 2025)
  • AVG/GDPR compliance under Article 6(1)(f) permits B2B prospecting from public sources, but the opt-in rule applies to ZZP/VOF (natural persons)
  • The Netherlands has 9,538 employer-based technical installation firms (Techniek Nederland, Q4 2025), most of which are underrepresented in global B2B databases
  • Building is 20% of total cost; maintenance is the other 80%

About the Author

Jack van der Vall is an AI Engineer at Opusmatic, specializing in automated B2B data pipelines and AI automation for technical installation companies and SMEs in South Holland. He builds systems that replace manual prospecting with automated, verifiable contact generation.
