How We Built an AI Compliance Scanner That Checks Marketing Assets Against EU Regulations in Under 2 Minutes
A production AI platform that scans marketing materials — images, PDFs, packaging — against Dutch and EU health claims regulations. Upload an asset, get a traffic-light compliance verdict with per-rule results, evidence quotes, and fix suggestions. No manual review required for the first pass.
The Challenge
Drowning in Manual Review
A European regulatory body responsible for reviewing marketing claims in the health, nutrition, and medical product space needed to fundamentally rethink how it processes compliance checks. Their reviewers were drowning.
Every marketing asset — social media posts, retail packaging, advertisements, e-commerce images, video scripts — had to be manually checked against a complex web of Dutch and EU health claims regulations, approved ingredient databases, and previously approved materials. A single review could take 2-3 hours. The backlog was growing faster than the team could clear it.
- Reviewers manually cross-referencing each marketing claim against multiple regulatory databases, dossiers, and previously approved materials
- No automated way to extract text from visual assets: images, packaging mockups, and designed PDFs required manual transcription
- Inconsistent reviewer judgments, with different reviewers flagging different issues on the same material
- Customers waiting days for compliance feedback, delaying product launches and campaign rollouts
- Generic AI chatbots failed immediately: they could not handle Dutch regulatory language, visual assets, or multi-step compliance logic
- No audit trail connecting verdicts back to specific regulatory sources, creating accountability gaps
What We Built
The AI Compliance Scanner
A self-service compliance checking platform where customers upload marketing assets and get a structured regulatory verdict in under two minutes, complete with per-rule pass/fail results, evidence quotes, suggested fixes, and links to the exact regulatory documents that informed each finding.
Compliance Verdict System
Green: All rules pass. Asset is cleared for use. No issues detected against applicable regulations.
Amber: Some rules flagged. Needs human reviewer attention. Specific issues identified with fix suggestions.
Red: Critical compliance violations detected. Asset cannot be used as-is. Detailed remediation steps provided.
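How per-rule results roll up into a single traffic-light verdict is not spelled out here. The following is a minimal sketch under an assumed policy: any failed rule marked critical gives red, any other failure gives amber, and a clean run gives green. The `RuleResult` fields and the severity values are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    GREEN = "green"  # all rules pass
    AMBER = "amber"  # some rules flagged, needs human review
    RED = "red"      # critical violation, asset cannot be used as-is


@dataclass
class RuleResult:
    rule_id: str   # hypothetical identifier, e.g. "banned_phrases"
    passed: bool
    severity: str  # assumed values: "minor" or "critical"


def overall_verdict(results: list[RuleResult]) -> Verdict:
    """Roll per-rule results up into one verdict (assumed policy, not the platform's)."""
    failures = [r for r in results if not r.passed]
    if any(r.severity == "critical" for r in failures):
        return Verdict.RED
    if failures:
        return Verdict.AMBER
    return Verdict.GREEN
```

Keeping the roll-up as a pure function of the rule results makes the verdict reproducible from the stored per-rule data.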
How It Works
Four Steps, Under Two Minutes
Every asset passes through a structured pipeline. The system automatically adapts which stages run based on the file type and content composition selected.
Upload
Drag & drop images or PDFs, provide product context & choose content type
Smart Check
File fingerprinted — if scanned before, instant results via Replay Mode
Extract
Text & visuals extracted via GPT-4 Vision or Document Intelligence
Analyse
Claims identified, regulatory docs retrieved, 11-step compliance check run
Content-Aware Pipeline
Only Run What You Need
The processing path adapts automatically based on file type and content selection — only the stages needed are executed, optimizing cost and speed.
| Content Type | PDF Conversion | Vision / OCR | Doc Intelligence | Claims Detection | RAG + AI Validation |
|---|---|---|---|---|---|
| Image Only | — | ✓ | — | ✓ | ✓ |
| Image & Text | — | ✓ | — | ✓ | ✓ |
| Text Only | — | — | ✓ | ✓ | ✓ |
| PDF + Image | ✓ | ✓ | — | ✓ | ✓ |
| PDF + Text Only | — | — | ✓ | ✓ | ✓ |
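The routing table above can be expressed as a small lookup. This is a sketch only: the stage and content-type identifiers are illustrative names derived from the table headers, not the platform's actual configuration keys.

```python
# Stage order follows the table columns, left to right.
PIPELINE_STAGES = [
    "pdf_conversion",
    "vision_ocr",
    "doc_intelligence",
    "claims_detection",
    "rag_validation",
]

# One entry per table row; each set lists the stages marked with a tick.
ROUTES = {
    "image_only": {"vision_ocr", "claims_detection", "rag_validation"},
    "image_and_text": {"vision_ocr", "claims_detection", "rag_validation"},
    "text_only": {"doc_intelligence", "claims_detection", "rag_validation"},
    "pdf_image": {"pdf_conversion", "vision_ocr", "claims_detection", "rag_validation"},
    "pdf_text_only": {"doc_intelligence", "claims_detection", "rag_validation"},
}


def stages_for(content_type: str) -> list[str]:
    """Return the stages to run for a content type, in pipeline order."""
    enabled = ROUTES[content_type]  # raises KeyError for an unknown content type
    return [stage for stage in PIPELINE_STAGES if stage in enabled]
```

For example, `stages_for("pdf_image")` returns all stages except `doc_intelligence`, matching the "PDF + Image" row.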
Cost-Efficient by Design
Smart replay mode (SHA-256 fingerprinting) reuses results for identical files within a configurable time window, eliminating redundant AI processing costs and returning repeat results instantly. Combined with content-aware routing that skips unnecessary pipeline stages, this keeps Azure OpenAI spend per scan low.
Under the Hood
The Full AI Pipeline
Here is what happens at each stage when a new asset enters the system.
Asset Upload & Classification
The user uploads an image, PDF, or text document, then selects the content composition (image only, image + text, or text only), product category (health product, medical device, etc.), and campaign type (advertisement, packaging, social media, retail folder, etc.). This metadata determines which compliance rules apply downstream.
Smart Replay Check
The file is fingerprinted using SHA-256 hashing. If an identical file has been scanned within the configured replay window, the system returns prior results instantly — no AI processing needed. This eliminates redundant costs when the same asset is submitted multiple times.
Intelligent Text Extraction
For visual assets (images, designed PDFs), GPT-4 Vision reads and extracts all text, including text embedded in product shots, marketing banners, and styled layouts that conventional OCR often misses. For text-only documents, Azure Document Intelligence handles extraction. PDFs with visual content are converted to page images for visual analysis.
Claim Identification & Segmentation
Extracted text is split into packshot lines and marketing copy. A Dutch ingredient lexicon identifies nutrient and botanical claims. The system distinguishes between health claims, promotional statements, ingredient references, and retailer branding — each governed by different rules.
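To illustrate the lexicon step, here is a minimal sketch that matches Dutch ingredient terms on word boundaries and maps them to canonical names. The five lexicon entries and the `find_ingredients` function are hypothetical; the actual lexicon and matching logic are not described in this article.

```python
import re

# Hypothetical mini-lexicon: Dutch surface form -> canonical ingredient name.
LEXICON = {
    "vitamine c": "vitamin_c",
    "vitamine d": "vitamin_d",
    "magnesium": "magnesium",
    "zink": "zinc",
    "gember": "ginger",
}


def find_ingredients(text: str) -> list[str]:
    """Return canonical ingredient names found in text, ordered by first appearance."""
    lowered = text.lower()
    hits = []
    for term, canonical in LEXICON.items():
        # Word boundaries stop "zink" from matching inside a longer word.
        match = re.search(rf"\b{re.escape(term)}\b", lowered)
        if match:
            hits.append((match.start(), canonical))
    return [name for _, name in sorted(hits)]
```

A real lexicon would also need to handle inflections, abbreviations, and spelling variants, which plain word-boundary matching does not cover.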
RAG Retrieval of Regulatory Context
A hybrid semantic + vector search pipeline retrieves relevant regulatory documents from Azure AI Search: approved health claims, on-hold claims, regulatory guidelines, training corrections from past reviews, product dossiers, and previously approved materials. This context grounds every AI decision in actual regulation.
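Hybrid retrieval has to merge rankings that come from different scoring schemes. Azure AI Search documents Reciprocal Rank Fusion (RRF) as its method for merging hybrid query results; whether this platform relies on that default or applies its own merging is an assumption. The sketch below shows the RRF idea in isolation, with made-up document IDs.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) per list it appears in; k = 60 is the
    constant conventionally used with RRF.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Illustrative IDs only.
keyword_hits = ["claim_12", "guideline_3", "dossier_7"]
vector_hits = ["dossier_7", "claim_12", "approved_44"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that rank well in both lists rise to the top, so here `claim_12` and `dossier_7` come ahead of documents found by only one search.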
Structured 11-Step Compliance Validation
GPT-4 runs a structured compliance check across 11 rule categories: training overrides, packshot approval, dossier matching, previous approval lookup, new text verification, banned phrase detection, imagery validation, wording deviation checks, ingredient-to-claim linkage, mandatory statement verification, and superlative claim detection. Each rule returns pass/fail with severity, evidence, and fix suggestions.
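One way to keep a structured check structured is to reject any model response that does not cover every rule category. The sketch below assumes the model is asked to return a JSON object keyed by rule category. The key names and the four required fields are hypothetical, derived from the description above rather than from the platform's actual schema.

```python
import json

# Identifiers derived from the 11 rule categories listed above (names are illustrative).
RULE_CATEGORIES = [
    "training_overrides",
    "packshot_approval",
    "dossier_matching",
    "previous_approval_lookup",
    "new_text_verification",
    "banned_phrase_detection",
    "imagery_validation",
    "wording_deviation",
    "ingredient_claim_linkage",
    "mandatory_statements",
    "superlative_claims",
]

REQUIRED_FIELDS = {"passed", "severity", "evidence", "fix_suggestion"}


def parse_rule_results(raw_response: str) -> dict[str, dict]:
    """Parse a model response and require a complete entry for every rule category."""
    data = json.loads(raw_response)
    missing = [rule for rule in RULE_CATEGORIES if rule not in data]
    if missing:
        raise ValueError(f"missing rule results: {missing}")
    for rule in RULE_CATEGORIES:
        absent = REQUIRED_FIELDS - set(data[rule])
        if absent:
            raise ValueError(f"{rule} lacks fields: {sorted(absent)}")
    # Drop any unexpected extra keys and return results in a fixed order.
    return {rule: data[rule] for rule in RULE_CATEGORIES}
```

Failing loudly on an incomplete response means a scan can be retried or escalated rather than silently reported with fewer than 11 checks.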
Full Audit Trail & Results
Every scan stores the complete chain: prompts sent, model parameters, raw AI responses, retrieved regulatory documents, confidence scores, and timestamps. Results display with per-rule verdicts, regulatory source citations, and a one-click "Submit for Internal Review" workflow.
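A sketch of what one stored audit record could look like, assuming it is serialized to JSON for storage. The `AuditRecord` class and its field names are hypothetical; they mirror the items listed above and are not the platform's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditRecord:
    file_sha256: str
    prompts: list[str]
    model_parameters: dict
    raw_response: str
    retrieved_document_ids: list[str]
    confidence_scores: dict[str, float]
    # UTC timestamp in ISO 8601 format, set when the record is created.
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        """Serialize the full record with stable key order for storage."""
        return json.dumps(asdict(self), sort_keys=True)
```

Storing the raw response next to the prompts and retrieved document IDs is what lets a reviewer later reconstruct why a given verdict was produced.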
The Difference
Before and After
Before: Manual Review
- ✕ 2-3 hours per asset, reviewer manually cross-referencing databases
- ✕ Inconsistent verdicts across different reviewers
- ✕ No structured trail linking findings to regulatory sources
- ✕ Customers waiting days for compliance feedback
- ✕ Text in visual assets had to be manually transcribed
- ✕ No self-service option: every check went through the review queue
- ✕ Scaling meant hiring more reviewers
After: AI Compliance Scanner
- Under 2 minutes per asset with structured, repeatable results
- Consistent 11-rule validation applied identically every time
- Every finding cites the specific regulatory document and source
- Customers get instant first-pass compliance feedback on upload
- GPT-4 Vision extracts text from images, packaging, and designed PDFs
- Self-service upload for customers; reviewer dashboard for oversight
- Scaling means processing more assets, not hiring more people
Platform Capabilities
More Than a Chatbot
A production platform with role-based access, a reviewer dashboard, admin configuration, and full bilingual support.
Role-Based Access
Customers upload and view their own scan results. Reviewers and admins see everything, manage settings, and audit any scan across all customers.
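The access rule described above reduces to a small predicate. This is a sketch with hypothetical role names and function signature. In a Supabase and PostgreSQL stack, rules like this are commonly enforced with row-level security policies, though this article does not say how the platform enforces them.

```python
# Roles that may see every scan, per the description above (names are illustrative).
ROLES_WITH_GLOBAL_ACCESS = {"reviewer", "admin"}


def can_view_scan(user_id: str, role: str, scan_owner_id: str) -> bool:
    """Customers see only their own scans; reviewers and admins see all scans."""
    if role in ROLES_WITH_GLOBAL_ACCESS:
        return True
    return role == "customer" and user_id == scan_owner_id
```

Unknown roles fall through to `False`, so access is denied by default.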
Reviewer Dashboard
Filterable, searchable list of all scanned assets with aggregate statistics. 620+ assets tracked with status breakdowns: compliant, review needed, rejected, pending.
Smart Replay
Identical files (matched by SHA-256 fingerprint) automatically reuse prior results within a configurable time window. Saves processing cost and returns results instantly.
Multi-Format Processing
Images, multi-page PDFs, and text documents. Visual assets use GPT-4 Vision; text-heavy documents use Azure Document Intelligence. The system routes each file based on its type and the selected content composition.
Regulatory Source Citations
Every finding links to the specific regulatory document that informed it — complete with source URLs, document names, and applicable rule references.
Full Bilingual Support
Complete English and Dutch translations throughout the entire UI. Regulatory rules, campaign types, product categories, and system messages all available in both languages.
Admin Configuration
Settings panel for Azure OpenAI endpoints, AI Search indexes, Document Intelligence, replay windows, citation display, compliance rules, product categories, and user management.
Complete Audit Trail
Every scan stores prompts, model parameters, raw AI responses, retrieved documents, and timestamps. Full traceability from verdict back to source data.
One-Click Workflows
Submit for Internal Review, Copy Full Report, and Upload Another Asset — all accessible directly from the results page. Designed for speed, not clicks.
"We went from a process that took hours per asset and still produced inconsistent results, to a system where customers get structured compliance feedback in under two minutes. The citation transparency changed everything: every finding traces back to the exact regulation."
Head of Digital Operations, European Regulatory Body
Tech Stack
Built for Production in Regulated Environments
Built on Azure AI services with a modern React frontend and Supabase backend. Every component chosen for production reliability in a regulated environment.
Frontend
- React
- TypeScript
- Tailwind CSS
Backend
- Supabase
- Edge Functions
- PostgreSQL
AI Services
- Azure OpenAI GPT-4
- GPT-4 Vision
- Custom Embeddings
Search & Retrieval
- Azure AI Search
- Hybrid Semantic
- Vector Retrieval
Doc Processing
- Azure Doc Intelligence
- PDF → Image
- OCR Extraction
Auth & Storage
- Supabase Auth
- Role-Based Access
- Blob Storage
Key Lessons
What We Learned Building Compliance AI
Vision models unlock compliance for visual marketing
The majority of marketing assets in this domain are images, not text documents. Without GPT-4 Vision, the system could not read packaging mockups, social media designs, or retail displays. This one capability is what makes self-service compliance checking possible.
Structured validation beats open-ended analysis
Running a single prompt that says "check this for compliance" produces vague, inconsistent results. Breaking the check into 11 specific rule categories — each with its own pass/fail logic, evidence requirements, and fix suggestions — produces results reviewers actually trust.
RAG context must include institutional memory
Regulations alone are not enough. The system also retrieves previously approved materials, product dossiers, and training corrections from past reviews. This institutional memory is what makes the AI behave like an experienced reviewer, not a rule-reading machine.
Regulatory source citations are the trust mechanism
Without linking every finding back to the specific regulatory document, the system is just another AI opinion. With citations, it is a defensible compliance tool. This single feature drove adoption more than anything else.
The reviewer does not go away — they get elevated
The AI handles the first pass. Human reviewers focus on edge cases, training corrections, and judgment calls. The result is faster throughput and better consistency, not fewer experts.
Building Compliance AI for Your Regulated Industry?
Start with a 2-week assessment. We will evaluate your regulatory landscape and document workflows, then deliver a production roadmap.