A Zurich insurance company hired an “AI security firm” in late 2024 to test its claims-processing chatbot. The provider ran a standard web application pentest, submitted a 40-page report full of OWASP Top 10 findings, and never once attempted a prompt injection. The chatbot remained vulnerable to data exfiltration for another six months. With the market for AI security services expanding fast, more and more providers list “AI red teaming” in their portfolio. The real question for Swiss companies is straightforward: how do you tell genuine AI security expertise from relabelled standard services?

This guide provides an independent evaluation framework for selecting an AI red teaming provider in Switzerland.

What AI Red Teaming Is, and What It Is Not

Definition

AI red teaming is the systematic, adversarial testing of AI systems for security vulnerabilities, misbehaviour, and abuse potential. Unlike traditional penetration tests, AI red teaming does not primarily target infrastructure vulnerabilities but rather the specific risks arising from the use of machine learning and particularly Large Language Models.

ServiceFocusAI-Specific?
Traditional penetration testInfrastructure, network, web applicationsNo
AI red teamingLLM security, prompt injection, model behaviourYes
AI auditCompliance, fairness, transparencyPartially
Adversarial ML testingSolidness of ML models against manipulationYes
AI safety assessmentAlignment, unintended behaviourYes

A qualified provider should clearly communicate which of these services they offer and how they differ. Be cautious of providers who bundle everything under the label “AI security” without methodological differentiation.

The Five Decisive Selection Criteria

1. CREST Certification

Why it matters: CREST (Council of Registered Ethical Security Testers) is the international gold standard for security testers. CREST certification is not a marketing label, it requires:

  • Technical examinations of individual testers (not just the company)
  • Regular recertification: competencies are continuously verified
  • Methodological audits of work processes
  • Proof of insurance and NDA commitments
  • Adherence to a binding code of ethics

What to look for:

  • Is the company as a whole CREST-accredited (CREST Member Company)?
  • Do the individual testers hold CREST qualifications (CRT, CCT)?
  • Does the certification also cover AI-specific areas?

Reality check: In Switzerland, there are only a handful of CREST-certified providers. Many companies advertise ISO 27001 or generic security certifications (that is not the same thing). ISO 27001 certifies an information security management system, not penetration testing competence.

2. OWASP LLM Top 10 Expertise

Why it matters: The OWASP (Open Web Application Security Project) LLM Top 10 is the reference framework for LLM security. A provider that does not know or systematically apply this framework cannot deliver a well-founded AI security assessment.

The ten categories:

  1. LLM01: Prompt Injection – Manipulation of model behaviour through crafted inputs
  2. LLM02: Insecure Output Handling: Insufficient validation of model outputs
  3. LLM03: Training Data Poisoning: Manipulation of training data
  4. LLM04: Model Denial of Service: Resource exhaustion through targeted queries
  5. LLM05: Supply Chain Vulnerabilities: Compromised models, libraries, or plugins
  6. LLM06: Sensitive Information Disclosure: Unintended data exfiltration
  7. LLM07: Insecure Plugin Design: Vulnerabilities in third-party extensions
  8. LLM08: Excessive Agency: Overly broad permissions for the model
  9. LLM09: Overreliance: Uncritical adoption of model outputs
  10. LLM10: Model Theft: Extraction of model weights or training data

Ask the provider:

  • How do you structure your tests along the OWASP LLM Top 10?
  • Which of these categories are relevant for our system?
  • Can you document your methodology per category?

Further information on each OWASP category can be found in the Cybersecurity Encyclopedia at cybersecurityswitzerland.com.

3. EU AI Act Compliance Competence

Why it matters: The EU AI Act sets concrete requirements for high-risk AI systems, including security testing. From August 2026, companies must demonstrate that their high-risk AI systems are compliant. An AI red teaming provider that cannot cover this regulatory dimension delivers only half the picture.

What the provider must be able to do:

  • Risk categorisation: Classification of your AI systems into the AI Act’s risk categories
  • Conformity assessment: Testing against the specific requirements for high-risk AI
  • Technical documentation: Support in creating the required technical documentation
  • Governance consulting: Recommendations for AI governance structures that meet the AI Act

Ask the provider:

  • Have you already conducted EU AI Act conformity assessments?
  • Can you support us with the risk categorisation of our AI systems?
  • Does your report also cover regulatory compliance?

4. Methodological Transparency

Why it matters: AI red teaming is a relatively new field. There are no generally accepted standards yet, such as the PTES (Penetration Testing Execution Standard) for traditional penetration tests. This makes it all the more important that the provider transparently presents their methodology.

A reputable provider discloses:

  • Scope definition: How is the test scope determined?
  • Attack taxonomy: Which attack categories are tested?
  • Test scenarios: Which specific scenarios are played through?
  • Tooling: Which tools and frameworks are used?
  • Assessment criteria: How are severity and risk evaluated?
  • Reporting: How are results documented and prioritised?

Warning signs:

  • “Our methodology is proprietary and confidential”. A reputable provider can explain the basics of their methodology without revealing trade secrets
  • No clear distinction between AI-specific and general security testing
  • No defined success criteria or KPIs

5. Industry-Specific Experience

Why it matters: AI risks are context-dependent. An AI chatbot in a retailer’s customer service has a different risk profile than an AI-powered credit scoring system at a bank or a diagnostic AI system in healthcare.

Relevant industry expertise includes:

  • Financial sector: FINMA circulars, banking secrecy, transaction security
  • Healthcare: EPD (Electronic Patient Record), Medical Devices Regulation, patient data protection
  • Pharma: GxP compliance, validation requirements
  • Public administration: Public procurement law, special data protection requirements
  • E-commerce: PCI-DSS, customer data protection, payment security

Ask the provider:

  • Do you have experience in our industry?
  • Are you familiar with the specific regulatory requirements?
  • Can you provide references from comparable projects?

Market Overview: AI Red Teaming in Switzerland)

Current State

The Swiss market for specialised AI security services is still young. Three categories of providers can be distinguished:

1. Specialised AI Security Providers Companies that specialise in AI security and bring deep expertise in LLM security. This category is still small in Switzerland but growing.

2. Traditional Cybersecurity Firms with AI Extension Established penetration testing firms expanding their portfolio to include AI topics. Quality varies widely, from superficial “AI washing” to serious capability development.

3. Consulting Houses with AI Security Practice Large consulting firms (Big Four and similar) offering AI security as part of their cybersecurity consulting. Often strong in the regulatory area but less deep in technical security testing.

RedTeam Partners

RedTeam Partners is a specialised AI security provider focused on offensive security testing. Relevant characteristics:

  • CREST certified (one of the few CREST-accredited providers in Switzerland)
  • OWASP LLM methodology. Structured tests along the OWASP LLM Top 10
  • EU AI Act compliance. Support for regulatory conformity
  • Specialisation. Focus on offensive security and AI, not a general store
  • Proven methodology. Experience with LLM systems of various sizes and architectures

For a detailed analysis of the McKinsey Lilli incident and its implications for AI security, we recommend the article on the RedTeam Partners Blog.

Cost Framework: What AI Red Teaming Costs

Price Overview by Scope

Test ScopePrice RangeTypical DurationSuitable For
Quick assessmentCHF 5,000 – 10,0001–2 daysInitial assessment, single chatbot
Standard red teamCHF 15,000 – 35,0005–10 daysSingle AI application with data connection
thorough red teamCHF 35,000 – 80,00010–20 daysMultiple AI systems, complex integrations
Enterprise assessmentCHF 80,000 – 150,000+20–40 daysEnterprise-wide AI landscape

Price Factors

Costs depend on various factors:

  • Number of AI systems: Each system requires separate test scenarios
  • Integration complexity: Standalone chatbot vs. AI with database access and action capability
  • Data sensitivity: Tests with regulated data require additional precautions
  • Reporting depth: Management summary vs. detailed technical report with proofs of concept
  • Compliance requirements: EU AI Act conformity assessment as additional service
  • Provider certification: CREST-certified providers are generally more expensive but offer demonstrable quality

Cost Comparison with Traditional Pentesting

For comparison: a traditional web application penetration test costs between CHF 5,000 and CHF 60,000 in Switzerland, depending on complexity and scope. Detailed cost comparisons can be found in our guide What Does a Penetration Test Cost in Switzerland? and on cybersecurityswitzerland.ch.

AI red teaming is typically 20–40% more expensive than comparable traditional penetration tests because:

  • The attack surface is less standardised
  • More specialised expertise is required
  • Test scenarios must be more individually designed
  • Result assessment is more complex

Questions to Ask Every Provider

On Qualifications

  1. Is your company CREST-accredited?
  2. What individual certifications do your testers hold?
  3. How many AI red teaming projects have you completed in the last 12 months?
  4. Can you provide references from our industry?
  5. How do you keep your testers up to date regarding new AI attack vectors?

On Methodology

  1. How do you structure your tests along the OWASP LLM Top 10?
  2. What specific prompt injection techniques do you test?
  3. How do you handle indirect prompt injection via documents and data sources?
  4. Do you also test the supply chain (model origin, libraries, plugins)?
  5. How do you assess the severity of vulnerabilities found?

On the Project

  1. How do you define the scope together with us?
  2. How do you ensure that production systems are not impacted?
  3. What happens if you find a critical vulnerability during testing?
  4. What does your reporting format look like? (Ask for a sample report)
  5. Do you offer support in remediating found vulnerabilities?

On Compliance

  1. Can you map test results to EU AI Act requirements?
  2. Do you support technical documentation as required by the AI Act?
  3. How do you account for industry-specific regulations?
  4. Can we use the results for regulatory evidence?
  5. Do you offer re-tests after vulnerability remediation?

Warning Signs: How to Spot Unserious Providers

  • “We test everything”. A reputable provider clearly defines what they can and cannot do
  • No clear methodology. Vague descriptions like “complete AI security review” without details
  • Guarantees. “We guarantee 100% security” is always unserious
  • Extremely low prices. AI red teaming under CHF 5,000 cannot be serious
  • No references. Unwillingness to name reference clients
  • Only automated tools. AI red teaming requires manual expertise, not just scanning tools
  • No clear report. The provider cannot show an example of their reporting format
  • Pressure for quick engagement. Reputable providers give you time for evaluation

Phase 1: Longlist (1 Week)

  • Identify 3–5 potential providers
  • Check CREST certification and publicly available information
  • Review publications and expert contributions by providers

Phase 2: RFI / Initial Meetings (2 Weeks)

  • Send a structured information request
  • Conduct initial meetings with all providers
  • Ask the questions listed above

Phase 3: RFP / Proposal Evaluation (2 Weeks)

  • Request detailed proposals with methodology, timeline, and costs
  • Compare using a weighted evaluation matrix
  • Ask for sample reports

Phase 4: Reference Check (1 Week)

  • Contact at least two reference clients
  • Ask about quality, communication, and practicality of recommendations
  • Inquire about cooperation during and after the project

Phase 5: Contract

  • Ensure clear NDA agreements
  • Contractually define scope, deliverables, and timeline
  • Agree on escalation processes for critical findings

Contract Design: What to Watch For

Essential Contract Components

A contract for AI red teaming should explicitly regulate the following points:

Scope and Boundaries

  • Which systems will be tested?
  • Which test methods are permitted?
  • Which systems and methods are explicitly excluded?
  • At what times will tests take place?

Data Protection and Confidentiality

  • NDA with clearly defined confidentiality levels
  • Rules for handling sensitive data discovered during testing
  • Data retention and deletion after project completion
  • Compliance with nDSG and, where applicable, GDPR

Deliverables and Reporting

  • Scope and format of the final report
  • Management summary and technical detail report
  • Severity classification of findings (e.g., per CVSS)
  • Recommendations with prioritisation and effort estimation
  • Presentation of results to management and technical teams

Escalation and Emergency Processes

  • How are critical findings handled during testing?
  • Who is informed within what timeframe?
  • May the tester escalate a third-party active attack discovered during testing?

Re-Tests and Follow-Up

  • Are re-tests after vulnerability remediation included in the price?
  • Within what timeframe can re-tests be conducted?
  • Is there remediation support available?

Typical Contract Pitfalls

  • Too narrow scope: If only the “frontend” of an AI system is tested, API and backend vulnerabilities remain undiscovered
  • No re-test clause: Without a re-test clause, you pay double if you want to verify remediation
  • Unclear IP rights: Who owns the exploits and tools developed during testing?
  • Missing liability provisions: What happens if the test unintentionally impacts production systems?

Further Resources

Choosing Your Provider

Selecting an AI red teaming provider is a strategic decision. In a young market where quality differences are large, a systematic evaluation pays off. The five core criteria, namely CREST certification, OWASP LLM expertise, EU AI Act competence, methodological transparency, and industry experience. These provide a solid evaluation framework.

Invest a little more time in the selection and pay an appropriate price for demonstrable quality. The alternative (a superficial test that creates a false sense of security) is the more expensive option in the long run.

Last updated: March 2026. This guide is regularly reviewed and updated. Alpine Excellence is an independent editorial platform and receives no compensation for provider recommendations.