Operational Resilience in Financial Services
Operational resilience has emerged as a critical priority for financial regulators and institutions worldwide. Unlike traditional operational risk management — which focuses on preventing and measuring losses — operational resilience focuses on an institution's ability to continue delivering critical services through disruption.
From Operational Risk to Operational Resilience
The shift in perspective is fundamental:
| Operational Risk | Operational Resilience |
|---|---|
| Prevent failures from happening | Assume failures will happen |
| Measure losses after events | Maintain services during events |
| Focus on individual risks | Focus on end-to-end services |
| Backward-looking (loss data) | Forward-looking (scenario testing) |
| Risk transfer (insurance) | Service continuity (redundancy) |
The COVID-19 pandemic and escalating cyber threats demonstrated that even well-managed firms face disruptions. The question is not whether disruption will occur, but how quickly critical services can be restored.
Regulatory Frameworks
UK Framework (PRA/FCA): The UK pioneered operational resilience regulation with the PRA's PS6/21 and the FCA's PS21/3 (both published in March 2021), requiring firms to:
- Identify Important Business Services (IBS)
- Set impact tolerances for maximum tolerable disruption
- Map the resources supporting each IBS
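- Test their ability to remain within impact tolerances in severe but plausible scenarios
- Be able to remain within impact tolerances by 31 March 2025, the end of a transition period that began when the rules took effect on 31 March 2022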
EU Framework (DORA): The Digital Operational Resilience Act (Regulation (EU) 2022/2554), applicable from 17 January 2025, focuses on ICT resilience across five pillars:
- ICT risk management frameworks
- ICT incident reporting
- Digital operational resilience testing (including threat-led penetration testing)
- Third-party ICT risk management
- Information sharing
US Framework: The Federal Reserve, OCC, and FDIC jointly issued the interagency paper Sound Practices to Strengthen Operational Resilience (2020), emphasizing:
- Critical operations identification
- Governance and risk management
- Scenario testing and business continuity planning
- Third-party dependency management
Building an Operational Resilience Framework
Step 1: Identify Important Business Services
Map the services your institution delivers to external end-users (customers, market participants, counterparties). These are business services, not internal processes. Examples:
- Processing payments
- Settling securities trades
- Providing market liquidity
- Administering deposits and lending
Step 2: Set Impact Tolerances
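For each important business service, define the maximum tolerable disruption: the point beyond which disruption would cause intolerable harm to consumers, market integrity, or financial stability. Impact tolerances can be expressed along the dimensions below; the UK rules require at least a time-based metric (maximum tolerable duration), with other metrics optional: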
- Time: Maximum duration of service unavailability
- Data: Maximum acceptable data loss or corruption
- Volume: Minimum transaction throughput during disruption
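A minimal sketch of recording the three tolerance dimensions and checking an observed or simulated disruption against them. The class names, field names, and example thresholds are hypothetical and purely illustrative, not regulatory values. Note the direction of each test: duration and data loss must stay at or below their limits, while throughput must stay at or above its floor.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ImpactTolerance:
    """Hypothetical impact tolerance for one important business service."""
    max_outage_hours: float       # time: maximum tolerable unavailability
    max_data_loss_minutes: float  # data: maximum tolerable loss window
    min_throughput_pct: float     # volume: minimum share of normal throughput


@dataclass(frozen=True)
class Disruption:
    """Observed or simulated outcome of a disruption (hypothetical fields)."""
    outage_hours: float
    data_loss_minutes: float
    throughput_pct: float


def breaches(tol: ImpactTolerance, event: Disruption) -> list[str]:
    """Return the tolerance dimensions the event breaches; empty means within tolerance."""
    failed = []
    if event.outage_hours > tol.max_outage_hours:
        failed.append("time")
    if event.data_loss_minutes > tol.max_data_loss_minutes:
        failed.append("data")
    if event.throughput_pct < tol.min_throughput_pct:
        failed.append("volume")
    return failed


# Illustrative values only, not regulatory thresholds.
payments = ImpactTolerance(max_outage_hours=4, max_data_loss_minutes=0, min_throughput_pct=50)
event = Disruption(outage_hours=6, data_loss_minutes=0, throughput_pct=70)
print(breaches(payments, event))  # ['time']
```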
Step 3: Map Dependencies
For each important business service, map all resources required for delivery:
- People — Key personnel and skills
- Technology — Systems, applications, infrastructure
- Data — Critical data stores and flows
- Facilities — Physical locations and equipment
- Third parties — Vendors, cloud providers, market infrastructure
This mapping reveals single points of failure and concentration risks.
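The mapping step can be sketched as a nested dictionary, assuming each service's dependencies are grouped into categories of interchangeable resources (for example, a primary and a disaster-recovery system). All service and resource names are hypothetical. A category with only one option is flagged as a single point of failure, and a resource that several services rely on is flagged as a potential concentration.

```python
from collections import defaultdict

# Hypothetical dependency map: service -> resource category -> interchangeable options.
# A category with a single option has no alternative, i.e. a single point of failure.
DEPENDENCIES = {
    "payments": {
        "technology": ["core_banking_primary", "core_banking_dr"],
        "third_party": ["cloud_provider_a"],
        "people": ["payments_ops_team"],
    },
    "securities_settlement": {
        "technology": ["settlement_engine"],
        "third_party": ["cloud_provider_a", "cloud_provider_b"],
        "facilities": ["london_office", "manchester_office"],
    },
}


def single_points_of_failure(deps):
    """Map each service to the resources that have no interchangeable alternative."""
    return {
        service: sorted(opts[0] for opts in categories.values() if len(opts) == 1)
        for service, categories in deps.items()
    }


def shared_resources(deps):
    """Find resources that more than one service depends on (potential concentration)."""
    users = defaultdict(set)
    for service, categories in deps.items():
        for opts in categories.values():
            for resource in opts:
                users[resource].add(service)
    return {r: sorted(s) for r, s in users.items() if len(s) > 1}


print(single_points_of_failure(DEPENDENCIES))
# {'payments': ['cloud_provider_a', 'payments_ops_team'], 'securities_settlement': ['settlement_engine']}
print(shared_resources(DEPENDENCIES))
# {'cloud_provider_a': ['payments', 'securities_settlement']}
```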
Step 4: Scenario Testing
Test the ability to remain within impact tolerances under severe but plausible scenarios:
- Major cyber attack (ransomware, DDoS)
- Cloud provider outage
- Key vendor failure
- Pandemic/workforce unavailability
- Natural disaster affecting data centers
- Regulatory intervention or sanctions event
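A scenario test can be approximated by switching off the resources a scenario affects and estimating how long each service stays down. This sketch rests on simplifying assumptions that a real test would need to validate: every dependency group is required, a group survives if any of its options is unaffected, a failed group comes back when its fastest option is restored, and failed groups are restored in parallel. The services, recovery times, and tolerances are hypothetical.

```python
# Hypothetical inputs: the interchangeable resource groups each service needs,
# how long each resource takes to restore, and each service's time tolerance.
SERVICES = {
    "payments": [["core_primary", "core_dr"], ["cloud_a"], ["ops_team"]],
    "settlement": [["settlement_engine"], ["cloud_a", "cloud_b"]],
}
RECOVERY_HOURS = {
    "core_primary": 8, "core_dr": 8, "cloud_a": 12, "cloud_b": 12,
    "ops_team": 24, "settlement_engine": 6,
}
TOLERANCE_HOURS = {"payments": 4, "settlement": 24}


def estimated_outage(groups, disabled):
    """Hours until the service is restored under the assumptions above."""
    outage = 0
    for options in groups:
        if all(o in disabled for o in options):
            # Group is down until its fastest option is restored.
            outage = max(outage, min(RECOVERY_HOURS[o] for o in options))
    return outage


def run_scenario(disabled):
    """Return {service: (estimated outage hours, within tolerance?)}."""
    results = {}
    for svc, groups in SERVICES.items():
        hours = estimated_outage(groups, disabled)
        results[svc] = (hours, hours <= TOLERANCE_HOURS[svc])
    return results


# Scenario: outage at cloud provider A.
print(run_scenario({"cloud_a"}))  # {'payments': (12, False), 'settlement': (0, True)}
```

Each scenario on the list above becomes a set of disabled resources; a pandemic, for instance, might be modelled as `{"ops_team"}`.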
Step 5: Remediation and Investment
Identify vulnerabilities where impact tolerances would be breached and invest in:
- System redundancy and failover capability
- Alternative processing arrangements
- Enhanced stress testing procedures
- Improved incident management and communication protocols
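Once scenario tests have flagged breaches, one simple way to order the remediation work is by how far each estimated outage exceeds its tolerance. The record type and figures below are hypothetical, and ranking on excess hours alone is an assumption: a firm might instead weight breaches by the harm each service could cause.

```python
from typing import NamedTuple


class ScenarioResult(NamedTuple):
    """One hypothetical scenario-test outcome for one service."""
    service: str
    scenario: str
    outage_hours: float
    tolerance_hours: float

    @property
    def excess_hours(self) -> float:
        """Hours by which the estimated outage exceeds the tolerance."""
        return self.outage_hours - self.tolerance_hours


def remediation_queue(results):
    """Keep only breaches and order them by largest excess over tolerance first."""
    breaches = [r for r in results if r.excess_hours > 0]
    return sorted(breaches, key=lambda r: r.excess_hours, reverse=True)


# Hypothetical scenario-test outcomes.
results = [
    ScenarioResult("payments", "cloud outage", 12, 4),
    ScenarioResult("payments", "ransomware", 30, 4),
    ScenarioResult("settlement", "cloud outage", 0, 24),
    ScenarioResult("settlement", "vendor failure", 30, 24),
]
for r in remediation_queue(results):
    print(r.service, r.scenario, r.excess_hours)
# payments ransomware 26
# payments cloud outage 8
# settlement vendor failure 6
```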
Third-Party Risk and Concentration
Financial institutions increasingly depend on a small number of critical third parties — particularly cloud service providers (AWS, Azure, Google Cloud). This creates concentration risk that individual firms cannot fully mitigate.
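One way a single firm might quantify its own provider concentration is the Herfindahl-Hirschman index (HHI), a standard concentration measure borrowed from competition economics. The regulatory frameworks described here do not prescribe it, so treat the metric choice as an assumption. The sketch computes the index from the share of critical workloads hosted by each provider, using hypothetical data.

```python
from collections import Counter


def herfindahl_index(assignments):
    """Herfindahl-Hirschman index of provider shares, on a 0-1 scale.

    `assignments` lists the provider hosting each critical workload.
    1.0 means every workload sits with one provider; with n providers
    sharing equally, the index falls to 1/n.
    """
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no workloads supplied")
    return sum((c / total) ** 2 for c in counts.values())


# Hypothetical mapping of eight critical workloads to cloud providers.
workloads = ["provider_a"] * 6 + ["provider_b"] * 2
print(round(herfindahl_index(workloads), 3))  # 0.625
```

Because the index only sees one firm's own portfolio, it cannot reveal the sector-wide concentration described above, where many firms rely on the same few providers.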
Regulators are responding with:
- Direct oversight powers over critical third parties (EU DORA)
- Exit strategy requirements and closer scrutiny of multi-cloud or multi-vendor arrangements
- Enhanced due diligence and contractual protections
- Regular testing of third-party failure scenarios
Connection to ERM
Operational resilience sits within the broader enterprise risk management framework but requires distinct governance:
- Board-level ownership of important business services
- Cross-functional coordination spanning IT, operations, compliance, and business units
- Investment decisions driven by service criticality, not just risk appetite
- Regular reporting on resilience posture and testing outcomes
FRM Exam Perspective
While operational resilience is evolving rapidly, FRM candidates should understand:
- The distinction between operational risk and operational resilience
- Important business service identification and impact tolerance setting
- The role of scenario testing in resilience
- Key regulatory frameworks (UK, EU DORA, US guidance)
- Third-party and concentration risk considerations
- Integration with Basel III operational risk capital requirements