
Enterprise AI Architecture: Balancing Centralization and Decentralization in Banking

by Vamsi Chemitiganti

Large banks face a significant technical challenge: building AI and machine learning infrastructure that serves diverse business needs across the enterprise. From fraud detection systems using traditional ML models to customer service chatbots powered by large language models, the variety of AI applications creates substantial engineering complexity. This blog explores the technical challenges in building unified AI systems for banks and examines practical architectural approaches.

The Technical Diversity Problem

Banking AI systems must accommodate diversity across several dimensions:

  1. Model Types and Computational Requirements

Banking applications require a spectrum of AI models:

  • Traditional ML Models: Gradient boosting models (XGBoost, LightGBM) for credit scoring, fraud detection, and risk modeling with moderate computational needs but strict interpretability requirements
  • Deep Learning Models: CNNs, RNNs, and transformers for document processing, customer segmentation, and anomaly detection with higher computational demands
  • Large Language Models: Foundation models with billions of parameters requiring specialized infrastructure for customer service automation and document analysis
  • Time Series Models: ARIMA, Prophet, and ML-based forecasting for market prediction and liquidity management

Each model type needs different hardware acceleration strategies, with traditional ML often CPU-bound while LLMs require multi-GPU clusters or specialized accelerators. The resulting heterogeneous compute environment creates orchestration challenges that resist standardization.
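The placement problem described above can be sketched as a simple policy that routes each model family to an appropriate compute pool. This is a minimal illustration; the pool names and the `ComputeTarget` type are assumptions, not a real scheduler API.

```python
from dataclasses import dataclass

@dataclass
class ComputeTarget:
    name: str
    accelerator: str  # "cpu", "gpu", or "multi_gpu"

# Illustrative placement policy mirroring the model families listed above
PLACEMENT = {
    "gradient_boosting": ComputeTarget("cpu_pool", "cpu"),
    "deep_learning":     ComputeTarget("gpu_pool", "gpu"),
    "llm":               ComputeTarget("llm_cluster", "multi_gpu"),
    "time_series":       ComputeTarget("cpu_pool", "cpu"),
}

def place(model_family: str) -> ComputeTarget:
    """Return the compute target for a model family, defaulting to CPU."""
    return PLACEMENT.get(model_family, ComputeTarget("cpu_pool", "cpu"))

print(place("llm").name)  # llm_cluster
```

In practice the policy would also weigh queue depth, cost, and data locality, but even this toy version shows why a single homogeneous pool fits none of the families well.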

  2. Latency Requirements and Deployment Patterns

Banking applications have different latency requirements:

  • Real-time applications (fraud detection, trading systems) require sub-10ms response times
  • Near-real-time applications (chatbots, recommendation engines) tolerate 100-500ms latencies
  • Batch applications (credit scoring, risk analysis) can process in minutes or hours

This latency spectrum requires different deployment architectures:

  • Edge deployments for ultra-low latency
  • Containerized microservices for standard online inference
  • Batch processing frameworks for high-throughput workloads

No single inference architecture efficiently serves all these patterns.
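The three latency tiers above translate naturally into a routing rule. A minimal sketch, with thresholds taken from the tiers listed above and tier names that are illustrative assumptions:

```python
def deployment_pattern(latency_budget_ms: float) -> str:
    """Map a latency budget to one of the three deployment tiers above."""
    if latency_budget_ms < 10:
        return "edge"            # fraud detection, trading systems
    if latency_budget_ms <= 500:
        return "microservice"    # chatbots, recommendation engines
    return "batch"               # credit scoring, risk analysis

print(deployment_pattern(5))      # edge
print(deployment_pattern(200))    # microservice
print(deployment_pattern(60_000)) # batch
```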

  3. Data Governance and Regulatory Requirements

Financial services operate under strict regulations:

  • Data residency: Many jurisdictions require customer data to remain within national boundaries
  • Explainability: Credit decisions and risk models must provide interpretable explanations
  • Auditability: Model decision paths must be traceable for regulatory examination
  • Privacy: PII and financial data require special protection

These constraints create partitioning requirements that resist architectural uniformity. Models using PII data may need isolated infrastructure that can’t easily integrate with enterprise-wide platforms.
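These constraints are often enforced as a pre-deployment policy gate. The sketch below encodes the four requirements above as checks on a model manifest; the field names and rules are illustrative assumptions, not any specific bank's framework.

```python
from dataclasses import dataclass

@dataclass
class ModelManifest:
    uses_pii: bool
    data_region: str        # where the model's data resides
    allowed_regions: tuple  # jurisdictions the data may occupy
    explainable: bool       # produces interpretable reason codes
    decision_model: bool    # drives credit or risk decisions

def policy_violations(m: ModelManifest) -> list:
    """Return a list of governance violations; empty means deployable."""
    issues = []
    if m.data_region not in m.allowed_regions:
        issues.append("data residency: region not permitted")
    if m.decision_model and not m.explainable:
        issues.append("explainability required for decision models")
    if m.uses_pii and m.data_region == "shared":
        issues.append("PII models need isolated infrastructure")
    return issues

m = ModelManifest(uses_pii=True, data_region="eu-west",
                  allowed_regions=("eu-west",), explainable=False,
                  decision_model=True)
print(policy_violations(m))  # ['explainability required for decision models']
```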

The Centralization Debate

Banks face pressure to centralize AI platforms for efficiency, creating tension with the diversity of requirements outlined above. The centralization debate has compelling arguments on both sides:

Arguments for Centralization

  1. Resource Efficiency: Consolidated compute infrastructure enables higher utilization rates, reducing hardware costs. JPMorgan Chase saved approximately $20M annually by centralizing ML compute resources across trading desks.
  2. Implementation of Standards: Centralized platforms can enforce standard practices for model validation, monitoring, and compliance. Goldman Sachs’ “Marquee” platform enforces consistent model risk practices across trading algorithms.
  3. Knowledge and Code Reuse: Shared code repositories, model registries, and feature stores facilitate reuse. Bank of America’s feature store allows models to leverage pre-computed features, accelerating model development.
  4. Talent Concentration: Centralized teams can build deeper specialized expertise. Capital One’s Machine Learning Center of Excellence serves as a talent hub, developing innovations that benefit the organization.
  5. Consistent Governance: A unified platform enables consistent implementation of model validation and compliance processes. HSBC’s Model Risk Management framework applies consistent validation standards across AI models.

Arguments for Decentralization

  1. Specialized Requirements: Different banking functions have different technical needs. Citigroup maintains specialized real-time ML infrastructure for trading operations that would be inefficient for other applications.
  2. Agility and Innovation: Business-aligned teams can innovate faster without centralized bottlenecks. Wells Fargo’s distributed innovation labs brought AI-based fraud detection systems to market more quickly than previous centralized efforts.
  3. Reduced Operational Risk: Distributed architectures limit the impact of failures. After a major outage in a centralized modeling system, Deutsche Bank moved to a more federated architecture for critical trading models.
  4. Domain Expertise Integration: AI effectiveness depends on business context. Morgan Stanley’s wealth management AI systems are developed by embedded teams working directly with financial advisors, resulting in higher adoption rates.
  5. Variable Governance Requirements: Regulatory standards differ by function. Retail banking models face different regulatory scrutiny than market-making algorithms, and standardized governance can create inefficiency.

Real-World Hybrid Architectures

In practice, successful banks implement hybrid architectures with strategic centralization and pragmatic decentralization. Here are common patterns:

  1. Federated Infrastructure with Centralized Standards

Banks like JPMorgan Chase maintain a federated model where infrastructure is distributed but governed by centralized standards:

Central Services:

  • Enterprise feature store
  • Model registry
  • Data validation library
  • Monitoring frameworks

Federated Components:

  • Business-unit specific compute environments
  • Domain-specific development environments
  • Specialized real-time serving infrastructure

This provides standardization where beneficial while allowing variation where necessary.
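The enterprise feature store is the clearest example of this pattern: a centrally operated service that federated business-unit models consume. A minimal in-memory sketch, where the API shape is an assumption for illustration:

```python
class FeatureStore:
    """Toy central feature store: one team operates it, many teams read from it."""

    def __init__(self):
        self._features = {}  # (entity_id, feature_name) -> value

    def put(self, entity_id: str, name: str, value):
        self._features[(entity_id, name)] = value

    def get_vector(self, entity_id: str, names: list) -> list:
        return [self._features.get((entity_id, n)) for n in names]

store = FeatureStore()  # centrally operated service
store.put("cust-42", "avg_balance_90d", 1820.50)
store.put("cust-42", "txn_count_30d", 57)

# A business-unit model reuses precomputed features instead of rebuilding them
vector = store.get_vector("cust-42", ["avg_balance_90d", "txn_count_30d"])
print(vector)  # [1820.5, 57]
```

A production store adds point-in-time correctness, access control, and online/offline parity, but the division of labor is the same: features computed once centrally, consumed everywhere.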

  2. Platform-as-a-Service Model

Goldman Sachs uses an internal AI platform-as-a-service model where centralized teams provide standardized building blocks that business units assemble for specific needs. Their architecture includes:

  • Core ML infrastructure as centralized services
  • Standardized but configurable deployment templates
  • Business-specific model development freedom
  • Unified monitoring and observability

This approach balances standardization with flexibility through platform guardrails.

  3. Tiered Governance Model

Many banks implement tiered governance based on model risk classification:

  Tier   Risk Level   Framework
  1      Critical     Fully centralized development and validation
  2      High         Centralized validation with decentralized development
  3      Medium       Standard guidelines with business unit implementation
  4      Low          Minimal oversight with standard monitoring

This approach allocates governance resources proportionally to risk while avoiding over-standardization.
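Encoded as configuration, the tiered table above becomes a simple lookup that a deployment pipeline can enforce. The keys and values here are illustrative assumptions:

```python
# Governance policy per risk tier, mirroring the table above
GOVERNANCE = {
    1: {"development": "central",       "validation": "central"},
    2: {"development": "business_unit", "validation": "central"},
    3: {"development": "business_unit", "validation": "guidelines"},
    4: {"development": "business_unit", "validation": "monitoring_only"},
}

def governance_for(tier: int) -> dict:
    """Return the governance policy for a risk tier; unknown tiers fail loudly."""
    if tier not in GOVERNANCE:
        raise ValueError(f"unknown risk tier: {tier}")
    return GOVERNANCE[tier]

print(governance_for(2)["validation"])  # central
```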

  4. Hub-and-Spoke Model for AI Expertise

Banks like HSBC implement hub-and-spoke models for AI talent:

  • Central “hub” provides specialized expertise and technology standards
  • “Spokes” embed AI practitioners directly in business units
  • Regular rotation between hub and spokes ensures knowledge sharing
  • Platform tools developed centrally but implemented locally

This model addresses the need for both centralized excellence and distributed domain expertise.

Technical Implementation Challenges

Even with optimal architectural decisions, significant engineering challenges remain:

  1. Model Deployment Heterogeneity

Different model types require different deployment stacks:

  • Traditional ML models: KFServing or custom Python services
  • Time series models: Specialized streaming computation frameworks
  • Deep learning models: TensorRT, ONNX Runtime, or similar optimized runtimes
  • LLMs: Specialized inference engines with techniques like quantization, KV caching

Creating a unified deployment framework across this diversity is challenging. Wells Fargo found that standardizing on a single framework (TensorFlow Serving) for all models created performance bottlenecks for specialized applications.
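One way to contain this heterogeneity without forcing one runtime on every model is a pluggable registry: each serving stack registers the model formats it handles. A sketch under assumed names (the runtimes and the return strings are illustrative):

```python
_RUNTIMES = {}

def serving_runtime(*formats):
    """Decorator registering a deploy handler for one or more model formats."""
    def wrap(fn):
        for f in formats:
            _RUNTIMES[f] = fn
        return fn
    return wrap

@serving_runtime("sklearn", "xgboost")
def python_service(path):
    return f"python-service:{path}"

@serving_runtime("onnx")
def onnx_runtime(path):
    return f"onnxruntime:{path}"

@serving_runtime("llm")
def llm_engine(path):
    # a real engine would apply quantization and KV caching here
    return f"llm-engine:{path}"

def deploy(model_format, path):
    """Dispatch a model artifact to whichever runtime claims its format."""
    if model_format not in _RUNTIMES:
        raise ValueError(f"unsupported format: {model_format}")
    return _RUNTIMES[model_format](path)

print(deploy("onnx", "models/doc-classifier"))  # onnxruntime:models/doc-classifier
```

The registry gives the platform team one deployment entry point while letting specialized stacks evolve independently, which is the property the single-framework approach lacked.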

  2. Hardware Optimization Conflicts

Models have conflicting hardware requirements:

  • LLMs benefit from high GPU memory and interconnect bandwidth
  • Inference-heavy applications prefer accelerators optimized for low precision
  • Training clusters need high-bandwidth networking
  • Some traditional models perform better on CPU architectures

Citigroup found that segregating hardware by workload type improved utilization by 35% compared to a general-purpose pool, despite reduced flexibility.

  3. Software Dependency Management

AI systems rely on complex software stacks with dependencies that frequently conflict:

  • Different frameworks (PyTorch, TensorFlow, JAX) have different CUDA requirements
  • Specialized libraries often have version-specific dependencies
  • LLMs may require custom modified frameworks

Container technologies help but don’t fully resolve the challenge. HSBC reported needing to maintain over 20 distinct container configurations to support their model diversity.
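Managing a fleet of container configurations usually means maintaining a curated matrix of framework-and-CUDA combinations and resolving images from it, rather than letting teams build ad hoc. A sketch; the registry host and image tags are invented for illustration:

```python
# Curated image matrix: (framework, CUDA version) -> approved image tag
IMAGES = {
    ("pytorch",    "12.1"): "registry.internal/ml/pytorch-cu121:stable",
    ("pytorch",    "11.8"): "registry.internal/ml/pytorch-cu118:stable",
    ("tensorflow", "11.8"): "registry.internal/ml/tf-cu118:stable",
    ("jax",        "12.1"): "registry.internal/ml/jax-cu121:stable",
}

def resolve_image(framework: str, cuda: str) -> str:
    """Return the approved base image, or fail rather than improvise one."""
    key = (framework, cuda)
    if key not in IMAGES:
        raise KeyError(f"no curated image for {framework} + CUDA {cuda}")
    return IMAGES[key]

print(resolve_image("pytorch", "12.1"))
```

Failing on unknown combinations is deliberate: it surfaces a gap in the matrix instead of silently producing an unvalidated environment.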

  4. Integration with Legacy Systems

Banks operate critical systems on legacy technology that must integrate with modern AI:

  • COBOL-based core banking systems
  • Mainframe transaction processing
  • Legacy databases without modern APIs

These integration points create architectural complexity that resists standardization. Bank of America’s AI modernization includes middleware specifically to bridge AI systems with legacy infrastructure.

Emerging Best Practices

The most successful banks employ several approaches to navigate these challenges:

  1. Layered Architectural Standards

Rather than forcing uniform standards, effective organizations define different standardization layers:

  • Infrastructure Layer: Standardized compute, storage, and networking
  • Platform Services Layer: Common but configurable MLOps components
  • Model Development Layer: Flexible frameworks with guidance instead of restrictions
  • Application Layer: Business-specific implementations with minimal constraints

This layered approach concentrates standardization where it provides most benefit while allowing necessary variation.

  2. Capability-Based Organization

Instead of treating centralization and decentralization as a binary choice, leading banks organize around capabilities:

  • Research and innovation (often centralized)
  • Development efficiency tools (centralized platform, distributed implementation)
  • Production operations (federated with centralized monitoring)
  • Business domain expertise (fully embedded in business units)

This model recognizes that different aspects of AI development benefit from different organizational patterns.

  3. Reference Architectures Rather Than Rigid Standards

The most effective approach often involves creating reference architectures that guide rather than dictate implementation:

  • Standard patterns for common use cases
  • Decision frameworks for architectural choices
  • Clear interfaces between components
  • Documented flexibility boundaries

This approach encourages standardization through influence rather than mandate.

Conclusion

The diversity of AI applications in banking makes a single, universal architecture impractical. The most successful organizations implement hybrid approaches that balance necessary standardization with pragmatic flexibility.

As AI technology evolves—particularly with the advancement of foundation models—maintaining this balance becomes more critical. The banks that succeed will establish architectural governance that can evolve alongside technology, providing enough structure for efficiency without constraining innovation.

The future likely belongs to banks that build “platforms of platforms”—meta-architectures that accommodate different AI subdomains while providing consistent governance, security, and operational excellence across the enterprise.
