
Apache NiFi Eases Dataflow Management & Accelerates Time to Analytics In Banking (2/3)

by vamsi_cz5cgo

The previous post did a fairly comprehensive job of cataloging the issues financial services firms face with data ingestion and flow management at large scale. The need for a technology platform that can ingest data at scale across many kinds of endpoints in an omni-channel world, while adhering to stringent regulations around security, chain of custody and encryption, is pressing from both an operator's and a data architect's perspective. Apache NiFi is a groundbreaking, 100% open source technology that can be leveraged to provide all of the above capabilities. This post explores NiFi from a technology standpoint. The next (and final) post will examine both the defensive (Risk, Fraud & Compliance) and offensive (Digital Transformation et al) use cases from a banking standpoint.

Why is Data Flow Management Such a Massive Challenge in Banking?

Banks need to operate their IT across two distinct prongs – defense and offense. Defensive in areas like Risk, Fraud and Compliance (RFC); offensive in revenue-producing areas of the business like Customer 360 (whether Institutional or Retail), Digital Marketing, Mobile Payments, Omni-channel Wealth Management etc. If one really thinks about it, the biggest activity banks perform is manipulating and dealing in information, whether related to a Customer, a Transaction or the General Ledger.

Thus, financial services is a data-intensive business. Forward-looking banks, insurance companies and securities firms have already begun to store and process huge amounts of data in Apache Hadoop, giving them better insight into both their risks and opportunities.

However, current architectures face key challenges in ingesting and processing the varied data found in banking –

  1. A high degree of data is duplicated from system to system, leading to multiple inconsistencies at both the summary and transaction levels. Because different groups perform different risk reporting functions (e.g. Credit and Market Risk), the feeds, the ingestion and the calculators end up being duplicated as well
  2. Traditional banking algorithms cannot scale with this explosion of data, nor with the heterogeneity inherent in reporting across areas such as Risk management. For example, certain kinds of Credit Risk calculations need access to around 200 days of historical data to estimate the probability of a counterparty defaulting and to obtain a statistical measure of that risk. The data inputs required are complex, multi-varied and multi-source
  3. Agile and smooth data ingestion is a massive challenge at most banks. Approaching ingestion and dataflow management as an enterprise rather than an IT concern means a) ingesting data from the highest-priority systems, b) applying the correct governance rules to that data, and c) applying stream-processing-based analytics to realize business value at low temporal latency for many kinds of key use cases

The Hadoop ecosystem has lacked an open source alternative in this space. That changes with the rapid entry and maturation of Apache NiFi.

Introducing Apache NiFi (Niagara Files)-

Incubated at the NSA (National Security Agency) and later open sourced at the Apache Foundation, Apache NiFi is a platform used to securely collect any and all enterprise data from virtually any source system (batch, real-time or streaming), even from outside the firewall, while ensuring complete tracking of the history and provenance of that data.

Apache NiFi is the first integrated open source platform that solves the real-time complexity and challenges of collecting and transporting data from a multitude of sources, be they big or small, fast or slow, always connected or intermittently available. NiFi is a single combined platform providing the data acquisition, simple event processing, transport and delivery mechanisms designed to accommodate the highly diverse and complicated dataflows generated by a world of connected people, systems and things.

At a high level, Apache NiFi can be used in the financial services space in many scenarios, including Client 360, Fraud, Cybersecurity and AML Compliance.

As Banks launch new initiatives in response to business challenges (the typical RFC continuum) or invest in additional capabilities (around Customer Analytics or Digital), a plethora of issues usually show up around data architectures. For instance, Hadoop can now be used to generate insights (at a latency of a few milliseconds) that can assist Banks in detecting fraud as soon as it happens. This usually means rearranging or reordering existing data-flow patterns (from a protocol or message format translation standpoint) as well as ingesting data that was simply uncollected before.

When existing dataflow-based architectures cannot scale up to handle new requests, the response has typically been one or a combination of the below –

  • Add new systems to mediate between protocols
  • Add new systems to transform or reorder data
  • Add new capabilities to filter the data

Core Capabilities of Apache NiFi – 

                  Illustration – NiFi Eases Enterprise Dataflow Management In Banking

NiFi obviates all of the above.

The core value add provided by NiFi is as below –

  1. Provides strong lifecycle management capabilities for data in motion. This includes not just data flowing in from traditional sources such as Customer Account data, Transaction data, Wire data, Trade data, Customer Relationship Management (CRM) and General Ledger data, but also streaming data sources like Market Data systems that feed instrument data, machine-generated data from ATMs and POS terminals, and systems supporting Digital functions like Social Media and Customer Feedback. It is format and protocol agnostic
  2. Helps maintain a strong chain of custody for this data. Essentially, know and understand where every piece of data originated, how it was modified and what its lineage is. This is key from both a regulatory and a lineage perspective
  3. Provides data and system administrators the ability to visualize enterprise-wide or departmental flows in one place and to modify them on the fly
  4. Provides the ability to run stream-based processing on the data, as NiFi is primarily a dataflow system based on the concepts of flow-based programming. It supports powerful and scalable directed graphs of data routing, transformation and system mediation logic (a minimal processor sketch follows this list)
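
To make the flow-based programming model concrete, here is a minimal sketch of a custom NiFi processor written in Java against the public org.apache.nifi processor API. It routes incoming FlowFiles to one of two relationships based on a transaction-amount attribute – the kind of routing/mediation logic a banking flow might need. The class name, the txn.amount attribute, the relationship names and the 10,000 threshold are illustrative assumptions, not part of any standard NiFi distribution.

```java
import java.util.Set;

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

public class RouteHighValueTransaction extends AbstractProcessor {

    // FlowFiles whose amount exceeds the (assumed) threshold are routed here.
    public static final Relationship REL_HIGH_VALUE = new Relationship.Builder()
            .name("high-value")
            .description("Transactions above the review threshold")
            .build();

    public static final Relationship REL_NORMAL = new Relationship.Builder()
            .name("normal")
            .description("All other transactions")
            .build();

    @Override
    public Set<Relationship> getRelationships() {
        return Set.of(REL_HIGH_VALUE, REL_NORMAL);
    }

    @Override
    public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return;
        }
        // An upstream processor is assumed to have extracted the transaction
        // amount into a FlowFile attribute named "txn.amount".
        String amountAttr = flowFile.getAttribute("txn.amount");
        double amount = 0.0;
        try {
            amount = (amountAttr == null) ? 0.0 : Double.parseDouble(amountAttr);
        } catch (NumberFormatException nfe) {
            getLogger().warn("Non-numeric txn.amount attribute on {}", new Object[]{flowFile});
        }
        // Route the FlowFile down one of two branches of the flow graph.
        session.transfer(flowFile, amount > 10_000.0 ? REL_HIGH_VALUE : REL_NORMAL);
    }
}
```

Packaged as a NAR, such a processor appears in the flow designer alongside the built-in processors and can be wired into a directed graph on the drag-and-drop canvas.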

Architecture –  

NiFi is built to tackle enterprise data flow problems and, as a technology, is augmentative and complementary to Hadoop. NiFi provides a web-based user interface for the design, control, feedback and monitoring of dataflows. It is also highly configurable and supports an agile methodology for developing these additive capabilities in large, medium or small projects. Indeed, the core advantage of NiFi is its ability to shorten the time to deriving analytic value from business projects.

Note that NiFi does not seek to replace existing messaging brokers or data acquisition systems; it layers on top of them and augments their capabilities with several dimensions such as quality of service (e.g. loss tolerant versus guaranteed delivery), low latency versus high throughput, and priority-based queuing. Further, NiFi provides fine-grained data provenance for all data received, forked, joined, cloned, modified, sent and ultimately dropped upon reaching its configured end state.
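
That provenance store is queryable programmatically as well as through the UI. Below is a rough sketch, assuming an unsecured NiFi instance at localhost:8080, of submitting an asynchronous provenance query over NiFi's REST API using the standard Java HTTP client; the exact endpoint path and JSON request shape are assumptions and can vary by NiFi version and security configuration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ProvenanceQuerySketch {
    public static void main(String[] args) throws Exception {
        // Assumed, unsecured local instance; production deployments would use
        // HTTPS plus certificate- or token-based authentication.
        String nifiBase = "http://localhost:8080/nifi-api";

        // Assumed request shape: ask for the most recent SEND events.
        String query = "{\"provenance\": {\"request\": "
                + "{\"maxResults\": 100, \"searchTerms\": {\"eventType\": \"SEND\"}}}}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(nifiBase + "/provenance"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(query))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // The response carries a query id; the results (the lineage events)
        // are then polled from the same resource until they are ready.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```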



NiFi ships with a wide variety of connectors to projects like Apache Kafka, Storm, Sqoop and Flume.
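
As a small illustration of that hand-off, the sketch below uses the standard Apache Kafka Java client to consume records that a NiFi PublishKafka processor might be configured to write; the broker address, topic name and consumer group are assumptions, and the downstream "analytics" is just a print statement.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class NifiFeedConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker
        props.put("group.id", "fraud-scoring");             // assumed consumer group
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // "bank-transactions" is an assumed topic that a NiFi PublishKafka
            // processor could be configured to publish to.
            consumer.subscribe(List.of("bank-transactions"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Hand each event to downstream analytics (fraud scoring, etc.).
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}
```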

The below illustration from Hortonworks (the lead committer to NiFi, as with most Hadoop projects) clarifies NiFi's place in the data ecosystem.

                  Illustration – NiFi & Other Data Movement/Processing Technologies

So what are the core features of NiFi from a run time & platform standpoint?

  1. Implementation – The NiFi graphical flow designer provides a visual, code-free UI that enables easy development of live process flows, leaves less to code, and empowers a broader workforce to positively effect change in the enterprise
  2. Architecture [2] – Centralized architecture. NiFi executes within a JVM living within a host operating system. A NiFi cluster is comprised of one or more NiFi Nodes controlled by a single NiFi Cluster Manager (NCM). The clustering design is a simple master/slave model where the NCM is the master and the Nodes are the slaves. The NCM's reason for existence is to keep track of which Nodes are in the cluster and their status, and to replicate requests to modify or observe the flow. Fundamentally, then, the NCM keeps the state of the cluster consistent. While the model is master/slave, if the master dies the Nodes are all instructed to continue operating as they were, to ensure the data flow remains live. The absence of the NCM simply means new Nodes cannot join the cluster and cluster flow changes cannot occur until the NCM is restored [2]
    Illustration – NiFi Runtime

  3. Security, Governance and Management – NiFi provides coverage from the edge through the data lake for security, governance and management
  4. Bi-directionality – able to interact with the source of data, sending back actionable responses
  5. An extensible platform with custom processors, where developers can plug existing code into these custom processors – 90 (and growing) ship with pre-developed actions for ingest, forwarding etc. (a unit-test sketch for such a processor follows this list)
  6. Supports a rich selection of data formats, protocols and schemas, and is designed for easy extension so organizations can add support for their own proprietary formats, schemas and protocols
  7. Adaptive to data flow conditions (latency, bandwidth) to maintain scalability and reliability
  8. Distributed – NiFi manages the data flow from the edge to the central data store with end-to-end security and governance
  9. Data Provenance – provides visual, graph-based tooling for data lineage and traceability, and determines chain-of-custody graphs for governance and compliance end to end, all the way down to the edge
  10. Analysis of data in flight – filter data in motion, analyze and prioritize it, and process it based on predefined rules (modify, re-route, delay, etc.)
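
Because custom processors are plain Java classes, they can be developed and verified outside a running NiFi instance using the nifi-mock test harness. Below is a minimal sketch that unit-tests the hypothetical RouteHighValueTransaction processor from the earlier sketch; the txn.amount attribute and the routing threshold remain illustrative assumptions.

```java
import java.util.Map;

import org.apache.nifi.util.TestRunner;
import org.apache.nifi.util.TestRunners;
import org.junit.jupiter.api.Test;

public class RouteHighValueTransactionTest {

    @Test
    public void routesLargeTransactionsToHighValue() {
        // TestRunner (from the nifi-mock module) exercises a processor
        // without needing a running NiFi instance.
        TestRunner runner = TestRunners.newTestRunner(new RouteHighValueTransaction());

        // Enqueue a FlowFile carrying the (assumed) txn.amount attribute.
        runner.enqueue("{\"id\":\"T-1\"}".getBytes(), Map.of("txn.amount", "25000"));
        runner.run();

        // The sketch processor should route it to the "high-value" relationship.
        runner.assertAllFlowFilesTransferred(RouteHighValueTransaction.REL_HIGH_VALUE, 1);
    }
}
```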

An ideal solution for financial services, NiFi enables simple, fast data acquisition, secure data transport, prioritized data flow and clear traceability of data from the very edge of customer applications and endpoints all the way to the core data center.

The final (and third) post will then examine the use of NiFi around the financial services use cases frequently discussed in this blog.

References

[1] Hortonworks Acquires Onyara – http://hortonworks.com/hdf/

[2] Apache NiFi – https://nifi.apache.org/

