Open Enterprise Hadoop – as secure as Fort Knox

Previous posts in this blog have discussed customers leveraging open source, Big Data and Hadoop-related technologies for a range of use cases across industry verticals. We have seen how a Hadoop-powered “Data Lake” can not only provide a solid foundation for a new generation of analytics and insight applications, but can also increase the number of access points to an organization’s data. As diverse types of external and internal enterprise data are ingested into a central repository, the inherent security risks must be understood and addressed by every actor in the architecture. Security is thus essential for organizations that store and process sensitive data in the Hadoop ecosystem. Many organizations must adhere to strict corporate security policies as well as rigorous industry guidelines. So how does open source Hadoop stack up to demanding standards such as PCI-DSS? 

We have, from time to time, noted the ongoing digital transformation across industry verticals. For instance, banking organizations are building digital platforms that aim to engage customers, partners and employees. Retailers & banks now recognize that the key to winning today’s customer is to offer a seamless experience across multiple channels of engagement. Healthcare providers want to offer their stakeholders – patients, doctors, nurses, suppliers etc. – multiple avenues to access contextual data and services, and the IoT (Internet of Things) domain is abuzz with the possibilities of Connected Car technology.

The aim of this blogpost is to dispel a notion that floats around from time to time – that a Hadoop-led, 100% open source ecosystem is somehow insecure or unable to fit into a corporate security model. On the security of open source in general, the Open Source Alliance has noted – “Open source enables anyone to examine software for security flaws. The continuous and broad peer-review enabled by publicly available source code improves security through the identification and elimination of defects that might otherwise be missed. Gartner for example, recommends the open source Apache Web server as a more secure alternative to closed source Internet Information servers. The availability of source code also facilitates in-depth security reviews and audits by government customers.” [2]

It is a well understood fact that data is the most important asset a business possesses and the one that nefarious actors are usually after. Consider the retail industry – cardholder data such as card numbers or PANs (Primary Account Numbers) and other authentication data is much sought after by the criminal population.

The consequences of a data breach are myriad & severe and can include –

  • Revenue losses
  • Reputational losses
  • Regulatory sanction and fines etc

Previous blogposts have chronicled cybersecurity in some depth; please refer to this post as a starting point for a fairly exhaustive view of the subject. This awareness has led to increased adoption of risk-based security frameworks, e.g. ISO 27001, the US National Institute of Standards and Technology (NIST) Cybersecurity Framework and the SANS Critical Controls. These frameworks offer a common vocabulary and a set of guidelines that enable enterprises to identify and prioritize threats, quickly detect and mitigate risks, and understand security gaps.

In the realm of payment card data, regulators, payment networks & issuer banks recognize this and have enacted a compliance standard – the PCI DSS (Payment Card Industry Data Security Standard). PCI DSS is currently in its third incarnation, v3.0, which was introduced over the course of 2014. It is the most important standard for a host of actors – merchants, processors, payment service providers, or really any entity that stores or uses payment card data. It is also important to note that the compliance process covers all applications and systems at a merchant or a payment service provider.

The PCI Security Standards Council specifies the following 12 requirements for PCI-DSS, as depicted in the illustration below.


Illustration: PCI Data Security Standard – high level overview (source: shopify.com)

While PCI covers a whole range of areas that touch payment data such as POS terminals, payment card readers, in store networks etc – data security is front & center.

It should be noted though that, according to the PCI Security Standards Council, which oversees the creation of and guidance around the standard, a technology vendor or product cannot by itself be declared “PCI Compliant.”

Thus, the standard has wide implications along two different dimensions –

1. The technology itself as it is incorporated at a merchant as well as

2. The organizational culture around information security policies.

My experience working at both Hortonworks & Red Hat has shown me that open source runs certified at hundreds of enterprise customers with demanding workloads in verticals such as financial services, retail, insurance, telecommunications & healthcare. The other important point to note is that these customers are PCI, HIPAA and SOX compliant across the board.

It is a total misconception that off-the-shelf, proprietary point solutions are needed to provide broad coverage across the security pillars discussed below. Open enterprise Hadoop offers comprehensive and well rounded implementations across all five areas and, what’s more, it is 100% open source. 

Let us examine how security in Hadoop works.

The Security Model for Open Enterprise Hadoop – 

The Hadoop community has thus adopted both a top-down and a bottom-up approach to security, examining all potential access patterns across all components of the platform.

Hadoop and Big Data security needs to be considered across the below two prongs – 

  1. What do the individual projects themselves need to support to guarantee that business architectures built using them are highly robust from a security standpoint? 
  2. What are the essential pillars of security that the platform which makes up every enterprise cluster needs to support? 

Let us consider the first. The Apache Hadoop ecosystem contains 25+ technologies in the realm of data ingestion, processing & consumption. While anything beyond a cursory look is out of scope here, an exhaustive list of the security hooks provided in each of the major projects is covered here [1].

For instance, Apache Ranger manages fine-grained access control through a rich user interface that ensures consistent policy administration across Hadoop data access components. Security administrators have the flexibility to define security policies for a database, table and column, or a file, and can administer permissions for specific LDAP-based groups or individual users. Rules based on dynamic conditions such as time or geolocation can also be added to an existing policy rule. The Ranger authorization model is highly pluggable and can be easily extended to any data source using a service-based definition.[1]

Administrators can use Ranger to define a centralized security policy for the following Hadoop components and the list is constantly enhanced:

  • HDFS
  • YARN
  • Hive
  • HBase
  • Storm
  • Knox
  • Solr
  • Kafka

Ranger works with standard authorization APIs in each Hadoop component, and is able to enforce centrally administered policies for any method used to access the data lake.[1]
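To make this concrete, below is a minimal sketch of creating a Ranger policy programmatically via the Ranger Admin server’s public REST API, as an alternative to the user interface described above. The host name, credentials, service name and resource values are purely illustrative, and the exact endpoint and payload fields can vary between Ranger versions.

```python
import requests

RANGER_URL = "http://ranger-admin.example.com:6080"  # illustrative Ranger Admin host
AUTH = ("admin", "admin")                            # illustrative credentials only

# Grant the 'analysts' LDAP group SELECT access to two columns of a Hive table.
policy = {
    "service": "hadoop_hive",          # name of the Hive service registered in Ranger (assumed)
    "name": "analysts_read_customers",
    "isEnabled": True,
    "resources": {
        "database": {"values": ["retail"]},
        "table":    {"values": ["customers"]},
        "column":   {"values": ["id", "segment"]},
    },
    "policyItems": [{
        "groups":   ["analysts"],
        "accesses": [{"type": "select", "isAllowed": True}],
    }],
}

resp = requests.post(
    f"{RANGER_URL}/service/public/v2/api/policy",
    json=policy,
    auth=AUTH,
    timeout=30,
)
resp.raise_for_status()
print("Created policy id:", resp.json().get("id"))
```

Policies created this way should appear in the Ranger Administration Portal alongside those defined through the UI, and are enforced by the same plugins described above.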

Now for the second & more important question, from an overall platform perspective. 

There are five essential pillars from a security standpoint that address the critical requirements security administrators place on data residing in a data lake. If any of these pillars is vulnerable from an implementation standpoint, risk gets built into the organization’s Big Data environment. Any Big Data security strategy must address all five pillars, with a consistent implementation approach to ensure their effectiveness.


                             Illustration: The Essential Components of Data Security

  1. Authentication – does the user possess appropriate credentials? Implemented via the Kerberos authentication protocol & allied concepts such as Principals, Realms & KDCs (Key Distribution Centers). A small access sketch follows below.
  2. Authorization – what resources is the user allowed to access based on business need & credentials? Implemented in each Hadoop project & integrated with an organization’s LDAP/AD.
  3. Perimeter Security – prevents unauthorized outside access to the cluster. Implemented via the Apache Knox Gateway, which extends the reach of Hadoop services to users outside of a Hadoop cluster. Knox also simplifies Hadoop security for users who access cluster data and execute jobs.
  4. Centralized Auditing – implemented via Apache Atlas and its integration with Apache Ranger.
  5. Security Administration – deals with the centralized setup & control of all security information using a central console. Implemented via Apache Ranger, which provides centralized security administration and management. The Ranger Administration Portal is the central interface for security administration; users can create and update policies, which are then stored in a policy database.


                                           Illustration: Centralized Security Administration
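To illustrate the authentication pillar, here is a minimal sketch of a client listing an HDFS directory over WebHDFS on a Kerberized cluster using SPNEGO. It assumes the requests-kerberos Python package is installed and a valid ticket already exists in the local credential cache; the NameNode address, principal and path are illustrative.

```python
import requests
from requests_kerberos import HTTPKerberosAuth, OPTIONAL

# Assumes a valid Kerberos ticket is already in the local credential cache,
# e.g. obtained with:  kinit analyst@EXAMPLE.COM
NAMENODE = "http://namenode.example.com:50070"   # illustrative NameNode address

# List a directory over WebHDFS; SPNEGO negotiates the Kerberos handshake.
resp = requests.get(
    f"{NAMENODE}/webhdfs/v1/data/retail?op=LISTSTATUS",
    auth=HTTPKerberosAuth(mutual_authentication=OPTIONAL),
    timeout=30,
)
resp.raise_for_status()
for entry in resp.json()["FileStatuses"]["FileStatus"]:
    print(entry["pathSuffix"], entry["type"])
```

Without a valid ticket for a principal the KDC recognizes, the same request is rejected, which is exactly the behavior the authentication pillar is meant to guarantee.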

It should also be noted that as Hadoop adoption grows, workloads that harness data for complex business analytics and decision-making may need more robust data-centric protection (namely data masking, encryption and tokenization). Thus, in addition to Hadoop projects such as Apache Ranger, enterprises can take an augmentative approach. Partner solutions that offer data-centric protection for Hadoop data, such as Dataguise DgSecure for Hadoop, clearly complement an enterprise-ready Hadoop distribution (such as those from the open source leader Hortonworks) and are definitely worth a close look.

Summary

While implementing Big Data architectures in support of business needs, security administrators should look to address coverage across each of the above areas as they design the infrastructure. A rigorous & bottom-up approach to data security makes it possible to enforce and manage security across the stack through a central point of administration, which will likely prevent potential security gaps and inconsistencies. This approach is especially important for newer technology like Hadoop, where exciting new projects & data processing engines are always being incubated at a rapid clip. After all, the data lake is all about building a robust & highly secure platform on which data engines – Storm, Spark etc. – and processing frameworks like MapReduce function to create business magic. 

 References – 

[1] Hortonworks Data Security Guide
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_Security_Guide/

[2] Open Source Alliance of America
http://opensourceforamerica.org/learn-more/benefits-of-open-source-software/

Why Software Defined Infrastructure & why now..(1/2)

The ongoing digital transformation in key verticals like financial services, manufacturing, healthcare and telco has incumbent enterprises fending off a host of new market entrants. Enterprise IT’s best answer is to increase the pace of innovation as a way of driving increased differentiation in business processes. Though data analytics & automation remain the lynchpin of this approach, software defined infrastructure (SDI) built on the notions of cloud computing has emerged as the main infrastructure differentiator, for a host of reasons which we will discuss in this two part blog.

Software Defined Infrastructure (SDI) is essentially an idea that brings together advances in a host of complementary areas spanning infrastructure software, data, and development environments. It supports a new way of building business applications. The core idea in SDI is that massively scalable applications (in support of diverse customer needs) describe their behavior characteristics (via configuration & APIs) to the underlying datacenter infrastructure, which simply obeys those commands in an automated fashion while abstracting away the underlying complexities.

SDI as an architectural pattern was originally made popular by the web scale giants – the so-called FANG companies of tech – Facebook, Amazon, Netflix and Google (now Alphabet) – but has gradually begun making its way into the enterprise world.

Common Business IT Challenges prior to SDI – 
  1. The cost of hardware infrastructure is typically growing at a higher rate every year than the total IT budget. Cost pressures are driving an overall re-look at the different tiers across the IT landscape.
  2. Infrastructure is not yet completely under the control of IT application development teams, even as business realities dictate rapid app development to meet changing business requirements.
  3. Even for small, departmental-level applications, expensive proprietary stacks still need to be deployed; these are not only cost and deployment footprint prohibitive but also take weeks to spin up in terms of provisioning cycles.
  4. Big box proprietary solutions are prompting a hard look at open source technologies which are lean, easy to use and have a lightweight deployment footprint. Apps need to dictate footprint; not vendor provided containers.
  5. Concerns with acquiring developers who are tooled on cutting edge development frameworks & methodologies; there is zero developer mindshare with Big Box technologies.

Key characteristics of an SDI

  1. Applications built on a SDI can detect business events in realtime and respond dynamically by allocating additional resources in three key areas – compute, storage & network – based on the type of workloads being run.
  2. Using an SDI, application developers can seamlessly deploy apps while accessing higher level programming abstractions that allow for the rapid creation of business services (web, application, messaging, SOA/ Microservices tiers), user interfaces and a whole host of application elements.
  3. From a management standpoint, business application workloads are dynamically and automatically assigned to the available infrastructure (spanning public & private cloud resources) on the basis of the application requirements, required SLA in a way that provides continuous optimization across the life cycle of technology.
  4. The SDI itself optimizes the entire application deployment by both externally provisioned APIs & internal interfaces between the five essential pieces – Application, Compute, Storage, Network & Management.

The SDI automates the technology lifecycle –

Consider the typical tasks needed to create and deploy enterprise applications. This list includes but is not limited to –

  • onboarding hardware infrastructure,
  • setting up complicated network connectivity to firewalls, routers, switches etc,
  • making the hardware stack available for consumption by applications,
  • figure out storage requirements and provision those
  • guarantee multi-tenancy
  • application development
  • deployment,
  • monitoring
  • updates, failover & rollbacks
  • patching
  • security
  • compliance checking etc.
The promise of SDI is to automate all of this from a business, technology, developer & IT administrator standpoint.

SDI Reference Architecture – 

The SDI encompasses SDC (Software Defined Compute), SDS (Software Defined Storage), SDN (Software Defined Networking), Software Defined Applications and Cloud Management Platforms (CMP) in one logical construct, as can be seen in the below picture.

                      Illustration: The different tiers of Software Defined Infrastructure

At the core of the software defined approach are APIs. APIs control the lifecycle of resources (request, approval, provisioning, orchestration & billing) as well as the applications deployed on them. The SDI implies commodity hardware (x86) & a cloud based approach to architecting the datacenter.
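As a purely hypothetical sketch of this API-driven lifecycle, the snippet below shows an application describing its desired compute, storage, network and SLA characteristics to an SDI controller. The endpoint, field names and payload shape are invented for illustration and do not correspond to any specific product.

```python
import requests

# Invented SDI controller endpoint, shown only to illustrate the declarative,
# API-driven request -> approval -> provisioning cycle described above.
SDI_API = "https://sdi-controller.example.com/api/v1"

workload_spec = {
    "name": "risk-analytics",
    "compute": {"instances": 8, "vcpus": 4, "memory_gb": 32},
    "storage": {"type": "block", "size_gb": 2048},
    "network": {"zone": "internal", "firewall_profile": "pci"},
    "sla":     {"availability": "99.9", "max_latency_ms": 50},
}

# The application declares what it needs; the SDI decides where and how to place it.
resp = requests.post(f"{SDI_API}/workloads", json=workload_spec, timeout=30)
resp.raise_for_status()
print("Provisioning request accepted:", resp.json().get("request_id"))
```

The design point is that the application never names specific hosts, switches or LUNs; it states intent, and the controller maps that intent onto whatever commodity hardware is available.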

The ten fundamental technology tenets of the SDI –

1. Highly elastic – scale up or scale down the gamut of infrastructure (compute – VM/Baremetal/Containers, storage – SAN/NAS/DAS, network – switches/routers/Firewalls etc) in near real time

2. Highly Automated – Given the scale & multi-tenancy requirements, automation is required at all levels of the stack (development, deployment, monitoring and maintenance)

3. Low Cost – Oddly enough, the SDI operates at a lower CapEx and OpEx compared to the traditional datacenter due to reliance on open source technology & high degree of automation. Further workload consolidation only helps increase hardware utilization.

4. Standardization –  The SDI enforces standardization and homogenization of deployment runtimes, application stacks and development methodologies based on lines of business requirements. This solves a significant IT challenge that has hobbled innovation at large financial institutions.

5. Microservice based applications –  Applications developed for an SDI-enabled infrastructure are developed as small, nimble processes that communicate via APIs and over infrastructure like messaging & service mediation components (e.g. Apache Kafka & Camel). This offers huge operational and development advantages over legacy applications. While one does not expect Core Banking applications to move over to a microservice model anytime soon, customer facing applications that need responsive digital UIs will definitely need to consider such approaches (a small messaging sketch follows the ten tenets below).

6. ‘Kind-of-Cloud’ Agnostic –  The SDI does not enforce the concept of private cloud, or rather it encompasses a range of deployment options – public, private and hybrid.

7. DevOps friendly –  The SDI enforces not just standardization and homogenization of deployment runtimes, application stacks and development methodologies but also enables a culture of continuous collaboration among developers, operations teams and business stakeholders i.e cross departmental innovation. The SDI is a natural container for workloads that are experimental in nature and can be updated/rolled-back/rolled forward incrementally based on changing business requirements. The SDI enables rapid deployment capabilities across the stack leading to faster time to market of business capabilities.

8. Data, Data & Data –  The heart of any successful technology implementation is Data. This includes customer data, transaction data, reference data, risk data, compliance data etc etc. The SDI provides a variety of tools that enable applications to process data in a batch, interactive, low latency manner depending on what the business requirements are.

9. Security –  The SDI shall provide robust perimeter defense as well as application level security with a strong focus on a Defense In Depth strategy.

10. Governance –  The SDI enforces strong governance requirements for capabilities ranging from ITSM requirements – workload orchestration, business policy enabled deployment, autosizing of workloads to change management, provisioning, billing, chargeback & application deployments.
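Tenet 5 above mentions microservices communicating over messaging infrastructure such as Apache Kafka. The sketch below, using the kafka-python client, shows the basic produce/consume pattern such services would follow; the broker address, topic name and payload are illustrative.

```python
import json
from kafka import KafkaProducer, KafkaConsumer

BROKER = "kafka.example.com:9092"   # illustrative broker address

# Producer side: a payments microservice publishes an event.
producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("payments.events", {"account": "A1", "amount": 250.0, "channel": "mobile"})
producer.flush()

# Consumer side: a fraud-scoring microservice reads the same topic.
consumer = KafkaConsumer(
    "payments.events",
    bootstrap_servers=BROKER,
    group_id="fraud-scoring",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    print("scoring event:", message.value)
    break
```

Because the two services only share a topic contract, either side can be redeployed, scaled or rolled back independently, which is precisely the operational advantage tenet 5 describes.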

The next & final blog in this series will look at current & specific technology choices – as of 2016 – in building out an SDI.

Data Lakes power the future of Industrial Analytics..(1/4)

The first post in this four part series on Data lakes will focus on the business reasons to create one. The second post will delve deeper into the technology considerations & choices around data ingest & processing in the lake to satisfy myriad business requirements. The third will tackle the critical topic of metadata management, data cleanliness & governance. The fourth & final post in the series will focus on the business justification to build out a Big Data Center of Excellence (COE).

“Business owners at the C level are saying, ‘Hey guys, look. It’s no longer inordinately expensive for us to store all of our data. I want all of you to make copies. OK, your systems are busy. Find the time, get an extract, and dump it in Hadoop.’” – Mike Lang, CEO of Revelytix

The onset of Digital Architectures in enterprise businesses implies the ability to drive continuous online interactions with global consumers/customers/clients or patients. The goal is not just to provide engaging visualization but also to personalize services clients care about across multiple modes of interaction. Mobile applications first began forcing enterprises to support multiple channels of interaction with their consumers. For example, Banking now requires an ability to engage consumers in a seamless experience across an average of four to five channels – Mobile, eBanking, Call Center, Kiosk etc. Healthcare is a close second, where caregivers expect patient, medication & disease data at their fingertips with a few finger swipes on an iPad app.

Big Data has been the chief catalyst in this disruption. The Data Lake architectural & deployment pattern makes it possible to first store all this data & then enables the panoply of Hadoop ecosystem projects & technologies to operate on it to produce business results.

Let us consider a few of the major industry verticals and the sheer data variety that players in these areas commonly possess – 

The Healthcare & Life Sciences industry possess some of the most diverse data across the spectrum ranging from – 

  • Structured Clinical data e.g. Patient ADT information
  • Free hand notes
  • Patient Insurance information
  • Device Telemetry 
  • Medication data
  • Patient Trial Data
  • Medical Images – e.g. CAT Scans, MRIs, CT images etc

The Manufacturing industry players are leveraging the below datasets and many others to derive new insights in a highly process oriented industry-

  • Supply chain data
  • Demand data
  • Pricing data
  • Operational data from the shop floor 
  • Sensor & telemetry data 
  • Sales campaign data

Data In Banking – Corporate IT organizations in the financial industry have for many years been tackling data challenges caused by strict silo based approaches that inhibit data agility.
Consider some of the traditional sources of data in banking –

  • Customer Account data e.g. Names, Demographics, Linked Accounts etc
  • Core Banking Data
  • Transaction Data which captures the low level details of every transaction (e.g debit, credit, transfer, credit card usage etc)
  • Wire & Payment Data
  • Trade & Position Data
  • General Ledger Data e.g AP (accounts payable), AR (accounts receivable), cash management & purchasing information etc.
  • Data from other systems supporting banking reporting functions.

Industries have changed around us since the advent of relational databases & enterprise data warehouses. Relational Databases (RDBMS) & Enterprise Data Warehouses (EDW) were built with very different purposes in mind. RDBMS systems excel at online transaction processing (OLTP) use cases, where massive volumes of structured data need to be processed quickly. EDWs on the other hand perform online analytical processing (OLAP) functions, where data extracts are taken from OLTP systems, then loaded & sliced in different ways to support reporting and analysis. Neither kind of system is suited to handle immense volumes of data, let alone highly variable structures of data.


Let us consider the main reasons why legacy data storage & processing techniques are unsuited to new business realities of today.

  • Legacy data technology enforces a vertical scaling method that is sorely unsuited to handling massive volumes of data in a scale up/scale down manner
  • The structure of the data needs to be modeled in a paradigm called ’schema on write’ which sorely inhibits time to market for new business projects
  • Traditional data systems suffer bottlenecks when large amounts of high variety data are processed using them 
  • Limits in the types of analytics that can be performed. In industries like Retail, Financial Services & Telecommunications, enterprises need to build detailed models of customer accounts to predict their overall service level satisfaction in realtime. These models are predictive in nature and use data science techniques as an integral component. The higher volumes of data, along with the attribute richness that can be provided to them (e.g. transaction data, social network data, transcribed customer call data), ensure that the models are highly accurate & can provide an enormous amount of value to the business. Legacy systems are not a great fit here.

Given all of the above data complexity and the need to adopt agile analytical methods  – what is the first step that enterprises must adopt? 

The answer is the adoption of the Data Lake as an overarching data architecture pattern. Let’s define the term first. A data lake is two things – a data storage repository (small or massive) and a data processing engine. A data lake provides “massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs“.[1] Data Lakes are created to ingest, transform, process, analyze & finally archive large amounts of any kind of data – structured, semistructured and unstructured.


                                  Illustration – The Data Lake Architecture Pattern

What Big Data brings to the equation beyond its strength in data ingest & processing is a unified architecture. For instance, MapReduce is the original framework for writing applications that process large amounts of structured and unstructured data stored in the Hadoop Distributed File System (HDFS). Apache Hadoop YARN opened Hadoop to other data processing engines (e.g. Apache Spark/Storm) that can now run alongside existing MapReduce jobs to process data in many different ways at the same time. The result is that ANY kind of application processing can be run inside a Hadoop runtime – batch, realtime, interactive or streaming.
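As a small illustration of this unified runtime, the sketch below is a PySpark batch job submitted to YARN; it can run on the same cluster, side by side with MapReduce or streaming workloads, with YARN arbitrating the resources. The file paths and column names are hypothetical.

```python
from pyspark.sql import SparkSession

# A minimal batch job; YARN schedules its containers alongside any MapReduce
# or other engines sharing the same cluster.
spark = (SparkSession.builder
         .appName("daily-transaction-rollup")
         .master("yarn")                     # assumes HADOOP_CONF_DIR points at the cluster
         .getOrCreate())

txns = spark.read.csv("hdfs:///data/transactions/2016-05-01.csv",
                      header=True, inferSchema=True)

(txns.groupBy("merchant_id")
     .sum("amount")
     .write.parquet("hdfs:///analytics/rollups/2016-05-01"))

spark.stop()
```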

Visualization – Mobile applications first began forcing enterprises to support multiple channels of interaction with their consumers. For example, Banking now requires an ability to engage consumers in a seamless experience across an average of four to five channels – Mobile, eBanking, Call Center, Kiosk etc. The average enterprise user is also familiar with BYOD in the age of self service. The Digital Mesh only exacerbates this gap in user experiences as information consumers navigate applications while consuming services across a mesh that is both multi-channel and provides Customer 360 across all these engagement points. While information management technology has grown at a blistering pace, the human ability to process and comprehend numerical data has not. Applications being developed in 2016 are beginning to adopt intelligent visualization approaches that are easy to use, highly interactive and enable the user to manipulate corporate & business data with their fingertips – much like an iPad app. Tools such as intelligent dashboards, scorecards and mashups are helping change visualization paradigms that were based on histograms, pie charts and tons of numbers. Big Data improvements in data lineage and quality are greatly helping the visualization space.

The Final Word

Specifically, the Data Lake architectural pattern provides the following benefits – 

The ability to store enormous amounts of data with a high degree of agility & low cost: The Schema On Read architecture makes it trivial to ingest any kind of raw data into Hadoop in a manner that preserves its structure. Business analysts can then explore this data and define a schema to suit the needs of their particular application.
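The following sketch illustrates Schema On Read with Spark: raw JSON events are landed in the lake untouched, and a schema is applied only at read time to serve one particular analysis. The paths and field names are illustrative.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("schema-on-read-demo").getOrCreate()

# The raw JSON events were landed in the lake exactly as produced; this schema
# is defined only now, at read time, for this specific analysis.
schema = StructType([
    StructField("customer_id", StringType()),
    StructField("channel",     StringType()),
    StructField("amount",      DoubleType()),
])

events = spark.read.schema(schema).json("hdfs:///raw/clickstream/")
(events.filter(events.channel == "mobile")
       .groupBy("customer_id")
       .count()
       .show())
```

A different team can read the same raw files tomorrow with an entirely different schema, which is what makes the pattern so agile compared with schema-on-write warehouses.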

The ability to run any kind of Analytics on the data: Hadoop supports multiple access methods (batch, real-time, streaming, in-memory, etc.) to a common data set.  You are only restricted by your use case.

The ability to analyze, process & archive data while dramatically cutting cost: Since Hadoop was designed to work on low-cost commodity servers with direct attached storage, it helps dramatically lower the overall cost of storage. Thus enterprises are able to retain source data for long periods, providing business applications with far greater historical context.

The ability to augment & optimize Data Warehouses: Data lakes & Hadoop technology are not a ‘rip & replace’ proposition. While they provide a much lower cost environment than data warehouses, they can also be used as the compute layer to augment these systems. Data can be stored, extracted and transformed in Hadoop, and then a subset of the data, i.e. the results, loaded into the data warehouse. This frees the EDW’s compute cycles and storage for truly high value analytics.

The next post of the series will dive deeper into the architectural choices one needs to make while creating a high fidelity & business centric enterprise data lake.

References – 

[1] https://en.wikipedia.org/wiki/Data_lake

The six megatrends helping enterprises derive massive value from Big Data..

“The world is one big data problem.” – Andrew McAfee, associate director of the Center for Digital Business at MIT Sloan

Though Data as a topic has been close to my heart, it was often a subject I would not deal much with, given my preoccupation with applications, middleware, cloud computing & DevOps. However, I grabbed the chance to teach a Hadoop course in 2012 and it changed the way I looked at data – not merely as an enabler but as the true oil of business. Fast forward to 2016, and I have almost completed an amazing and enriching year at Hortonworks. It is a good time for retrospection about how Big Data is transforming businesses across the Fortune 500 landscape. Thus, I present what is not merely the ‘Art of the Possible’ but ‘Business Reality’ – distilled insights from a year of working with real world customers: companies pioneering Big Data in commercial applications to drive shareholder value & customer loyalty.

Six Megatrends

  Illustration – The Megatrends helping enterprises derive massive value from Big Data 

Presented below are the six megatrends that will continue to drive Big Data into enterprise business & IT architectures for the foreseeable future.

  1. The Internet of Anything (IoAT) – The rise of the machines has been well documented, but enterprises have just begun waking up to the possibilities in 2016. The paradigm of harnessing IoT data by leveraging Big Data techniques has begun to gain industry wide adoption & cachet. For example, in the manufacturing industry, data is being gathered from a wide variety of sensors that are distributed geographically along factory locations running 24×7. Predictive maintenance strategies that pull together sensor data & prognostics are critical to efficiency & to optimizing the business. In other verticals like healthcare & insurance, massive data volumes are now being reliably generated from diverse sources of telemetry such as patient monitoring devices as well as human manned endpoints at hospitals. In transportation, these devices include cars in the consumer space, trucks & other field vehicles, and geolocation devices. Others include field machinery in oil exploration & server logs across IT infrastructure. The personal consumer space adds fitness devices like FitBit and home & office energy management sensors. All of this constitutes the trend that Gartner terms the Digital Mesh. The Mesh really is built by coupling this machine data with ever growing social media feeds, web clicks, server logs etc. The Digital Mesh leads to an interconnected information deluge which encompasses classical IoT endpoints along with audio, video & social data streams. This leads to huge security challenges and business opportunities for forward looking enterprises (including governments). Applications that leverage Big Data to ingest, connect & combine these disparate feeds into one holistic picture of an entity – whether individual or institution – are clearly beginning to differentiate themselves. IoAT is starting to be a huge part of digital transformation initiatives, with more usecases emerging in 2017 across industry verticals.
  2. The Emergence of Unified Architectures – The onset of Digital Architectures in enterprise businesses implies the ability to drive continuous online interactions with global consumers/customers/clients or patients. The goal is not just to provide engaging visualization but also to personalize services clients care about across multiple modes of interaction. Mobile applications first began forcing enterprises to support multiple channels of interaction with their consumers. For example, Banking now requires an ability to engage consumers in a seamless experience across an average of four to five channels – Mobile, eBanking, Call Center, Kiosk etc. Healthcare is a close second, where caregivers expect patient, medication & disease data at their fingertips with a few finger swipes on an iPad app. What Big Data brings to the equation beyond its strength in data ingest & processing is a unified architecture. For instance, MapReduce is the original framework for writing applications that process large amounts of structured and unstructured data stored in the Hadoop Distributed File System (HDFS). Apache Hadoop YARN opened Hadoop to other data processing engines (e.g. Apache Spark/Storm) that can now run alongside existing MapReduce jobs to process data in many different ways at the same time. The result is that ANY kind of application processing can be run inside a Hadoop runtime – batch, realtime, interactive or streaming.
  3. Consumer 360 – Mobile applications first began forcing enterprises to support multiple channels of interaction with their consumers. For example, Banking now requires an ability to engage consumers in a seamless experience across an average of four to five channels – Mobile, eBanking, Call Center, Kiosk etc. The healthcare industry stores patient data across multiple silos – ADT (Admit Discharge Transfer) systems, medication systems, CRM systems etc. Data Lakes provide an ability to visualize all of a patient’s data in one place, thus improving outcomes. The Digital Mesh (covered above) only exacerbates this semantic gap in user experiences as information consumers navigate applications while consuming services across the mesh – a mesh that is both multi-channel and needs a 360 degree customer view across all these engagement points. Applications developed in 2016 and beyond must take a 360 degree based approach to ensuring a continuous client experience across the spectrum of endpoints and the platforms that span them from a Data Visualization standpoint. Every serious business needs to provide a unified view of a customer across tens of product lines and geographies.
  4. Machine Learning, Data Science & Predictive Analytics – Most business problems are data challenges, and an approach centered around data analysis helps extract meaningful insights from data, thus helping the business. It is now common for enterprises to possess the capability to acquire, store and process large volumes of data using a low cost approach leveraging Big Data and Cloud Computing. At the same time, the rapid maturation of scalable processing techniques allows us to extract richer insights from data. What we commonly refer to as Machine Learning – a combination of econometrics, statistics, visualization, and computer science – extracts valuable business insights hiding in data and builds operational systems to deliver that value. Data Science has also evolved a new branch called Deep Neural Nets (DNNs). DNNs are what make it possible for smart machines and agents to learn from data flows and to make the products that use them even more automated & powerful. Deep Machine Learning involves the art of discovering data insights in a human-like pattern. The web scale world (led by Google and Facebook) has been vocal about its use of advanced Data Science techniques and the move of Data Science into advanced Machine Learning. Data Science is an umbrella concept that refers to the process of extracting business patterns from large volumes of structured, semi structured and unstructured data, and it is emerging as the key ingredient in enabling a predictive approach to the business. Data Science & its applications across a range of industries are covered in the blogpost http://www.vamsitalkstech.com/?p=1846 (a minimal illustration of the predictive approach follows this list).
  5. Visualization – Mobile applications first began forcing enterprises to support multiple channels of interaction with their consumers. For example, Banking now requires an ability to engage consumers in a seamless experience across an average of four to five channels – Mobile, eBanking, Call Center, Kiosk etc. The average enterprise user is also familiar with BYOD in the age of self service. The Digital Mesh only exacerbates this gap in user experiences as information consumers navigate applications while consuming services across a mesh that is both multi-channel and provides Customer 360 across all these engagement points. While information management technology has grown at a blistering pace, the human ability to process and comprehend numerical data has not. Applications being developed in 2016 are beginning to adopt intelligent visualization approaches that are easy to use, highly interactive and enable the user to manipulate corporate & business data with their fingertips – much like an iPad app. Tools such as intelligent dashboards, scorecards and mashups are helping change visualization paradigms that were based on histograms, pie charts and tons of numbers. Big Data improvements in data lineage and quality are greatly helping the visualization space.
  6. DevOps – Big Data powered by Hadoop has now evolved into a true application architecture ecosystem, as mentioned above. The 30+ components included in an enterprise grade platform like the Hortonworks Data Platform (HDP) include APIs (Application Programming Interfaces) to satisfy every kind of data need that an application could have – streaming, realtime, interactive or batch. Couple that with improvements in predictive analytics. In 2016, enterprise developers leveraging Big Data have been building scalable applications with data as a first class citizen. Organizations using DevOps are already reaping the rewards as they are able to streamline, improve and create business processes to reflect customer demand and positively affect customer satisfaction. Examples abound in the webscale world (Netflix, Pinterest, and Etsy), but we now have existing Fortune 1000 companies in verticals like financial services, healthcare, retail and manufacturing who are benefiting from Big Data & DevOps. Thus, 2016 will be the year when Big Data techniques are no longer the preserve of classical Information Management teams but move to the umbrella application development area which encompasses the DevOps and Continuous Integration & Delivery (CI-CD) spheres.
    One of DevOps’ chief goals is to close the long-standing gap between the engineers who develop and test IT capability and the organizations that are responsible for deploying and maintaining IT operations. Using traditional app dev methodologies, it can take months to design, test and deploy software. No business today has that much time—especially in the age of IT consumerization and end users accustomed to smart phone apps that are updated daily. The focus now is on rapidly developing business applications to stay ahead of competitors that can better harness Big Data business capabilities. The microservices architecture approach advocated by DevOps – autonomous, cooperative yet loosely coupled applications built as a conglomeration of business focused services – is a natural fit for the Digital Mesh. The most important addition to microservices based architectures in 2016 is Analytics Everywhere.
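As a minimal illustration of the predictive approach described in megatrend 4, the sketch below trains a churn-style classifier with scikit-learn on synthetic data. In a real deployment the features would be engineered from the transaction, social and interaction data held in the data lake; the feature names and label construction here are entirely made up.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for customer attributes (e.g. tenure, balance,
# call-center contacts, mobile logins) and a churn label.
rng = np.random.RandomState(42)
X = rng.normal(size=(5000, 4))
y = (X[:, 0] - 0.5 * X[:, 2] + rng.normal(scale=0.5, size=5000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = GradientBoostingClassifier().fit(X_train, y_train)

scores = model.predict_proba(X_test)[:, 1]
print("Holdout AUC:", round(roc_auc_score(y_test, scores), 3))
```

The point is less the specific algorithm than the workflow: features assembled from the lake, a model scored on held-out data, and the resulting probabilities fed back into the operational systems that act on them.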

The Final Word – 

We have all heard about the growth of data volumes & variety. 2016 is perhaps the first year where forward looking business & technology executives have begun capturing commercial value from the data deluge by balancing analytics with creative user experience. 

Thus, modern data applications are making Big Data ubiquitous. Rather than existing as back-shelf tools for the monthly ETL run or for reporting, these modern applications can help industry firms incorporate data into every decision they make.  Applications in 2016 and beyond are beginning  to recognize that Analytics are pervasive, relentless, realtime and thus embedded into our daily lives.  

An Enterprise Wide Framework for Digital Cybersecurity..(4/4)

The first two posts in this series on Cybersecurity focused on the strategic issues around information security and the IT response from the datacenter. The third post then discussed exciting new innovations being ushered in by Big Data techniques and players in the open source space. This fourth & final post in the series will focus on the steps that corporate boards, executive & IT leadership need to take from a governance & strategy standpoint to protect & insulate their businesses from the constant firehose of cyber attacks.

Cybersecurity – A Board level concern – 

Enterprise business is built around data assets, and data is the critical prong of any digital initiative. For instance, Digital Banking platforms & Retail applications are evolving into collections of data based ecosystems. These need to natively support loose federations of partner and regulatory applications which are API based & cloud native. These applications are largely based on microservice architectures & need to support mobile clients from the get go. Owing to their very nature – they support massive numbers of users – and to their business priority, they tend to take a higher priority in the overall security equation.

The world of business is now driven by complex software & information technology.  IT is now enterprise destiny. Given all of this complexity across global operating zones, perhaps no other business issue has the potential to result in massive customer drain, revenue losses, reputation risks & lawsuits from affected parties as do breaches in Cybersecurity. A major breach in security is a quick game-changer and has the potential to put an organization in defensive mode for years.

Thus, Corporate Boards, which have long been insulated from technology decisions, now want to understand from their officers how they’re overseeing and mitigating cyber security risk. Putting into place an exemplary program that can govern across a vast & quickly evolving cybersecurity threat landscape is a vital board level responsibility. The other important point is that the interconnected nature of these business ecosystems implies the need for external collaboration as well as a dedicated executive to serve as a cyber czar.

Enter the formal role of the CISO (Chief Information Security Officer)….

The CISO typically heads an independent technology and business function with a dedicated budget & resources. Her or his mandate extends from physical security (equipment lockdown, fob based access control etc.) to setting architectural security standards for business applications as well as reviewing business processes. One of the CISO’s main goals is to standardize the internal taxonomy of cyber risk and to provide a framework for quantifying these risks across a global organization.

A new approach to cybersecurity as a business issue is thus called for. Enterprises have put in place formal programs for cybersecurity with a designated CISO (Chief Information Security Officer). The CISO has a team reporting to her which ensures that detailed threat assessments are created, as well as dedicated resources embedded both in the lines of business and in central architecture & operations, to maintain smooth business continuity in the event of security-breach-led disruptions.

Cybersecurity – An Enterprise Wide Process – 

With all of that in mind, let us take a look at the components of an enterprise wide cybersecurity program in critical industries like financial services and insurance. I will follow each of the steps with detailed examples from a real world business standpoint. A key theme throughout is to ensure that the cybersecurity program does not itself become burdensome to business operation & innovation; doing so would defeat the purpose of having such a program.

The program is depicted in the below illustration.


                                             Illustration – Enterprise Cybersecurity Process

The first step is almost always an Assessment process, which itself has two sub components – business threat assessment & information threat assessment. The goal here should be to comprehensively understand the organization’s business ecosystem by taking into account every actor – internal or external – that interfaces with the business. For an insurance company, this includes customers, prospects, partner organizations (banks, reinsurance firms) and internal actors (e.g. underwriters, actuaries etc.).

For a Bank, this includes fraud & cyber risks around retail customer ACH accounts, customer wires,  commercial customer accounts along with the linked entities they do business with, millions of endpoint devices like ATMs & POS terminals, a wide payments ecosystem etc etc. Understanding the likely business threats across each role & defining appropriate operational metrics across those areas is a key part of this stage. At the same time, the range of information used across the organization starting with customer data, payment systems data, employee data should be catalogued and classified based on their threat levels from Critical to Restricted to Internal Use to Benign et al. These classifications must be communicated over to the lines of business as well as IT & development organizations. It is critical for operations & development teams to understand this criticality from the perspective of incorporating secure & efficient development methodologies into their current IT Architecture & development practices.

The next step in the process is to Plan & Benchmark the current state of security against industry standard organizations to better understand where internal cyber gaps may lie across the entire range of business systems. This step also takes into account the Digital innovation roadmap in the organization and does not treat areas like Mobility, Cloud Computing, DevOps and Big Data as being distinct from a security theme standpoint. This is key to ensuring that effective controls can be applied in a forward looking manner. For instance, understanding where gaps lie against Sarbanes Oxley, PCI DSS or HIPAA regulations ensures that appropriate steps are taken to bring these different systems up to par from an industry standpoint. Across this process, appropriate risk mitigations need to be understood for systems across the board. This ranges from desktop systems and mobile devices to systems which hold & process client data.

The third step is the Execution step. This has multiple subcomponents, including Systems & IT Refresh and the Governance process.

The Systems & IT Refresh step deals with instituting specific security technologies, IT Architectures, Big Data standards etc into line of business & central IT systems with the view of remediating or improving gaps observed across step 1. The list of systems is too exhaustive to cover here but at a minimum it includes all the security systems covered here in the first blog in this series @ http://www.vamsitalkstech.com/?p=1265

The Execution step will also vary based on the industry vertical you operate in. Let me explain this with an example.

For instance, in Banking, in addition to general network level security, I would categorize business level security into four specific buckets –   general fraud, credit card fraud, AML compliance and cyber security.

  • Firstly, the current best practice in the banking industry is to encourage a certain amount of convergence in the back end data infrastructure across all of the fraud types – literally in the tens. Forward looking institutions are building cybersecurity data lakes to aggregate & consolidate all digital banking information, wire data, payment data, credit card swipes and other telemetry data (ATM & POS) in one place to do security analytics. This approach can pay off in a big way.
  • Across all of these different fraud types, the common thread is that the fraud is increasingly digital (or internet based) and the fraudster rings are becoming more sophisticated every day. To detect these infinitesimally small patterns, an analytic approach beyond the existing rules based approach is key – one that can understand, for instance, location based patterns in terms of where transactions took place, social graph based patterns, and patterns which commingle realtime & historical data to derive insights (a toy sketch follows below).
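As a toy illustration of the location-based patterns mentioned above, the pandas sketch below flags accounts whose consecutive card transactions jump between cities within an implausibly short window. Real implementations would run at scale over the consolidated data lake and combine many more signals; the accounts, timestamps and cities here are invented.

```python
import pandas as pd

# Toy transaction feed; in practice this would be the consolidated card, ATM,
# POS and wire stream held in the cybersecurity data lake.
txns = pd.DataFrame({
    "account":   ["A1", "A1", "A1", "B2"],
    "timestamp": pd.to_datetime(["2016-05-01 09:00", "2016-05-01 09:30",
                                 "2016-05-01 09:45", "2016-05-01 10:00"]),
    "city":      ["New York", "New York", "Singapore", "Chicago"],
})

# Flag accounts whose consecutive transactions jump cities within one hour,
# a crude stand-in for the location and graph based patterns described above.
txns = txns.sort_values(["account", "timestamp"])
txns["prev_city"] = txns.groupby("account")["city"].shift()
txns["gap"] = txns.groupby("account")["timestamp"].diff()

suspicious = txns[
    txns["prev_city"].notna()
    & (txns["city"] != txns["prev_city"])
    & (txns["gap"] < pd.Timedelta(hours=1))
]
print(suspicious[["account", "timestamp", "prev_city", "city"]])
```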

               

Finally, the Governance process.

Over a certain period of time, it is a given that every organization will be breached. The executive team has to set in place a governance strategy that recognizes the overall limitations of a defensive posture and seeks to move the organization to an active defense approach. The goals of this process are to advise the board not only on how to manage cyber risk from a business mitigation perspective but also to set up a steering committee to manage customer, legal & media outreach. The executive team itself needs to be trained in cybersecurity issues, and this should be led by the CISO. Attention has to be paid to ensuring that the CISO’s team is not only staffed with risk, compliance & fraud detection personnel but also with those who have expertise and contacts in the specific lines of business that the organization operates across. To that end, the CISO’s team has to be funded at the highest levels of the organization. Investment in human activities like training classes, certifications & regular cybersecurity drills will also ensure a high level of preparedness across the organization. Explicit incident response plans need to be created across different business areas. Based on the specific vulnerability & concomitant business risk, the CISO will need to decide if each of the specific risks can be shared with multiple external actors – vendors, suppliers & other partners. If not, it would make a lot of sense to look at cyber risk insurance, an emerging business area, in those specific situations. More on cyber risk in a followup post. To reiterate one of the points I made above, a strong cybersecurity process does not inhibit business agility.

What are the questions business execs and boards should ask of their IT:

A few key questions that business management should ask of themselves from a cybersecurity standpoint.
  • How are we doing on Cybersecurity from a competitive & business level standpoint? Further, are we answering this question using a business metric driven approach that assigns scores to the program in various categories? For instance – number of breaches, malware incidents, and the pace & effectiveness of response. Are these goals S.M.A.R.T.?
  • Are all systems under regulation protected using appropriate controls?
  • Are we able to hire the best and brightest security personnel and engage them within lines of business?
  • Are we investing in the best IT solutions that leverage Big Data & Cloud Computing that have been proven to be more secure than older fragmented architectures? Can my IT leadership vocalize our roadmap goals across these areas?
  • Are my line of business leaders engaged in cybersecurity from the perspective of their business areas?
  • Is our business ecosystem protected? What are my partners doing to protect sensitive consumer & business data?
  • Are we all sharing appropriate information constantly with industry consortia around threat intelligence & the authorities i.e law enforcement and the federal government agencies?

Conclusion:

My goal in this post is to bring forth the high level dimensions of a cybersecurity plan at the board level while not being overly prescriptive in terms of specific industry & business actions. Based on my years of working in sensitive industries like Financial Services & Insurance, Healthcare and Telco, I can confidently say that if the broad contours of the above strategy are adopted, you are on your way to becoming an organization with a strong foundation for Cybersecurity management. In this Digital Age, that can be a huge competitive differentiator.

Global Banking faces its Uber Moment..

The neighborhood bank branch is on the way out, slowly being phased out as the primary mode of customer interaction for banks. Banks across the globe have increased their technology investments in strategic areas such as Analytics, Data & Mobile. The Bank of the future increasingly resembles a technology company.

“I have no doubt that the financial industry will face a series of Uber moments,” – Antony Jenkins (then CEO) of Barclays Bank, 2015

The Washington Post proclaimed in an article [1] this week that the bank branch on the corner of Main Street may not be there much longer.

Technology is transforming Banking, leading to dramatic changes in the landscape of customer interactions. We live in the age of the Digital Consumer – banking in the age of the hyper-connected consumer. As millennials join the labor force, they expect to be able to bank from anywhere, be it on a mobile device or via internet banking from their personal computer.

As former Barclays CEO Antony Jenkins described it in a speech given last fall, the global banking industry, which is under severe pressure from customer demands for increased automation and contextual services, will slash employment and branches by 20 percent to 50 percent over the next decade.[2]

“I have no doubt that the financial industry will face a series of Uber moments,” he said in the late-November speech in London, referring to the way that Uber and other ride-hailing companies have rapidly unsettled the taxi industry.[2]

Banking must trend Digital to respond to changing client needs – 

The Financial Services and the Insurance industry are facing an unprecedented amount of change driven by factors like changing client preferences and the emergence of new technology—the Internet, mobility, social media, etc. These changes are immensely profound, especially with the arrival of “FinTech”—technology-driven applications that are upending long-standing business models across all sectors from retail banking to wealth management & capital markets. Further, members of a major new segment, Millennials, increasingly use mobile devices, demand more contextual services and expect a seamless unified banking experience—something akin to what they  experience on web properties like Facebook, Amazon, Uber, Google or Yahoo, etc.

The definition of Digital is somewhat nebulous, so I would like to define the key areas where its impact and capabilities will need to be felt for this gradual transformation to occur.

A true Digital Bank needs to –

  • Offer a seamless customer experience much like the one provided by the likes of Facebook & Amazon, i.e. highly interactive & intelligent applications that can detect a single customer’s journey across multiple channels
  • Offer data driven interactive services and products that can detect customer preferences on the fly, match them with existing history and provide value added services – services that not only provide a better experience but also foster a longer term customer relationship
  • Be able to help the business prototype, test, refine and rapidly develop new business capabilities
  • Above all, treat Digital as a Constant Capability and not as an ‘off the shelf’ product or a one-off way of doing things

Though some of the above facts & figures may seem startling, it’s how individual banks put both data and technology to work across their internal value chain that will define their standing in the rapidly growing data economy.

Enter the FinTechs

FinTechs (or new Age financial industry startups) offer enhanced customer experiences built on product innovation and agile business models. They do so by expanding their wallet share of client revenues by offering contextual products tailored to individual client profiles. Their savvy use of segmentation data and predictive analytics enables the delivery of bundles of tailored products across multiple delivery channels (web, mobile, Point Of Sale, Internet, etc.). Like banks, these technologies support multiple modes of payments at scale, but they aren’t bound by the same regulatory and compliance regulations as are banks, who operate under a mandate that they must demonstrate that they understand their risk profiles. Compliance is an even stronger requirement for banks in areas around KYC (Know Your Customer) and AML (Anti Money Laundering) where there is a need to profile customers—both individual & corporate—to decipher if any of their transaction patterns indicate money laundering, etc.

Banking produces the most data of any industry—rich troves of data that pertain to customer transactions, payments, wire transfers and demographic information. However, it is not enough for financial services IT departments to simply possess the data. They must be able to drive change through legacy thinking and infrastructure as the industry changes—both from a data product as well as from a risk & compliance standpoint.

The business areas shown in the below illustration are a mix of legacy capabilities (Risk, Fraud and Compliance) and new value-added areas (Mobile Banking, Payments, Omni-channel Wealth Management etc).


   Illustration – Predictive Analytics and Big Data are upending business models in Banking across multiple vectors of disruption 

Business Challenges facing banks today

Banks and other players across the financial spectrum face challenges across three distinct areas. First and foremost, they need to play defense against a myriad of regulatory and compliance requirements in defensive areas of the business such as risk data aggregation and measurement, financial compliance and fraud detection. On the other hand, there is a distinct need to vastly improve customer satisfaction and stickiness by implementing predictive analytics capabilities and generating better insights across the customer journey, thus driving a truly immersive digital experience. Finally, banks need to leverage their mountains of data assets to create new business models and go-to-market strategies. They need to do this by monetizing multiple data sources—both data-in-motion and data-at-rest—for actionable intelligence.

Data is the single most important driver of bank transformation, impacting financial product selection, promotion targeting, next best action and ultimately, the entire consumer experience. Today, the volume of this data is growing exponentially as consumers increasingly share opinions and interact with an array of smart phones, connected devices, sensors and beacons emitting signals during their customer journey.

Data Challenges – 

Business and technology leaders are struggling to keep pace with a massive glut of data from digitization, the internet of things, machine learning, and cybersecurity for starters. A data lake—which combines data assets, technology and analytics to create enterprise value at a massive scale—can help businesses gain control over their data.

Fortunately, Big Data driven predictive analytics is here to help. The Hadoop platform and its ecosystem of technologies have matured considerably and have evolved to support business-critical banking applications. The emergence of cloud platforms is helping in this regard.

Positively impacting the banking experience requires data

Whether at the retail bank or at corporate headquarters, there are a number of ways to leverage technology in order to enable a successful consumer experience across all banking sectors:

Retail & Consumer Banking

Banks need to move to a predominantly online model, providing consumers with highly interactive, engaging and contextual experiences that span multiple channels—branch banking, eBanking, POS, ATM, etc. Further goals are increased profitability per customer for both micro and macro customer populations with the ultimate goal of increasing customer lifetime value (CLV).

Capital Markets

Capital markets firms must create new business models and offer superior client relationships based on their data assets. Those that leverage and monetize their data assets will enjoy superior returns and raise the bar for the rest of the industry. It is critical for capital market firms to better understand their clients (be they institutional or otherwise) from a 360-degree perspective so they can be marketed to as a single entity across different channels—a key to optimizing profits with cross selling in an increasingly competitive landscape.

Wealth Managers

The wealth management segment (e.g., private banking, tax planning, estate planning for high net worth individuals) is a potential high growth business for any financial institution. It is the highest touch segment of banking, fostered on long-term and extremely lucrative advisory relationships. It is also the segment most ripe for disruption due to a clear shift in client preferences and expectations for their financial future. Actionable intelligence gathered from real-time transactions and historical data becomes a critical component for product tailoring, personalization and satisfaction.

Corporate Banking

The ability to market complex financial products across a global corporate banking client base is critical to generating profits in this segment. It’s also important to engage in risk-based portfolio optimization to predict which clients are at risk for adverse events like defaults. In addition to being able to track revenue per client and better understand the entities they bank with, it is also critical that corporate banks track AML compliance.

The future of data for Financial Services

Understand the Customer Journey

Across retail banking, wealth management and capital markets, a unified view of the customer journey is at the heart of the bank’s ability to promote the right financial product, recommend a properly aligned portfolio of products, keep up with evolving preferences as the customer relationship matures and accurately predict future revenue from a customer. But currently most retail, investment and corporate banks lack a comprehensive single view of their customers. Due to operational silos, each department has a limited view of the customer across multiple channels. These views are typically inconsistent, vary quite a bit and result in limited internal collaboration when servicing customer needs. Leveraging the ingestion and predictive capabilities of a Big Data platform, banks can provide a user experience that rivals Facebook, Twitter or Google and provide a full picture of the customer across all touch points.

Create Modern data applications

Banks, wealth managers, stock exchanges and investment banks are companies run on data—data on deposits, payments, balances, investments, interactions and third-party data quantifying risk of theft or fraud. Modern data applications for banking data scientists may be built internally or purchased “off the shelf” from third parties. These new applications are powerful and fast enough to detect previously invisible patterns in massive volumes of real-time data. They also enable banks to proactively identify risks with models based on petabytes of historical data. These data science apps comb through the “haystacks” of data to identify subtle “needles” of fraud or risk not easy to find with manual inspection.
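
To make the “needles in haystacks” idea concrete, here is a minimal, purely illustrative sketch of an unsupervised anomaly detector scoring card transactions. The feature set (amount, hour of day, distance from home) and the synthetic data are assumptions made for the example, not any bank’s actual model.

```python
# Hedged sketch only: an unsupervised "needle in the haystack" detector that
# scores transactions for anomalies. Features and data are synthetic placeholders.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(7)
# Toy features per transaction: [amount, hour_of_day, distance_from_home_km]
normal_history = rng.normal(loc=[50, 14, 5], scale=[20, 4, 3], size=(1000, 3))

model = IsolationForest(contamination=0.01, random_state=7).fit(normal_history)

suspicious = [[4800, 3, 900]]        # large amount, 3 AM, far from home
print(model.predict(suspicious))     # -1 flags an anomaly, 1 means normal
```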

These modern data applications make Big Data and data science ubiquitous. Rather than back-shelf tools for the occasional suspicious transaction or period of market volatility, these applications can help financial firms incorporate data into every decision they make. They can automate data mining and predictive modeling for daily use, weaving advanced statistical analysis, machine learning, and artificial intelligence into the bank’s day-to-day operations.

Conclusion – Banks need to drive Product Creation using the Latest Technology –  

A strategic approach to industrializing analytics in a Banking organization can add massive value and competitive differentiation in five distinct categories –

  1. Exponentially improve existing business processes, e.g. risk data aggregation and measurement, financial compliance, fraud detection
  2. Help create new business models and go-to-market strategies by monetizing multiple data sources – both internal and external
  3. Vastly improve customer satisfaction by generating better insights across the customer journey
  4. Increase security while expanding access to relevant data throughout the enterprise to knowledge workers
  5. Help drive end-to-end digitization

If you really think about it, all that banks do is manipulate and deal in data. If that is not primed for an Uber-type revolution, I do not know what is.

References
[1] https://www.washingtonpost.com/news/wonk/wp/2016/04/19/say-goodbye-to-your-neighborhood-bank-branch/

[2] http://www.theguardian.com/business/2015/nov/25/banking-facing-uber-moment-says-former-barclays-boss

Cybersecurity – the Killer App for Big Data..(3/4)

“Most people are starting to realize that there are only two different types of companies in the world: those that have been breached and know it and those that have been breached and don’t know it. Therefore, prevention is not sufficient and you’re going to have to invest in detection because you’re going to want to know what system has been breached as fast as humanly possible so that you can contain and remediate.” – Kevin Mitnick, “The World’s Most Famous Hacker”

The first two posts in this series on Cybersecurity have focused on the strategic issues around information security and the IT response from the datacenter. This third post will focus on the exciting new innovations being ushered in by Big Data techniques and players in the open source space. The final post of the series will focus on the business steps that Corporate boards, Executive & IT leadership need to adopt from a governance & strategy standpoint to protect & insulate their businesses from the constant firehose of cyber attacks.

The Corporate State of Mind with Cybersecurity– 

Beyond the sheer complexity of massively coordinated attacks, the Internet is driving a need for financial institutions & merchants to provide new channels of customer interactions. Companies are constantly collecting large amounts of consumer data across multiple touch points to paint a single picture of a customer’s journey. They need to do this to help guide constant customer interactions with enriched business context. While this has beneficial effects in terms of new markets across the globe opening up to businesses in these verticals – there is a concomitant increase in vulnerability both in terms of speed of these attacks as well as the resources needed to mount them. The hackers only need to be able to penetrate defenses once to be able to compromise both customer data as well as intellectual property.

Organizations under threat of cyber attacks broadly engage in a defensive approach to cybersecurity. What do I mean by that? They largely invest in a range of technology-oriented solutions across functional areas including intrusion detection systems (IDS), firewalls, data protection products, Identity & Access Management (IAAM) solutions etc – a range of which were covered in the first blog in this series (http://www.vamsitalkstech.com/?p=1265). While all of these investments are essential and have tremendous value to offer in their respective security silos, it bears noting that hacker rings and other cyber threats are constantly evolving – both from a technology as well as a fraud pattern sophistication standpoint. Thanks to Cloud Computing & easy access to a tremendous amount of compute & storage, the technological sophistication of these bad actors is only growing. They are also increasingly well funded, in some cases by rogue governments across the globe. In addition, the cyberattacker of 2016 also leverages the Dark Web for tools ranging from the latest in malware to network intrusion kits – tools that can bypass the strongest corporate firewall.

In addition, whole new kinds of cyber attacks are emerging in industry verticals like financial services. For instance, as Banks continue to innovate to meet consumer demand in areas ranging from ATMs to modern point of sale (PoS) terminals to Internet Banking, they face newer and more sophisticated threats. These include Distributed Denial of Service (DDoS) attacks, Corporate Account Take Over (CATO) attacks, ATM cash outs etc as discussed in the first blog in this series. The common theme to these attacks is the exponentially growing amount of network traffic that must now be handled across the billions of business records being produced by a range of actors across the industry – consumers, IoT enabled devices, telemetry devices like ATMs, POS terminals etc. The data deluge across industries is only too well known thanks to the media. Digitization of consumer interactions, mobile technology & the Internet of Things (IoT) are all driving consumer demands for enterprise applications to be highly responsive without a loss of privacy or security of sensitive data.

Enter the SOC 

To provide for an integrated approach across the above security platforms & toolsets, enterprises have begun investing in SOC (Security Operations Center) platforms. The SOC is a formalized capability designed to handle any and all security incidents across millions of endpoints. The goal is to provide corporate-wide data collection, data aggregation, threat detection, advanced analytics and workflow capabilities – all from a single area of management. SOC systems thus perform a highly essential function as they deal with massive data streams constantly being generated by many different systems, devices & business applications. These range from intrusion detection systems, firewalls, antivirus tools etc as discussed above. All of this data is then pulled into security incident and event management (SIEM) tools, which filter, aggregate, correlate and then provide reporting functions from a security alert standpoint. The typical workflow is to capture the signature behavior of endpoint systems & applications in static, rules-based models that reflect typical application behavior and then flag any out-of-band behavior; a minimal sketch of this kind of rules-based triage follows below. A security analyst then determines if an alert represents a specific threat or if it is just harmless noise. For example – a credit card usage event from a known bad IP address, or erroneous application behavior that could signify a malware compromise, are the kind of things SOC systems are tailored to detect.
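
As a hedged illustration of that workflow, the snippet below shows the kind of static flagging a SIEM performs. The event fields, the watchlist and the threshold are hypothetical placeholders, not any vendor’s schema.

```python
# Minimal, illustrative sketch of rules-based SIEM triage. The watchlist,
# threshold and event fields are hypothetical, not a real product's schema.
KNOWN_BAD_IPS = {"203.0.113.7", "198.51.100.23"}   # example watchlist entries
MAX_FAILED_LOGINS = 5                              # example static threshold

def triage(event: dict) -> list:
    """Return alert reasons for a single security event."""
    alerts = []
    if event.get("src_ip") in KNOWN_BAD_IPS:
        alerts.append("traffic from known bad IP")
    if event.get("failed_logins", 0) > MAX_FAILED_LOGINS:
        alerts.append("possible brute-force attempt")
    return alerts

# Example: a card authorization attempt coming from a watchlisted address
print(triage({"src_ip": "203.0.113.7", "failed_logins": 1}))
```

The weakness is visible in the sketch itself: anything not already captured in a rule or watchlist sails through, which is exactly the limitation the following sections address.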

SOC systems have proved to be highly effective across a range of use cases but more importantly at offering a unified place to aggregate security related data and to perform analytics on them. The effectiveness of this compared to older approaches cannot be overstated.

The Malicious Insider Threat 

One of the biggest limitations of the classical signature-based approach to detecting cyber threats is that it cannot tackle the growing threat from insiders. As we have seen from the news headlines, more often than not, insiders cause a variety of data breaches to occur. These actions range from pure neglect or error (e.g. not patching sensitive systems or virus definitions, clicking on phishing emails etc) to malicious actions driven by a range of motivations, from data theft to a desire to hurt the organization over some grievance. Thus, CISOs (Chief Information Security Officers) must adopt an active approach to mitigating such insider threats, just as they must do for many external threats. Signature-based SOC systems are particularly unsuited to detecting insider threats, and CISOs are being forced to adopt data-oriented tools and techniques to glean patterns in how insiders use IT systems and to understand whether any of that activity is harmful.

The other limitations of the SOC approach also need to be catalogued-

  • A high rate of false positives, some of which may nonetheless signify an actual compromise
  • The amount of time taken by the SOC analyst in the process of triage
  • The need to look for existing bad behavior signature patterns which doesn’t protect against new (or zero day) exploits
  • The lack of an ability to resolve the threat to business applications from partners
  • Lack of learning capabilities as the attack patterns and threats themselves evolve constantly

In the face of such challenges, there is a need to rethink the security architecture of the future. I propose this can be achieved in four strategic ways from a technical perspective.

  1. Leverage real time analytics as the foundation of any security strategy. This is only possible by adopting data analytics that provide real time analysis at extremely low latencies, with the ability to constantly ingest and analyze data from network devices, malware sources, and identity and authentication systems, and the ability to leverage machine learning and data science to do threat classification – as opposed to strictly rules-based approaches – when analyzing relationships between data (a minimal classification sketch follows this list)
  2. Natively integrate these analytics into business applications such that they promote a way of automatically learning threat patterns
  3. Promote ways to enable business processes to learn from these incidents
  4. Promote an open source ecosystem so that every enterprise that adopts these platforms can automatically learn & enhance their analytics as a way of joining forces against the cyberattacker communities
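
As a contrast to static rules, the sketch below shows what strategy #1 might look like with a learned classifier. It is a toy example on synthetic features, assuming scikit-learn is available; a real deployment would train on labeled telemetry drawn from the data lake.

```python
# Hedged sketch of strategy #1: a learned threat classifier instead of static rules.
# Features and labels are synthetic stand-ins, not a real threat-intelligence feed.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
# Toy features per event: [bytes_out, distinct_ports, failed_logins, off_hours_flag]
X = rng.random((500, 4))
y = (X[:, 1] + X[:, 2] > 1.2).astype(int)     # stand-in "malicious" label

model = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)

new_event = [[0.9, 0.8, 0.7, 1.0]]
print("threat probability:", model.predict_proba(new_event)[0][1])
```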

Enter Big Data –

So what can Big Data and the Hadoop ecosystem bring to this complex world of security analytics as applied to the above strategies? The answer is “All of the above and much more.” By leveraging a Big Data approach to complement existing investments, cyber defense can move into attack mode as well.

As depicted in the below illustration, Big Data provides a data platform that can ingest massive amounts of internal & external data. On this foundation, machine learning, text mining & ontology modeling provide advanced cyber detection, prediction and prevention. According to IDC, the big data and analytics market will reach $125 billion worldwide in 2015 [3]. It is clearly evident that an increased number of cyber security platforms will leverage big data storage and analytics going forward. Various cybersecurity solutions – network security, malware detection and endpoint security – are beginning to feed data into Big Data analytic platforms.


                           Illustration – Big Data Analytics (Adapted & Redrawn from IBM)

Big Data can provide Cybersecurity capabilities in four key areas –

  1. The ability to ingest application data: As players in key verticals expand the definition of Cybersecurity to encompass the insider threat – call data records (CDR), chat messages, business process data, social media activity & emails etc are all rich sources of threat detection which must be ingested as well as processed for consumption by SOC consoles.
  2. The ability to capture, store & process high volumes of any kind of security & security telemetry data at scale: Security data (e.g. threat intelligence, geolocation, watchlist data, clickstreams etc) is constantly produced in every enterprise, and all of it can be pushed to a Hadoop HDFS backed data lake (a minimal ingestion sketch follows this list).
  3. The ability to perform universal processing of the data (transformation, enrichment, forensic analysis): Such processing combines, but is not limited to, business rules, machine learning and text mining to provide a way to model security threats as well as detection & deterrence processing.
  4. Long term information storage: In verticals like financial services, information security is expanding to include not just the classic security data but also AML (Anti Money Laundering) & Credit Card Fraud data that are both highly application driven.
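
The sketch below illustrates capability #2: landing raw security telemetry in an HDFS-backed data lake with Spark. The paths, field names and file formats are assumptions made for the example, not a prescribed layout.

```python
# Illustrative only: landing raw security telemetry in an HDFS-backed data lake.
# Paths, field names and formats are assumptions, not a prescribed layout.
from pyspark.sql import SparkSession
from pyspark.sql.functions import to_date, col

spark = SparkSession.builder.appName("security-telemetry-ingest").getOrCreate()

# Raw firewall/proxy logs previously collected (e.g. via Flume or NiFi)
raw = spark.read.json("hdfs:///security/raw/firewall/2016/04/")

# Light normalization, then append to the long-term store partitioned by day
(raw.withColumn("event_date", to_date(col("event_time")))
    .write.mode("append")
    .partitionBy("event_date")
    .parquet("hdfs:///security/lake/firewall/"))
```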

With all of the above in mind, I would like to introduce the leading open source cybersecurity project built on Hadoop technology – Apache Metron.

Apache Metron:

Apache Metron was originally invented by James Sirota at Cisco Systems [4]. Sirota is now Chief Data Scientist at Hortonworks, and his team has been driving increased capabilities into Metron from both a feature as well as a community collaboration standpoint. Metron has been open sourced and has just attained top level project status within the Apache Software Foundation. Expect to see increased maturity, feature richness and stability around the project as the vibrant open source community increasingly leverages Metron across multiple cybersecurity initiatives.

At a minimum, when combined with a data lake, Metron integrates a variety of open source big data technologies (e.g. Apache Spark, Storm, Flume, HDFS etc) in order to offer a centralized tool for security monitoring and analysis. It provides capabilities for log aggregation, full packet capture indexing, storage, advanced behavioral analytics and data enrichment, while applying the most current threat-intelligence information to security telemetry within a single platform, as depicted in the below illustration.


                           Illustration: Apache Metron – Key Capabilities (source – Hortonworks)


While a deep dive into Metron is a topic for a follow-up post, as the diagram above indicates, the Metron framework provides four key capabilities [3]:

    1. Security Data Lake / Vault – It provides cost effective way to store enriched telemetry data for long periods of time. This data lake provides the corpus of data required to do feature engineering that powers discovery analytics and provides a mechanism to search and query for operational analytics.
    2. Pluggable Framework – It provides not only a rich set of parsers for common security data sources (pcap, NetFlow, Bro, Snort, FireEye, Sourcefire) but also a pluggable framework to add new custom parsers for new data sources, add new enrichment services to provide more contextual info to the raw streaming data (a conceptual enrichment sketch follows this list), pluggable extensions for threat intel feeds, and the ability to customize the security dashboards.
    3. Security Application – Metron provides standard SIEM like capabilities (alerting, threat intel framework, agents to ingest data sources) but also has packet replay utilities, evidence store and hunting services commonly used by SOC analysts.
    4. Threat Intelligence Platform – Metron will provide advanced defense techniques that consist of a class of anomaly detection and machine learning algorithms that can be applied in real time as events are streaming in.
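
To give a feel for the enrichment step mentioned in capability #2, the sketch below shows the idea in plain Python: a raw event gains geolocation and threat intelligence context before it reaches an analyst. This is a conceptual illustration only – it is not the Metron API, and the lookup tables and fields are made up.

```python
# Conceptual sketch of telemetry enrichment; NOT the Metron API.
# Lookup tables and event fields below are hypothetical.
GEO_DB = {"198.51.100.23": {"country": "RU", "asn": "AS64511"}}        # hypothetical
THREAT_INTEL = {"198.51.100.23": {"feed": "abuse-list", "score": 90}}  # hypothetical

def enrich(event: dict) -> dict:
    """Attach geo and threat-intel context to a raw event."""
    ip = event.get("src_ip")
    event["geo"] = GEO_DB.get(ip, {})
    event["threat_intel"] = THREAT_INTEL.get(ip, {})
    event["is_alert"] = event["threat_intel"].get("score", 0) >= 80
    return event

print(enrich({"src_ip": "198.51.100.23", "dest_port": 443}))
```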

Conclusion

We have covered a lot of ground in this post to reiterate the fact that big data is a natural fit for powerful security analytics. The Hadoop ecosystem & projects like Metron combine to provide a scalable platform for security analytics that can effectively enable rapid detection and rapid response for advanced security threats. It is heartening that an industry leader like Hortonworks is not only recognizing the grave business threat that Cybersecurity presents but is also driving an open source ecosystem around such needs.

The final post of the series will focus on the business recommendations that Corporate boards, Executive (CISOs, CXOs), Business & IT leadership need to adopt from both a governance & strategy standpoint to protect & insulate their businesses from the constant firehose of cyber attacks.

References –

  1. SANS SOC Reference – https://www.sans.org/reading-room/whitepapers/analyst/building-world-class-security-operations-center-roadmap-35907
  2. Hortonworks blog by James Sirota  – http://hortonworks.com/blog/leveraging-big-data-for-security-analytics/
  3. Metron Explained – https://community.hortonworks.com/articles/26050/apache-metron-explained.html

A Digital Bank is a Data Centric Bank..

“There’s no better way to help a customer than to be there for them in the moments that matter.” — Lucinda Barlow, Google

The Banking industry produces the most data of any vertical, with well defined & long standing business processes that have stood the test of time. Banks possess rich troves of data that pertain to customer transactions & demographic information. However, it is not enough for Bank IT to just possess the data. They must be able to drive change through legacy thinking and infrastructure as the entire industry changes, not just from a risk & compliance standpoint.

For instance, a major new segment is millennial customers – who increasingly use mobile devices and demand more contextual services as well as a seamless, unified banking experience – akin to what they commonly experience via the internet at web properties like Facebook, Amazon, Uber, Google or Yahoo.

The Data Centric Bank

Banks, wealth managers, stock exchanges and investment banks are companies run on data—data on deposits, payments, balances, investments, interactions and third-party data quantifying risk of theft or fraud. Modern data applications for banking data scientists may be built internally or purchased “off the shelf” from third parties. These new applications are powerful and fast enough to detect previously invisible patterns in massive volumes of real-time data. They also enable banks to proactively identify risks with models based on petabytes of historical data. These modern data science applications comb through the “haystacks” of data to identify subtle “needles” of fraud or risk not easy to find with manual inspection.

The Bank of the future looks somewhat like the below –


                                            Illustration – The Data Driven Bank

How do Banks stay relevant in this race and how is the Digital Journey to be accomplished?

I posit that there are five essential steps –

  1. Build for the organization of the future by inculcating innovation into its cultural DNA. A good chunk of the FinTechs’ success is owed to a contrarian mindset – creating business platforms using technology & generating a huge competitive advantage. The secret is following a strategy of continuous improvement: generating new ideas, being unafraid to cannibalize older (and even profitable) ideas and constantly experimenting across new businesses. For instance, Facebook is famous for not having a review board that designers and engineers present to with PowerPoint slides; prototypes and pilot projects are presented directly to executives – even to CEO Mark Zuckerberg. Facebook pivoted in a couple of years from weak mobile offerings to becoming the #1 mobile app company (more users now access their FB pages from mobile devices running iOS & Android than from laptops).
  2. Leverage Predictive Analytics across all data sets – A large part of the answer is to take an industrial approach to predictive analytics. The current approach in vogue – treating these as one-off, tactical project investments – simply does not work or scale anymore. There are various organizational models one could employ from the standpoint of developing analytical maturity, ranging from a shared service to a line-of-business-led approach. An approach that I have seen work very well is to build a Center of Excellence (COE) to create contextual capabilities, best practices and rollout strategies across the larger organization. These modern data applications make Big Data and data science ubiquitous. Rather than back-shelf tools for the occasional suspicious transaction or period of market volatility, these applications can help financial firms incorporate data into every decision they make. They can automate data mining and predictive modeling for daily use, weaving advanced statistical analysis, machine learning, and artificial intelligence into the bank’s day-to-day operations.
  3. Drive Automation across lines of business – Financial services are fertile ground for business process automation, since most banks across their various lines of business are simply a collection of core and differentiated processes. Examples are consumer banking (with processes including onboarding customers, collecting deposits, conducting business via multiple channels, and compliance with regulatory mandates such as KYC and AML); investment banking (including straight-through processing, trading platforms, prime brokerage, and compliance with regulation); payment services; and wealth management (including modeling portfolio positions and providing complete transparency across the end-to-end life cycle). The key takeaway is that driving automation can result not just in better business visibility and accountability on behalf of various actors; it can also drive revenue and contribute significantly to the bottom line. It enables enterprise business and IT users to document, simulate, manage, automate and monitor business processes and policies, and it is designed to empower business and IT users to collaborate more effectively, so business applications can be changed more easily and quickly.
  4. Adopt Open Source – Open Source, while somewhat of an unknown quantity to the mass middle-market enterprise, also represents a tremendous opportunity at most Banks & FinTechs across the spectrum of Financial Services. As one examines business imperatives & use cases across the seven key segments (Retail & Consumer Banking, Wealth Management, Capital Markets, Insurance, Credit Cards & Payment Processing, Stock Exchanges and Consumer Lending) it is clear that SMAC (Social, Mobile, Analytics, Cloud and Data) stacks can not only satisfy existing use cases in terms of cost & business requirements but also help adopters build out Blue Oceans (i.e. new markets). Segments of open source include the Linux OS, open source middleware, databases and the Big Data ecosystem. Technologies like these have disrupted proprietary closed-source products ranging from popular UNIX variants to application platforms, EDWs, RDBMSs etc.
  5. Understand the Customer – Across Retail Banking, Wealth Management and Capital Markets, a unified view of the customer journey is at the heart of the bank’s ability to promote the right financial product, recommend a properly aligned portfolio of products, keep up with evolving preferences as the customer relationship matures and accurately predict future revenue from a customer. But currently most retail, investment and corporate banks lack a comprehensive single view of their customers. Due to operational silos, each department has a limited view of the customer across multiple channels. These views are typically inconsistent, vary quite a bit and result in limited internal collaboration when servicing customer needs. Leveraging the ingestion and predictive capabilities of a Big Data based platform, banks can provide a user experience that rivals Facebook, Twitter or Google and provide a full picture of the customer across all touch points.

Recommendations – 

Developing a strategic mindset to digital transformation should be a board level concern. This entails –

  • To begin with, ensuring buy-in & commitment in the form of funding at a senior management level. This support needs to extend across use cases in the entire value chain
  • Extensive but realistic ROI (Return On Investment) models built during due diligence, with periodic updates for executive stakeholders
  • On a similar note, ensuring buy-in using a strategy of co-opting & alignment with Quants and different high potential areas of the business (as covered in the use cases in the last blog)
  • Identifying leaders within the organization who can not only lead digital projects but also create compelling content to evangelize the use of predictive analytics
  • Beginning to tactically embed data science capabilities across different lines of business and horizontal IT
  • Slowly moving adoption to the Risk, Fraud, Cybersecurity and Compliance teams as part of the second wave of digital. This is critical in ensuring that analysts across these areas move from a spreadsheet intensive model to adopting advanced statistical techniques
  • Creating a Digital Analytics COE (Center of Excellence) that enables cross pollination of ideas across the fields of statistical modeling, data mining, text analytics, and Big Data technology
  • Ensuring that issues related to data privacy, audit & compliance have been given a great deal of forethought
  • Identifying & developing human skills in toolsets (across open source and closed source) that facilitate adapting to data lake based architectures. A large part of this is organically growing the talent pool by instituting a college recruitment process

Summary – 

I have found myself spending the vast majority of my career working with a range of marquee financial services, healthcare, business services & Telco clients. More often than not, a vast percentage of these strategic discussions have centered around business transformation, enterprise architecture and overall strategy around Open Source initiatives & technology.

Global Banking is at an inflection point; there is now an emerging sense of urgency in mainstream Financial Services organizations to create and expand their Digital Transformation strategy. In the last few years, more and more of the technology oriented discussions have been focused around Cloud Computing, DevOps, Mobility & Big Data.

The prongs of digital range from Middleware to BPM to Cloud Computing (IaaS/PaaS/SaaS) to the culture & DevOps practices.

The rise of Open Standards and Open APIs has been the catalyst in this digital disruption. Neglect them at your peril.

Cybersecurity and the Next Generation Datacenter..(2/4)

The first blog of this four part series introduced a key business issue in the Digital Age – Cybersecurity. We also briefly touched upon responses that are being put in place by Corporate Boards. This part two focuses on technology strategies for enterprises to achieve resilience in the face of these attacks. The next post – part three – will focus on advances in Big Data Analytics that provide advanced security analytics capabilities. The final post of the series will focus on the business steps that Corporate boards, Executive & IT leadership need to adopt from a governance & strategy standpoint to protect & insulate their businesses from the constant firehose of cyber attacks.

Growing reliance on IT breeds Cyber Insecurity – 

The increased reliance on information technology to help run businesses, their supply chains and consumer facing applications has led to a massive increase in cyber risk. Given that most organizations are increasingly allowing employees to remotely access critical systems, the need to provide highly secure computing capabilities has become more pronounced. This IT infrastructure, which ranges across systems that store sensitive customer information, financial data etc, has led to an entire industry segment for network and computer security. It has also led to the creation of a burgeoning market of security experts across a range of cyber segments who tailor solutions to fit the operating requirements of respective client organizations.

 The core domains of Cyber Defense –

A fact of life facing the CIO & CISO (Chief Information Security Officer) or an IT manager is that every enterprise corporate datacenter is currently a mishmash of existing legacy technology investments. These range from antiquated proprietary software and some open source investments to proprietary server, rack & networking architectures. The people & software process piece is then layered on top of all of these by incorporating custom architecture tools and governance processes.

Layered across & within these are the typical security tools, frameworks and approaches that are employed commonly.

The important functional areas of Cybersecurity are listed below –

  • Intrusion detection systems (IDS)
  • Firewalls
  • Application Security leveraging Cryptography
  • Data Security
  • System administration controls
  • Server & Workstation security
  • Server Management Procedures (Patching, Updating etc)
  • Incident Response
  • Data Protection
  • Identity and Access Management (IAAM) etc. These tools are also commonly extended to endpoint devices like laptops and mobile clients.

While these are all valid & necessary investments, security as a theme in IT is almost always an afterthought across the four primary technology domains in the datacenter – Infrastructure, Platforms, Data & Management.

Why is that?

From an IT philosophy & culture standpoint, security has historically been thought of as a Non Functional Requirement (an “ility”) – a desirable or additive feature. As a result, most high level executives as well as IT personnel & end users have come to regard security as a process of running through checklists: installing cumbersome client tools and malware scanners, and conforming with periodic audits.

However, recent hack attacks at major financial institutions, as discussed in the first post in this series – http://www.vamsitalkstech.com/?p=1265, have brought to the fore the need to view Cybersecurity & defense as an integral factor in the IT lifecycle. Not just an integral factor, but a strategic component while building out applications & the datacenter architectures that host them.

Datacenter complexity breeds Cyber Insecurity – 

Information Technology as an industry is really only 30+ years old, as compared to architecture or banking or manufacturing or healthcare, which have existed as codified bodies of knowledge for hundreds of years. Consequently the body of work on IT still evolves and continues to do so at a rapid clip. Over the last couple of decades, computing architectures have evolved from being purely mainframe based in the 1960s & 70s to, by the 1980s, a mishmash of a few hundred Unix servers running the entire application portfolio of a major enterprise like a financial institution.

Fast forward to 2016: a typical Fortune 500 enterprise now runs multiple data centers, each hosting hundreds of thousands of Linux & Windows based servers (bare metal or virtualized), high end mainframes, legacy Unix systems etc. The sum total of the system images can run into tens of operating systems alone. When one factors in complex n-tier applications themselves along with packaged software (databases, application servers, message oriented middleware, business process management systems, ISV applications and systems utilities etc), the number of unique systems runs into 100,000 or more instances.

This complexity adds significantly to management tasks (maintaining, updating, patching servers) as well as the automation factor needed to derive business value at scale.

Security challenges thus rise manifold in the typical legacy technology dominated data center.

The top five day to day challenges from a security standpoint include –

  • Obtaining a ‘single pane of glass‘ view from a security standpoint across the zones in the infrastructure
  • Understanding and gaining visibility in realtime across this complex infrastructure
  • Staying ahead of rapidly moving exploits like the Heartbleed OpenSSL flaw, the Shellshock Bash vulnerability etc. The key point is ensuring that all vulnerable systems are patched immediately
  • Understanding what platforms and systems are hosting applications that have been designated as “Non Compliant” for various reasons – legacy applications that are no longer maintained, out of support or unsupported software stacks which are way behind on patch levels etc
  • Proactively enforcing policies around security compliance and governance. These run the gamut from server patch policies to hardware configurations tailored to security zones (e.g. a server with too many NICs in a DMZ) to applications running without the correct & certified software versions; a toy compliance check along these lines is sketched below
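
To make that last challenge concrete, here is a toy sketch of checking a server inventory against policy. The inventory format, patch baseline and zone rule are assumptions chosen purely to illustrate the idea.

```python
# Toy compliance check; inventory format, baseline and zone rule are hypothetical.
MIN_PATCH_LEVEL = (7, 2)    # assumed policy baseline, expressed as (major, minor)

inventory = [
    {"host": "app01", "zone": "DMZ",      "patch_level": (7, 3), "nics": 2},
    {"host": "db07",  "zone": "internal", "patch_level": (6, 9), "nics": 1},
    {"host": "web03", "zone": "DMZ",      "patch_level": (7, 2), "nics": 4},
]

def non_compliant(servers):
    """Yield (host, reason) pairs for servers that violate the example policy."""
    for s in servers:
        if s["patch_level"] < MIN_PATCH_LEVEL:
            yield s["host"], "patch level below baseline"
        if s["zone"] == "DMZ" and s["nics"] > 2:
            yield s["host"], "too many NICs for a DMZ host"

print(list(non_compliant(inventory)))
```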

Datacenter Architecture built for Cybersecurity – 

Can there be a data center architecture that is optimized for Cybersecurity from the ground up?

I contend that at a high level, four critical tiers and planes underlie every typical corporate information technology architecture.

These are  –

  1. Infrastructure tier – where Storage, Compute & Network provisioning reside
  2. Data tier – the sum total of all data assets including OLTP systems, Big Data
  3. Application/Services tier – applications composed of services or microservices
  4. Management plane – which maintains the operator and admins view

Open source and Cybersecurity – Like two peas in a pod – 

Open source technology choices across the above layers provide the highest security benefits. Open source platforms are maintained by a large and varied pool of contributors, which removes the dependence on any one organization as a source of security support. The open development model ensures that hordes of developers – both corporate & hobbyist – agree on standards while constantly testing and improving platforms. For example, platforms ranging from Red Hat Linux to open middleware to Hadoop to OpenStack have received the highest security ratings in their respective categories. All of the above platforms have the highest release velocity and rate of product updates & security fixes.

There has been a perception across the industry that while open source frameworks and platforms are invaluable for developers, they are probably not a good fit for IT operations teams who need a mix of highly usable management consoles as well as scripting facilities & monitoring capabilities. However, open source projects have largely closed this feature gap and then some over the last five years. Robust and mature open source management platforms now span the gamut across all the above disciplines as enumerated below.

  • OS Management – Systems Management Consoles
  • Application Middleware – end to end application deployment, provisioning & monitoring toolsets
  • Big Data & Open Source RDBMS – Mature consoles for provisioning, managing, and monitoring clusters
  • Cloud Computing – Cloud Management Platforms

The below illustration captures these tiers along with specifics on security; let’s examine each of the tiers starting from the lowest –


         Illustration: Next generation Data Center with different technology layers 

Infrastructure Tier

The next generation way of architecting infrastructure is largely centered around Cloud Computing. A range of forward looking institutions are either deploying or testing cloud-based solutions that span the full range of cloud delivery models – whether private or public or a hybrid mode.

Security and transparency are best enabled by a cloud based infrastructure due to the below reasons.

  • Highly standardized application & OS stacks, which enable seamless patching across tiers
  • Workload isolation by leveraging virtualization or containers
  • Highest levels of deployment automation
  • The ability of cloud based stacks to scale up at an instant to handle massive amounts of streaming data

Cloud computing provides three main delivery models (IaaS, PaaS & SaaS) –

  • IaaS (infrastructure-as-a-service) to provision compute, network & storage
  • PaaS (platform-as-a-service) to develop applications
  • SaaS (software-as-a-service) to expose business services via APIs

There are three broad options when choosing a Cloud based infrastructure –

  1. Leveraging a secure public cloud (Amazon AWS or Microsoft Azure)
  2. An internal private cloud (built on OpenStack etc)
  3. A combination of the two, i.e. a hybrid approach, which is a safe and sound bet for any new or greenfield applications

In fact many vendors now offer cloud based security products which offer a range of services from malware detection to monitoring cloud based applications like Google’s suite of office applications, Salesforce etc.

Data Tier – 

Enterprise data tiers are usually composed of different technologies like RDBMS, EDW (Enterprise Data Warehouses), CMS (Content Management Systems) & Big Data. My recommendation for the target state is to use the appropriate technology for the appropriate use case. For example, a Big Data platform powered by Hadoop is a great fit for data ingest, processing & long term storage. EDWs shine at reporting use cases & RDBMSs at online transaction processing. Document management systems are fantastic at providing business document storage, retrieval etc. All of these technologies can be secured for both data in motion and data at rest.

Given the focus of the digital wave on leveraging algorithmic & predictive analytics capabilities to create tailored & managed consumer products, Hadoop is a natural fit, as it is fast emerging as the platform of choice for analytic applications.

Big Data and Hadoop make security comparatively easy to bake in, as compared to a siloed approach, for the below reasons –

  1. Hadoop’s ability to ingest and work with all the above kinds of data & more (using the schema-on-read method) has been proven at massive scale. Operational data stores are being built on Hadoop at a fraction of the cost & effort involved with older types of data technology (RDBMS & EDW). Since the data is all available in one place, it becomes much easier to perform data governance & auditing
  2. The ability to perform multiple types of security processing on a given data set. This processing varies across batch, streaming, in-memory and realtime, which greatly opens up the ability to create, test & deploy closed loop analytics quicker than ever before. In areas like security telemetry, where streams of data are constantly being generated, ingesting high volume data at high speed and sending it to various processing applications for computation and analytics is key (a minimal streaming sketch follows this list)
  3. The DAS (Direct Attached Storage) model that Hadoop provides fits neatly with the horizontal scale-out model that the services, UX and business process tiers leverage in a cloud based architecture
  4. The ability to retain data for long periods of time, thus providing security oriented applications with predictive models that can reason over historical data
  5. Hadoop’s ability to run massive volumes of models in a very short amount of time, which helps with modeling automation
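
As a sketch of the streaming processing called out in point #2, the example below scores security telemetry as it arrives using Spark Structured Streaming. The Kafka broker, topic and field names are assumptions, and the spark-sql-kafka connector is assumed to be available on the classpath.

```python
# Hedged streaming sketch; broker, topic, fields and the Kafka connector are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StringType, IntegerType

spark = SparkSession.builder.appName("telemetry-stream").getOrCreate()

schema = (StructType()
          .add("src_ip", StringType())
          .add("failed_logins", IntegerType()))

events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "security-telemetry")
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Simple realtime rule: surface hosts with bursts of failed logins
alerts = events.filter(col("failed_logins") > 10)
alerts.writeStream.format("console").start().awaitTermination()
```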

Techniques like Machine Learning, Data Science & AI feed into core business processes thus improving them. For instance, Machine Learning techniques support the creation of self improving algorithms which get better with data thus making accurate cyber security & other business predictions. Thus, the overarching goal of the analytics tier should be to support a higher degree of automation by working with the business process and the services tier. Predictive Analytics can be leveraged across the value chain of Cybersecurity & have begun to find increased rates of adoption with usecases ranging from behavior detection to telemetry data processing.

Services Tier

A highly scalable, open source & industry leading platform as a service (PaaS) is recommended as the way of building out and hosting this tier. A leading PaaS technology (e.g. Red Hat’s OpenShift) is hardened constantly for process, network, and storage separation for each of the tenants running on a private or public cloud. In addition, there is a focus on providing intrusion detection capabilities across files, ports & potential back doors etc.

An enterprise PaaS provides the right level of abstraction for both developers and deployers to encapsulate business functionality as microservices. This capability is provided via its native support for a Linux container standard like Docker, which can be hosted on either bare metal or any virtualization platform. This also has the concomitant advantage of standardizing application stacks and streamlining deployment pipelines, thus leading the charge to a DevOps style of building applications which can constantly protect against new security exploits. Microservices have moved from the webscale world to fast becoming the standard for building mission critical applications in many industries. Leveraging a PaaS such as OpenShift provides a way to help cut the “technical debt” [1] that has plagued both developers and IT Ops.

Further, I recommend that service designers design their microservices so that they can be deployed in a SaaS paradigm – which usually implies taking an API based approach. APIs promote security from the get-go due to their ability to expose business oriented functionality depending on the end user’s permission levels.

Further, APIs enable one to natively build or integrate security features into the applications themselves via simple REST/SOAP calls. These include APIs for data encryption, throttling traffic from suspect consumers, systems behavior monitoring & integration with Identity & Access Management systems etc; a minimal sketch of this kind of API-level control follows below.
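
The snippet below is a minimal sketch of what such API-level controls might look like in practice: a per-client throttle and a permission check in front of a business endpoint. The endpoint, token map and rate limit are hypothetical, and a production service would delegate these concerns to an API gateway and an IAAM system rather than hand-rolling them.

```python
# Hedged sketch of API-level security controls; endpoint, tokens and limits are
# hypothetical. Real systems would use an API gateway / IAAM integration instead.
import time
from collections import defaultdict
from flask import Flask, abort, jsonify, request

app = Flask(__name__)
API_TOKENS = {"token-abc": {"client": "partner-1", "scope": "read"}}  # hypothetical
RATE_LIMIT = 5                       # max requests per client per 60-second window
recent_calls = defaultdict(list)     # client -> timestamps of recent requests

@app.route("/accounts/<account_id>/balance")
def balance(account_id):
    creds = API_TOKENS.get(request.headers.get("X-Api-Token"))
    if creds is None or creds["scope"] != "read":
        abort(403)                                   # permission-level check
    now = time.time()
    window = [t for t in recent_calls[creds["client"]] if now - t < 60]
    if len(window) >= RATE_LIMIT:
        abort(429)                                   # throttle a suspect consumer
    recent_calls[creds["client"]] = window + [now]
    return jsonify(account=account_id, balance="redacted-in-sketch")

if __name__ == "__main__":
    app.run()
```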

A DevOps oriented methodology is recommended in building applications in the following ways –

  • Ensuring that security tooling is incorporated into development environments
  • Leveraging resiliency & recoverability tools like Chaos Monkey (part of the Netflix Simian Army project) to constantly test systems for different kinds of vulnerabilities (e.g. abnormal conditions, random errors, massive amounts of traffic etc) from Day 1
  • Promoting horizontal scaling and resilience by testing live application updates, rollbacks etc
  • Leveraging a Cloud, OS & development language agnostic style of application development

 

User Experience Tier – 

The UX (User Experience) tier fronts humans – clients, partners, regulators, management and other business users across all touch points. The UX tier also interacts closely with the APIs provided for partner applications and other non-human actors to interact with the business service tier. Data and information security are key priorities at this layer, which offers secure connectivity to backend systems. Data is transmitted over a secure pipe from device to backend systems across business applications.

The UX tier has the following global security responsibilities  – 

  1. Provide consistent security across all channels (mobile, eBanking, tablet etc) in a way that is seamless and non-siloed. The implication is that clients should be able to begin a business transaction in channel A and continue it in channel B where it makes business sense, with security carried forth across both channels.
  2. Understand client personas and integrate with the business & predictive analytic tier in such a way that the UX is deeply integrated with the overall security architecture
  3. Provide advanced visualization (wireframes, process control, social media collaboration) that integrates with single sign-on (SSO) & cross partner authentication
  4. The UX should also be designed in such a manner that its design, development & ongoing enhancement follow an Agile & DevOps methodology

The other recommendation for remote clients is to leverage desktop virtualization. In this model, the user essentially uses a device (a laptop, a terminal or a smartphone) that performs zero processing: it just displays a user interface or application (ranging from the simple to the complex – a financial application, office tools, a document management user interface etc) delivered from a secure server over a secure connection. These devices, known as zero clients, are highly secure as they run a golden, uncompromisable image served from a highly protected central server. They also present a smaller attack surface.

How to embed Cybersecurity into the infrastructure –  

How do all of the above foundational technologies (Big Data, UX,Cloud, BPM & Predictive Analytics) help encourage a virtuous cycle?

This cycle needs to be accelerated, helping create a learning organization that can outlast the competition through a culture of unafraid experimentation and innovation.

  1.  The Architecture shall support small, incremental changes to business services & data elements based on changing business requirements which include Cybersecurity
  2. The Architecture shall support standardization across application stacks, toolsets for development & data technology to a high degree
  3. The Architecture shall support the creation of a user interface that is highly visual and feature rich from a content standpoint when accessed across any device
  4. The Architecture shall support an API based model to invoke any interaction – by a client or an advisor or a business partner
  5. The Architecture shall support the development and deployment of an application that encourages a DevOps based approach
  6. The Architecture shall support the easy creation of scalable business processes that natively emit security metrics from the time they’re instantiated to throughout their lifecycle

Summary

My goal in this post was to convince enterprise practitioners to shed their conservatism in adopting new approaches to building out applications & data center architectures. The inherent advantage in using Cloud, Big Data & realtime analytics is that security can be intrinsically built into the infrastructure.

This post makes no apologies about being forward looking. Fresh challenges call for fresh approaches and a new mindset.

References

[1] https://en.wikipedia.org/wiki/Technical_debt

[2] http://venturebeat.com/2014/01/30/top-ten-saas-security-tools/

Cybersecurity – The biggest threat to the Digital Economy..(1/4)

“We believe that data is the phenomenon of our time. It is the world’s new natural resource. It is the new basis of competitive advantage, and it is transforming every profession and industry. If all of this is true – even inevitable – then cyber crime, by definition, is the greatest threat to every profession, every industry, every company in the world.” – IBM Corp’s Chairman & CEO Ginni Rometty, Nov 2015, NYC

The first blog of this four part series will focus on the cybersecurity challenge across industry verticals while recapping some of the major cyber attacks of previous years. We will also discuss what responses are being put in place by Corporate Boards. Part two of this series will focus on strategies for enterprises to achieve resilience in the face of these attacks – from a technology stack standpoint. Part three will focus on advances in Big Data Analytics that provide advanced security analytics capabilities. The final post of the series will focus on the steps corporate boards, executive leadership & IT leadership need to adopt from a governance & strategy standpoint to protect their organizations from this constant onslaught.

The Cybersecurity Challenge – 

This blog has, from time to time, noted the ongoing digital transformation across industry verticals. For instance, banking organizations are building digital platforms that aim to engage customers, partners and employees. Banks now recognize that the key to winning the customer of the future is to offer a seamless experience across billions of endpoints. Healthcare providers want to offer their stakeholders – patients, doctors, nurses, suppliers etc – multiple avenues to access contextual data and services; the IoT (Internet of Things) domain is abuzz with the possibilities of Connected Car technology.

However, the innate challenge across all of the above scenarios is that the surface area of exposure across all of these assets rises exponentially. This rise increases security risks – the risk of system compromise, data breach and, worse, system takeover.

A cursory study of the top data breaches in 2015 reads like a “Who’s Who” of actors in society across Governments, Banks, Retailers, Health providers etc. The world of business now understands that a comprehensive & strategic approach to cybersecurity has moved from being a cursory IT challenge a few years ago to a board level concern.

The top two business cyber-risks are data loss & the concomitant disruption to smooth operations.  The British insurance major Lloyd’s estimates that cyber attacks cost businesses as much as $400 billion a year, which includes direct damage plus post-attack disruption to the normal course of business. Vendor and media forecasts put the cybercrime figure as high as $500 billion and more.[1]

The word Cybersecurity was not nearly as prominent in the IT lexicon a few years ago as it is now. Cybersecurity and cybercrime have become not only a nagging but also an existential threat to enterprises across a whole range of verticals – retail, financial services, healthcare and government. The frequency and sophistication of these attacks have increased year after year.

For instance, while the classical cybercriminal of a few years ago would target a Bank or a Retailer or a Healthcare provider, things have evolved as technology has opened up new markets. As an illustration of the expanding challenge around security – there are now threats emerging around automobiles, i.e. protecting cars from being taken over by cyber attackers. Is this borne out by industry research? Yes.

ABI Research forecasts that by 2020, we will have more than 20 million connected & inter-communicating cars & other automobiles with Internet of Anything (IoAT) data flow capabilities [3]. The key concern is not just securing the endpoints (the cars) themselves, but the fact that the data flows into a corporate datacenter where it is harnessed for business uses such as preventative maintenance, assisting in new product development, manufacturing optimization and even recall avoidance. The impact and risk of the threat are then magnified as they both extend across the value chain along with data & information flows.

Illustration: Largest Hacks of 2014 (source – [2])

The biggest cyberattacks of recent times include the following –

  • Home Depot – 109 million user records stolen
  • JP Morgan Chase – 83 million user records compromised
  • Sony Pictures Entertainment – 47k records stolen with significant loss of intellectual property

Cybersecurity – A Board level concern – 

The world of business is now driven by complex software & information technology. IT is now enterprise destiny. Given all of this complexity across global operating zones, perhaps no other business issue has the potential to result in massive customer churn, revenue losses, reputational risks & lawsuits from affected parties as does a breach in cybersecurity. A major security breach is a quick gamechanger and has the potential to put an organization in defensive mode for years.

Thus, Corporate Boards, long insulated from technology decisions, now want to understand from their officers how they are overseeing and mitigating cybersecurity risk. Putting in place an exemplary program that can govern across a vast & quickly evolving cybersecurity threat landscape is a vital board-level responsibility. The other important point to note is that the interconnected nature of these business ecosystems implies the need for external collaboration as well as a dedicated executive to serve as a Cyber czar.

Enter the formal role of the CISO (Chief Information Security Officer)….

The CISO typically heads an independent technology and business function with a dedicated budget & resources. Her or his mandate extends from physical security (equipment lockdown, fob-based access control etc.) to setting architectural security standards for business applications as well as reviewing business processes. One of the CISO’s main goals is to standardize the internal taxonomy of cyber risk and to provide a framework for quantifying these risks across a global organization.
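
To make the idea of a risk-quantification framework concrete, here is a minimal sketch using the classic likelihood-times-impact scoring model. The asset names and scores below are purely hypothetical illustrations, not drawn from any specific framework or vendor product.

# Minimal sketch of a risk-quantification framework a CISO's team might use.
# All asset names, likelihood and impact scores are hypothetical examples.

ASSETS = [
    # (asset, likelihood of compromise 1-5, business impact 1-5)
    ("cardholder-data-store", 4, 5),
    ("customer-facing-api", 3, 4),
    ("internal-wiki", 2, 2),
]

def risk_score(likelihood: int, impact: int) -> int:
    """Classic likelihood x impact scoring on a 1-25 scale."""
    return likelihood * impact

def prioritize(assets):
    """Rank assets by risk score, highest first, to focus remediation budget."""
    return sorted(
        ((name, risk_score(l, i)) for name, l, i in assets),
        key=lambda pair: pair[1],
        reverse=True,
    )

if __name__ == "__main__":
    for name, score in prioritize(ASSETS):
        print(f"{name}: risk score {score}")

Ranked output of this kind gives leadership a simple, comparable view of where remediation budget and attention should flow first.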

Cyber Threat is magnified in the Digital Age – 

As IBM’s CEO states above – “Data is the phenomenon of our time.” Enterprise business is built around data assets, and data is the critical prong of any digital initiative. For instance, Digital Banking platforms & Retail applications are evolving into collections of data-based ecosystems. These need to natively support loose federations of partner and regulatory applications that are API-based & Cloud-native. These applications are largely built on microservice architectures & need to support mobile clients from the get-go. Because they serve massive numbers of users and carry high business priority, they tend to take a higher priority in the overall security equation.

It naturally follows that more and more information assets are at risk of being targeted by extremely well-funded and sophisticated adversaries, ranging from criminals to cyber thieves to hacktivists.


                       Illustration – Enterprise Cybersecurity Vectors

How are Enterprises responding? – 

The PwC Global State of Information Security Survey (GSISS) 2016 offers the following key findings [4]. These are important, as we will expand on some of these themes in the following posts –

  • An increased adoption of risk-based security frameworks, e.g. ISO 27001, the US National Institute of Standards and Technology (NIST) Cybersecurity Framework and the SANS Critical Controls. These frameworks offer a common vocabulary and a set of guidelines that enable enterprises to identify and prioritize threats, quickly detect and mitigate risks and understand security gaps.
  • Increased adoption of cloud-based security platforms. Cloud Computing has emerged as an advanced method of deploying data protection, network security and identity & access management capabilities. These enable enterprises to improve threat intelligence gathering & modeling, augmenting their ability to block attacks as well as to accelerate incident response.
  • The rapid rise and adoption of Big Data analytics – A data-driven approach helps enterprises shift away from a predominantly perimeter-based defence strategy: real-time data streams can be analyzed and combined with historical data to drive security analytics, putting real-time information to use in ways that help predict cybersecurity incidents. Data-driven cybersecurity allows companies to better understand anomalous network activity and to identify and respond to incidents more quickly. Big Data is being combined with existing security information and event management (SIEM) technologies to generate holistic views of network activity; other use cases include insider threat surveillance (see the short sketch after this list).
  • A huge increase in external collaboration on cybersecurity – working with industry peers, law enforcement, government agencies and Information Sharing and Analysis Centers (ISACs).
  • The emergence of Cyber insurance as one of the fastest growing sectors in the insurance market, according to PwC [3]. Cybersecurity insurance is designed to mitigate business losses that could occur from a variety of cyber incidents, including data breaches. This form of insurance should be factored into more and more Enterprise Risk Management programs.
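
To illustrate the Big Data analytics finding above, here is a minimal, hypothetical sketch of baseline-versus-real-time anomaly detection: daily failed-login counts per user from historical data form a baseline, and counts in the current window are flagged when they deviate sharply from it. The event data, threshold and alerting logic are illustrative assumptions only; a real deployment would consume events from a SIEM or streaming platform rather than in-memory lists.

# Illustrative sketch only: flag anomalous login-failure counts per user by
# comparing a real-time window against a historical baseline (z-score test).
from statistics import mean, stdev

# Historical baseline: daily failed-login counts per user (hypothetical data).
history = {
    "alice": [2, 1, 3, 2, 2, 1, 3],
    "bob":   [0, 1, 0, 2, 1, 0, 1],
}

# Counts observed in the current (real-time) window.
current = {"alice": 4, "bob": 37}

Z_THRESHOLD = 3.0  # flag anything more than 3 standard deviations above baseline

def is_anomalous(baseline, observed, threshold=Z_THRESHOLD):
    """Return True when the observed count deviates sharply from the baseline."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:  # constant history: fall back to a simple comparison
        return observed > mu + threshold
    return (observed - mu) / sigma > threshold

for user, count in current.items():
    if is_anomalous(history[user], count):
        print(f"ALERT: unusual failed-login volume for {user}: {count}")

With this toy data, bob’s spike of 37 failed logins against a baseline of roughly one per day is flagged, while alice’s modest uptick is not; in practice the same pattern would run continuously over streaming event data alongside the SIEM.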

Thus far, Enterprises are clearly waking up to the threat and spending big dollars on cybersecurity. According to Gartner, worldwide spending on information security in 2015 reached $75 billion, an increase of 4.7% over 2014 [1]. However, it needs to be noted that cybersecurity compliance comes at a huge cost, both in terms of manpower and the time needed to certify projects as compliant with a set of standards – both of which drive up delays and costs.

All said, the advantage remains with the attackers – 

The key issue here is that attackers need to succeed only once, whereas defenders must succeed every time. Factors such as technology sophistication and the sheer number of attack vectors ensure that the surface area of exposure remains high. This means the advantage lies with the cyber attacker, and will do so for the foreseeable future.

Summary – 

Given all of the above, the five important questions Corporate leaders, CXOs & industry practitioners need to ask of themselves –

  1. First and foremost, can an efficient security infrastructure not only be a defensive strategy but also a defining source of competitive advantage?
  2. What is the ideal organizational structure, and what processes need to be put in place, to ensure continued digital resilience in the face of concerted & sophisticated attacks?
  3. Can the above (#2) be navigated without hindering the pace of innovation? How do we balance both?
  4. Given that most cyber breaches are long-running in nature – systems are slowly compromised over months – how does one leverage Cloud Computing, Big Data and Predictive Modeling to detect such compromises and remediate applications with security flaws?
  5. Most importantly, how can applications implement security in a manner that they constantly adapt and learn? How can the CISO’s team influence infrastructure, application & data development standards & processes? 

The next post will examine the answers to some of these questions but from a technology standpoint.

References:

  1. Cybersecurity ventures – “The Cybersecurity market report Q1 2016”
  2. Gemalto “Cybersecurity Breach Level Index for 2014”
  3. Forbes Magazine – “Cybersecurity Market Expected to Reach 75 billion by 2015” – Steve Morgan
  4. PwC Global State of Information Security Survey 2016 (GSISS)