Why Data Garbage-In means Analytics Garbage-Out..

This is the third in a series of blogs on Data Science that I am jointly authoring with Maleeha Qazi (https://www.linkedin.com/in/maleehaqazi/). We have previously covered some of the inefficiencies that result from a siloed data science process @ http://www.vamsitalkstech.com/?p=5046 & the ideal way Data Scientists would like their models deployed for maximal benefit and use – as a Service – @ http://www.vamsitalkstech.com/?p=5321. As the name of this third blog post suggests, the success of a data science initiative depends on data. If the data going into the process is “bad” then the results cannot be relied upon. Our goal is also to suggest some practical steps that enterprises can take from a data quality & governance process standpoint.

“However, under the strong influence of the current AI hype, people try to plug in data that’s dirty & full of gaps, that spans years while changing in format and meaning, that’s not understood yet, that’s structured in ways that don’t make sense, and expect those tools to magically handle it.” – Monica Rogati (Data Science Advisor and ex-VP, Jawbone – 2017)

Image Credit – The Daily Omnivore

Introduction

Different posts in this blog have discussed Data Science and other analytical approaches in some depth. What is apparent is that whatever the kind of analytics – descriptive, predictive, or prescriptive – the availability of a wide range of quality data sources is key. However, along with the volume and variety of data, the veracity, or the truth, in the data is just as important. This blog post discusses the main factors that determine the quality of data from a Data Scientist’s perspective.

The Top Issues of Data Quality

As highlighted in the above illustration, the top quality issues that data assets typically face are the following:

  1. Incomplete Data: The data provided for analysis should span the entire cross-section of known data about how the organization views its customers and products. This would include data generated from the various applications that belong to the business, as well as external data bought from vendors to enrich the knowledge base. The completeness criterion measures whether all of the information about the entities under consideration is available and usable.
  2. Inconsistent & Inaccurate Data: Consistency measures whether data values give conflicting information that must be fixed, and whether all data elements conform to specific, uniform formats and are stored in a consistent manner. Inaccurate data has duplicate, missing or erroneous values, and does not reflect an accurate picture of the state of the business at the point in time it was pulled.
  3. Lack of Data Lineage & Auditability: The data framework needs to support auditability, i.e., provide an audit trail of how data values were derived from source to the point of analysis, including the various transformations performed along the way to arrive at the data set being considered for analysis.
  4. Lack of Contextuality: Data needs to be accompanied by meaningful metadata – data that describes the concepts within the dataset.
  5. Temporal Inconsistency: This criterion measures whether the data is temporally consistent and meaningful given the time at which it was recorded. (A minimal sketch of checks along some of these dimensions follows this list.)
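To make a few of these dimensions concrete, below is a minimal sketch (in Python, using pandas) of the kind of rule-based checks a data engineering team might run against an incoming customer extract. The file name, column names, formats and thresholds are purely illustrative assumptions, not a prescription.

```python
# A minimal, illustrative data quality check - column names and rules are hypothetical.
import pandas as pd

df = pd.read_csv("customer_extract.csv", parse_dates=["record_ts"])  # hypothetical feed

report = {}

# 1. Completeness: share of non-null values per critical column
for col in ["customer_id", "email", "segment"]:
    report[f"completeness_{col}"] = df[col].notna().mean()

# 2. Consistency & accuracy: uniform formats and no duplicate entities
report["pct_valid_email"] = df["email"].str.contains(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", na=False).mean()
report["duplicate_customers"] = int(df.duplicated(subset=["customer_id"]).sum())

# 3. Temporal consistency: records should not be timestamped in the future
report["future_dated_records"] = int((df["record_ts"] > pd.Timestamp.now()).sum())

# Flag the batch if any metric breaches an (illustrative) threshold
failed = [k for k, v in report.items()
          if (k.startswith("completeness") and v < 0.95)
          or (k == "pct_valid_email" and v < 0.98)
          or (k in ("duplicate_customers", "future_dated_records") and v > 0)]
print(report)
print("Quality gate:", "FAILED " + str(failed) if failed else "PASSED")
```

Checks of this kind are typically wired into the ingest pipeline so that a batch failing the quality gate is quarantined before it ever reaches the analytics environment; lineage and contextuality are handled separately through metadata and audit trails.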

What Business Challenges does Poor Data Quality Cause…

Image Credit – DataMartist

Poor data quality causes the following business challenges in enterprises:

  1. Customer dissatisfaction: Across industries like Banking, Insurance, Telecom & Manufacturing, the ability to get a unified view of the customer & their journey is at the heart of the enterprise’s ability to promote relevant offerings & detect customer dissatisfaction. Currently, most industry players are woeful at putting together this comprehensive Single View of their Customers (SVC). Due to operational silos, each department possesses its own siloed & limited view of the customer across multiple channels. These views are typically inconsistent, lack synchronization with other departments, & miss a large number of potential cross-sell and upsell opportunities. This is a data quality challenge at its core.
  2. Lost revenue: The Customer Journey problem is an age-old issue that has become exponentially more complicated over the last five years, as the staggering rise of mobile technology and the Internet of Things (IoT) has vastly increased the number of enterprise touch points through which customers can discover and purchase new products/services. In an omnichannel world, an increasing number of transactions are conducted online; in verticals like Retail, Banking & Insurance, online transactions approach an average of 40%. Adding to the problem, more and more consumers post product reviews and feedback online. Companies thus need to react in real time to piece together the sources of consumer dissatisfaction.
  3. Time and cost in data reconciliation: Every large enterprise nowadays runs expensive data re-engineering projects due to its data quality challenges. These re-engineering efforts are an inevitable first step in most digital projects and cause huge cost and time overheads.
  4. Increased time to market for key projects: Poor data quality causes poor data agility, which increases the time to market for key projects.
  5. Poor data means suboptimal analytics: Poor data quality causes the analytics done using it to be suboptimal – algorithms will end up giving wrong conclusions because the input provided to them is incorrect at best & inconsistent at worst.

Why is Data Quality a Challenge in Enterprises

Image Credit – DataMartist

The top reasons why data quality has been a huge challenge in the industry are:

  1. Prioritization conflicts: For most enterprises, the focus of the business is the product(s)/service(s) being provided; book-keeping is a mandatory but secondary concern. Since keeping the business running is the most important priority, keeping the books accurate for financial matters is the only data aspect that gets the technical attention it deserves. Other data aspects are usually ignored.
  2. Organic growth of systems: Most enterprises have gone through a series of book-keeping methods and applications, most of which have no compatibility with one another. Warehousing data from various systems as they are deprecated, merging in data streams from new systems, and fixing data issues as these transitions happen is not prioritized until something on the business end fundamentally breaks. Band-aids are usually cheaper and easier to apply than thinking ahead to what the business will need in the future, building it, and back-filling it with all the previous systems’ data in an organized fashion.
  3. Lack of time/energy/resources: Nobody has infinite time, energy, or resources. Making all the systems an enterprise uses at any point in time talk to one another, share information between applications, and maintain a single consistent view of the business is a near-impossible task. Many well-trained people, and much time & energy, are required to make sure this can be set up and successfully orchestrated on a daily basis. But how much is a business willing to pay for this? Most do not see a short-term ROI and hence lose sight of the long-term problems that can be caused by ignoring the quality of the data collected.
  4. What do you want to optimize?: There are only so many balls an enterprise can have up in the air to focus on without dropping one, and prioritizing those can be a challenge. Do you want to optimize the performance of the applications that need to use, gather and update the data, OR do you want to make sure data accuracy/consistency (one consistent view of the data for all applications in near real-time) is maintained regardless? One will have to suffer for the other.

How to Tackle Data Quality

Image Credit – DataMartist


With the advent of Big Data and the need to derive value from ever-increasing volumes and variety of data, data quality becomes an important strategic capability. While every enterprise is different, certain common themes emerge as we consider the quality of data:

  1. The sheer number of transaction systems found in a large enterprise causes multiple challenges across the data quality dimensions. Organizations need to have valid frameworks and governance models to ensure the data’s quality.
  2. Data quality has typically been thought of as just data cleansing and fixing missing fields. However, it is very important to address the originating business processes that allow data to take on multiple versions of the truth. For example, centralize customer onboarding in one system across channels rather than having every system do its own onboarding.
  3. It is clear from the above that data quality and its management is not a one-time or siloed application exercise. As part of a structured governance process, it is very important to adopt data profiling and related capabilities to ensure high-quality data on an ongoing basis.

Conclusion

Enterprises need to define both quantitative and qualitative metrics to ensure that data quality goals are captured across the organization. Once this is done, an iterative process needs to be followed to ensure that a set of capabilities dealing with data governance, auditing, profiling, and cleansing is applied to continuously ensure that data is brought up to, and kept at, a high standard. Doing so can have salubrious effects on customer satisfaction, product growth, and regulatory compliance.

The Deployment Architecture of an Enterprise API Management Platform..

We discussed the emergence of Application Programming Interfaces (APIs) as providing a key business capability in Digital Platforms @ http://www.vamsitalkstech.com/?p=3834. The next post then discussed the foundational technology, integration & governance capabilities that any Enterprise API Platform must support @ http://www.vamsitalkstech.com/?p=5102.  This final post in the API series will discuss a deployment model for an API Management Platform.

Background..

The first two posts in this series discussed the business background to API Management and the need for an Enterprise API Strategy. While details will vary across vendor platforms, the intention of this post is to discuss the key runtime components of an API management platform, the overall developer workflow in creating APIs, and the runtime workflow that enables client applications to access them.

Architectural Components of an API Management Platform..

The important runtime components of an API management platform are depicted in the below illustration. Note that we have abstracted out network components (firewalls, reverse proxies, VLANs, switches etc) as well as the internal details of application architecture which would normally be impacted by an API Platform.

The major components of an API Management Platform and the request flow across the architecture.

Let us cover the core components of the above:

  1. API Gateway – The API Gateway has emerged as the dominant deployment artifact in API Architectures. As the name suggests, Gateways are based on the facade design pattern (a minimal sketch of this pattern follows this list). The Gateway (or typically a set of highly available Gateways) acts as a proxy for traffic between client applications (used by customers, partners and employees) and back end services (ranging from mainframes to microservices). The Gateway is essentially an appliance or a software process that abstracts all API traffic into an organization and exposes business capabilities, typically via a REST interface. Clients are exposed to different views of the same API – coarse grained or granular – depending on the kind of client application (thick/thin) and access control permissions. Gateways include protocol translation and request routing as their core functionality. It is also not uncommon to deploy multiple Gateways – internal and external – depending on business requirements such as partner interactions. Gateways also include functionality such as caching requests for performance, load balancing, authentication, serving static content etc. The API Gateway can thus be managed using a set of policy controls. Performance characteristics such as throughput, scalability, caching, load balancing and failover are managed using a cluster of API Gateways. The introduction of an API Gateway also ensures that application design is impacted going forward. API Gateways can be implemented in many forms – as a software platform or as an appliance. Public cloud providers have also begun offering mature API Gateways that integrate well with the range of backend services that they provide from both an IaaS and a PaaS standpoint. For instance, Amazon’s API Gateway integrates natively with AWS Lambda and the EC2 Container Service for microservice deployments on AWS.
  2. Security – Though it is not a standalone runtime artifact, Security tends to be called out as one of the most important logical requirements of an API Management platform. APIs have to follow the same access control mechanisms and security constraints for different user roles as their underlying data sources. This is key as backend applications and organizational data need to be protected from a variety of threats – denial of service attacks, malware, access control violations etc. Accordingly, policy based protection using API keys, JSON/XML signature scanning & threat protection, encryption for data in motion and at rest, OAuth support etc – all need to be provided as standard features.
  3. Developer portal – A Developer portal is the entry point for developers and can also serve as a developer onboarding tool. It is typically a web based portal integrated with the API Gateway. Developers use the portal to study API specs, download SDKs for different programming languages, register their APIs and monitor their API performance. It also provides a visual interface to help developers build/test their APIs, along with support for a high degree of automation using a continuous delivery model. For internal developers, the ability to provide self service consumption of API developer stacks (Node.js/ JavaScript frameworks/Java runtimes/ PaaS integration etc) is a highly desirable capability.
  4. Management and Monitoring – Ensuring that the exposed APIs are maintaining their QoS (Quality of Service), as well as helping admins monitor their quota of resource consumption, is key from an Operations standpoint. Further, the M&M functionality should also aid operators in resolving complex systems issues and ensuring a high degree of availability during upgrades etc.
  5. Billing and Chargeback – Here we refer to the ability to tie the usage of APIs to back office applications that can charge users based on their metered usage of the backend applications. This is typically provided through logging and auditing capabilities.
  6. Governance – From a Governance standpoint, the platform should provide the ability to track APIs across their lifecycle, a handy catalog of available APIs, the ability to audit their usage and the underlying assets they expose, and the ability for the business to set policies on their usage.
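To make the facade idea concrete, below is a highly simplified sketch (in Python, using FastAPI and httpx) of two of the gateway's core responsibilities – API key enforcement and request routing to backend services. The routes, key store and backend URLs are illustrative assumptions; a production gateway (commercial appliance, software platform or cloud offering) adds caching, throttling, OAuth, protocol translation and much more.

```python
# Minimal API Gateway facade sketch - routes, keys and backend hosts are hypothetical.
from typing import Optional
from fastapi import FastAPI, Header, HTTPException, Request
import httpx

app = FastAPI()

# Hypothetical routing table: public path prefix -> internal backend service
BACKENDS = {
    "/accounts": "http://accounts-service.internal:8080",
    "/payments": "http://payments-service.internal:8080",
}
VALID_API_KEYS = {"demo-key-123"}  # placeholder; real gateways use a managed key store

@app.api_route("/{path:path}", methods=["GET", "POST"])
async def proxy(path: str, request: Request, x_api_key: Optional[str] = Header(None)):
    # 1. Policy enforcement: authenticate the caller via API key
    if x_api_key not in VALID_API_KEYS:
        raise HTTPException(status_code=401, detail="Invalid API key")
    # 2. Request routing: find the backend responsible for this path prefix
    prefix = "/" + path.split("/")[0]
    backend = BACKENDS.get(prefix)
    if backend is None:
        raise HTTPException(status_code=404, detail="Unknown API")
    # 3. Proxy the request to the backend service (assumes JSON responses)
    async with httpx.AsyncClient() as client:
        resp = await client.request(
            request.method, f"{backend}/{path}",
            content=await request.body(),
            headers={"x-forwarded-for": request.client.host},
        )
    return resp.json()
```

The facade keeps clients unaware of where and how backend capabilities are actually implemented, which is precisely what lets the organization refactor mainframes into microservices behind a stable API surface.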

API Design Process..

Most API Platforms provide a developer toolkit with varying degrees of integration with the runtime platform. Handy SDKs for iOS, Android and JavaScript development are provided.

An internal developer uses the developer toolkit (e.g. Eclipse with an offline plugin) and/or an API Designer tool included with the vendor platform to create the API based on organizational policies. An extensive CLI (Command Line Interface) is also provided to perform all functions that can be done using the GUI. These include local unit & system test capabilities and the ability to publish the tested APIs to a repository from where the runtime can access, deploy and update them.

From a data standpoint, multiple databases, including RDBMS and NoSQL stores, are supported for data access. During the creation of the API, depending on whether the developer already has an existing data model in mind, the business logic is mapped closely to the data schema; alternatively, one can work top down and create the backend once the API interface has been defined, using a model driven approach. These steps also include settings for security permissions, with support for OAuth and any other third party authentication dependencies.

Once defined and tested, the API is published onto the runtime. During this process access control privileges, access policies and the endpoint itself are defined. The API is then ready for external consumption and discovery.

Runtime Flow Across the Architecture..

In the simplest case – once the API has been deployed and tested – it is made available for public discovery and consumption. Client applications then begin to leverage the API, and this can be done in a variety of ways. For example, user interactions on mobile applications, webpages and B2B services trigger calls to the API Gateway. The Gateway performs a range of functions to process the request – from security authorization to load balancing – before applying the policies set up for that particular API. The Gateway then invokes the API by calling the backend system, typically via message oriented middleware such as an ESB or a Message Broker. Once the backend responds with the appropriate payload, the data is sent to the requesting application. Systems and Administration teams can view detailed operational metrics and logs to monitor API performance.
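As an illustration of the client side of this flow, the snippet below sketches how an application backend might invoke a Gateway-fronted API over HTTPS using an OAuth2 bearer token and an API key. The endpoint, key and token are placeholders, not real values.

```python
# Hypothetical client-side call through the API Gateway (endpoint, key and token are placeholders).
import requests

token = "eyJ...access-token"  # obtained earlier via an OAuth2 flow against the identity provider
resp = requests.get(
    "https://api.example-bank.com/v1/accounts/12345/balance",
    headers={"Authorization": f"Bearer {token}", "x-api-key": "demo-key-123"},
    timeout=5,
)
resp.raise_for_status()
print(resp.json())
```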

A Note on Security..

It should come as no surprise that security is one of the most critical aspects of an API Management Platform implementation. While API Security is a good subject for a followup post and too extensive to be covered in a short blurb, several standards such as OAuth2, OpenID Connect, JSON security & policy languages are all topics that need to be explored by both organizational developers and administrators. Extensive flow mapping and scenario testing are mandated here. Also, endpoint security from a client application standpoint is key: your servers, desktops and supported mobile devices need to be updated and secured with the latest antivirus & other standard IT security/access control policies.

Conclusion..

In this post, we tried to highlight the major components of an API Management Platform from a technology standpoint. While there is a range of commercial & open source platforms, it is important to evaluate them from a feature standpoint as well as from an ecosystem capability perspective as developers begin implementing microservices based Digital Architectures.

Why APIs Are a Day One Capability In Digital Platforms..

As enterprises embark or continue on their Digital Journey, APIs are starting to emerge as a key business capability and one that we need to discuss. Regular readers of this blog will remember that APIs are one of the common threads across the range of architectures we have discussed in Banking, Insurance and IoT et al. In this blogpost, we will discuss the five key imperatives or business drivers for enterprises embarking on a centralized API Strategy. 

Digital Platforms are composed of an interconnected range of enterprise services exposed as APIs across the Internet.

API Management as a Native Digital Capability..

The use of application programming interfaces (APIs) has been well documented across web scale companies such as Facebook, Amazon and Google et al. Over the last decade, APIs have emerged as the primary means for B2B and B2C companies to interact with their customers, partners and employees. Leading enterprises already have Digital Platform efforts underway, as opposed to creating standalone Digital applications. Digital Platforms aim to increase the number of product and client channels of interaction so that enterprises can reach customer audiences that were hitherto untapped. The primary mode of interaction with a variety of target audiences in such digital settings is via APIs.

APIs enable the creation of new business models that can deliver differentiated experiences (source – IBM)

APIs are widely interoperable, relatively easy to create and form the front end of many internet scale platforms. APIs are leveraged to essentially access the core services provided by these platforms and can be used to create partner and customer ecosystems. Leaders such as PayPal, Amazon & FinTechs such as Square, Mint etc have overwhelmingly used APIs as a way to not only open their platforms to millions of developers but also to offer innovative services.

As of 2015, programmableweb.com estimated that over 12,000 APIs were already being offered by enterprise firms. Leaders such as Salesforce.com were generating about 50% of their revenue through APIs. Salesforce.com created a thriving marketplace – AppExchange – for apps created by its partners that work on its platform, which numbered around 300 at the time of writing. APIs were contributing 60% of revenues at eBay and a staggering 90% at Expedia.com. eBay uses APIs to create additional exposure for its products – list auctions on other websites, get bidder information about sold items, collect feedback on transactions, and list new items for sale. Expedia’s APIs allowed customers to use third party websites to book flights, cars, and hotels. [2]

The Three Core Competencies of Digital – Cloud, Big Data & Intelligent Middleware

While most of the Fortune 500 have already begun experimenting with the value that APIs can deliver, the conversation around these capabilities needs to be elevated from the IT level to the line of business and the CIO/Head of Marketing. APIs help generate significant revenue upside while enabling rapid experimentation in business projects. Examples of API usage abound in industries like Financial Services, Telecom, Retail and Healthcare.

The Main Kinds of APIs

While the categories of APIs will vary across industry, some types of APIs have been widely accepted. The three most popular from a high level are described below –

  1. Private APIs – These are APIs defined for use by employees and internal systems within an organization or across a global company. By their very nature, they’re created for sensitive internal functions and have access to privileged functions that external actors cannot perform.
  2. Customer APIs – Customer APIs enable global customers to conduct business through product/service distribution channels – examples include placing product orders, viewing catalogs etc. These carry a very limited set of privileges, limited to customer facing actions in a B2C context.
  3. Partner APIs – Partner APIs are used by partners at varying levels to perform business functions in the context of a B2B relationship. Examples include Affiliate programs in Retail, inventory management, Supply Orders in Manufacturing & Billing functions in Financial Services etc. The API provider hosts marketplaces that enable partner developers to create software that leverages these APIs.

The Five Business Drivers for an Enterprise API Strategy..

The question for enterprise executives then becomes, when do they begin to invest in a central API Management Platform?  Is such a decision based on the API sprawl in the organization or the sheer number of APIs being manually managed etc?

While the exact trigger point may vary for every enterprise, let us consider the five key value drivers..

Driver #1 APIs enable Digital Platforms to evolve into ecosystems

In my mind, the first and most important reason to move to a centralized API strategy is to evolve standalone Digital capabilities into a coherent Platform. Different lines of business can use these capabilities to develop an ecosystem of applications that can be offered as a SaaS (Software as a Service). The end state of Digital Platforms is to operate business systems at massive scale in terms of customers, partners and employees.

The two central ideas at the heart of a platform based approach are as follows –

  1. Create new customer revenue streams by reaching out to new customer segments across the globe or in new (and non traditional) markets. Examples of these platforms abound in the business world. In financial services, Banks & Credit reporting agencies are able to monetize years of accumulated customer & product data by reselling it to interested third parties, which use it either for new product creation or to offer services that simplify a pressing industry issue – Customer Onboarding.
  2. Reduce cost in current business models by extending core processes to business partners and also by automating manual communication steps (which are almost always higher cost and inefficient). For instance, Amazon has built their retail business using partner APIs to extend retailing provisioning, entitlement, enablement and order fulfillment processes.

Driver #2 Impact the Customer experience

We have seen how mobile systems are a key source of customer engagement. Offering the customer a seamless experience while they transact with an organization is a key way of disarming competition. Accordingly, Digital projects emphasize the importance of capabilities such as Customer Journey Mapping (CJM) and Single View of Customer (SVC) as the minimum table stakes that they need to provide. For instance, in Retail Banking, players are feeling the pressure to move beyond the traditional transactional banking model to a true customer centric model by offering value added services on the customer data that they already possess. APIs are leveraged across such projects to enrich the views of the customer (typically with data from external systems) as well as to expose these views to customers themselves, business partners and employees.

Demystifying Digital – Why Customer 360 is the Foundational Digital Capability – ..(1/3)

Driver #3 Cloud Computing & DevOps

This one is all too familiar to anyone working in technology. We have seen how both Cloud Computing & DevOps are the foundation of agile technology implementations across a range of back end resources. These include but are not limited to Compute, NAS/SAN storage, Big Data, Application platforms, and other middleware. Extending that idea, Cloud (IaaS/PaaS) is a set of APIs.

APIs are used to abstract out the internals of these underlying platform services. Application Developers and other infrastructure services use well defined APIs to interact with the platforms. These APIs enable the provisioning, deployment and management of platform services.
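As a small illustration of "Cloud is a set of APIs", the snippet below uses the AWS SDK for Python (boto3) to query compute infrastructure programmatically; any other IaaS provider exposes equivalent APIs. The region and filters are arbitrary examples, not a vendor recommendation.

```python
# Cloud infrastructure consumed through an API: the same operation a console click
# performs, expressed as a programmatic call (region and filter values are illustrative).
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# List running instances via the provider's API
reservations = ec2.describe_instances(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
)
for r in reservations["Reservations"]:
    for inst in r["Instances"]:
        print(inst["InstanceId"], inst["InstanceType"])
```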

APIs have become the de facto model that provides developers and administrators with the ability to assemble Digital applications, such as microservices, using complicated componentry. Thus, there is a strong case to be made for adopting an API centric strategy when evolving to a Software Defined Datacenter.

A huge trend on the developer side has been the evolution of continuous build, integration and deployment processes. The integration of APIs into the DevOps process has begun, with use cases ranging from publicly available APIs triggering CI jobs to running CI/CD pipelines on a cloud based provider.

Why Digital Disruption is the Cure for the Common Data Center..

Driver #4 APIs enable Business & Product Line Experimentation

APIs thus enable companies to constantly churn out innovative offerings while continuously adapting & learning from customer feedback. Internet scale companies such as Facebook provide edge APIs that enable thousands of companies to write applications that drive greater customer volumes to the Facebook platform. The term API Economy is increasingly in vogue and connotes a loosely federated ecosystem of companies, consumers, business models and channels.

The API economy is a set of business models and channels — based on secure access of functionality and the exchange of data to an ecosystem of developers and the users of the app constructs they build — through an API, either within a company or via the internet, with business partners and customers.

The Three Habits of Highly Effective Real Time Enterprises…

Driver #5 Increasingly, APIs are needed to comply with Regulatory Mandates

We have already seen how, in key industries such as Banking and Financial Services, regulatory authorities are at the forefront of forcing incumbents to support Open APIs. APIs thus become a mechanism for increasing competition to the benefit of consumer choice. The regulators are changing the rules of participation in the banking & payments industry, and APIs are a key enabling factor in this transformation.

Under the PSD2, Banks and Payment Providers in the EU will need to unlock access to their customer data via Open APIs

Why the PSD2 will Spark Digital Innovation in European Banking and Payments….

Financial Services, Healthcare, Telecom and Retail.. a case in point for why APIs present an Enormous Opportunity for the Fortune 500..

Banking – At various times, we have highlighted various business & innovation issues with Banking providers in the areas of Retail Banking, Payment Providers and Capital Markets. Regimes such as the second Payment Services Directive (PSD2) in the EU will compel staid industry players to innovate faster than they otherwise would. FinTechs across the industry offer APIs to enable third party services to use their offerings.

Healthcare – There is broad support in the industry for Open APIs to drive improved patient care & highly efficient billing processes, as well as to ensure realtime engagement across stakeholders.

APIs across the Healthcare value chain can ensure more aligned care plans and business processes. (Image Credit – Chilmark)

In the Telecom industry, nearly every large operator has developed APIs which are offered to customers and the developer community. Companies such as AT&T and Telefonica are using their anonymized access to hundreds of millions of subscribers to grant large global brands access to nonsensitive customer data. Federated platforms such as the GSM Association’s oneAPI are already promoting the usage of industry APIs.[1]

Retailers are building new business models based on functionality such as Product Catalogs, Product Search, Online Customer Orders, Inventory Management and Advanced Analytics (such as Recommendation Engines). APIs enable retailers to expand their footprints beyond the brick and mortar store & an online presence.

Ranking Your API Maturity..

Is there a maturity model for APIs? We can classify enterprises into three different strategic postures, using Banks as an example. Readers can extrapolate these to their specific industry segment.

  1. Minimally Compliant Enterprises – Here we categorize companies that seek to provide compliance with a minimal Open API. Taking the example of Banking, while this may be the starting point for several organizations, staying too long in this segment will mean gradual market share erosion as well as a loss of customer lifetime value (CLV) over time. The reason is that FinTechs and other startups will offer a range of services such as instant mortgages, personal financial management tools, paperless approval processes for a range of consumer accounts etc. It is also anticipated that such organizations will treat their API strategy as a localized effort and will allocate personnel to the project mainly around the front office and marketing.
  2. Digital Starters – Players that have begun opening up customer data and intend to support the core Open API while also introducing their own proprietary APIs. While this approach may work in the short to medium term, it will impose increasing integration headaches on the banks as time goes on.
  3. Digital Innovators – The Digital Innovators will lead the way in adopting APIs. These companies will fund dedicated teams in lines of business serving their particular customer segments, either organically or through partnerships with third party service providers. They will not only adhere to the industry standard APIs but also extend these specs to create their own services, with a focus on data monetization.

Conclusion..

Increasingly, a company’s APIs represent a business development tool and a new go-to-market channel that can generate substantial revenues from referrals and usage fees. Given the strategic importance and revenue potential of this resource, the C-suite must integrate APIs into its corporate decision making.

The next post will take a technical look into the core (desired) features of an API Management Platform.

References..

[1] Forrester Research 2016 – “Sizing the Market for API Management Solutions” http://resources.idgenterprise.com/original/AST-0165452_Forrester_Sizing_the_market_for_api_management_solutions.pdf 

[2] Harvard Business Review 2015 – “The Strategic Value of APIs” – https://hbr.org/2015/01/the-strategic-value-of-apis

A Digital Reference Architecture for the Industrial Internet Of Things (IIoT)..

A few weeks ago on the invitation of DZone Magazine, I jointly authored a Big Data Reference Architecture along with my friend & collaborator, Tim Spann (https://www.linkedin.com/in/timothyspann/). Tim & I distilled our experience working on IIoT projects to propose an industrial strength digital architecture. It brings together several technology themes – Big Data , Cyber Security, Cognitive Applications, Business Process Management and Data Science. Our goal is to discuss a best in class architecture that enables flexible deployment for new IIoT capabilities allowing enterprises to build digital applications. The abridged article was featured in the new DZone Guide to Big Data: Data Science & Advanced Analytics which can be downloaded at  https://dzone.com/guides/big-data-data-science-and-advanced-analytics

How the Internet Of Things (IoT) leads to the Digital Mesh..

The Internet of Things (IoT) has become one of the four most hyped technology paradigms affecting the world of business, the other usual suspects being Big Data, AI/Machine Learning & Blockchain. Cisco predicts that the IoT will comprise about 25 billion connected things by 2020 and affect about $2 trillion of economic value globally across a diverse range of verticals. These devices are not just consumer oriented devices such as smartphones and home monitoring systems but also dedicated industry objects such as sensors, actuators, engines etc.

The interesting angle to all this is the fact that autonomous devices are already beginning to communicate with one another using IP based protocols. They largely exchange state & control information around various variables. With the growth of computational power on these devices, we are not far from them sending more granular and interesting streaming data – about their environment, performance and business operations – all of which will enable a higher degree of insightful analytics to be performed on the data. Gartner Research has termed this interconnected world, where decision making & manufacturing optimization can occur via IoT, the “Digital Mesh“.

The evolution of technological innovation in areas such as Big Data, Predictive Analytics and Cloud Computing now enables the integration and analysis of massive amounts of device data at scale while performing a range of analytics and business process workflows on the data.

Image Credit – Sparkling Logic

According to Gartner, the Digital Mesh will thus lead to an interconnected information deluge powered by the continuous data from these streams. These streams will encompass classical IoT endpoints (sensors, field devices, actuators etc) sending data in a variety of formats – text, audio, video & social data streams – along with new endpoints in areas as diverse as Industrial Automation, Remote Healthcare, Public Transportation, Connected Cars, Home Automation etc. These intelligent devices will increasingly begin communicating with their environments in a manner that will encourage collaboration in a range of business scenarios. The industrial cousin of IoT is the Industrial Internet of Things (IIoT).

Defining the Industrial Internet Of Things (IIoT)

The Industrial Internet of Things (IIoT) can be defined as an ecosystem of capabilities that interconnects machines, personnel and processes to optimize the industrial lifecycle. The foundational technologies that IIoT leverages are Smart Assets, Big Data, Realtime Analytics, Enterprise Automation and Cloud based services.

The primary industries impacted the most by the IIoT will include Industrial Manufacturing, the Utility industry, Energy, Automotive, Transportation, Telecom & Insurance.

According to Markets and Markets, the annual worldwide Industrial IoT market is projected to exceed $319 billion in 2020, which represents an 8% compound annual growth rate (CAGR). The top four segments are projected to be manufacturing, energy and utilities, auto & transportation and healthcare.[1]

Architectural Challenges for Industrial IoT versus Consumer IoT..

Consumer IoT applications generally receive the lion’s share of media attention. However, the ability of industrial devices (such as sensors) to send ever richer data about their operating environment and performance characteristics is driving a move to Digitization and Automation across a range of industrial manufacturing sectors.

Thus, there are four distinct challenges that we need to account for in an Industrial IOT scenario as compared to Consumer IoT.

  1. The IIoT needs Robust Architectures that are able to handle millions of device telemetry messages per second. The architecture needs to account for all kinds of devices operating in environments ranging from the highly constrained (low power, intermittent connectivity) to the well connected.
  2. IIoT also calls for the highest degrees of Infrastructure and Application reliability across the stack. For instance, a lost or dropped message in a healthcare or a connected car scenario may mean life or death for a patient, or an accident on the road.
  3. An ability to integrate seamlessly with existing Information Systems. Let’s be clear: these new age IIoT architectures need to augment existing systems such as Manufacturing Execution Systems (MES) or Traffic Management Systems. In Manufacturing, MES systems continually improve the product lifecycle and perform better resource scheduling and utilization. This integration helps these systems leverage the digital intelligence and insights across (potentially) millions of devices across complex areas of operation.
  4. An ability to incorporate richer kinds of analytics than have been possible before, providing a greater degree of context. This ability to reason around context is what enables the design of new business models which cannot currently be imagined due to a lack of agility in the data and analytics space.

What will IIoT based Digital Applications look like..

Digital Applications are being designed for specific device endpoints across industries. While the underlying mechanisms and business models differ from industry to industry, all of these use predictive analytics based on a combination of real time data processing & data science algorithms. These techniques extract insights from streaming data to provide digital services on existing toolchains, provide value added customer service, predict device performance & failures, improve operational metrics etc.

Examples abound. For instance, a great example in manufacturing is the notion of a Digital Twin which Gartner called out last year. A Digital Twin is a software personification of an intelligent device or system. It forms a bridge between the real world and the digital world. In the manufacturing industry, digital twins can be set up to function as proxies of Things like sensors and gauges, coordinate measuring machines, vision systems, and white light scanning. This data is sent to a cloud based system where it is combined with historical data to better maintain the physical system.

The wealth of data being gathered on the shop floor will ensure that Digital Twins will be used to reduce costs and increase innovation. Thus, in global manufacturing, data science will soon make its way onto the shop floor to enable the collection of insights from these software proxies. We covered the phenomenon of Servitization in manufacturing in a previous blogpost.

In the Retail industry, the ability to detect a customer’s location in realtime and combine that information with their historical buying patterns can drive real time promotions and the ability to dynamically price retail goods.

Solution Requirements for an IIoT Architecture..

At a high level, the IIoT reference architecture should support six broad solution areas-

  1. Device Discovery – Discovering a range of devices (and their details)  on the Digital Mesh for an organization within and outside the firewall perimeter
  2. Performing Remote Lifecycle Configuration of these devices ranging from startup to modification to monitoring to shut down
  3. Performing Deep Security level introspection to ensure the patch levels etc are adequate
  4. Creating Business workflows on the Digital Mesh. We will do this by marrying these devices to enterprise information systems (EISs)
  5. Performing Business oriented Predictive Analytics on these devices; this is critical to deriving business value from the device data
  6. On a futuristic basis, support optional integration with the Blockchain to support a distributed organizational ledger that can coordinate activity across all global areas that an enterprise operates in.

Building Blocks of the Architecture

Listed below are the foundational blocks of our reference architecture. Though the requirements will vary across industries, an organization can reasonably standardize on a number of foundational components as depicted below and then incrementally augment them as the interactions between different components increase based on business requirements.

Our reference architecture includes the following major building blocks –

  • Device Layer
  • Device Integration Layer
  • Data & Middleware Tier
  • Digital Application Layer

It also includes the following cross cutting concerns which span across the above layers –

  • Device and Data Security
  • Business Process Management
  • Service Management
  • UX Design
  • Data Governance – Provenance, Auditing, Logging

The next section provides a brief overview of the reference architecture’s components at a logical level.

A Big Data Reference Architecture for the Industrial Internet depicting multiple functional layers

Device Layer – 

The first requirement of IIoT implementations is to support connectivity from the Things themselves, or the Device layer depicted at the bottom. The Device layer includes a whole range of sensors, actuators, smartphones, gateways, industrial equipment etc. The ability to connect with devices and edge devices like routers and smart gateways using a variety of protocols is key. Network protocols such as Ethernet, WiFi, and Cellular can all directly connect to the internet. Other protocols that need a gateway device to connect include Bluetooth, RFID, NFC, Zigbee et al. Devices can connect directly with the data ingest layer shown above, but it is preferred that they connect via a gateway which can perform a range of edge processing.

This is important from a business standpoint. For instance, in certain verticals like healthcare and financial services, there exist stringent regulations that govern when certain identifying data elements (e.g. video feeds) can leave the premises of a hospital or bank. A gateway can not just perform intelligent edge processing but also connect thousands of device endpoints and facilitate bidirectional communication with the core IIoT architecture.

The ideal tool for these constantly evolving devices, metadata, protocols, data formats and types is Apache NiFi.  These agents will send the data to an Apache NiFi gateway or directly into an enterprise Apache NiFi cluster in the cloud or on-premise.

Apache NiFi Eases Dataflow Management & Accelerates Time to Analytics In Banking (2/3)..

A subproject of Apache NiFi – MiNiFi – provides a complementary data collection approach that supplements the core tenets of NiFi in dataflow management. Due to its small footprint and low resource consumption, it is well suited to handle dataflow from sensors and other IoT devices. It provides central management of agents while providing full chain of custody information on the flows themselves.

For remote locations and more powerful devices like the Arrow BeagleBone Black Industrial and MyPi Industrial, it is very simple to run a tiny Java or C++ MiNiFi agent for secure connectivity needs.

The data sent by the device endpoints are then modeled into an appropriate domain representation based on the actual content of the messages. The data sent over also includes metadata around the message. A canonical model can optionally be developed (based on the actual business domain) which can support a variety of applications from a business intelligence standpoint.

 Apache NiFi supports the flexibility of ingesting changing file formats, sizes, data types and schemas. The devices themselves can send a range of feeds in different formats. E.g. XML now and based on upgraded capabilities – richer JSON tomorrow. NiFi supports ingesting any file type that the devices or the gateways may send.  Once the messages are received by Apache NiFi, they are enveloped in security with every touch to each flow file controlled, secured and audited.   NiFi flows also provide full data provenance for each file, packet or chunk of data sent through the system.  NiFi can work with specific schemas if there are special requirements for file types, but it can also work with unstructured or semi structured data just as well.  From a scalability standpoint, NiFi can ingest 50,000 streams concurrently on a zero-master shared nothing cluster that horizontally scales via easy administration with Apache Ambari.
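As a simple illustration of the device side of this layer, the sketch below shows a sensor agent publishing JSON telemetry over MQTT – one of the common device protocols discussed in the next section – to a gateway broker. The broker address, topic hierarchy and payload fields are assumptions for illustration; an actual deployment would publish into whatever listener the NiFi/MiNiFi gateway exposes.

```python
# Hypothetical sensor agent publishing JSON telemetry to a gateway MQTT broker.
import json
import random
import time

import paho.mqtt.client as mqtt

BROKER = "gateway.plant-01.local"                   # illustrative gateway address
TOPIC = "plant-01/line-3/sensor-42/telemetry"       # illustrative topic hierarchy

client = mqtt.Client(client_id="sensor-42")
client.connect(BROKER, port=1883)

while True:
    payload = {
        "device_id": "sensor-42",
        "ts": int(time.time() * 1000),              # epoch millis, part of the message metadata
        "temperature_c": round(random.gauss(72.0, 1.5), 2),
        "vibration_mm_s": round(abs(random.gauss(4.0, 0.8)), 2),
        "firmware": "1.4.2",
    }
    client.publish(TOPIC, json.dumps(payload), qos=1)   # QoS 1: at-least-once delivery
    time.sleep(5)
```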

Data and Middleware Layer – 

The IIoT Architecture recommends a Big Data platform with native message oriented middleware (MOM) capabilities to ingest device mesh data. This layer will also process device data in whatever fashion – batch or real-time – the business needs demand.

Application protocols such as AMQP, MQTT, CoAP, WebSockets etc are deployed by many device gateways to communicate application specific messages. The reason for recommending a Big Data/NoSQL dominated data architecture for IIoT is quite simple. These systems provide Schema on Read, an innovative data handling technique. In this model, a format or schema is applied to data as it is accessed from a storage location, as opposed to doing the same while it is ingested. From an IIoT standpoint, one must deal not just with the data itself but also metadata such as timestamps, device id, and other firmware data such as software version, device manufacture date etc. The data sent from the device layer will consist of time series data and individual measurements.

The IIoT data stream can thus be visualized as a constantly running data pump handled by a Big Data pipeline that takes the raw telemetry data from the gateways, decides which streams are of interest and discards the ones not deemed significant from a business standpoint. Apache NiFi is your gateway and gate keeper. It ingests the raw data, manages the flow of thousands of producers and consumers, and performs basic data enrichment, in-stream sentiment analysis, aggregation, splitting, schema translation, format conversion and other initial steps to prepare the data. It does all that with a user-friendly web UI and an easily extendible architecture. It then sends raw or processed data to Kafka for further processing by Apache Storm, Apache Spark or other consumers. Apache Storm is a distributed real-time computation engine that reliably processes unbounded streams of data. Storm excels at handling complex streams of data that require windowing and other complex event processing. While Storm processes stream data at scale, Apache Kafka distributes messages at scale. Kafka is a distributed pub-sub real-time messaging system that provides strong durability and fault tolerance guarantees. NiFi, Storm and Kafka naturally complement each other, and their powerful cooperation enables real-time streaming analytics for fast-moving big data. All the stream processing is handled by the NiFi-Storm-Kafka combination.
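A downstream consumer of this pipeline might look like the following sketch, which uses the kafka-python client to read a telemetry topic fed by NiFi, drop readings that are not business-significant and forward the rest with some business context attached. Topic names, thresholds and the enrichment lookup are illustrative assumptions, not the specific design described above.

```python
# Illustrative Kafka consumer for a telemetry topic produced by the NiFi gateway flow.
import json

from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer(
    "iiot.telemetry.raw",                            # hypothetical topic fed by NiFi
    bootstrap_servers=["kafka-01:9092"],
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    group_id="telemetry-filter",
)
producer = KafkaProducer(
    bootstrap_servers=["kafka-01:9092"],
    value_serializer=lambda d: json.dumps(d).encode("utf-8"),
)

ASSET_LOOKUP = {"sensor-42": {"plant": "plant-01", "line": "line-3"}}  # stand-in for a reference store

for msg in consumer:
    event = msg.value
    # Filter: discard readings that are not significant from a business standpoint
    if event.get("vibration_mm_s", 0.0) < 6.0:
        continue
    # Enrich: attach business context (plant/line) before handing off to downstream consumers
    event["asset"] = ASSET_LOOKUP.get(event["device_id"], {})
    producer.send("iiot.telemetry.alerts", event)
```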

Apache NiFi, Storm and Kafka integrate very closely to manage streaming dataflows.


Appropriate logic is built into the higher layers to support device identification, ID lookup, secure authentication and transformation of the data. This layer will process data (cleanse, transform, apply a canonical representation) to support Business Automation (BPM), BI (business intelligence) and visualization for a variety of consumers. The data ingest layer will also provide notification and alerts via Apache NiFi.

Here are some typical uses for this event processing pipeline (a toy sketch of items a and c follows the list):

a. Real-time data filtering and pattern matching

b. Enrichment based on business context

c. Real-time analytics such as KPIs, complex event processing etc

d. Predictive Analytics

e. Business workflow with decision nodes and human task nodes
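As a toy example of items (a) and (c), the snippet below keeps a sliding window of recent readings per device and raises a complex-event alert when a threshold is breached several times within the window. Window size, threshold and field names are illustrative assumptions; in practice this logic would live inside a Storm bolt or Spark Streaming job.

```python
# Toy sliding-window pattern matcher: raise an alert if a device breaches a temperature
# threshold 3 or more times within its last 10 readings (all values are illustrative).
from collections import defaultdict, deque
from typing import Optional

WINDOW = 10
BREACHES_TO_ALERT = 3
TEMP_LIMIT_C = 85.0

windows = defaultdict(lambda: deque(maxlen=WINDOW))

def process(event: dict) -> Optional[dict]:
    """Consume one telemetry event; return an alert event if the pattern matches."""
    window = windows[event["device_id"]]
    window.append(event["temperature_c"])
    breaches = sum(1 for t in window if t > TEMP_LIMIT_C)
    if breaches >= BREACHES_TO_ALERT:
        return {"device_id": event["device_id"], "kpi": "overheat_pattern", "breaches": breaches}
    return None

# Example usage with a single synthetic reading
print(process({"device_id": "sensor-42", "temperature_c": 90.1}))
```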

Digital Application Tier – 

Once IIoT knowledge has become part of the Hadoop based Data Lake, all the rich analytics, machine learning and deep learning frameworks, tools and libraries now become available to Data Scientists and Analysts.   They can easily produce insights, dashboards, reports and real-time analytics with IIoT data joined with existing data in the lake including social media data, EDW data, log data.   All your data can be queried with familiar SQL through a variety of interfaces such as Apache Phoenix on HBase, Apache Hive LLAP and Apache Spark SQL.   Using your existing BI tools or the open sourced Apache Zeppelin, you can produce and share live reports.   You can run TensorFlow in containers on YARN for deep learning insights on your images, videos and text data; while running YARN clustered Spark ML pipelines fed by Kafka and NiFi to run streaming machine learning algorithms on trained models.
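For instance, once telemetry has landed in the lake, an analyst's workflow might look like the following PySpark sketch, which joins device telemetry with asset master data from the EDW to compute a simple per-line KPI. The table and column names are hypothetical.

```python
# A minimal PySpark sketch - table and column names are hypothetical - showing how
# ingested IIoT telemetry can be joined with existing reference data in the lake.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iiot-analytics").enableHiveSupport().getOrCreate()

# Hypothetical Hive tables populated by the NiFi/Kafka ingest pipeline and the EDW load
telemetry = spark.table("iiot.device_telemetry")   # device_id, ts, temperature_c, vibration_mm_s
assets = spark.table("edw.asset_master")           # device_id, plant, line, asset_type

# Average vibration and temperature per production line - a simple KPI feeding dashboards
kpi = (telemetry.join(assets, "device_id")
       .groupBy("plant", "line")
       .avg("vibration_mm_s", "temperature_c"))

kpi.show(20)
```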

A range of predictive applications is suitable for this tier. The models themselves should seek to answer business questions around things like asset failure, the key performance indicators in a manufacturing process and how they’re trending, insurance policy pricing etc.

Once the device data has been ingested into a modern data lake, key functions that need to be performed include data aggregation, transformation, enriching, filtering, sorting etc.

As one can see, this can get very complex very quickly – both from a data storage and a processing standpoint. A Cloud based infrastructure, with its ability to provide highly scalable compute, network and storage resources, is a natural fit to handle bursty IIoT applications. However, IIoT applications add their own diverse requirements of computing infrastructure, namely the ability to accommodate hundreds of kinds of devices and network gateways – which means that IT must be prepared to support a large diversity of operating systems and storage types.

This tier is also responsible for the integration of the IIoT environment into the business processes of an enterprise. The IIoT solution ties into existing line-of-business applications and standard software solutions through adapters or Enterprise Application Integration (EAI) and business-to-business (B2B) gateway capabilities. End users in business-to-business or business-to-consumer scenarios will interact with the IIoT solution and the special-purpose IIoT devices through this layer. They may use the IIoT solution or line-of-business system UIs, including apps on personal mobile devices, such as smartphones and tablets.

Security Implementation

The topic of Security is perhaps the most important cross cutting concern across all layers of the IIoT architecture stack. Needless to say, each of the layers must support the strongest data encryption, authentication and authorization capabilities for devices, users and partner applications. Accordingly, capabilities must be provided to ingest and store security feeds, IDS logs, server logs and device telemetry for advanced behavioral analytics. These feeds must be constantly analyzed across three domains – the Device domain, the Business domain and the IT domain. The below blogpost delves into some of these themes and is a good read to get a deeper handle on this issue from a SOC (security operations center) standpoint.

An Enterprise Wide Framework for Digital Cybersecurity..(4/4)

Conclusion

It is evident from the above that IIoT will present an enormous opportunity for businesses globally. It will also create layers of complexity and opportunity for Enterprise IT. The creation of smart digital services on the data served up will further depend on the vertical industry. Whatever the kind of business model – whether tracking behavior, location sensitive pricing, business process automation etc – the end goal of the IT architecture should be to create enterprise business applications that are ultimately data native and analytics driven.


Why Data Silos Are Your Biggest Source of Technical Debt..

“Any enterprise CEO really ought to be able to ask a question that involves connecting data across the organization, be able to run a company effectively, and especially to be able to respond to unexpected events. Most organizations are missing this ability to connect all the data together.” – Tim Berners-Lee (English computer scientist, best known as the inventor of the World Wide Web)

Image Credit – Device42

We have discussed vertical industry business challenges across sectors like Banking, Insurance, Retail and Manufacturing in some level of detail over the last two years. Though enterprise business models vary depending on the industry, there is a common Digital theme raging across all industries in 2017. Every industry is witnessing an upswing in the numbers of younger and digitally aware customers. Estimates of this influential population are as high as 40% in areas such as Banking and Telecommunications. They represent a tremendous source of revenue but can also defect just as easily if the services offered aren’t compelling or easy to use – as the below illustration of the Banking industry shows.

These customers are Digital Natives, i.e. they are highly comfortable with technology and use services such as Google, Facebook, Uber, Netflix and Amazon almost hourly in their daily lives. As a consequence, they expect a similarly seamless & contextual experience while engaging with Banks, Telcos, Retailers and Insurance companies over (primarily) digital channels. Enterprises then have a two-fold challenge – to store all this data as well as harness it for real time insights in a way that is connected with internal marketing & sales.

As many studies have shown, companies that constantly harness data about their customers and perform speedy advanced analytics outshine their competition. Does that seem a bombastic statement? Not when you consider that almost half of all online dollars spent in the United States in 2016 were spent on Amazon, and almost all digital advertising revenue growth in 2016 was accounted for by two biggies – Google and Facebook. [1]

According to The Economist, the world’s most valuable commodity is no longer Oil, but Data. The few large companies depicted in the picture are now virtual monopolies[2] (Image Credit – David Parkins)

Let us now return to the average Enterprise. The vast majority of enterprise applications (numbering around 1000+ applications at large enterprises, according to research firm Netskope) generally lag the innovation cycle. This is because they’re created using archaic technology platforms by teams that conform to rigid development practices. The Fab Four (Facebook, Amazon, Google, Netflix) and others have shown that Enterprise Architecture is a business differentiator, but the Fortune 500 have not gotten that message as yet. Hence they largely predicate their software development on vendor provided technology instead of open approaches. This anti-pattern is further exacerbated by legacy organizational structures, which ultimately leads to these applications holding a very parochial view of customer data. These applications can typically be classified into one of the following buckets – ERP, Billing Systems, Payment Processors, Core Banking Systems, Service Management Systems, General Ledger, Accounting Systems, CRM, Corporate Email, Salesforce, Customer On-boarding etc.

These enterprise applications are typically managed by disparate IT groups scattered across the globe. They often serve different stakeholders who have broadly overlapping interests but conflicting organizational priorities. These applications then produce and store data in silos – localized by geography, department, line of business or channel.

Organizational barriers further impede data sharing – for reasons ranging from competitive dynamics around who owns the customer relationship, to regulatory concerns, to internal politics. You get the idea: it is all a giant mishmash.

Before we get any further, we need to define that dreaded word – Silo.

What Is a Silo?

A mind-set present in some companies when certain departments or sectors do not wish to share information with others in the same company. This type of mentality will reduce the efficiency of the overall operation, reduce morale, and may contribute to the demise of a productive company culture. (Source- Business Dictionary -[2])

Data is the Core Asset in Every Industry Vertical but most of it is siloed in Departments, Lines of Business across Geographies..

Let us be clear, most Industries do not suffer from a shortage of data assets. Consider a few of the major industry verticals and a smattering of the kinds of data that players in these areas commonly possess – 

Data In Banking– 

  • Customer Account data e.g. Names, Demographics, Linked Accounts etc
  • Core Banking Data going back decades
  • Transaction Data which captures the low level details of every transaction (e.g debit, credit, transfer, credit card usage etc)
  • Wire & Payment Data
  • Trade & Position Data
  • General Ledger Data e.g AP (accounts payable), AR (accounts receivable), cash management & purchasing information etc.
  • Data from other systems supporting banking reporting functions.

Data In Healthcare–

  • Structured Clinical data e.g. Patient ADT information
  • Free hand notes
  • Patient Insurance information
  • Device Telemetry 
  • Medication data
  • Patient Trial Data
  • Medical Images – e.g. CT (CAT) scans, MRIs etc

Data In Manufacturing–

  • Supply chain data
  • Demand data
  • Pricing data
  • Operational data from the shop floor 
  • Sensor & telemetry data 
  • Sales campaign data

The typical flow of data in an enterprise follows a familiar path –

  1. Data is captured in large quantities as a result of business operations (customer orders, e-commerce transactions, supply chain activities, partner integrations, clinical notes et al). These feeds are captured using a combination of techniques – mostly ESBs (Enterprise Service Buses) and Message Brokers.
  2. The raw data streams then flow into respective application-owned silos where, over time, a great amount of data movement (via copying, replication and transformation operations – the dreaded ETL) occurs using proprietary vendor-developed systems. Vendors in this space have not only developed shrink-wrapped products that make them tens of billions of dollars annually but have also imposed massive human capital requirements on enterprises to program & maintain these data flows.
  3. Once all of the relevant data has been normalized, transformed and processed, it is copied over into business reporting systems where it is used to perform a range of functions – typically reporting for use cases such as Customer Analytics, Risk Reporting, Business Reporting, Operational improvements etc. (a rough sketch of this flow appears after the list).
  4. Rinse and repeat..
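To make the batch pattern above concrete, here is a minimal, deliberately simplified Python sketch of the flow – capture, per-application silo, nightly ETL normalization, and a copy into a reporting store. All record and field names are hypothetical and exist only for illustration; a real enterprise pipeline would involve an ESB/message broker, vendor ETL tooling and far more governance.

```python
from datetime import datetime

# 1. Capture: raw feeds arrive from business operations (hypothetical order events).
raw_orders = [
    {"order_id": 1, "cust": "C-100", "amount": "250.00", "ts": "2017-04-01"},
    {"order_id": 2, "cust": "C-101", "amount": "99.50",  "ts": "2017-04-02"},
]

# 2. Each application keeps its own silo (represented here as separate in-memory stores).
crm_silo     = [{"customer_id": e["cust"], "last_order": e["ts"]} for e in raw_orders]
billing_silo = [{"order_id": e["order_id"], "cust": e["cust"],
                 "amount": e["amount"], "ts": e["ts"]} for e in raw_orders]

# 3. The nightly ETL job copies, normalizes and merges the silos into a reporting table.
def nightly_etl(crm, billing):
    last_seen = {row["customer_id"]: row["last_order"] for row in crm}
    reporting = []
    for row in billing:
        reporting.append({
            "customer_id": row["cust"],
            "order_id": row["order_id"],
            "amount_usd": float(row["amount"]),                      # normalize types
            "order_date": datetime.strptime(row["ts"], "%Y-%m-%d"),  # normalize dates
            "last_seen": last_seen.get(row["cust"]),
        })
    return reporting

reporting_table = nightly_etl(crm_silo, billing_silo)

# 4. Rinse and repeat: by the time analysts query this copy, it is already days old.
staleness = datetime(2017, 4, 10) - reporting_table[0]["order_date"]
print(f"{len(reporting_table)} rows copied; oldest row is {staleness.days} days stale")
```

The point of the sketch is the cadence, not the code: because the silos are only reconciled in periodic copy jobs, every downstream view inherits the delay.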

Due to this old-school way of working with customer and operational data, most organizations have no real-time data processing capabilities in place & thus live in a largely reactive world. What that means is that their view of a given customer's world is typically a week to ten days old.

Another factor to consider: the data sources described above are what can be called structured or traditional data. However, organizations are now on-boarding large volumes of unstructured data, as captured in the below blogpost. Oftentimes, it is easier for Business Analysts, Data Scientists and Data Architects to get access to external data than to internal data.

Getting access to internal data typically means jumping through multiple hoops – which department is paying for the feeds, the format of the feeds, regulatory issues, cyber security policy approvals, SOX/PCI compliance et al. The list is long and impedes the ability of the business to get things done quickly.

Infographic: The Seven Types of Non Traditional Data that can drive Business Insights

Data and Technical Debt… 

Since Ward Cunningham coined the term 'Technical Debt', it has typically been used in an IT, DevOps, containers and data center context. However, technology areas like DevOps, PaaS, Cloud Computing (IaaS), Application Middleware and data centers in and of themselves add no direct economic value to customers unless they are able to intelligently process Data. Data is the most important technology asset, more so than any other IT infrastructure consideration. You do not have to take my word for that. The Economist just published an article discussing the fact that the likes of Google, Facebook and Amazon are now virtual data monopolies and that global corporations are far behind in the competitive race to own Data [1].

Thus, it is ironic that while the majority of traditional Fortune 500 companies are still stuck in silos, Silicon Valley companies are not just fast becoming the biggest owners of global data but are also monetizing it on the way to record profits. Alphabet (Google's corporate parent), Amazon, Apple, Facebook and Microsoft are the five most valuable listed firms in the world. Case in point – their combined profits were around $25bn in the first quarter of 2017, and together they make up more than half the value of the NASDAQ composite index. [1]

The Five Business Challenges that Data Fragmentation causes (or) Death by Silo … 

How intelligently a company harnesses its data assets determines its overall competitive position. This truth is being evidenced in sectors like Banking and Retail, as we have seen in previous posts.

What is interesting is that in some countries concerned about the pace of technological innovation, national regulatory authorities are creating legislation to force slow-moving incumbents to unlock their data assets. For example, in the European Union, as a result of regulatory mandates – PSD2 & the Open Bank Standard – a range of agile players across the value chain (e.g. FinTechs) will soon be able to obtain seamless access to a variety of retail bank customer data via standard & secure APIs.

Once obtained, this data can be reimagined in manifold ways to offer new products & services that the banks themselves cannot. A simple use case: personal finance management platforms (PFMs) that help consumers make better personal financial decisions, at the expense of the Banks who own the data. Indeed, FinTechs have generally been able to make more productive use of client data than banks. They do this by providing clients with intuitive access to cross-asset data, tailoring algorithms based on behavioral characteristics, and providing a more engaging and unified experience.
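As a rough illustration of the PFM use case above, the snippet below aggregates transactions (as a FinTech might receive them after a customer grants consent) into a simple per-category spending summary. The payload shape and categories are purely hypothetical placeholders; the real PSD2 / Open Banking specifications define their own schemas, consent flows and security requirements.

```python
from collections import defaultdict

# Hypothetical transaction payload a PFM app might receive from a bank's API.
transactions = [
    {"amount": -45.20, "category": "groceries", "date": "2017-05-01"},
    {"amount": -120.0, "category": "utilities", "date": "2017-05-03"},
    {"amount": -15.75, "category": "groceries", "date": "2017-05-04"},
    {"amount": 2500.0, "category": "salary",    "date": "2017-05-05"},
]

def spending_summary(txns):
    """Aggregate outgoing amounts per category - the core of a basic PFM view."""
    totals = defaultdict(float)
    for t in txns:
        if t["amount"] < 0:          # only count spending, not income
            totals[t["category"]] += -t["amount"]
    return dict(totals)

print(spending_summary(transactions))
# {'groceries': 60.95, 'utilities': 120.0}
```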

Why can't the slow-moving established Banks do this themselves? They suffer from a lack of data agility due to the silos that have built up over years of operations and acquisitions. None of these constraints apply to the FinTechs, which can build off a greenfield technology environment.

To recap, let us consider the five ways in which Data Fragmentation hurts enterprises – 

#1 Data Silos Cause Missed Top line Sales Growth  –

Data produced by disparate applications, stored in scattered silos, makes it very hard to enable a Single View of a customer across channels, products and lines of business. This in turn makes everything across the customer lifecycle a pain – from smooth on-boarding, to customer service, to marketing analytics. It impedes the ability to segment customers intelligently and to cross-sell & up-sell. This inability to understand customer journeys (across different target personas) also leads to customer retention issues. When the underlying data sources are fragmented, communication between business teams moves to other internal mechanisms such as email, chat and phone calls. This is a recipe for delayed business decisions which are ultimately ineffective, as they depend more on intuition than on data.

#2 Data Silos are the Root Cause of Poor Customer Service  –

Across industries like Banking, Insurance, Telecom & Manufacturing, the ability to get a unified view of the customer & their journey is at the heart of the enterprise's ability to understand customer preferences & needs. This is also crucial in promoting relevant offerings and in detecting customer dissatisfaction. Currently, most enterprises are woefully inadequate at putting together this comprehensive Single View of their Customers (SVC). Due to operational silos, each department possesses its own siloed & limited view of the customer across channels. These views are typically inconsistent as they lack synchronization with other departments. The net result is that companies miss a high number of potential cross-sell and up-sell opportunities.

#3 – Data Silos produce Inaccurate Analytics 

First off, most Analysts have to wait a long time to acquire the data they need to test their hypotheses. And since the data they work with is of poor quality as a result of fragmentation, so are the analytics that operate on it.

Let us take an example from Banking. Mortgage lending, an already complex business process, has been made even more so by the data silos built around Core Banking, Loan Portfolio and Consumer Lending applications. Qualifying borrowers for mortgages needs to be based not just on the historical data used in the origination & underwriting process (credit reports, employment & income history etc.) but also on data that has not been mined hitherto (social media data, financial purchasing patterns etc.). It is a well known fact that there are huge segments of the population (especially millennials) who are broadly eligible but under-banked, as they do not satisfy some of the classical business rules needed to obtain mortgage approvals. Each silo stores only partial customer data. Thus, Banks do not possess an accurate and holistic picture of a customer's financial status and are unable to qualify the customer for a mortgage quickly and at the best available customized rate.
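As a rough illustration (not a depiction of any real bank's systems), the pandas snippet below joins hypothetical core-banking and consumer-lending extracts on a customer key. Because each silo only knows about some customers, the merged "single view" ends up riddled with gaps – exactly the incompleteness that undermines downstream analytics.

```python
import pandas as pd

# Hypothetical extracts from two siloed applications
core_banking = pd.DataFrame({
    "customer_id": ["C-100", "C-101", "C-102"],
    "avg_balance": [5200.0, 310.0, 18750.0],
})
consumer_lending = pd.DataFrame({
    "customer_id": ["C-101", "C-103"],      # C-100 and C-102 are missing here
    "credit_score": [690, 742],
})

# Attempt to assemble a "single view" for mortgage qualification
single_view = core_banking.merge(consumer_lending, on="customer_id", how="outer")
print(single_view)
# Rows with NaN in either column represent customers no single silo fully describes,
# so any eligibility or risk model built on this view inherits the gaps.
```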

#4 – Data Silos hinder the creation of new Business Models  


The abundance of data created over the last decade is changing the nature of business. If enterprise businesses are increasingly built around data assets, then it naturally follows that data as a commodity can be traded or reimagined to create new revenue streams. As an example, pioneering payment providers now offer retailers analytical services to help them understand which products perform best and how to improve the micro-targeting of customers. Thus, data is the critical prong of any digital initiative. This has led to efforts to monetize data by creating platforms that support ecosystems of capabilities. To vastly oversimplify the discussion, the ability to monetize data needs two prongs – centralizing it in the first place, and then performing strong predictive modeling at large scale, where systems constantly learn and optimize their interactions, responsiveness & services based on client needs & preferences. Data Silos hurt this overall effort more than the typical enterprise can imagine.

#5 – Data Silos vastly magnify Cyber, Risk and Compliance challenges – 

Enterprises have to perform a range of back-office functions such as Risk Data Aggregation & Reporting, Anti Money Laundering Compliance and Cyber Security Monitoring.

Cybersecurity – The biggest threat to the Digital Economy..(1/4)

It naturally follows that as more and more information assets are stored across the organization, it becomes a manifold headache to secure each and every silo from a range of bad actors – extremely well funded and sophisticated adversaries ranging from criminals to cyber thieves to hacktivists. On the business compliance front, sectors like Banking & Insurance need to maintain large AML and Risk Data Aggregation programs – silos are the bane of both. Every industry also needs fraud detection capabilities, which in turn need access to unified data.

Conclusion

My intention for this post is to raise more questions than provide answers. There is no question that Digital Platforms are a massive business differentiator, but they need access to an underlying store of high quality, curated, and unified data to perform their magic. Industry leaders need to begin treating high quality Data as the most important business asset they have & to work across the organization to rid it of Silos.

References..

[1]  The Economist – “The world’s most valuable resource is no longer oil, but data” – http://www.economist.com/news/leaders/21721656-data-economy-demands-new-approach-antitrust-rules-worlds-most-valuable-resource

[2] Definition of Silo Mentality – http://www.businessdictionary.com/definition/silo-mentality.html

Here Is What Is Causing The Great Brick-And-Mortar Retail Meltdown of 2017..(1/2)

“Amazon and other pure plays are driving toward getting both predictive and prescriptive analytics. They’re analyzing and understanding information at an alarming rate. Brands have pulled products off of Amazon because they’re learning more about them than the brands themselves.” — Todd Michaud, Founder and CEO of Power Thinking Media

By April 2017, 17 major retailers had announced plans to close stores (Image Credit: Clark Howard)

We are witnessing a meltdown in Storefront Retail..

We are barely halfway through 2017, and the US business media is rife with stories of major retailers closing storefronts. The truth is inescapable: the Retail industry is in the midst of structural change. According to a research report from Credit Suisse, around 8,600 brick-and-mortar stores will shutter their doors in 2017. The comparable numbers were 2,056 stores in 2016 and 5,077 in 2015, which points to industry malaise [1].

The Retailer’s bigger cousin – the neighborhood Mall – is not doing any better. There are around 1,200 malls in the US today and that number is forecast to decline to just about 900 in a decade.[3]

It is clear that in the coming years, Retailers (and malls) across the board will remain under pressure due to a variety of changes – technological, business model and demographic.

So what can legacy Retailers do to compete with and disarm the online upstarts?

Six takeaways for Retail Industry watchers..

Six takeaways from the recent headlines that should make industry watchers take notice –

  1. The brick and mortar retail store pullback has accelerated in 2017 – a year of otherwise strong economic expansion. Typical indicators that influence consumer spending on retail are generally pointing upwards. Just sample the financial data – the US has seen increasing GDP for eight straight years, the last 18 months have seen wage growth for middle & lower income Americans, and gas prices are at all time lows. [3] These relatively strong consumer data trends cannot explain a slowdown in physical storefronts. Consumer spending is not shrinking due to declining affordability or spending power.
  2. Retailers that have either declared bankruptcy or announced large scale store closings include marquee names across the different categories of retail – ranging from Apparel to Home Appliances to Electronics to Sporting Goods. Just sample some of the names – Sports Authority, RadioShack, HHGregg, American Apparel, Bebe Stores, Aeropostale, Sears, Kmart, Macy’s, Payless Shoes, JC Penney etc. This is clearly a trend across various sectors of retail and not confined to a given area such as women’s apparel.
  3. Some of this “Storefront Retail bubble burst” can definitely be attributed to hitherto indiscriminate physical retail expansion. The first indicator here is the glut of residual retail space. The WSJ points out that the expansion dates back almost 30 years, to when retailers began a “land grab” to open more stores – not unlike the housing boom a decade or so ago. [1] North America now has a glut of both retail stores and shopping malls while per capita sales have begun declining. The US has almost five times the retail space per capita of the UK. American consumers are also swapping materialism for experiences. [3] Thus, an over-buildout of retail space is one of the causes of the ongoing crash.

    The US has way more shopping space compared to the rest of the world. (Credit – Cowan and Company)
  4. The dominant retail trend in the world is online ‘single click’ shopping. This is evidenced by declining in-store Black Friday sales in 2016 compared with record Cyber Monday (online) sales. As online e-commerce volume increases year on year, online retailers led by Amazon are taking market share away from struggling brick-and-mortar Retailers who have not kept up with the pace of innovation. The uptick in online retail is unmistakable, as evidenced by the below graph (src – ZeroHedge) depicting the latest retail figures. Department-store sales rose 0.2% on the month, but were down 4.5% from a year earlier. Online retailers such as Amazon posted a 0.6% gain from the prior month and an 11.9% increase from a year earlier. [3]

    Retail Sales – Online vs In Store Shopping (credit: ZeroHedge)
  5. Legacy retailers are trying to play catch-up with the upstarts who excel at technology. This has sometimes translated into acquisitions of online retailers (e.g. Walmart’s purchase of Jet.com). The Global top 10 Retailers are still dominated by the likes of Walmart, Costco, Kroger and Walgreens; Amazon comes in only at #10, which implies that this battle is only in its early days. However, legacy retailers are saddled with huge fixed costs & their investors prefer dividend payouts to investments in innovation. Their CEOs are thus incentivized to focus on the next quarter, not the next decade like Amazon’s Jeff Bezos, who has famously been willing to trade near-term profitability for long-term growth. Though traditional retailers have begun accelerating investments (both organic and via acquisition) in the critical areas of Cloud Computing, Big Data, Mobility and Predictive Analytics, the web scale majors such as Amazon are far ahead of the typical Retail IT shop.

  6. The fastest growing Retail brands are companies that use Data as a core business capability to shape the customer experience, rather than treating it as just another component of an overall IT system. Retail is a game of micro customer interactions that drive sales and margin. This implies a Retailer’s ability to work with realtime customer data – whether it’s sentiment data, clickstream data or historical purchase data – to drive marketing promotions, personally relevant services, order fulfillment, show-rooming, loyalty programs etc. (a rough sketch of this idea follows the list). On the back end, the ability to streamline operations by pulling together data from operations and supply chains is helping retailers fine-tune & automate operations, especially from a delivery standpoint.
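As a toy illustration of the point above (all signals, weights and thresholds are invented for the example, not taken from any retailer's system), the snippet below blends clickstream views, social sentiment and purchase history into a crude propensity score that decides whether to trigger a personalized promotion.

```python
# Hypothetical per-customer signals gathered from clickstream, social sentiment
# and order history systems.
customers = {
    "C-100": {"views_last_7d": 14, "sentiment": 0.8,  "purchases_90d": 3},
    "C-101": {"views_last_7d": 2,  "sentiment": -0.4, "purchases_90d": 0},
}

def promo_score(signals):
    """Blend engagement, sentiment and purchase recency into a crude propensity score."""
    return (0.05 * min(signals["views_last_7d"], 20)
            + 0.3 * max(signals["sentiment"], 0)
            + 0.1 * min(signals["purchases_90d"], 5))

for cust_id, signals in customers.items():
    score = promo_score(signals)
    action = "send personalized offer" if score > 0.5 else "no action"
    print(f"{cust_id}: score={score:.2f} -> {action}")
```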

    In Retail, Technology Is King..

So, what makes Retail somewhat unique as an industry in terms of its data needs? I posit that there are four important characteristics –

• First and foremost, Retail customers, especially millennials, are very open about sharing their brand preferences and experiences on social media. There is a treasure trove of untapped data out there that needs to be collected and monetized. We will explore this in more detail in the next post.
• Secondly, leaders such as Amazon use customer and product data plus a range of other technology capabilities to shape the customer experience, versus the other way around for traditional retailers. They do this using predictive analytic approaches such as machine learning and deep learning. Case in point: Amazon has morphed from an online retailer into a Cloud Computing behemoth with its market leading AWS (Amazon Web Services). In fact, its best in class IT has enabled it to experiment with retail business models – e.g. the $99-a-year Amazon Prime subscription, which includes free two-day delivery plus music and video streaming services that compete with Netflix. As of March 31, 2017 Amazon had 80 million Prime subscribers in the U.S., an increase of 36 percent from a year earlier, according to Consumer Intelligence Research Partners. [3]
• Thirdly, Retail organizations need to become data driven businesses. What does that mean or imply? They need to rely on data to drive every core business process – e.g. realtime insights about customers, supply chains, order fulfillment and inventory. This data spans everything from traditional structured data (sales data, store level transactions, customer purchase histories, supply chain data, advertising data etc.) to non-traditional data (social media feeds – there is a strong correlation between the products people rave about and what they ultimately purchase – location data, economic performance data etc.). This data variety represents a huge challenge to Retailers in terms of managing, curating and analyzing these feeds.
• Fourth, Retail needs to begin aggressively adopting IoT capabilities in the area of Predictive Analytics. This implies tapping and analyzing data from in-store beacons, sensors and actuators across a range of use cases, from location based offers to restocking shelves (a minimal sketch follows below).
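To make the fourth point concrete, here is a deliberately simple sketch of how streaming shelf-sensor readings might trigger restocking alerts. The sensor payloads, threshold and SKU names are all hypothetical; a production system would run this kind of logic on a streaming engine (such as Apache Spark or Storm, mentioned elsewhere in this blog) against live telemetry.

```python
# Hypothetical shelf-level sensor readings streamed from in-store devices.
readings = [
    {"store": "S-01", "sku": "SKU-123", "units_on_shelf": 3},
    {"store": "S-01", "sku": "SKU-456", "units_on_shelf": 24},
    {"store": "S-02", "sku": "SKU-123", "units_on_shelf": 0},
]

RESTOCK_THRESHOLD = 5  # invented threshold for the example

def restock_alerts(stream, threshold=RESTOCK_THRESHOLD):
    """Yield an alert for every shelf whose stock has fallen below the threshold."""
    for r in stream:
        if r["units_on_shelf"] < threshold:
            yield f"Restock {r['sku']} at store {r['store']} ({r['units_on_shelf']} left)"

for alert in restock_alerts(readings):
    print(alert)
```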

      ..because it enables new business models..

None of the above analysis claims that physical stores are going away. They serve a very important function in allowing consumers to try on products and in providing a human experience. However, online is definitely where the growth will primarily be.

      The Next and Final Post in this series..

      It is very clear from the above that it now makes more sense to talk about a Retail Ecosystem which is composed of store, online, mobile and partner storefronts.

      In that vein, the next post in this two part series will describe the below four progressive strategies that traditional Retailers can adopt to survive and favorably compete in today’s competitive (and increasingly online) marketplace.

      These are –

    • Reinventing Legacy IT Approaches – Adopting Cloud Computing, Big Data and Intelligent Middleware to re-engineer Retail IT

    • Changing Business Models by accelerating the adoption of Automation and Predictive Analytics – Increasing Automation rates of core business processes and infusing them with Predictive intelligence thus improving customer and business responsiveness

• Experimenting with Deep Learning Capabilities – the use of Advanced AI such as Deep Neural Nets to impact the entire lifecycle of Retail

    • Adopting a Digital or a ‘Mode 2’ Mindset across the organization – No technology can transcend a large ‘Digital Gap’ without the right organizational culture

Needless to say, the theme across all of the above strategies is to leverage Digital technologies to create immersive cross channel customer experiences.

References..

[1] WSJ – “Three hard lessons the internet is teaching traditional stores” – https://www.wsj.com/articles/three-hard-lessons-the-internet-is-teaching-traditional-stores-1492945203

[2] The Atlantic – “The Retail Meltdown” – https://www.theatlantic.com/business/archive/2017/04/retail-meltdown-of-2017/522384/?_lrsc=2f798686-3702-4f89-a86a-a4085f390b63

[3] WSJ – “Retail Sales fall for the second straight month” – https://www.wsj.com/articles/u-s-retail-sales-fall-for-second-straight-month-1492173415

How the Industrial Internet of Things (IIoT) Digitizes Industrial Manufacturing..

In 2017, the chief strategic concerns for Global Product Manufacturers are manifold. They range from the ability to drive growth in new markets by creating products that younger customers need, to cutting costs via efficient high volume manufacturing spanning global supply chains, to effective distribution and service. While the traditional lifecycle has always been a huge management challenge, the question now is how digital technology can help create new markets and drive higher margins in established areas. In this blogpost, we will consider how IIoT (Industrial Internet of Things) technology can do all of the above and foster new business models – by driving customer value on top of the core product.

Global Manufacturing is evolving from an Asset based industry to an Information based Digital industry. (Image Credit – GE)

A Diverse Industry Caught in Digital Dilemmas..

The last decade has seen tectonic changes in the leading manufacturing economies. Along with a severe recession, employment in the industry has moved along the technology curve to a more skilled workforce. The services component of the industry is also steadily increasing, i.e. manufacturing now both consumes business services and is sold as a service in certain sectors. The point is well made that this industry is not monolithic and that there are distinct sectors with their own specific drivers for business success [1].

The diverse sectors within Global Manufacturing (McKinsey [1])

Global manufacturing operations have evolved differently across industry segments. McKinsey identifies five diverse segments across the industry –

  1. Global innovators for local markets – Industries such as Chemicals, Auto, Heavy Machinery etc.
  2. Regional processing – Rubber and Plastics products, Tobacco, Fabricated Metal products etc.
  3. Energy intensive commodities – Industries supplying Wood products, Petroleum and Coke refining, and Mineral based products
  4. Global technologies and innovators – Industries supplying Semiconductors, Computers and Office machinery
  5. Labor intensive tradables – These include Textiles, Apparel, Leather, Furniture, Toys etc.

Each of the above five sectors has different geographical locations where production takes place; each has its own supply chains, support models, efficiency requirements and technological focus areas. These industries all face varying competitive forces.

However, the trend that is broadly applicable to all of them is the “Industrial Internet”.

Defining the Industrial Internet Of Things (IIoT)

The Industrial Internet of Things (IIoT) can be defined as an ecosystem of capabilities that interconnects machines, personnel and processes to optimize the industrial lifecycle. The foundational technologies that IIoT leverages are Smart Assets, Big Data, Realtime Analytics, Enterprise Automation and Cloud based services.

The primary industries impacted the most by the IIoT will include Industrial Manufacturing, the Utility industry, Energy, Automotive, Transportation, Telecom & Insurance.

Globally integrated manufacturers must constantly assess and fine-tune their strategy across the eight lifecycle stages discussed below. A key aspect is being able to collect data throughout the process to derive real-time insights from the lifecycle, suppliers and customers. IoT technologies allied with Big Data techniques provide ways to store this data and to derive real-time & historical analytic insights. The Manufacturing industry is thus moving to an entirely virtual world across its lifecycle, ranging from product development, to customer demand monitoring, to production, to inventory management. This trend is being termed Industry 4.0 or Connected Manufacturing. As devices & systems become more interactive and intelligent, the data they send out can be used to optimize the lifecycle across the value chain, thus driving higher utilization of plant capacity and improved operational efficiencies.

Let us consider the impact of the IIoT across the lifecycle of Industrial Manufacturing.

IIOT moves the Manufacturing Industries from Asset Centric to Data Centric

The Industrial Internet of Things (IIoT) is a key enabler in digitizing the legacy manufacturing lifecycle. IIoT, Big Data and Predictive Analytics enable Manufacturers to reinvent their business models.

The Generic Product Manufacturing Lifecycle Overview depicted in the above illustration covers the most important activities that take place in the manufacturing process. Please note that this is a high level overview; in future posts we will expand upon each stage.

The overall lifecycle can be broken down into the following eight steps:

  1. Globally Integrated Product Design
  2. Prototyping and Pre-Production
  3. Mass production
  4. Sales and Marketing
  5. Product Distribution
  6. Activation and Support
  7. Value Added Services
  8. Resale and Retirement

Industry 4.0/ IIoT impacts Product Design and Innovation

IIoT technology can have a profound impact on the above traditional lifecycle in the following ways –

  1. The ability to connect the different aspects of the value chain that have hitherto been disconnected. This will fundamentally transform the asset lifecycle, leading to higher manufacturing efficiencies, reduced wastage and more customer centric manufacturing (thus reducing recall rates).
  2. The ability to manage and integrate diverse data – sensor feeds, machine data from operational systems, supplier channels & social media feedback – to drive real time insights.
  3. A connected asset lifecycle that leads to better inventory management and drives optimal resupply decisions.
  4. The ability to create new business models that leverage data across the lifecycle to enable better product usage, pay-for-performance or outcome based services, or even a subscription based usage model.
  5. The ability to track real time insights across the customer base, leading to a more optimized asset lifecycle.
  6. Reduced costs, by allowing more operations – ranging from product maintenance to product demos and customer experience sessions – to occur remotely. (A minimal telemetry analytics sketch follows this list.)
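As a small, purely illustrative sketch of points 2 and 5 above (machine names, readings and the threshold are invented), the code below flags assets whose latest vibration reading drifts far above their own recent baseline – the kind of simple telemetry rule that a fuller IIoT analytics pipeline would replace with proper predictive maintenance models.

```python
from statistics import mean, stdev

# Hypothetical vibration telemetry (mm/s) collected per machine over recent cycles.
telemetry = {
    "press-01": [2.1, 2.2, 2.0, 2.3, 2.1, 4.9],   # last reading spikes
    "press-02": [1.8, 1.9, 1.7, 1.8, 1.9, 1.8],
}

def flag_anomalies(history, z_threshold=3.0):
    """Flag machines whose latest reading is far outside their own recent baseline."""
    flagged = []
    for machine, readings in history.items():
        baseline, latest = readings[:-1], readings[-1]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and (latest - mu) / sigma > z_threshold:
            flagged.append((machine, latest))
    return flagged

for machine, reading in flag_anomalies(telemetry):
    print(f"Schedule inspection for {machine}: latest vibration {reading} mm/s")
```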

Manufacturers have been connecting the value chain together for many years now. M2M (machine to machine) implementations have already led to rounds of improvements in the so-called ‘ilities’ metrics – productivity, quality, reliability etc. The real opportunity with IIoT is being able to create new business models that result from the convergence of Operational Technology (OT) with Information Technology (IT). This journey primarily consists of taking a brick and mortar industry and slowly turning it into a data driven industry.

The benefits of adopting IIoT range from improved quality (owing to better aligned, efficient and data driven processes), to higher overall operational efficiency, to products better aligned with changing customer requirements, to tighter communication across interconnected products and supplier networks.

Deloitte has an excellent take on the disruption ongoing in manufacturing ecosystems and holds all of the below terms as synonymous – [2]

  • Industrial Internet

  • Connected Enterprise

  • SMART Manufacturing

  • Smart Factory

  • Manufacturing 4.0

  • Internet of Everything

  • Internet of Things for Manufacturing

Digital applications are already being designed for specific device endpoints by thought leaders across manufacturing industries such as the Automakers. While the underlying mechanisms and business models differ across the above five manufacturing segments, all of the new age Digital applications leverage Big Data, Cloud Computing and Predictive Analytics at a minimum. Predictive Analytics is largely based on a combination of real time data processing & data science algorithms. These techniques extract insights from streaming data to provide digital services on existing toolchains, provide value added customer service, predict device performance & failures, improve operational metrics etc.

Examples abound. An excellent example in manufacturing is the notion of a Digital Twin, which Gartner called out last year in their disruptive trends for 2017. A Digital Twin is a software personification of an intelligent device or system. It forms a bridge between the real world and the digital world. In the manufacturing industry, digital twins can be set up to function as proxies of things like sensors and gauges, coordinate measuring machines, vision systems, and white light scanning. This data is sent to a cloud based system where it is combined with historical data to better maintain the physical system.

The wealth of data being gathered on the shop floor will ensure that Digital Twins are used to reduce costs and increase innovation. Thus, in global manufacturing, data science will soon make its way onto the shop floor to enable the collection of insights from these software proxies.
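The class below is a bare-bones sketch of the digital twin idea described above – a software object that mirrors a physical asset's latest readings and compares them against a historical baseline. The attribute names and the drift rule are invented for illustration; real digital twins on commercial IIoT platforms are far richer models.

```python
from statistics import mean

class DigitalTwin:
    """A software proxy that mirrors a physical asset's sensor state."""

    def __init__(self, asset_id, baseline_temps):
        self.asset_id = asset_id
        self.baseline = mean(baseline_temps)   # historical average temperature
        self.latest = None

    def sync(self, reading):
        """Update the twin with the latest telemetry from the physical asset."""
        self.latest = reading

    def drift(self):
        """How far the current state has drifted from the historical baseline."""
        return None if self.latest is None else self.latest - self.baseline

# Hypothetical usage: twin of a CNC spindle whose temperature is creeping up.
twin = DigitalTwin("cnc-spindle-7", baseline_temps=[62.0, 63.5, 61.8, 62.7])
twin.sync(71.2)
print(f"{twin.asset_id} drift from baseline: {twin.drift():.1f} deg C")
```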

What About the Technical Architecture..

For those readers inclined to follow the technology arc of this emerging trend, the below blogpost discusses an IIoT Reference Architecture to a great degree of technical depth –

A Digital Reference Architecture for the Industrial Internet Of Things (IIoT)..

References

  1. McKinsey & Company  – Global Manufacturing Outlook 2017 – http://www.mckinsey.com/business-functions/operations/our-insights/the-future-of-manufacturing
  2. Deloitte Press on Manufacturing Ecosystems – https://dupress.deloitte.com/dup-us-en/focus/industry-4-0/manufacturing-ecosystems-exploring-world-connected-enterprises.html

My take on Gartner’s Top 10 Strategic Technology Trends for 2017

“We’re only at the very, very beginning of this next generation of computing and I think that every industry leader will be the ones that transforms first. I don’t care what industry you’re talking about” – Kim Stevenson, CIO, Intel, Feb 2016

Gartner Research rolled out their “Top 10 Strategic Technology Trends for 2017” report a few weeks ago. My goal for this blogpost is to introduce these trends to the reader and to examine the potential impact of their recommendations from an enterprise standpoint.


Gartner’s Strategic Trends for 2017

# 1: AI & Advanced Machine Learning

Gartner rightly forecasts that AI (Artificial Intelligence) and Advanced Machine Learning will continue their march into daily applications run by the Fortune 1000. CIOs are coming to realize that most business problems are primarily data challenges. The rapid maturation of scalable processing techniques allows us to extract richer insights from data. What we commonly refer to as Machine Learning – applied in combination with econometrics, statistics, visualization, and computer science – helps extract valuable business insights hiding in data and build operational systems to deliver that value.

Deep Learning involves discovering insights in data in a layered, human-like fashion. We are thus clearly witnessing the advent of modern data applications. These applications will leverage a range of advanced techniques spanning Artificial Intelligence and Machine Learning (ML), including neural networks, natural language processing and deep learning.

Implications for industry CIOs – Modern data applications understand their environment (e.g. customer preferences and other detailed data insights), predict business trends in real time & take action on them to drive revenue and decrease business risk. These techniques will enable applications and devices to operate in an even smarter manner while saving companies enormous amounts of money on manual costs.

http://www.vamsitalkstech.com/?p=407

# 2: Intelligent Apps

Personal assistants, e.g. Apple Siri and Microsoft Cortana – the category of virtual personal assistants (VPAs) – have begun making everyday business processes easier for their users. VPAs represent the intersection of AI, conversational interfaces and integration into business processes. In 2017, these will begin improving customer experiences for the largest Fortune 100 enterprises. On the more personal front, home VPAs will rapidly evolve & become even smarter as their algorithms become more capable and more aware of their environments. We will see increased application of smart agents in diverse fields like financial services, healthcare, telecom and media.

Implications for industry CIOs – Get ready to invest in intelligent applications in the corporate intranet to start with.

# 3: Intelligent Things

The rise of the IoT has been well documented; couple AI with massive data processing capabilities and you get Intelligent Things, which can interact with humans in new ways. Add to that a whole category of things around transportation (self driving cars, connected cars), robots that perform key processes in industrial manufacturing, drones etc.

Implications for industry CIOs – These intelligent devices will increasingly begin communicating with their environments in a manner that will encourage collaboration in a range of business scenarios. 2017 should begin the trend of these devices communicating with each other to form the eponymous ‘Digital Mesh’.

# 4: Virtual & Augmented Reality

Virtual reality (VR) and augmented reality (AR) are beginning to completely change the way humans interact with one another and with the intelligent systems that make up the Digital Mesh. Pokemon GO & Oculus Rift, debuting in 2016, were the first hugely successful consumer facing AR and VR applications respectively. Uses of these technologies will include gamification (to improve customer engagement with products and services) and other customer & employee facing applications. While both technologies enable us to view the world in different ways, AR is remarkable in its ability to add to our current reality. BMW’s subsidiary Mini has actually developed driving goggles with AR technology [1].

Implications for industry CIOs – This one is still on the drawing board for most verticals but it does make sense to invest in areas like gamification and in engaging with remote employees using AR.

# 5: Digital Twin

A Digital Twin is a software personification of an intelligent thing or system. In the manufacturing industry, digital twins can be set up to function as proxies of things like sensors and gauges, coordinate measuring machines, lasers, vision systems, and white light scanning [2]. The wealth of data being gathered on the shop floor will ensure that Digital Twins are used to reduce costs and increase innovation. Data science will soon make its way onto the shop floor to enable the collection of insights from these software proxies.

Implications for industry CIOs – Invest in Digital capabilities that serve as proxies for physical things.

# 6: Blockchain

The term Blockchain is derived from a design pattern that describes a chain of data blocks that map to individual transactions. Each transaction conducted in the real world (e.g. a Bitcoin transfer) results in the creation of new blocks in the chain. Each new block embeds a cryptographic hash of the previous block, thus constructing a chain of blocks – hence the name.

Blockchain is a distributed ledger technology (DLT) which allows global participants to conduct secure transactions of any type – banking, music purchases, legal contracts, supply chain transactions etc. Blockchain will transform multiple industries in the years to come. Bitcoin is the first application of Blockchain.
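To illustrate just the hash-linking idea described above (and nothing of real consensus, mining or networking), here is a toy Python sketch in which each block stores the SHA-256 hash of its predecessor, so tampering with any earlier block breaks the chain.

```python
import hashlib
import json

def block_hash(block):
    """Deterministically hash a block's contents."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def add_block(chain, transactions):
    """Append a new block that embeds the hash of the previous block."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "prev_hash": prev_hash, "txns": transactions})
    return chain

def is_valid(chain):
    """Verify every block still points at the correct hash of its predecessor."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

chain = []
add_block(chain, ["alice pays bob 5"])
add_block(chain, ["bob pays carol 2"])
print(is_valid(chain))                       # True
chain[0]["txns"] = ["alice pays bob 500"]    # tamper with history
print(is_valid(chain))                       # False - the link to block 0 no longer matches
```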

How the Blockchain will lead disruption across industry..(5/5)

Implications for industry CIOs – Begin expanding internal knowledge on Blockchain and as to how it can potentially augment or disrupt your vertical industry.

# 7: Conversational Systems

Mobile applications first began forcing enterprises to support multiple channels of interaction with their consumers. For example, Banking now requires the ability to engage consumers in a seamless experience across an average of four to five channels – Mobile, eBanking, Call Center, Kiosk etc. Conversational Systems take these interactions to the next level and enable humans to communicate with a wide range of Intelligent Things using a range of modalities – speech, touch, vision etc.

Implications for industry CIOs – Every touch point matters, and those leading the smart agent transformation should constantly be asking how organizations are removing friction and enhancing the experience for every customer regardless of where they are in the journey.

# 8: Mesh App and Service Architecture

This one carries over from last year. The Digital Mesh leads to an interconnected information deluge encompassing classical IoT endpoints along with audio, video & social data streams. The creation of these smart services will further depend on the vertical industries that these products serve as well as the requirements of the platforms that host them – e.g. industrial automation, remote healthcare, public transportation, connected cars, home automation etc. The microservices architecture approach – autonomous, cooperative yet loosely coupled applications built as a conglomeration of business focused services – is a natural fit for the Digital Mesh. The most important additive consideration for microservices based architectures in the age of the Digital Mesh is what I’d like to term Analytics Everywhere.

Implications for industry CIOs – The mesh app will require a microservices based architecture that supports multichannel & multi device solutions.

# 9: Digital Technology Platforms

The onset of Digital Architectures in enterprise businesses implies the ability to drive continuous micro level interactions with global consumers/customers/clients/stockholders or patients, depending on the vertical you operate in. More information on the core building blocks of Digital Technology Platforms can be found in the below blogpost.

Implications for industry CIOs

http://www.vamsitalkstech.com/?m=201609

# 10: Adaptive Security Architecture

The evolution of the intelligent digital mesh and of digital technology platforms and application architectures means that security has to become fluid and adaptive. Traditional solutions cannot handle this challenge, which is exacerbated by the expectation that in an IoT & Digital Mesh world, data flows will be multidirectional across a grid of application endpoints.

Implications for industry CIOs – Expect to find applications in 2016 and beyond incorporating Deep Learning and Real Time Analytics into their core security design, with a view to analyzing large scale data at very low latency. Security in the IoT environment is particularly challenging. Security teams need to work with application, solution and enterprise architects to build security into the overall DevOps process, creating a DevSecOps model.

Conclusion..

In this year’s edition, Gartner is clearly forecasting the future ten years out from a mass market standpoint. As we cross this chasm over the next ten years, we will see IoT emerge and take center stage in every industry vertical. Digital transformation will happen through apps created for, and brought together by, Smart Agents on the Device Mesh.

These apps will gradually become autonomous, data intensive, server-less, hopefully secure and location independent (data center or cloud). An app can be a sensor, a connected car or a digital twin for a manufacturing technician. So it’s not just about a single app sitting in a data center, in the cloud or on the machine itself. These smart agent apps will be data driven, components of a larger mesh, interconnected using open interfaces, and resident wherever it is optimal for realtime analytics. This may seem like science fiction for the Fortune 1000 enterprise but it is manifest reality at the web scale innovators. The industry will have no choice but to follow.

References..

[1] Cramer – “A lesson in Augmented Realities” –  http://cramer.com/story/the-difference-between-ar-and-vr/

[2] Dr.Michael Grieves – “Digital Twin: Manufacturing Excellence through Virtual Factory Replication” – http://innovate.fit.edu/plm/documents/doc_mgr/912/1411.0_Digital_Twin_White_Paper_Dr_Grieves.pdf

How Big Data & Predictive Analytics transform AML Compliance in Banking & Payments..(2/2)

The first blog in this two part series (Deter Financial Crime by Creating an Effective AML Program) described how the Money Laundering (ML) techniques employed by nefarious actors (e.g. drug cartels, corrupt public figures & terrorist organizations) have gotten more sophisticated over the years. Global and regional banks are falling short of their compliance goals despite huge technology and process investments. Banks that fail to maintain effective compliance are typically fined hundreds of millions of dollars. In this second & final post, we will examine why Big Data Analytics, as a second generation effort, can become critical to shutting down the flow of illicit funds across the globe and to ensuring that financial organizations remain compliant.

Where current enterprisewide AML programs fall short..

As discussed in various posts and in the first blog in the series (below), the Money Laundering (ML) rings of today have a highly sophisticated understanding of the business specifics across the domains of Banking – Capital Markets, Retail & Commercial banking. They are also very well versed in the complex rules that govern global trade finance.

Deter Financial Crime by Creating an Effective Anti Money Laundering (AML) Program…(1/2)

Further, the more complex and geographically diverse a financial institution is, the higher its risk of AML (Anti Money Laundering) compliance violations. Other factors – such as an enormous volume of transactions across multiple distribution channels and geographies between thousands of counter-parties – always increase money laundering risk.

Most current AML programs fall short in five specific areas –

  1. Manual Data Collection & Risk Scoring – Banks' response to AML statutes has been to bring in more staff, typically numbering in the hundreds at large banks. These staff perform rote but key AML processes such as Customer Due Diligence (CDD) and Know Your Customer (KYC). They extensively scour external sources like LexisNexis, Thomson Reuters, D&B etc. to manually score risky client entities, often pairing these with internal bank data. They also use AML watch-lists to verify individuals and business customers so that AML Case Managers can review the results before filing Suspicious Activity Reports (SARs). On average, about 50% of the cost of AML programs is incurred in these large headcount requirements. At large global banks with more than 100 million customer accounts, the data volumes get very big very quickly, causing all kinds of headaches for AML programs from a data aggregation, storage, processing and accuracy standpoint. There is a crying need to automate AML programs end to end, not only to perform accurate risk scoring but also to keep costs down.
  2. Social Graph Analysis – in areas such as trade finance, graph analysis helps model the complex transactions occurring between thousands of entities. Each of these entities may have a complex holding structure with accounts that have been created using forged documents. Most fraud also happens in networks of fraud. An inability to dynamically understand the topology of the financial relationships among thousands of entities implies that AML programs need to develop graph based analysis capabilities (a minimal sketch follows this list).
  3. AML programs extensively deploy rule based systems or Transaction Monitoring Systems (TMS) which allow an expert-system style approach to setting up new rules. These rules span areas like monetary thresholds, specific patterns that connote money laundering & business scenarios that may violate these patterns. However, fraudster rings now learn (or already know) these rules and change their methods constantly to avoid detection. Thus there is a significant need to reduce the high degree of dependence on traditional TMS, which are slow to adapt to the dynamic nature of money laundering.
  4. The need to perform extensive Behavioral Modeling & Customer Segmentation to discover transaction behavior, with a view to identifying the behavioral patterns of entities & the outlier behaviors that connote potential laundering.
  5. Real time transaction monitoring in areas like Payment Cards presents unique challenges, as money laundering is hidden within mountains of transaction data. Every piece of data produced as a result of bank operations needs to be commingled with historical data sets (for customers under suspicion) spanning years when making a judgment call about filing a SAR (Suspicious Activity Report).
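Referring back to point 2, the snippet below is a minimal sketch (using the open-source networkx library, not any vendor TMS) of how wire transfers can be modeled as a graph so that tightly connected clusters of accounts – a common laundering signature – can be surfaced for investigation. The account names and the "large cycle is suspicious" rule are invented for illustration.

```python
import networkx as nx

# Hypothetical wire transfers: (sender, receiver, amount)
transfers = [
    ("acct_A", "acct_B", 9500), ("acct_B", "acct_C", 9400),
    ("acct_C", "acct_A", 9300), ("acct_D", "acct_E", 120),
]

G = nx.DiGraph()
for sender, receiver, amount in transfers:
    G.add_edge(sender, receiver, amount=amount)

# Strongly connected components capture groups of accounts that cycle funds
# among themselves - a pattern worth escalating to an AML analyst.
for component in nx.strongly_connected_components(G):
    if len(component) >= 3:
        print("Possible layering ring:", sorted(component))
```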

How Big Data & Predictive Analytics can help across all these areas..


  1. The first area where Big Data & Predictive Analytics have a massive impact is around Due Diligence & KYC (Know Your Customer) data. All of the data scraping from various sources discussed above can easily be automated by using tools in a Big Data stack to ingest information automatically. This is done by sending requests to data providers (the exact same ones that banking institutions currently use) via an API. Once this data is obtained, real time processing tools (such as Apache Storm and Apache Spark) can apply sophisticated algorithms to the collected data to calculate a Risk Score or Rating. In Trade Finance, Text Analytics can be used to process a range of documents like invoices, bills of lading, certificates of shipping etc. to enable Banks to inspect a complex process spanning hundreds of entities operating across countries. This approach enables Banks to process massive amounts of diverse data quickly (even in seconds) and synthesize it into accurate risk scores. Implementing Big Data in this very important workstream can increase efficiency and reduce costs.
  2. The second area where Big Data shines is in helping create a Single View of a Customer, as depicted below. This is made possible by advanced entity matching, with the establishment and adoption of a lightweight entity ID service. This service consists of entity assignment and batch reconciliation. The goal is to get each business system to propagate the Entity ID back into its Core Banking, loan and payment systems; transaction data then flows into the lake with this ID attached, providing a way to build a Customer 360 view.
  3. To be clear, we are advocating for a mix of both business rules and Data Science. Machine Learning is recommended as it enables a range of business analytics across AML programs, overcoming the limitations of a TMS. The first use case is exploratory, hypothesis-driven Data Science – get all transactions in one place, all the Case Management files in one place, all the customer data in one place and all external data in one place. The goal is to uncover areas of risk that were possibly missed before, find areas that were not as risky as previously thought (so the risk score can be lowered), and constantly discover the real risk profile that your institution bears – e.g. downgrading investment in trade financing if you find a lot of scrap-metal based fraudulent transactions.
  4. The other important value driver in deploying Data Science is Advanced Transaction Monitoring Intelligence. The core idea is to get years' worth of banking data into one location (the data lake) & then apply unsupervised learning to glean patterns in those transactions. The goal is to identify profiles of actors and feed them into downstream surveillance & TM systems (a clustering sketch follows this list). This knowledge can then be used to –
  • Constantly learn the transaction behavior of similar customers, which is very important in detecting laundering in areas like payment cards; it is very common for retail businesses to be set up with the sole purpose of laundering money
  • Discover the transaction activity of trade finance customers with similar traits (types of businesses, nature of transfers, areas of operations etc.)
  • Segment customers by similar transaction behaviors
  • Understand common money laundering typologies and identify specific risks from a temporal and spatial/geographic standpoint
  • Improve and learn the correlations between alert accuracy and suspicious activity report (SAR) filings
  • Keep the noise level down by weeding out false positives
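As a rough sketch of the unsupervised-learning idea in point 4 (the features, cluster count and "smallest cluster is interesting" heuristic are illustrative choices, not a production AML model), the snippet below clusters simple per-customer transaction features with scikit-learn and surfaces the sparsest cluster for analyst review.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical per-customer features: [avg txn amount, txns per month, % cross-border]
features = np.array([
    [120.0,  14, 0.02], [95.0, 11, 0.01], [110.0, 15, 0.03],   # ordinary retail customers
    [9400.0, 60, 0.85], [9800.0, 55, 0.90],                    # unusual high-volume cross-border profile
])
customer_ids = ["C-1", "C-2", "C-3", "C-4", "C-5"]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(features)

# Surface members of the smallest cluster as candidates for analyst review.
labels, counts = np.unique(kmeans.labels_, return_counts=True)
rare_cluster = labels[np.argmin(counts)]
flagged = [cid for cid, lbl in zip(customer_ids, kmeans.labels_) if lbl == rare_cluster]
print("Review queue:", flagged)
```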

Benefits of a forward looking approach..  

We believe that we have a fresh approach that can help Banks with the following value drivers & metrics –

  • Detect AML violations on a proactive basis thus reducing the probability of massive fines
  • Save on staffing expenses for Customer Due Diligence (CDD)
  • Increase accurate production of suspicious activity reports (SAR)
  • Decrease the percent of corporate customers with AML-related account closures in the past year by customer risk level and reason – thus reducing loss of revenue
  • Decrease the overall KYC profile update backlog across geographies
  • Help create Customer 360 views that can help accelerate CLV (Customer Lifetime Value) as well as Customer Segmentation from a cross-sell/up-sell perspective

Big Data shines in all the above areas..

Conclusion…

The AML landscape will change rapidly over the next few years to accommodate the business requirements highlighted above. Regulatory authorities should also lead the way in adopting a Hadoop / Machine Learning / Predictive Analytics based approach. There is no other way to tackle large & medium AML programs in a lower cost and highly automated manner.