Vamsi Chemitiganti's weekly musings on applying Big Data, Cloud, & Middleware technology to solving industry challenges. Published every Friday or Sunday (if I'm very busy). All opinions are entirely my own. I write this blog so my readers don't have to spend money on expensive consultants.
Over the last three years, we have examined a succession of business issues in the various sectors of financial services on this blog. These have ranged from the mundane (Trading, Risk management, Market Surveillance, Fraud detection, AML et al) to the transformative (Robo advisors, Customer Journeys, Blockchain, Bitcoin etc). We have also examined the changing paradigms in enterprise architecture – moving from siloed monolithic applications to cloud-native software. This blog post summarizes the 12 most important technical posts on innovative application architectures.
Spending the majority of my career working in Banking and Financial Services has made for a fascinating time. It is amazing to witness business transformation beginning to occur across the landscape. However, this transformation revolves around a set of repeatedly discussed themes. A key challenge that CXOs and Enterprise Architecture teams face is how to deploy much-discussed technologies such as Cloud platforms, Big Data, Enterprise Middleware and AI in real-world architectures. This blog post sums up eleven real-world application architectures that industry leaders can use as a reference point for their own implementations.
The common theme to all of the below architectures –
A focus on Cloud native concepts including microservices, lightweight backends, containers
Design Patterns that encourage new age Data Management techniques including Hadoop and Spark
Cloud-agnostic – whether that is public cloud or private cloud
Integrating business process management and business rules engines as first-class citizens
100% Open Source
#1 Design and Architecture of a Real World Trading Platform…
With each passing quarter, financial services looks increasingly in danger of disintermediation. CXOs have no alternative but to digitize their businesses. IT will be forced to support cloud-native technologies in both key areas – applications and infrastructure – in pursuit of business goals. Developers will be at the forefront of this change. Ultimately, the quality of Enterprise Architecture decides business destiny.
The natural tendency in the world of Corporate IT is to create applications in response to business challenges. For instance, take any large Bank or Insurer or Manufacturer – you will find thousands of packaged applications that aim to solve a range of challenges from departmental level issues to enterprise-wide business problems. Over years these have given rise to application and infrastructure sprawl.
The application mindset creates little business value over the long run while creating massive technology headaches. For instance, the rationalization of these applications over time becomes a massive challenge in and of itself. At times, IT does not even understand how relevant some of these applications are to business users, who is actually using them, or what benefits they deliver. Over the last 15 years, Silicon Valley players such as Apple, Google, and Facebook have illustrated the power of building platforms that connect a range of users to the businesses that serve them. As the network effects of these platforms have grown exponentially, so have their user bases.
What Corporate IT & business need to learn to do is to move to a Platform mindset.
The Platform Strategy…
Amazon is the perfect example of how to conceive and execute a platform strategy over a couple of decades. It began life as a retailer in 1994 and over time morphed into complementary offerings such as Marketplace, AWS, Prime Video, Payments etc. These platforms have led to an ever-increasing panoply of services, higher revenues, more directed consumer interactions and stronger network effects. Each platform generates its own revenue stream and could be a large standalone corporation in its own right. However, the sum of these platforms is greater than the sum of the individual products, and this has helped make Amazon one of the most valuable companies in the world (as of late 2017).
So what are the key business benefits and drivers of a platform oriented model?
Driver #1 Platforms enable you to build business ecosystems
Platforms enable enterprise businesses to better orient their core capabilities and to deliver on them. Once that is done to a high degree of success, partners and other ecosystem players can plug in their own capabilities – the platform provides the connective tissue that lets these capabilities interoperate. The challenge most times is that large companies always seem to play catch-up with the business models of nimbler players. When they do this, they often choose an application-based approach, which does not enable them to take a holistic view of their enterprise and the business ecosystems around them. In the Platform approach, IT departments move to more of a service model, delivering agile platforms and technology architectures for business lines to develop products around.
E.g. post the PSD2 regulation, innovators in the European Banking system will become a prime example of platform-led business ecosystems.
Driver #2 Platforms enable you to rethink and better the customer experience thus driving new revenue streams
The primary appeal of a platform-based architecture is the ability to drive cross-sell and upsell opportunities. This increases not only the number of products adopted by a given customer but also (and ultimately) the total revenue per customer.
The below blog post discusses how Payment Providers are increasingly using advanced analytics on their business platforms not only to generate increased topline/sales growth but also to detect fraud and comply with anti-money laundering (AML) mandates.
Driver #3 Platforms enable you to experiment with business models (e.g. Data Monetization)
The next progressive driver in leveraging both internal and external data is to use it to drive new revenue streams in existing lines of business. This is also termed Data Monetization. Data Monetization is the organizational ability to turn data into cost savings & revenues in existing lines of business and to create new revenue streams. This requires fusing both internal and external data to create new analytics and visualization.
Driver #4 Platforms destroy business process silos
One of the chief reasons that hold back an enterprise's ability to innovate is the presence of both business and data silos. This is a direct result of an Application-based approach. When underlying business processes & data sources are both fragmented, communication between business teams moves to informal internal mechanisms such as email, chat and phone calls. This is a recipe for delayed business decisions that are ultimately ineffective, as they depend more on intuition than on data. The Platform approach drives the organization towards unification and rationalization of both the data and the business processes that create it, leading to a unified and consistent view of both across the business.
Driver #5 Platforms move you to become a Real-time Enterprise
Enterprises that are platform oriented do more strategic things right than wrong. They constantly experiment with new and existing business capabilities with a view to making them appealing to a rapidly changing clientele. They refine these using constant feedback loops and create platforms built on cutting-edge technology stacks that dominate the competitive landscape. The Real-Time Enterprise demands that workers at many levels – from line-of-business managers to executives – have fresh, high-quality and actionable information on which they can base complex business decisions.
A business and IT strategy built on platform approaches enables an organization to take on a much wider & richer variety of business challenges, and to achieve outcomes that were not really possible with the Application model.
“There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don’t know. But there are also unknown unknowns. There are things we don’t know we don’t know.” – Donald Rumsfeld, former US Secretary of Defense, 2002
With machine learning increasing in popularity and adoption across industries, models are increasing in number and scope. McKinsey estimates that large enterprises have seen an increase of about 10–25% in the complex models employed across areas as diverse as customer acquisition, risk management, insurance policy management, insurance claims processing, fraud detection and other advanced analytics. However, this increase is accompanied by a rise in model risk, where incorrect model results or design contribute to erroneous business decisions. In this blog post, we discuss the need for model risk management (MRM) and a generic framework for achieving it from an industry standpoint.
Model Risk Management in the Industry
The Insurance industry has extensively used predictive modeling across a range of business functions including policy pricing, risk management, customer acquisition, sales, and internal financial functions. However, as predictive analytics has become increasingly important, there is always a danger, or a business risk, incurred due to the judgment of the models themselves. While the definition of a model can vary from one company to another, we would like to define a model as a representation of some real-world phenomenon based on the real-world inputs (both quantitative and qualitative) shown to it, which operates on those inputs using an algorithm to produce a business insight or decision. The model may also provide some level of explanation for how it arrived at the corresponding business insight. There are many ways to create and deliver models to applications, varying from spreadsheets to specialized packages and platforms. We have covered some of these themes from a model development perspective in a previous blog post @ http://www.vamsitalkstech.com/?p=5321.
Models confer a multitude of benefits, namely:
The ability to reason across complex business scenarios spanning customer engagement, back-office operations, and risk management
The ability to automate decision-making based on historical patterns across large volumes of data
The audit-ability of the model which can explain to the business user how the model arrived at a certain business insight
The performance and the composition of a model depend on the intention of the designer. The reliability of the model depends primarily on access to adequate and representative data and secondly on the ability of the designer to model complex real-world scenarios and not always assume best-case scenarios.
As the financial crisis of 2008 illustrated, model failures nearly brought down the insurer AIG, causing severe disruption to the global financial system and helping set off a wider crisis in the global economy. Over the last few years, Machine Learning models have made their way into key business processes. If these models go wrong, they can cause severe operational losses – which should illustrate the importance of putting in place a strategic framework for managing model risk.
A Framework for Model Risk Management
The goal of Model Risk Management (MRM) is to ensure that the entire portfolio of models is governed like any other business asset. To that effect, a Model Risk Management program needs to include the following elements:
Model Planning – The first step in the MRM process is to put a structure in place so that models created across the business are built in a strategic and planned manner. This phase covers ensuring that model objectives are well defined across the business, duplication is avoided, best practices around model development are followed, & modelers are provided with high-quality data in the right volumes to create the most effective models possible. We have covered some of these themes around data quality in a previous blog post @ http://www.vamsitalkstech.com/?p=5396
Model Validation & Calibration – As models are created for specific business functions, they must be validated for precision, and calibrated to reflect the correct sensitivity & specificity that the business would like to allow for. Every objective can have its own “sweet spot” (i.e. threshold) that the business wants to attain by using the model. For example: a company that wants to go green, but realizes that not all of its customers have access to (or a desire to use) electronic modes of communication, might want to send out the minimum number of flyers that can get the message out, keeping its carbon footprint to a minimum without losing revenue by not reaching the correct set of customers. All model validation is driven by the business objectives that must be reached and how much wiggle room there is for negotiation.
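To make the calibration step concrete, the short Python sketch below picks the threshold that best balances sensitivity and specificity using Youden's J statistic. All scores, labels and candidate thresholds here are invented for the example:

```python
# Illustrative threshold calibration: the scores, labels and candidate
# thresholds below are made up for the example, not from a real model.

def sensitivity_specificity(scores, labels, threshold):
    """Compute (sensitivity, specificity) at a given score threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec

def best_threshold(scores, labels, candidates):
    """Pick the threshold maximizing sensitivity + specificity
    (Youden's J) -- one possible 'sweet spot' for a business objective."""
    return max(candidates,
               key=lambda t: sum(sensitivity_specificity(scores, labels, t)))

# Toy validation set: model scores and true outcomes.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1,   1,   1,   0,   1,    0,   0,   0]

t = best_threshold(scores, labels, [0.3, 0.5, 0.75])
print(t, sensitivity_specificity(scores, labels, t))   # → 0.5 (1.0, 0.75)
```

In practice the candidate grid and the trade-off between the two rates would come from the business objective – a fraud model, for instance, may deliberately sacrifice specificity for sensitivity.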
Model Management – Models that have made it to this stage must now be managed. Management here means answering questions such as: who should use what model for what purpose, how long should a model be used without re-evaluation, what are the criteria for re-evaluation, and who will monitor usage to prevent misuse? Management also deals with logistics: where do the models reside, how are they accessed & executed, who gets to modify them versus just use them, how will they be swapped out when needed without disrupting dependent business processes, how should they be versioned, can multiple versions of a model be deployed simultaneously, and how can data fluctuations that will disrupt model behavior be detected before they happen?
Model Governance – Model Governance covers some of the most strategic aspects of Model Risk Management. The key goal of this process is to ensure that the models are being managed in conformance with industry governance and are being managed with a multistage process across their lifecycle – from Initiation to Business Value to Retirement.
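As an illustrative sketch of the Management and Governance elements above, the Python below models a minimal model inventory that tracks ownership, versioning and lifecycle stage. The field names and lifecycle stages are assumptions for the example, not an industry standard:

```python
# A minimal, hypothetical model-inventory sketch for the MRM elements
# above. Field names and lifecycle stages are invented for illustration.
from dataclasses import dataclass
from datetime import date

STAGES = ("initiation", "validated", "production", "retired")

@dataclass
class ModelRecord:
    name: str
    version: str
    owner: str                  # who may modify the model
    purpose: str                # what it may be used for
    stage: str = "initiation"
    next_review: date = None    # when re-evaluation is due

    def promote(self, new_stage: str) -> None:
        """Enforce the lifecycle ordering from Initiation to Retirement."""
        if STAGES.index(new_stage) <= STAGES.index(self.stage):
            raise ValueError(f"cannot move {self.stage} -> {new_stage}")
        self.stage = new_stage

registry = {}

def register(rec: ModelRecord) -> None:
    """Reject duplicate name+version pairs to avoid model duplication."""
    key = (rec.name, rec.version)
    if key in registry:
        raise ValueError(f"{key} already registered")
    registry[key] = rec

register(ModelRecord("churn-score", "1.0", owner="risk-team",
                     purpose="customer attrition",
                     next_review=date(2019, 1, 1)))
rec = registry[("churn-score", "1.0")]
rec.promote("validated")
print(rec.stage)            # → validated
```

A real implementation would sit behind a governed service with audit logging, but even this shape answers several of the management questions above: ownership, versioning, review dates and a one-way lifecycle.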
Regulatory Guidance on Model Risk Management
The most authoritative guide on MRM comes from the Federal Reserve System – FRB SR 11-7/OCC Bulletin 2011-12. Though it is not directly applicable to the insurance industry (it is meant mainly for the banking industry), its framework is considered by many to contain thought leadership on this topic. The SR 11-7 framework includes documentation as part of model governance. An article in the Society of Actuaries' April 2016 Issue 3 details a thorough method for documenting a model, the process surrounding it, and why such information is necessary. In a highly regulated industry like insurance, every decision made in the process of creating a model (e.g. assumptions made, judgment calls given the circumstances at the time, etc.) could be brought under scrutiny & affects the risk of the model itself. With adequate documentation you can attempt to mitigate the risks you can foresee, and have a good starting point for those that might blindside you down the road.
And Now a Warning…
Realize that even after putting MRM into place, models are still limited – they cannot cope with what Donald Rumsfeld dubbed the “unknown unknowns”. As stated in an Economist article: “Almost a century ago Frank Knight highlighted the distinction between risk, which can be calibrated in probability distributions, and uncertainty, which is more elusive and cannot be so neatly captured…The models may have failed but it was their users who vested too much faith in them”. Models, by definition, are built on probability distributions derived from previous experience in order to predict future outcomes. If the underlying probability distribution changes radically, they can no longer predict the future, because the assumptions upon which they were built no longer hold. Hence humans must remain vigilant and not put all their eggs into the one basket of automated predictions. A human should always question whether the results of a model make sense, and intervene when they don't.
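One practical guard against such distribution shifts is a drift check run before scoring. The sketch below implements the Population Stability Index (PSI) in plain Python on invented data; the 0.1/0.25 cut-offs are common rules of thumb, not universal constants:

```python
import math

def psi(expected, actual, bins=4):
    """Population Stability Index between a baseline sample ('expected',
    e.g. training data) and a recent sample ('actual'). Larger values
    indicate a bigger distribution shift."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def bucket_shares(xs):
        counts = [0] * bins
        for x in xs:
            counts[sum(x > e for e in edges)] += 1   # bucket index of x
        # Small floor avoids log(0) for empty buckets.
        return [max(c / len(xs), 1e-4) for c in counts]

    e, a = bucket_shares(expected), bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]   # toy training distribution
stable   = [1, 2, 2, 3, 3, 4, 4, 5]         # similar recent data
shifted  = [4, 4, 5, 5, 5, 6, 6, 7]         # radically different data

print(psi(baseline, stable)  < 0.1)    # rule of thumb: < 0.1 => stable
print(psi(baseline, shifted) > 0.25)   # > 0.25 => significant shift
```

A check like this cannot reveal unknown unknowns either, but it can at least flag when the assumption behind a model has stopped holding, prompting the human in the loop to intervene.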
As the saying goes – “Models do not kill markets, people do.” A model is only as good as the assumptions and algorithm choices made by the designer, as well as the quality & scope of the data fed to it. However, enterprises need to put in place an internal model risk management program that ensures that their portfolio of models are constantly updated, enriched with data, and managed as any other strategic corporate asset. And never forget, that a knowledgeable human must remain in the loop.
With advances in various Blockchain-based DLT (distributed ledger technology) platforms such as Hyperledger & Ethereum et al, enterprises have begun to take baby steps to adapt the Blockchain (BC) to industrial-scale applications. This post discusses some of the stumbling blocks the author has witnessed enterprises running into as they look to get started on this journey.
Blockchain meets the Enterprise…
The Blockchain is a system & architectural design pattern for recording (immutable) transactions while providing an unalterable historical audit trail. This approach (proven with the hugely successful Bitcoin) guarantees a high degree of security, transparency, and anonymity for distributed applications purpose built for it. Bitcoin is but the first application of this ground breaking design pattern.
Due to its origins in the Bitcoin ecosystem, there has been a high degree of association of the Blockchain with the cryptocurrency movement. However, a wide range of potential enterprise applications has been identified in industries such as financial services, healthcare, manufacturing, and retail – as depicted in the below illustration.
Last year, we took an in-depth look into the business potential of the Blockchain design pattern at the below post.
We can then define the Enterprise Blockchain as “a highly secure, resilient, algorithmic & accurate globally distributed ledger (or global database or the biggest filesystem or the largest spreadsheet) that provides an infrastructure pattern to build multiple types of applications that help companies (across every vertical), consumers and markets discover new business models, transact, trade & exchange information & assets.”
While some early deployments and initial standards-making activity have been seen in financial services and healthcare, the technology is also finding significant adoption in optimizing internal operations for globally diversified conglomerates. For instance, tech major IBM claims to host one of the largest enterprise blockchain deployments. The application, known as IGF, provides working capital to 4,000+ customers, distributors, and partners. IBM uses its blockchain platform to manage disputes in the $48 billion IGF program. The near-linear scalability of the blockchain network allows IGF to gradually increase the number of members participating in it.
In particular, the Financial Services Industry has seen several efforts aiming to create standards around use cases such as consumer and correspondent banking payments and the trade lifecycle – examples include R3 Corda, Hyperledger, and Ethereum. However, there is still a large amount of technology innovation, adoption and ecosystem development that needs to happen before the technology is consumable by your everyday bank or manufacturer or insurer.
The Four Modes of Blockchain Adoption in the Enterprise…
There are certain criteria that need to be met for a business process to benefit from a distributed ledger. First, the business process should comprise various actors (both internal and external to the organization); second, there should be no need for a central authority or middleman to verify daily transactions except when disputes arise; third, the process should call for a strict audit trail as well as transaction immutability. The assets stored on the blockchain can be almost anything – data, contracts or transactions etc.
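The three criteria above can be expressed as a simple screening checklist. The Python sketch below is illustrative only – the criteria names, the example processes and the all-or-nothing rule are assumptions, not a formal methodology:

```python
# Hypothetical checklist encoding the DLT-suitability criteria above.
# Requiring every criterion to hold is an assumption for the example.

CRITERIA = (
    "multiple independent actors",
    "no central authority needed for routine verification",
    "strict audit trail and immutability required",
)

def suits_dlt(process: dict) -> bool:
    """A business process is a DLT candidate only if every criterion holds."""
    return all(process.get(c, False) for c in CRITERIA)

trade_settlement = {
    "multiple independent actors": True,
    "no central authority needed for routine verification": True,
    "strict audit trail and immutability required": True,
}
internal_reporting = {"multiple independent actors": False}

print(suits_dlt(trade_settlement))    # → True
print(suits_dlt(internal_reporting))  # → False
```

Even a crude screen like this filters out the many enterprise applications for which a centralized database remains the better fit.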
At a high level, there are four modes of adoption, or, ways in which a BC technology can make its way into an enterprise –
Organic Proofs of Concept (POCs) – These are driven by innovation groups inside the company tasked with exploring the latest technology advances. Oftentimes, these are technology-driven initiatives in search of a business problem. The approach works like this: management targets specific areas of technology where the firm needs to develop capabilities; the innovation team then defines an appropriate technical approach, reference stack & architecture (in this case for applications determined to be suitable for a DLT POC). The risk in this approach is that many of the best practices and learnings from other organizations, vendors, and solution providers are not leveraged.
Participation in Industry Consortia – A consortium is a group of companies engaged in a similar business area. These initiatives are driven by like-minded enterprises banding together (within specific sectors such as financial services, insurance, and healthcare) to define common use cases that can benefit from sector-specific DLT standards & the ensuing network effects. Consortia tend to mitigate risk from both a business and a cost standpoint, as several companies typically share the work of exploring the technology. However, these can be difficult to pull off due to competitive and cultural reasons.
In many cases, Regulators are pushing industry leaders to look into use cases (such as Risk Management, BackOffice Processes, and Fraud Detection) which can benefit from adopting distributed ledger technology (DLT).
Partnerships with Blockchain start-ups – These arrangements enable the (slow to move) incumbent market-leading enterprises to partner with the brightest entrepreneurial minds in the BC world, who are building path-breaking applications that will upend business models. The focus of such efforts is to identify a set of use cases & technology approaches that would immensely help the organization apply BC technology to its internal and external business challenges. The advantage of this approach is that the skills shortage established companies face when tackling immature technology projects can be ameliorated by working with younger organizations.
Having noted all this, the majority of proofs of concept driven out of enterprises fail or perform suboptimally.
I feel that this is due to various reasons, some of which we will discuss below. It should be noted that we are assuming there is strong buy-in around BC and DLT at the highest levels of the organization. Scepticism about this proven design pattern, and overcoming it, is quite another topic altogether.
The Key Considerations for a Successful Enterprise Blockchain or Distributed Ledger (DLT)…
CONSIDERATION #1 – Targeting the right business use case for the DLT…
As we saw in the above sections, the use cases identified for DLT need to reflect a few foundational themes – non-reliance on a middleman, a business process supporting a truly distributed deployment, building trust among a large number of actors/counterparties, the ability to support distributed consensus, and transparency. Due to its flat, peer-to-peer nature, Blockchain/DLT conclusively eliminates the need for any middleman. It is important that a target use case be realistic, both in its functional requirements and in the understanding of the underlying business process. The majority of enterprise applications can do perfectly well with a centralized database, and applying DLT technology to them can cause projects to fail.
CONSIDERATION #2 – The Revenge of the Non Functional requirements…
Generally speaking, current DLT platforms fall short in a few key areas that enterprises usually take for granted in other platforms such as Cloud Computing, Middleware, Data platforms etc. These include data privacy, transaction throughput, and raw performance. If one recalls, the community Blockchain (that Bitcoin was built on) prioritized anonymity over privacy. This can be undesirable in areas such as payments processing or healthcare, where the identity of consumers is governed by strict KYC (Know Your Customer) mandates. Thus, from an industry standpoint, most DLT platforms are 24 months or so away from coming up to par in these areas in a manner that enterprises can leverage.
Some other requirements, such as performance and scalability, are not directly tied to business features, but lack of support for them can stymie any ambitious use of the technology. For instance, a key requirement in payments processing and supplier management is the ability for the platform to process a large number of transactions per second. Most DLTs can only process around ten transactions per second on a permissionless network – far from the throughput needed in use cases such as payments processing, IoT etc.
The good news is that the DLT community is acutely aware of the enhancements that need to be made to the underlying platforms (e.g. changes to block size etc) to increase throughput, but these changes will take time to make their way into the mass market given the rigorous engineering work that needs to happen.
CONSIDERATION #3 – Integration with enterprise systems is still immature…
The Blockchain/DLT is not a data management paradigm, which is important for adopters to understand. Also, there currently exist very few standards and little guidance on integrating distributed applications (Dapps) custom built for DLTs with underlying enterprise assets. These assets include enterprise middleware stacks, identity management platforms, corporate security systems, application data silos, BPM (Business Process Management) and Robotic Process Automation systems etc. For the BC to work for any business capability, and as a complete business solution, the necessary integration must be provided with a reasonable number of the backend systems that influence the business case – most such integration is sorely lacking. Interoperability is still in its infancy despite vendor claims.
CONSIDERATION #4 – Understand that Smart Contracts are still in their infancy…
The blockchain introduces the important notion of programmable digital instruments, or contracts. An important illustration of the possibilities of blockchain is the concept of a “Smart Contract”. Instead of a static data object inserted into the distributed ledger, a Smart Contract is a program that can trigger downstream actions when appropriate conditions are met; it becomes immutable once accepted into the ledger. Business rules are embedded in the contract and fire automatically when certain conditions are satisfied – e.g. a credit pre-qualification, or assets transferred after a payment is made or after legal approval is provided.
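Real smart contracts are written in platform-specific languages such as Solidity. Purely as a platform-agnostic illustration of the idea – rules embedded with the asset, firing automatically, immutable once settled – here is a hypothetical escrow sketch in Python; every name and field is invented:

```python
# Platform-agnostic sketch of the smart-contract idea. Real contracts
# run on a DLT; everything here is invented for illustration only.

class EscrowContract:
    """Release an asset to the buyer once payment AND legal approval
    are both recorded -- the 'downstream action' described above."""

    def __init__(self, asset, seller, buyer, price):
        self.asset, self.seller, self.buyer, self.price = asset, seller, buyer, price
        self.paid = False
        self.approved = False
        self.owner = seller
        self.sealed = False      # immutable once settled into the ledger

    def record_event(self, event, **data):
        if self.sealed:
            raise RuntimeError("contract settled; ledger entries are immutable")
        if event == "payment" and data.get("amount", 0) >= self.price:
            self.paid = True
        elif event == "legal_approval":
            self.approved = True
        self._maybe_settle()

    def _maybe_settle(self):
        # The embedded business rule: transfer only when all conditions hold.
        if self.paid and self.approved:
            self.owner = self.buyer
            self.sealed = True

c = EscrowContract("warehouse-7", seller="ACME", buyer="Globex", price=100)
c.record_event("payment", amount=100)
print(c.owner)                   # → ACME (approval still pending)
c.record_event("legal_approval")
print(c.owner)                   # → Globex (contract self-executed)
```

On a real DLT the conditions would be evaluated by every validating node and the state transition recorded on-chain; the sketch only captures the control flow.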
Smart Contracts are being spoken about as the key functionality for any Blockchain-based DLT platform. While this hype is justified in some sense, it should be noted that smart contracts are not yet standards-based across major DLT platforms – which means they are not auditable & verifiable across local and global jurisdictions, or when companies use different underlying commercial DLTs. The technology will evolve over the next few years, but it is still very early days to run large-scale, production-grade applications that leverage Smart Contracts.
CONSIDERATION #5 – SECURITY and DATA PRIVACY CONCERNS…
The promise of the original blockchain platform, which ran Bitcoin, was very simple. It provided a truly secure, trustable and immutable record on which any digital asset could be run. Parties using the system operated in a permissionless mode, which meant that their identities were hidden from one another and from any central authority. While this may work for Bitcoin-like projects, the vast majority of industry verticals will need strong legal agreements and membership management capabilities to go with the platform. Accordingly, these platforms will need to be permissioned.
CONSIDERATION #6 – Blockchain Implementations need to be treated as AN INTEGRAL part of Digital Transformation…
Blockchain as a technology definitely sounds more exotic than the Digital projects which command all the mindshare at the moment. However, an important way to visualize the organizational BC is that it provides an environment of instantaneous collaboration with business partners and customers – a core theme of Digital Transformation. Accordingly, Blockchain/DLT proofs of concept should be centrally funded & governed, and skills should be grown in this area from a development, administration and project management standpoint. Projects should be tracked using fair business metrics, with appropriate governance mechanisms instituted as with any other digital initiative.
Blockchain-based distributed ledgers are surely going to usher in the next generation of distributed business processes, enabling the easy transaction, exchange, and contracting of digital assets. However, before enterprises rush in, they need to perform an adequate degree of due diligence to avoid some of the pitfalls highlighted above.
This is the third in a series of blogs on Data Science that I am jointly authoring with Maleeha Qazi (https://www.linkedin.com/in/maleehaqazi/). We have previously covered some of the inefficiencies that result from a siloed data science process @ http://www.vamsitalkstech.com/?p=5046 & the ideal way Data Scientists would like their models deployed for maximal benefit and use – as a Service @ http://www.vamsitalkstech.com/?p=5321. As the name of this third blog post suggests, the success of a data science initiative depends on data. If the data going into the process is “bad”, then the results cannot be relied upon. Our goal is also to suggest some practical steps that enterprises can take from a data quality & governance standpoint.
“However, under the strong influence of the current AI hype, people try to plug in data that’s dirty & full of gaps, that spans years while changing in format and meaning, that’s not understood yet, that’s structured in ways that don’t make sense, and expect those tools to magically handle it. ” – Monica Rogati (Data Science Advisor and ex-VP Jawbone – 2017) 
Different posts in this blog have discussed Data Science and other Analytical approaches to some degree of depth. What is apparent is that whatever the kind of analytics – descriptive, predictive, or prescriptive – the availability of a wide range of quality data sources is key. However, along with volume and variety of data, the veracity, or the truth, in the data is as important. This blog post discusses the main factors that determine the quality of data from a Data Scientist’s perspective.
The Top Issues of Data Quality
As highlighted in the above illustration, the top quality issues that data assets typically face are the following:
Incomplete Data: The data provided for analysis should span the entire cross-section of known data about how the organization views its customers and products. This includes data generated by the business's own applications, and external data bought from vendors to enrich the knowledge base. The completeness criterion measures whether all of the information about the entities under consideration is available and usable.
Inconsistent & Inaccurate Data: Consistency measures whether data values give conflicting information, and whether all data elements conform to specific, uniform formats and are stored in a consistent manner. Inaccurate data has duplicate, missing or erroneous values, and does not reflect an accurate picture of the state of the business at the point in time it was pulled.
Lack of Data Lineage & Auditability: The data framework needs to support auditability, i.e., provide an audit trail of how data values were derived from source to analysis point, including the various transformations performed on them to arrive at the data set being considered for analysis.
Lack of Contextuality: Data needs to be accompanied by meaningful metadata – data that describes the concepts within the dataset.
Temporal Inconsistency: This measures whether the data is consistent and meaningful given the time at which it was recorded.
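The dimensions above can be expressed as simple programmatic checks. A minimal sketch in Python, assuming records arrive as dictionaries (the field names and schema here are illustrative, not from any specific system):

```python
from datetime import datetime

# Illustrative required schema for a customer transaction record
REQUIRED_FIELDS = {"customer_id", "product_id", "amount", "recorded_at"}

def completeness(record):
    """Fraction of required fields that are present and non-empty."""
    present = sum(1 for f in REQUIRED_FIELDS if record.get(f) not in (None, ""))
    return present / len(REQUIRED_FIELDS)

def is_temporally_consistent(record, now=None):
    """A record timestamped in the future cannot be trusted."""
    now = now or datetime.utcnow()
    ts = record.get("recorded_at")
    return ts is not None and ts <= now

record = {"customer_id": "C42", "product_id": None, "amount": 12.5,
          "recorded_at": datetime(2017, 1, 1)}
print(completeness(record))              # 0.75 - one required field missing
print(is_temporally_consistent(record))  # True
```

Checks like these, run at ingestion time, give a quantitative baseline for each quality dimension before any analytics begins.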
What Business Challenges does Poor Data Quality Cause…
Poor data quality causes the following business challenges in enterprises:
Customer dissatisfaction: Across industries like Banking, Insurance, Telecom & Manufacturing, the ability to get a unified view of the customer & their journey is at the heart of the enterprise’s ability to promote relevant offerings & detect customer dissatisfaction. Currently, most industry players are woeful at putting together this comprehensive Single View of their Customers (SVC). Due to operational silos, each department possesses its own siloed & limited view of the customer across multiple channels. These views are typically inconsistent, lack synchronization with other departments, & miss a high amount of potential cross-sell and upsell opportunities. This is a data quality challenge at its core.
Lost revenue: The Customer Journey problem is an age-old issue that has grown exponentially more complicated over the last five years, as the staggering rise of mobile technology and the Internet of Things (IoT) has vastly increased the number of enterprise touch points through which customers can discover and purchase new products/services. In an OmniChannel world, an increasing number of transactions are conducted online. In verticals like Retail, Banking, and Insurance, online transactions approach an average of 40% of the total. Adding to the problem, more and more consumers are posting product reviews and feedback online. Companies thus need to react in real time to piece together the sources of consumer dissatisfaction.
Time and cost in data reconciliation: Every large enterprise nowadays runs expensive data re-engineering projects due to its data quality challenges. These are an inevitable first step in other digital projects and cause huge cost and time overheads.
Increased time to market for key projects: Poor data quality causes poor data agility, which increases the time to market for key projects.
Poor data means suboptimal analytics: Poor data quality causes the analytics done using it to be suboptimal – algorithms will end up giving wrong conclusions because the input provided to them is incorrect at best & inconsistent at worst.
Why is Data Quality a Challenge in Enterprises
The top reasons why data quality has been a huge challenge in the industry are:
Prioritization conflicts: For most enterprises, the focus of the business is the product(s)/service(s) being provided; book-keeping is a mandatory but secondary concern. And since keeping the business running is the most important priority, keeping the books accurate for financial matters is usually the only data aspect that gets the technical attention it deserves. Other data aspects are usually ignored.
Organic growth of systems: Most enterprises have gone through a series of book-keeping methods and applications, most of which have no compatibility with one another. Warehousing data from various systems as they are deprecated, merging in data streams from new systems, and fixing data issues as these processes happen is not prioritized till something on the business end fundamentally breaks. Band-aids are usually cheaper and easier to apply than to try and think ahead to what the business will need in the future, build it, and back-fill it with all the previous systems’ data in an organized fashion.
Lack of time/energy/resources: Nobody has infinite time, energy, or resources. Making all the systems an enterprise uses at any point in time talk to one another, share information between applications, and keep a single consistent view of the business is a near-impossible task. Many well-trained resources, time & energy are required to make sure this can be set up and successfully orchestrated on a daily basis. But how much is a business willing to pay for this? Most do not see short-term ROI and hence lose sight of the long-term problems that could be caused by ignoring the quality of data collected.
What do you want to optimize?: There are only so many balls an enterprise can have up in the air to focus on without dropping one, and prioritizing those can be a challenge. Do you want to optimize the performance of the applications that need to use, gather and update the data, OR do you want to make sure data accuracy/consistency (one consistent view of the data for all applications in near real-time) is maintained regardless? One will have to suffer for the other.
How to Tackle Data Quality
With the advent of Big Data and the need to derive value from ever increasing volumes and a variety of data, data quality becomes an important strategic capability. While every enterprise is different, certain common themes emerge as we consider the quality of data:
The sheer number of transaction systems found in a large enterprise causes multiple challenges across the data quality dimensions. Organizations need to have valid frameworks and governance models to ensure the data’s quality.
Data quality has typically been thought of as just data cleansing and fixing missing fields. However, it is very important to also address the originating business processes that cause the data to diverge into multiple versions of the truth. For example, centralize customer onboarding in one system across channels rather than having every system do its own onboarding.
It is clear from the above that data quality and its management is not a one-time or siloed exercise. As part of a structured governance process, it is very important to adopt data profiling and other capabilities to ensure high-quality data.
Enterprises need to define both quantitative and qualitative metrics to ensure that data quality goals are captured across the organization. Once this is done, an iterative process needs to be followed to ensure that a set of capabilities dealing with data governance, auditing, profiling, and cleansing is applied to continuously ensure that data is brought up to, and kept at, a high standard. Doing so can have salubrious effects on customer satisfaction, product growth, and regulatory compliance.
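As a concrete example of a quantitative metric, the share of duplicate records (one facet of inaccuracy) can be computed and tracked iteratively over time. A minimal sketch, with an illustrative business key:

```python
from collections import Counter

def duplicate_rate(records, key_fields=("customer_id", "recorded_at")):
    """Share of records whose business key appears more than once."""
    keys = [tuple(r.get(f) for f in key_fields) for r in records]
    counts = Counter(keys)
    dupes = sum(c for c in counts.values() if c > 1)
    return dupes / len(records) if records else 0.0

records = [{"customer_id": "C1", "recorded_at": "2017-06-01"},
           {"customer_id": "C1", "recorded_at": "2017-06-01"},  # duplicate key
           {"customer_id": "C2", "recorded_at": "2017-06-02"}]
print(duplicate_rate(records))  # ~0.667 - two of three records share a key
```

A governance process would run such metrics on every ingestion cycle and alert when a quality threshold is breached.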
We discussed the emergence of Application Programming Interfaces (APIs) as providing a key business capability in Digital Platforms @ http://www.vamsitalkstech.com/?p=3834. The next post then discussed the foundational technology, integration & governance capabilities that any Enterprise API Platform must support @ http://www.vamsitalkstech.com/?p=5102. This final post in the API series will discuss a deployment model for an API Management Platform.
The first two posts in this series discussed the business background to API Management and the need for an Enterprise API Strategy. While details will vary across vendor platforms, the intention of this post is to discuss the key runtime components of an API management platform, the overall developer workflow for creating APIs, and the runtime workflow that enables client applications to access them.
Architectural Components of an API Management Platform..
The important runtime components of an API management platform are depicted in the below illustration. Note that we have abstracted out network components (firewalls, reverse proxies, VLANs, switches etc) as well as the internal details of application architecture which would normally be impacted by an API Platform.
Let us cover the core components of the above:
API Gateway – The API Gateway has emerged as the dominant deployment artifact in API architectures. As the name suggests, Gateways are based on a facade design pattern. The Gateway (or typically a set of highly available Gateways) acts as a proxy for traffic between client applications (used by customers, partners and employees) and back-end services (ranging from mainframes to microservices). The Gateway is essentially an appliance or a software process that abstracts all API traffic into an organization and exposes business capabilities, typically via a REST interface. Clients are exposed to different views of the same API – coarse-grained or granular – depending on the kind of client application (thick/thin) and access control permissions. Gateways include protocol translation and request routing as their core functionality. It is also not uncommon to deploy multiple Gateways – internal and external – depending on business requirements such as partner interactions. Gateways also include functionality such as caching requests for performance, load balancing, authentication, serving static content etc. The API Gateway can thus be managed using a set of policy controls. Performance characteristics such as throughput, scalability, caching, load balancing and failover are managed using a cluster of API Gateways. The introduction of an API Gateway also influences application design going forward. API Gateways can be implemented in many forms – as a software platform or as an appliance. Public cloud providers have also begun offering mature API Gateways that integrate well with the range of backend services they provide, both from an IaaS and a PaaS standpoint. For instance, Amazon’s API Gateway integrates natively with AWS Lambda and EC2 Container Service for microservice deployments on AWS.
Security – Though it is not a standalone runtime artifact, Security tends to be called out as one of the most important logical requirements of an API Management platform. APIs have to follow the same access control mechanisms and security constraints for different user roles as their underlying datasources. This is key, as backend applications and organizational data need to be protected from a variety of threats – denial of service attacks, malware, access control violations etc. Accordingly, policy-based protection using API keys, JSON/XML signature scanning & threat protection, encryption for data in motion and at rest, OAuth support etc. all need to be provided as standard features.
Management and Monitoring – Ensuring that the exposed APIs maintain their QoS (Quality of Service), as well as helping admins monitor their quota of resource consumption, is key from an Operations standpoint. Further, the M&M functionality should also aid operators in resolving complex systems issues and ensuring a high degree of availability during upgrades etc.
Billing and Chargeback -Here we refer to the ability to tie in the usage of APIs to back office applications that can charge users based on their metered usage of the backend applications. This is typically provided through logging and auditing capability.
Governance – From a Governance standpoint, the key capabilities are the ability to track APIs across their lifecycle, a handy catalog of available APIs, an ability to audit their usage and the underlying assets they expose, and the ability for the business to set policies on their usage.
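To make the Gateway's facade role concrete, here is a toy sketch of edge processing: API-key validation, per-minute quota enforcement, and routing to an internal backend. The routes, keys, and quotas are purely illustrative and not taken from any vendor platform:

```python
import time

# Public REST paths mapped to internal backend services (illustrative)
ROUTES = {
    "/api/v1/accounts": "http://accounts-service.internal:8080",
    "/api/v1/payments": "http://payments-service.internal:8080",
}
API_KEYS = {"k-123": {"client": "mobile-app", "quota_per_min": 3}}
_usage = {}  # api key -> timestamps of recently admitted requests

def handle(path, api_key, now=None):
    """Return (status, backend) after applying edge policies."""
    meta = API_KEYS.get(api_key)
    if meta is None:
        return (401, None)                 # unknown key - rejected at the edge
    now = time.time() if now is None else now
    window = [t for t in _usage.get(api_key, []) if now - t < 60]
    if len(window) >= meta["quota_per_min"]:
        return (429, None)                 # per-minute quota exhausted
    backend = ROUTES.get(path)
    if backend is None:
        return (404, None)
    window.append(now)
    _usage[api_key] = window
    return (200, backend)                  # request would be proxied here

print(handle("/api/v1/accounts", "bogus"))         # (401, None)
print(handle("/api/v1/accounts", "k-123", now=0))  # (200, 'http://accounts-service.internal:8080')
```

A production Gateway layers many more policies (protocol translation, caching, threat scanning) behind the same facade, but the control flow follows this shape.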
API Design Process..
An internal developer uses the developer toolkit (e.g. Eclipse with an offline plugin) and/or an API Designer tool included with a vendor platform to create the API based on organizational policies. An extensive CLI (Command Line Interface) is also provided to perform all the functions that can be done using the GUI. These include local unit & system test capabilities and an ability to publish the tested APIs to a repository from which the runtime can access, deploy and update them.
From a data standpoint, multiple databases, including RDBMS and NoSQL stores, are supported for data access. During the creation of the API, if the developer already has an existing data model in mind, the business logic is mapped closely to the data schema; alternatively, one can work top-down and create the backend once the API interface has been defined, using a model-driven approach. These steps also include settings for security permissions, with support for OAuth and any other third-party authentication dependencies.
Once defined and tested, the API is published onto the runtime. During this process access control privileges, access policies and the endpoint itself are defined. The API is then ready for external consumption and discovery.
Runtime Flow Across the Architecture..
In the simplest case – once the API has been deployed and tested, it is made available for public discovery and consumption. Client applications then begin to leverage the API, and this can happen in a variety of ways. For example, user interactions on mobile applications, webpages and B2B services trigger calls to the API Gateway. The Gateway performs a range of functions to process the request – from security authorization to load balancing – before applying the policies set up for that particular API. The Gateway then invokes the API by calling the backend system, typically via message oriented middleware such as an ESB or a Message Broker. Once the backend responds with the appropriate payload, the data is sent to the requesting application. Systems and administration teams can view detailed operational metrics and logs to monitor API performance.
A Note on Security..
It should come as no surprise that security is one of the most critical aspects of an API Management Platform implementation. While API Security is a good subject for a followup post and too exhaustive to be covered in a short blurb – standards such as OAuth2, OpenID Connect, and JSON security & policy languages are all topics that need to be explored by both organizational developers and administrators. Extensive flow mapping and scenario testing are mandated here. Also, endpoint security from a client application standpoint is key. Servers, desktops, and supported mobile devices need to be updated and secured with the latest antivirus & other standard IT security/access control policies.
In this post, we tried to highlight the major components of an API Management Platform from a technology standpoint. While there are a range of commercial & open source platforms, it is important to evaluate them from a feature standpoint as well as from an ecosystem capability perspective as developers begin implementing microservices-based Digital Architectures.
As enterprises embark or continue on their Digital Journey, APIs are starting to emerge as a key business capability and one that we need to discuss. Regular readers of this blog will remember that APIs are one of the common threads across the range of architectures we have discussed in Banking, Insurance and IoT et al. In this blogpost, we will discuss the five key imperatives or business drivers for enterprises embarking on a centralized API Strategy.
API Management as a Native Digital Capability..
The use of application programming interfaces (APIs) has been well documented across web-scale companies such as Facebook, Amazon and Google et al. Over the last decade, APIs have emerged as the primary means for B2B and B2C companies to interact with their customers, partners and employees. Leading enterprises already have Digital Platform efforts underway, as opposed to creating standalone Digital applications. Digital Platforms aim to increase the number of product and client channels of interaction so that enterprises can reach customer audiences that were hitherto untapped. The primary mode of interaction with a variety of target audiences in such digital settings is via APIs.
APIs are widely interoperable, relatively easy to create and form the front end of many internet scale platforms. APIs are leveraged to essentially access the core services provided by these platforms and can be used to create partner and customer ecosystems. Leaders such as PayPal, Amazon & FinTechs such as Square, Mint etc have overwhelmingly used APIs as a way to not only open their platforms to millions of developers but also to offer innovative services.
As of 2015, programmableweb.com estimated that over 12,000 APIs were already being offered by enterprise firms. Leaders such as Salesforce.com were generating about 50% of their revenue through APIs. Salesforce.com created a thriving marketplace – AppExchange – for apps created by its partners that work on its platform, which numbered around 300 at the time of writing. APIs were contributing 60% of revenues at eBay and a staggering 90% for Expedia.com. eBay uses APIs to create additional exposure for its products – list auctions on other websites, get bidder information about sold items, collect feedback on transactions, and list new items for sale. Expedia’s APIs allowed customers to use third-party websites to book flights, cars, and hotels.
While most of the Fortune 500 have already begun experimenting with the value that APIs can deliver, the conversation around these capabilities needs to be elevated from the IT level to the lines of business and the CIO/Head of Marketing. APIs help generate significant revenue upside while enabling rapid experimentation in business projects. Examples of API usage abound in industries like Financial Services, Telecom, Retail and Healthcare.
The Main Kinds of APIs
While the categories of APIs will vary across industry, some types of APIs have been widely accepted. The three most popular from a high level are described below –
Private APIs – These are APIs defined for use by employees and internal systems within an organization or across a global company. By their very nature, they’re created for sensitive internal functions and have access to privileged functions that external actors cannot perform.
Customer APIs – Customer APIs are provided as a way of enabling customers to conduct business using product/service distribution channels – examples include placing product orders, viewing catalogs etc. These carry a very limited set of privileges, limited to customer-facing actions in a B2C context.
Partner APIs – Partner APIs enable businesses of varying sizes to perform business functions in the context of a B2B relationship. Examples include affiliate programs in Retail, inventory management, supply orders in Manufacturing & billing functions in Financial Services etc. The API provider hosts marketplaces that enable partner developers to create software that leverages these APIs.
The Five Business Drivers for an Enterprise API Strategy..
The question for enterprise executives then becomes, when do they begin to invest in a central API Management Platform? Is such a decision based on the API sprawl in the organization or the sheer number of APIs being manually managed etc?
While the exact trigger point may vary for every enterprise, let us consider the five key value drivers..
Driver #1 APIs enable Digital Platforms to evolve into ecosystems
In my mind, the first and most important reason to move to a centralized API strategy is to evolve standalone Digital capabilities into a coherent Platform. Different lines of business can use these capabilities to develop an ecosystem of applications that can be offered as a SaaS (Software as a Service). The end state of Digital Platforms is to operate business systems at massive scale in terms of customers, partners and employees.
The two central ideas at the heart of a platform based approach are as follows –
Create new customer revenue streams by reaching out to new customer segments across the globe or in new (and non traditional) markets. Examples of these platforms abound in the business world. In financial services, Banks & Credit reporting agencies are able to monetize their assets of years of customer & product data by reselling them to interested third parties which use them either for new product creation or to offer services that simplify a pressing industry issue – Customer Onboarding.
Reduce cost in current business models by extending core processes to business partners and also by automating manual communication steps (which are almost always higher cost and inefficient). For instance, Amazon has built their retail business using partner APIs to extend retailing provisioning, entitlement, enablement and order fulfillment processes.
Driver #2 Impact the Customer experience
We have seen how mobile systems are a key source of customer engagement. Offering the customer a seamless experience while they transact with an organization is a key way of disarming competition. Accordingly, Digital projects emphasize the importance of capabilities such as Customer Journey Mapping (CJM) and Single View of Customer (SVC) as the minimum table stakes that they need to provide. For instance, in Retail Banking, players are feeling the pressure to move beyond the traditional transactional banking model to a true customer centric model by offering value added services on the customer data that they already possess. APIs are leveraged across such projects to enrich the views of the customer (typically with data from external systems) as well as to expose these views to customers themselves, business partners and employees.
Driver #3 APIs are the foundation of Cloud Computing & DevOps
This one is all too familiar to anyone working in technology. We have seen how both Cloud Computing & DevOps are the foundation of agile technology implementations across a range of back-end resources. These include but are not limited to compute, NAS/SAN storage, Big Data, application platforms, and other middleware. Extending that idea, the Cloud (IaaS/PaaS) is essentially a set of APIs.
APIs are used to abstract out the internals of these underlying platform services. Application Developers and other infrastructure services use well defined APIs to interact with the platforms. These APIs enable the provisioning, deployment and management of platform services.
APIs have become the de facto model that provide developers and administrators with the ability to assemble Digital applications such as microservices using complicated componentry. Thus, there is a strong case to be made for adopting an API centric strategy when evolving to a Software Defined Datacenter.
A huge trend on the developer side has been the evolution of continuous build, integration and deployment processes. The integration of APIs into the DevOps process has begun, with use cases ranging from publicly available APIs triggering CI jobs to running CI/CD jobs on a cloud-based provider.
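The provisioning idea above can be sketched in code. The example below assembles (but does not send) a REST call to a hypothetical IaaS endpoint; the URL path, payload fields, and token are purely illustrative, though real providers (AWS, OpenStack etc.) expose similar calls behind their SDKs:

```python
import json

def build_provision_request(api_base, token, flavor="m1.small", image="ubuntu-16.04"):
    """Assemble (but do not send) a server-provisioning API call."""
    return {
        "url": api_base + "/v1/servers",
        "method": "POST",
        "headers": {"Authorization": "Bearer " + token,
                    "Content-Type": "application/json"},
        "body": json.dumps({"flavor": flavor, "image": image}),
    }

req = build_provision_request("https://cloud.example.com", "s3cret")
print(req["url"])  # https://cloud.example.com/v1/servers
```

Because infrastructure is driven through calls like this, the same request can be issued by a developer, a CI/CD pipeline, or an orchestration tool – which is exactly what makes the Software Defined Datacenter possible.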
Driver #4 APIs enable Business & Product Line Experimentation
APIs thus enable companies to constantly churn out innovative offerings while continuously adapting to & learning from customer feedback. Internet-scale companies such as Facebook provide edge APIs that enable thousands of companies to write applications that drive greater customer volumes to the Facebook platform. The term API Economy is increasingly in vogue; it connotes a loosely federated ecosystem of companies, consumers, business models and channels.
The API economy is a set of business models and channels — based on secure access of functionality and the exchange of data to an ecosystem of developers and the users of the app constructs they build — through an API, either within a company or via the internet, with business partners and customers.
Driver #5 Increasingly, APIs are needed to comply with Regulatory Mandates
We have already seen how, in key industries such as Banking and Financial Services, regulatory authorities are at the forefront of forcing incumbents to support Open APIs. APIs thus become a mechanism for increasing competition to the benefit of consumer choice. Regulators are changing the rules of participation in the banking & payments industry, and APIs are a key enabling factor in this transformation.
Financial Services, Healthcare, Telecom and Retail.. a case in point for why APIs present an Enormous Opportunity for the Fortune 500..
Banking – At various times, we have highlighted various business & innovation issues with Banking providers in the areas of Retail Banking, Payment Providers and Capital Markets. Regimes such as the Payment Services Directive (PSD2) in the EU will compel staid industry players to innovate faster than they otherwise would. FinTechs across the industry offer APIs to enable third-party services to use their offerings.
Healthcare – there is broad support in the industry for Open APIs to drive improved patient care & highly efficient billing processes as well as to ensure realtime engagement across stakeholders.
In the Telecom industry, nearly every large operator has developed APIs which are offered to customers and the developer community. Companies such as AT&T and Telefonica are using their anonymized access to hundreds of millions of subscribers to grant large global brands access to nonsensitive customer data. Federated platforms such as the GSM Association’s oneAPI are already promoting the usage of industry APIs.
Retailers are building new business models based on functionality such as Product Catalogs, Product Search, Online Customer Orders, Inventory Management and Advanced Analytics (such as Recommendation Engines). APIs enable retailers to expand their footprints beyond the brick and mortar store & an online presence.
Ranking Your API Maturity..
Is there a maturity model for APIs? We can list three different strategic postures, using Banks as the example. Readers can extrapolate these to their specific industry segment.
Minimally Compliant Enterprises – Here we categorize companies that seek to provide only minimal compliance with Open API mandates. Taking the example of Banking, while this may be the starting point for several organizations, staying too long in this segment will mean gradual market share erosion as well as a loss of customer lifetime value (CLV) over time. The reason is that FinTechs and other startups will offer a range of services such as instant mortgages, personal financial management tools, paperless approval processes for a range of consumer accounts etc. It is also anticipated that such organizations will treat their API strategy as a localized effort and will allocate personnel to the project mainly around the front office and marketing.
Digital Starters – Players that have begun exploring opening up customer data and are looking to support the core Open API while also introducing their own proprietary APIs. While this approach may work in the short to medium term, it will only impose integration headaches on these banks as time goes on.
Digital Innovators – The Digital Innovators will lead the way in adopting APIs. These companies will fund dedicated teams in lines of business serving their particular customer segments, either organically or through partnerships with third-party service providers. They will not only adhere to the industry-standard APIs but also extend these specs to create their own services with a focus on data monetization.
Increasingly, a company’s APIs represent a business development tool and a new go-to-market channel that can generate substantial revenues from referrals and usage fees. Given the strategic importance and revenue potential of this resource, the C-suite must integrate APIs into its corporate decision making.
The next post will take a technical look into the core (desired) features of an API Management Platform.
A few weeks ago on the invitation of DZone Magazine, I jointly authored a Big Data Reference Architecture along with my friend & collaborator, Tim Spann (https://www.linkedin.com/in/timothyspann/). Tim & I distilled our experience working on IIoT projects to propose an industrial strength digital architecture. It brings together several technology themes – Big Data , Cyber Security, Cognitive Applications, Business Process Management and Data Science. Our goal is to discuss a best in class architecture that enables flexible deployment for new IIoT capabilities allowing enterprises to build digital applications. The abridged article was featured in the new DZone Guide to Big Data: Data Science & Advanced Analytics which can be downloaded at https://dzone.com/guides/big-data-data-science-and-advanced-analytics
How the Internet Of Things (IoT) leads to the Digital Mesh..
The Internet of Things (IoT) has become one of the four most hyped-up technology paradigms affecting the world of business, the other usual suspects being Big Data, AI/Machine Learning & Blockchain. Cisco predicts that the IoT will encompass about 25 billion connected things by 2020 and affect about $2 trillion of economic value globally across a diverse range of verticals. These devices are not just consumer-oriented devices such as smartphones and home monitoring systems, but also dedicated industry objects such as sensors, actuators, engines etc.
The interesting angle to all this is the fact that autonomous devices are already beginning to communicate with one another using IP-based protocols. They largely exchange state & control information about various variables. With the growth of computational power on these devices, we are not far off from their sending more granular and interesting streaming data – about their environment, performance and business operations – all of which will enable a higher degree of insightful analytics to be performed on the data. Gartner Research has termed this interconnected world, where decision making & manufacturing optimization can occur via IoT, the “Digital Mesh“.
The evolution of technological innovation in areas such as Big Data, Predictive Analytics and Cloud Computing now enables the integration and analysis of massive amounts of device data at scale while performing a range of analytics and business process workflows on the data.
According to Gartner, the Digital Mesh will thus lead to an interconnected information deluge powered by continuous data from these streams. These streams will encompass classical IoT endpoints (sensors, field devices, actuators etc) sending data in a variety of formats – text, audio, video & social data streams – along with new endpoints in areas as diverse as Industrial Automation, Remote Healthcare, Public Transportation, Connected Cars, Home Automation etc. These intelligent devices will increasingly begin communicating with their environments in a manner that will encourage collaboration in a range of business scenarios. The industrial cousin of IoT is the Industrial Internet of Things (IIoT).
Defining the Industrial Internet Of Things (IIoT)
The Industrial Internet of Things (IIoT) can be defined as an ecosystem of capabilities that interconnects machines, personnel and processes to optimize the industrial lifecycle. The foundational technologies that IIoT leverages are Smart Assets, Big Data, Realtime Analytics, Enterprise Automation and Cloud based services.
The primary industries impacted the most by the IIoT will include Industrial Manufacturing, the Utility industry, Energy, Automotive, Transportation, Telecom & Insurance.
According to Markets and Markets, the annual worldwide Industrial IoT market is projected to exceed $319 billion in 2020, which represents an 8% compound annual growth rate (CAGR). The top four segments are projected to be manufacturing, energy and utilities, auto & transportation and healthcare.
Architectural Challenges for Industrial IoT versus Consumer IoT..
Consumer-based IoT applications generally receive the lion’s share of media attention. However, the ability of industrial devices (such as sensors) to send ever richer data about their operating environment and performance characteristics is driving a move to Digitization and Automation across a range of industrial manufacturing.
Thus, there are four distinct challenges that we need to account for in an Industrial IoT scenario as compared to Consumer IoT.
The IIoT needs Robust Architectures that are able to handle millions of device telemetry messages per second. The architecture needs to take into account all kinds of devices operating in environments ranging from the highly constrained to the well connected.
IIoT also calls for the highest degrees of Infrastructure and Application reliability across the stack. For instance, a lost or dropped message in a healthcare or connected car scenario may mean the difference between life and death for a patient, or cause an accident.
An ability to integrate seamlessly with existing Information Systems. Let’s be clear: these new age IIoT architectures need to augment existing systems such as Manufacturing Execution Systems (MES) or Traffic Management Systems. In Manufacturing, MES systems continually improve the product lifecycle and perform better resource scheduling and utilization. This integration helps these systems leverage digital intelligence and insights across (potentially) millions of devices in complex areas of operation.
An ability to incorporate richer kinds of analytics than have been possible before, providing a greater degree of context. This ability to reason around context is what enables the design of new business models which cannot currently be imagined due to a lack of agility in the data and analytics space.
What will IIoT based Digital Applications look like..
Digital Applications are being designed for specific device endpoints across industries. While the underlying mechanisms and business models differ from industry to industry, all of these use predictive analytics based on a combination of real time data processing & data science algorithms. These techniques extract insights from streaming data to provide digital services on existing toolchains, provide value added customer service, predict device performance & failures, improve operational metrics etc.
Examples abound. For instance, a great example in manufacturing is the notion of a Digital Twin, which Gartner called out last year. A Digital Twin is a software representation of an intelligent device or system. It forms a bridge between the real world and the digital world. In the manufacturing industry, digital twins can be set up to function as proxies of Things like sensors and gauges, coordinate measuring machines, vision systems, and white light scanning. This data is sent to a cloud based system where it is combined with historical data to better maintain the physical system.
The wealth of data being gathered on the shop floor will ensure that Digital Twins will be used to reduce costs and increase innovation. Thus, in global manufacturing, Data Science will soon make its way onto the shop floor to enable the collection of insights from these software proxies. We covered the phenomenon of Servitization in manufacturing in a previous blogpost.
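As a toy illustration of the Digital Twin idea above, the sketch below keeps a rolling telemetry history for a single gauge and flags drift from its historical baseline. The class name, fields and drift measure are all assumptions made for this example, not an established API:

```python
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class DigitalTwin:
    """Minimal software proxy for a physical sensor (illustrative sketch)."""
    device_id: str
    history: list = field(default_factory=list)

    def ingest(self, reading: float) -> None:
        """Record a new telemetry reading from the physical device."""
        self.history.append(reading)

    def drift(self) -> float:
        """Deviation of the latest reading from the historical average."""
        if len(self.history) < 2:
            return 0.0
        baseline = mean(self.history[:-1])
        return self.history[-1] - baseline

twin = DigitalTwin("gauge-42")
for r in (20.0, 20.4, 19.8, 26.1):   # last reading spikes well above baseline
    twin.ingest(r)
print(round(twin.drift(), 2))
```

A production twin would of course carry far richer state (firmware, calibration, maintenance history) and feed anomaly models rather than a simple baseline comparison.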
In the Retail industry, the ability to detect a customer’s location in realtime, combined with their historical buying patterns, can drive real time promotions and dynamic pricing of retail goods.
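A minimal sketch of such a location-plus-history promotion trigger follows; the store data, aisle names and the two-purchase rule are all invented for illustration:

```python
# Hypothetical purchase history, keyed by user id.
purchase_history = {"u-7": ["espresso", "espresso", "croissant"]}

def offer_for(user_id: str, current_aisle: str):
    """Combine live location with historical purchases to pick an offer."""
    favorites = purchase_history.get(user_id, [])
    # Illustrative rule: repeat espresso buyers near the coffee aisle get a deal.
    if current_aisle == "coffee" and favorites.count("espresso") >= 2:
        return "10% off espresso beans"
    return None

print(offer_for("u-7", "coffee"))
```

The point is the join of two previously siloed signals – realtime location and historical behavior – not the (trivial) rule itself.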
Solution Requirements for an IIoT Architecture..
At a high level, the IIoT reference architecture should support six broad solution areas-
Device Discovery – Discovering a range of devices (and their details) on the Digital Mesh for an organization within and outside the firewall perimeter
Performing Remote Lifecycle Configuration of these devices ranging from startup to modification to monitoring to shut down
Performing Deep Security level introspection to ensure the patch levels etc are adequate
Creating Business workflows on the Digital Mesh. We will do this by marrying these devices to enterprise information systems (EISs)
Performing Business oriented Predictive Analytics on these devices; this is critical to extracting ongoing business value from device data
On a futuristic basis, support optional integration with the Blockchain to support a distributed organizational ledger that can coordinate activity across all global areas that an enterprise operates in.
Building Blocks of the Architecture
Listed below are the foundational blocks of our reference architecture. Though the requirements will vary across industries, an organization can reasonably standardize on a number of foundational components as depicted below and then incrementally augment them as the interactions between different components increase based on business requirements.
Our reference architecture includes the following major building blocks –
Device Integration Layer
Data & Middleware Tier
Digital Application Layer
It also includes the following cross cutting concerns which span across the above layers –
Device and Data Security
Business Process Management
Data Governance – Provenance, Auditing, Logging
The next section provides a brief overview of the reference architecture’s components at a logical level.
Device Layer –
The first requirement of IIoT implementations is to support connectivity from the Things themselves, i.e. the Device layer depicted at the bottom. The Device layer includes a whole range of sensors, actuators, smartphones, gateways, industrial equipment etc. The ability to connect with devices and edge devices like routers and smart gateways using a variety of protocols is key. These network protocols include Ethernet, WiFi, and Cellular, which can all directly connect to the internet. Other protocols that need a gateway device to connect include Bluetooth, RFID, NFC, Zigbee et al. Devices can connect directly with the data ingest layer shown above, but it is preferred that they connect via a gateway which can perform a range of edge processing.
This is important from a business standpoint. For instance, in certain verticals like healthcare and financial services, there exist stringent regulations that govern when certain identifying data elements (e.g. video feeds) can leave the premises of a hospital or bank. A gateway can not only perform intelligent edge processing but also connect thousands of device endpoints and facilitate bidirectional communication with the core IIoT architecture.
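The edge processing described above can be as simple as a redaction step in the gateway. The sketch below drops regulated fields before a message leaves the premises; the field names and the redaction policy are hypothetical, stand-ins for what a real deployment would drive from policy configuration:

```python
# Hypothetical set of fields that regulations forbid leaving the premises.
REGULATED_FIELDS = {"patient_name", "patient_ssn", "video_feed_url"}

def redact(message: dict) -> dict:
    """Return a copy of the message that is safe to forward off-premises."""
    return {k: v for k, v in message.items() if k not in REGULATED_FIELDS}

raw = {"device_id": "icu-pump-7", "heart_rate": 82, "patient_name": "J. Doe"}
print(redact(raw))
```

In practice a NiFi flow at the gateway would perform this kind of filtering as one of several processors, with full provenance recorded for each flow file.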
The ideal tool for these constantly evolving devices, metadata, protocols, data formats and types is Apache NiFi. These agents will send the data to an Apache NiFi gateway or directly into an enterprise Apache NiFi cluster in the cloud or on-premise.
A subproject of Apache NiFi – MiNiFi – provides a complementary data collection approach that supplements the core tenets of NiFi in dataflow management. Due to its small footprint and low resource consumption, it is well suited to handle dataflow from sensors and other IoT devices. It provides central management of agents while providing full chain-of-custody information on the flows themselves.
For remote locations and more powerful devices like the Arrow BeagleBone Black Industrial and MyPi Industrial, it is very simple to run a tiny Java or C++ MiNiFi agent for secure connectivity needs.
The data sent by the device endpoints is then modeled into an appropriate domain representation based on the actual content of the messages. The data sent over also includes metadata about the message. A canonical model can optionally be developed (based on the actual business domain) to support a variety of applications from a business intelligence standpoint.
Apache NiFi supports the flexibility of ingesting changing file formats, sizes, data types and schemas. The devices themselves can send a range of feeds in different formats – e.g. XML today and, based on upgraded capabilities, richer JSON tomorrow. NiFi supports ingesting any file type that the devices or gateways may send. Once messages are received by Apache NiFi, they are enveloped in security, with every touch to each flow file controlled, secured and audited. NiFi flows also provide full data provenance for each file, packet or chunk of data sent through the system. NiFi can work with specific schemas if there are special requirements for file types, but it works with unstructured or semi-structured data just as well. From a scalability standpoint, NiFi can ingest 50,000 streams concurrently on a zero-master, shared-nothing cluster that scales horizontally via easy administration with Apache Ambari.
Data and Middleware Layer –
The IIoT Architecture recommends a Big Data platform with native message oriented middleware (MOM) capabilities to ingest device mesh data. This layer will also process device data in such a fashion – batch or real-time – as the business needs demand.
Application protocols such as AMQP, MQTT, CoAP, WebSockets etc are all deployed by many device gateways to communicate application specific messages. The reason for recommending a Big Data/NoSQL dominated data architecture for IIoT is quite simple: these systems provide Schema on Read, an innovative data handling technique. In this model, a format or schema is applied to data as it is accessed from a storage location, as opposed to being applied at ingest time. From an IIoT standpoint, one must deal not just with the data itself but also with metadata such as timestamps, device ids and other firmware data such as software version, device manufacture date etc. The data sent from the device layer will consist of time series data and individual measurements.
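Schema on Read can be illustrated in a few lines: raw payloads are stored exactly as received, and a reader-side projection supplies the schema, tolerating metadata that older firmware omits. The field names below are invented for the sketch:

```python
import json

# Raw payloads stored as-is at ingest time; no schema is enforced on write.
RAW_STORE = [
    '{"ts": 1700000000, "device_id": "s-1", "temp_c": 21.5, "fw": "2.1"}',
    '{"ts": 1700000060, "device_id": "s-1", "temp_c": 22.0}',  # older firmware omits "fw"
]

def read_with_schema(raw: str) -> dict:
    """Project a raw record onto the schema the reader cares about."""
    rec = json.loads(raw)
    return {
        "timestamp": rec["ts"],
        "device_id": rec["device_id"],
        "temperature_c": rec["temp_c"],
        "firmware": rec.get("fw", "unknown"),  # tolerate missing metadata
    }

for raw in RAW_STORE:
    print(read_with_schema(raw)["firmware"])
```

Because the schema lives with the reader, a fleet-wide firmware upgrade that adds new fields never blocks ingestion – old and new payloads coexist in the same store.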
The IIoT data stream can thus be visualized as a constantly running data pump, handled by a Big Data pipeline that takes the raw telemetry data from the gateways, decides which messages are of interest and discards those not deemed significant from a business standpoint. Apache NiFi is the gateway and gatekeeper: it ingests the raw data, manages the flow from thousands of producers and consumers, and performs basic data enrichment, in-stream sentiment analysis, aggregation, splitting, schema translation, format conversion and other initial steps to prepare the data – all with a user-friendly web UI and an easily extendible architecture. It then sends raw or processed data to Kafka for further processing by Apache Storm, Apache Spark or other consumers. Apache Storm is a distributed real-time computation engine that reliably processes unbounded streams of data; it excels at handling complex streams that require windowing and other complex event processing. While Storm processes stream data at scale, Apache Kafka distributes messages at scale. Kafka is a distributed pub-sub real-time messaging system that provides strong durability and fault tolerance guarantees. NiFi, Storm and Kafka naturally complement each other, and their powerful cooperation enables real-time streaming analytics for fast-moving big data. All stream processing is handled by the NiFi-Storm-Kafka combination.
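To make the windowing idea concrete, here is a tumbling-window average in plain Python – a drastically simplified stand-in for the kind of windowed computation Storm performs at scale over unbounded streams; the 60-second window is an arbitrary choice:

```python
from collections import defaultdict

def tumbling_averages(events, window_s: int = 60) -> dict:
    """Average (timestamp, value) readings per fixed, non-overlapping window.

    Each event is assigned to exactly one window by integer-dividing its
    timestamp by the window length -- the defining property of a tumbling
    (as opposed to sliding) window.
    """
    buckets = defaultdict(list)
    for ts, value in events:
        buckets[ts // window_s].append(value)
    return {w: sum(vals) / len(vals) for w, vals in sorted(buckets.items())}

events = [(0, 10.0), (30, 14.0), (65, 20.0), (90, 22.0)]
print(tumbling_averages(events))
```

A real Storm topology would additionally handle out-of-order arrival, back-pressure and fault tolerance, which this sketch deliberately ignores.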
Appropriate logic is built into the higher layers to support device identification, ID lookup, secure authentication and transformation of the data. This layer will process data (cleanse, transform, apply a canonical representation) to support Business Automation (BPM), BI (business intelligence) and visualization for a variety of consumers. The data ingest layer will also provide notification and alerts via Apache NiFi.
Here are some typical uses for this event processing pipeline:
a. Real-time data filtering and pattern matching
b. Enrichment based on business context
c. Real-time analytics such as KPIs, complex event processing etc
d. Predictive Analytics
e. Business workflow with decision nodes and human task nodes
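Step (a) above – real-time filtering and pattern matching – can be sketched as follows. The noise floor, the threshold and the three-in-a-row pattern are illustrative choices, not a standard:

```python
def significant(reading: float, noise_floor: float = 0.5) -> bool:
    """Filter step: drop readings within the sensor's noise floor."""
    return abs(reading) > noise_floor

def flag_sustained_high(readings, threshold: float = 100.0, run: int = 3) -> bool:
    """Pattern step: True if `run` consecutive readings exceed `threshold`."""
    streak = 0
    for r in filter(significant, readings):
        streak = streak + 1 if r > threshold else 0
        if streak >= run:
            return True
    return False

print(flag_sustained_high([98.0, 101.0, 102.5, 103.1, 99.0]))
```

In the architecture described above, this logic would live in a Storm bolt or NiFi processor rather than a standalone function, but the shape of the computation is the same.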
Digital Application Tier –
Once IIoT knowledge has become part of the Hadoop based Data Lake, all the rich analytics, machine learning and deep learning frameworks, tools and libraries become available to Data Scientists and Analysts. They can easily produce insights, dashboards, reports and real-time analytics with IIoT data joined with existing data in the lake, including social media data, EDW data and log data. All your data can be queried with familiar SQL through a variety of interfaces such as Apache Phoenix on HBase, Apache Hive LLAP and Apache Spark SQL. Using your existing BI tools or the open source Apache Zeppelin, you can produce and share live reports. You can run TensorFlow in containers on YARN for deep learning insights on your images, videos and text data, while running YARN-clustered Spark ML pipelines fed by Kafka and NiFi to run streaming machine learning algorithms on trained models.
A range of predictive applications are suitable for this tier. The models themselves should seek to answer business questions around things like -Asset failure, the key performance indicators in a manufacturing process and how they’re trending, insurance policy pricing etc.
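As a hedged sketch of one such predictive question – asset failure – the snippet below scores failure risk with a hand-set logistic function over two hypothetical features. The coefficients are made up for the sketch; in practice they would come from a trained model (e.g. a Spark ML pipeline), not be written by hand:

```python
from math import exp

def failure_probability(vibration_mm_s: float, hours_since_service: float) -> float:
    """Toy logistic score of asset-failure risk (coefficients invented)."""
    z = -6.0 + 0.8 * vibration_mm_s + 0.002 * hours_since_service
    return 1.0 / (1.0 + exp(-z))

healthy = failure_probability(2.0, 200)    # low vibration, recently serviced
worn = failure_probability(7.5, 1800)      # high vibration, overdue service
print(healthy < 0.5 < worn)
```

The business value lies in acting on the score – scheduling maintenance before the predicted failure – which is where the BPM integration discussed earlier comes in.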
Once the device data has been ingested into a modern data lake, key functions that need to be performed include data aggregation, transformation, enriching, filtering, sorting etc.
As one can see, this can get very complex very quickly – both from a data storage and a processing standpoint. A Cloud based infrastructure, with its ability to provide highly scalable compute, network and storage resources, is a natural fit for handling bursty IIoT applications. However, IIoT applications add their own diverse infrastructure requirements, namely the ability to accommodate hundreds of kinds of devices and network gateways – which means that IT must be prepared to support a large diversity of operating systems and storage types.
This tier is also responsible for the integration of the IIoT environment into the business processes of an enterprise. The IIoT solution ties into existing line-of-business applications and standard software solutions through adapters or Enterprise Application Integration (EAI) and business-to-business (B2B) gateway capabilities. End users in business-to-business or business-to-consumer scenarios will interact with the IIoT solution and the special-purpose IIoT devices through this layer. They may use the IIoT solution or line-of-business system UIs, including apps on personal mobile devices, such as smartphones and tablets.
The topic of Security is perhaps the most important cross cutting concern across all layers of the IIoT architecture stack. Needless to say, each of the layers must support the strongest data encryption, authentication and authorization capabilities for devices, users and partner applications. Accordingly, capabilities must be provided to ingest and store security feeds, IDS logs for advanced behavioral analytics, server logs and device telemetry. These feeds must be constantly analyzed across three domains – the Device domain, the Business domain and the IT domain. The below blogpost delves into some of these themes and is a good read to get a deeper handle on this issue from a SOC (security operations center) standpoint.
It is evident from the above that IIoT will create enormous opportunity for businesses globally. It will also create layers of complexity and opportunity for Enterprise IT. The creation of smart digital services on the data served up will further depend on the vertical industry. Whatever the kind of business model – whether tracking behavior, location sensitive pricing, business process automation etc – the end goal of the IT architecture should be to create enterprise business applications that are ultimately data native and analytics driven.
“Any enterprise CEO really ought to be able to ask a question that involves connecting data across the organization, be able to run a company effectively, and especially to be able to respond to unexpected events. Most organizations are missing this ability to connect all the data together.” – Tim Berners-Lee (English computer scientist, best known as the inventor of the World Wide Web)
We have discussed vertical industry business challenges across sectors like Banking, Insurance, Retail and Manufacturing in some level of detail over the last two years. Though enterprise business models vary depending on the industry, there is a common Digital theme raging across all industries in 2017. Every industry is witnessing an upswing in the number of younger and digitally aware customers. Estimates of this influential population run as high as 40% in areas such as Banking and Telecommunications. They represent a tremendous source of revenue but can also defect just as easily if the services offered aren’t compelling or easy to use – as the Banking industry illustrates.
These customers are Digital Natives, i.e. they are highly comfortable with technology and use services such as Google, Facebook, Uber, Netflix and Amazon almost hourly in their daily lives. As a consequence, they expect a similarly seamless & contextual experience while engaging with Banks, Telcos, Retailers and Insurance companies over (primarily) digital channels. Enterprises then have a dual challenge – to store all this data as well as harness it for real time insights in a way that is connected with internal marketing & sales.
As many studies have shown, companies that constantly harness data about their customers and perform speedy advanced analytics outshine their competition. Does that seem a bombastic statement? Not when you consider that almost half of all online dollars spent in the United States in 2016 were spent on Amazon, and almost all digital advertising revenue growth in 2016 was accounted for by two biggies – Google and Facebook.
Let us now return to the average Enterprise. The vast majority of enterprise applications (numbering around 1000+ on average at large enterprises, according to research firm NetSkope) generally lag the innovation cycle. This is because they’re created using archaic technology platforms by teams that conform to rigid development practices. The Fab Four (Facebook, Amazon, Google, Netflix) and others have shown that Enterprise Architecture is a business differentiator, but the Fortune 500 have not gotten that message yet. Hence they largely predicate their software development on vendor provided technology instead of open approaches. This anti-pattern is further exacerbated by legacy organizational structures, which ultimately leads to these applications holding a very parochial view of customer data. These applications can typically be classified into one of these buckets – ERP, Billing Systems, Payment Processors, Core Banking Systems, Service Management Systems, General Ledger, Accounting Systems, CRM, Corporate Email, Salesforce, Customer On-boarding etc.
These enterprise applications are typically managed by disparate IT groups scattered across the globe. They often serve different stakeholders who have broad overlapping interests but conflicting organizational priorities. These applications then produce and consume data in silos – localized by geography, department, line of business or channel.
Organizational barriers further impede data sharing – for reasons ranging from competitive dynamics around who owns the customer relationship, to regulatory constraints, to internal politics. You get the idea: it is all a giant mishmash.
Before we get any further, we need to define that dreaded word – Silo.
What Is a Silo?
A mind-set present in some companies when certain departments or sectors do not wish to share information with others in the same company. This type of mentality will reduce the efficiency of the overall operation, reduce morale, and may contribute to the demise of a productive company culture. (Source: Business Dictionary)
Data is the Core Asset in Every Industry Vertical but most of it is siloed in Departments, Lines of Business across Geographies..
Let us be clear, most Industries do not suffer from a shortage of data assets. Consider a few of the major industry verticals and a smattering of the kinds of data that players in these areas commonly possess –
Data In Banking–
Customer Account data e.g. Names, Demographics, Linked Accounts etc
Core Banking Data going back decades
Transaction Data which captures the low level details of every transaction (e.g. debit, credit, transfer, credit card usage etc)
Wire & Payment Data
Trade & Position Data
General Ledger Data e.g. AP (accounts payable), AR (accounts receivable), cash management & purchasing information etc.
Data from other systems supporting banking reporting functions.
Data In Healthcare–
Structured Clinical data e.g. Patient ADT information
Free hand notes
Patient Insurance information
Patient Trial Data
Medical Images – e.g. CAT Scans, MRIs, CT images etc
Data In Manufacturing–
Supply chain data
Operational data from the shop floor
Sensor & telemetry data
Sales campaign data
The typical flow of data in an enterprise follows a familiar path –
Data is captured in large quantities as a result of business operations (customer orders, e-commerce transactions, supply chain activities, Partner integration, Clinical notes et al). These feeds are captured using a combination of techniques – mostly ESBs (Enterprise Service Buses) and Message Brokers.
The raw data streams then flow into respective application-owned silos where, over time, a great amount of data movement (via copying, replication and transformation operations – the dreaded ETL) occurs using proprietary vendor developed systems. Vendors in this space have not only developed shrink wrapped products that make them tens of billions of dollars annually but have also imposed massive human capital requirements on enterprises to program & maintain these data flows.
Once all of the relevant data has been normalized, transformed and then processed, it is then copied over into business reporting systems where it is used to perform a range of functions – typically for reporting for use cases such as Customer Analytics, Risk Reporting, Business Reporting, Operational improvements etc.
Rinse and repeat..
Due to this old school methodology of working with customer and operational data, most organizations have no real time data processing capabilities in place & thus live in a largely reactive world. What that means is that their view of a given customer’s world is typically a week to 10 days old.
Another factor to consider: the data sources described above are what can be called structured or traditional data. However, organizations are now on-boarding large volumes of unstructured data, as captured in the below blogpost. Oftentimes, it is easier for Business Analysts, Data Scientists and Data Architects to get access to external data faster than internal data.
Getting access to internal data typically means jumping through multiple hoops – which department is paying for the feeds, the format of the feeds, regulatory issues, cyber security policy approvals, SOX/PCI compliance et al. The list is long and impedes the ability of the business to get things done quickly.
Since Ward Cunningham coined the term ‘Technical Debt‘, it has typically been used in an IT, DevOps, Containers and Data Center context. However, technology areas like DevOps, PaaS, Cloud Computing with IaaS, Application Middleware, Data centers etc in and of themselves add no direct economic value to customers unless they are able to intelligently process Data. Data is the most important technology asset compared to other IT infrastructure considerations. You do not have to take my word for that: The Economist just published an article discussing the fact that the likes of Google, Facebook, Amazon et al are now virtual data monopolies and that global corporations are far behind in the competitive race to own Data.
Thus, it is ironic that while the majority of traditional Fortune 500 companies are still stuck in silos, Silicon Valley companies are not just fast becoming the biggest owners of global data but are also monetizing it on the way to record profits. Alphabet (Google’s corporate parent), Amazon, Apple, Facebook and Microsoft are the five most valuable listed firms in the world. Case in point – their combined profits were around $25bn in the first quarter of 2017, and together they make up more than half the value of the NASDAQ composite index.
The Five Business Challenges that Data Fragmentation causes (or) Death by Silo …
How intelligently a company harnesses its data assets determines its overall competitive position. This truth is being evidenced in sectors like Banking and Retail, as we have seen in previous posts.
What is interesting is that in some countries concerned about the pace of technological innovation, national regulatory authorities are creating legislation to force slow moving incumbent corporations to unlock their data assets. For example, in the European Union, as a result of regulatory mandates – PSD2 & the Open Bank Standard – a range of agile players across the value chain (e.g. FinTechs) will soon be able to obtain seamless access to a variety of retail bank customer data via standard & secure APIs.
Once obtained, these companies can reimagine the data in manifold ways to offer new products & services that the banks themselves cannot. A simple use case: they can provide personal financial management platforms (PFMs) that help consumers make better personal financial decisions, at the expense of the Banks owning the data. Indeed, FinTechs have generally been able to make more productive use of client data than banks have. They do this by providing clients with intuitive access to cross asset data, tailoring algorithms based on behavioral characteristics and providing clients with a more engaging and unified experience.
Why can the slow moving established Banks not do this? They suffer from a lack of data agility due to the silos that have been built up over years of operations and acquisitions. None of these are challenges for the FinTechs, which can build off a greenfield technology environment.
To recap, let us consider the five ways in which Data Fragmentation hurts enterprises –
#1 Data Silos Cause Missed Top line Sales Growth –
Data produced by disparate applications, stored in scattered silos, makes it hard to enable a Single View of a customer across channels, products and lines of business. This then makes everything across the customer lifecycle a pain – from smooth on-boarding, to customer service, to marketing analytics. It impedes the ability to segment customers intelligently and to perform cross sell & up sell. This inability to understand customer journeys (across different target personas) also leads to customer retention issues. When underlying data sources are fragmented, communication between business teams moves to other internal mechanisms such as email, chat and phone calls. This is a recipe for delayed business decisions which are ultimately ineffective, as they depend more on intuition than on data.
#2 Data Silos are the Root Cause of Poor Customer Service –
Across industries like Banking, Insurance, Telecom & Manufacturing, the ability to get a unified view of the customer & their journey is at the heart of the enterprise’s ability to understand their customers’ preferences & needs. This is also crucial in promoting relevant offerings and in detecting customer dissatisfaction. Currently most enterprises are woefully inadequate at putting together this comprehensive Single View of their Customers (SVC). Due to operational silos, each department possesses a siloed & limited view of the customer relative to other departments (or channels). These views are typically inconsistent in and of themselves as they lack synchronization with other departments. The net result is that companies typically miss a high number of potential cross-sell and up-sell opportunities.
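At its core, the Single View of Customer is a merge of partial records keyed on a shared customer id. The silo names and fields below are invented for this sketch, and real-world identity resolution across silos is considerably harder than a dictionary merge:

```python
# Hypothetical silos, each holding a partial view of the same customer.
crm = {"c-100": {"name": "A. Rao", "segment": "premium"}}
cards = {"c-100": {"avg_monthly_spend": 2400}}
web = {"c-100": {"last_login": "2017-05-01"}}

def single_view(customer_id: str, *silos: dict) -> dict:
    """Merge partial records from each silo into one unified customer view."""
    view = {"customer_id": customer_id}
    for silo in silos:
        view.update(silo.get(customer_id, {}))
    return view

print(single_view("c-100", crm, cards, web))
```

The hard part in practice is not the merge but the key: silos rarely share a clean customer id, so fuzzy matching on names, addresses and account numbers is usually required first.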
#3 – Data Silos produce Inaccurate Analytics –
First off, most Analysts need to wait a long time to acquire the relevant data they need to test their hypotheses. And since the data they work on is of poor quality as a result of fragmentation, so are the analytics that operate on it.
Let us take an example from Banking. Mortgage Lending, an already complex business process, has been made even more so due to the data silos built around Core Banking, Loan Portfolio and Consumer Lending applications. Qualifying borrowers for Mortgages needs to be based not just on the historical data used as part of the origination & underwriting process (credit reports, employment & income history etc) but also on data that was not mined hitherto (social media data, financial purchasing patterns etc). It is a well known fact that there are huge segments of the population (especially millennials) who are broadly eligible but under-banked, as they do not satisfy some of the classical business rules needed to obtain mortgage approvals. Each of the silos stores partial customer data. Thus, Banks do not possess an accurate and holistic picture of a customer’s financial status, and are unable to qualify the customer for a mortgage quickly and at the best available customized rate.
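A toy pre-qualification check illustrates the point: classical rules alone reject a thin-file applicant, while alternative data (here, rent payment history) changes the outcome. Every rule, field name and threshold below is invented for illustration, not an underwriting standard:

```python
def prequalify(applicant: dict) -> bool:
    """Toy mortgage pre-qualification combining classical and alternative data."""
    # Classical rules: credit score plus employment history.
    classical = (applicant["credit_score"] >= 680
                 and applicant["years_employed"] >= 2)
    # Alternative data a siloed bank often never sees: on-time rent payments.
    alternative = (applicant.get("months_rent_on_time", 0) >= 24
                   and applicant["debt_to_income"] < 0.36)
    return classical or alternative

millennial = {"credit_score": 640, "years_employed": 1,
              "months_rent_on_time": 30, "debt_to_income": 0.28}
print(prequalify(millennial))
```

When the rent history sits in a silo the underwriting system cannot reach, this applicant is simply rejected – which is exactly the missed-revenue effect the paragraph above describes.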
#4 – Data Silos hinder the creation of new Business Models –
The abundance of data created over the last decade is changing the nature of business. If enterprise businesses are increasingly built around data assets, then it must naturally follow that data as a commodity can be traded or re-imagined to create new revenue streams. As an example, pioneering payment providers now offer retailers analytical services to help them understand which products perform best and how to improve the micro-targeting of customers. Thus, data is the critical prong of any digital initiative. This has led to efforts to monetize data by creating platforms that support ecosystems of capabilities. To vastly oversimplify this discussion, the ability to monetize data has two prongs – centralizing it in the first place, and then performing strong predictive modeling at large scale, where systems constantly learn and optimize their interactions, responsiveness & services based on client needs & preferences. Thus, Data Silos hurt this overall effort more than the typical enterprise can imagine.
#5 – Data Silos vastly magnify Cyber, Risk and Compliance challenges –
Enterprises have to perform a range of back-office functions such as Risk Data Aggregation & Reporting, Anti Money Laundering Compliance and Cyber Security Monitoring.
It must naturally follow that as more and more information assets are stored across the organization, it becomes a manifold headache to secure each and every silo from a range of bad actors – extremely well funded and sophisticated adversaries ranging from criminals to cyber thieves to hacktivists. On the business compliance front, sectors like Banking & Insurance need to maintain large AML and Risk Data Aggregation programs – silos are the bane of both. Every industry needs fraud detection capabilities as well, which require access to unified data.
My intention for this post is clearly to raise more questions than provide answers. There is no question Digital Platforms are a massive business differentiator but they need to have access to an underlying store of high quality, curated, and unified data to perform their magic. Industry leaders need to begin treating high quality Data as the most important business asset they have & to work across the organization to rid it of Silos.