Why APIs Are a Day One Capability In Digital Platforms..

As enterprises embark on, or continue, their Digital Journey, APIs are emerging as a key business capability and one that merits discussion. Regular readers of this blog will remember that APIs are one of the common threads across the range of architectures we have discussed in Banking, Insurance and IoT. In this blogpost, we will discuss the five key imperatives, or business drivers, for enterprises embarking on a centralized API Strategy.

Digital Platforms are composed of an interconnected range of enterprise services exposed as APIs across the Internet.

API Management as a Native Digital Capability..

The use of application programming interfaces (APIs) has been well documented across web scale companies such as Facebook, Amazon and Google. Over the last decade, APIs have emerged as the primary means for B2B and B2C companies to interact with their customers, partners and employees. Leading enterprises already have Digital Platform efforts underway, as opposed to creating standalone Digital applications. Digital Platforms aim to increase the number of product and client channels of interaction so that enterprises can reach customer audiences that were hitherto untapped. The primary mode of interaction with this variety of target audiences in such digital settings is via APIs.

APIs enable the creation of new business models that can deliver differentiated experiences (source – IBM)

APIs are widely interoperable, relatively easy to create and form the front end of many internet scale platforms. APIs are leveraged to essentially access the core services provided by these platforms and can be used to create partner and customer ecosystems. Leaders such as PayPal, Amazon & FinTechs such as Square, Mint etc have overwhelmingly used APIs as a way to not only open their platforms to millions of developers but also to offer innovative services.

As of 2015, programmableweb.com estimated that over 12,000 APIs were already being offered by enterprise firms. Leaders such as Salesforce.com were generating about 50% of their revenue through APIs. Salesforce.com also created a thriving marketplace – AppExchange – for partner-built apps that run on its platform, which numbered around 300 at the time of writing. APIs were contributing 60% of revenues at eBay and a staggering 90% for Expedia.com. eBay uses APIs to create additional exposure for its products – list auctions on other websites, get bidder information about sold items, collect feedback on transactions, and list new items for sale. Expedia’s APIs allowed customers to use third party websites to book flights, cars, and hotels. [2]

The Three Core Competencies of Digital – Cloud, Big Data & Intelligent Middleware

While most of the Fortune 500 have already begun experimenting with the value that APIs can deliver, the conversation around these capabilities needs to be elevated from the IT level to the lines of business and the C-suite (CIO, Head of Marketing). APIs help generate significant revenue upside while enabling rapid experimentation in business projects. Examples of API usage abound in industries like Financial Services, Telecom, Retail and Healthcare.

The Main Kinds of APIs

While the categories of APIs vary across industries, a few types have become widely accepted. The three most common are described below –

  1. Private APIs – These are APIs defined for use by employees and internal systems within an organization or across a global company. By their very nature, they are created for sensitive internal functions and have access to privileged operations that external actors cannot perform.
  2. Customer APIs – Customer APIs enable customers to conduct business through product/service distribution channels – examples include placing product orders, viewing catalogs etc. These carry a very limited set of privileges restricted to customer facing actions in a B2C context.
  3. Partner APIs – Partner APIs allow businesses of varying sizes to perform business functions in the context of a B2B relationship. Examples include affiliate programs in Retail, inventory management and supply orders in Manufacturing & billing functions in Financial Services. The API provider typically hosts marketplaces that enable partner developers to create software that leverages these APIs (see the sketch below).
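
To make the partner API category concrete, here is a minimal sketch of how a partner application might call a retailer's API. The endpoint, host, API key header and response shape are illustrative assumptions, not a reference to any specific vendor's API.

```python
# Hypothetical partner API call: fetch inventory for a SKU over HTTPS.
import requests

BASE_URL = "https://api.example-retailer.com/v1"   # hypothetical partner API host
API_KEY = "partner-api-key-goes-here"              # issued via the partner marketplace

def get_inventory(sku: str) -> dict:
    """Fetch current inventory levels for a SKU via the partner API."""
    resp = requests.get(
        f"{BASE_URL}/inventory/{sku}",
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    print(get_inventory("SKU-12345"))
```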

The Five Business Drivers for an Enterprise API Strategy..

The question for enterprise executives then becomes: when do they begin to invest in a central API Management Platform? Is such a decision based on the API sprawl in the organization, or the sheer number of APIs being manually managed?

While the exact trigger point may vary for every enterprise, let us consider the five key value drivers..

Driver #1 APIs enable Digital Platforms to evolve into ecosystems

In my mind, the first and most important reason to move to a centralized API strategy is to evolve standalone Digital capabilities into a coherent Platform. Different lines of business can use these capabilities to develop an ecosystem of applications that can be offered as SaaS (Software as a Service). The end state of a Digital Platform is to operate business systems at massive scale in terms of customers, partners and employees.

The two central ideas at the heart of a platform based approach are as follows –

  1. Create new customer revenue streams by reaching out to new customer segments across the globe or in new (and non traditional) markets. Examples of these platforms abound in the business world. In financial services, Banks & Credit reporting agencies are able to monetize years of accumulated customer & product data by reselling it to interested third parties, which use it either for new product creation or to offer services that address a pressing industry issue – Customer Onboarding.
  2. Reduce cost in current business models by extending core processes to business partners and by automating manual communication steps (which are almost always higher cost and inefficient). For instance, Amazon has built its retail business using partner APIs to extend retail provisioning, entitlement, enablement and order fulfillment processes.

Driver #2 Impact the Customer experience

We have seen how mobile systems are a key source of customer engagement. Offering the customer a seamless experience while they transact with an organization is a key way of disarming competition. Accordingly, Digital projects emphasize the importance of capabilities such as Customer Journey Mapping (CJM) and Single View of Customer (SVC) as the minimum table stakes that they need to provide. For instance, in Retail Banking, players are feeling the pressure to move beyond the traditional transactional banking model to a true customer centric model by offering value added services on the customer data that they already possess. APIs are leveraged across such projects to enrich the views of the customer (typically with data from external systems) as well as to expose these views to customers themselves, business partners and employees.

Demystifying Digital – Why Customer 360 is the Foundational Digital Capability – ..(1/3)

Driver #3 Cloud Computing & DevOps

This one is all too familiar to anyone working in technology. We have seen how both Cloud Computing & DevOps are the foundation of agile technology implementations across a range of back end resources. These include but are not limited to Compute, NAS/SAN storage, Big Data, Application platforms, and other middleware. Extending that idea, Cloud (IaaS/PaaS) is a set of APIs.

APIs are used to abstract out the internals of these underlying platform services. Application Developers and other infrastructure services use well defined APIs to interact with the platforms. These APIs enable the provisioning, deployment and management of platform services.
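
To illustrate what "Cloud is a set of APIs" means in practice, here is a minimal sketch of provisioning compute through a cloud provider's API, using AWS EC2 via boto3. The AMI ID, instance type and tags are placeholder assumptions; credentials are assumed to come from the environment.

```python
# Provision a single (placeholder) EC2 instance programmatically via the cloud API.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "app", "Value": "digital-platform-demo"}],
    }],
)

print("Launched:", response["Instances"][0]["InstanceId"])
```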

APIs have become the de facto model that provide developers and administrators with the ability to assemble Digital applications such as microservices using complicated componentry. Thus, there is a strong case to be made for adopting an API centric strategy when evolving to a Software Defined Datacenter.

A huge trend on the developer side has been the evolution of continuous build, integration and deployment processes. The integration of APIs into the DevOps process has begun, with use cases ranging from publicly available APIs triggering CI jobs to running CI/CD pipelines on a cloud based provider.
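
As a small example of API-driven CI, the sketch below triggers a build over HTTP using Jenkins' remote build trigger. The server URL, job name, user and API token are placeholders; depending on configuration, a Jenkins instance may additionally require a CSRF crumb.

```python
# Trigger a (hypothetical) Jenkins job via its REST API.
import requests

JENKINS_URL = "https://jenkins.example.com"   # placeholder server
JOB_NAME = "digital-platform-build"           # placeholder job
USER, API_TOKEN = "ci-bot", "jenkins-api-token"

resp = requests.post(
    f"{JENKINS_URL}/job/{JOB_NAME}/build",
    auth=(USER, API_TOKEN),
    timeout=10,
)
# On success Jenkins returns 201 Created with a Location header pointing at the queued item.
print(resp.status_code, resp.headers.get("Location"))
```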

Why Digital Disruption is the Cure for the Common Data Center..

Driver #4 APIs enable Business & Product Line Experimentation

APIs thus enable companies to constantly churn out innovative offerings while continuously adapting to & learning from customer feedback. Internet scale companies such as Facebook provide edge APIs that enable thousands of companies to write applications that drive greater customer volumes to the Facebook platform. The term API Economy is increasingly in vogue, and it connotes a loosely federated ecosystem of companies, consumers, business models and channels.

The API economy is a set of business models and channels — based on secure access of functionality and the exchange of data to an ecosystem of developers and the users of the app constructs they build — through an API, either within a company or via the internet, with business partners and customers.

The Three Habits of Highly Effective Real Time Enterprises…

Driver #5 Increasingly, APIs are needed to comply with Regulatory Mandates

We have already seen how, in key industries such as Banking and Financial Services, regulatory authorities are at the forefront of forcing incumbents to support Open APIs. APIs thus become a mechanism for increasing competition to the benefit of consumer choice. Regulators are changing the rules of participation in the banking & payments industry, and APIs are a key enabling factor in this transformation.

Under the PSD2, Banks and Payment Providers in the EU will need to unlock access to their customer data via Open APIs

Why the PSD2 will Spark Digital Innovation in European Banking and Payments….

Financial Services, Healthcare, Telecom and Retail.. a case in point for why APIs present an Enormous Opportunity for the Fortune 500..

Banking – At various times, we have highlighted business & innovation issues with Banking providers in the areas of Retail Banking, Payment Providers and Capital Markets. Regimes such as the Payment Services Directive (PSD2) in the EU will compel staid industry players to innovate faster than they otherwise would. FinTechs across the industry offer APIs to enable third party services to use their offerings.
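
A purely illustrative sketch of what an Open Banking style account-information call under PSD2 might look like for a licensed third party. The host, paths, token and headers below are hypothetical; real implementations require OAuth2 consent flows, eIDAS certificates and bank-specific onboarding.

```python
# Hypothetical account-information request against an Open Banking style API.
import requests

BANK_API = "https://openbanking.example-bank.eu/v1"   # hypothetical bank host
ACCESS_TOKEN = "token-obtained-via-customer-consent"  # hypothetical OAuth2 token

def list_accounts() -> dict:
    """List the consenting customer's accounts via the bank's open API."""
    resp = requests.get(
        f"{BANK_API}/accounts",
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()
```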

Healthcare – there is broad support in the industry for Open APIs to drive improved patient care & highly efficient billing processes as well as to ensure realtime engagement across stakeholders.

APIs across the Healthcare value chain can ensure more aligned care plans and business processes. (Image Credit – Chilmark)

In the Telecom industry, nearly every large operator has developed APIs which are offered to customers and the developer community. Companies such as AT&T and Telefonica are using their anonymized access to hundreds of millions of subscribers to grant large global brands access to nonsensitive customer data. Federated platforms such as the GSM Association’s oneAPI are already promoting the usage of industry APIs.[1]

Retailers are building new business models based on functionality such as Product Catalogs, Product Search, Online Customer Orders, Inventory Management and Advanced Analytics (such as Recommendation Engines). APIs enable retailers to expand their footprints beyond the brick and mortar store & an online presence.

Ranking Your API Maturity..

Is there a maturity model for APIs? We can sketch one as three different strategic postures, using Banks as an example. Readers can extrapolate these for their specific industry segment.

  1. Minimally Compliant Enterprises – Here we categorize companies that seek to provide only minimal compliance with an Open API mandate. Taking the example of Banking, while this may be the starting point for several organizations, staying too long in this segment will mean gradual market share erosion as well as a loss of customer lifetime value (CLV) over time. The reason is that FinTechs and other startups will offer a range of services such as instant mortgages, personal financial management tools, paperless approval processes for a range of consumer accounts etc. It is also anticipated that such organizations will treat their API strategy as a localized effort and will allocate personnel to the project mainly around the front office and marketing.
  2. Digital Starters – Players that have begun opening up customer data and will support the core Open API standards while also introducing their own proprietary APIs. While this approach may work in the short to medium term, it will only impose integration headaches on the banks as time goes on.
  3. Digital Innovators – The Digital Innovators will lead the way in adopting APIs. These companies will fund dedicated teams in lines of business serving their particular customer segments, either organically or through partnerships with third party service providers. They will not only adhere to the industry standard APIs but will also extend these specs to create their own services with a focus on data monetization.

Conclusion..

Increasingly, a company’s APIs represent a business development tool and a new go-to-market channel that can generate substantial revenues from referrals and usage fees. Given the strategic importance and revenue potential of this resource, the C-suite must integrate APIs into its corporate decision making.

The next post will take a technical look into the core (desired) features of an API Management Platform.

References..

[1] Forrester Research 2016 – “Sizing the Market for API Management Solutions” http://resources.idgenterprise.com/original/AST-0165452_Forrester_Sizing_the_market_for_api_management_solutions.pdf 

[2] Harvard Business Review 2015 – “The Strategic Value of APIs” – https://hbr.org/2015/01/the-strategic-value-of-apis

Why Banks, Payment Providers and Insurers Should Digitize Their Risk Management..

“When models turn on, brains turn off.” – Dr. Til Schuermann, formerly Research Officer in the Banking Studies function at the Federal Reserve Bank of New York, currently Partner at Oliver Wyman & Company.

There exist two primary reasons for enterprises such as Banks, Insurers, Payment Providers and FinTechs to pursue best in class Risk Management processes and platforms. The first is compliance, driven by various regulatory reporting mandates such as the Basel Reporting Requirements, the FRTB, the Dodd-Frank Act, Solvency II, CCAR and CAT/MiFID II in the United States & the EU. The second is the need to drive top-line sales growth by leveraging Digital technology. This post advocates bringing Digital technology to Risk Management across both areas.

Image Credit – Digital Enterprise

Recapping the Goals of Regulatory Reform..

There are many kinds of Risk, ranging from the three keystone kinds – Credit, Market and Operational – to those addressed by the Basel II.5/III accords, FRTB, Dodd-Frank etc. The best enterprises not only manage Risk well but also turn it into a source of competitive advantage. Leading banks have recognized this; according to McKinsey forecasts, while risk-operational processes such as credit administration today account for about half (50 percent) of the Risk function’s staff, and analytics just 15 percent, by 2025 those figures will be around 25 percent and 40 percent respectively. [1]

Whatever be the kind of Risk, certain themes are common from a regulatory intention standpoint-

  1. Limiting risks that may cause wider harm to the economy by restricting certain activities such as preventing banks with retail operations from engaging in proprietary trading activities
  2. Requiring that banks increase the amount of and quality of capital held on reserve to back their assets and by requiring higher liquidity positions
  3. Ensuring that banks put in place appropriate governance standards ensuring that boards and management interact not just internally but also with regulators and their clients
  4. Upgrading governance standards, enabling a fundamental change in bank governance and the way boards interact with both management and regulators. These ambitions were expressed in various new post‐crisis rules and approaches.
  5. Tackling the “too big to fail” challenge for highly complex businesses spanning multiple geographies, product lines and multifaceted customer segments. Accurate risk reporting ensures adequate capital conservation buffers.

Beyond the standard models used for regulatory reporting, Banks & FinTechs are pushing the uses of risk modeling into new areas such as retail and SME lending. Since the crisis of 2008, new entrants have begun offering alternatives to traditional financial services in areas such as payments, mortgage loans, cryptocurrency, crowdfunding, alternative lending, and investment management. The innovative use of Risk analytics lies at the core of the FinTechs’ success.

Across these areas, risk models are being leveraged in diverse ways, such as in marketing analytics to gain customers, defend against competition etc. For instance, realtime analytic tools are also being used to improve credit granting processes. The intention is to gain increased acceptance by pre-approving qualified customers quickly, without the manual intervention that can cause weeks of delays. Again, according to McKinsey, leading Banks aim to approve up to 90 percent of consumer loans in seconds and generate efficiencies of 50 percent, leading to revenue increases of 5 to 10 percent. Thus, leading institutions are using Risk Analytics to rethink their business models and to expand their product portfolios. [2]
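
To make the idea of instant pre-approval tangible, here is a toy sketch of the kind of rule-plus-score gate that sits in front of such a decision. The thresholds and features are invented for illustration; a production system would rely on validated risk models, bureau data and full audit trails.

```python
# Toy pre-approval gate: illustrative thresholds only, not a real credit policy.
def pre_approve(credit_score: int, debt_to_income: float, requested_amount: float) -> bool:
    """Return True if the applicant can be pre-approved without manual review."""
    if credit_score < 680:
        return False
    if debt_to_income > 0.40:
        return False
    # Cap instant approvals at a conservative limit; larger requests go to underwriting.
    return requested_amount <= 50_000

print(pre_approve(credit_score=720, debt_to_income=0.32, requested_amount=25_000))  # True
```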

Over the last two years, this blog has extensively covered areas such as cyber security, fraud detection, anti money laundering (AML) etc from a data analytics standpoint. The industry has treated Risk as yet another defensive function, but over the next 10 years it is expected that the Risk function will be an integral part of all of the above areas, driving business revenue growth while detecting financial fraud and crime. There is no doubt that Risk is a true cross cutting concern across a range of business functions & not just the traditional Credit, Market, Liquidity and Operational silos. Risk strategy needs to be a priority at the highest levels of an organization.

The Challenges with Current Industry Risk Architectures..

Almost a year ago, we discussed these technology issues in the blogpost below. To recap – most industry players have a mishmash of organically developed & shrink wrapped IT systems. These platforms run critical applications ranging from Core Banking to Trade Lifecycle to Securities Settlement to Financial Reporting. Each of these systems operates in an application, workflow and data silo with its own view of the enterprise. These are all kept in sync largely via data replication & stovepiped process integration. Further, siloed risk functions ensure that different risk reporting applications are developed using duplicative technology paradigms, causing massive IT spend. Finally, the preponderance of complex vendor supplied systems ensures lengthy release cycles and complex data center deployment requirements.

The Five Deadly Sins of Financial Services IT..

Industry Risk Architectures Suffer From Five Limitations

 A Roadmap for Digitization of Risk Architectures..

The end state of a Digital Risk function will vary for every institution embarking on this journey. We can still point out a few foundational guideposts.

#1 Automate Back & Mid Office Processes Across Risk and Compliance  –

As discussed, many business processes across the front, mid and back office involve risk management. These processes range from risk data aggregation, customer onboarding, loan approvals, regulatory compliance (AML, KYC, CRS & FATCA) and enterprise financial reporting to Cyber Security. It is critical to move any manual steps in these business functions to a highly automated model. Doing so will not only substantially reduce operational costs but also demonstrate strong auditability to regulatory authorities.

#2 Design Risk Architectures to handle Real time Data Feeds –

A critical component of Digital Risk is the need to incorporate real time data feeds across Risk applications. While Risk algorithms have traditionally dealt with historical data, new regulations such as FRTB explicitly call for calculations across various time horizons. These require Banks to run a full spectrum of analytics across many buckets on data seeded from real time interactions. While the focus has been on the overall quality and auditability of data, the real time requirement is critical as one moves from front office applications such as customer onboarding, loan qualification & pre-approvals to key areas such as market, credit and liquidity risk. Why is this critical? We have discussed the need for real time decision making insights for business leaders. Understanding risk exposures and performing root cause analysis in real time is a huge business capability for any Digital Enterprise.
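
As a minimal sketch of wiring a risk application to a real-time feed, the snippet below consumes trade events from Kafka using the kafka-python client. The broker address, topic name and message fields are assumptions; in practice such a stream would feed intraday exposure and limit checks.

```python
# Consume a (hypothetical) real-time trade event stream for risk calculations.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "trade-events",                                  # hypothetical topic
    bootstrap_servers="kafka.example.com:9092",      # hypothetical broker
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    auto_offset_reset="latest",
)

for msg in consumer:
    trade = msg.value
    # Update running exposure per counterparty as events arrive.
    print(trade["counterparty"], trade["notional"])
```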

#3 Experiment with Advanced Analytics and Machine Learning 

In response to real time risk reporting, the analytics themselves will begin to get considerably more complex. This technology complexity will only be made more difficult by multiple teams working on all of these areas, which calls for standardization of the calculations themselves across the firm. It also implies that, from an analytics standpoint, a large number of scenarios must be run on large volumes of data. For Risk to become a truly digital practice, the innovative uses of Data Science across areas such as customer segmentation, fraud detection and social graph analysis must all make their way into risk management. Insurance companies and Banks are already deploying self learning algorithms in applications that deal with credit underwriting, employee surveillance and fraud detection. Wealth Managers are deploying these in automated investment advisory. Thus, machine learning will support critical risk influenced areas such as Loan Underwriting, Credit Analytics and Single View of Risk. All of these areas will need to leverage predictive modeling, leading to better business decisions across the board.
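
To illustrate one of the machine learning techniques mentioned above, here is a minimal sketch of unsupervised anomaly detection over transaction features using scikit-learn's IsolationForest. The features, values and contamination rate are illustrative assumptions.

```python
# Flag potentially fraudulent transactions via unsupervised anomaly detection.
import numpy as np
from sklearn.ensemble import IsolationForest

# Columns: amount, merchant risk score, distance from home (km) -- toy data.
transactions = np.array([
    [25.0,   0.1,    2.0],
    [40.0,   0.2,    5.0],
    [30.0,   0.1,    1.0],
    [9500.0, 0.9, 4200.0],   # looks anomalous
])

model = IsolationForest(contamination=0.1, random_state=42)
model.fit(transactions)

# -1 marks an outlier (potential fraud), 1 marks an inlier.
print(model.predict(transactions))
```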

#4 Technology Led Cross Organization Collaboration –

McKinsey predicts [1] that in the coming five to ten years, different regulatory ratios such as capital, funding, leverage and total loss-absorbing capacity will drive the composition of the balance sheet to support profitability. Thus the risk function will work with the finance and strategy functions to help optimize the enterprise balance sheet across various economic scenarios and then provide executives with strategic choices (e.g. increase or shrink a loan portfolio) and the likely regulatory impacts across these scenarios. Leveraging analytical optimization tools, an improvement in return on equity (ROE) of anywhere between 50 and 400 basis points has been forecast.

The Value Drivers in Digitization of Risk Architectures..

McKinsey contends that the automation of credit processes and the digitization of the key steps in the credit value chain can yield cost savings of up to 50 percent. The benefits of digitizing credit risk go well beyond even these improvements. Digitization can also protect bank revenue, potentially reducing leakage by 5 to 10 percent. [2]

To give an example, by putting in place real-time credit decision making in the front line, banks reduce the risk of losing creditworthy clients to competitors as a result of slow approval processes. Additionally, banks can generate credit leads by integrating into their suite of products new digital offerings from third parties and FinTechs, such as unsecured lending platforms for business. Finally, credit risk costs can be further reduced through the integration of new data sources and the application of advanced-analytics techniques. These improvements generate richer insights for better risk decisions and ensure more effective and forward-looking credit risk monitoring. The use of machine-learning techniques, for example, can help banks improve the predictability of credit early-warning systems by up to 25 percent [2].

The Questions to Ask at the Start of Risk Transformation..

There are three questions every Enterprise needs to ask at the outset –

  • What customer focused business capabilities can be enabled across the organization by incorporating an understanding of the various kinds of Risk ?
  • What aspects of this Risk transformation can be enabled by digital technology? Where are the current organizational and technology gaps that inhibit innovation?
  • How do we measure ROI and business success across these projects before and after the introduction of digital technology? How do we benchmark ourselves, from a granular process standpoint, against the leaders?

Conclusion..

As the above makes clear, traditional legacy approaches to risk data management and reporting do not lend themselves well to managing your business effectively. Even when things are going well, it has become very difficult for executives and regulators to get a good handle on how the business is functioning. In the worst of times, the risk function can fail to function well as models do not perform effectively. It is not enough to take an incremental approach to improving current analytics approaches. The need of the hour is to incorporate state of the art data management and analytic approaches based on Big Data, Machine Learning and Artificial Intelligence.

References

Why Digital Platforms Need A Software Defined Datacenter..(2/7)

The first blog in this seven part series (@ http://www.vamsitalkstech.com/?p=1833) introduced and discussed a reference architecture for Software Defined Data Centers (SDDC). The key runtime paradigm that enables Digital applications is agility in the underlying datacenter infrastructure. Using an SDDC approach, complex underlying infrastructure (primarily Compute, Storage and Network) is abstracted away from the applications running on it. This second blog post discusses traditional datacenter challenges with running large scale Digital Applications.

Image Credit – Datacenter Dynamics

Introduction

Every Enterprise in the middle of Digital reinvention realizes that the transformation is critically based on technology – a mix of Big Data, Cloud, IoT, Predictive Analytics etc. It is clear that traditional IT assets & the enterprise datacenter are in need of a substantial refresh. Systems that dominate the legacy landscape, such as mainframes, midrange servers and proprietary storage systems, are slowly being phased out in favor of Cloud platforms running commodity x86 servers with the Linux OS, Big Data workloads, Predictive Analytics etc.

Traditional datacenters were built with application specific workloads in mind with silos of monitoring tools whereas Digital implies a move to fluid applications with changing workload requirements and more unified monitoring across the different layers.

We have dwelt on how the Digital platforms are underpinned by Cloud, Big Data and Intelligent Middleware.

The Three Core Competencies of Digital – Cloud, Big Data & Intelligent Middleware

It comes as no surprise that according to Gartner Research, by 2020, the Software-Defined Datacenter (SDDC) will become the dominant architecture in at least 75 percent of global data centers[1]. With the increasing adoption of APIs across the board and rapid increase in development of cloud-native digital applications using DevOps methodologies, the need for SDDC is only forecast to increase.

For those new to the concept of SDDC, attached is a link to the first blog in this series below where we discussed the overall technical concept along with a reference architecture.

Why Software Defined Infrastructure & why now..(1/6)

Legacy Datacenter vs SDDC..

For the last two decades, the vast majority of enterprise software applications were based on monolithic architectures. These were typically created by dispersed teams who modeled their designs around organizational silos and the resulting inchoate patterns of communication. These applications were built by globally siloed developer teams who would then pass the deployment artifacts over to the operations team. They were deployed in datacenters, typically on high end servers, using Vertical Scaling, where multiple instances of an application run on a few high end servers. As load increased, adding more CPU, RAM etc to these servers increased their ability to scale. These applications were typically deployed, managed and updated in silos.

Thus, much of what exists in data centers across enterprises is antiquated technology stacks. These range from proprietary hardware platforms to network devices & switches to the monolithic applications running on them. Other challenges surrounding these systems include inflexible, proprietary integration & data architectures.

The vast majority of current workloads are focused around systems such as ERP and other back office applications. They are unsuited to running cloud native applications such as Digital Platforms, which support large scale users and need real time insights around customer engagement.

Quite often these legacy applications have business & process logic tightly coupled with infrastructure code. This results in complex manual processes, monolithic applications, out of compliance systems with out of date patch levels, and tightly coupled systems integration. Some of these challenges have been termed Technical Debt.

While it is critical for datacenters to operate in a manner that maximizes their efficiency, they also need to manage costs from an infrastructure, power and cooling standpoint while ultimately delivering the right business outcomes for the organization.

IDC forecasts that by 2018, 50% of new datacenter infrastructure investments will be for systems of engagement, insight, and action rather than maintaining existing systems of record.[2]

A great part of this transformation is also cultural. It is clear that the relationship lines of business (LOBs) have with their IT teams – typically central & shared – is completely broken at a majority of large organizations. Each side cannot seem to see either the perspective or the passions of the other. This dangerous dysfunction usually leads to multiple complaints from the business, examples of which include –

  • IT is perceived to be glacially slow in providing infrastructure needed to launch new business initiatives or to amend existing ones. This leads to the phenomenon of ‘Shadow IT’, where business applications are run on public clouds bypassing internal IT
  • Something seems to be lost in translation while conveying requirements to different teams within IT
  • IT is too focused on technological capabilities – Virtualization, Middleware, Cloud, Containers, Hadoop et al without much emphasis on business value drivers

Rapid provisioning of IT resources is a huge bottleneck which frequently leads to lines of business adopting the public cloud to run their workloads. According to Rakesh Kumar, managing vice president at Gartner – “For over 40 years, data centers have pretty much been a staple of the IT ecosystem. Despite changes in technology for power and cooling, and changes in the design and build of these structures, their basic function and core requirements have, by and large, remained constant. These are centered on high levels of availability and redundancy, strong, well-documented processes to manage change, traditional vendor management and segmented organizational structures. This approach, however, is no longer appropriate for the digital world.” [1]

Further, Cloud-native applications are evolving into enterprise architectures built on granular microservices, with each microservice running in its own Linux container. Thus, Digital architectures are evolving into highly standardized stacks that can scale “horizontally”. Horizontal Scaling refers to increasing the overall footprint of an application’s architecture by quickly adding more servers, as opposed to increasing the capacity of existing servers.
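
As a minimal sketch of horizontal scaling driven through an API, the snippet below bumps the replica count of a deployment using the official Kubernetes Python client. The deployment name, namespace and target replica count are placeholders; cluster credentials are assumed to come from the local kubeconfig.

```python
# Scale a (hypothetical) stateless microservice out to 10 replicas via the Kubernetes API.
from kubernetes import client, config

config.load_kube_config()          # use local kubeconfig credentials
apps = client.AppsV1Api()

apps.patch_namespaced_deployment_scale(
    name="orders-service",         # placeholder deployment
    namespace="digital-platform",  # placeholder namespace
    body={"spec": {"replicas": 10}},
)
```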

The below illustration depicts the needs of a Digital datacenter as opposed to the traditional model.

The Five Challenges of Running Massively distributed Architectures..

The SDDC, with its focus on software controlling commodity hardware, enables a range of flexibility and cost savings that were simply not possible before. In the next section, we will consider what requirements Digital Applications impose on a traditional datacenter.

What Do Digital Applications Require From Data Center Infrastructure..

As one can see from the above, traditional approaches to architecting data centers do not scale well from both a technology and a cost standpoint as far as Digital Applications are concerned. As the diagram below captures, there are five main datacenter challenges encountered while architecting and deploying large or medium scale digital applications.

Running Digital Applications in legacy data centers requires surmounting five important challenges.

#1– Digital Applications Need Fast Delivery of Complex, Multivaried Application Stacks 

Digital applications are a combination of several different technology disciplines – Big Data, Intelligent Middleware, Mobile applications etc. Thus, data centers will need to run clusters of multi-varied applications at scale. Depending on the scope – a given application will consist of web servers, application servers, Big Data processing clusters, message queues, business rules and process management engines et al.

In the typical datacenter configuration, servers follow a vertical scaling model, which limits their ability to host multi tenant applications. They are not inherently multi tenant in that they cannot natively separate workloads of different kinds running on the same underlying hardware. The traditional approach to ameliorate this has been to invest in multiple sets of hardware (servers, storage arrays) to physically separate applications, which results in increased running costs, higher personnel requirements and manual processes around system patching and maintenance.

#2– Digital Applications Need Real Time Monitoring & Capacity Management of complex Architectures

Digital Applications also call for the highest degrees of Infrastructure and Application reliability across the stack. This implies not only a high level of monitoring but also seamless deployment of large scale applications across large clusters. Digital Applications are data intensive. Data flows into them from various sources in realtime for processing. These applications are subject to spikes in usage and as a result the underlying infrastructure hosting these can display issues with poor response times and availability.

Further, these applications are owned by combined teams of Developers and Operations. Owing to microservice architectures, the number of failure points also increase. Thus, Datacenter infrastructure is also shared between both teams with each area expected to understand the other discipline and even participate in it.

Traditional datacenters suffered from over-provisioned capacity and low utilization rates. Capacity Management is critical across compute, network and storage. Sizing these resources (vCPU, vRAM, virtual network etc) and dynamically managing their placement is a key requirement for digital application elasticity.

The other angle to this is the fact that Digital applications typically work on a chargeback model, where Central IT charges the line of business only for IT services consumed. This implies that IT can smartly manage capacity consumption on a real time basis using APIs. Thus, monitoring, capacity management and chargeback all need to be an integrated capability.
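
To make the chargeback idea concrete, here is a toy sketch of metering-based billing: usage metrics collected via the platform's APIs are priced per line of business. The rates and usage numbers are invented for illustration.

```python
# Toy chargeback calculation from metered usage -- rates are illustrative only.
RATES = {"vcpu_hours": 0.04, "ram_gb_hours": 0.005, "storage_gb_months": 0.10}

def monthly_charge(usage: dict) -> float:
    """Compute a line of business's monthly charge from metered usage."""
    return round(sum(RATES[k] * v for k, v in usage.items()), 2)

retail_lob_usage = {"vcpu_hours": 12_000, "ram_gb_hours": 48_000, "storage_gb_months": 5_000}
print(monthly_charge(retail_lob_usage))  # 1220.0
```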

#3– Digital Applications Call for Dynamic Workload Scheduling

The ability to provide policy driven application & workload scheduling is a key criterion for Digital Applications. These applications work best on a self service paradigm, with the capability to leverage APIs to reconfigure & re-provision infrastructure resources dynamically based on application workload needs. For instance, most Digital applications leverage Linux containers, which need to be dynamically scheduled and migrated across different hosts. Digital Applications thus need to be fluid in terms of how they scale across multiple hosts.
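
The snippet below is a minimal sketch of the basic primitive that cluster schedulers build on: launching a containerized workload programmatically, here via the Docker SDK for Python on a single host. The image name, container name and labels are placeholders; real scheduling and migration across hosts is delegated to an orchestrator.

```python
# Launch a (placeholder) containerized workload via the Docker SDK.
import docker

client = docker.from_env()

container = client.containers.run(
    "nginx:latest",                      # placeholder image
    detach=True,
    name="catalog-service-1",
    labels={"app": "catalog", "tier": "web"},
)
print(container.short_id, container.status)
```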

#4– Digital Applications Need Speedy Automation Across the Layers 

We discussed how one of the critical differentiators for Digital Enterprise applications is the standardization of architectural stacks. Depending on the scale, size and complexity of applications, choices of web development frameworks, libraries, application servers, databases and Big Data stacks need to be whittled down to a manageable few. This increases dependencies for applications across the infrastructure. From a horizontal scalability perspective, thousands of instances of popular applications will need to run on large scale infrastructure. What is key is ensuring a high degree of automation from a cloud system administration standpoint. Automation spans a variety of topics – lines of business self service, server automation, dynamic allocation of infrastructure, intelligent deployments, configuration of runtime elements using a template based approach, patching and workflow management.
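
As a small sketch of the template based configuration approach mentioned above, the snippet below renders one config template per environment with Jinja2 instead of hand-editing files. The template content and variables are illustrative assumptions.

```python
# Render per-environment configuration from a single (illustrative) template.
from jinja2 import Template

APP_CONFIG_TEMPLATE = Template(
    "server.port={{ port }}\n"
    "db.url=jdbc:postgresql://{{ db_host }}:5432/{{ db_name }}\n"
    "cache.enabled={{ cache_enabled | lower }}\n"
)

ENVIRONMENTS = {
    "dev":  {"port": 8080, "db_host": "db-dev.internal",  "db_name": "app", "cache_enabled": False},
    "prod": {"port": 8443, "db_host": "db-prod.internal", "db_name": "app", "cache_enabled": True},
}

for env, params in ENVIRONMENTS.items():
    print(f"--- {env} ---")
    print(APP_CONFIG_TEMPLATE.render(**params))
```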

#5– Seamless Operations and Deployment Management at Scale

Traditional datacenters typically take weeks to months to deliver new applications. Digital Applications call for multiple weekly deployments and an ability to roll versions forward or back quickly. Application deployment and security patch management need to include a range of use cases such as rolling deployments which ensure zero downtime, canary deployments to test functionality with a subset of users, sharded deployments et al. From an application maintenance standpoint, understanding where performance issues are occurring, such as delayed response times, is of critical importance in ensuring customer satisfaction.
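
To illustrate the canary deployment pattern, here is a toy sketch of the routing decision behind it: a small, stable percentage of users is sent to the new version. The hashing scheme and percentage are illustrative assumptions; in practice this logic usually lives in a load balancer or service mesh.

```python
# Deterministically route a small slice of users to the canary release.
import hashlib

CANARY_PERCENT = 5  # route 5% of users to the canary

def route_version(user_id: str) -> str:
    """Return 'canary' or 'stable' for a given user, consistently across requests."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "canary" if bucket < CANARY_PERCENT else "stable"

print(route_version("customer-42"), route_version("customer-43"))
```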

For instance, in the Retail industry, online shopping cart abandonment is as high as 70% when website response times are slow.

The lack of support for any of these operational features can be fatal to the user acceptance of Digital Applications, and can ultimately result in a range of issues – increased CapEx and OpEx, high server to sysadmin ratios and unacceptably high downtimes.

In summary, the traditional datacenter is not a good fit for the new age Digital Platform.

The SDDC Technology Ecosystem

It is evident from the first post (@ http://www.vamsitalkstech.com/?p=1833) that Software Defined Datacenters have evolved into large & complex ecosystems dominated by open source technology.

It has become increasingly difficult for enterprise CXOs and IT leadership to identify which projects do what and how they all fit together.

I believe the current SDDC technology ecosystem can be broken down into four complementary categories –

  1. Cloud Infrastructure – Includes IaaS providers (AWS, Azure, OpenStack etc)  and Service Management Platforms such as ManageIQ
  2. Provisioning & Configuration Management – Tools like Puppet, Ansible and Chef.
  3. Serverless Infrastructure & DevOps – Includes a range of technologies but primarily PaaS providers such as OpenShift and CloudFoundry who use Linux containers (such as Docker, Rocket) as the basic runtime unit
  4. Cloud Orchestration & Monitoring- Includes a range of projects such as Apache Mesos, Kubernetes

Readers will detect a distinct tilt in my thinking towards open source but it is generally accepted that open technology communities are the ones leading most of the innovation in this space – along with meaty contributions from public cloud providers especially Amazon and Google.

The Roadmap for the rest of the blogs in this series..

In this blog series, we will highlight specific cloud projects that are leading market adoption in the above categories.

The third and next post in this series will deep dive into Apache Mesos.

Subsequent posts in this series will cover best of breed projects – Docker & Kubernetes, ManageIQ, OpenStack, OpenShift in that order. The final post will round it all together with a sample real-world flow bringing all these projects together using a sample application provisioning flow.

Conclusion

Progressive enterprise IT teams have begun learning from the practices of the web-scale players and have adopted agile ways of developing applications. They have also begun slowly introducing disruptive technologies around Cloud Computing (IaaS & PaaS), Big Data, agile developer toolsets, DevOps style development pipelines & deployment automation. Traditional datacenters are siloed in the sense that the core foundational components – servers, networking and storage – are deployed, managed and monitored by separate teams. This is the antithesis of Digital, where all these areas converge in a highly fluid manner.

The next post in this series will discuss Apache Mesos, an exciting new technology project that strives to provide a global cluster manager for the vast diversity of applications found in Digital projects.

References

[1] Gartner – ” Five Reasons Why a Modern Data Center Strategy Is Needed for the Digital World” – http://www.gartner.com/newsroom/id/3029231

[2] IDC Asia/Pacific Unveils its Top Datacenter Predictions for APeJ for 2017 and Beyond –
http://www.idc.com/getdoc.jsp?containerId=prAP42063416

A Digital Reference Architecture for the Industrial Internet Of Things (IIoT)..

A few weeks ago on the invitation of DZone Magazine, I jointly authored a Big Data Reference Architecture along with my friend & collaborator, Tim Spann (https://www.linkedin.com/in/timothyspann/). Tim & I distilled our experience working on IIoT projects to propose an industrial strength digital architecture. It brings together several technology themes – Big Data , Cyber Security, Cognitive Applications, Business Process Management and Data Science. Our goal is to discuss a best in class architecture that enables flexible deployment for new IIoT capabilities allowing enterprises to build digital applications. The abridged article was featured in the new DZone Guide to Big Data: Data Science & Advanced Analytics which can be downloaded at  https://dzone.com/guides/big-data-data-science-and-advanced-analytics

How the Internet Of Things (IoT) leads to the Digital Mesh..

The Internet of Things (IoT) has become one of the four most hyped technology paradigms affecting the world of business, the other usual suspects being Big Data, AI/Machine Learning & Blockchain. Cisco predicts that the IoT will encompass about 25 billion connected things by 2020 and affect about $2 trillion of economic value globally across a diverse range of verticals. These devices are not just consumer oriented devices such as smartphones and home monitoring systems but also dedicated industrial objects such as sensors, actuators, engines etc.

The interesting angle to all this is that autonomous devices are already beginning to communicate with one another using IP based protocols. They largely exchange state & control information around various variables. With the growth of computational power on these devices, we are not far off from their sending more granular and interesting streaming data – about their environment, performance and business operations – all of which will enable a higher degree of insightful analytics to be performed on the data. Gartner Research has termed this interconnected world, where decision making & manufacturing optimization can occur via IoT, the “Digital Mesh“.

The evolution of technological innovation in areas such as Big Data, Predictive Analytics and Cloud Computing now enables the integration and analysis of massive amounts of device data at scale while performing a range of analytics and business process workflows on the data.

Image Credit – Sparkling Logic

According to Gartner, the Digital Mesh will thus lead to an interconnected information deluge powered by continuous data from these streams. These streams will encompass classical IoT endpoints (sensors, field devices, actuators etc) sending data in a variety of formats – text, audio, video & social data streams – along with new endpoints in areas as diverse as Industrial Automation, Remote Healthcare, Public Transportation, Connected Cars, Home Automation etc. These intelligent devices will increasingly begin communicating with their environments in a manner that will encourage collaboration in a range of business scenarios. The industrial cousin of IoT is the Industrial Internet of Things (IIoT).

Defining the Industrial Internet Of Things (IIoT)

The Industrial Internet of Things (IIoT) can be defined as an ecosystem of capabilities that interconnects machines, personnel and processes to optimize the industrial lifecycle. The foundational technologies that IIoT leverages are Smart Assets, Big Data, Realtime Analytics, Enterprise Automation and Cloud based services.

The primary industries impacted the most by the IIoT will include Industrial Manufacturing, the Utility industry, Energy, Automotive, Transportation, Telecom & Insurance.

According to Markets and Markets, the annual worldwide Industrial IoT market is projected to exceed $319 billion in 2020, which represents an 8% compound annual growth rate (CAGR). The top four segments are projected to be manufacturing, energy and utilities, auto & transportation and healthcare. [1]

Architectural Challenges for Industrial IoT versus Consumer IoT..

Consumer IoT applications generally receive the lion’s share of media attention. However, the ability of industrial devices (such as sensors) to send ever richer data about their operating environment and performance characteristics is driving a move to Digitization and Automation across a range of industrial manufacturing.

Thus, there are four distinct challenges that we need to account for in an Industrial IOT scenario as compared to Consumer IoT.

  1. The IIoT needs robust architectures that are able to handle millions of device telemetry messages per second. The architecture needs to take into account that all kinds of devices will be operating in a wide range of environments, from the highly constrained to the well-connected.
  2. IIoT also calls for the highest degrees of Infrastructure and Application reliability across the stack. For instance, a lost message or dropped messages in a healthcare or a connected car scenario may mean life or death for a patient, or, an accident.
  3. An ability to integrate seamlessly with existing Information Systems. Let’s be clear – these new age IIoT architectures need to augment existing systems such as Manufacturing Execution Systems (MES) or Traffic Management Systems. In Manufacturing, MES systems continually improve the product lifecycle and perform better resource scheduling and utilization. This integration helps these systems leverage the digital intelligence and insights across (potentially) millions of devices across complex areas of operation.
  4. An ability to incorporate richer kinds of analytics than has been possible before that provide a great degree of context. This ability to reason around context is what provides an ability to design new business models which cannot be currently imagined due to lack of agility in the data and analytics space.

What will IIoT based Digital Applications look like..

Digital Applications are being designed for specific device endpoints across industries. While the underlying mechanisms and business models differ from industry to industry, all of these use predictive analytics based on a combination of real time data processing & data science algorithms. These techniques extract insights from streaming data to provide digital services on existing toolchains, provide value added customer service, predict device performance & failures, improve operational metrics etc.

Examples abound. For instance, a great example in manufacturing is the notion of a Digital Twin, which Gartner called out last year. A Digital Twin is a software personification of an intelligent device or system. It forms a bridge between the real world and the digital world. In the manufacturing industry, digital twins can be set up to function as proxies of Things like sensors and gauges, coordinate measuring machines, vision systems, and white light scanning. This data is sent to a cloud based system where it is combined with historical data to better maintain the physical system.
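
As a toy sketch of the digital twin idea, the class below mirrors the latest telemetry of a physical asset and keeps a history for maintenance analytics. The asset type, fields and threshold are illustrative assumptions, not a reference to any product.

```python
# A minimal digital twin: a software proxy that tracks an asset's sensor history.
from dataclasses import dataclass, field
from typing import List

@dataclass
class PumpTwin:
    asset_id: str
    vibration_history: List[float] = field(default_factory=list)

    def update(self, vibration_mm_s: float) -> None:
        """Apply the latest sensor reading from the physical pump."""
        self.vibration_history.append(vibration_mm_s)

    def needs_maintenance(self) -> bool:
        """Flag the asset when recent vibration trends above a (made-up) limit."""
        recent = self.vibration_history[-10:]
        return bool(recent) and sum(recent) / len(recent) > 7.1

twin = PumpTwin("pump-0042")
for reading in [3.2, 7.5, 8.1, 8.4, 8.9]:
    twin.update(reading)
print(twin.needs_maintenance())  # True
```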

The wealth of data being gathered on the shop floor will ensure that Digital Twins will be used to reduce costs and increase innovation. Thus, in global manufacturing, Data Science will soon make its way onto the shop floor to enable the collection of insights from these software proxies. We covered the phenomenon of Servitization in manufacturing in a previous blogpost.

In the Retail industry, an ability to detect a customer’s location in realtime and combining that information with their historical buying patterns can drive real time promotions and an ability to dynamically price retail goods.

Solution Requirements for an IIoT Architecture..

At a high level, the IIoT reference architecture should support six broad solution areas-

  1. Device Discovery – Discovering a range of devices (and their details)  on the Digital Mesh for an organization within and outside the firewall perimeter
  2. Performing Remote Lifecycle Configuration of these devices ranging from startup to modification to monitoring to shut down
  3. Performing Deep Security level introspection to ensure the patch levels etc are adequate
  4. Creating Business workflows on the Digital Mesh. We will do this by marrying these devices to enterprise information systems (EISs)
  5. Performing Business oriented Predictive Analytics on these devices; this is critical to deriving actionable business insights from the device data
  6. On a futuristic basis, support optional integration with the Blockchain to support a distributed organizational ledger that can coordinate activity across all global areas that an enterprise operates in.

Building Blocks of the Architecture

Listed below are the foundational blocks of our reference architecture. Though the requirements will vary across industries, an organization can reasonably standardize on a number of foundational components as depicted below and then incrementally augment them as the interactions between different components increase based on business requirements.

Our reference architecture includes the following major building blocks –

  • Device Layer
  • Device Integration Layer
  • Data & Middleware Tier
  • Digital Application Layer

It also includes the following cross cutting concerns which span across the above layers –

  • Device and Data Security
  • Business Process Management
  • Service Management
  • UX Design
  • Data Governance – Provenance, Auditing, Logging

The next section provides a brief overview of the reference architecture’s components at a logical level.

A Big Data Reference Architecture for the Industrial Internet depicting multiple functional layers

Device Layer – 

The first requirement of IIoT implementations is to support connectivity from the Things themselves, or the Device layer depicted at the bottom. The Device layer includes a whole range of sensors, actuators, smartphones, gateways, industrial equipment etc. The ability to connect with devices and edge devices like routers and smart gateways using a variety of protocols is key. These network protocols include Ethernet, WiFi, and Cellular, which can all directly connect to the internet. Other protocols that need a gateway device to connect include Bluetooth, RFID, NFC, Zigbee et al. Devices can connect directly with the data ingest layer shown above, but it is preferred that they connect via a gateway which can perform a range of edge processing.

This is important from a business standpoint. For instance, in certain verticals like healthcare and financial services, there exist stringent regulations that govern when certain identifying data elements (e.g. video feeds) can leave the premises of a hospital or bank. A gateway can not only perform intelligent edge processing but can also connect thousands of device endpoints and facilitate bidirectional communication with the core IIoT architecture.

The ideal tool for these constantly evolving devices, metadata, protocols, data formats and types is Apache NiFi.  These agents will send the data to an Apache NiFi gateway or directly into an enterprise Apache NiFi cluster in the cloud or on-premise.

Apache NiFi Eases Dataflow Management & Accelerates Time to Analytics In Banking (2/3)..

A subproject of Apache NiFi, MiNiFi provides a complementary data collection approach that supplements the core tenets of NiFi in dataflow management. Due to its small footprint and low resource consumption, it is well suited to handle dataflow from sensors and other IoT devices. It provides central management of agents while providing full chain of custody information on the flows themselves.

For remote locations with more powerful devices, like the Arrow BeagleBone Black Industrial and MyPi Industrial, it is very simple to run a tiny Java or C++ MiNiFi agent for secure connectivity needs.

The data sent by the device endpoints are then modeled into an appropriate domain representation based on the actual content of the messages. The data sent over also includes metadata around the message. A canonical model can optionally be developed (based on the actual business domain) which can support a variety of applications from a business intelligence standpoint.

 Apache NiFi supports the flexibility of ingesting changing file formats, sizes, data types and schemas. The devices themselves can send a range of feeds in different formats. E.g. XML now and based on upgraded capabilities – richer JSON tomorrow. NiFi supports ingesting any file type that the devices or the gateways may send.  Once the messages are received by Apache NiFi, they are enveloped in security with every touch to each flow file controlled, secured and audited.   NiFi flows also provide full data provenance for each file, packet or chunk of data sent through the system.  NiFi can work with specific schemas if there are special requirements for file types, but it can also work with unstructured or semi structured data just as well.  From a scalability standpoint, NiFi can ingest 50,000 streams concurrently on a zero-master shared nothing cluster that horizontally scales via easy administration with Apache Ambari.

Data and Middleware Layer – 

The IIoT architecture recommends a Big Data platform with native message oriented middleware (MOM) capabilities to ingest device mesh data. This layer will also process device data in whatever fashion – batch or real-time – the business needs demand.

Application protocols such as AMQP, MQTT, CoAP, WebSockets etc are all deployed by many device gateways to communicate application specific messages.  The reason for recommending a Big Data/NoSQL dominated data architecture for IIOT is quite simple. These systems provide Schema on Read which is an innovative data handling technique. In this model, a format or schema is applied to data as it is accessed from a storage location as opposed to doing the same while it is ingested. From an IIOT standpoint, one must not just deal with the data itself but also metadata such as timestamps, device id, other firmware data such as software version, device manufactured data etc. The data sent from the device layer will consist of time series data and individual measurements.
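
To illustrate one of the application protocols mentioned above, here is a minimal sketch of a gateway publishing telemetry over MQTT with the Eclipse paho-mqtt client (constructor shown in paho-mqtt 1.x style). The broker address, topic layout and payload fields are assumptions; production traffic would be TLS-secured and authenticated.

```python
# Publish a (hypothetical) telemetry message from a gateway over MQTT.
import json
import time
import paho.mqtt.client as mqtt

client = mqtt.Client(client_id="gateway-plant-7")     # paho-mqtt 1.x style constructor
client.connect("mqtt.example-plant.com", 1883)        # hypothetical broker
client.loop_start()

payload = {
    "device_id": "sensor-0042",
    "ts": int(time.time() * 1000),
    "temperature_c": 71.4,
    "vibration_mm_s": 3.2,
    "firmware": "1.4.2",
}
client.publish("plant7/line3/telemetry", json.dumps(payload), qos=1)
client.loop_stop()
```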

The IIoT data stream can thus be visualized as a constantly running data pump, handled by a Big Data pipeline that takes the raw telemetry data from the gateways, decides which events are of interest and discards the ones not deemed significant from a business standpoint. Apache NiFi is your gateway and gate keeper. It ingests the raw data, manages the flow from thousands of producers to consumers, and performs basic data enrichment, in-stream sentiment analysis, aggregation, splitting, schema translation, format conversion and other initial steps to prepare the data. It does all that with a user-friendly web UI and an easily extensible architecture. It will then send raw or processed data to Kafka for further processing by Apache Storm, Apache Spark or other consumers. Apache Storm is a distributed real-time computation engine that reliably processes unbounded streams of data. Storm excels at handling complex streams of data that require windowing and other complex event processing. While Storm processes stream data at scale, Apache Kafka distributes messages at scale. Kafka is a distributed pub-sub real-time messaging system that provides strong durability and fault tolerance guarantees. NiFi, Storm and Kafka naturally complement each other, and their cooperation enables real-time streaming analytics for fast-moving big data. All the stream processing is handled by the NiFi-Storm-Kafka combination.

Apache NiFi, Storm and Kafka integrate closely to manage streaming dataflows.
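To make the front of this pipeline concrete, here is a minimal sketch of a device gateway publishing JSON telemetry onto a Kafka topic using the kafka-python client. This is an illustration only, not part of the reference architecture itself: the broker address, topic name and payload fields are assumptions.

```python
# A hedged sketch of a device gateway publishing telemetry to Kafka.
# Broker address, topic name and payload fields are illustrative assumptions.
import json
import random
import time

from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for _ in range(100):
    reading = {
        "device_id": f"sensor-{random.randint(1, 50)}",
        "firmware_version": "2.1.4",
        "ts": int(time.time() * 1000),                       # epoch millis
        "temperature_c": round(random.uniform(20.0, 95.0), 2),
        "vibration_mm_s": round(random.uniform(0.1, 12.0), 2),
    }
    producer.send("iiot.telemetry.raw", value=reading)        # fire-and-forget publish
    time.sleep(1)

producer.flush()
```

In a real deployment the gateway would batch messages, add delivery callbacks and secure the connection; the point here is simply what the raw telemetry stream looks like before NiFi, Storm and Spark take over.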

 

Appropriate logic is built into the higher layers to support device identification, ID lookup, secure authentication and transformation of the data. This layer will process data (cleanse, transform, apply a canonical representation) to support Business Automation (BPM), BI (business intelligence) and visualization for a variety of consumers. The data ingest layer will also provide notifications and alerts via Apache NiFi.

Here are some typical uses for this event processing pipeline (a brief streaming sketch of the first and third follows the list):

a. Real-time data filtering and pattern matching

b. Enrichment based on business context

c. Real-time analytics such as KPIs, complex event processing etc

d. Predictive Analytics

e. Business workflow with decision nodes and human task nodes
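As an illustration of items (a) and (c), the following hedged sketch uses Spark Structured Streaming to read the raw telemetry topic from Kafka, discard readings that are not business-significant and compute a simple windowed KPI. Topic, broker and field names are assumptions carried over from the producer sketch above, and the job assumes the spark-sql-kafka package is available on the classpath.

```python
# A minimal sketch of real-time filtering (a) and a windowed KPI (c)
# with Spark Structured Streaming; all names are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, LongType

spark = SparkSession.builder.appName("iiot-stream-kpis").getOrCreate()

schema = StructType([
    StructField("device_id", StringType()),
    StructField("ts", LongType()),
    StructField("temperature_c", DoubleType()),
    StructField("vibration_mm_s", DoubleType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "iiot.telemetry.raw")
       .load())

telemetry = (raw.selectExpr("CAST(value AS STRING) AS json")
             .select(F.from_json("json", schema).alias("m"))
             .select("m.*")
             .withColumn("event_time", (F.col("ts") / 1000).cast("timestamp")))

# (a) discard readings that are not business-significant
hot = telemetry.where(F.col("temperature_c") > 80.0)

# (c) a simple KPI: peak temperature per device over one-minute windows
kpis = (hot.withWatermark("event_time", "2 minutes")
        .groupBy(F.window("event_time", "1 minute"), "device_id")
        .agg(F.max("temperature_c").alias("max_temp_c")))

(kpis.writeStream
     .outputMode("update")
     .format("console")
     .start()
     .awaitTermination())
```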

Digital Application Tier – 

Once IIoT data has become part of the Hadoop-based Data Lake, all the rich analytics, machine learning and deep learning frameworks, tools and libraries become available to Data Scientists and Analysts. They can easily produce insights, dashboards, reports and real-time analytics with IIoT data joined to existing data in the lake, including social media data, EDW data and log data. All your data can be queried with familiar SQL through a variety of interfaces such as Apache Phoenix on HBase, Apache Hive LLAP and Apache Spark SQL. Using your existing BI tools or the open source Apache Zeppelin, you can produce and share live reports. You can run TensorFlow in containers on YARN for deep learning insights on your images, videos and text data, while running YARN-clustered Spark ML pipelines fed by Kafka and NiFi to apply streaming machine learning algorithms to trained models.
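As a small, hedged illustration of querying the lake with familiar SQL, the Spark SQL snippet below joins ingested telemetry with EDW-style asset master data. The table and column names are assumptions rather than a prescribed schema; an equivalent query could be issued through Apache Phoenix or Hive LLAP.

```python
# A hedged illustration of "familiar SQL over the lake" via Spark SQL.
# The telemetry/assets table names and columns are assumptions.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("iiot-lake-sql")
         .enableHiveSupport()        # read tables registered in the Hive metastore
         .getOrCreate())

result = spark.sql("""
    SELECT a.plant_id,
           t.device_id,
           avg(t.temperature_c)  AS avg_temp_c,
           max(t.vibration_mm_s) AS peak_vibration
    FROM   telemetry t
    JOIN   assets a
      ON   t.device_id = a.device_id
    WHERE  t.event_date = current_date()
    GROUP  BY a.plant_id, t.device_id
""")
result.show(20, truncate=False)
```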

A range of predictive applications are suitable for this tier. The models themselves should seek to answer business questions such as: which assets are likely to fail, how the key performance indicators in a manufacturing process are trending, how an insurance policy should be priced, and so on.

Once the device data has been ingested into a modern data lake, key functions that need to be performed include data aggregation, transformation, enrichment, filtering and sorting.

As one can see, this can get very complex very quickly, both from a data storage and a processing standpoint. A Cloud-based infrastructure, with its ability to provide highly scalable compute, network and storage resources, is a natural fit for handling bursty IIoT applications. However, IIoT applications add their own diverse infrastructure requirements, namely the ability to accommodate hundreds of kinds of devices and network gateways, which means that IT must be prepared to support a large diversity of operating systems and storage types.

This tier is also responsible for the integration of the IIoT environment into the business processes of an enterprise. The IIoT solution ties into existing line-of-business applications and standard software solutions through adapters or Enterprise Application Integration (EAI) and business-to-business (B2B) gateway capabilities. End users in business-to-business or business-to-consumer scenarios will interact with the IIoT solution and the special-purpose IIoT devices through this layer. They may use the IIoT solution or line-of-business system UIs, including apps on personal mobile devices such as smartphones and tablets.

Security Implementation

The topic of Security is perhaps the most important cross-cutting concern across all layers of the IIoT architecture stack. Needless to say, each of the layers must support the strongest data encryption, authentication and authorization capabilities for devices, users and partner applications. Accordingly, capabilities must be provided to ingest and store security feeds, IDS logs, server logs and device telemetry for advanced behavioral analytics. These feeds must be constantly analyzed across three domains – the Device domain, the Business domain and the IT domain. The below blogpost delves into some of these themes and is a good read to get a deeper handle on this issue from a SOC (security operations center) standpoint.

An Enterprise Wide Framework for Digital Cybersecurity..(4/4)

Conclusion

It is evident from the above that IIoT will create enormous opportunities for businesses globally. It will also create layers of complexity and opportunity for Enterprise IT. The creation of smart digital services on the data served up will further depend on the vertical industry. Whatever the business model – tracking behavior, location-sensitive pricing, business process automation etc. – the end goal of the IT architecture should be to create enterprise business applications that are ultimately data native and analytics driven.


Why Data Silos Are Your Biggest Source of Technical Debt..

“Any enterprise CEO really ought to be able to ask a question that involves connecting data across the organization, be able to run a company effectively, and especially to be able to respond to unexpected events. Most organizations are missing this ability to connect all the data together.” – Tim Berners-Lee (English computer scientist, best known as the inventor of the World Wide Web)

Image Credit – Device42

We have discussed vertical industry business challenges across sectors like Banking, Insurance, Retail and Manufacturing in some level of detail over the last two years. Though enterprise business models vary depending on the industry, there is a common Digital theme raging across all industries in 2017. Every industry is witnessing an upswing in the numbers of younger and digitally aware customers. Estimates of this influential population run as high as 40% in areas such as Banking and Telecommunications. They represent a tremendous source of revenue, but they can also defect just as easily if the services offered aren’t compelling or easy to use – as the below illustration from the Banking industry shows.

These customers are Digital Natives, i.e. they are highly comfortable with technology and use services such as Google, Facebook, Uber, Netflix and Amazon almost hourly in their daily lives. As a consequence, they expect a similarly seamless & contextual experience while engaging with Banks, Telcos, Retailers and Insurance companies over (primarily) digital channels. Enterprises thus have a twofold challenge – to store all this data as well as to harness it for real-time insights in a way that is connected with internal marketing & sales.

As many studies have shown, companies that constantly harness data about their customers and perform speedy advanced analytics outshine their competition. Does that seem a bombastic statement? Not when you consider that almost half of all online dollars spent in the United States in 2016 were spent on Amazon, and almost all digital advertising revenue growth in 2016 was accounted for by just two companies – Google and Facebook. [1]

According to The Economist, the world’s most valuable commodity is no longer Oil, but Data. The few large companies depicted in the picture are now virtual monopolies[2] (Image Credit – David Parkins)

Let us now return to the average Enterprise. The vast majority of industrial applications (numbering around 1000+ applications at large enterprises, according to research firm NetSkope) generally lag the innovation cycle. This is because they’re created using archaic technology platforms by teams that conform to rigid development practices. The Fab Four (Facebook, Amazon, Google and Netflix) and others have shown that Enterprise Architecture is a business differentiator, but the Fortune 500 have not gotten that message as yet. Hence they largely predicate their software development on vendor-provided technology instead of open approaches. This anti-pattern is further exacerbated by legacy organizational structures, which ultimately leads to these applications holding a very parochial view of customer data. These applications can typically be classified into one of these buckets – ERP, Billing Systems, Payment Processors, Core Banking Systems, Service Management Systems, General Ledger, Accounting Systems, CRM, Corporate Email, Salesforce, Customer On-boarding etc.

These enterprise applications are then typically managed by disparate IT groups scattered across the globe. They often serve different stakeholders who appear to have broadly overlapping interests but hold conflicting organizational priorities for various reasons. These applications then produce and store data in silos – localized by geography, department, line of business or channel.

Organizational barriers only serve to impede data sharing, for reasons ranging from competitive dynamics around who owns the customer relationship, to regulatory concerns, to internal politics. You get the idea – it is all a giant mishmash.

Before we get any further, we need to define that dreaded word – Silo.

What Is a Silo?

A mind-set present in some companies when certain departments or sectors do not wish to share information with others in the same company. This type of mentality will reduce the efficiency of the overall operation, reduce morale, and may contribute to the demise of a productive company culture. (Source – Business Dictionary [2])

Data is the Core Asset in Every Industry Vertical but most of it is siloed in Departments, Lines of Business across Geographies..

Let us be clear, most Industries do not suffer from a shortage of data assets. Consider a few of the major industry verticals and a smattering of the kinds of data that players in these areas commonly possess – 

Data In Banking– 

  • Customer Account data e.g. Names, Demographics, Linked Accounts etc
  • Core Banking Data going back decades
  • Transaction Data which captures the low-level details of every transaction (e.g. debit, credit, transfer, credit card usage etc.)
  • Wire & Payment Data
  • Trade & Position Data
  • General Ledger Data e.g. AP (accounts payable), AR (accounts receivable), cash management & purchasing information etc.
  • Data from other systems supporting banking reporting functions.

Data In Healthcare– 

  • Structured Clinical data e.g. Patient ADT information
  • Free hand notes
  • Patient Insurance information
  • Device Telemetry 
  • Medication data
  • Patient Trial Data
  • Medical Images – e.g. CAT Scans, MRIs, CT images etc

Data In Manufacturing– 

  • Supply chain data
  • Demand data
  • Pricing data
  • Operational data from the shop floor 
  • Sensor & telemetry data 
  • Sales campaign data

The typical flow of data in an enterprise follows a familiar path –

  1. Data is captured in large quantities as a result of business operations (customer orders, e-commerce transactions, supply chain activities, partner integration, clinical notes et al). These feeds are captured using a combination of techniques – mostly ESB (Enterprise Service Bus) and Message Brokers.
  2. The raw data streams then flow into respective application-owned silos, where over time a great amount of data movement (via copying, replication and transformation operations – the dreaded ETL) occurs using proprietary, vendor-developed systems. Vendors in this space have not only developed shrink-wrapped products that make them tens of billions of dollars annually but have also imposed massive human capital requirements on enterprises to program & maintain these data flows.
  3. Once all of the relevant data has been normalized, transformed and processed, it is copied over into business reporting systems where it is used to perform a range of functions – typically reporting for use cases such as Customer Analytics, Risk Reporting, Business Reporting and Operational improvements (a deliberately simplistic sketch of this hop follows the list).
  4. Rinse and repeat..
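The sketch below illustrates, in a deliberately simplistic way, the batch copy-transform-load hop described in steps 2 and 3. It assumes a SQLite source and target purely for illustration; in practice the silos would be ERP, CRM or core banking databases and the tooling would be a commercial ETL product.

```python
# A simplistic sketch of the batch "copy, transform, load" hop between silos.
# The SQLite databases, table and column names are assumptions for illustration.
import sqlite3
import pandas as pd

source = sqlite3.connect("orders_silo.db")      # application-owned silo (assumed)
target = sqlite3.connect("reporting_mart.db")   # downstream reporting system (assumed)

# Extract: pull raw orders out of the silo
orders = pd.read_sql_query(
    "SELECT order_id, customer_id, amount, currency, order_ts FROM orders", source
)

# Transform: normalize currency codes and derive a reporting-friendly column
orders["currency"] = orders["currency"].str.upper()
orders["order_date"] = pd.to_datetime(orders["order_ts"]).dt.date.astype(str)

# Load: copy the transformed rows into yet another store for reporting
orders.to_sql("orders_reporting", target, if_exists="append", index=False)
```

Multiply this little hop by hundreds of applications and nightly schedules and the "rinse and repeat" cycle above becomes clear.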

Due to this old school methodology of working with customer and operational data, most organizations have no real-time data processing capabilities in place & they thus live in a largely reactive world. What that means is that their view of a given customer’s world is typically a week to 10 days old.

Another factor to consider is that the data sources described above are what can be called structured or traditional data. However, organizations are now on-boarding large volumes of unstructured data, as captured in the below blogpost. Oftentimes, Business Analysts, Data Scientists and Data Architects can get access to external data faster than to internal data.

Getting access to internal data typically means jumping through multiple hoops – which department is paying for the feeds, the format of the feeds, regulatory issues, cyber security policy approvals, SOX/PCI compliance et al. The list is long and impedes the ability of the business to get things done quickly.

Infographic: The Seven Types of Non Traditional Data that can drive Business Insights

Data and Technical Debt… 

Since Ward Cunningham coined the term ‘Technical Debt‘, it has typically been used in an IT, DevOps, containers or data center context. However, technology areas like DevOps, PaaS, Cloud Computing with IaaS, Application Middleware, data centers etc. in and of themselves add no direct economic value to customers unless they are able to intelligently process Data. Data is the most important technology asset compared to other IT infrastructure considerations. You do not have to take my word for that. The Economist just published an article discussing the fact that the likes of Google, Facebook, Amazon et al are now virtual data monopolies and that global corporations are far behind in the competitive race to own Data [1].

Thus, it is ironic that while the majority of traditional Fortune 500 companies are still stuck in silos, Silicon Valley companies are not just fast becoming the biggest owners of global data but are also monetizing it on the way to record profits. Alphabet (Google’s corporate parent), Amazon, Apple, Facebook and Microsoft are the five most valuable listed firms in the world. Case in point – their combined profits were around $25bn in the first quarter of 2017, and together they make up more than half the value of the NASDAQ composite index. [1]

The Five Business Challenges that Data Fragmentation causes (or) Death by Silo … 

How intelligently a company harnesses its data assets determines its overall competitive position. This truth is being evidenced in sectors like Banking and Retail, as we have seen in previous posts.

What is interesting is that in some countries concerned about the pace of technological innovation, national regulatory authorities are creating legislation to force slow-moving incumbent corporations to unlock their data assets. For example, in the European Union, as a result of regulatory mandates – PSD2 & the Open Bank Standard – a range of agile players across the value chain (e.g. FinTechs) will soon be able to obtain seamless access to a variety of retail bank customer data via standard & secure APIs.

Once obtained, this data can be reimagined by these companies in manifold ways to offer new products & services that the banks themselves cannot. A simple use case is providing personal finance management platforms (PFMs) that help consumers make better personal financial decisions at the expense of the Banks owning the data. Indeed, FinTechs have generally been able to make more productive use of client data than banks. They do this by providing clients with intuitive access to cross-asset data, tailoring algorithms based on behavioral characteristics and providing clients with a more engaging and unified experience.

Why can’t the slow-moving established Banks do this? They suffer from a lack of data agility due to the silos that have been built up over years of operations and acquisitions. None of these challenges hold for the FinTechs, which can build off a greenfield technology environment.

To recap, let us consider the five ways in which Data Fragmentation hurts enterprises – 

#1 Data Silos Cause Missed Top line Sales Growth  –

Data produced by disparate applications, which store it in scattered silos, creates challenges in enabling a Single View of a customer across channels, products and lines of business. This makes everything across the customer lifecycle a pain – from smooth on-boarding, to customer service, to marketing analytics. It impedes the ability to segment customers intelligently and to perform cross-sell & up-sell. This inability to understand customer journeys (across different target personas) also leads to customer retention issues. When underlying data sources are fragmented, communication between business teams moves over to other internal mechanisms such as email, chat and phone calls. This is a recipe for delayed business decisions, which are ultimately ineffective as they depend more on intuition than on data.

#2 Data Silos are the Root Cause of Poor Customer Service  –

Across industries like Banking, Insurance, Telecom & Manufacturing, the ability to get a unified view of the customer & their journey is at the heart of the enterprise’s ability to understand customer preferences & needs. This is also crucial in promoting relevant offerings and in detecting customer dissatisfaction. Currently most enterprises are woefully inadequate at putting together this comprehensive Single View of their Customers (SVC). Due to operational silos, each department possesses a siloed & limited view of the customer relative to other departments (or channels). These views are typically inconsistent in and of themselves, as they lack synchronization with other departments. The net result is that companies typically miss a high number of potential cross-sell and up-sell opportunities.

#3 – Data Silos produce Inaccurate Analytics 

First off, most Analysts need to wait a long time to acquire the relevant data they need to test their hypotheses. And since the data they work on is of poor quality as a result of fragmentation, so are the analytics that operate on that data.

Let us take an example from Banking. Mortgage Lending, an already complex business process, has been made even more so due to the data silos built around Core Banking, Loan Portfolio and Consumer Lending applications. Qualifying borrowers for Mortgages needs to be based not just on the historical data used as part of the origination & underwriting process (credit reports, employment & income history etc.) but also on data that was not mined hitherto (social media data, financial purchasing patterns etc.). It is a well known fact that there are huge segments of the population (especially the millennials) who are broadly eligible but under-banked, as they do not satisfy some of the classical business rules needed to obtain approvals on mortgages. Each of the silos stores partial customer data. Thus, Banks do not possess an accurate and holistic picture of a customer’s financial status and are unable to qualify the customer for a mortgage quickly and at the best available customized rate.

#4 – Data Silos hinder the creation of new Business Models  


The abundance of data created over the last decade is changing the nature of business. If enterprise businesses are increasingly being built around data assets, then it must naturally follow that data as a commodity can be traded or re-imagined to create new revenue streams. As an example, pioneering payment providers now offer retailers analytical services to help them understand which products perform best and how to improve the micro-targeting of customers. Thus, data is a critical prong of any digital initiative. This has led to efforts to monetize data by creating platforms that support ecosystems of capabilities. To vastly oversimplify this discussion, the ability to monetize data needs two prongs – centralizing it in the first place, and then performing strong predictive modeling at large scale, where systems constantly learn and optimize their interactions, responsiveness & services based on client needs & preferences. Thus, Data Silos hurt this overall effort more than the typical enterprise can imagine.

#5 – Data Silos vastly magnify Cyber, Risk and Compliance challenges – 

Enterprises have to perform a range of back-office functions such as Risk Data Aggregation & Reporting, Anti Money Laundering Compliance and Cyber Security Monitoring.

Cybersecurity – The biggest threat to the Digital Economy..(1/4)

It must naturally follow that as more and more information assets are stored across the organization, it becomes a manifold headache to secure each and every silo from a range of bad actors – extremely well funded and sophisticated adversaries ranging from criminals to cyber thieves to hacktivists. On the business compliance front, sectors like Banking & Insurance need to maintain large AML and Risk Data Aggregation programs – silos are the bane of both. Every industry needs fraud detection capabilities as well, and these need access to unified data.

Conclusion

My intention for this post is clearly to raise more questions than provide answers. There is no question that Digital Platforms are a massive business differentiator, but they need access to an underlying store of high quality, curated and unified data to perform their magic. Industry leaders need to begin treating high quality Data as the most important business asset they have & to work across the organization to rid it of Silos.

References..

[1]  The Economist – “The world’s most valuable resource is no longer oil, but data” – http://www.economist.com/news/leaders/21721656-data-economy-demands-new-approach-antitrust-rules-worlds-most-valuable-resource

[2] Definition of Silo Mentality – http://www.businessdictionary.com/definition/silo-mentality.html

Here Is What Is Causing The Great Brick-And-Mortar Retail Meltdown of 2017..(1/2)

“Amazon and other pure plays are driving toward getting both predictive and prescriptive analytics. They’re analyzing and understanding information at an alarming rate. Brands have pulled products off of Amazon because they’re learning more about them than the brands themselves.” — Todd Michaud, Founder and CEO of Power Thinking Media

By April 2017, 17 major retailers had announced plans to close stores (Image Credit: Clark Howard)

We are witnessing a meltdown in Storefront Retail..

We are barely halfway through 2017, and the US business media is rife with stories of major retailers closing storefronts. The truth is inescapable: the Retail industry is in the midst of structural change. According to a research report from Credit Suisse, around 8,600 brick-and-mortar stores will shutter their doors in 2017. The comparable number was 2,056 stores in 2016 and 5,077 in 2015, which points to industry malaise [1].

The Retailer’s bigger cousin – the neighborhood Mall – is not doing any better. There are around 1,200 malls in the US today, and that number is forecast to decline to about 900 within a decade.[3]

It is clear that in the coming years, Retailers (and malls) across the board will remain under pressure due to a variety of changes – technological, business model and demographic.

So what can legacy Retailers do to compete with and disarm the online upstart?

Six takeaways for Retail Industry watchers..

Six takeaways from the recent headlines that should make industry watchers take notice –

  1. The brick and mortar retail store pullback has accelerated in 2017 – a year of otherwise strong economic expansion. Typical indicators that influence consumer spending on retail are generally pointing upwards. Just sample the financial data – the US has seen increasing GDP for eight straight years, the last 18 months have seen wage growth for middle & lower income Americans, and gas prices are at all-time lows.[3] These kinds of relatively strong consumer data trends cannot explain a slowdown in physical storefronts. Consumer spending is not shrinking due to declining affordability or spending power.
  2. Retailers that have either declared bankruptcy or announced large-scale store closings include marquee names across the different categories of retail, ranging from Apparel to Home Appliances to Electronics to Sporting Goods. Just sample some of the names – Sports Authority, RadioShack, HHGregg, American Apparel, Bebe Stores, Aeropostale, Sears, Kmart, Macy’s, Payless Shoes, JC Penney etc. This is clearly a trend across various sectors in retail and not confined to a given area such as, for instance, women’s apparel.
  3. Some of this “Storefront Retail bubble burst” can definitely be attributed to hitherto indiscriminate physical retail expansion. The first indicator here is the glut of residual excess retail space. The WSJ points out that the retail expansion dates back almost 30 years, when retailers began a “land grab” to open more stores – not unlike the housing boom a decade or so ago. [1] North America now has a glut of both retail stores and shopping malls while per capita sales have begun declining. The US has almost five times the retail space per capita of the UK. American consumers are also swapping materialism for more experiences.[3] Thus, an over-buildout of retail space is one of the causes of the ongoing crash.

    The US has way more shopping space compared to the rest of the world. (Credit – Cowan and Company)
  4. The dominant retail trend in the world is online ‘single click’ shopping. This is evidenced by declining in-store Black Friday sales in 2016 compared with record Cyber Monday (online) sales. As online e-commerce volume increases year on year, online retailers led by Amazon are taking market share away from struggling brick-and-mortar Retailers who have not kept up with the pace of innovation. The uptick in online retail is unmistakable, as evidenced by the below graph (src – ZeroHedge) depicting the latest retail figures. Department-store sales rose 0.2% on the month but were down 4.5% from a year earlier. Online retailers such as Amazon posted a 0.6% gain from the prior month and an 11.9% increase from a year earlier.[3]

    Retail Sales – Online vs In Store Shopping (credit: ZeroHedge)
  5. Legacy retailers are trying to play catch-up with the upstarts who excel at technology. This has sometimes translated into acquisitions of online retailers (e.g. Walmart’s purchase of Jet.com). However, the global top 10 Retailers are dominated by the likes of Walmart, Costco, Kroger and Walgreens; Amazon comes in only at #10, which implies that this battle is only in its early days. However, legacy retailers are saddled with huge fixed costs & their investors prefer dividend payouts to investments in innovation. Thus their CEOs are incentivized to focus on the next quarter, not the next decade like Amazon’s Jeff Bezos, who famously shows little concern for Amazon’s short-term profitability. Though traditional retailers have begun accelerating investments (both organic and via acquisition) in the critical areas of Cloud Computing, Big Data, Mobility and Predictive Analytics, the web-scale majors such as Amazon are far, far ahead of the typical Retail IT shop.

  6. The fastest growing Retail industry brands are companies that use Data as a core business capability to impact the customer experience, versus treating it as just another component of an overall IT system. Retail is a game of micro customer interactions that drive sales and margin. This implies a Retailer’s ability to work with real-time customer data – whether sentiment data, clickstream data or historical purchase data – to drive marketing promotions, personally relevant services, order fulfillment, show-rooming, loyalty programs etc. On the back end, the ability to streamline operations by pulling together data from operations and supply chains helps retailers fine-tune & automate operations, especially from a delivery standpoint.

    In Retail, Technology Is King..

    So, what makes Retail somewhat of a unique industry in terms of its data needs? I posit that there are four important characteristics –

    • First and foremost, Retail customers, especially the millennials, are very open about sharing their brand preferences and experiences on social media. There is a treasure trove of untapped data out there that needs to be collected and monetized. We will explore this in more detail in the next post.
    • Secondly, leaders such as Amazon use customer and product data and a range of other technology capabilities to shape the customer experience, versus the other way around for traditional retailers. They do this based on predictive analytic approaches such as machine learning and deep learning. Case in point is Amazon, which has now morphed from an online retailer into a Cloud Computing behemoth with its market-leading AWS (Amazon Web Services). In fact, its best-in-class IT enabled it to experiment with retail business models – e.g. the $99-a-year Amazon Prime subscription, which includes free two-day delivery and a music and video streaming service that competes with Netflix. As of March 31, 2017, Amazon had 80 million Prime subscribers in the U.S., an increase of 36 percent from a year earlier, according to Consumer Intelligence Research Partners.[3]
    • Thirdly, Retail organizations need to become Data-driven businesses. What does that mean or imply? They need to rely on data to drive every core business process – e.g. real-time insights about customers, supply chains, order fulfillment and inventory. This data spans everything from traditional structured data (sales data, store-level transactions, customer purchase histories, supply chain data, advertising data etc.) to non-traditional data (social media feeds – as there is a strong correlation between the products people rave about and what they ultimately purchase – location data, economic performance data etc.). This data variety represents a huge challenge to Retailers in terms of managing, curating and analyzing these feeds.
    • Fourth, Retail needs to begin aggressively leveraging the IoT capabilities it already has in place for Predictive Analytics. This implies tapping and analyzing data from in-store beacons, sensors and actuators across a range of use cases, from location-based offers to restocking shelves.

      ..because it enables new business models..

      None of the above analysis claims that physical stores are going away. They serve a very important function in allowing consumers a way to try on products and they provide the human experience. However, online is definitely where the growth will primarily be.

      The Next and Final Post in this series..

      It is very clear from the above that it now makes more sense to talk about a Retail Ecosystem which is composed of store, online, mobile and partner storefronts.

      In that vein, the next post in this two part series will describe the below four progressive strategies that traditional Retailers can adopt to survive and favorably compete in today’s competitive (and increasingly online) marketplace.

      These are –

    • Reinventing Legacy IT Approaches – Adopting Cloud Computing, Big Data and Intelligent Middleware to re-engineer Retail IT

    • Changing Business Models by accelerating the adoption of Automation and Predictive Analytics – Increasing Automation rates of core business processes and infusing them with Predictive intelligence thus improving customer and business responsiveness

    • Experimenting with Deep Learning Capabilities – the use of Advanced AI such as Deep Neural Nets to impact the entire lifecycle of Retail

    • Adopting a Digital or a ‘Mode 2’ Mindset across the organization – No technology can transcend a large ‘Digital Gap’ without the right organizational culture

      Needless to say, the theme across all of the above strategies is to leverage Digital technologies to create immersive cross-channel customer experiences.

References..

[1] WSJ – “Three hard lessons the internet is teaching traditional stores” – https://www.wsj.com/articles/three-hard-lessons-the-internet-is-teaching-traditional-stores-1492945203

[2] The Atlantic  – “The Retail Meltdown” https://www.theatlantic.com/business/archive/2017/04/retail-meltdown-of-2017/522384/?_lrsc=2f798686-3702-4f89-a86a-a4085f390b63

[3] WSJ – “Retail Sales fall for the second straight month” – https://www.wsj.com/articles/u-s-retail-sales-fall-for-second-straight-month-1492173415

Demystifying Digital – Reference Architecture for Single View of Customer / Customer 360..(3/3)

The first post in this three part series on Digital Foundations @ http://www.vamsitalkstech.com/?p=2517 introduced the concept of Customer 360 or Single View of Customer (SVC). The second post in the series discussed the concept of Customer Journey Mapping (CJM) – http://www.vamsitalkstech.com/?p=3099. We discussed the specific benefits, from both a business & operational standpoint, that are enabled by SVC & CJM. This third & final post will focus on the technical design & architecture needed to achieve both these capabilities.

Business Requirements for Single View of Customer & Customer Journey Mapping…

The following key business requirements need to be supported for three key personas- Customer, Marketing & Customer Service – from a SVC and CJM standpoint.

  1. Provide an Integrated Experience: A fully integrated omnichannel experience for both the customer and internal stakeholder (marketing, customer service, regulatory, managerial etc) roles. This means a few important elements – consistent information across all touchpoints, the right information to the right user at the right time, an ability to view the CJM graph with realtime metrics on Customer Lifetime Value (CLV) etc.
  2. Continuously Learning Customer Facing System: An ability for the customer facing portion of the architecture to learn constantly and fine-tune its understanding of the customer’s real-time picture. This includes an ability to understand the customer’s journey.
  3. Contextual yet Seamless Movement across Channels: The ability for customers to transition seamlessly from one channel to the other while conducting business transactions.
  4. Ability to introduce Marketing Programs for existing Customers: An ability to introduce marketing, customer retention and other loyalty programs in a dynamic manner. These include an ability to combine historical data with real-time data about customer interactions and other responses like clickstreams – to provide product recommendations and real-time offers.
  5. Customer Acquisition: An ability to perform low cost customer acquisition and to be able to run customized offers for segments of customers from a back-office standpoint.

Key Gaps in existing Single View (SVC) Architectures ..

It needs to be kept in mind that every organization is different from an IT legacy investment and operational standpoint. As such, a “one-size-fits-all” architecture is impossible to create. However, highlighted below are some common key data and application architecture gaps that I have observed while driving toward a SVC (Single View of Customer) with multiple leading enterprises.

  1. The lack of a single, unique & global customer identifier – The need to create a single universal customer identifier (based on various departmental or line of business identifiers) and to use it as a primary key in the customer master list
  2. Once the identifier is created in either the source system or in the datalake, organizations need to figure out a way to cascade that identifier into the Book of Record systems (CRM systems, webapps and ERP systems) so that the architecture can begin knitting together a single view of the customer. This may also involve periodically going out across the BOR systems, linking all the customer data and pulling it into the lake.
  3. Many companies deal with multiple customer on-boarding systems. At some point, these on-boarding processes need to be centralized. For instance, in Banking, and especially in Capital Markets, customer on-boarding is done in six or seven different areas; all of these ideally need to be consolidated into one.
  4. Graph Data Semantics – Once created, the Master Customer identifier should be mapped to all the other identifiers that lines of business use to uniquely identify their customer; the ability to use simple or more complex matching techniques (rule-based matching, machine learning based matching & search based matching) is highly called for. A minimal rule-based sketch follows this list.
  5. MDM (Master Data Management) systems have traditionally automated some of this process by creating & owning that unique customer identifier. However, Big Data capabilities help by linking that unique customer identifier to all the other ways the customer may be mapped across the organization. To this end, data may be exported into an MDM system backed by a traditional RDBMS, or the computation of the unique identifier can be done in a data lake and then exported into an MDM system.
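As a minimal illustration of the rule-based end of the matching spectrum mentioned in point 4, the sketch below derives a stable master identifier from a handful of normalized attributes. The field names and normalization rules are assumptions; production-grade matching would layer probabilistic and machine learning techniques on top of this.

```python
# A minimal, rule-based sketch of linking line-of-business customer records
# to a single master identifier. Field names and rules are assumptions.
import hashlib

def match_key(record: dict) -> str:
    """Build a deterministic matching key from normalized attributes."""
    name = record.get("name", "").strip().lower()
    dob = record.get("date_of_birth", "")
    postcode = record.get("postcode", "").replace(" ", "").upper()
    return f"{name}|{dob}|{postcode}"

def master_id(record: dict) -> str:
    """Derive a stable global customer identifier from the match key."""
    return hashlib.sha1(match_key(record).encode("utf-8")).hexdigest()[:16]

# Records for the same person held by two different lines of business
crm_record  = {"lob": "CRM",  "id": "C-1001", "name": "Jane Doe ",
               "date_of_birth": "1988-04-12", "postcode": "sw1a 1aa"}
loan_record = {"lob": "LOAN", "id": "L-7740", "name": "jane doe",
               "date_of_birth": "1988-04-12", "postcode": "SW1A1AA"}

assert master_id(crm_record) == master_id(loan_record)
print("Unified customer id:", master_id(crm_record))
```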

Let us discuss the generic design of the architecture (depicted above) with a focus on the following subsystems –

A Reference Architecture for Single View of Customer/ Customer 360
  1. At the very top, different channels are depicted, each with different touch points. In today’s connected world, the customer experience spans multiple touch points throughout the customer lifecycle. A customer should be able to move through multiple different touch points during the buying process. Customers should be able to start or pause transactions (e.g. an Auto Loan application) in one channel and restart/complete them in another.
  2. A Big Data enabled application architecture is chosen. This needs to account for two different data processing paradigms. The first is a realtime component. The architecture must be capable of handling events within a few milliseconds. The second is an ability to handle massive scale data analysis in a retrospective manner. Both these components are provided by a Hadoop stack. The real time component leverages – Apache NiFi, Apache HBase, HDFS, Kafka, Storm and Spark. The batch component leverages  HBase, Apache Titan, Apache Hive, Spark and MapReduce.
  3. The range of Book of Record and external systems send data into the central datalake. Both realtime and batch components highlighted above send the data into the lake. The design of the lake itself will be covered in more detail in the below section.
  4. Starting from the upper-left side, we have the Book of Record Systems sending across transactions. These are ingested into the lake using any of the different ingestion frameworks provided in Hadoop, e.g. Flume, Kafka, Sqoop or the HDFS API for batch transfers. The ingestion layer depicted is based on Apache NiFi and is used to load data into the data lake. Functionally, it is made up of real-time data loaders and end-of-day data loaders. The real-time loaders load the data as it is created in the feeder systems; the EOD data loaders adjust the data at the end of the day based on the P&L sign-off and the end-of-day close processes. The main data feeds for the system will be from the book of record transaction systems (BORTS), but there may also be multiple data feeds from transaction data providers and customer information systems.
  5. The UI Framework is standardized across all kinds of clients. For instance, this could be an HTML5 GUI framework that contains reusable widgets usable in both mobile and browser-based applications. The framework also needs to deal with common mobile issues such as bandwidth and be able to automatically throttle the data back where bandwidth is limited. It also needs to facilitate the construction of large user-defined pivot tables for ad hoc reporting. It utilizes UI framework components for its GUI construction and communicates with the application server via the web services layer.
  6. API access is also provided via Web Services for partner applications to leverage. This is the application layer that provides a set of RESTful web services that control the GUI behavior and that control access to the persistent data and the data cached on the data fabric.
  7. The transactions are taken through the pipeline of enrichment and the profiles of customers are stored in HBase (a brief read/write sketch follows this list).
  8. The core data processing platform is then based on a datalake pattern which has been covered in this blog before. It includes the following pattern of processing.
    1. Data is ingested in real time into an HBase database (which uses HDFS as the underlying storage layer). Tables are designed in HBase to store the profile of a trade and its lifecycle.
    2. Producers are authenticated at the point of ingest.
    3. Once the data has been ingested into HDFS, it is taken through a pipeline of processing (L0 to L3) as depicted in the below blogpost.

      http://www.vamsitalkstech.com/?p=667

  9. Speed Layer: The computational grid that makes up the Speed layer can be a distributed in-memory data fabric like Infinispan or GemFire, or a computation process can be overlaid directly onto a stateful data fabric technology like Spark or GemFire. The choice is dependent on the language choices that have been made in building the other key analytic libraries. If multiple language bindings are required (e.g. C# & Java) then the data fabric will typically be a different product than the Grid.
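To make step 7 a little more tangible, here is a brief, hedged sketch of writing and reading an HBase-backed customer profile via the happybase client. It assumes a pre-created customer_profiles table with a p column family and an HBase Thrift gateway running locally; the row key and column names are illustrative.

```python
# A hedged sketch of the HBase-backed customer profile from step 7.
# Assumes a pre-created 'customer_profiles' table with a 'p' column family.
import happybase  # pip install happybase (requires the HBase Thrift gateway)

connection = happybase.Connection("localhost")
profiles = connection.table("customer_profiles")

# Upsert a few enriched attributes for one customer (row key = master id)
profiles.put(b"cust:8f2a19c4", {
    b"p:full_name": b"Jane Doe",
    b"p:segment": b"mass_affluent",
    b"p:last_channel": b"mobile",
    b"p:clv_estimate": b"18250.00",
})

# Point lookup used by the serving/speed layer to hydrate the 360 view
row = profiles.row(b"cust:8f2a19c4")
print({k.decode(): v.decode() for k, v in row.items()})
```

In the full architecture these point lookups would be fronted by the web services layer and the speed layer rather than called directly from client code.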

Data Science for Customer 360

Consider the following use cases, all of which are covered under Customer 360 –

  1. The ability to segment customers into categories based on granular data attributes
  2. Improve customer targeting for new promotions & increasing acquisition rate
  3. Increasing cross sell and upsell rates
  4. Understanding influencers among customer segments & helping these net promoters recommend products to other customers
  5. Performing market basket analysis of what products/services are typically purchased together
  6. Understanding customer risk profiles
  7. Creating realtime views of customer lifetime value (CLV)
  8. Reducing customer attrition

The obvious capability that underlies all of these is Data Science. Thus, Predictive Analytics is the key compelling paradigm that enables the buildout of the dynamic Customer 360.

The Predictive Analytics workflow always starts with a business problem in mind. Examples would be “a marketing project to detect which customers are likely to buy new products or services in the next six months based on their historical & real-time product usage patterns – denoted by x, y or z characteristics”, “detect real-time fraud in credit card transactions” or “perform certain actions based on the predictions”. In use cases like these, the goal of the data science process is to be able to segment & filter customers by corralling them into categories that enable easy ranking. Once this is done, the business is involved in setting up easy and intuitive visualizations to present the results. In the machine learning process, an entire spectrum of algorithms can be tried to solve such business problems.
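As a minimal sketch of the “corral customers into rankable categories” step, the snippet below clusters synthetic behavioral features with scikit-learn’s KMeans. The three features and the choice of four segments are assumptions made purely for illustration.

```python
# A minimal customer segmentation sketch; features and k are illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Columns: monthly_spend, product_count, days_since_last_login (synthetic data)
rng = np.random.default_rng(42)
X = np.column_stack([
    rng.gamma(2.0, 400.0, 500),      # monthly spend
    rng.integers(1, 7, 500),         # number of products held
    rng.integers(0, 90, 500),        # recency of digital engagement
])

# Scale features so no single attribute dominates the distance metric
X_scaled = StandardScaler().fit_transform(X)

# Fit a small number of segments
model = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X_scaled)

for segment in range(4):
    members = X[model.labels_ == segment]
    print(f"Segment {segment}: {len(members)} customers, "
          f"avg spend ${members[:, 0].mean():,.0f}")
```

In practice the feature set would come from the curated Customer 360 attributes in the lake, and the number of segments would be chosen using elbow or silhouette analysis rather than fixed up front.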

A lot of times, business groups working on Customer 360 projects have a hard time explaining what they would like to see – both the data and the visualization. In such cases, a prototype makes things much easier from a requirements gathering standpoint. Once the problem is defined, the data scientist/modeler identifies the raw data sources (both internal and external) needed to tackle the business challenge. They spend a lot of time collating the data (from Oracle, DB2, Mainframe, Greenplum, Excel sheets, external datasets etc.). The cleanup process involves fixing missing values, corrupted data elements, formatting fields that indicate time and date etc.

The Data Scientist working with the business needs to determine how much of this raw data is useful and how much of it needs to be massaged to create a Customer 360 view. Some of this data needs to be extrapolated to form the features using formulas – so that a model can be created. The models created often involve using languages such as R and Python.

Feature engineering takes in business features in the form of feature vectors and creates predictive features from them. The Data Scientist takes the raw features and creates a model using a mix of various algorithms. Once the model has been repeatedly tested for accuracy and performance, it is typically deployed as a service.

The transformation phase involves writing code to join up like elements so that a single client’s complete dataset is gathered in the Data Lake from a raw features standpoint. If more data is obtained while the development cycle is underway, the Data Science team has no option but to go back & redo the whole process.

Models as a Service (MaaS) is the Data Science counterpart to Software as a Service. MaaS takes in business variables (often hundreds of them as inputs) and provides as output business decisions/intelligence, measurements and visualizations that augment decision support systems.
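A bare-bones sketch of the MaaS idea follows: a previously trained model is loaded and wrapped behind a small REST endpoint that takes business variables in and returns a score. Flask, the joblib model file and the three input fields are assumptions, not a prescribed stack.

```python
# A bare-bones Models-as-a-Service sketch. The framework (Flask), the
# pre-trained joblib model file and the input fields are assumptions.
from flask import Flask, jsonify, request
import joblib  # pip install joblib

app = Flask(__name__)
model = joblib.load("churn_model.pkl")  # assumed: a pre-trained classifier

@app.route("/score", methods=["POST"])
def score():
    payload = request.get_json(force=True)
    features = [[
        payload["monthly_spend"],
        payload["product_count"],
        payload["days_since_last_login"],
    ]]
    probability = float(model.predict_proba(features)[0][1])
    return jsonify({"customer_id": payload.get("customer_id"),
                    "churn_probability": round(probability, 4)})

if __name__ == "__main__":
    app.run(port=8080)
```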

Once these models are deployed and updated nightly based on their performance, the serving layer takes advantage of them to drive real-time Customer 360 decisioning.

To Sum Up…

In this short series we have discussed that customers and data about their history, preferences, patterns of behavior, aspirations etc are the most important corporate asset. Big Data technology and advances made in data storage, processing and analytics can help architect a dynamic Single View that can help maximize competitive advantage across every industry vertical.

The Three Habits of Highly Effective Real Time Enterprises…

“All I do is sit at home and watch Netflix.” – Kylie Irving

The Power of Real Time

Anyone who has used Netflix to watch a movie or used Uber to hail a ride knows how simple, time efficient, inexpensive and seamless it is to do either. Chances are that most users of Netflix and Uber would never again use a physical video store or a taxi service unless they did not have a choice. Thus it should not come as a surprise that within a short span of a few years, these companies have acquired millions of delighted customers using their products (which are just apps) while developing market capitalizations of tens of billions of dollars.

As of early 2016, Netflix had about 60 million subscribers[1] and is finding significant success in producing its own content thus continuing to grab market share from the established players like NBC, Fox and CBS. Most Netflix customers opt to ditch Cable and are choosing to stream content in real time across a variety of devices.

Uber is nothing short of a game changer in the ride sharing business. Not just in busy cities but also in underserved suburban areas, Uber services save plenty of time and money in enabling riders to hail cabs. In congested metro areas, Uber also provides near instantaneous rides for a premium, which motivates more drivers to service riders. As someone who has used Uber on almost every continent, I am not surprised that, as of 2016, Uber dominates in terms of market coverage, operating in 400 cities in 70+ countries.[2]

What is the common theme in ordering a cab using Uber or viewing a movie on Netflix?

Answer – Both services are available at the click of a button, they’re lightning quick and constantly build on their understanding of your tastes, ratings and preferences. In short, they are Real Time products.

Why is Real Time such a powerful business capability?

In the Digital Economy, the ability to interact intelligently with consumers in real time is what makes possible the ability to create new business models and to drive growth in existing lines of business.

So, what do Real Time Enterprises do differently?

What underpins a real time enterprise are three critical factors or foundational capabilities as shown in the below illustration. For any enterprise to be considered real time, the presence of these three components is what decides the pace of consumer adoption. Real time capabilities are part business innovation and part technology.

Let us examine these…

#1 Real Time Businesses possess a superior Technology Strategy

First and foremost, business groups must be able to define a vision for what they would like their products and services to do in order to acquire younger and more dynamic consumers.

As companies adopt new business models, the technologies that support them must also change along with the teams that deliver them. IT departments have to move to more of a service model while delivering agile platforms and technology architectures for business lines to develop products around.

Why Digital Disruption is the Cure for the Common Data Center..

It needs to be kept in mind that these new approaches should be incubated slowly and gradually. They must almost always be business or usecase driven at first.

#2 Real Time Enterprises are extremely smart about how they leverage data

The second capability is an ability to break down data silos in an organization. Most organizations have no idea what to do with all the data they generate. Sure, they use a fraction of it to perform business operations, but beyond that most of this data is simply let go. As a consequence they fail to view their customer as a dynamic yet unified entity, and they have no idea how to market more products or estimate the risk being run on their behalf. Alongside this is a growing emphasis on the importance of the role of the infrastructure within service orientation. As the common factor present throughout an organization, the networking infrastructure is potentially the ideal tool for breaking down the barriers that exist between the infrastructure, the applications and the business. Consequently, adding greater intelligence into the network is one way of achieving the levels of virtualization and automation that are necessary in a real-time operation.

Across Industries, Big Data Is Now the Engine of Digital Innovation..

#3 Real Time Enterprises use Predictive Analytics and they automate the hell out of every aspect of their business

Real time enterprises get the fact that relying only on Business Intelligence (BI) dashboards is largely passé. BI implementations base their insights on data that is typically stale (sometimes by days). BI operates in a highly siloed manner based on long cycles of data extraction, transformation, indexing etc.

However, if products are to be delivered over mobile and other non-traditional channels, then BI is ineffective at providing the just-in-time analytics that can drive an understanding of a dynamic consumer’s wants and needs. The Real Time enterprise demands that workers at many levels, ranging from line of business managers to executives, have fresh, high quality and actionable information on which they can base complex yet high quality business decisions. These insights are only enabled by Data Science and Business Automation. When deployed strategically, these techniques can scale to enormous volumes of data and help reason over them while reducing manual costs. They can take on business problems that can’t be managed manually because of the huge amount of data that must be processed.

Why Big Data & Advanced Analytics are Strategic Corporate Assets..

Conclusion..

Real time Enterprises do a lot of things right. They constantly experiment with creating new and existing business capabilities with a view to making them appealing to a rapidly changing clientele. They refine these using constant feedback loops and create cutting edge technology stacks that dominate the competitive landscape. Enterprises need to make the move to becoming Real time.

Neither Netflix nor Uber is sitting on its laurels. Netflix (which discontinued mail-in DVDs and moved to an online-only model a few years ago) continues to expand globally, betting that the convenience of the internet will eventually turn it into a major content producer. Uber is prototyping self-driving cars in Pittsburgh and intends to roll out its own fleet of self-driving vehicles, thus replacing its current 1.5 million drivers, while also beginning a food delivery business around urban centers [4].

Sure, the ordinary organization is no Netflix or Uber, and when a journey such as the one to real time capabilities is embarked upon, things can and will go wrong. However, the cost of continuing with business as usual can be incalculable over the next few years. There is always a startup or a competitor that wants to deliver what you do at much lower cost and at a lightning fast clip. Just ask Blockbuster and the local taxi cab company.

References

[1] Netflix Statistics 2016 – Statistica.com

[2] Fool.com “Just how dominant is Uber” – http://www.fool.com/investing/general/2015/05/24/lyft-vs-uber-just-how-dominant-is-uber-ridesharing.aspx

[3] Expanded Ramblings – “Uber Statistics as of Oct 2016” http://expandedramblings.com/index.php/uber-statistics/

[4] Uber self-driving cars debut in Pittsburgh – http://www.wsj.com/articles/inside-ubers-new-self-driving-cars-in-pittsburgh-1473847202

Why Digital Disruption is the Cure for the Common Data Center..

“The foundation of digital business is the boundary-free enterprise, which is made possible by an array of time- and location-independent computing capabilities – cloud, mobile, social and data analytics plus sensors and APIs. There are no shortcuts to the digital enterprise.”

— Mike West, Analyst, Saugatuck Research, 2015

At its core Digital is a fairly straightforward concept. It is essentially about offering customers more contextual and relevant experiences while creating internal teams that can turn on a dime to serve customers. It is clear that these kinds of consumer capabilities just cannot be offered using an existing technology stack. This blogpost seeks to answer what this next generation computing stack may look like.

What Digital has in Store for Enterprises…

Digital transformation is a daily fact of life at web scale shops like Google, Amazon, Apple, Facebook and Netflix. These mega shops have built not just intuitive and appealing applications but have gradually evolved them into platforms that offer discrete marketplaces that serve global audiences. They also provide robust support for mobile applications that deliver services such as content, video, e-commerce, gaming etc via such channels. In fact they have heralded the age of new media and in doing so have been transforming both internally (business models, internal teams & their offerings) as well as externally.

CXOs at established Fortune 1000 enterprises have been unable to find resonance in these stories from the standpoint of their own enterprise’s reinvention. This makes a lot of sense, as these established companies have legacy investments and legacy stakeholders – both of which represent change inhibitors that the FANGs (Facebook, Amazon, Netflix and Google) did not have. Enterprise practitioners need to understand how Digital technology can impact both existing technology investments and the future landscape.

Where are most Enterprises at the moment…

Much of what exists in datacenters across organizations is antiquated from a technology standpoint. This ranges from hardware platforms to network devices & switches to the monolithic applications running on them. Connecting these applications are often proprietary or manual integration architectures. There are inflexible, proprietary systems & data architectures, lots of manual processes, monolithic applications and tightly coupled integration. Rapid provisioning of IT resources is a huge bottleneck, which frequently leads to lines of business adopting the public cloud to run their workloads. According to Rakesh Kumar, managing vice president at Gartner – “For over 40 years, data centers have pretty much been a staple of the IT ecosystem. Despite changes in technology for power and cooling, and changes in the design and build of these structures, their basic function and core requirements have, by and large, remained constant. These are centered on high levels of availability and redundancy, strong, well-documented processes to manage change, traditional vendor management and segmented organizational structures. This approach, however, is no longer appropriate for the digital world.” [2]

On that note, the below blogpost had captured the three essential technology investments that make up Digital Transformation.

The Three Core Competencies of Digital – Cloud, Big Data & Intelligent Middleware

If Digital has to happen, IT is one of the largest stakeholders…

Digital applications present seamless experiences across channels & devices, are tailored to individual customers' needs, understand their preferences and need to be developed in an environment of constant product innovation.

So, which datacenter capabilities are required to deliver this?

Figuring out the best architectural foundation to support, leverage & monetize digital experiences is complex. The past few years have seen the rapid evolution of many transformational technologies – Big Data, Cognitive Computing, Cloud technology (public clouds, OpenStack, PaaS, containers, software-defined networking & storage), the Blockchain – and the list goes on. These are leading enterprises to a smarter way of developing enterprise applications and to more modern, efficient, scalable, cloud-based architectures.

So, what capabilities do Datacenters need to innovate towards?

digital_datacenter

                                         The legacy Datacenter transitions to the Digital Datacenter

While the illustration above is largely self explanatory, Enterprise IT will need to embrace Cloud Computing in whatever form the core offering takes – public, private or hybrid – with compute infrastructure ranging from open source virtualization to Linux containers. Containers essentially virtualize the operating system so that multiple workloads can run on a single host, instead of virtualizing a server to create multiple operating systems. These containers are easily ported across different servers without the need for reconfiguration and require less maintenance because there are fewer operating systems to manage. For instance, the OpenStack Cloud Project supports Docker (now a de facto standard), a Linux container format designed to automate the deployment of applications as highly portable, self-sufficient containers.
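As a hedged illustration of how lightweight this model is in practice, the sketch below starts two isolated workloads on a single host using the open source Docker SDK for Python. It is a minimal sketch only: it assumes a local Docker daemon and the `docker` package, and the image name and commands are placeholders.

```python
# Minimal sketch: run several isolated workloads on one host via Linux containers.
# Assumes a local Docker daemon and the 'docker' Python SDK (pip install docker);
# the image name and commands are illustrative placeholders.
import docker

client = docker.from_env()  # connect to the host's Docker daemon

for name, cmd in [("batch-job", ["echo", "running a batch job"]),
                  ("web-task", ["echo", "serving web traffic"])]:
    container = client.containers.run(
        "alpine:3.18",   # one small base image, no per-workload guest OS
        cmd,
        name=name,
        detach=True,
    )
    container.wait()                        # block until the workload exits
    print(name, container.logs().decode())  # inspect its output
    container.remove()                      # clean up the stopped container
```

The same container image can be shipped unchanged to another host or an orchestration platform, which is precisely the portability argument made above.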

Cloud computing will also enable rapid scale up & scale down across the gamut of infrastructure (compute – VMs/bare metal/containers, storage – SAN/NAS/DAS, network – switches/routers/firewalls etc.) in near real-time (NRT). Investments in SDN (Software Defined Networking) will be de rigueur in order to improve software based provisioning, shorten time to market and drive network equipment costs down. The other vector of datacenter innovation is automation, i.e. vastly reducing manual effort in network and application provisioning. These capabilities will be key as the vast majority of digital applications are delivered as Software as a Service (SaaS).
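To make the automation point concrete, here is a hypothetical elasticity loop of the kind such an environment enables. The `get_cluster_cpu` and `scale_to` hooks are stand-ins for whatever monitoring feed and cloud/SDN provisioning API an enterprise actually exposes, and the thresholds are assumptions, not recommendations.

```python
# Hypothetical near real-time elasticity loop: grow or shrink capacity based on
# observed utilization. get_cluster_cpu() and scale_to() are placeholder hooks
# for a real monitoring feed and a real cloud/SDN provisioning API.
import time

MIN_NODES, MAX_NODES = 2, 20
SCALE_UP_AT, SCALE_DOWN_AT = 0.75, 0.30   # assumed CPU utilization thresholds

def autoscale(get_cluster_cpu, scale_to, current_nodes, interval_secs=60):
    while True:
        cpu = get_cluster_cpu()                  # e.g. average utilization across the pool
        if cpu > SCALE_UP_AT and current_nodes < MAX_NODES:
            current_nodes += 1
            scale_to(current_nodes)              # provision one more VM/container host
        elif cpu < SCALE_DOWN_AT and current_nodes > MIN_NODES:
            current_nodes -= 1
            scale_to(current_nodes)              # release capacity to cut cost
        time.sleep(interval_secs)                # re-evaluate every interval
```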

An in-depth discussion of these Software Defined capabilities can be found in the blogpost below.

Financial Services IT begins to converge towards Software Defined Datacenters..

Applications developed for a Digital infrastructure will be built as small, nimble processes that communicate via APIs and over infrastructure such as service mediation components (e.g. Apache Camel). These microservices based applications offer huge operational and development advantages over legacy applications. While one does not expect legacy but critical applications that still run on mainframes (e.g. Core Banking, Customer Order Processing etc.) to move to a microservices model anytime soon, customer facing applications that need responsive digital UIs definitely will.
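As a minimal sketch of that style, the snippet below exposes a single business capability behind its own REST API. It assumes the Flask library; the service name and data are purely illustrative.

```python
# A single-purpose microservice exposing one capability over a REST API.
# Assumes Flask (pip install flask); the endpoint and data are illustrative.
from flask import Flask, jsonify

app = Flask(__name__)
_BALANCES = {"cust-001": 1250.75, "cust-002": 88.10}   # stand-in for a real data store

@app.route("/accounts/<customer_id>/balance")
def get_balance(customer_id):
    if customer_id not in _BALANCES:
        return jsonify({"error": "unknown customer"}), 404
    return jsonify({"customer_id": customer_id, "balance": _BALANCES[customer_id]})

if __name__ == "__main__":
    app.run(port=5001)   # each service is built, deployed and scaled independently
```

Other services – or a mediation layer such as Apache Camel or an API gateway – would then compose such endpoints rather than sharing a database.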

Which finally brings us to the most important capability of all – Data. The heart of any successful Digital implementation is Data. Here, Data includes internal data (e.g. customer data, transaction data, customer preference data) as well as external datasets & other relevant third party data (e.g. from retailers). While each source of data on its own may not radically change an application's view of its customers, the combination of them all promises to do just that.

The rapid growth in mobile devices and IoT (Internet of Things) capable endpoints will ensure exponential increases in data volumes. Digital applications will need to handle this data – not just to process it but also to glean real time insights from it. Some of the biggest technology investments in ensuring a unified customer journey are thus in the areas of Big Data & Predictive Analytics. Enterprises should leverage a common source of data that transcends silos (a data lake) and use advanced analytics – Machine Learning techniques, cognitive computing platforms etc. – to drive decisions that shape system behavior in real time, providing accurate and personalized insights that move the customer journey forward.
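A hedged sketch of what "driving system behavior in real time" can look like at the code level is shown below: events landing in the data lake are scored in micro-batches against a model trained offline. `load_model`, `next_micro_batch`, `act_on`, the feature names and the action threshold are all hypothetical.

```python
# Hypothetical micro-batch scoring loop: events flowing into the data lake are scored
# against a pre-trained model so downstream systems can react in near real time.
# load_model(), next_micro_batch(), act_on() and the feature names are placeholders.

def score_events(load_model, next_micro_batch, act_on):
    model = load_model("churn-propensity-v1")        # trained offline on lake data
    for batch in next_micro_batch():                 # e.g. a new batch every few seconds
        for event in batch:
            features = [[event["txn_count_7d"],
                         event["avg_txn_value"],
                         event["days_since_login"]]]
            score = model.predict_proba(features)[0][1]   # probability of churn
            if score > 0.8:                               # assumed action threshold
                act_on(event["customer_id"], score)       # e.g. trigger a retention offer
```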

Can Datacenters incubate innovation ?

Finally, one of the key IT architectural foundation strategies companies need to invest in is modern application development. Gartner calls one such approach “Bimodal IT”. According to Gartner, “infrastructure & operations leaders must ensure that their internal data centers are able to connect into a broader hybrid topology”. [2] Let us consider Healthcare – a reasonably staid vertical – as an example. In a report released by EY, “Order from Chaos – Where big data and analytics are heading, and how life sciences can prepare for the transformational tidal wave,” [1] the services firm noted that an agile environment can help organizations create opportunities to turn data into innovative insights. Typical software development life cycles that require lengthy validations and quality control testing prior to deployment can stifle innovation. Agile software development, which is adaptive and rooted in evolutionary development and continuous improvement, can be combined with DevOps, which focuses on the integration between the developers and the teams who deploy and run IT operations. Together, these can help life sciences organizations amp up their application development and delivery cycles. EY notes in its report that life sciences organizations can significantly accelerate project delivery, for example, “from three projects in 12 months to 12 projects in three months.”

In parallel, Big Data has evolved to enable the processing of data in batch, interactive or low latency modes depending on the business requirement – a massive gain for Digital projects. Big Data and DevOps will go hand in hand to deliver these new predictive capabilities.

Further, businesses can create digital models of client personas and connect them to the predictive analytics tier, with an API (Application Programming Interface) approach used to integrate both into the overall information architecture.

Conclusion..

More and more organizations are adopting a Digital first business strategy. The approach currently in vogue – treating these as one-off, tactical project investments – simply does not work or scale anymore. There are various organizational models one could employ to develop this maturity, ranging from a shared service to a line of business led approach. An approach that I have seen work very well is to build a Digital Center of Excellence (COE) that creates contextual capabilities, best practices and rollout strategies across the larger organization.

References –

[1] E&Y – “Order From Chaos” http://www.ey.com/Publication/vwLUAssets/EY-OrderFromChaos/$FILE/EY-OrderFromChaos.pdf

[2] Gartner – “Five Reasons Why a Modern Data Center Strategy Is Needed for the Digital World” – http://www.gartner.com/newsroom/id/3029231

Why Big Data & Advanced Analytics are Strategic Corporate Assets..

The industry is all about Digital now. The explosion in data storage and processing techniques promises to create new digital business opportunities across industries. Business Analytics concerns itself with deriving insights from data produced as a byproduct of business operations as well as from external data that reflects customer behavior. Due to its critical importance in decision making, Business Analytics is now a boardroom matter and not just one confined to the IT teams. My goal in this blogpost is to quickly introduce the analytics landscape before moving on to the significant value drivers that only Predictive Analytics can provide.

The Impact of Business Analytics…

The IDC “Worldwide Big Data and Analytics Spending Guide 2016” predicts that the big data and business analytics market will grow from $130 billion at the end of this year to $203 billion by 2020 [1], a compound annual growth rate (CAGR) of 11.7%. This growth is being experienced across industry verticals such as banking & insurance, manufacturing, telecom, retail and healthcare.

Further, during the next four years, IDC finds that large enterprises with 500+ employees will be the main driver in big data and analytics investing, accounting for about $154 billion in revenue. The US will lead the market with around $95 billion in investments during the next four years – followed by Western Europe & the APAC region [1].
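As a quick sanity check on the quoted growth rate (a simple four-year compounding window from the end of this year to 2020 is assumed here):

```python
# Back-of-the-envelope check of the quoted CAGR over an assumed four-year window.
start, end, years = 130e9, 203e9, 4
cagr = (end / start) ** (1 / years) - 1
print(f"{cagr:.1%}")   # ≈ 11.8%, broadly in line with the ~11.7% figure IDC quotes
```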

The two major kinds of Business Analytics…

When we discuss the broad topic of Business Analytics, it needs to be clarified that there are two major disciplines – Descriptive and Predictive. Industry analysts at Gartner, IDC etc. will tell you that one also needs to widen the definition to include Diagnostic and Prescriptive analytics. Having worked in the industry for a few years, I can safely say that these can be subsumed into the two major categories above.

Let’s define the major kinds of industrial analytics at a high level –

  • Descriptive Analytics is commonly described as being retrospective in nature, i.e. "tell me what has already happened". It covers a range of areas traditionally considered BI (Business Intelligence). BI is a traditional & well established analytical domain that takes a retrospective look at business data held in systems of record, supporting operational business processes like customer onboarding, claims processing, loan qualification etc. via dashboards, process metrics and KPIs (Key Performance Indicators). It also applies a basic level of mathematical techniques (trending, aggregation etc.) to infer intelligence from that data. The goal of the Descriptive disciplines is primarily to surface macro or aggregate business trends across dimensions such as time, product lines, business units & operating geographies.

  • Predictive Analytics is the forward looking branch of analytics which tries to predict the future based on information about the past. It describes what "can happen based on the patterns in data". It covers areas like machine learning, data mining, statistics, data engineering & other advanced techniques such as text analytics, natural language processing, deep learning, neural networks etc. A more detailed primer on both, along with detailed use cases, can be found here –

The Data Science Continuum in Financial Services..(3/3)

The two main domains of Analytics are complementary yet different…

Predictive Analytics does not intend to, nor will it, replace the BI domain; rather, it adds significantly more sophisticated analytical capabilities, enabling businesses to do more with all the data they collect. It is not uncommon to find real world business projects leveraging both analytical approaches.

However, from an implementation standpoint, the only thing the two approaches share is a need for knowledge of the business and of the organization's data sources. Almost everything else about them differs.

For instance, predictive approaches both augment & build on the BI paradigm by adding a “What could happen” dimension to the data.

The Descriptive Analytics/BI workflow…

BI projects tend to follow a largely structured process which has been well defined over the last 15-20 years. As the illustration below shows, data produced in operational systems is extracted, transformed and eventually loaded into a data warehouse for consumption by visualization tools.

descriptive_analytics

                                                                       The Descriptive Analysis Workflow 
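In miniature, that workflow can be expressed as a simple extract-transform-load script followed by an aggregate query. This is a sketch only, assuming pandas and the standard library's sqlite3 as a stand-in warehouse; the file, table and column names are illustrative.

```python
# Miniature descriptive/BI workflow: extract from an operational export, transform,
# load into a warehouse table, then aggregate for dashboard consumption.
# Assumes pandas and sqlite3 as a stand-in warehouse; names are illustrative.
import sqlite3
import pandas as pd

# Extract: a flat-file export from an operational system.
raw = pd.read_csv("claims_export.csv", parse_dates=["claim_date"])

# Transform: drop bad records and derive the reporting attributes we need.
raw = raw.dropna(subset=["claim_amount"])
raw["claim_month"] = raw["claim_date"].dt.strftime("%Y-%m")

# Load: append into the warehouse table that reporting tools query.
with sqlite3.connect("warehouse.db") as conn:
    raw.to_sql("fact_claims", conn, if_exists="append", index=False)
    # A typical retrospective KPI: total and average claim amount per month.
    kpis = pd.read_sql(
        "SELECT claim_month, SUM(claim_amount) AS total, AVG(claim_amount) AS average "
        "FROM fact_claims GROUP BY claim_month", conn)

print(kpis)   # the kind of aggregate view a BI dashboard would present
```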

Descriptive Analytics and BI add tremendous value to well defined use cases based on a retrospective look at data.

However, key challenges with this process are –

  1. the lack of a platform to standardize and centralize data feeds leads to data silos, which cause all kinds of cost & data governance headaches across the landscape
  2. complying with regulatory initiatives (such as Anti Money Laundering or Solvency II) requires the warehouse to handle varying types of data, which is a huge challenge for most EDW technologies
  3. adding new & highly granular fields to the data feeds in an agile manner is difficult, since it requires extensive upfront relational modeling to accommodate newer kinds of schemas

Big Data platforms have overcome past shortfalls in security and governance and are being used in BI projects at most organizations. An example of the use of Hadoop in classic BI areas like Risk Data Aggregation is discussed in depth in the blog below.

http://www.vamsitalkstech.com/?p=2697

As noted above, BI projects follow a largely structured and well defined process. This space serves a large existing base of customers, but the industry has been looking to Big Data as a way of building a central data processing platform that can help with the above issues.

BI projects are predicated on using an EDW (Enterprise Data Warehouse) and/or RDBMS (Relational Database Management System) approach to store & analyze the data. Both these kinds of data storage and processing technologies are legacy in terms of both the data formats they support (Row-Column based) as well as the types of data they can store (structured data).

Finally, these systems fall short when processing the data volumes generated by digital workloads, which tend to be loosely structured (e.g. mobile application front ends, IoT devices such as sensors, ATMs or Point of Sale terminals), need business decisions to be made in near real time or in micro batches (e.g. detect credit card fraud, suggest the next best action for a bank customer), and are increasingly cloud & API based to save on costs & provide self-service.
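To make the "loosely structured" point concrete, the sketch below reads events from different digital channels schema-on-read style, interpreting each payload at processing time instead of forcing every channel into one upfront relational schema. The payloads and the rule are illustrative only.

```python
# Events from different digital channels arrive with different fields - a poor fit
# for a fixed row-column schema. The payloads and the rule below are illustrative.
import json

raw_events = [
    '{"source": "mobile_app", "customer_id": "c-17", "screen": "payments", "latency_ms": 210}',
    '{"source": "pos_terminal", "store_id": "s-09", "amount": 42.50, "card_present": true}',
    '{"source": "atm", "atm_id": "a-33", "event": "withdrawal", "amount": 100}',
]

# Schema-on-read: interpret each event's fields at processing time.
for line in raw_events:
    event = json.loads(line)
    if event["source"] == "pos_terminal" and event.get("amount", 0) > 40:
        print("flag transaction at store", event["store_id"])   # e.g. a simple fraud rule
```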

That is where Predictive Approaches on Big Data platforms are beginning to shine and fill critical gaps.

The Predictive Analytics workflow…

Though the field of predictive analytics has been around for years, it is witnessing a rebirth with the advent of Big Data. Hadoop ecosystem projects enable the easy ingestion of massive quantities of data, helping businesses gather far more attributes about their customers and their preferences.

data_science_process

                                                                    The Predictive Analysis Workflow

The Predictive Analytics workflow always starts with a business problem in mind. Examples of these would be “A marketing project to detect which customers are likely to buy new products or services in the next six months based on their historical & real time product usage patterns – which are denoted by x, y or z characteristics” or “Detect real-time fraud in credit card transactions.”

In use cases like these, the goal of the data science process is to segment & filter customers by corralling them into categories that enable easy ranking. Once this is done, the business is engaged to set up easy and intuitive visualizations that present the results.

A lot of times, business groups have a hard time explaining what they would like to see – both the data and the visualization. In such cases, a prototype makes things easier from a requirements gathering standpoint. Once the problem is defined, the data scientist/modeler identifies the raw data sources (both internal and external) that bear on the business challenge. They spend a lot of time collating the data (from Oracle/SQL Server, DB2, Mainframes, Greenplum, Excel sheets, external datasets, etc.). The cleanup process involves fixing missing values and corrupted data elements, formatting date and time fields, and so on.

The data wrangling phase involves writing code to join the various data elements so that a single client's complete raw feature set is gathered in the Data Lake. If more data is obtained while the development cycle is underway, the Data Science team often has no option but to go back & redo parts of the process. The modeling phase is where algorithms come in – these can be supervised or unsupervised. Feature engineering takes business concepts & raw data features and creates predictive features from them. The Data Scientist then builds a model over the raw & engineered features using a mix of algorithms. Once the model has been repeatedly tested for accuracy and performance, it is typically deployed as a service. Models as a Service (MaaS) is the Data Science counterpart to Software as a Service: the MaaS takes in business variables (often hundreds of inputs) and outputs business decisions/intelligence, measurements & visualizations that augment decision support systems.
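As a compressed, hedged sketch of that end-to-end flow – engineered features, a trained and validated model, and the model exposed as a service – the snippet below uses scikit-learn and Flask. The data file, feature names and target are illustrative assumptions, not a prescription.

```python
# Compressed sketch of the predictive workflow: features -> model -> Models as a Service.
# Assumes pandas, scikit-learn and Flask; file, feature and target names are illustrative.
import pandas as pd
from flask import Flask, request, jsonify
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Wrangling + feature engineering (greatly simplified).
df = pd.read_csv("customer_features.csv").dropna()
X = df[["txn_count_90d", "avg_balance", "days_since_login"]]
y = df["bought_product"]                 # 1 if the customer bought, else 0

# Modeling: hold out a test set to check accuracy before deployment.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))

# Models as a Service: expose the trained model behind a simple scoring endpoint.
app = Flask(__name__)

@app.route("/score", methods=["POST"])
def score():
    payload = request.get_json()         # expects the three feature values as JSON
    features = [[payload["txn_count_90d"],
                 payload["avg_balance"],
                 payload["days_since_login"]]]
    proba = model.predict_proba(features)[0][1]
    return jsonify({"propensity": float(proba)})

if __name__ == "__main__":
    app.run(port=5002)
```

A real deployment would add input validation, model versioning and monitoring, but the shape – a model behind an API that takes business variables in and returns a decision – is the MaaS pattern described above.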

 How Predictive Analytics changes the game…

Predictive analytics can bring about transformative benefits in the following six ways.

  1. Predictive approaches can be applied to a much wider & richer variety of business challenges, enabling an organization to achieve outcomes that were not really possible with the Descriptive variety. Use cases range from Digital Transformation to fraud detection to marketing analytics to IoT (Internet of Things) across industry verticals. Predictive approaches can also operate in real time and are not just batch oriented like Descriptive approaches.
  2. When deployed strategically, they can scale to enormous volumes of data and help reason over them, reducing manual costs. They can take on problems that cannot be managed manually because of the huge amount of data that must be processed.
  3. They can predict the results of complex business scenarios by probabilistically predicting different outcomes across thousands of variables and perceiving minute dependencies between them. An example is social graph analysis to understand which individuals in a given geography are committing fraud and whether a ring is operating.
  4. They are vastly superior at handling fine grained data of manifold types than the traditional approach or manual processing. The predictive approach also encourages the integration of previously "dark" data as well as newer external sources of data.
  5. They can also suggest specific business actions (e.g. based on the above outcomes) by mining data for hitherto unknown patterns. The data science approach constantly keeps learning in order to increase the accuracy of its decisions.
  6. Data Monetization – they can be used to interpret the mined data to discover solutions to business challenges and new business opportunities/models.

References

[1] IDC Worldwide Semiannual Big Data and Business Analytics Spending Guide – Oct 2016 – “Double-Digit Growth Forecast for the Worldwide Big Data and Business Analytics Market Through 2020 Led by Banking and Manufacturing Investments, According to IDC”

http://www.idc.com/getdoc.jsp?containerId=prUS41826116