Here Is What Is Causing The Great Brick-And-Mortar Retail Meltdown of 2017..(1/2)

“Amazon and other pure plays are driving toward getting both predictive and prescriptive analytics. They’re analyzing and understanding information at an alarming rate. Brands have pulled products off of Amazon because they’re learning more about them than the brands themselves.” – Todd Michaud, Founder and CEO of Power Thinking Media

By April 2017, 17 major retailers announced plans to close stores (Image Credit: Clark Howard)

We are barely halfway through 2017, and the US business media is rife with stories of major retailers closing storefronts. It is inescapable that the Retail industry is in the midst of structural change. According to a research report from Credit Suisse, around 8,600 brick-and-mortar stores will shutter their doors in 2017. The comparable numbers were 2,056 stores in 2016 and 5,077 in 2015, which points to industry malaise [1]. It is clear that in the coming years, Retailers across the board will remain under pressure due to a variety of changes – technological, business model and demographic. So what can Retailers do to compete?

Six takeaways for Retail Industry watchers..

Six takeaways that should have industry watchers take notice from the recent headlines –

  1. The brick and mortar retail store pullback has accelerated in 2017 – a year of otherwise strong economic expansion. Typical indicators that influence consumer spending on retail are generally pointing upwards. Just sample the financial data – the US has seen increasing GDP for eight straight years, the last 18 months have seen wage growth for middle & lower income Americans and gas prices are at all time lows.[3] These kinds of trends should not usually presage a slowdown in physical storefronts due to declining consumer affordability.
  2. The retailers that have either declared bankruptcy or announced large scale store closings include marquee names across the different areas of retail, ranging from Apparel to Home Appliances to Electronics to Sporting Goods. Sample the names – Sports Authority, RadioShack, HHGregg, American Apparel, Bebe Stores, Aeropostale, Sears, Kmart, Macy’s, Payless Shoes, JC Penney etc. So this is clearly a trend across various sectors in retail and not confined to a given area, for instance, women’s apparel.
  3. Some of this “Storefront Retail bubble burst” can definitely be attributed to hitherto indiscriminate physical retail expansion. The first indicator is the glut of residual excess retail space. The WSJ points out that the retail expansion dates back almost 30 years, when retailers began a “land grab” to open more stores – not unlike the housing boom a decade or so ago. [1] North America now has a glut of both retail stores and shopping malls while per capita sales have begun declining. The US especially has almost five times the retail space per capita of the UK. American consumers are also swapping materialism for more experiences.[3] Thus, there are much deeper issues here than just an over-buildout of retail space, especially changing consumer preferences.
  4. We now live in a world where online ‘single click’ shopping is the dominant retail trend. This is evidenced by declining in-store Black Friday sales in 2016 when compared with increased Cyber Monday (online) sales. As online e-commerce volume increases year on year, online retailers led by Amazon are surely taking market share away from the brick-and-mortar Retailer who has not kept up with the pace of innovation. The uptick in online retail is unmistakable as evidenced by the below graph (src – ZeroHedge) depicting the latest retail figures. Department-store sales rose 0.2% on the month, but were down 4.5% from a year earlier. Online retailers such as Amazon posted a 0.6% gain from the prior month and an 11.9% increase from a year earlier.[3]

    Retail Sales – Online vs In Store Shopping (credit: ZeroHedge)
  5. Legacy retailers are trying to play catch-up with the upstarts who excel at technology. This has sometimes translated into acquisitions of online retailers (e.g. by Walmart). However, the Global top 10 Retailers are dominated by the likes of Walmart, Costco, Kroger, Walgreens etc. Amazon comes in only at #10, which implies that this battle is only in its early days. That said, legacy retailers are saddled with huge fixed costs and with investors who prefer dividends to investments in innovation. Their CEOs are incentivized to focus on the next quarter, not the next decade like Amazon’s Jeff Bezos. Though traditional retailers have begun accelerating investments in Cloud Computing, Big Data and Predictive Analytics, the web scale majors such as Amazon are far ahead of the typical Retail IT shop.

  6. The fastest growing Retail industry brands are companies that use Data as a core business capability to impact the customer experience versus as just another component of an overall IT system. Retail is a game of micro customer interactions that drive sales and margin. This implies a retailer’s ability to work with realtime customer data – whether it’s sentiment data, clickstream data or historical purchase data – to drive marketing promotions, order fulfillment, show-rooming, loyalty programs etc. On the back end, the ability to pull together data from operations and supply chains is helping retailers fine-tune their operations, especially from a product delivery standpoint.

    In Retail, Data Is King..

    So, what makes Retail unique in terms of its data needs? I posit that there are four important characteristics –

    • First and foremost, Retail customers, especially millennials, are very open about sharing their brand preferences and experiences on social media. There is a treasure trove of untapped data out there.
    • Secondly, leaders such as Amazon use data and a range of other technology capabilities to shape the customer experience versus the other way around for traditional retailers. They do this based on predictive analytic approaches such as machine learning and deep learning. Case in point is Amazon, which has now morphed from an online retailer to a Cloud Computing behemoth with its market leading AWS (Amazon Web Services). In fact, its best in class IT enabled it to experiment with retail business models. E.g. the $99-a-year Amazon Prime subscription, which includes free two-day delivery plus a music and video streaming service that competes with Netflix. As of March 31, 2017 Amazon had 80 million Prime subscribers in the U.S., an increase of 36 percent from a year earlier, according to Consumer Intelligence Research Partners.[3]
    • Thirdly, Retail organizations need to begin relying on data to drive realtime insights about customers, supply chains and inventory.
    • Fourth, Retail needs to begin aggressively adopting IoT. This implies tapping and analyzing data from in store beacons, sensors and actuators.

      ..because it enables new business models..

      None of the above analysis claims that physical stores are going away. They serve a very important function in allowing consumers a way to try on products and allowing for the human experience. However, online definitely is where the growth primarily will be.

      The Next and Final Post in this series..

      It is very clear from the above that it now makes more sense to talk about a Retail Ecosystem which is composed of store, online, mobile and partner storefronts.

      In that vein, the next post in this two part series will describe the below four progressive strategies that traditional Retailers can adopt to survive and favorably compete in today’s competitive (and increasingly online) marketplace.

      These are –

    • Reinventing Legacy IT Approaches – Adopting Cloud Computing, Big Data and Intelligent Middleware to re-engineer Retail IT

    • Changing Business Models by accelerating the adoption of Automation and Predictive Analytics – Increasing Automation rates of core business processes and infusing them with Predictive intelligence thus improving customer and business responsiveness

    • Experimenting with Deep Learning Capabilities  –the use of Advanced AI such as Deep Neural Nets to impact the entire lifecycle of Retail

    • Adopting a Digital or a ‘Mode 2’ Mindset across the organization – No technology can transcend a large ‘Digital Gap’ without the right organizational culture

      Needless to say, the theme across all of the above strategies is to leverage Digital technologies to create immersive cross channel customer experiences.


[1] WSJ – “Three hard lessons the internet is teaching traditional stores”

[2] The Atlantic – “The Retail Meltdown”

[3] WSJ – “Retail Sales fall for the second straight month”

Hadoop is Not Failing, it is the Future of Data..

“Madam.. What use is a new-born baby?” – Michael Faraday (apocryphal quote), when asked about the utility of electricity, a new invention in the 1800s…

Source – DataEconomy

Why Hadoop Is Thriving and Will Continue to do so…

As my readers are aware I have been heavily involved in the Big Data space for the last two years. This time has been an amazing and transformative personal experience as I have been relentlessly traveling the globe advising global banking leaders across all continents.

Thus, it should come as no surprise that the recent KDNuggets article, somewhat provocatively titled “Hadoop is Failing – Why”, managed to get me disagreeing right from the get-go.

The author, though well meaning from what I can tell, bases the article on several unfounded assumptions. Before we delve into those, let us consider the following background.

The onset of Digital Architectures in enterprise businesses implies the ability to drive continuous online interactions with global consumers/customers/clients or patients. The goal is not just to provide engaging visualization but also to personalize services customers care about – while working across multiple channels/modes of interaction. Mobile applications first began forcing the need for enterprise applications to support multiple channels of interaction with their consumers. For example, Banking now requires an ability to engage consumers in a seamless experience across an average of four to five channels – Mobile, eBanking, Call Center, Kiosk etc. Healthcare is a close second, where caregivers expect patient, medication & disease data at their fingertips with a few finger swipes on an iPad app.

Big Data technology evolved to overcome the limitations of existing data approaches (RDBMS & EDW) in keeping up with the data architecture & analysis challenges inherent in the Digital application stack.

These challenges include –

  1. The challenge of data volume explosion, which has been discussed in detail elsewhere on this blog.

  2. The amazing data variety enterprises are now forced to deal with, traveling at high velocity

  3. Surely Hadoop has its own technical constraints – the ability to support low latency BI (Business Intelligence) queries for one. However, the sheer inability of pre-Hadoop approaches to scale with exploding data ingest and the management of massive data volumes caused two business challenges for Digital Architectures. The first is the ability to glean real time insights from vast streams of (structured & unstructured) data flowing into enterprise architectures. The second is the ability to work with advanced analytics – Predictive Analytics and Deep Learning – at high speeds (quite often tens of thousands to tens of millions of messages per second) to solve complex problems across domains. Hadoop turns these challenges into business opportunities for efficient adopters.

Why the Darwinian Open Source Ecosystem ensures Hadoop is a robust and mature technology platform 

Big Data is backed by the open source community, with most Hadoop ecosystem technology (25+ projects) incubated, developed and maintained in the Apache ecosystem. The Open Source community is inherently Darwinian in nature. Its focus on code quality, industry adoption, a concrete roadmap and committers means that if a project lacks any of these, it is surely headed for the graveyard. Put another way, there can be no stragglers in this ecosystem.

Let us now consider the chief assumptions made by the author in the above article.

Assumption 1  –  Hadoop adoption is staying flat at best

The best part of my job is working with multiple customers daily on their business initiatives and figuring out how to apply technology to solving these complex challenges. I can attest that adoption in the largest enterprises is anything but stagnating. While my view is certainly anecdotal and confined to the four walls of one company, adoption is indeed skyrocketing in verticals like Banking, Telecom, Manufacturing & Insurance. The early corporate movers, working with the leading vendors, have more or less figured out the kinks in the technology as applied to their business challenges. The adoption patterns are maturing and they are realizing massive business value from it. A leading vendor, Hortonworks, moved to $100 million annual revenues quicker than any other tech startup – which is testament to the potential of this space. Cloudera just went public. All this growth has been accompanied by somewhat declining revenues & stock prices at leading EDW vendors. I forecast that the first Big Data ‘startup’ will reach $1 billion in revenue over the next five to seven years, at a somewhat faster rate compared to the revered open source pioneer Red Hat. At a minimum, Hadoop projects cut tens of millions of dollars from costly and inflexible enterprise data warehouse projects. Nearly every large organization has begun deploying Hadoop as an Enterprise Landing Zone (ELZ) to augment an EDW.

Assumption 2  – The business value of projects created using Hadoop is unclear

The author has a point here but let me explain why this is an organizational challenge and not really the fault of any technology stack – Middleware or Cloud or Big Data. The challenge is that it is often a fine art to figure out the business value of Big Data projects working across complex organizational structures. IT groups can surely start POCs as science or “one-off resume builder” projects but the lines of business need to get involved from the get-go, sooner than with any other technology category. Big Data isn’t about the infrastructural plumber’s job of storing massive volumes of data but really about creating business analytics on the data collected and curated. Whether those analytics are simply old school BI or Data Science oriented depends on the culture and innovativeness of an organization.

Organizations are using Big Data not only to solve existing business challenges (sell more products, detect fraud, run risk reports etc) but also to rapidly experiment with new business models using the insights gleaned from Big Data Analytics. It falls to the office of an enlightened CDO (Chief Data Officer) to own the technology, create the appropriate internal costing models and onboard line of business (LOB) projects into the data lake.

There are two questions every CDO needs to ask at the outset –

  • What business capabilities are going to be enabled across the organization?
  • What aspects of digital transformation can be enabled best by Big Data?

Assumption 3  – Big Data is only a valid technical solution for massive data volumes in the Petabytes (PBs).

The author writes ‘You don’t need Hadoop if you don’t really have a problem of huge data volumes in your enterprise, so hundreds of enterprises were hugely disappointed by their useless 2 to 10TB Hadoop clusters – Hadoop technology just doesn’t shine at this scale.’

This could not be further from the observed reality for three reasons.

Firstly, most of the projects in the terabyte (TB) range exist as tenants in larger clusters. The real value of data lakes is being able to build out cross organizational data repositories that were simply too expensive or too hard to build before. Once you have all the data in one place, you can mash it up and analyze it in ways heretofore unknown.

Secondly, as I’ve covered in the below post, many players are leveraging Big Data to gain the crucial “speed” advantage while working with TBs of data.

Thirdly, I recommend that every client start ‘small’ and use a data lake to serve as an Enterprise Landing Zone (ELZ) for data produced as a result of regular business operations. Hadoop clusters not only serve as cheap storage but also perform a range of rote but compute intensive data processing tasks (data joining, sorting, segmentation, binning etc.) that save the EDW from a range of taxing operations.

Assumption 4  – Hadoop skills are hard to find.

In the author’s words – “..while 57% said that the skills gap was the major reason, a number that is not going to be corrected overnight. This coincides with findings from Indeed who tracked job trends with ‘Hadoop Testing’ in the title, with the term featured in a peak of 0.061% of ads in mid 2014, which then jumped to 0.087% in late 2016, an increase of around 43% in 18 months. What this may signal is that adoption hasn’t necessarily dropped to the extent that anecdotal evidence would suggest, but companies are simply finding it difficult to extract value from Hadoop from their current teams and they require greater expertise.”

The skills gap is real and exists in three primary areas – Data Scientists, Data Engineers and Hadoop Administrators.

However, this is not unique to Hadoop and is common with every new technology. Companies need to bridge the gap by augmenting the skills of internal staff, working with the Global Systems Integrators (GSI), who have all added Big Data practice areas, and by engaging with academia. In fact, the prospect of working on Big Data projects can attract talent to the organization.

How Should large Organizations proceed on their Big Data journey?

So what are the best practices to avoid falling into the “Big Data does not provide value” trap?

  • Ensuring that discussions of Big Data and its business and technical capabilities are conducted at the highest levels. Big Data needs to be part of an organization’s DNA and should be discussed in the context of the other major technology forces driving industry – Cloud, Mobility, DevOps, Social, APIs etc.
  • Creating or constituting a team under the CDO (Chief Data Officer). Teams can be physical or virtual, and need to take into account organizational politics
  • Creating a COE (Center of Excellence) or a similar federated approach where a central team works with lines of business IT on these projects
  • Instituting, as part of the COE, a process to onboard the latest skills
  • Instituting appropriate governance and project oversight
  • Identifying key business metrics that will drive Big Data projects. This blog has covered many such areas but these include detailed analyses on expected growth acceleration, cost reduction, risk management and enabling competitive advantage.
  • Engaging the lines of business to develop these capabilities in an iterative manner. Almost all successful Big Data projects are delivered in a DevOps fashion.


The Big Data ecosystem and Hadoop technology provide mature, stable and feature rich platforms for global vertical organizations to implement complex Digital projects. However, technology maturity is only a necessary factor. The ability of the organization in terms of an innovation oriented mindset is key in driving internal change. So is inculcating a learning mindset across business leadership, IT teams, internal domain experts and management. The universal maxim – “one gets only out of something as much as they put into it” – is truer than ever with Big Data. While it is easy to blame a technology or a vendor or lack of skilled personnel for perceived project failures, one should guard against a status quo-ist mindset. You can rest assured that your competition is not sitting still.

Why Big Data Analytics is the Future of CRM..

A  question that I get a lot from customers is around how Big Data can help augment CRM systems. The answer isn’t just about the ability to aggregate loads of information to produce much richer views of the data but also about feeding this data to produce richer digital analytics. 

Why Combine CRM with Big Data

Customer Relationship Management (CRM) systems primarily revolve around customer information and capture a customer’s interactions with a company. The strength of CRM systems is their ability to work with structured data such as customer demographic information (Name, Identifiers, Address, product history etc).

Industry customers will want to use their core CRM customer profiles as a foundational capability and then augment it with additional data as shown in the below diagram –

  1. Core CRM Records as shown at the bottom layers storing structured customer contact data
  2. Extended attribute information from MDM systems,
  3. Customer Experience Data such as Social (sentiment, propensity to buy), Web clickstreams, 3rd party data, etc. (i.e. behavioral, demographics, lifestyle, interests, etc).
  4. Any Linked accounts for customers
  5. The ability to move to a true Customer 360 or Single View
How Big Data Can Augment CRM systems (credit – Mike Ger)

All of these non traditional data streams shown above and depicted below can be stored on commodity hardware clusters. This can be done at a fraction of the cost of traditional SAN storage. The combined data can then be analyzed effectively in near real time thus providing support for advanced business capabilities.

The Seven Kinds of Non Traditional Data that have become prevalent over the last five years

Seven Common Business Capabilities

Once all of this data has been ingested into a datalake from CRM systems, Book Of Record Transaction Systems (BORT), unstructured data sources etc the following kinds of analysis are performed on it. Big Data based on Hadoop can help join CRM data (customer demographics, sales information, advertising campaign info etc) with additional data. This rich view of a complete dataset can provide the below business capabilities.

  • Customer Segmentation – For a given set of data, predict, for each individual in a population, the discrete set of classes that this individual belongs to. An example classification is – “For all retail banking clients in a given population, who are most likely to respond to an offer to move to a higher segment”.
  • Pattern recognition and analysis – discover new combinations of business patterns within large datasets. E.g. combine a customer’s structured data with clickstream data analysis. A major bank in NYC is using this data to settle mortgage loans.
  • Customer Sentiment analysis is a technique used to find degrees of customer satisfaction and how to improve them with a view to increasing customer net promoter scores (NPS).
  • Market basket analysis is commonly used to find associations between products that are purchased together, with a view to improving product marketing. E.g. recommendation engines that determine which banking products to recommend to customers.
  • Profiling algorithms aim to characterize the normal or typical behavior of an individual or group within a larger population. They are frequently used in anomaly detection systems such as those that detect AML (Anti Money Laundering) and Credit Card fraud.
  • Clustering algorithms divide data into groups, or clusters, of items that have similar properties.
  • Causal Modeling algorithms attempt to find out what business events influence others.
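To make one of these capabilities concrete, here is a minimal sketch of market basket analysis in plain Python. It is an illustration under stated assumptions rather than how any particular bank or Hadoop distribution does it: the baskets, the product names, the `pair_rules` helper and the 0.4 support threshold are all hypothetical, and a production job would run over the data lake instead of an in-memory list.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: the banking products held by each customer.
baskets = [
    {"checking", "savings", "credit_card"},
    {"checking", "credit_card"},
    {"checking", "savings"},
    {"savings", "mortgage"},
    {"checking", "savings", "mortgage"},
]

def pair_rules(baskets, min_support=0.4):
    """Return (antecedent, consequent, support, confidence) for frequent pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of all baskets containing both products
        if support < min_support:
            continue
        # Each frequent pair yields two directed rules: a -> b and b -> a.
        # Confidence is the share of baskets with the antecedent that also
        # contain the consequent.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Highest-confidence rules first; names break ties deterministically.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))

rules = pair_rules(baskets)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data every credit card holder also has a checking account, so the rule credit_card -> checking comes out with a confidence of 1.0, which is the kind of association a recommendation engine would act on.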

Four business benefits in combining Big Data with CRM Systems –

  1. Hadoop can make CRM systems more efficient and cost effective – Most CRM technology is based on an underlying relational database or enterprise data warehouse. These legacy data storage technologies suffer from data collection delays and processing challenges. Hadoop, with its focus on Schema On Read (SOR) and parallelism, can enable low cost storage combined with efficient processing.
  2. This integration can focus on improving customer experience – Combining past interactions with historical data across both systems can provide a realtime single view of a customer, thus helping agents work better with their customers.
  3. Combine Data in innovative ways to create new products – Once companies have deep insights into customer behavior and purchasing patterns, they can combine the data to create new products and services or modify existing ones.
  4. Gain Realtime insights – Online transactions are increasing in number year on year. The onset of Digital Architectures in enterprise businesses implies the ability to drive continuous online interactions with global consumers/customers/clients or patients. The goal is not just to provide engaging visualization but also to personalize services clients care about across multiple channels of interaction. The only way to attain digital success is to understand your customers at a micro level while constantly making strategic decisions on your offerings to the market. This essentially means operating in a real time world – which leads to Big Data.
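The Schema On Read idea mentioned in the first benefit can be sketched in a few lines of Python. This is a deliberately simplified illustration, not Hadoop code: the raw events, the field names and the `read_with_schema` helper are hypothetical. The point is that the data lands untouched, and each consumer applies its own schema only when it reads.

```python
import json

# Raw events land in the lake exactly as the source systems emitted them.
# These records and field names are hypothetical.
raw_events = [
    '{"customer_id": "C1", "channel": "mobile", "amount": "25.50"}',
    '{"customer_id": "C2", "channel": "web", "amount": "10", "coupon": "SPRING"}',
    '{"customer_id": "C1", "channel": "call_center"}',
]

def read_with_schema(lines, schema):
    """Apply a schema at read time: pick fields, cast types, default missing values."""
    for line in lines:
        record = json.loads(line)
        yield {
            field: cast(record[field]) if field in record else default
            for field, (cast, default) in schema.items()
        }

# Two consumers project the same raw data through different schemas.
spend_schema = {"customer_id": (str, None), "amount": (float, 0.0)}
channel_schema = {"customer_id": (str, None), "channel": (str, "unknown")}

spend_rows = list(read_with_schema(raw_events, spend_schema))
channel_rows = list(read_with_schema(raw_events, channel_schema))
```

Note that the second event carries an extra `coupon` field and the third has no `amount`. Neither required a change to the stored data or an upfront table definition, which is the contrast with the schema-on-write approach of a relational database or EDW.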

To Sum Up…

Combining CRM with Big Data can help maximize competitive advantage across every industry vertical. These advantages stem not only from cheaply storing and analyzing vastly richer data, but also from deploying the resulting business insights in areas such as marketing, customer service and new product ideation.

Demystifying Digital – Reference Architecture for Single View of Customer / Customer 360..(3/3)

The first post in this three part series on Digital Foundations introduced the concept of Customer 360 or Single View of Customer (SVC). The second post in the series discussed the concept of Customer Journey Mapping (CJM). We discussed specific benefits from both a business & operational standpoint that are enabled by SVC & CJM. This third & final post will focus on the technical design & architecture needed to achieve both these capabilities.

Business Requirements for Single View of Customer & Customer Journey Mapping…

The following key business requirements need to be supported for three key personas – Customer, Marketing & Customer Service – from a SVC and CJM standpoint.

  1. Provide an Integrated Experience: A fully integrated omnichannel experience for both the customer and internal stakeholder (marketing, customer service, regulatory, managerial etc) roles. This means a few important elements – consistent information across all touchpoints, the right information to the right user at the right time, an ability to view the CJM graph with realtime metrics on Customer Lifetime Value (CLV) etc.
  2. Continuously Learning Customer Facing System: An ability for the customer facing portion of the architecture to learn constantly to fine-tune its understanding of the customer’s real time picture. This includes an ability to understand the customer’s journey.
  3. Contextual yet Seamless Movement across Channels: The ability for customers to transition seamlessly from one channel to the other while conducting business transactions.
  4. Ability to introduce Marketing Programs for existing Customers: An ability to introduce marketing, customer retention and other loyalty programs in a dynamic manner. These include an ability to combine historical data with real time data about customer interactions and other responses like clickstreams – to provide product recommendations and real time offers.
  5. Customer Acquisition: An ability to perform low cost customer acquisition and to be able to run customized offers for segments of customers from a back-office standpoint.

Key Gaps in existing Single View (SVC) Architectures ..

It needs to be kept in mind that every organization is different from an IT legacy investment and operational standpoint. As such, a “one-size-fits-all” architecture is impossible to create. However, highlighted below are some common key data and application architecture gaps that I have observed from a data standpoint while driving to a SVC (Single View of Customer) with multiple leading enterprises.

  1. The lack of a single, unique & global customer identifier – The need to create a single universal customer identifier (based on various departmental or line of business identifiers) and to use it as a primary key in the customer master list
  2. Once the identifier is created in either the source system or in the datalake, organizations need to figure out a way to cascade that identifier into the Book of Record systems (CRM systems, webapps and ERP systems) so that the architecture can begin knitting together a single view of the customer. This may also involve periodically going out across the BOR systems, linking all of a customer’s data and pulling it into the lake.
  3. Many companies deal with multiple customer on-boarding systems. At some point, these on-boarding processes need to be centralized. For instance in Banking, especially in Capital Markets, customer on-boarding is done in six or seven different areas; all of these ideally need to be consolidated into one.
  4. Graph Data Semantics – Once created, the Master Customer identifier should be mapped to all the other identifiers lines of business use to uniquely identify their customer; the ability to use simple or more complex matching techniques (Rule based matching, machine learning based matching & search based matching) is highly called for.
  5. MDM (Master Data Management) systems have traditionally automated some of this process by creating & owning that unique customer identifier. However, Big Data capabilities help by linking that unique customer identifier to all the other ways the customer may be mapped across the organization. To this end, data may be exported into an MDM system backed by a traditional RDBMS, or the computation of the unique identifier can be done in a data lake and then exported into an MDM system.
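The gaps above around a master customer identifier and rule based matching can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the sample records, the two matching rules (normalized email and digits-only phone), the `MCID-` identifier format and the helper names. Machine learning based or search based matching would replace the `match_keys` step; the linking logic would stay much the same.

```python
import re
from collections import defaultdict

# Hypothetical customer records from three line-of-business systems.
records = [
    {"source": "crm", "local_id": "CRM-17", "email": "Jane.Doe@example.com", "phone": "(212) 555-0101"},
    {"source": "web", "local_id": "WEB-903", "email": "jane.doe@example.com ", "phone": None},
    {"source": "erp", "local_id": "ERP-5", "email": None, "phone": "212-555-0101"},
    {"source": "crm", "local_id": "CRM-42", "email": "sam@example.com", "phone": "646 555 0199"},
]

def match_keys(record):
    """Rule-based matching keys: normalized email and digits-only phone."""
    keys = []
    if record["email"]:
        keys.append(("email", record["email"].strip().lower()))
    if record["phone"]:
        keys.append(("phone", re.sub(r"\D", "", record["phone"])))
    return keys

def assign_master_ids(records):
    """Link records that share any key and map each master id to its local ids."""
    parent = list(range(len(records)))  # union-find forest over record indexes

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}  # matching key -> index of the first record that had it
    for i, record in enumerate(records):
        for key in match_keys(record):
            if key in first_seen:
                root_a, root_b = find(i), find(first_seen[key])
                # Merge the two groups, keeping the lowest index as the root.
                parent[max(root_a, root_b)] = min(root_a, root_b)
            else:
                first_seen[key] = i

    master = defaultdict(list)
    for i, record in enumerate(records):
        master_id = f"MCID-{find(i) + 1:04d}"
        master[master_id].append((record["source"], record["local_id"]))
    return dict(master)

master_index = assign_master_ids(records)
```

Here the CRM, web and ERP records for the same person collapse under one master identifier even though no single field is shared by all three: the web record matches on email and the ERP record on phone. The resulting mapping of master id to local ids is what would be cascaded back into the Book of Record systems or exported to an MDM system.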

Let us discuss the generic design of the architecture (depicted above) with a focus on the following subsystems –

A Reference Architecture for Single View of Customer/ Customer 360
  1. At the very top, the different channels are depicted with their touch points. In today’s connected world, the customer experience spans multiple different touch points throughout the customer lifecycle. A customer should be able to move through multiple different touch points during the buying process. Customers should be able to start and pause transactions (e.g. an Auto Loan application) in one channel and restart/complete them from another.
  2. A Big Data enabled application architecture is chosen. This needs to account for two different data processing paradigms. The first is a realtime component. The architecture must be capable of handling events within a few milliseconds. The second is an ability to handle massive scale data analysis in a retrospective manner. Both these components are provided by a Hadoop stack. The real time component leverages Apache NiFi, Apache HBase, HDFS, Kafka, Storm and Spark. The batch component leverages HBase, Titan, Apache Hive, Spark and MapReduce.
  3. The range of Book of Record and external systems send data into the central datalake. Both realtime and batch components highlighted above send the data into the lake. The design of the lake itself will be covered in more detail in the below section.
  4. Starting from the upper-left side, we have the Book of Record Systems sending across transactions. These are ingested into the lake using any of the different ingestion frameworks provided in Hadoop, e.g. Flume, Kafka, Sqoop, HDFS API for batch transfers etc. The ingestion layer depicted is based on Apache NiFi and is used to load data into the data lake. Functionally, it is made up of real time data loaders and end of day data loaders. The real time loaders load the data as it is created in the feeder systems; the EOD data loaders adjust the data at the end of the day based on the P&L sign off and the end of day close processes. The main data feeds for the system will be from the book of record transaction systems (BORTS) but there may also be multiple data feeds from transaction data providers and customer information systems.
  5. The UI Framework is standardized across all kinds of clients. For instance this could be an HTML 5 GUI Framework that contains reusable widgets that can be used for mobile and browser based applications. The framework also needs to deal with common mobile issues such as bandwidth and be able to automatically throttle the data back where bandwidth is limited. It also needs to facilitate the construction of large user defined pivot tables for ad hoc reporting. It utilizes UI framework components for its GUI construction and communicates with the application server via the web services layer.
  6. API access is also provided by Web Services for partner applications to leverage: This is the application layer that provides a set of RESTful web services that control the GUI behavior and access to both the persistent data and the data that is cached on the data fabric.
  7. The transactions are taken through the pipeline of enrichment and the profiles of customers are stored in HBase.
  8. The core data processing platform is then based on a datalake pattern which has been covered in this blog before. It includes the following pattern of processing.
    1. Data is ingested in real time into an HBase database (which uses HDFS as the underlying storage layer). Tables are designed in HBase to store the profile of a trade and its lifecycle.
    2. Producers are authenticated at the point of ingest.
    3. Once the data has been ingested into HDFS, it is taken through a pipeline of processing (L0 to L3) as depicted in the below blogpost.

  9. Speed Layer: The computational grid that makes up the Speed layer can be a distributed in memory data fabric like Infinispan or GemFire, or a computation process can be overlaid directly onto a stateful data fabric technology like Spark or GemFire. The choice depends on the language choices that have been made in building the other key analytic libraries. If multiple language bindings are required (e.g. C# & Java), then the data fabric will typically be a different product than the Grid.

Data Science for Customer 360

 Consider the following usecases that are all covered under Customer 360 –

  1. Segmenting customers into categories based on granular data attributes
  2. Improving customer targeting for new promotions & increasing the acquisition rate
  3. Increasing cross sell and upsell rates
  4. Understanding influencers among customer segments & helping these net promoters recommend products to other customers
  5. Performing market basket analysis of what products/services are typically purchased together
  6. Understanding customer risk profiles
  7. Creating realtime views of customer lifetime value (CLV)
  8. Reducing customer attrition

The obvious capability that underlies all of these is Data Science. Thus, Predictive Analytics is the key compelling paradigm that enables the buildout of the dynamic Customer 360.
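As one concrete illustration of the use cases listed above, market basket analysis (item 5) can be sketched as a simple pair co-occurrence count. This is a minimal sketch rather than a production recommender: the basket data, the function name `pair_support` and the support threshold are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets, min_support=0.5):
    """Return product pairs whose share of baskets is at least min_support."""
    pair_counts = Counter()
    for basket in baskets:
        # Sort and de-duplicate so (a, b) and (b, a) count as the same pair.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    total = len(baskets)
    return {
        pair: count / total
        for pair, count in pair_counts.items()
        if count / total >= min_support
    }


# Hypothetical baskets of products held or purchased together.
baskets = [
    ["checking", "savings", "credit_card"],
    ["checking", "credit_card"],
    ["savings", "mortgage"],
    ["checking", "savings", "credit_card"],
]
frequent_pairs = pair_support(baskets, min_support=0.5)
```

At data lake scale the same counting would typically run as a distributed job, but the logic of counting pairs and filtering by support stays the same.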

The Predictive Analytics workflow always starts with a business problem in mind. Examples would be “A marketing project to detect which customers are likely to buy new products or services in the next six months based on their historical & real time product usage patterns – which are denoted by x, y or z characteristics”, “Detect realtime fraud in credit card transactions” or “Perform certain algorithms based on the predictions”. In usecases like these, the goal of the data science process is to segment & filter customers by corralling them into categories that enable easy ranking. Once this is done, the business is involved in setting up easy and intuitive visualizations to present the results. In the machine learning process, an entire spectrum of algorithms can be tried to solve such business problems.

Business groups working on Customer 360 projects often have a hard time explaining what they would like to see, both in terms of data and visualization. In such cases, a prototype makes requirements gathering much easier. Once the problem is defined, the data scientist/modeler identifies the raw data sources (both internal and external) that are relevant to the business challenge. They spend a lot of time collating the data (from Oracle, DB2, Mainframe, Greenplum, Excel sheets, external datasets etc). The cleanup process involves fixing missing values and corrupted data elements, formatting fields that indicate time and date etc.
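The cleanup step described above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the field names (`customer_id`, `opened_on`, `balance`), the list of date formats and the default balance are hypothetical and would differ per source system.

```python
from datetime import datetime

# Hypothetical date formats seen across source systems.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def parse_date(value):
    """Try each known format; return an ISO date string or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except (ValueError, AttributeError):
            continue  # wrong format, or value is missing (None)
    return None


def clean_record(record, default_balance=0.0):
    """Normalize one raw customer record into a consistent shape."""
    balance = record.get("balance")
    try:
        balance = float(balance)
    except (TypeError, ValueError):
        balance = default_balance  # missing or corrupted value
    return {
        "customer_id": str(record.get("customer_id", "")).strip(),
        "opened_on": parse_date(record.get("opened_on")),
        "balance": balance,
    }


raw = [
    {"customer_id": " 42 ", "opened_on": "03/15/2016", "balance": "1200.50"},
    {"customer_id": 7, "opened_on": "2016-13-40", "balance": None},
]
cleaned = [clean_record(r) for r in raw]
```

Unparseable dates are kept as `None` rather than guessed, so that downstream steps can decide whether to drop or impute them.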

The Data Scientist working with the business needs to determine how much of this raw data is useful and how much of it needs to be massaged to create a Customer 360 view. Some of this data needs to be extrapolated to form the features using formulas – so that a model can be created. The models created often involve using languages such as R and Python.

Feature engineering takes in business features in the form of feature vectors and creates predictive features from them. The Data Scientist takes the raw features and creates a model using a mix of various algorithms. Once the model has been repeatedly tested for accuracy and performance, it is typically deployed as a service.
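To make the feature engineering step more tangible, here is a minimal sketch that turns raw transactions into per-customer features. The choice of recency, frequency and monetary features, the record layout and the function name `build_features` are assumptions for illustration; the source does not prescribe a specific feature set.

```python
from collections import defaultdict
from datetime import date


def build_features(transactions, as_of):
    """Aggregate raw transactions into per-customer feature vectors."""
    grouped = defaultdict(list)
    for txn in transactions:
        grouped[txn["customer_id"]].append(txn)

    features = {}
    for customer_id, txns in grouped.items():
        last_seen = max(t["date"] for t in txns)
        total = sum(t["amount"] for t in txns)
        features[customer_id] = {
            "recency_days": (as_of - last_seen).days,  # days since last activity
            "frequency": len(txns),                    # number of transactions
            "monetary": total,                         # total spend
            "avg_ticket": total / len(txns),           # average transaction size
        }
    return features


# Hypothetical transaction history.
transactions = [
    {"customer_id": "c1", "date": date(2017, 5, 1), "amount": 120.0},
    {"customer_id": "c1", "date": date(2017, 6, 10), "amount": 80.0},
    {"customer_id": "c2", "date": date(2017, 1, 20), "amount": 40.0},
]
features = build_features(transactions, as_of=date(2017, 6, 30))
```

Feature vectors of this shape are what a model in R or Python would then be trained on.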

The transformation phase involves writing code to join up like elements so that a single client’s complete dataset is gathered in the Data Lake from a raw features standpoint. If more data is obtained while the development cycle is underway, the Data Science team has no option but to go back & redo the whole process.

Models as a Service (MaaS) is the Data Science counterpart to Software as a Service. MaaS takes in business variables (often hundreds of them) as inputs and provides as output business decisions/intelligence, measurements and visualizations that augment decision support systems.

Once these models are deployed and updated nightly based on their performance – the serving layer takes advantage of them to drive real time 360 decisioning.

To Sum Up…

In this short series we have discussed that customers and data about their history, preferences, patterns of behavior, aspirations etc are the most important corporate asset. Big Data technology and advances made in data storage, processing and analytics can help architect a dynamic Single View that can help maximize competitive advantage across every industry vertical.

Hadoop in Finance: Big Data in the Pursuit of Big Bucks

I had the recent honor of being interviewed by Nick Ismail, editor at Information Age UK. We discussed IT trends driving financial services worldwide. For this week’s blogpost, I wanted to share the resulting article, published in the magazine on 22 March, 2017.

Apache Hadoop (the freely available data crunching engine) can be used by financial institutions to grow their top lines while reducing compliance risks often associated with “sticky” financial relationships.

One of the great features of Hadoop is that it allows enterprises to store, analyse and share multiple data streams simultaneously, a fact that is immensely useful helping the users of this technology detect anomalies in their company’s output.

There are very few industries that are as data-centric as banking and insurance. If there’s one thing that can be taken as certain in today’s rapidly changing financial world it’s that both the amount and the value of data collected is constantly increasing day by day.

Every interaction that a client or partner system has with a banking institution produces actionable data that has potential business value associated with it.

After the 2008 crisis, regulatory constraints have required the recording and reporting of more data than ever before, and, as capital and liquidity reserve requirements have also increased, the need to know exactly how much capital needs to be reserved, based on current exposures, is critical.

In order to keep up with market trends and run services more effectively, Apache Hadoop, a platform for storing and analysing massive amounts of data, is currently gaining momentum. Its market is forecasted to grow at a compound annual growth rate of 58%, surpassing $1 billion by 2020.

 As data becomes digital, financial institutions can no longer ignore the importance of harnessing information. Major Banks, insurers and capital markets are now able to leverage Hadoop for better decision making, fraud detection, new product development and accurate forecasts.

This article will delve into some of the various opportunities presented by Hadoop in the financial sector.

Distributed computing for smarter decision making

Hadoop is often used in the provision of financial services due to its power in both data processing and analysis. Banks, insurance companies and securities firms rely on it to store and process the huge amounts of data they accrue in the course of their business, which would otherwise require cost prohibitive hardware and software licenses.

Large retail banks receive thousands of incoming applications for checking and savings accounts every week. Practice dictates that bankers would normally consult 3rd-party risk scoring services before opening an account, or granting a loan.

They can (and do) override do-not-open recommendations for applicants with poor banking histories. Many of these high-risk accounts overdraw and charge-off due to mismanagement or fraud, costing banks millions of dollars in losses. Some of this cost is passed on to the customers who responsibly manage their accounts.

Applications built on Hadoop can store and analyse multiple data streams and help, for example, regional bank managers control new account risk in their branches.

They can match banker decisions with the risk information presented at the time of decision and thereby control risk by highlighting if any individuals should be sanctioned, whether policies need updating, and whether patterns of fraud are identifiable.

Over time, the accumulated data informs algorithms that may detect subtle, high-risk behaviour patterns unseen by the bank’s risk analysts.
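The matching of banker decisions against the risk information presented at the time of decision can be sketched as follows. This is a minimal sketch, assuming a flat list of decision records; the field names (`banker`, `recommendation`, `opened`, `charged_off`) and the function name `override_outcomes` are hypothetical.

```python
from collections import defaultdict


def override_outcomes(decisions):
    """Summarize, per banker, how overridden do-not-open accounts performed."""
    stats = defaultdict(lambda: {"overrides": 0, "charged_off": 0})
    for d in decisions:
        # An override is an account opened against a do-not-open recommendation.
        overridden = d["recommendation"] == "do_not_open" and d["opened"]
        if overridden:
            stats[d["banker"]]["overrides"] += 1
            if d["charged_off"]:
                stats[d["banker"]]["charged_off"] += 1
    # Only bankers with at least one override appear, so the division is safe.
    return {
        banker: {**s, "charge_off_rate": s["charged_off"] / s["overrides"]}
        for banker, s in stats.items()
    }


# Hypothetical decision log joined with later account outcomes.
decisions = [
    {"banker": "b1", "recommendation": "do_not_open", "opened": True, "charged_off": True},
    {"banker": "b1", "recommendation": "do_not_open", "opened": True, "charged_off": False},
    {"banker": "b1", "recommendation": "open", "opened": True, "charged_off": False},
    {"banker": "b2", "recommendation": "do_not_open", "opened": False, "charged_off": False},
    {"banker": "b2", "recommendation": "do_not_open", "opened": True, "charged_off": True},
]
report = override_outcomes(decisions)
```

A regional manager could use a report of this kind to see which overrides tend to end in charge-offs and whether policies need updating.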

This is especially important to insurers, who seek to gather actionable intel on their prospective clients quickly in order to make better decisions concerning risk, especially in “sticky” financial relationships (like mortgages) which often last for more than 10 years.

Being successful in this means millions in savings a year, which can then be passed on to low-risk clientele or company shareholders. Hadoop also supports predictive modelling, allowing enterprises to quantify future risk and develop market strategies accordingly.

This information can also then be turned into publicly reportable intelligence, increasing the perceived expertise (and thus brand value) of the enterprise engaged in data analysis.

Turning data into fraud protection

Not only does Hadoop help build a company’s prestige, it also protects the company from professional malfeasance that could be detrimental to it. Financial sector workers should study this factor carefully, since one of the main benefits they would obtain from their data analysis efforts is protection against fraud.

Because of the massive capacity for storage in Data Lakes, extensive records can be constantly collated and updated, including what decisions were made, what risks were present at the time of decision, how internal policies on the issue have changed over time, and whether there have been emerging patterns of fraud.

This is crucial also because maintaining control over data and understanding how to query it are set to be massively important considerations for regulatory reporting and adherence. Banks need to understand their existing customer data to predict and modify mortgages as appropriate for borrowers in financial distress.

Tracking money laundering

Following the establishment of international anti-money laundering (AML) requirements to prevent the cross-border flow of funds to criminal and terrorist organisations, malicious actors have chosen to hide in the ever more complex world of trading, away from banks.

This is a global business valued at more than $18.3 trillion, formed of an intricate web of different languages and legal systems.

Due to storage limitations, leading institutions are unable to archive historical trading data logs, which reduces the amount of information available for risk analysis until after close of business. This gap creates a window of time in which money laundering or rogue trading could go unnoticed.

Hadoop is now able to provide unprecedented speed-to-analytics and an extended data retention timeline, allowing more visibility into all trading activities for a comprehensive and thorough scrutiny.

The trading risk group accesses this shared data lake to process more position, execution and balance data. They can do this analysis on data from the current workday, and the data remains highly available for at least five years, much longer than before.

The bottom line

These benefits, ranging from faster data analysis to money-laundering protection, come from being able to act on hitherto undetectable evidence and customer patterns.

One of the great features of Hadoop is that it allows enterprises to store, analyse and share multiple data streams simultaneously, a fact that is immensely useful helping the users of this technology detect anomalies in their company’s output.

A regional bank manager, for example, might have to control new account risk in their branches, the indicators of which can be extremely subtle and nigh imperceptible to the average employee.

Using Hadoop, this information can be collected, scanned for anomalies over various branches, and sent by an automated ticker plant in real time to the relevant decision maker, saving time and improving operational margins.

All the factors outlined above have real world effects on the bottom line of modern institutions and this is only truer for the financial industry. The fact that big data has been warmly welcomed by the financial community only makes the competitive disadvantage felt by tech laggards more pronounced. The point is, data matters more and more, just as data surrounds us more and more, so it’s important that businesses accept and learn to navigate this environment.

Sourced by Vamsi Chemitiganti, GM financial services, Hortonworks

The Definitive Reference Architecture for Market Surveillance (CAT, UMIR and MiFiD II) in Capital Markets..

We have discussed the topic of market surveillance reporting in some depth in previous blogs. Over the last decade, Global Financial Markets have embraced the high speed of electronic trading. This trend has only accelerated with the concomitant explosion in trading volumes. The diverse range of instruments & the proliferation of trading venues pose massive regulatory challenges in the area of market conduct supervision and abuse prevention. Banks, Broker dealers, Exchanges and other market participants across the globe are now shelling out millions of dollars in fines for failure to accurately report on market abuse violations. In response to this complex world of high volume & low touch electronic trading, global capital markets regulators have been hard at work across different jurisdictions & global hubs, e.g. FINRA in the US, IIROC in Canada and ESMA in the European Union. Regulators have created extensive reporting regimes for surveillance with a view to detecting suspicious patterns of trade behavior (e.g. dumping, quote stuffing & non bona fide orders). The intent is to increase market transparency on both the buy and the sell side. Given the scrutiny Capital Markets players are under, a Big Data Analytics based architecture has become a “must-have” to ensure timely & accurate compliance with these mandates. This blog attempts to discuss such a reference architecture.

Business Technology Requirements for Market Surveillance..

The business requirements for the Surveillance architecture are covered at the below link in more detail but are reproduced below in a concise fashion.

A POV on European Banking Regulation.. MAR, MiFiD II et al

Some of the key business requirements that can be distilled from regulatory mandates include the below:

  • Store heterogeneous data – Both MiFiD II and MAR mandate the need to perform trade monitoring & analysis on not just real time data but also historical data spanning a few years. Among others, this will include data feeds from a range of business systems: trade data, eComms, aComms, valuation & position data, order management systems, position management systems, reference data, rates, market data, client data, front, middle & back office data, voice, chat & other internal communications etc. To sum up, the ability to store a range of cross asset (almost all kinds of instruments), cross format (structured & unstructured, including voice) and cross venue (exchange, OTC etc.) trading data at a high degree of granularity is key.
  • Data Auditing – Such stored data needs to be fully auditable for 5 years. This implies not just being able to store it but also putting capabilities in place to ensure strict governance & audit trails.
  • Manage a huge volume increase in data storage requirements (5+ years) due to extensive record keeping requirements.
  • Perform Realtime Surveillance & Monitoring of data – Once data is collected, normalized & segmented, the system will need to support realtime monitoring of data (within around 5 seconds) to ensure that every trade can be tracked through its lifecycle. Detecting patterns that indicate market abuse and monitoring for best execution are key.
  • Business Rules – Core logic that deals with identifying some of the above trade patterns is created using business rules. Business Rules have been covered in various areas in the blog but they primarily work based on an IF..THEN..ELSE construct.
  • Machine Learning & Predictive Analytics – A variety of supervised and unsupervised learning approaches can be used to perform extensive behavioral modeling & segmentation of transactions, with a view to identifying behavioral patterns of traders & any outlier behaviors that connote potential regulatory violations.
  • A Single View of an Institutional Client – From the firm’s standpoint, it would be very useful to have a single view capability for clients that shows all of their positions across multiple desks, risk position, KYC score etc.
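The IF..THEN..ELSE construct behind the business rules requirement above can be sketched as an ordered list of conditions with a default outcome. This is a minimal sketch, not a real rules engine: the `Rule` class, the `evaluate` function, the trade fields and the thresholds are all hypothetical, and real limits would come from compliance policy.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # the IF part
    action: str                        # the THEN part: label emitted on a match


def evaluate(trade: dict, rules: List[Rule], default: str = "pass") -> str:
    """Return the action of the first matching rule, else the default (ELSE)."""
    for rule in rules:
        if rule.condition(trade):
            return rule.action
    return default


# Hypothetical thresholds, for illustration only.
rules = [
    Rule("large_order", lambda t: t["quantity"] > 100_000, "alert"),
    Rule("off_hours", lambda t: not 9 <= t["hour"] < 17, "review"),
]

outcome_large = evaluate({"quantity": 250_000, "hour": 10}, rules)
outcome_late = evaluate({"quantity": 500, "hour": 20}, rules)
outcome_normal = evaluate({"quantity": 500, "hour": 10}, rules)
```

Rule order matters in this sketch: the first matching rule wins, which mirrors a chain of IF..ELSE IF branches.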

A Reference Architecture for Market Surveillance ..

This reference architecture aims to provide generic guidance to banking Business IT Architects building solutions in the realm of Market & Trade Surveillance. It supports a host of hugely important global reg reporting mandates (CAT, MiFiD II, MAR etc.) that Capital Markets players need to comply with. While the concepts discussed in this solution architecture are definitely Big Data oriented, they are largely agnostic to any cloud implementation, whether private, public or hybrid.

A Market Surveillance system needs to include both real time surveillance of trading activity and a retrospective (batch oriented) analysis component. The real time component includes the ability to perform realtime calculations (concerning thresholds, breached limits etc.) and real time queries with the goal of triggering alerts. Both kinds of analytics span structured and unstructured data sources. For the batch component, the analytics range from data queries and simple to advanced statistics (min, max, avg, std deviation, sorting, binning, segmentation) to running data science models involving text analysis & search etc.
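The simpler batch statistics mentioned above (min, max, avg, std deviation, binning) can be sketched with the Python standard library. This is a minimal sketch on a hypothetical list of order sizes; the function name `summarize` and the bin width are assumptions, and at real volumes the same aggregations would run in Hive or Spark.

```python
import statistics
from collections import Counter


def summarize(values, bin_width):
    """Compute summary statistics and a fixed-width histogram."""
    # Each value is assigned to the lower edge of its bin.
    bins = Counter((v // bin_width) * bin_width for v in values)
    return {
        "min": min(values),
        "max": max(values),
        "avg": statistics.mean(values),
        "std": statistics.pstdev(values),  # population standard deviation
        "bins": dict(sorted(bins.items())),
    }


order_sizes = [100, 200, 200, 400, 1100]  # hypothetical order quantities
summary = summarize(order_sizes, bin_width=500)
```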

The system needs to process tens of millions to billions of events in a trading window while providing highest uptime guarantees. Batch analysis is always running in the background.

A Hadoop distribution that includes components such as Kafka and HBase, along with near real time components such as Storm & Spark Streaming, provides a good fit for a responsive architecture. Apache NiFi, with its ability to ingest data from a range of sources, is preferred for its ability to support complex data routing, transformation and system mediation logic in a complex event processing architecture. The capabilities of Hortonworks Data Flow (the enterprise version of Apache NiFi) are covered in the below blogpost in much detail.

Use Hortonworks Data Flow (HDF) To Connect The Dots In Financial Services..(3/3)

A Quick Note on Data Ingestion..

Data volumes in the area of Regulatory reporting can be huge to insanely massive. For instance, at large banks, they can go up to 100s of millions of transactions a day. At market venues such as stock exchanges, they easily enter into the hundreds of billions of messages every trading day. However the data itself is extremely powerful & is really business gold in terms of allowing banks to not just file mundane reg reports but also to perform critical line of business processes such as Single View of  Customer, Order Book Analysis, TCA (Transaction Cost Analysis), Algo Backtesting, Price Creation Analysis etc. The architecture thus needs to support multiple ways of storage, analysis and reporting ranging from compliance reporting to data scientists to business intelligence.

Real time processing in this proposed architecture is powered by Apache NiFi. There are five important reasons for this decision – 

  • First of all, complex rules can be defined in NiFi in a very flexible manner. As an example, one can execute SQL queries in processor A against incoming data from any source (data that isn’t from a relational database but is JSON, Avro etc.) and then route different results to different downstream processors based on the processing needs, while enriching the data. E.g. Processor A could be event driven, and if any data is routed there, a field can be added or an alert sent to XYZ. Essentially this can be very complex, equivalent to a nested rules engine so to speak.
  • From a throughput standpoint, a single NiFi node can typically handle somewhere between 50MB/s and 150MB/s depending on your hardware spec and data structure. Assuming an average message size of 100-500 KB and a target throughput of 600MB/s, the architecture can be sized at about 5-10 NiFi nodes. It is important to note that the latency of inbound message processing depends on the network and can be extremely small. Under the hood, you are sending data from the source to a NiFi node (disk), extracting some attributes in memory for processing, and delivering to the target system.
  • Data quality can be handled via the aforementioned “nested rules engine” approach, consisting of multiple NiFi processors. One can even embed an entire rules engine into a single processor. Similarly, you can define simple authentication rules at the event level. For instance, if Field A = English, route the message to an “authenticated” relationship; otherwise send it to an “unauthenticated” relationship.

  • One of the cornerstones of NiFi is “Data Provenance”, which gives you end to end traceability. Not only can the event lifecycle of trade data be traced, but you can also track the time at which a change happened, the user role that made the change, and metadata around why it happened.

  • Security – NiFi enables authentication at ingest. One can authenticate data via the rules defined in NiFi, or leverage target system authentication which is implemented at processor level. For example, the PutHDFS processor supports kerberized HDFS, the same applies for Solr and so on.
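The throughput sizing in the list above can be checked with back-of-envelope arithmetic. This is a minimal sketch that simply divides the 600MB/s target by the quoted per-node range of 50MB/s to 150MB/s; the function names are hypothetical, decimal units (1 MB = 1000 KB) are assumed, and the calculation ignores headroom, replication and failover.

```python
import math


def nifi_nodes_needed(target_mb_s, per_node_mb_s):
    """Nodes required to sustain the target throughput at a given per-node rate."""
    return math.ceil(target_mb_s / per_node_mb_s)


def messages_per_second(target_mb_s, avg_message_kb):
    """Approximate message rate for a target throughput (1 MB = 1000 KB)."""
    return target_mb_s * 1000 / avg_message_kb


target = 600  # MB/s, the throughput figure used in the text

best_case = nifi_nodes_needed(target, 150)   # fastest quoted per-node rate
worst_case = nifi_nodes_needed(target, 50)   # slowest quoted per-node rate

low_rate = messages_per_second(target, 500)   # large messages
high_rate = messages_per_second(target, 100)  # small messages
```

Plain division gives a range of 4 to 12 nodes, which brackets the estimate of about 5-10 nodes quoted above; where a real cluster lands inside that range depends on hardware and data structure.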

Overall Processing flow..

The below illustration shows the high-level conceptual architecture. The architecture is composed of core platform services and application-level components to facilitate the processing needs across three major areas of a typical surveillance reporting solution:

  • Connectivity to a range of trade data sources
  • Data processing, transformation & analytics
  • Visualization and business connectivity
Reference Architecture for Market Surveillance Reg Reporting – CAT, MAR,MiFiD II et al

The overall processing of data follows the order shown below and depicted in the diagram –

  1. Data Production – Data related to Trades and their lifecycle is produced from a range of business systems. These data feeds include (but are not limited to) trade data, valuation & position data, order management systems, position management systems, reference data, rates, market data, client data, front, middle & back office data, voice, chat & other internal communications etc.
  2. Data Ingestion – Data produced from the above layer is ingested using Apache NiFi from the range of sources described above. Data can also be filtered and alerts can be set up based on complex event logic. For time series data support, HBase can be leveraged along with OpenTSDB. For CEP requirements, such as sliding windows and complex operators, NiFi can be leveraged along with a Kafka and Storm pipeline. Using NiFi makes it easier to load data into the data lake while applying guarantees around the delivery itself. Data can be streamed in real time as it is created in the feeder systems. Data is also loaded at the end of the trading day based on the P&L sign off and the end of day close processes. The majority of the data will be fed in from Book of Record Trading systems as well as from market data providers.
  3. As trade and other data is ingested into the data lake, it is important to note that the route in which certain streams are processed will differ from how other streams are processed. Thus the ingest architecture needs to support multiple types of processing ranging from in memory processing, intermediate transformation processing on certain data streams to produce a different representation of the stream. This is where NiFi adds critical support in not just handling a huge transaction throughput but also enabling “on the fly processing” of data in pipelines. As mentioned, NiFi does this via the concept of “processors”.
  4. The core data processing platform is then based on a datalake pattern which has been covered in this blog before. It includes the following pattern of processing.
    1. Data is ingested in real time into an HBase database (which uses HDFS as the underlying storage layer). Tables are designed in HBase to store the profile of a trade and its lifecycle.
    2. Producers are authenticated at the point of ingest.
    3. Once the data has been ingested into HDFS, it is taken through a pipeline of processing (L0 to L3) as depicted in the below blogpost.

    4. Historical data (defined as T+1), once in the HDFS tier, is taken through layers of processing as discussed above. One of the key areas of processing is to run machine learning on the data to discover any hidden patterns in the trades themselves, patterns that can connote a range of suspicious behavior. Most surveillance applications are based on a search for data that breaches thresholds and seek to match sell & buy orders. The idea is that when these rules are breached, alerts are generated for compliance officers to conduct further investigation. However, this method falls short with complex types of market abuse. A range of supervised learning techniques can then be applied to the data, such as creating a behavioral profile of different kinds of traders (for instance junior and senior) by classifying & then scoring them based on their likelihood to commit fraud. Thus a range of Surveillance Analytics can be performed on the data. Apache Spark is highly recommended for near realtime processing, not only due to its high performance characteristics but also due to its native support for graph analytics and machine learning, both of which are critical to surveillance reporting. For a deeper look at data science, I recommend the below post.

    5. The other important value driver in deploying Data Science is to perform Advanced Transaction Monitoring Intelligence. The core idea is to get years’ worth of Trade data in one location (i.e. the data lake) & then apply unsupervised learning to glean patterns in those transactions. The goal is then to identify profiles of actors with the intent of feeding them into existing downstream surveillance & TM systems.
    6. This knowledge can then be used to constantly learn transaction behavior for similar traders. This can be a very important capability in detecting fraud in traders, customer accounts and instruments. Some of the usecases are –
      • Profile trading activity of individuals with similar traits (types of customers, trading desks & instruments, geographical areas of operations etc.) to perform Know Your Trader
      • Segment traders by similar experience levels and behavior
      • Understand common fraudulent behavior typologies (e.g. spoofing) and cluster such (malicious) trading activities by trader, instrument, volume etc. The goal is to raise appropriate cases in the downstream investigation case management system.
      • Using advanced data processing techniques like Natural Language Processing, constantly analyze electronic communications and join them up with trade data sources, both to detect under the radar activity and to keep the false positive rate low.
    7. Graph Database – Given that most kinds of trading fraud happen in groups of actors (traders acting in collusion with verification & compliance), the ability to view complex relationships of interactions and the strength of those interactions can be a significant monitoring capability.
    8. Grid Layer – To improve performance, I propose the usage of a distributed in memory data fabric like JBOSS DataGrid or Pivotal GemFire. This can aid in two ways –

      a. Help with fast lookup of data elements by the visualization layer
      b. Help perform fast computation process by overlaying a framework like Spark or MapReduce directly onto a stateful data fabric.

      The choice of tools here depends on the language choices that have been made in building the pricing and risk analytic libraries across the Bank. If multiple language bindings are required (e.g. C# & Java), then the data fabric will typically be a different product than the Grid.
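The idea of profiling traders against their peers, described in the processing steps above, can be illustrated with a very simple unsupervised check. This is a minimal sketch and not the surveillance model itself: using the order cancel ratio as the feature (a signal sometimes associated with spoofing), the z-score threshold of 2.0 and the function name `flag_outliers` are all assumptions for illustration.

```python
import statistics


def flag_outliers(cancel_ratios, threshold=2.0):
    """Flag traders whose cancel ratio is far above the peer-group mean."""
    values = list(cancel_ratios.values())
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # no variation in the peer group, nothing stands out
    return sorted(
        trader
        for trader, ratio in cancel_ratios.items()
        if (ratio - mean) / stdev > threshold
    )


# Hypothetical share of orders cancelled per trader on one desk.
cancel_ratios = {
    "t1": 0.10, "t2": 0.12, "t3": 0.11,
    "t4": 0.09, "t5": 0.13, "t6": 0.95,
}
flagged = flag_outliers(cancel_ratios)
```

A z-score over a handful of traders is fragile; in practice the peer groups would be far larger and the features richer, with flagged profiles fed to the downstream case management system for investigation.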

      Data Visualization…

      The visualization solution chosen should enable the quick creation of interactive dashboards that provide KPIs and other important business metrics from a process monitoring standpoint. Various levels of dashboards need to be created, ranging from compliance officer toolboxes to executive dashboards, to help identify trends and discover valuable insights.

      Compliance Officer Toolbox (Courtesy: Arcadia Data)

      Additionally, the visualization layer shall provide

      a) A single view of Trader or Trade or Instrument or Entity

      b) Investigative workbench with Case Management capability

      c) The ability to follow the lifecycle of a trade

      d) The ability to perform ad hoc queries over multiple attributes

      e) Activity correlation across historical and current data sets

      f) Alerting on specific metrics and KPIs

      To Sum Up…

      The solution architecture described in this blogpost is designed with peaceful enterprise co-existence in mind, in the sense that it interacts and integrates with a range of BORT systems and other enterprise systems such as ERP, CRM and legacy surveillance systems, as well as any other line of business solutions that typically exist as shared enterprise resources.

The Industrial Internet Comes of Age..

In 2017, the chief strategic concerns for Global Product Manufacturers are manifold. These range from their ability to drive growth in new markets by creating products that younger customers need, to cutting costs through efficient high volume manufacturing spanning global supply chains, to effective distribution and service. While the traditional lifecycle has always been a huge management challenge, the question now is how digital technology can help create new markets and drive higher margins in established areas. In this blogpost, we will consider how IIoT (Industrial Internet of Things) technology can do all of the above and foster new business models by driving customer value on top of the core product.

An Industry in Digital Dilemma

The last decade has seen tectonic changes in leading manufacturing economies. Along with a severe recession, employment in the industry has moved along the technology curve to a more skilled workforce. The services component of the industry is also steadily increasing, i.e. manufacturing now consumes business services and is itself presented as a service in certain sectors. The point is well made that this industry is not monolithic and that there are distinct sectors with their own specific drivers for business success [1].

           The diverse sectors within Global Manufacturing (McKinsey [1])

Global manufacturing operations have evolved differently across each of the above five sectors, which include Global innovators for local markets, Regional processing and Energy intensive commodities. For instance, each sector has different geographical locations where production takes place, as well as its own supply chains, support models, efficiency requirements, technological focus areas and competitive forces.

However, the trend that is broadly applicable to all of them is the “Industrial Internet”.

The Industrial Internet of Things (IIOT) can be defined as an ecosystem of capabilities that interconnects machines, personnel and processes to optimize the manufacturing lifecycle. The foundational technologies that IIOT leverages are Smart Assets, Big Data, Realtime Analytics, Enterprise Automation and Cloud based services.

Globally integrated manufacturers must constantly assess and fine-tune their strategy across the eight lifecycle stages listed below. A key aspect is the ability to collect data throughout the process to derive real-time insights about the lifecycle, suppliers and customers. IoT technologies allied with Big Data techniques provide ways to store this data and to derive real-time & historical analytic insights. Thus the Manufacturing industry is moving to an entirely virtual world across its lifecycle, ranging from product development and customer demand monitoring to production and inventory management. This trend is being termed Industry 4.0 or Connected Manufacturing. As devices & systems become more interactive and intelligent, the data they send out can be used to optimize the lifecycle across the value chain, thus driving higher utilization of plant capacity and improved operational efficiencies.

Let us consider the impact of the IIOT across the above sectors.

IIOT moves the Manufacturing Industries from Asset Centric to Data Centric

The Generic Product Manufacturing Lifecycle Overview as depicted in the above illustration covers the most important activities that take place in the manufacturing process. Please note that this is a high level overview, and in future posts we will expand upon each stage accordingly.

The overall lifecycle can be broken down into the following eight steps:

  1. Globally Integrated Product Design
  2. Prototyping and Pre-Production
  3. Mass production
  4. Sales and Marketing
  5. Product Distribution
  6. Activation and Support
  7. Value Added Services
  8. Resale and Retirement

IIoT drives Product Design and Innovation

IoT technology can have a profound impact on the above traditional lifecycle in the following ways –

  1. The ability to connect the different aspects of the value chain that hitherto have been disconnected. This will fundamentally transform the asset lifecycle, leading to higher manufacturing efficiencies, reduced wastage and more customer centric manufacturing (thus reducing recall rates)
  2. The ability to manage and integrate diverse data from sensors, machine data from operational systems, supplier channels & social media feedback, which drives real time insights
  3. The connected asset lifecycle also leads to better inventory management and drives optimal resupply decisions
  4. The ability to create new business models that leverage data across the lifecycle to enable better product usage, pay for performance or outcome based services, or even a subscription based usage model
  5. The ability to track real time insights across the customer base, thus leading to a more optimized asset lifecycle
  6. Reduced costs by allowing more operations, ranging from product maintenance to product demos and customer experience sessions, to occur remotely

Manufacturers have been connecting the value chain together for many years now. M2M (machine to machine) implementations have already led to rounds of improvements in the so called ‘ilities’ metrics: productivity, quality, reliability etc. The real opportunity with IIoT is being able to create new business models that result from the convergence of Operational Technology (OT) with Information Technology (IT). This journey primarily consists of taking a brick and mortar industry and slowly turning it into a data driven industry.

The benefits of adopting the IIOT range from improved quality owing to better aligned, efficient and data driven processes, to higher operational efficiency overall, products better aligned with changing customer requirements, and tighter communication across interconnected products and supplier networks.

The next post in this series will delve deeper into the Manufacturing Servitization phenomenon.


  1. McKinsey & Company  – Global Manufacturing Outlook  –

How Big Data & Advanced Analytics can help Real Estate Investment Trusts (REITS)

                                                         Image Credit – Kiplinger’s


Real Estate Investment Trusts (REITS) are financial companies that own various forms of commercial and residential real estate. These assets include office buildings, retail shopping centers, hospitals, warehouses, timberland, hotels etc. Real estate is growing quite nicely as a component of the global financial business. Given their focus on real estate investments, REITS have always occupied a specialized position in global finance.

Fundamentally, there are three types of REITS –

  1. Equity REITS which exclusively deal in acquiring, improving and selling properties with the aim of higher returns for their investors
  2. Mortgage REITS only buy and sell mortgages
  3. Hybrid REITS which do both #1 and #2 above

REITS have a reasonably straightforward business model – you take the yields from the properties you own and use the funds to pay your investors (REITS in the US are required to distribute at least 90% of their taxable income as dividends). Most of the traditional REIT business processes are well handled by conventional types of technology. However, more and more REITS are being challenged to develop a compelling Big Data strategy that leverages their tremendous data assets.

The Five Key Big Data Applications for REITS… 

Let us consider the five key areas where advanced analytics built on a Big Data foundation can immensely help REITS.

#1 Property Acquisition Modeling 

REIT owners can leverage the rich datasets available around renter demographics, preferences, seasonality and economic conditions in specific markets to better guide capital decisions on acquiring property. This modeling needs to take into account land costs, development costs, fixture costs & any other sales and marketing costs needed to appeal to tenants. I’d like to call this the macro business perspective. From a micro business perspective, being able to better study individual properties using a variety of widely available data (MLS listings for similar properties, foreclosures, closeness to retail establishments and work sites, building profiles, parking spaces, energy footprint etc.) can help them match tenants to their property holdings. All this is critical to getting their investment mix right to meet profitability targets.

                                  Click on the Image for a blogpost discussing Predictive Analytics in Capital Markets

#2 Portfolio Modeling 

REITS can leverage Big Data to perform more granular modeling of their MBS portfolios. As an example, they can feed a lot more data into their existing models, e.g. the demographic data and macroeconomic factors discussed above.

A simple scenario would be: if interest rates go up by X basis points, what does that mean for my portfolio exposure, default rate, cost picture, optimal times to buy certain MBSs etc.? REITS can then use that information to enter hedges that protect against any downside. Big Data can also help with a range of predictive modeling across all of the above areas. An example is building a 360 degree view of a given investment portfolio.

                                                         Click on Image for a Customer 360 discussion 
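The interest rate scenario described above can be illustrated with a minimal sketch. It assumes a standard duration and convexity approximation for the price change of each holding under a parallel rate shift. The `Position` class, the holdings and all of their numbers are hypothetical and are only there to make the example runnable; a production model would also account for prepayment behavior, defaults and funding costs.

```python
from dataclasses import dataclass


@dataclass
class Position:
    name: str
    market_value: float        # current market value in USD
    effective_duration: float  # in years
    convexity: float           # often negative for MBS because of prepayments


def shocked_value(pos: Position, shock_bps: float) -> float:
    """Approximate value after a parallel rate shift of shock_bps basis points."""
    dy = shock_bps / 10_000.0
    # Second-order approximation: dP/P = -D * dy + 0.5 * C * dy^2
    pct_change = -pos.effective_duration * dy + 0.5 * pos.convexity * dy ** 2
    return pos.market_value * (1.0 + pct_change)


def portfolio_impact(positions, shock_bps: float) -> float:
    """Approximate change in total portfolio value for the given rate shock."""
    before = sum(p.market_value for p in positions)
    after = sum(shocked_value(p, shock_bps) for p in positions)
    return after - before


# Hypothetical holdings, for illustration only.
portfolio = [
    Position("Agency MBS 30Y", 10_000_000, 5.0, -1.5),
    Position("Agency MBS 15Y", 5_000_000, 3.0, -0.5),
]

for bps in (25, 50, 100):
    print(f"+{bps} bps -> P&L {portfolio_impact(portfolio, bps):,.0f}")
```

Running the same function over a grid of shocks gives a quick view of where hedges might be needed, before moving on to richer models.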

#3 Risk Data Aggregation & Calculations 

The instruments underlying the portfolios themselves carry large amounts of credit & interest rate risk. Big Data is a fantastic platform for aggregating and calculating many kinds of risk exposures, as the post linked below discusses in detail.


                                            Click on Image for a discussion of Risk Data Aggregation and Measurement 
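As a rough illustration of what aggregating risk exposures means in practice, the sketch below rolls up exposure records by risk type and by portfolio. The record layout, field names and amounts are assumptions made for this example, not a prescribed schema. At scale the same group-and-sum pattern would typically run on a distributed engine rather than in a single process.

```python
from collections import defaultdict

# Hypothetical exposure records, for illustration only.
exposures = [
    {"portfolio": "Residential MBS", "risk_type": "interest_rate", "amount": 2_500_000.0},
    {"portfolio": "Residential MBS", "risk_type": "credit", "amount": 750_000.0},
    {"portfolio": "Commercial MBS", "risk_type": "interest_rate", "amount": 1_000_000.0},
    {"portfolio": "Commercial MBS", "risk_type": "credit", "amount": 1_250_000.0},
]


def aggregate_exposure(records, key):
    """Sum the exposure amounts, grouped by the given record field."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["amount"]
    return dict(totals)


by_risk_type = aggregate_exposure(exposures, "risk_type")
by_portfolio = aggregate_exposure(exposures, "portfolio")
print(by_risk_type)
print(by_portfolio)
```

Note that a plain sum ignores netting and correlation between exposures, so it should be read as a gross view only.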


#4 Detect and Prevent Money Laundering (AML)

Due to the global nature of investment funds flowing into real estate, REITS are highly exposed to money laundering and sanctions risks. Whether or not REITS operate in high risk geographies (India, China, South America, Russia etc.) or have complex holding structures, they need to file SARs (Suspicious Activity Reports) with FinCEN. There has always been a strong case to be made that shady foreign entities and individuals were laundering ill gotten proceeds to buy US real estate. In early 2016, FinCEN began implementing Geographic Targeting Orders (GTOs). Title companies based in the United States are now required to clearly identify the real owners of limited liability companies (LLCs), partnerships and other legal entities being used to purchase high end residential real estate using cash.

AML as a topic is covered exhaustively in the below series of blogposts (please click on image to open the first one).

                                                         Click on Image for a Deepdive on AML
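The GTO requirement described above lends itself to a simple screening rule, sketched below. This is only an illustration of the idea: the `Purchase` fields are hypothetical, and the price threshold is a placeholder, not a regulatory figure. A real AML program would combine many such rules with watchlist screening and case management.

```python
from dataclasses import dataclass

# Placeholder value for illustration only, not a regulatory figure.
ILLUSTRATIVE_PRICE_THRESHOLD = 1_000_000.0


@dataclass
class Purchase:
    buyer_name: str
    buyer_is_legal_entity: bool         # LLC, partnership or similar
    all_cash: bool
    price: float
    beneficial_owners_identified: bool


def needs_owner_identification(purchase: Purchase,
                               threshold: float = ILLUSTRATIVE_PRICE_THRESHOLD) -> bool:
    """Flag all-cash, high-value purchases by legal entities with unknown owners."""
    return (
        purchase.buyer_is_legal_entity
        and purchase.all_cash
        and purchase.price >= threshold
        and not purchase.beneficial_owners_identified
    )


# Hypothetical transactions, for illustration only.
purchases = [
    Purchase("Example Holdings LLC", True, True, 3_200_000.0, False),
    Purchase("Jane Doe", False, True, 2_000_000.0, False),
    Purchase("Sample Partners LP", True, False, 5_000_000.0, False),
]

flagged = [p.buyer_name for p in purchases if needs_owner_identification(p)]
print(flagged)
```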

#5 Smart Cities, Net New Investments and Property Management

In the future, REITS would want to invest in Smart Cities which are positioned to be leading urban centers offering mobility, green technology, personalized medicine, safe services, clean water, traffic management and other forward looking urban amenities. These Smart Cities target a new kind of client: upwardly mobile, technologically savvy, environmentally conscious millennials. According to RBC Capital Markets, Smart Cities present a massive investment opportunity for REITS. Such investments could offer REITS income yields of around 10-20% (Source – Ben Forster @ Schroders).

Smart Cities will be created using a number of high end technologies such as IoT, AI, Virtual Reality, Device Meshes etc. By 2020, it is estimated that these buildings will be generating an enormous amount of data that needs to be stored and analyzed by landlords.

As the below graphic from Cisco attests, the ability to work with IoT data to analyze a range of these micro investment opportunities is a Big Data challenge.

The ongoing maintenance and continuous refurbishment of rental properties is a large portion of the business operation of a REIT. The availability of smart sensors and similar IoT devices that can track air quality, home appliance malfunctions etc. can help greatly with preventive maintenance.
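A minimal sketch of how sensor readings could feed preventive maintenance is shown below. It assumes a simple rule: flag a device for inspection when the average of its most recent readings exceeds a threshold. The function name, the sample readings and the threshold are hypothetical; a real deployment would likely use streaming ingestion and a trained anomaly model instead of a fixed cutoff.

```python
from statistics import mean


def needs_inspection(readings, threshold, window=3):
    """Return True when the mean of the last `window` readings exceeds threshold."""
    if len(readings) < window:
        return False  # not enough data to judge yet
    return mean(readings[-window:]) > threshold


# Hypothetical vibration readings from appliances in two units.
sensor_readings = {
    "unit-101-hvac": [0.2, 0.3, 0.9, 1.1, 1.2],
    "unit-102-hvac": [0.2, 0.3, 0.4, 0.3, 0.2],
}

for device, readings in sensor_readings.items():
    if needs_inspection(readings, threshold=1.0):
        print(f"Schedule inspection for {device}")
```

Averaging over a short window rather than reacting to a single reading reduces false alarms from one-off spikes.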


As can be seen from some of the above business areas, most REIT data needs require a holistic approach across the value chain (capital sourcing, investment decisions, portfolio management & operations). This approach spans various horizontal functions like Customer Segmentation, Property Acquisition, Risk, Finance and Business Operations.
The need of the hour for larger REITS is to move to a common model for data storage, model building and testing.  It is becoming increasingly obvious that Big Data can provide massive business opportunities for REITS.