How Modern Taxation can benefit from Big Data Analytics..

We have repeatedly discussed how Predictive Analytics built on Big Data capabilities can create tremendous differentiation and value for Banking and Insurance companies. However, one of the most important areas of public finance – taxation – is just waking up to the possibilities of this game changing technology, with the goal of increasing public revenues via the collection of indirect taxes. In this blogpost, we will examine both perspectives – the taxation authority as well as the corporation that bears the burden of tax reporting.

National Tax Authorities begin information sharing..

The last few years have seen increased digitalization of national tax accounting systems. More and more citizens opt to file their returns electronically. Reporting obligations for businesses (such as Banks) that serve wealthy account holders with substantial financial assets spanning continents have also grown. Several pieces of legislation have been promulgated to make the exchange of information on tax matters as seamless as possible – with the goal of curbing tax evasion.

The US Government’s tax arm – the IRS (Internal Revenue Service) – administers FATCA (the Foreign Account Tax Compliance Act), which was enacted in 2010 with the intention of detecting and tracking US nationals who reside abroad and owe US taxes. FATCA aims to prevent US taxpayers who hold non-US financial assets from avoiding taxation by requiring foreign banking institutions to report on their US account holders or face a 30% withholding on all US-sourced income.

The Organization for Economic Cooperation and Development (OECD), working with G20 countries, has developed the Common Reporting Standard (CRS) as an information standard for the automatic exchange of taxpayer information. As part of CRS, more than 90 countries now share information on residents’ assets and incomes in conformance with the reporting standard. CRS is far wider in scope than FATCA and imposes a significant burden on the compliance team in a bank.

As an example, the UK enforces compliance with CRS while the US enforces FATCA; bilateral agreements enable information sharing between the two national taxation authorities. [1]

To that end, national and regional tax authorities have begun using Big Data techniques refined in the private sector to increase the collection of taxes from citizens while reducing fraud and waste in the system.

Across a range of national compliance regimes, Big Data can help improve both tax collection and the risk scoring process. It can do this by way of advanced analytics that operates on a much richer and wider set of data than was possible before. On the business side, it can help ensure accurate compliance reporting and avoid overpayment.

Catalogued below are the important areas in which Big Data Analytics can present a huge impact in the field of public taxation.

# 1: Help both Taxation Authorities and Finance Departments Efficiently Store and Process Enormous Amounts of Taxation Data 

The entire indirect taxation process is complex in terms of both the breadth and the depth of documentation – for corporate tax departments as well as for the authorities. Businesses need to look at a complex range of multifaceted tax data across the thousands of national, state and local jurisdictions they operate in. Not being able to store and process this data in one place induces all the challenges caused by data silos. The Data Lake is a natural fit to store historical tax documents, Accounts Payable/Receivable, expense information, business receipts, emails, call transcripts etc. Data that drives the tax process should be ingested from Master Data, Reference Data (containing fresh & accurate tax rate/jurisdiction data), the various Book of Record Systems, ERP, finance and legacy financial systems. The key point is to automate this data movement and cut down on manual ingestion, thus improving both the speed and quality of the process at this first & most important step.
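
To make the automation concrete, here is a minimal PySpark sketch of landing an ERP tax extract into a Hadoop-backed data lake. The paths, file layout and column names (invoice_date, jurisdiction) are assumptions for illustration, not a reference to any specific product.

```python
# A minimal sketch of automated tax-data ingestion into a Hadoop-backed data lake.
# Paths, column names and the 'jurisdiction' partitioning key are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("tax-data-ingest").getOrCreate()

# Read a raw Accounts Payable extract exported from the ERP system.
ap_raw = spark.read.csv(
    "hdfs:///landing/erp/accounts_payable/2017-*.csv",
    header=True,
    inferSchema=True,
)

# Light standardization at ingest time: normalize dates and jurisdiction codes.
ap_clean = (ap_raw
            .withColumn("invoice_date", F.to_date("invoice_date"))
            .withColumn("jurisdiction", F.upper(F.trim(F.col("jurisdiction")))))

# Land the data in the lake as Parquet, partitioned by tax jurisdiction,
# so downstream compliance and reporting jobs avoid re-ingesting silos.
(ap_clean.write
 .mode("append")
 .partitionBy("jurisdiction")
 .parquet("hdfs:///datalake/tax/accounts_payable/"))
```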

# 2: Perform Accurate Customer Due Diligence (CDD) by Creating a Single View of Compliance

Given the complexity of both compliance regimes, Big Data can help automate Customer Due Diligence (CDD) and Know Your Customer (KYC) processes. This is important in improving both tax collection and the risk scoring process.

Big Data Analytics can make CDD highly automated as opposed to a manual, laborious process – especially around tax avoidance watch lists or suspicious accounts. One of the first steps is to create a 360 degree view of an entity – whether a high risk individual or a corporate entity – for taxation purposes. Doing so enables better account servicing and provides a holistic view of potentially fraudulent activity. KYC programs are becoming increasingly daunting undertakings due to issues such as the difficulty of identifying customers across multiple lines of business and the lack of a consistent view of customer bank product use and transaction activity. Further complicating these challenges is the advent of new risks such as digital currencies, new and unique payment methods, and continued variation in global data privacy regulation – all of which are resulting in enhanced regulatory scrutiny of banks’ readiness in this area.
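
As a concrete illustration of the “360 degree view” step, the sketch below links account records from two siloed systems by fuzzy name matching. The records, the 0.85 similarity threshold and the use of Python’s standard difflib are assumptions; production CDD platforms use dedicated entity-resolution tooling over far richer attribute sets.

```python
# A minimal sketch of matching account records from two siloed systems into a
# single entity view for CDD/KYC. Records and the 0.85 threshold are illustrative.
from difflib import SequenceMatcher

core_banking = [
    {"id": "CB-101", "name": "Acme Holdings LLC", "country": "US"},
    {"id": "CB-102", "name": "Jane A. Doe", "country": "UK"},
]
wealth_mgmt = [
    {"id": "WM-9", "name": "ACME Holdings L.L.C.", "country": "US"},
    {"id": "WM-7", "name": "J. Doe", "country": "UK"},
]

def similarity(a, b):
    """Crude name similarity score in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Link records whose names are similar enough and whose country matches.
matches = []
for cb in core_banking:
    for wm in wealth_mgmt:
        score = similarity(cb["name"], wm["name"])
        if cb["country"] == wm["country"] and score >= 0.85:
            matches.append((cb["id"], wm["id"], round(score, 2)))

print(matches)   # e.g. [('CB-101', 'WM-9', 0.92)]
```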

# 3: Predicting Tax Yields (or Liabilities) More Accurately

Leveraging the ingestion and predictive capabilities of a Big Data based platform, tax authorities can create a full picture of an individual or entity across all accounts and geographies. Internal bank compliance teams can do the same with their client accounts. This richer picture can be used to produce more accurate predictions of tax yields and compliance numbers.
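
A minimal sketch of what such a prediction might look like is below, using a gradient boosted regressor over synthetic data. The features (declared income, prior-year liability, industry code) and the generated data are purely illustrative.

```python
# A minimal sketch of a tax-yield prediction model on synthetic data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(42)
n = 5000
declared_income = rng.lognormal(mean=11, sigma=1, size=n)
prior_liability = declared_income * rng.uniform(0.15, 0.35, size=n)
industry_code = rng.randint(0, 20, size=n)

X = np.column_stack([declared_income, prior_liability, industry_code])
# Synthetic "actual collected tax" with noise, standing in for historical outcomes.
y = 0.25 * declared_income + 0.1 * prior_liability + rng.normal(0, 5000, size=n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor().fit(X_train, y_train)
print("Holdout R^2:", round(model.score(X_test, y_test), 3))
```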

# 4: Vastly Improve Tax Fraud & Evasion Detection by Improving Risk Scoring

The Banking industry has already begun leveraging sophisticated fraud detection strategies using socio-demographic data and taxpayer behavior. Big Data can be used to examine tens of attributes per individual that were previously missed, even within internal structured data. Adding a machine learning process on top of this can help detect micro patterns of fraud across tens of accounts and geographies.
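
One simple way to operationalize this is unsupervised anomaly detection over taxpayer attributes. The sketch below uses an Isolation Forest on two made-up features; the feature choice and the 2% contamination rate are assumptions.

```python
# A minimal sketch of unsupervised risk scoring over taxpayer attributes.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(7)
# Feature 1: ratio of deductions to declared income; Feature 2: number of linked accounts.
normal = np.column_stack([rng.normal(0.2, 0.05, 2000), rng.poisson(2, 2000)])
suspect = np.column_stack([rng.normal(0.7, 0.1, 20), rng.poisson(15, 20)])
X = np.vstack([normal, suspect])

clf = IsolationForest(contamination=0.02, random_state=0).fit(X)
scores = clf.decision_function(X)          # lower score => more anomalous
flags = clf.predict(X)                     # -1 marks outliers for analyst review
print("Flagged for review:", int((flags == -1).sum()))
```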

Big Data based analytics also operate at scale, eliminating the manual and cumbersome spreadsheet based analysis that bedevils the ability to quickly and visually detect tax fraud and evasion.

# 5: Improve the Auditability & Accuracy of Regulatory Reporting

Most businesses currently use ERP systems, and tax engines within those systems, to help with their taxation process – for example, to calculate VAT (Value Added Tax) obligations. These traditional tax engines and tools suffer from all the issues that plague traditional tax data storage – they operate on limited data sets that may or may not be accurate, and the manual processes and reconciliations around them lead to more frequent audits. Big Data, with its focus on data quality and overall governance, can help remedy these issues. It can improve the quality, timeliness and overall confidence in the reporting, leading to fewer audits.

Conclusion..

Opening the door to the latest data storage and processing techniques can help taxation authorities introduce a higher degree of automation into their core business functions. This will allow them to reduce manual data operations and avoid reconciliation & reporting discrepancies – thus reducing costly audits. It will also enable them to focus on tasks such as strategic forecasting and better tax planning.

References..

[1] Indirect Tax Compliance in an Era of Big Data – Gillis, McStocker and Percival, KPMG –   http://www.bna.com/indirect-tax-compliance-n17179927659/

How Big Data & Advanced Analytics can help Real Estate Investment Trusts (REITS)

Introduction…

Real Estate Investment Trusts (REITS) are financial companies that own various forms of commercial and residential real estate. These assets include office buildings, retail shopping centers, hospitals, warehouses, timberland and hotels. Real estate continues to grow as a component of the global financial business. Given their focus on real estate investments, REITS have always occupied a specialized position in global finance.

Fundamentally, there are three types of REITS –

  1. Equity REITS which exclusively deal in acquiring, improving and selling properties with the aim of higher returns for their investors
  2. Mortgage REITS only buy and sell mortgages
  3. Hybrid REITS which do both #1 and #2 above

REITS have a reasonably straightforward business model – you take the yields from the properties you own and distribute most of the resulting income to your investors (US rules mandate distributing at least 90% of taxable income as dividends). Most traditional REIT business processes are well handled by conventional technology. However, more and more REITS are being challenged to develop a compelling Big Data strategy that leverages their tremendous data assets.

The Five Key Big Data Applications for REITS… 

Let us consider the five key areas where advanced analytics built on a Big Data foundation can immensely help REITS.

#1 Property Acquisition Modeling 

REIT owners can leverage the rich datasets available around renter demographics, preferences, seasonality and economic conditions in specific markets to better guide capital decisions on acquiring property. This modeling needs to take into account land costs, development costs, fixture costs & any other sales and marketing costs needed to appeal to tenants – call this the macro business perspective. From a micro business perspective, being able to study individual properties using a variety of widely available data – MLS listings for similar properties, foreclosures, proximity to retail establishments and work sites, building profiles, parking spaces, energy footprint etc – can help them match tenants to their property holdings. All this is critical to getting their investment mix right to meet profitability targets.
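
To make this concrete, below is a minimal sketch of ranking candidate acquisitions by a composite of projected yield and location desirability. All columns, weights and sample values are made up for illustration.

```python
# A minimal sketch of scoring candidate property acquisitions by blending macro and
# micro signals; the columns, weights and sample values are entirely illustrative.
import pandas as pd

candidates = pd.DataFrame({
    "property_id":      ["P1", "P2", "P3"],
    "land_cost_musd":   [12.0, 8.5, 15.0],
    "dev_cost_musd":    [30.0, 22.0, 41.0],
    "proj_rent_musd":   [6.2, 4.1, 8.0],    # projected annual rental income
    "vacancy_rate":     [0.06, 0.11, 0.04], # from local market / MLS-style data
    "retail_proximity": [0.8, 0.5, 0.9],    # normalized 0-1 desirability score
})

total_cost = candidates["land_cost_musd"] + candidates["dev_cost_musd"]
gross_yield = candidates["proj_rent_musd"] * (1 - candidates["vacancy_rate"]) / total_cost

# Simple composite score: yield dominates, location desirability nudges the ranking.
candidates["score"] = 0.8 * gross_yield + 0.2 * candidates["retail_proximity"]
print(candidates.sort_values("score", ascending=False)[["property_id", "score"]])
```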

#2 Portfolio Modeling 

REITS can leverage Big Data to perform more granular modeling of their MBS portfolios. As an example, they can feed far more data into their existing models than before – demographic data, macroeconomic factors et al.

A simple scenario would be: if interest rates go up by X basis points, what does that mean for my portfolio exposure, default rates, cost picture, and the optimal times to buy certain MBSs? REITS can then use that information to enter hedges to protect against any downside. Big Data can also help with a range of predictive modeling across all of the above areas, as discussed below. An example is building a 360 degree view of a given investment portfolio.
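
A minimal sketch of such a rate-shock scenario is below, using duration and convexity approximations over a toy portfolio; the holdings and risk measures are illustrative, not real positions.

```python
# A minimal sketch of a parallel rate-shock scenario on an MBS portfolio using
# effective duration and convexity; holdings and risk measures are illustrative.
import pandas as pd

portfolio = pd.DataFrame({
    "security":  ["MBS-A", "MBS-B", "MBS-C"],
    "mkt_value": [50e6, 30e6, 20e6],
    "duration":  [4.5, 6.2, 3.1],    # effective duration (years)
    "convexity": [-1.2, -0.8, 0.4],  # negative convexity is typical for MBS
})

def shock_pnl(df, bps):
    """Approximate price change for a parallel shift of `bps` basis points."""
    dy = bps / 10000.0
    pct_change = -df["duration"] * dy + 0.5 * df["convexity"] * dy ** 2
    return (df["mkt_value"] * pct_change).sum()

for bps in (25, 50, 100):
    print(f"+{bps}bps parallel shift -> P&L: {shock_pnl(portfolio, bps)/1e6:,.2f} MM")
```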

#3 Risk Data Aggregation & Calculations 

The instruments underlying the portfolios themselves carry large amounts of credit & interest rate risk. Big Data is a strong platform for aggregating and calculating many kinds of risk exposures, a topic discussed in detail in a separate blogpost on Risk Data Aggregation and Measurement.

#4 Detect and Prevent Money Laundering (AML)

Due to the global nature of investment funds flowing into real estate, REITS are highly exposed to money laundering and sanctions risks. Whether REITS operate in high risk geographies (India, China, South America, Russia etc) or have complex holding structures, they need to file SARs (Suspicious Activity Reports) with FinCEN. There has always been a strong case to be made that shady foreign entities and individuals were laundering ill-gotten proceeds to buy US real estate. In early 2016, FinCEN began implementing Geographic Targeting Orders (GTOs). Title companies based in the United States are now required to clearly identify the real owners of limited liability companies (LLCs), partnerships and other legal entities being used to purchase high end residential real estate with cash.

AML as a topic is covered exhaustively in a separate series of blogposts.

#5 Smart Cities, Net New Investments and Property Management

In the future, REITS will want to invest in Smart Cities, which are positioned to be leading urban centers offering mobility, green technology, personalized medicine, safe services, clean water, traffic management and other forward looking urban amenities. These Smart Cities target a new kind of client – upwardly mobile, technologically savvy, environmentally conscious millennials. According to RBC Capital Markets, Smart Cities present a massive investment opportunity for REITS. Such investments could offer REITS income yields of around 10-20%. (Source – Ben Forster @ Schroders)

Smart Cities will be created using a number of high end technologies such as IoT, AI, Virtual Reality, Device Meshes etc. By 2020, it is estimated that these buildings will be generating an enormous amount of data that needs to be stored and analyzed by landlords.

As research from Cisco attests, the ability to work with IoT data to analyze a range of these micro investment opportunities is a Big Data challenge.

The ongoing maintenance and continuous refurbishment of rental properties is a large portion of a REIT's business operations. The availability of smart sensors and other IoT devices that can track air quality, home appliance malfunctions etc can help greatly with preventive maintenance.
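
As a small illustration of the preventive maintenance idea, the sketch below flags a drifting vibration reading from a hypothetical HVAC sensor; the telemetry, window size and alert threshold are all assumed values.

```python
# A minimal sketch of flagging a potential appliance fault from smart-sensor telemetry.
import pandas as pd
import numpy as np

rng = np.random.RandomState(1)
readings = pd.Series(rng.normal(0.5, 0.05, 500))       # normal vibration levels
readings.iloc[450:] += np.linspace(0, 0.4, 50)          # gradual drift before failure

baseline = readings.iloc[:300].mean()
rolling = readings.rolling(window=20).mean()

# Raise a maintenance ticket when the rolling average drifts well above baseline.
alerts = rolling[rolling > baseline * 1.3]
if not alerts.empty:
    print(f"Preventive maintenance alert at reading #{alerts.index[0]}")
```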

Conclusion..

As can be seen from the above business areas, most REITS' data needs require a holistic approach across the value chain (capital sourcing, investment decisions, portfolio management & operations). This approach spans horizontal functions like Customer Segmentation, Property Acquisition, Risk, Finance and Business Operations.
The need of the hour for larger REITS is to move to a common model for data storage, model building and testing. It is becoming increasingly obvious that Big Data can provide massive business opportunities for REITS.

Why Big Data & Advanced Analytics are Strategic Corporate Assets..

The industry is all about Digital now. The explosion in data storage and processing techniques promises to create new digital business opportunities across industries. Business Analytics concerns itself with deriving insights from data produced as a byproduct of business operations as well as from external data that reflects customer behavior. Due to its critical importance in decision making, Business Analytics is now a boardroom matter and not one confined to IT teams. My goal in this blogpost is to quickly introduce the analytics landscape before moving on to the significant value drivers that only Predictive Analytics can provide.

The Impact of Business Analytics…

The IDC “Worldwide Big Data and Analytics Spending Guide 2016” predicts that the big data and business analytics market will grow from $130 billion by the end of this year to $203 billion by 2020 [1], at a compound annual growth rate (CAGR) of 11.7%. This growth is being experienced across industry verticals such as banking & insurance, manufacturing, telecom, retail and healthcare.

Further, during the next four years, IDC finds that large enterprises with 500+ employees will be the main driver in big data and analytics investing, accounting for about $154 billion in revenue. The US will lead the market with around $95 billion in investments during the next four years – followed by Western Europe & the APAC region [1].

The two major kinds of Business Analytics…

When we discuss the broad topic of Business Analytics, it needs to be clarified that there are two major disciplines – Descriptive and Predictive. Industry analysts from Gartner, IDC and others will tell you that one also needs to widen the definition to include Diagnostic and Prescriptive analytics. Having worked in the industry for a few years, I can safely say that these can be subsumed into the two major categories above.

Let’s define the major kinds of industrial analytics at a high level –

  • Descriptive Analytics is commonly described as being retrospective in nature, i.e. “tell me what has already happened”. It covers a range of areas traditionally considered as BI (Business Intelligence). BI focuses on supporting operational business processes like customer onboarding, claims processing and loan qualification via dashboards, process metrics and KPIs (Key Performance Indicators). It also supports a basic level of mathematical techniques for data analysis (such as trending & aggregation) to infer intelligence from the same. BI is a traditional & well established analytical domain that essentially takes a retrospective look at business data in systems of record. The goal of the Descriptive disciplines is primarily to look for macro or aggregate business trends across different aspects or dimensions such as time, product lines, business units & operating geographies.

  • Predictive Analytics is the forward looking branch of analytics that tries to predict the future based on information about the past. It describes what “can happen based on the patterns in the data”. It covers areas like machine learning, data mining, statistics, data engineering & other advanced techniques such as text analytics, natural language processing, deep learning and neural networks. A more detailed primer on both, along with detailed use cases, is found here –

The Data Science Continuum in Financial Services..(3/3)

The two main domains of Analytics are complementary yet different…

Predictive Analytics does not intend to, nor will it, replace the BI domain; it adds significant and sophisticated analytical capabilities that enable businesses to do more with all the data they collect. It is not uncommon to find real world business projects leveraging both these analytical approaches.

However from an implementation standpoint, the only common area of both approaches is knowledge of the business and the sources of data in an organization. Most other things about them vary.

For instance, predictive approaches both augment & build on the BI paradigm by adding a “What could happen” dimension to the data.

The Descriptive Analytics/BI workflow…

BI projects tend to follow a largely structured process which has been well defined over the last 15-20 years. As the illustration below describes, data produced in operational systems is subject to extraction and transformation and is eventually loaded into a data warehouse for consumption by visualization tools.

                                                                       Illustration – The Descriptive Analytics Workflow

Descriptive Analytics and BI add tremendous value to well defined use cases based on a retrospective look at data.

However, key challenges with this process are –

  1. the lack of a platform to standardize and centralize data feeds leads to data silos, which cause all kinds of cost & data governance headaches across the landscape
  2. complying with regulatory initiatives (such as Anti Money Laundering or Solvency II) requires the warehouse to handle varying types of data, which is a huge challenge for most EDW technologies
  3. adding new & highly granular fields to the data feeds requires extensive upfront relational modeling, making it hard to handle newer kinds of schemas in an agile manner

Big Data platforms have overcome past shortfalls in security and governance and are now being used in BI projects at many organizations. An example of the usage of Hadoop in classic BI areas like Risk Data Aggregation is discussed in depth in the below blog.

http://www.vamsitalkstech.com/?p=2697

This space serves a large existing base of customers, but the industry has been looking to Big Data as a way of constructing a central data processing platform which can help with the above issues.

BI projects are predicated on using an EDW (Enterprise Data Warehouse) and/or RDBMS (Relational Database Management System) approach to store & analyze the data. Both these kinds of data storage and processing technologies are legacy in terms of both the data formats they support (Row-Column based) as well as the types of data they can store (structured data).

Finally, these systems fall short when processing the data volumes generated by digital workloads, which tend to be loosely structured (e.g. mobile application front ends, IoT devices such as sensors, ATMs or Point of Sale terminals), need business decisions to be made in near real time or in micro batches (e.g. detecting credit card fraud or suggesting the next best action for a bank customer), and are increasingly cloud & API based to save on costs & to provide self-service.

That is where Predictive Approaches on Big Data platforms are beginning to shine and fill critical gaps.

The Predictive Analytics workflow…

Though the field of predictive analytics has been around for years, it is rapidly witnessing a rebirth with the advent of Big Data. Hadoop ecosystem projects enable the easy ingestion of massive quantities of data, helping the business gather far more attributes about their customers and their preferences.

                                                                    Illustration – The Predictive Analytics Workflow

The Predictive Analytics workflow always starts with a business problem in mind. Examples of these would be “A marketing project to detect which customers are likely to buy new products or services in the next six months based on their historical & real time product usage patterns – which are denoted by x, y or z characteristics” or “Detect real-time fraud in credit card transactions.”

In use cases like these, the goal of the data science process is to be able to segment & filter customers by corralling them into categories that enable easy ranking. Once this is done, the business is involved to setup easy and intuitive visualization to present the results.

A lot of times, business groups have a hard time explaining what they would like to see – both the data and the visualization. In such cases, a prototype makes things easier from a requirements gathering standpoint. Once the problem is defined, the data scientist/modeler identifies the raw data sources (both internal and external) that bear on the business challenge. They spend a lot of time collating the data (from Oracle/SQL Server, DB2, Mainframes, Greenplum, Excel sheets, external datasets, etc.). The cleanup process involves fixing missing values and corrupted data elements, formatting date and time fields, and so on.

The data wrangling phase involves writing code to join various data elements so that a single client's complete dataset is gathered in the Data Lake from a raw features standpoint. If more data is obtained while the development cycle is underway, the Data Science team has no option but to go back & redo the whole process. The modeling phase is where algorithms come in – these can be supervised or unsupervised. Feature engineering takes in business concepts & raw data features and creates predictive features from them. The Data Scientist takes the raw & engineered features and creates a model using a mix of various algorithms. Once the model has been repeatedly tested for accuracy and performance, it is typically deployed as a service. Models as a Service (MaaS) is the Data Science counterpart to Software as a Service. The MaaS takes in business variables (often hundreds of inputs) and provides as output business decisions/intelligence, measurements, & visualizations that augment decision support systems.
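
Below is a minimal end-to-end sketch of this workflow – cleaning a raw extract, training a propensity model and wrapping it in a scoring function that stands in for a MaaS endpoint. The column names, the tiny dataset and the choice of logistic regression are all assumptions chosen for brevity.

```python
# A minimal sketch of the predictive workflow: wrangle, model, serve.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

raw = pd.DataFrame({
    "monthly_logins":   [2, 30, 5, 45, 1, 25, 40, 3],
    "products_held":    [1, 4, 2, 5, 1, 3, 4, 1],
    "tenure_years":     [0.5, 6, 2, 8, 1, None, 7, 0.3],   # missing value to clean up
    "bought_new_prod":  [0, 1, 0, 1, 0, 1, 1, 0],           # historical outcome label
})

# Data wrangling: impute the missing tenure with the median before modeling.
raw["tenure_years"] = raw["tenure_years"].fillna(raw["tenure_years"].median())

features = ["monthly_logins", "products_held", "tenure_years"]
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(raw[features], raw["bought_new_prod"])

def score_customer(payload: dict) -> float:
    """MaaS-style entry point: takes business variables, returns a propensity score."""
    X = pd.DataFrame([payload], columns=features)
    return float(model.predict_proba(X)[0, 1])

print(score_customer({"monthly_logins": 35, "products_held": 4, "tenure_years": 5}))
```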

 How Predictive Analytics changes the game…

Predictive analytics can bring about transformative benefits in the following six ways.

  1. Predictive approaches can be applied to a much wider & richer variety of business challenges, enabling an organization to achieve outcomes that were not really possible with the Descriptive variety. These use cases range from Digital Transformation to fraud detection to marketing analytics to IoT (Internet of Things) across industry verticals. Predictive approaches can also operate in real time and are not just batch oriented like Descriptive approaches.
  2. When deployed strategically, they can scale to enormous volumes of data and help reason over them, reducing manual costs. They can take on problems that cannot be managed manually because of the huge amount of data that must be processed.
  3. They can predict the results of complex business scenarios by probabilistically predicting different outcomes across thousands of variables and perceiving minute dependencies between them. An example is social graph analysis to understand which individuals in a given geography are committing fraud and whether a ring is operating.
  4. They are vastly superior at handling fine grained data of manifold types than the traditional approach or manual processing. The predictive approach also encourages the integration of previously “dark” data as well as newer external sources of data.
  5. They can also suggest specific business actions (e.g. based on the above outcomes) by mining data for hitherto unknown patterns. The data science approach constantly keeps learning in order to increase the accuracy of its decisions.
  6. Data Monetization – they can be used to interpret the mined data to discover solutions to business challenges and new business opportunities/models.

References

[1] IDC Worldwide Semiannual Big Data and Business Analytics Spending Guide – Oct 2016 – “Double-Digit Growth Forecast for the Worldwide Big Data and Business Analytics Market Through 2020 Led by Banking and Manufacturing Investments, According to IDC”

http://www.idc.com/getdoc.jsp?containerId=prUS41826116

 

Demystifying Digital – Why Customer 360 is the Foundational Digital Capability – ..(1/3)

The first post in this three part series on Digital Foundations introduces the concept of Customer 360 or the Single View of Customer (SVC). We will discuss the need for & the definition of the SVC as the first step in any Digital Transformation endeavor. We will also discuss the specific benefits, from both a business & operational standpoint, that are enabled by SVC. The second post in the series introduces the concept of the Customer Journey. The third & final post will focus on the technical design & architecture needed to achieve both these capabilities.
 
In an era of exploding organizational touch points, how many companies can truly claim that they know & understand their customers, their needs & evolving preferences deeply and from a realtime perspective?
How many companies can claim to keep up as a customer's product & service usage matures and keep them engaged by cross selling new offerings? How many can accurately predict future revenue from a customer based on their current understanding of their profile?
The answer is not at all encouraging.
Across industries like Banking, Insurance, Telecom & Manufacturing, the ability to get a unified view of the customer & their journey is at the heart of the enterprise's ability to promote relevant offerings & detect customer dissatisfaction.
  • Currently most industry players are woeful at putting together this comprehensive Single View of their Customers (SVC). Due to operational silos, each department possesses a siloed & limited view of the customer across multiple channels. These views are typically inconsistent, lack synchronization with other departments & miss a high number of potential cross-sell and up-sell opportunities.
  • The Customer Journey problem has been an age old issue which has gotten exponentially more complicated over the last five years, as the staggering rise of mobile technology and the Internet of Things (IoT) have vastly increased the number of enterprise touch points through which customers can discover & purchase new products/services. In an omnichannel world, an increasing number of transactions are being conducted online. In verticals like Retail and Banking, online transactions approach an average of 40%. Adding to the problem, more and more consumers are posting product reviews and feedback online. Companies thus need to react in realtime to piece together the source of consumer dissatisfaction.
Another large component of customer outreach is marketing analytics & the ability to run effective campaigns to recruit customers.

The most common questions that a lot of enterprises fail to answer accurately are –

  1. Is the Customer happy with their overall relationship experience?
  2. What mode of contact do they prefer? And at what time? Can Customers be better targeted on these channels at those preferred times?
  3. What is the overall Customer Lifetime Value (CLV), i.e. how much profit are we able to generate from this customer over their total lifetime?
  4. By understanding CLV across populations, can we increase spend on marketing & sales for products that are resulting in higher customer value?
  5. How do we increase cross-sell and up-sell of products & services?
  6. Does this customer fall into a certain natural segment and if so, how can we acquire more customers like them?
  7. Can different channels (Online, Mobile, IVR & POS) be synchronized? Can Customers begin a transaction in one channel and complete it in any of the others without having to resubmit their data?

The first element of Digital is Customer Centricity & it naturally follows that a 360 degree view is a huge aspect of that.

                                       Illustration – Customer 360 view & its benefits

So what information is specifically contained in a Customer 360 –

The 360 degree view is a snapshot of the below types of data (a minimal sketch of such a record follows the list) –

  • Customer’s Demographic information – Name, Address, Age etc
  • Length of the Customer-Enterprise relationship
  • Products and Services purchased overall
  • Preferred Channel & time of Contact
  • Marketing Campaigns the customer has responded to
  • Major Milestones in the Customer's relationship
  • Ongoing activity – Open Orders, Deposits, Shipments, Customer Cases etc
  • Ongoing Customer Lifetime Value (CLV) Metrics and the Category of customer (Gold, Silver, Bronze etc)
  • Any Risk factors – Likelihood of Churn, Customer Mood Alert, Ongoing issues etc
  • Next Best Action for Customer
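
Here is a minimal sketch of how such a snapshot might be represented in code once the above attributes are stitched together; every field name and value is illustrative rather than a reference to any real schema.

```python
# A minimal, illustrative representation of a Customer 360 snapshot record.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Customer360:
    customer_id: str
    name: str
    segment: str                      # e.g. Gold / Silver / Bronze
    preferred_channel: str
    products: List[str] = field(default_factory=list)
    lifetime_value: float = 0.0
    churn_risk: float = 0.0           # 0..1 score from a predictive model
    next_best_action: str = ""

profile = Customer360(
    customer_id="C-1001",
    name="Jane Doe",
    segment="Gold",
    preferred_channel="Mobile",
    products=["Checking", "Mortgage", "Credit Card"],
    lifetime_value=48_500.0,
    churn_risk=0.12,
    next_best_action="Offer premium savings product",
)
print(profile)
```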

How Big Data technology can help..

Leveraging the ingestion and predictive capabilities of a Big Data based platform, banks can provide a user experience that rivals Facebook, Twitter or Google and gain a full picture of the customer across all touch points.

Big Data enhances the Customer 360 capability in the following ways  –  

  1. Obtaining a realtime Single View of the Customer (typically a customer across multiple channels, product silos & geographies) across years of account history
  2. Customer Segmentation – helping businesses understand customer segments down to the individual level as well as at a segment level
  3. Customer sentiment analysis – combining internal organizational data, clickstream data and social media sentiment with structured sales history to provide a clear view into consumer behavior
  4. Product Recommendation engines which provide compelling personal product recommendations by mining realtime consumer sentiment and product affinity information along with historical data
  5. Market Basket Analysis – observing consumer purchase history and enriching this data with social media, web activity, and community sentiment regarding past purchases and future buying trends (see the sketch below)
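
A toy version of the market basket idea from item 5 is sketched below – counting how often product pairs co-occur across purchase histories to surface cross-sell candidates. The transactions are made up; at enterprise scale this would run on a distributed engine such as Spark.

```python
# A minimal sketch of market-basket analysis via pairwise co-occurrence counts.
from itertools import combinations
from collections import Counter

transactions = [
    {"checking", "credit_card"},
    {"checking", "mortgage", "credit_card"},
    {"savings", "credit_card"},
    {"checking", "mortgage"},
    {"checking", "credit_card", "savings"},
]

pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Pairs bought together most often become simple cross-sell candidates.
for pair, count in pair_counts.most_common(3):
    support = count / len(transactions)
    print(f"{pair}: support={support:.2f}")
```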

Customer 360 can help improve the following operational metrics of a Retailer or a Bank or a Telecom immensely.

  1. Cost to Income ratio; Customers Acquired per FTE; Sales and service FTE’s (as percentage of total FTE’s), New Accounts Per Sales FTE etc
  2.  Sales conversion rates across channels, Decreased customer attrition rates etc.
  3. Improved Net Promoter Scores (NPS), referral based sales etc

Customer 360 is thus a basic digital capability every organization needs to offer its customers, partners & internal stakeholders. This implies a re-architecture of both data management and business process automation.

The next post will discuss the second critical component of Digital Transformation – the Customer Journey.

Embedding A Culture of Business Analytics into the Enterprise DNA..

“IT driven business transformation is always bound to fail” – Amber Storey, Sr Manager, Ernst & Young

The value of Big Data driven Analytics is no longer in question, from either a customer or an enterprise standpoint. Lack of investment in an analytics strategy has the potential to negatively impact shareholder value. Business boards and CXOs are now concerned about the overall level and maturity of their investments in terms of business value – i.e. increasing sales, driving down business & IT costs & helping create new business models. It is thus an increasingly accurate argument that smart applications & the ecosystems built around them will increasingly dictate enterprise success.

Such examples among forward looking organizations abound across industries. These range from realtime analytics on IoT data streams across the manufacturing supply chain, to the use of natural language processing to drive patient care decisions in healthcare, to more accurate insurance fraud detection & richer Digital interactions in Retail Banking, to name a few.

However, most global organizations currently adopt a fairly tactical approach to delivering traditional business intelligence (BI) and predictive analytics on their application platforms. This departmental approach is quite suboptimal, as scalable data driven decisions & culture not only empower decision-makers with up to date and realtime information but also help them develop long term insights into how globally diversified business operations are performing. Scale is the key word here due to rapidly changing customer trends, partner and supply chain realities & regulatory mandates.

Scale implies speed of learning and business agility across the organization – the ability of globally diversified operations to turn on a dime – thus ensuring that the business feels empowered.

A quick introduction to Business (Descriptive & Predictive) Analytics –

Business intelligence (BI) is a traditional & well established analytical domain that essentially takes a retrospective look at business data in systems of record. The goal for BI is primarily to look for macro or aggregate business trends across different aspects or dimensions such as time, product lines, business units & operating geographies.

BI is primarily concerned with “What happened and what trends exist in the business based on historical data?“. The typical use cases for BI include budgeting, business forecasts, reporting & key performance indicators (KPI).

On the other hand, Predictive Analytics (a subset of Data Science) augments & builds on the BI paradigm by adding a “What could happen” dimension to the data in terms of –

  • being able to probabilistically predict different business scenarios across thousands of variables
  • suggesting specific business actions based on the above outcomes

Predictive Analytics does not intend to nor will it replace the BI domain but only adds significant business capabilities that lead to overall business success. It is not uncommon to find real world business projects leveraging both these analytical approaches.

Creating an industrial approach to analytics – 

Strategic business projects typically take up a BI/Predictive Analytics based approach as an afterthought to the other aspects of system architecture and buildout. This dated approach ensures that analytics remains external to, and eventually operates in a reactive mode within, the business system.

Having said that, one does need to recognize that an industrial approach to analytics is a complex endeavor that depends on how an organization tackles the convergence of the below approaches –

  1. Organizational Structure
  2. New Age Technology 
  3. A Platforms Mindset
  4. Culture

        Illustration – Embedding A Culture of Business Analytics into the Enterprise DNA

Let's discuss them briefly –

Organizational Structure – The historical approach has been to primarily staff analytics teams as a standalone division often reporting to a CIO. This team has responsibility for both the business intelligence as well as some silo of a data strategy. Such a piecemeal approach to predictive analytics ensures that business & application teams adopt a “throw it over the wall” mentality over time.

So what needs to be done? 

In the Digital Age, enterprises should look to centralize both data management as well as the governance of analytics as core business capabilities. I suggest a hybrid organizational structure where a Center of Excellence (COE) is created which reports to the office of the Chief Data Officer (CDO) as well as individual business analytic leaders within the lines of business themselves.

 This should be done to ensure that three specific areas are adequately tackled using a centralized approach- 

  • Investing in creating a data & analytics roadmap by creating a center of excellence (COE)
  • Setting appropriate business milestones with “lines of business” value drivers built into a robust ROI model
  • Managing Risk across the enterprise with detailed scenario planning

New Age Technology –

The onset of Digital Architectures in enterprise businesses implies the ability to drive continuous online interactions with global consumers/customers/clients or patients. The goal is not just to provide engaging visualization but also to personalize services clients care about across multiple modes of interaction. Mobile applications first began forcing enterprises to support multiple channels of interaction with their consumers. We have seen how exploding data generation across the global economy has become a clear & present business & IT phenomenon. Data volumes are rapidly expanding across industries; it is not just that the production of data has increased, but that organizations increasingly need to derive business value from it. This calls for the collection & curation of data from dynamic and highly distributed sources such as consumer transactions, B2B interactions, machines such as ATMs & geo location devices, click streams, social media feeds, server & application log files and multimedia content such as videos – using Big Data.

Cloud Computing is the ideal platform to provide the business with self service as well as rapid provisioning of business analytics. Every new application designed needs to be cloud native from the get go.

For example, Banking now requires the ability to engage consumers in a seamless experience across an average of four to five channels – Mobile, eBanking, Call Center, Kiosk etc.

A Platforms Mindset – 

As opposed to building standalone or one-off business applications, a Platform Mindset is a more holistic approach capable of producing higher revenues. Platforms abound in the webscale world at shops like Apple, Facebook & Google. Applications are constructed like lego blocks and they reuse customer & interaction data to drive cross-sell and up-sell among different product lines. The key is to start off with products that have high customer attachment & retention. While increasing brand value, it is also key to ensure that customers & partners can collaborate on improvements to the various applications hosted on top of the platform.

Culture – Business value fueled by analytics is only possible if the entire organization operates on an agile basis in order to collaborate across the value chain. Cross functional teams across new product development, customer acquisition & retention, IT Ops, legal & compliance must collaborate in short work cycles to close the traditional business & IT innovation gap. Methodologies like DevOps, whose chief goal is to close the long-standing gap between the engineers who develop and test IT capability and the organizations responsible for deploying and maintaining IT operations, must be adopted. Using traditional app dev methodologies, it can take months to design, test and deploy software. No business today has that much time, especially in the age of IT consumerization and end users accustomed to smart phone apps that are updated daily. The focus now is on rapidly developing business applications to stay ahead of competitors that can better harness Big Data's amazing business capabilities.

Summary- 

Enterprise wide business analytics approaches designed around the four key prongs (Structure, Culture, Technology & Platforms) will create immense operational efficiency, better business models, increased relevance and, ultimately, higher revenues. These will separate the visionaries and leaders from the laggards in the years to come.

Data Lakes power the future of Industrial Analytics..(1/4)

The first post in this four part series on Data lakes will focus on the business reasons to create one. The second post will delve deeper into the technology considerations & choices around data ingest & processing in the lake to satisfy myriad business requirements. The third will tackle the critical topic of metadata management, data cleanliness & governance. The fourth & final post in the series will focus on the business justification to build out a Big Data Center of Excellence (COE).

“Business owners at the C level are saying, ‘Hey guys, look. It’s no longer inordinately expensive for us to store all of our data. I want all of you to make copies. OK, your systems are busy. Find the time, get an extract, and dump it in Hadoop.’” – Mike Lang, CEO of Revelytix

The onset of Digital Architectures in enterprise businesses implies the ability to drive continuous online interactions with global consumers/customers/clients or patients. The goal is not just to provide engaging visualization but also to personalize services clients care about across multiple modes of interaction. Mobile applications first began forcing enterprises to support multiple channels of interaction with their consumers. For example, Banking now requires an ability to engage consumers in a seamless experience across an average of four to five channels – Mobile, eBanking, Call Center, Kiosk etc. Healthcare is a close second, where caregivers expect patient, medication & disease data at their fingertips with a few finger swipes on an iPad app.

Big Data has been the chief catalyst in this disruption. The Data Lake architectural & deployment pattern makes it possible to first store all this data & then enables the panoply of Hadoop ecosystem projects & technologies to operate on it to produce business results.

Let us consider a few of the major industry verticals and the sheer data variety that players in these areas commonly possess – 

The Healthcare & Life Sciences industry possesses some of the most diverse data across the spectrum, ranging from –

  • Structured Clinical data e.g. Patient ADT information
  • Free hand notes
  • Patient Insurance information
  • Device Telemetry 
  • Medication data
  • Patient Trial Data
  • Medical Images – e.g. CAT Scans, MRIs, CT images etc

The Manufacturing industry players are leveraging the below datasets and many others to derive new insights in a highly process oriented industry-

  • Supply chain data
  • Demand data
  • Pricing data
  • Operational data from the shop floor 
  • Sensor & telemetry data 
  • Sales campaign data

Data In Banking – For many years now, corporate IT organizations in the financial industry have been tackling data challenges caused by strict silo based approaches that inhibit data agility.
Consider some of the traditional sources of data in banking –

  • Customer Account data e.g. Names, Demographics, Linked Accounts etc
  • Core Banking Data
  • Transaction Data which captures the low level details of every transaction (e.g debit, credit, transfer, credit card usage etc)
  • Wire & Payment Data
  • Trade & Position Data
  • General Ledger Data e.g AP (accounts payable), AR (accounts receivable), cash management & purchasing information etc.
  • Data from other systems supporting banking reporting functions.

Industries have changed around us since the advent of relational databases & enterprise data warehouses. Relational Databases (RDBMS) & Enterprise Data Warehouses (EDW) were built with very different purposes in mind. RDBMS systems excel at online transaction processing (OLTP) use cases where massive volumes of structured data need to be processed quickly. EDWs, on the other hand, perform online analytical processing (OLAP) functions, where data extracts are taken from OLTP systems, loaded & sliced in different ways to support reporting and analysis. Neither kind of system is suited to handling both immense volumes of data and highly variable structures of data.

Let us consider the main reasons why legacy data storage & processing techniques are unsuited to new business realities of today.

  • Legacy data technology enforces a vertical scaling method that is sorely unsuited to handling massive volumes of data in an elastic, scale up/scale down manner
  • The structure of the data needs to be modeled in a paradigm called ‘schema on write’, which sorely inhibits time to market for new business projects
  • Traditional data systems suffer bottlenecks when large amounts of high variety data are processed using them
  • Limits exist on the types of analytics that can be performed. In industries like Retail, Financial Services & Telecommunications, enterprises need to build detailed models of customer accounts to predict their overall service level satisfaction in realtime. These models are predictive in nature and use data science techniques as an integral component. The higher volumes of data, along with the attribute richness that can be provided to them (e.g. transaction data, social network data, transcribed customer call data), ensure that the models are highly accurate & can provide an enormous amount of value to the business. Legacy systems are not a great fit here.

Given all of the above data complexity and the need to adopt agile analytical methods  – what is the first step that enterprises must adopt? 

The answer is the adoption of the Data Lake as an overarching data architecture pattern. Let's define the term first. A data lake is two things – a data storage repository (small or massive) and a data processing engine. A data lake provides “massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs” [1]. Data Lakes are created to ingest, transform, process, analyze & finally archive large amounts of any kind of data – structured, semi-structured and unstructured.

                                  Illustration – The Data Lake Architecture Pattern

What Big Data brings to the equation, beyond its strength in data ingest & processing, is a unified architecture. For instance, MapReduce is the original framework for writing applications that process large amounts of structured and unstructured data stored in the Hadoop Distributed File System (HDFS). Apache Hadoop YARN opened Hadoop to other data processing engines (e.g. Apache Spark and Apache Storm) that can now run alongside existing MapReduce jobs to process data in many different ways at the same time. The result is that ANY kind of application processing can be run inside a Hadoop runtime – batch, realtime, interactive or streaming.

Visualization – Mobile applications first began forcing enterprises to support multiple channels of interaction with their consumers. For example, Banking now requires an ability to engage consumers in a seamless experience across an average of four to five channels – Mobile, eBanking, Call Center, Kiosk etc. The average enterprise user is also familiar with BYOD in the age of self service. The Digital Mesh only exacerbates this gap in user experiences as information consumers navigate applications across a mesh that is both multi-channel and expected to provide a Customer 360 across all engagement points. While information management technology has grown at a blistering pace, the human ability to process and comprehend numerical data has not. Applications being developed in 2016 are beginning to adopt intelligent visualization approaches that are easy to use, highly interactive and enable the user to manipulate corporate & business data with their fingertips – much like an iPad app. Tools such as intelligent dashboards, scorecards and mashups are helping change visualization paradigms that were based on histograms, pie charts and tons of numbers. Big Data improvements in data lineage and quality are greatly helping the visualization space.

The Final Word

Specifically, the Data Lake architectural pattern provides the following benefits –

The ability to store enormous amounts of data with a high degree of agility & low cost: The Schema On Read architecture makes it trivial to ingest any kind of raw data into Hadoop in a manner that preserves its structure. Business analysts can then explore this data and define a schema to suit the needs of their particular application.
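
A minimal PySpark sketch of the Schema On Read idea is below – raw JSON events are landed in the lake untouched and an analyst-defined schema is applied only when the data is read. The HDFS path and the field names are assumptions for illustration.

```python
# A minimal sketch of 'Schema On Read': the schema is applied at query time,
# not when the raw data is written into the lake. Paths and fields are assumed.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("schema-on-read-demo").getOrCreate()

# The application-specific schema is defined by the analyst when reading the data.
txn_schema = StructType([
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("channel", StringType()),
    StructField("event_time", TimestampType()),
])

txns = spark.read.schema(txn_schema).json("hdfs:///datalake/raw/transactions/")
txns.groupBy("channel").sum("amount").show()
```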

The ability to run any kind of Analytics on the data: Hadoop supports multiple access methods (batch, real-time, streaming, in-memory, etc.) to a common data set.  You are only restricted by your use case.

The ability to analyze, process & archive data while dramatically cutting cost: Since Hadoop was designed to work on low-cost commodity servers with direct attached storage, it helps dramatically lower the overall cost of storage. Enterprises are thus able to retain source data for long periods, providing business applications with far greater historical context.

The ability to augment & optimize Data Warehouses: Data lakes & Hadoop technology are not a ‘rip & replace’ proposition. While they provide a much lower cost environment than data warehouses, they can also be used as the compute layer to augment these systems.  Data can be stored, extracted and transformed in Hadoop. Then a subset of the data i.e the results are loaded into the data warehouse. This enables the EDW to leverage compute cycles and storage to perform truly high value analytics.

The next post of the series will dive deeper into the architectural choices one needs to make while creating a high fidelity & business centric enterprise data lake.

References – 

[1] https://en.wikipedia.org/wiki/Data_lake

The Data Science Continuum in Financial Services..(3/3)

“In God we trust. All others must bring data.” – Dr. W. Edwards Deming, statistician, professor, author, lecturer, and consultant.

The first post in this three part series described key ways in which innovative applications of data science are slowly changing a somewhat insular banking & financial services industry. The second post then delineated key business use cases enabled by a data driven or data native approach. This final post will examine foundational Data Science tasks & techniques that are commonly employed to get value from data, with financial industry examples. We will round off the discussion with recommendations for industry CXOs.

The Need for Data Science –

It is no surprise that Big Data approaches were first invented & then refined in web scale businesses at Google, Yahoo, eBay, Facebook and Amazon. These web properties offer highly user friendly, contextual & mobile native application platforms which produce a large amount of complex and multi-varied data from consumers, sensors and other telemetry devices. All this data is constantly analyzed to drive higher rates of application adoption, thus driving a virtuous cycle. We have discussed the (now) augmented capability of financial organizations to acquire, store and process large volumes of data by leveraging HDFS (the Hadoop Distributed Filesystem) running on commodity (x86) hardware.

One of the chief reasons that these webscale shops adopted Big Data is the ability to store the entire data set in Hadoop to build more accurate predictive models. The ability to store thousands of attributes at a much finer grain over a longer history, instead of depending on a statistically significant sample, is a significant gain over legacy data technology.

Every year, Moore's Law keeps driving the costs of raw data storage down. At the same time, compute technologies such as MapReduce, Tez, Storm and Spark have enabled the organization and analysis of Big Data at scale. The convergence of cost effective storage and scalable processing allows us to extract richer insights from data. These insights need to then be operationalized at scale to provide business value, as the use cases in the last post highlighted @ http://www.vamsitalkstech.com/?p=1582

The differences between Descriptive & Predictive Analytics –

Business intelligence (BI) is a traditional & well established analytical domain that essentially takes a retrospective look at business data in systems of record. The goal for BI is primarily to look for macro or aggregate business trends across different aspects or dimensions such as time, product lines, business units & operating geographies.

BI is primarily concerned with “What happened and what trends exist in the business based on historical data?“. The typical use cases for BI include budgeting, business forecasts, reporting & key performance indicators (KPI).

On the other hand, Predictive Analytics (a subset of Data Science) augments & builds on the BI paradigm by adding a “What could happen” dimension to the data in terms of –

  • being able to probabilistically predict different business scenarios across thousands of variables
  • suggesting specific business actions based on the above outcomes

Predictive Analytics does not intend to nor will it replace the BI domain but only adds significant business capabilities that lead to overall business success. It is not uncommon to find real world business projects leveraging both these analytical approaches.

Data Science  –

So, what exactly is Data Science ?

Data Science is an umbrella concept that refers to the process of extracting business patterns from large volumes of structured, semi-structured and unstructured data. Data Science is the key ingredient in enabling a predictive approach to the business.

Some of the key aspects that follow are  –

  1. Data Science is not just about applying analytics to massive volumes of data. It is also about exploring the patterns,associations & interrelationships of thousands of variables within the data. It does so by adopting an algorithmic approach to gleaning the business insights that are embedded in the data.
  2. Data Science is a standalone discipline that has spawned its own set of platforms, tools and processes across it’s lifecycle.
  3. Data science also aids in the construction of software applications & platforms to utilize such insights in a business context.  This involves the art of discovering data insights combined with the science of operationalizing them at scale. The word ‘scale’ is key. Any algorithm, model or deployment paradigm should support an expanding number of users without the need for unreasonable manual intervention as time goes on.
  4. A data scientist uses a combination of machine learning, statistics, visualization, and computer science to extract valuable business insights hiding in data and builds operational systems to deliver that value.
  5. The machine learning components are classified into two categories: ‘supervised’ and ‘unsupervised’ learning. In supervised learning, the constructed model is trained on labeled data and captures the effect of a set of inputs on known outputs. In unsupervised learning, there are no labeled outputs; the structure in the data is explained by so called latent variables. It is also possible to take a hybrid approach for certain types of mining tasks (see the sketch after this list).
  6. Strategic business projects typically begin leveraging a Data Science based approach to derive business value. This approach then becomes integral and eventually core to the design and architecture of such a business system.
  7. Contrary to what some of the above may imply, Data Science is a cross-functional discipline and not just the domain of PhDs. A data scientist is part statistician, part developer and part business strategist.
  8. Working in small self sufficient teams, the Data Scientist collaborates with an extended set of roles which includes visualization specialists, developers, business analysts, data engineers, applied scientists, architects, LOB owners and DevOps. The success of data science projects often relies on the communication, collaboration, and interaction with this extended team, both inside and, at times, outside the organization.
  9. It needs to be clarified that not every business project is a fit for a Data Science approach. Such an advanced approach is called for when the business initiative needs knowledge based decisions (beyond classical rules engine/expert system approaches), deals with large volumes of relevant data, faces a rapidly changing business climate, & requires scale beyond what human analysts can supply.
  10. Indeed, any project where hugely improved access to information & realtime analytics for customers, analysts and other stakeholders is a must for the business is fertile ground for Data Science.
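
As a concrete illustration of the supervised vs. unsupervised distinction in point 5 above, the minimal sketch below trains a decision tree on labeled synthetic data and then runs k-means clustering on the same inputs without any labels. Every value here is synthetic and purely illustrative.

```python
# A minimal sketch of supervised vs. unsupervised learning on synthetic data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))                 # two numeric input features
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # known labels -> supervised target

# Supervised: learn the mapping from inputs to labeled outputs
clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
print("supervised training accuracy:", clf.score(X, y))

# Unsupervised: no labels - discover latent structure (clusters) in the inputs
km = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)
print("cluster sizes:", np.bincount(km.labels_))
```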

Algorithms & Models

The word ‘model‘ is highly overloaded and means different things to different IT specialties, e.g. in the RDBMS world a model implies a data schema, while statistical models are built by statisticians. However, it can safely be said that models are representations of a business construct or a business situation.

Data mining algorithms are used to create models from data.

To create a data science model, the data mining algorithm looks for key patterns in the data provided. The results of this analysis are used to define the best parameters for the model. Once identified, these parameters are applied across the entire data set to extract actionable patterns and detailed statistics.

The model itself can take various forms, ranging from a set of customer clusters, to a revenue forecasting model, a set of fraud detection rules for credit cards, or a decision tree that predicts outcomes based on specific criteria.
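
A hedged, minimal sketch of this flow is shown below: a mining algorithm (here a scikit-learn decision tree, on synthetic data with assumed feature names) learns its parameters from a training subset, and the resulting model is then applied across the wider dataset to score every record.

```python
# A minimal sketch of the model-building flow described above. Data, feature names
# (balance, tenure, txn_count) and the target flag are illustrative assumptions.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(7)
X = rng.normal(size=(1000, 3))                       # e.g. balance, tenure, txn_count
y = (X[:, 0] - 0.5 * X[:, 2] > 0).astype(int)        # e.g. responded-to-offer flag

X_train, y_train = X[:300], y[:300]                  # learn parameters on a subset

model = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)

# The learned "parameters" here are the tree's split rules...
print(export_text(model, feature_names=["balance", "tenure", "txn_count"]))

# ...which are then applied across the entire dataset to produce scores and statistics
scores = model.predict_proba(X)[:, 1]
print("share predicted likely responders:", (scores > 0.5).mean())
```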

Common Data Mining Tasks –

There are many different kinds of data mining algorithms but all of these address a few fundamental types of tasks. The popular ones are listed below along with relevant examples:

  • Classification & Class Probability Estimation – For a given set of data, predict for each individual in a population the discrete class (or set of classes) that the individual belongs to. An example classification question is – “For all wealth management clients in a given population, who are most likely to respond to an offer to move to a higher segment?”. Common techniques used in classification include decision trees, Bayesian models, k-nearest neighbors, induction rules etc. Class Probability Estimation (CPE) is a closely related concept in which a scoring model is created to predict the likelihood that an individual belongs to a given class.
  • Clustering is an unsupervised technique used to find classes or segments of populations within a larger dataset without being driven by any specific purpose. For example – “What are the natural groups our customers fall into?”. The most popular use of clustering techniques is to identify clusters for use in activities like market segmentation. A common algorithm used here is k-means clustering.
  • Market basket analysis  is commonly used to find out associations between entities based on transactions that involve them. E.g Recommendation engines which use affinity grouping.
  • Regression algorithms aim to estimate or predict, for each individual, the numerical value of some variable of interest, e.g. “How much will a given customer use a particular service over the next year?”.
  • Profiling algorithms characterize the normal or typical behavior of an individual or group within a larger population. Profiling is frequently used in anomaly detection systems such as those that detect AML (Anti Money Laundering) violations and Credit Card fraud.
  • Causal Modeling algorithms attempt to find out what business events influence others.

There is no reason to be limited to just one of the above techniques while forming a solution. One example is to use one algorithm (say clustering) to determine the natural groups in the data, and then to apply regression to predict a specific outcome within each group, as sketched below. Another example is to use multiple algorithms within a single business project to perform related but separate tasks, e.g. using regression to create financial reporting forecasts, and then using a neural network algorithm to perform a deep analysis of the factors that influence product adoption.
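
The sketch below illustrates the first combination on purely synthetic data (the feature names and the spend target are assumptions): k-means first discovers the natural groups, and a regression is then fit within each group to predict a numeric outcome.

```python
# A minimal sketch of combining techniques: cluster first, then regress per group.
# The data and the 'spend' target are purely illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))                     # e.g. income, account tenure
spend = 50 + 10 * X[:, 0] + rng.normal(size=300)  # numeric outcome to predict

# Step 1: unsupervised - discover the natural groups in the data
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Step 2: supervised - fit a separate regression within each discovered group
for c in np.unique(clusters):
    mask = clusters == c
    reg = LinearRegression().fit(X[mask], spend[mask])
    print(f"cluster {c}: n={mask.sum()}, R^2={reg.score(X[mask], spend[mask]):.2f}")
```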

The Data Science Process  –

A general process framework for a typical Data Science project is depicted below. While the flow suggests a sequential waterfall, it allows for Agile/DevOps loops in the core analysis & feedback phases. The process is also not a one way pipeline; it allows for continuous improvement.

Illustration: The Data Science Process

  1. The central pool of data that hosts all the tiers of data processing in the above illustration is called the Data Lake. The Data Lake enables two key advantages – the ability to collect cross business unit data so that it can be sampled/explored at will & the ability to perform any kind of data access pattern across a shared data infrastructure: batch,  interactive, search, in-memory and custom etc.
  2. The Data Science process begins with a clear and contextual understanding of the granular business questions that need to be answered from the real world dataset. The Data Scientist needs to be trained in the nuances of the business to achieve the appropriate outcome, e.g. predicting fraudulent credit card transactions, or identifying which customers in the Retail Bank are likely to churn over the next few months based on their usage patterns.
  3. Once this is known, relevant data needs to be collected from the real world. These sources in Banking range from –
    1. Customer Account data e.g. Names, Demographics, Linked Accounts etc
    2. Transaction Data which captures the low level details of every transaction (e.g debit, credit, transfer, credit card usage etc),
    3. Wire & Payment Data,
    4. Trade & Position Data,
    5. General Ledger Data and Data from other systems supporting core banking functions.
    6. Unstructured data, e.g. social media feeds, server logs, clickstream data & mobile application data etc.
  4. Following the planning stage, Data Acquisition is an iterative process of acquiring data from the actual sources by building appropriate loaders using suitable technology components, e.g. Apache NiFi, Kafka, Sqoop, Flume, the HDFS API, Java etc.
  5. The next step is to perform Data Cleansing. Here the goal is to look for gaps in the data  (given the business context), ensuring that the dataset is valid with no missing values, consistent in layout and as fresh as possible from a temporal standpoint. This phase also involves fixing any obvious quality problems involving range or table driven data. The purpose at this stage is also to facilitate & perform appropriate data governance.
  6. Exploratory Data Analysis (EDA) supports trial & error analysis of the data. This is the phase where plots and graphs are used to systematically go through the data. The importance of this step cannot be overstated as it provides the Data Scientist and the business with a feel for the data.
  7. Data Analysis: the generation of features or attributes that will be part of the model, followed by the model building itself. This is the step of the process where actual data mining takes place, leveraging models built using the algorithms described above.

Further iterative steps exist within the Data Cleansing and Data Analysis stages.
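
For readers who prefer code to prose, below is a deliberately simplified, single-node sketch of the acquisition, cleansing, EDA and analysis stages. The file name bank_customers.csv and its columns (balance, tenure, txn_count, churned) are assumptions made purely for illustration; in a real deployment, acquisition would flow through tools like NiFi or Kafka into the Data Lake.

```python
# A minimal, hedged sketch of the pipeline stages described above.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# 1. Data Acquisition - here simply reading a hypothetical extract
df = pd.read_csv("bank_customers.csv")            # assumed columns below

# 2. Data Cleansing - handle gaps and obviously invalid values
df = df.dropna(subset=["balance", "tenure"])
df = df[df["balance"] >= 0]

# 3. Exploratory Data Analysis - get a feel for the data before modelling
print(df.describe())
print(df.groupby("churned")["balance"].mean())

# 4. Data Analysis - feature selection and model building
features = df[["balance", "tenure", "txn_count"]]
model = RandomForestClassifier(n_estimators=100, random_state=1)
model.fit(features, df["churned"])
print("training accuracy:", model.score(features, df["churned"]))
```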

Once the models have been tested and refined to the satisfaction of the business, and their performance has been put through a rigorous test phase, they are deployed into production. Once deployed, they are constantly refined based on end user and system feedback.

The Big Data ecosystem (consisting of tools such as Pig, Scalding, Hive, Spark and MapReduce etc) enables sea change improvements across the entire Data Science lifecycle, from data acquisition to data processing to data analysis. The ability of Big Data/Hadoop to unify all data storage in one place renders data more accessible for modeling. Hadoop also scales up machine learning analysis due to its inbuilt parallelism, which adds a tremendous amount of value in terms of training multiple models in parallel to improve their efficacy. The ability to work with all of the data, as opposed to small samples, also helps greatly.
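
As a small, single-node analogue of this parallelism, the sketch below evaluates a grid of candidate models in parallel across CPU cores using scikit-learn. At cluster scale the same idea is typically expressed with Spark MLlib, but the principle of training many models concurrently is identical; all data here is synthetic.

```python
# A minimal single-node analogue of parallel model training: many candidate models
# are evaluated concurrently across available cores.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 4))
y = (X.sum(axis=1) > 0).astype(int)

param_grid = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]}

# n_jobs=-1 trains and cross-validates the candidate models in parallel
search = GridSearchCV(RandomForestClassifier(random_state=3),
                      param_grid, cv=3, n_jobs=-1)
search.fit(X, y)
print("best params:", search.best_params_, "score:", round(search.best_score_, 3))
```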

Recommendations – 

Developing a strategic mindset to Data science and predictive analytics should be a board level concern. This entails

  • To begin with – ensuring buy in & commitment in the form of funding at a Senior Management level. This support needs to extend across the entire lifecycle depicted above (from identifying business use cases).
  • Extensive but realistic ROI (Return On Investment) models built during due diligence with periodic updates for executive stakeholders
  • On a similar note, ensuring buy in using a strategy of co-opting & alignment with Quants and other high potential areas of the business (as covered in the use cases in the last blog)
  • Identifying leaders within the organization who can not only lead important projects but also create compelling content to evangelize the use of predictive analytics
  • Begin to tactically bake in or embed data science capabilities across different lines of business and horizontal IT
  • Slowly moving adoption to the Risk, Fraud, Cybersecurity and Compliance teams as part of the second wave. This is critical in ensuring that analysts across these areas move from a spreadsheet intensive model to adopting advanced statistical techniques
  • Creating a Predictive Analytics COE (Center of Excellence) that enables cross pollination of ideas across the fields of statistical modeling, data mining, text analytics, and Big Data technology
  • Informing the regulatory authorities of one’s intentions to leverage data science across the spectrum of operations
  • Ensuring that issues related to data privacy,audit & compliance have been given a great deal of forethought
  • Identifying  & developing human skills in toolsets (across open source and closed source) that facilitate adapting to data lake based architectures. A large part of this is to organically grow the talent pool by instituting a college recruitment process

While this ends the current series on Data Science in financial services, it is my intention to explore each of the above Data Mining techniques in greater depth, as applied to specific business situations, in 2016 & beyond. That said, we will take a look at another pressing business & strategic concern – Cybersecurity in Banking – in the next series.