This is second in a series of blogs on Data Science that I am jointly authoring with Maleeha Qazi, (https://www.linkedin.com/in/maleehaqazi/). We have previously covered some of the inefficiencies that result from a siloed data science process @ http://www.vamsitalkstech.com/?p=5046. All of the actors in the data science space can agree that becoming responsive to business demands is the overarching goal of the process. In this second blog post, we will discuss Model as a Service (MaaS), an approach to ensuring that models and their insights can be leveraged throughout a large organization.
Hardware as a Service (HaaS), Software as a Service (SaaS), Database as a Service (DBaaS), Infrastructure as a Service (IaaS), Platform as a service (PaaS), Network as a Service (NaaS), Backend as a service (BaaS), Storage as a Service (STaaS). While every IT delivery model is going the way of the cloud, does Data Science lag behind in this movement? In such an environment, what do Data Scientists dream of to ensure that their models are constantly being trained on high quality and high volume production grade data?… Models as a Service (MaaS).
The Predictive Analytics workflow…
The Predictive Analytics workflow always starts with a business problem in mind. For example: “A marketing project to detect which customers are likely to buy new products or services in the next six months based on their historical & real time product usage patterns” or “Detect real-time fraud in credit card transactions.”
In use cases like these, the goal of the data science process is to be able to segment & filter customers by corralling them into categories that enable easy ranking. Once this is done, the business can setup easy and intuitive visualizations to present the results.
A lot of times, business groups have a hard time explaining what they would like to see – both in terms of input data and output format. In such cases, a prototype makes things easier from a requirement gathering standpoint. Once the problem is defined, the data scientist/modeler identifies the raw data sources (both internal and external) which are pertinent to the business challenge. They spend a lot of time in the process of collating the data (from a variety of sources like Oracle/SQL Server, DB2, Mainframes, Greenplum, Excel sheets, external datasets, etc.). The cleanup process involves dealing with missing values, corrupted data elements, formatting fields to be homogenous in terms of format, etc.
This data wrangling phase involves writing code to join various data elements so that a complete dataset is gathered in the Data Lake from a raw features standpoint, at the correct granularity for the problem at hand. If more data is obtained as the development cycle is underway, the Data Science team has to go back & redo the process to incorporate the new data feeds. The modeling phase is where sophisticated algorithms come into play. Feature engineering takes in business concepts & raw data features and creates predictive features from them. The Data Scientist takes the raw & engineered features and creates a model by applying various algorithms & testing to find the best one. Once the model has been refined, & tested for accuracy and performance, it is ideally deployed as a service.
Challenges with the existing approach
The challenges with the above approach are:
- Business Scalability – Predictive analytics as highlighted above resembles a typical line of business project or initiative. The benefits of the learning from localized application initiatives are largely lost to the larger organization if you don’t allow multiple applications and business initiatives to access the models built.
- Lack of Data Richness – The models created by individual teams are not always enriched by cross organizational data constantly being generated by different business applications. In addition to that, the vast majority of industrial applications do not leverage all possible kinds of unstructured data & 3rd party data in their business applications. Enabling the models to be exposed to a range of data (both internal and external) can only enrich the insights generated.
- Cross Application Applicability – This challenge deals with how business intelligence insights from disparate applications (which leverage different models), to enhance business areas they weren’t originally created for. This could allow for customer centered insights in real-time. For example, consider a customer sales application and a call center application. Can cross application insights be used to understand that customers are calling into the call center because it has been hard to use the website to order products?
- Data Monetization – What is critical in the ability to create new commercial business models is agile analytics around existing and new data sources. If it follows that enterprise businesses are being increasingly built around data assets, then it must naturally follow that data as a commodity can be traded or re-imagined to create revenue streams off of it. As an example, pioneering payment providers now offer retailers analytical services to help them understand which products perform best and how to improve the micro-targeting of customers. Thus, data is the critical prong of any digital initiative. This has led to efforts to monetize on data by creating platforms that support ecosystems of capabilities. To vastly oversimplify this discussion, the ability to monetize data needs two prongs – to centralize it in the first place and then to perform strong predictive modeling at large scale where systems need to constantly learn and optimize their interactions, responsiveness & services based on client needs & preferences. Thus, centralizing models offer more benefits than the typical enterprise can imagine.
Enter Model As A Service…
The MaaS takes in business variables (often hundreds or thousands of inputs) and provides as output model results upon which business decisions can be made, & visualizations that augment decision support systems. As depicted in the above illustration, once different predictive models are built, tested and validated, they are ready to be used in real world production deployments. MaaS is essentially a way of deploying these advanced models as a part of software applications where they are offered as a software subscription.
MaaS also enables cleaner separation of the Application development process and the Data Science workflow.
Business Benefits from a MaaS approach
- Exposing models to different lines of business thus increasing their usefulness and opening them up to feedback to help increase their accuracy.
- MaaS opens the models to any application that wants to take advantage of them. This forces Data scientists to work with business teams that are much broader than they otherwise would have access to work with normally.
- The provision of dashboards and business intelligence across the organization becomes much easier than with a siloed approach.
- MaaS as an approach fundamentally encourages an agile approach to managing data assets and also to rationalizing them. For any MaaS initiative to succeed, timely access needs to be provided to potentially hundreds of data sources in an organization. MaaS encourages a move to viewing data as a reusable asset across the organization.
Technical advantages of the MaaS approach
- Separation of concerns : software & data feeds maintained by IT, models maintained by Data Scientists.
- Versioning of models can be separated from versioning of system(s) using models.
- Same models can be utilized by multiple software packages for consistency.
- Consistent handling of data sources: e.g. which “master” source provides what types of data for all the models so that a customer looks the same regardless of the model acting on the data for insights.
- Single point for putting a “watch” on the performance of a model.
- Controlled usage of model.
- MaaS ensures that the analytic process can be automated from a deployment standpoint.
MaaS can enable organizations to move their analytic practices and capabilities to the next level. It enables the best of both worlds – the ability to centralize the data science capabilities across an organization while keeping customer data securely inside the organization. Done right, it can enable the democratization of data science insights across a large enterprise.