A few weeks ago on the invitation of DZone Magazine, I jointly authored a Big Data Reference Architecture along with my friend & collaborator, Tim Spann (https://www.linkedin.com/in/timothyspann/). Tim & I distilled our experience working on IIoT projects to propose an industrial strength digital architecture. It brings together several technology themes – Big Data , Cyber Security, Cognitive Applications, Business Process Management and Data Science. Our goal is to discuss a best in class architecture that enables flexible deployment for new IIoT capabilities allowing enterprises to build digital applications. The abridged article was featured in the new DZone Guide to Big Data: Data Science & Advanced Analytics which can be downloaded at https://dzone.com/guides/big-data-data-science-and-advanced-analytics
How the Internet Of Things (IoT) leads to the Digital Mesh..
The Internet of Things (IoT) has become one of the four top hyped up technology paradigms affecting the world of business. The other usual suspects being Big Data, AI/Machine Learning & Blockchain. Cisco predicts that the IOT is expected to impact about 25 billion connected things by 2020 and affect about $2 trillion of economic value globally across a diverse range of verticals. These devices are not just consumer oriented devices such as smartphones and home monitoring systems but dedicated industry objects such as sensors, actuators, engines etc.
The interesting angle to all this is the fact that autonomous devices are already beginning to communicate with one another using IP based protocols. They largely exchanging state & control information around various variables. With the growth of computational power on these devices, we are not far off from their sending over more granular and interesting streaming data – about their environment, performance and business operations – all of which will enable a higher degree of insightful analytics to be performed on the data. Gartner Research has termed this interconnected world where decision making & manufacturing optimization can occur via IoT as the “Digital Mesh“.
The evolution of technological innovation in areas such as Big Data, Predictive Analytics and Cloud Computing now enables the integration and analysis of massive amounts of device data at scale while performing a range of analytics and business process workflows on the data.
According to Gartner, the Digital Mesh will thus lead to an interconnected data information deluge powered by the continuous data from these streams. These streams will encompasses classical IoT endpoints (sensors, field devices, actuators etc) sending data in a variety of formats – text, audio, video & social data streams – along with new endpoints in areas as diverse as Industrial Automation, Remote Healthcare, Public Transportation, Connected Cars, Home Automation etc. These intelligent devices will increasingly begin communicating with their environments in a manner that will encourage collaboration in a range of business scenarios. The industrial cousin of IoT is the Industrial Internet of Things (IIIoT).
Defining the Industrial Internet Of Things (IIoT)
The Industrial Internet of Things (IIoT) can be defined as a ecosystem of capabilities that interconnects machines, personnel and processes to optimize the industrial lifecycle. The foundational technologies that IIoT leverages are Smart Assets, Big Data, Realtime Analytics, Enterprise Automation and Cloud based services.
The primary industries impacted the most by the IIoT will include Industrial Manufacturing, the Utility industry, Energy, Automotive, Transportation, Telecom & Insurance.
According to Markets and Markets, the annual worldwide Industrial IoT market is projected to exceed $319 billion in 2020, which represents an 8% a compound annual growth rate (CAGR). The top four segments are projected to be manufacturing, energy and utilities, auto & transportation and healthcare.
Architectural Challenges for Industrial IoT versus Consumer IoT..
Consumer based IoT applications generally receive the lion’s share of media attention. However the ability of industrial devices (such as sensors) to send ever more richer data about their operating environment and performance characteristics is driving a move to Digitization and Automation across a range of industrial manufacturing.
Thus, there are four distinct challenges that we need to account for in an Industrial IOT scenario as compared to Consumer IoT.
- The IIoT needs Robust Architectures that are able to handle millions of device telemetry messages per second. The architecture needs to take into account that all kinds of devices operating in environments ranging from the constrained to
- IIoT also calls for the highest degrees of Infrastructure and Application reliability across the stack. For instance, a lost message or dropped messages in a healthcare or a connected car scenario may mean life or death for a patient, or, an accident.
- An ability to integrate seamlessly with existing Information Systems. Lets be clear, these new age IIOT architectures need to augment existing systems such as Manufacturing Execution Systems (MES) or Traffic Management Systems. In Manufacturing, MES systems continually improve the product lifecycle and perform better resource scheduling and utilization. This integration helps these systems leverage the digital intelligence and insights across (potentially) millions of devices across complex areas of operation.
- An ability to incorporate richer kinds of analytics than has been possible before that provide a great degree of context. This ability to reason around context is what provides an ability to design new business models which cannot be currently imagined due to lack of agility in the data and analytics space.
What will IIoT based Digital Applications look like..
Digital Applications are being designed for specific device endpoints across industries. While the underlying mechanisms and business models differ from industry to industry, all of these use predictive analytics based on a combination of real time data processing & data science algorithms. These techniques extract insights from streaming data to provide digital services on existing toolchains, provide value added customer service, predict device performance & failures, improve operational metrics etc.
Examples abound. For instance, a great example in manufacturing is the notion of a Digital Twin which Gartner called out last year. A Digital twin is a software personification of an Intelligent device or system. It forms a bridge between the real world and the digital world. In the manufacturing industry, digital twins can be setup to function as proxies of Things like sensors and gauges, coordinate measuring machines, vision systems, and white light scanning. This data is sent over a cloud based system where it is combined with historical data to better maintain the physical system.
The wealth of data being gathered on the shop floor will ensure that Digital twins will be used to reduce costs and increase innovation. Thus, in global manufacturing – Data science will soon make it’s way into the shop floor to enable the collection of insights from these software proxies. We covered the phenomenon of Servitization in manufacturing in a previous blogpost.
In the Retail industry, an ability to detect a customer’s location in realtime and combining that information with their historical buying patterns can drive real time promotions and an ability to dynamically price retail goods.
Solution Requirements for an IIoT Architecture..
At a high level, the IIoT reference architecture should support six broad solution areas-
- Device Discovery – Discovering a range of devices (and their details) on the Digital Mesh for an organization within and outside the firewall perimeter
- Performing Remote Lifecycle Configuration of these devices ranging from startup to modification to monitoring to shut down
- Performing Deep Security level introspection to ensure the patch levels etc are adequate
- Creating Business workflows on the Digital Mesh. We will do this by marrying these devices to enterprise information systems (EISs)
- Performing Business oriented Predictive Analytics on these devices, this is critical to
- On a futuristic basis, support optional integration with the Blockchain to support a distributed organizational ledger that can coordinate activity across all global areas that an enterprise operates in.
Building Blocks of the Architecture
Listed below are the foundational blocks of our reference architecture. Though the requirements will vary across industries, an organization can reasonably standardize on a number of foundational components as depicted below and then incrementally augment them as the interactions between different components increase based on business requirements.
Our reference architecture includes the following major building blocks –
- Device Layer
- Device Integration Layer
- Data & Middleware Tier
- Digital Application Layer
It also includes the following cross cutting concerns which span across the above layers –
- Device and Data Security
- Business Process Management
- Service Management
- UX Design
- Data Governance – Provenance, Auditing, Logging
The next section provides a brief overview of the reference architecture’s components at a logical level.
Device Layer –
The first requirement of IIIoT implementations is to support connectivity from the Things themselves or the Device layer depicted at the bottom. The Device layer includes a whole range of sensors, actuators, smartphones, gateways and industrial equipment etc. The ability to connect with devices and edge devices like routers, smart gateways using a variety of protocols is key. These network protocols include Ethernet, WiFi, and Cellular which can all directly connect to the internet. Other protocols that need a gateway device to connect include Bluetooth, RFID, NFC, Zigbee et al. Devices can connect directly with the data ingest layer shown above but it is preferred that they connect via a gateway which can perform a range of edge processing.
This is important from a business standpoint for instance, in certain verticals like healthcare and financial services, there exist stringent regulations that govern when certain identifying data elements (e.g. video feeds) can leave the premises of a hospital or bank etc. A gateway cannot just perform intelligent edge processing but also can connect thousands of device endpoints and facilitate bidirectional communication with the core IIoT architecture.
The ideal tool for these constantly evolving devices, metadata, protocols, data formats and types is Apache NiFi. These agents will send the data to an Apache NiFi gateway or directly into an enterprise Apache NiFi cluster in the cloud or on-premise.
A subproject of Apache NiFi – MiNiFi provides a complementary data collection approach that supplements the core tenets of NiFi in dataflow management. However due to its small footprint and low resource consumption, is well suited to handle dataflow from sensors and other IOT devices. It provides central management of agents while providing full chain of custody information on the flows themselves.
For remote locations, more powerful devices like the Arrow BeagleBone Black Industrial and MyPi Industrial, it is very simple to run a tiny Java or C++ MiNiFi agent for secure connectivity needs.
The data sent by the device endpoints are then modeled into an appropriate domain representation based on the actual content of the messages. The data sent over also includes metadata around the message. A canonical model can optionally be developed (based on the actual business domain) which can support a variety of applications from a business intelligence standpoint.
Apache NiFi supports the flexibility of ingesting changing file formats, sizes, data types and schemas. The devices themselves can send a range of feeds in different formats. E.g. XML now and based on upgraded capabilities – richer JSON tomorrow. NiFi supports ingesting any file type that the devices or the gateways may send. Once the messages are received by Apache NiFi, they are enveloped in security with every touch to each flow file controlled, secured and audited. NiFi flows also provide full data provenance for each file, packet or chunk of data sent through the system. NiFi can work with specific schemas if there are special requirements for file types, but it can also work with unstructured or semi structured data just as well. From a scalability standpoint, NiFi can ingest 50,000 streams concurrently on a zero-master shared nothing cluster that horizontally scales via easy administration with Apache Ambari.
Data and Middleware Layer –
The IIIoT Architecture recommends a Big Data platform with native message oriented middleware (MOM) capabilities to ingest device mesh data. This layer will also process device data in such a fashion – batch or real-time – as the business needs demand.
Application protocols such as AMQP, MQTT, CoAP, WebSockets etc are all deployed by many device gateways to communicate application specific messages. The reason for recommending a Big Data/NoSQL dominated data architecture for IIOT is quite simple. These systems provide Schema on Read which is an innovative data handling technique. In this model, a format or schema is applied to data as it is accessed from a storage location as opposed to doing the same while it is ingested. From an IIOT standpoint, one must not just deal with the data itself but also metadata such as timestamps, device id, other firmware data such as software version, device manufactured data etc. The data sent from the device layer will consist of time series data and individual measurements.
The IIoT data stream can thus be visualized as a constantly running data pump which is handled by a Big Data pipeline takes the raw telemetry data from the gateways, decides which ones are of interest and discards the ones not deemed significant from a business standpoint. Apache NiFi is your gateway and gate keeper. It ingests the raw data, manages the flow of thousands of producers and consumers, does basic data enrichment, sentiment analysis in stream, aggregation, splitting, schema translation, format conversion and other initial steps to prepare the data. It does that all with a user-friendly web UI and easily extendible architecture. It will then send raw or processed data to Kafka for further processing by Apache Storm, Apache Spark or other consumers. Apache Storm is a distributed real-time computation engine that reliably processes unbounded streams of data. Storm excels at handling complex streams of data that require windowing and other complex event processing. While Storm processes stream data at scale, Apache Kafka distributes messages at scale. Kafka is a distributed pub-sub real-time messaging system that provides strong durability and fault tolerance guarantees. NiFi, Storm and Kafka naturally complement each other, and their powerful cooperation enables real-time streaming analytics for fast-moving big data. All the stream processing is handled by NiFi-Storm-Kafka combination.
Appropriate logic is built into the higher layers to support device identification, ID lookup, secure authentication and transformation of the data. This layer will process data (cleanse, transform, apply a canonical representation) to support Business Automation (BPM), BI (business intelligence) and visualization for a variety of consumers. The data ingest layer will also providing notification and alerts via Apache NiFi.
Here are some typical uses for this event processing pipeline:
a. Real-time data filtering and pattern matching
b. Enrichment based on business context
c. Real-time analytics such as KPIs, complex event processing etc
d. Predictive Analytics
e. Business workflow with decision nodes and human task nodes
Digital Application Tier –
Once IIoT knowledge has become part of the Hadoop based Data Lake, all the rich analytics, machine learning and deep learning frameworks, tools and libraries now become available to Data Scientists and Analysts. They can easily produce insights, dashboards, reports and real-time analytics with IIoT data joined with existing data in the lake including social media data, EDW data, log data. All your data can be queried with familiar SQL through a variety of interfaces such as Apache Phoenix on HBase, Apache Hive LLAP and Apache Spark SQL. Using your existing BI tools or the open sourced Apache Zeppelin, you can produce and share live reports. You can run TensorFlow in containers on YARN for deep learning insights on your images, videos and text data; while running YARN clustered Spark ML pipelines fed by Kafka and NiFi to run streaming machine learning algorithms on trained models.
A range of predictive applications are suitable for this tier. The models themselves should seek to answer business questions around things like -Asset failure, the key performance indicators in a manufacturing process and how they’re trending, insurance policy pricing etc.
Once the device data has been ingested into a modern data lake, key functions that need to be performed include data aggregation, transformation, enriching, filtering, sorting etc.
As one can see, this can get very complex very quick – both from a data storage and processing standpoint. A Cloud based infrastructure with its ability to provide highly scalable compute, network and storage resources is a natural fit to handle bursty IIoT applications. However, IIoT applications add their own diverse requirements of computing infrastructure, namely the ability to accommodate hundreds of kinds of devices and network gateways – which means that IT must be prepared to support a large diversity of operating systems and storage types
The tier is also responsible for the integration of the IIoT environment into the business processes of an enterprise. The IIoT solution ties into existing line-of-business applications and standard software solutions through adapters or Enterprise Application Integration (EAI) and business-to-business (B2B) gateway capabilities. End users in business-to-business or business-to-consumer scenarios will interact with the IIOT solution and the special- purpose IIoT devices through this layer. They may use the IIoT solution or line-of-business system UIs, including apps on personal mobile devices, such as smartphones and tablets.
The topic of Security is perhaps the most important cross cutting concern across all layers of the IIoT architecture stack. Needless to say, each of the layers must support the strongest data encryption, authentication and authentication capabilities for devices, users and partner applications. Accordingly, capabilities must be provided to ingest and store security feeds, IDS logs for advanced behavioral analytics, server logs, device telemetry. These feeds must be constantly analyzed across three domains – the Device domain, the Business domain and the IT domain. The below blogpost delves into some of these themes and is a good read to get a deeper handle on this issue from a SOC (security operations center) standpoint.
It is evident from the above that IIoT will enormous opportunity for businesses globally. It will also create layers of complexity and opportunity for Enterprise IT. The creation of smart digital services on the data served up will further depend on the vertical industries. Whatever be the kind of business model – whether tracking behavior, location sensitive pricing, business process automation etc – the end goal of IT architecture should be to create enterprise business applications that are ultimately data native and analytics driven.