Industrial Data Platform Capability Map (v1)
Which capabilities do I need when building a state-of-the-art Industrial Data System? We have identified the seven most important capabilities, from Connectivity to Data Sharing and everything in between.
This article will help you identify the capabilities needed to build a modern industrial data system. As with any capability map, it will most likely not be complete (feel free to leave your thoughts in the comment section!). You can use this list of capabilities to start your request for information (RFI) or request for proposal (RFP) process.
This article is a first release: it’s not perfect and it’s not complete, but it’s the best way to get the conversation started. Any thoughts? Make sure to comment here or contact us, as we will release a second version in a couple of months.
Some initial things to note:
We deliberately do not put weights on the different categories; you need to figure out which ones matter most to you.
For now, we have decided not to map these capabilities onto vendors, as this would require in-depth knowledge from our side, or we would need to trust the vendors to map themselves (and they will obviously be able “to do it all” 😀). We will, however, list the names we know at the end of the article for you to review.
It’s not an article about the Unified Namespace, because that is just a part of a bigger discussion.
Now, we’ve been thinking about how to name this article for a while… We wanted to avoid terms such as ‘Historian’ (because it’s outdated), ‘IoT Platform’ (because there is more to the OT world than IoT), or ‘DataOps something’ (because that term is too “IT/Data” focused and less known by OT folks).
So to keep in line with our previous parts on Data Platforms, we’ll call this the “Industrial Data Platform Capability Map (v1)”. 🎉
Be sure to review the previous parts for essential background information if you are new to our blog: Part 1 (The IT and OT view on Data), Part 2 (Introducing the Operational Data Platform), Part 3 (The need for Better Data), Part 4 (Breaking the OT Data Barrier: It's the Platform) and Part 5 (The Unified Namespace).
1 - Connectivity
We need a secure and scalable connectivity layer to integrate different data sources into the Industrial Data Platform. Especially in our OT world, it is important to take into account the need to connect to older assets using legacy protocols as well as newer assets equipped with the latest shiny bells and whistles. In some cases you might need to favor high throughput, redundancy, the ability to work offline and buffer data, or all of the above.
Identify your Data sources, for example:
Local Time Series Data Sources: data from Historians, SCADA systems, and PLCs;
Cloud Data: data from IIoT devices and other cloud-based sources;
MOM Data: MES, Quality Systems, Planning Systems, and many more;
Engineering & GIS Data: includes digital twins and GIS systems.
Identify your required Protocols, for example:
OPC (DA, HDA, UA);
MQTT (with or without Sparkplug B);
Machine/sensor protocols (Profibus, Modbus, IO-Link, etc.);
Database connections;
REST APIs (for the new and shiny stuff);
Fetching and parsing text files (such as CSV, JSON, and the like) from a given endpoint.
Identify your additional requirements, for example:
Real-time streaming and/or event-driven operation
Buffering / backfilling capabilities (see the store-and-forward sketch below)
Redundancy
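To make the buffering and backfilling requirement concrete, here is a minimal store-and-forward sketch in Python. It is illustrative only: read_sensor() and publish_upstream() are hypothetical placeholders for whatever connector (OPC UA, Modbus, MQTT, …) your platform actually uses.

```python
import time
from collections import deque

buffer: deque = deque()

def read_sensor() -> dict:
    """Hypothetical placeholder for a real connector read (OPC UA, Modbus, ...)."""
    return {"tag": "reactor2/temperature", "value": 72.4, "ts": time.time()}

def publish_upstream(sample: dict) -> bool:
    """Hypothetical placeholder for a real publish (e.g., MQTT); False when offline."""
    return True

while True:
    buffer.append(read_sensor())
    # Drain oldest-first so history is backfilled in order after an outage;
    # while the connection is down, samples simply accumulate in the buffer.
    while buffer and publish_upstream(buffer[0]):
        buffer.popleft()
    time.sleep(1)
```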
2 - Contextualization & Data Management
At this point, our (time series) data is still raw and unstructured. We want to make life for our users as easy as possible, which means giving them the right context without the hassle of finding data from different sources themselves. Here are some context examples:
Asset Context gives us insight into the physical assets in our plant; it is a rudimentary Digital Twin. This data can often be found in Engineering or Master Data systems.
Production Context allows us to link the data to the actual manufacturing process that took place. This data can often be found in a Manufacturing Execution System (MES). It helps us to identify the product that was made, the order/batch it belongs to, the materials used, the operator/shift/team who worked on it, and much more.
Maintenance Context can give direct insights into the OEE of our equipment. Understanding the relationship between planned or unplanned maintenance and certain process conditions can again be the starting point of a data exploration project.
Identify your data management requirements
Asset Context and Data Management are in general rather static: they don’t change much in a typical manufacturing plant. However, in the IIoT world we do need ways to automatically add new devices to the asset model once they come online. In many cases, these assets can announce themselves to your platform.
Modeling, management, viewing, and versioning capabilities to build your ontology, based on standards, vendor-provided templates, your own templates, or a build-your-own schema;
Ways to manually input or automatically ingest and process the metadata that feeds the model;
Do you need a very simple, straightforward model, or do you need a complex, object-oriented style of modeling?
Identify ways to query the model, e.g., using GraphQL (see the sketch below).
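As an example of what querying an asset model could look like, here is a small GraphQL request issued from Python. The endpoint and the schema (asset, children, tags) are assumptions for illustration; every platform exposes its own model.

```python
import requests

# Hypothetical GraphQL endpoint and asset schema, purely for illustration.
query = """
{
  asset(path: "plant1/line3/reactor2") {
    name
    children { name }
    tags { name unit }
  }
}
"""

resp = requests.post(
    "https://dataplatform.example.com/graphql",
    json={"query": query},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["data"]["asset"])
```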
Identify Production / Maintenance and other related contexts
This is typically another way to ingest data; in this case, the information needs to be found in other data sources that are highly dynamic. For example, batch records will update continuously. We often refer to “slicing and dicing” functionality, meaning the ability to slice a set of measurements into parts, e.g., per batch or per shift (see the sketch after this list).
Do you want to only link to these data sources? Or do you want to cache and/or store the relevant events in the data platform? This answer will typically depend on the performance of these data sources and the complexity of understanding where your ‘single source of truth’ can be found.
Do you need certain ETL (extract-transform-load) functionality to map the data from the source system into a format you can use to contextualize your data?
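A minimal sketch of what “slicing and dicing” means in practice, assuming hypothetical MES batch records and a historian time series loaded into pandas:

```python
import pandas as pd

# Hypothetical batch records from an MES (start/end per batch)...
batches = pd.DataFrame({
    "batch": ["B-1041", "B-1042"],
    "start": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 06:00"]),
    "end":   pd.to_datetime(["2024-01-01 06:00", "2024-01-01 12:00"]),
})

# ...and a raw temperature series from the historian (one value per minute).
idx = pd.date_range("2024-01-01", periods=12 * 60, freq="min")
temperature = pd.Series(range(len(idx)), index=idx, name="temperature", dtype=float)

# "Slice" the measurements per batch and compute a per-batch summary.
for _, b in batches.iterrows():
    piece = temperature[b["start"]:b["end"]]
    print(b["batch"], "mean:", piece.mean(), "max:", piece.max())
```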
PS 1: More about Context in this previous IT/OT Insider article:
PS 2: This article on Rhize’s website describes the Ontology concept very well!
PS 3: We plan to dedicate a new article to this capability in the near future, you don’t want to miss this one, so make sure to subscribe 🙂!
3 - Data Quality
The new kid on the block is Data Quality, or more specifically Sensor Data Quality. Data issues, somewhere in the pipeline from sensor to final report, are difficult to detect and correct. You might just sum up data containing an outlier, resulting in totally wrong conclusions. Or a sensor might have been giving a flatline reading for days, weeks, or even months without your knowledge. Depending on your use case, some basic data quality checks might be sufficient; in other cases you might need more advanced functionality (a minimal sketch of such checks follows after the list below).
Identify your data quality requirements:
Which data types & sources are the input?
What are your data monitoring requirements (out of bounds, NaN, spikes, data gaps, drift, undersampling, oversampling, calibration issues, missing metadata, incorrect metadata, etc.)?
Do you have custom/specific data monitoring requirements?
How do you want to expose these observations to your users? Do you want to store quality information in your data platform as a new context layer? Do you want to be able to integrate data quality metrics into your BI reports?
Do you need a solution to clean and augment your data? Who will be responsible for cleaning data? Do you need a user interface? An API?
Do you want to automatically send cleaned data towards a “silver” data store?
Do you need to calculate data quality KPIs?
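As promised above, a minimal sketch of basic sensor data quality checks, assuming the readings of one tag are available as a pandas Series with a datetime index; the thresholds and rules are illustrative, not a standard:

```python
import numpy as np
import pandas as pd

def basic_quality_checks(series: pd.Series, low: float, high: float) -> dict:
    """A few of the monitoring checks listed above, applied to one sensor tag."""
    diffs = series.index.to_series().diff().dropna()
    return {
        "nan_count": int(series.isna().sum()),
        "out_of_bounds": int(((series < low) | (series > high)).sum()),
        # Spikes: points more than 4 standard deviations from the mean.
        "spikes": int((np.abs(series - series.mean()) > 4 * series.std()).sum()),
        # Flatline: longest run of consecutive identical values.
        "longest_flatline": int(series.groupby(series.diff().ne(0).cumsum()).size().max()),
        # Data gaps: sampling intervals much larger than the median interval.
        "gaps": int((diffs > 5 * diffs.median()).sum()),
    }

# Example: flag issues on a temperature tag expected to stay between 0 and 150.
# checks = basic_quality_checks(temperature, low=0.0, high=150.0)
```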
PS: More on Data Quality in this previous article:
4 - Data Broker & Store
We need a way to receive and store the resulting data sets (both the raw data and the prepared, cleaned data sets) in a system which can handle sensor data in context at scale. But storing is just one side of the equation; getting data back out is equally important, which means that we need ways to subscribe to data and to query it at scale.
Identify Data Store Capabilities:
Time-Series Data Handling: The system must be designed for time-series data, supporting continuous streams from industrial sensors with low-latency access and storage capacity for vast datasets.
Event and Alarm Storage: Store events (typically in a relational database format) and potentially also alarms.
Publish-Subscribe (act as a Broker): A central capability of a broker is to handle data in real-time via a publish-subscribe model, enabling data producers (e.g., sensors) to transmit updates as they occur, while consumers (e.g., analytics systems) subscribe to relevant data streams (most popular: an MQTT broker; see the sketch after this list),
or you could still go for a more traditional polling model in which the platform polls data from defined sources (e.g., reads data from an OPC server).
Data Lifecycle Management: Supports “hot-warm-cold” data management for efficient storage (especially useful when using Cloud storage):
Hot: Immediate, short-term storage for real-time access.
Warm: Semi-archival storage, optimized for recent but less frequently accessed data.
Cold: Long-term storage of archival data, often moved to cheaper, high-capacity storage options like object storage.
Multi-Layered Storage (Bronze/Silver/Gold, following the Delta Lake medallion principles):
Bronze: Raw data, stored as-is from the source.
(optional) Silver: Cleaned and validated data, processed for initial insights.
(optional) Gold: Finalized, prepared data ready for analytics and reporting, e.g., to be used in a Power BI dashboard.
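To illustrate the publish-subscribe capability, here is a minimal MQTT consumer sketch using the paho-mqtt client (1.x callback style); the broker address and topic layout are assumptions for illustration:

```python
import paho.mqtt.client as mqtt

# Called once the connection to the broker is established.
def on_connect(client, userdata, flags, rc):
    # Subscribe to every temperature stream on line 3 (hypothetical topic layout).
    client.subscribe("plant1/line3/+/temperature")

# Called for every message published on a topic we subscribed to.
def on_message(client, userdata, msg):
    print(msg.topic, msg.payload.decode())

client = mqtt.Client()  # paho-mqtt 1.x style client
client.on_connect = on_connect
client.on_message = on_message
client.connect("broker.example.com", 1883)
client.loop_forever()  # block and process incoming messages
```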
Identify Querying and Data Retrieval Capabilities:
Subscription and Query Capabilities: Allows users and systems to subscribe to specific data streams or datasets, ensuring relevant data retrieval at scale.
API Accessibility: Data should be accessible through APIs, both simple (e.g., REST) for general use cases and advanced (e.g., GraphQL) for complex queries with specific data constraints.
Contextual Data Retrieval: Enables queries to access data within the operational context (e.g., time range, location, or specific production batch) to support more effective decision-making (see the sketch below).
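What such a contextual query could look like over a simple REST API; the endpoint, parameters, and response shape are all hypothetical here, as every platform defines its own API:

```python
import requests

# Hypothetical endpoint: fetch one tag, bounded by time range and batch context.
resp = requests.get(
    "https://dataplatform.example.com/api/v1/timeseries",
    params={
        "tag": "plant1/line3/reactor2/temperature",
        "start": "2024-01-01T00:00:00Z",
        "end": "2024-01-02T00:00:00Z",
        "batch": "B-1042",  # only the slice belonging to this production batch
    },
    timeout=30,
)
resp.raise_for_status()
for point in resp.json()["values"]:
    print(point["timestamp"], point["value"])
```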
5 - Analytics
In many cases you want the possibility to run analytics directly on the data platform or even at the edge. Examples include virtual tags (values which are calculated in the platform, but not measured) or running algorithms at the edge to pre-process data before it is sent to the platform (e.g., creating per-second statistics on high-frequency data from a vibration sensor). This capability is not to be confused with Data Sharing, in which we make the data available to data users and applications outside the platform.
Identify the need for Analytics on top of the Data Platform:
(Real-time and Batch) Analytics, Advanced Calculations
Edge computing capabilities
Optional: Identify the need for data preprocessing at the Edge:
The need to have certain analytics capabilities at the edge to preprocess events/data streams before they are consumed further up the stack. That might mean, for example, running machine learning models at the edge to preprocess video feeds into features, or sampling high-frequency data into statistical features (see the sketch below).
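A minimal sketch of that last example, reducing a (here simulated) 1 kHz vibration signal to a few per-second statistical features before sending it upstream, using pandas:

```python
import numpy as np
import pandas as pd

# Simulated 1 kHz vibration signal over 10 seconds (stand-in for a real sensor feed).
idx = pd.date_range("2024-01-01", periods=10_000, freq="ms")
signal = pd.Series(np.random.normal(0.0, 1.0, len(idx)), index=idx)

# Reduce 1,000 samples/s to a handful of per-second features before transmission.
features = signal.resample("1s").agg(["mean", "std", "min", "max"])
features["rms"] = signal.pow(2).resample("1s").mean().pow(0.5)
print(features)  # 10 rows instead of 10,000 raw samples
```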
6 - Visualization
This section will be detailed further in a future release (subscribe to get informed!).
When building a Data Platform, you want to create value for all users in your organization. Most people will only be exposed to this capability. This is where your data - of good quality and in context - should find its way to your users. Giving everyone in your organization - from operators to management - easy access will make or break your project.
Visualization, Dashboarding
Sharing and collaboration possibilities of these visuals and dashboards
7 - Data Sharing
This section will be detailed further in a future release (subscribe to get informed!).
The final capability is making sure the platform is open to the outside world. That might be individual users wanting to link their systems to the platform, or it might be other applications.
Including APIs, SDKs, etc.
Data querying capabilities (including full/raw, cleaned, prepared, contextualized data, etc.)
⭐ Important remark regarding open systems
We do not state that all components can or should be based on one single vendor or one single product. Quite the contrary: we believe that for the different capabilities, different solutions could and should be used. Some vendors are really strong in one capability, while others might shine in other domains.
You want to look for interoperability, open protocols, good documentation, proven track records, and standardized implementations as much as possible (which results in interchangeable components instead of vendor lock-in).
Additional Capabilities for Management & Orchestration
This section will be detailed further in a future release (subscribe to get informed!).
This can be different for each capability, but if you are mixing and matching software products, you need to make sure you have some kind of overarching management model as well!
Deployment model
Edge/On-Prem first
Edge/On-Prem first + Cloud capable = Hybrid
Cloud first + Edge/On-Prem Capable = Hybrid
Cloud first
Supporting capabilities
Cyber Security
User Management
Life Cycle Management (Monitor, Deploy, Update)
Make sure to compare apples to apples
Short side step: when comparing prices, make sure to compare apples to apples. Here are some questions to ask:
What are the ongoing license costs? On what factors do they depend (number of users, number of data points, number of servers, amount of storage, amount of data consumed, …)?
Open source is never free, not because of the license cost, but because every product needs know-how and support. Are there support models you can buy? Are there partners who are knowledgeable about the product and can provide you with a support model?
How many people do you need to run this thing? (internal and/or external)
What is the infrastructure cost? What about Cyber Security requirements?
Especially when you run workloads in the cloud: what are the expected costs for storage, compute, and in some cases even data ingress and egress?
And probably many more ;)
Where is my UNS (Unified Namespace) in this capability map?
“Where’s the UNS?”, you might ask… Remember from our previous article on the UNS that the Unified Namespace is a concept to bring data together - organized and contextualized - in a central data broker, representing the business as it is right now. Also remember that MQTT is just one of the potential protocols you can use to achieve that.
If we map the capabilities from the previous paragraphs to our drawing, we can immediately see that capabilities 1 (Connectivity), 2 (Contextualization), and 4 (Data Broker) are the ones linked directly to the core concept of a UNS. We deliberately changed the line to the storage part into a dotted line, as storage is not required by the core concept.
We thus like to state that the Unified Namespace - in the context of a data platform - can only be valuable when it is part of a larger concept; it doesn’t replace that concept.
PS: This video from Flow Software is a great additional resource you might want to review.
List of potential vendors
As stated in the introduction: at this point in time, we do not want to map our capabilities against the functionalities offered by these vendors; it is just too time-consuming. We welcome all input. If any of the vendors listed want to reach out to us and share their own map, please contact David directly.
Updates after publication:
That’s it for now!
Make sure to subscribe to receive future releases and don’t forget to review our other content :)
I invite you to join this conversation on LinkedIn as well: https://www.linkedin.com/feed/update/urn:li:activity:7265001658437222402/