The Operational Data Platform: What the future of OT Data will look like - Part 2 on Industrial Data Platforms
The Operational Data Platform is how we envision the future of working with OT data: a cloud native, contextualized data platform bridging the gap between IT and OT.
Welcome to Part 2 of this series on IT and OT Data. In Part 1 we introduced Data Lakes and Warehouses. We also took some time to review the OT view on time series data. To conclude, we talked about the importance of different forms of context. Discover all other parts in this series: Part 1 (The IT and OT view on Data), Part 2 (Introducing the Operational Data Platform), Part 3 (The need for Better Data), Part 4 (Breaking the OT Data Barrier: It's the Platform), Part 5 (The Unified Namespace)
In Part 2, we will be discussing the concept of an Operational Data Platform.
As you will remember, a Data Lake stores raw data from different sources whereas a Data Warehouse holds processed/structured data. In the OT world, it’s all about storing raw sensor data in a so-called Historian, which typically runs on a local server.
Let’s go back to Part 1’s cookie factory - Sweet Harmony Treats.
The biscuits were baked and the aroma was to die for… John, the process engineer, has been tasked with optimizing the baking process. Emma, the maintenance engineer, might be using other data points to detect faulty settings, or maybe even predict upcoming equipment failures.
How? Emma and John are most likely using the Historian to access their sensor data. They will probably use a spreadsheet to tweak and knead the data until the result is good enough. For example, Emma could extract from her logbooks the times when her technicians performed a maintenance action. She could now try to look up those times in the historian. John on the other hand might be using data from the plant’s MES system to find the start and end times of the last production batches.
Although both of them are trying to do the best possible job, manually linking data to events is not only time consuming, it’s error prone and impossible to scale.
They are missing out on something very important: context.
Suppose John is asked to answer which type of cookie has the highest energy consumption (and is thus the most expensive to make). Or maybe he wants to compare cookies baked during winter time with those baked on very hot days in the summer. Both examples require John to have different types of context:
- The Batch and/or Order number of the product currently being produced,
- The cookie type / product name which gets baked,
- The identifiers of the raw materials used in this batch,
- Quality data entered by a lab technician from tests performed on cookies from this batch,
- Maintenance reports submitted by Emma’s team, …
This type of data is typically available in a Manufacturing Operations Management system, which is typically backed by a relational database. The big challenge for John and Emma is to find a way to combine both data sources in order to answer questions like:
- What was the average temperature during batch 2023.45A.11?
- How much energy was consumed to produce brownies during the last month?
- Which shift team has the lowest amount of rework?
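To make the first question concrete, here is a minimal sketch of what "combining both data sources" means in practice. Everything in it is an assumption for illustration: the sample readings, the second batch number, the product names and the function name `average_during_batch` are hypothetical, and it uses a plain average of the samples rather than a time-weighted one.

```python
from datetime import datetime
from statistics import mean

# Hypothetical historian export: (timestamp, oven temperature in degrees Celsius).
readings = [
    (datetime(2023, 11, 6, 8, 0), 178.0),
    (datetime(2023, 11, 6, 8, 10), 182.0),
    (datetime(2023, 11, 6, 8, 20), 180.0),
    (datetime(2023, 11, 6, 9, 0), 165.0),
    (datetime(2023, 11, 6, 9, 10), 167.0),
]

# Hypothetical MES export: batch id -> (product, start time, end time).
batches = {
    "2023.45A.11": ("brownie", datetime(2023, 11, 6, 8, 0), datetime(2023, 11, 6, 8, 30)),
    "2023.45A.12": ("shortbread", datetime(2023, 11, 6, 9, 0), datetime(2023, 11, 6, 9, 30)),
}


def average_during_batch(batch_id):
    """Average all readings whose timestamp falls inside the batch window."""
    _product, start, end = batches[batch_id]
    values = [value for ts, value in readings if start <= ts < end]
    if not values:
        raise ValueError(f"no readings found for batch {batch_id}")
    # Simple sample average; a real platform would likely weight by time.
    return mean(values)


print(average_during_batch("2023.45A.11"))
```

The point of the sketch is the join itself: the batch window comes from one system, the values from another. Doing this by hand in a spreadsheet for every batch is exactly the part that does not scale.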
In The Cloud? Or not?
The number of cases where Cloud native Data Platforms are used for OT data is still limited today. The primary operations use cases are typically not the first to move here, but rather the use cases that:
- involve less massive time series data (subsets or aggregations)
- can benefit from additional data integration / contextualization
- can benefit from the open ecosystem that is available around these cloud systems (e.g. machine learning).
There are many reasons why adoption is slow:
(1) Added Value
When talking about The Cloud, vendors typically start their reasoning with the “unburden” argument: “We will take care of everything”. While this might be true, end-users are not looking for this. They want added value: “What can I do now, which I couldn’t do before with my on-prem system?”
Don’t forget: you might have 2 data scientists in your company eager to start using a cloud native OT data store, but the other 99% of your users are still there as well. What’s in it for them?
(2) IT/OT Convergence (or the lack thereof)
We need to name the elephant in the room. Cloud is typically an “IT thing”, historians are typically “OT Territory”. Both worlds need to come together. We have written this article about 8 different cooperation models you might want to review.
(3) Technological
Although exceptions exist, most IT cloud systems are still based on either relational data models or require interpolated/equidistant data sets (the purple dotted lines in the above graph). Cloud native Time Series stores do exist, but are typically standalone solutions. That might solve some of your questions, but in essence such a store is only holding a copy of your on-prem data in the Cloud, which adds little value.
(4) Other: Cost/Security/Other…
A while ago, we published the article “OT in the cloud”. Take a look for other considerations.
Where is The Context?
Similar to a Data Warehouse, we need to introduce the power of context into our OT environment. It will help us to make sense of time series data. In the above infographic, we can now easily see which product we were producing, in which batches and also in which stage.
For example: having the context of our cookie factory, we would be able to compare the energy consumption in every “heat” step for the last days/weeks/years. It would just be a one click action.
So, what types of Contexts would be interesting for an Industrial environment?
Asset Context gives us insight into the physical assets in our plant; it is a rudimentary Digital Twin. This data can often be found in Engineering or Master Data systems.
Example from the cookie factory: a temperature sensor has a specific location in a particular section of the oven, which might be part of a series of ovens in one of the plants.
Production Context allows us to link the data to the actual manufacturing process which took place. This data can often be found in a Manufacturing Execution System (MES). It helps us to identify the product which was made, the order/batch it belongs to, the materials used, the operator/shift/team who worked on it and much more.
Example: as we now have the start and end time of a batch, it becomes possible to calculate the average temperature and total gas consumption of 1 specific batch.
Maintenance Context can give direct insights into the OEE of our equipment. Understanding the relationship between planned and unplanned maintenance and certain process conditions can again be the starting point of a data exploration project. We do need to note that today lots of technicians still use pen and paper. Accurate data, correct timestamps, linked assets and the elimination of free text are essential prerequisites.
Other dimensions are possible too!
We might have a Financial Context (e.g. direct input on the prices of energy, raw materials and so on), a Quality Context (e.g. what is the correlation between product quality and process parameters) and many others.
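As an illustration of the Asset Context described above, the sketch below models a tiny asset hierarchy and links sensor tags to it. The asset names, tag names (`TT-1042`, `TT-1043`), units and helper functions are all hypothetical; a real plant would load this from its Engineering or Master Data system rather than hard-code it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Asset:
    """One node in the asset hierarchy, pointing to its parent."""
    name: str
    parent: Optional["Asset"] = None

    def path(self) -> str:
        # Walk up to the root and join the names top-down.
        parts = []
        node: Optional[Asset] = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


# Hypothetical hierarchy: plant -> oven line -> oven -> zones.
plant = Asset("Plant-1")
line = Asset("Oven-Line-A", plant)
oven = Asset("Oven-2", line)
zone3 = Asset("Zone-3", oven)
zone4 = Asset("Zone-4", oven)

# Hypothetical links between historian tags and assets, plus basic metadata.
tag_to_asset = {"TT-1042": zone3, "TT-1043": zone4}
tag_units = {"TT-1042": "degC", "TT-1043": "degC"}


def describe(tag: str) -> str:
    """Turn a bare tag name into something a human can place in the plant."""
    return f"{tag} [{tag_units[tag]}] @ {tag_to_asset[tag].path()}"


def tags_under(asset: Asset) -> list:
    """All tags attached to this asset or to anything below it."""
    prefix = asset.path()
    return sorted(
        tag for tag, a in tag_to_asset.items()
        if a.path() == prefix or a.path().startswith(prefix + "/")
    )


print(describe("TT-1042"))
print(tags_under(oven))
```

With this kind of structure in place, a question such as "show me all temperature sensors of Oven-2" becomes a lookup instead of a search through tag naming conventions.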
Let’s put everything together into one diagram:
Crystal ball: What the future of an Operational Data Platform might look like
The diagram above depicts a possible coexistence between existing Enterprise Data Platforms (Lake/Warehouse/Factories) and Operational Data Platforms. In the long run, we expect a full convergence but let’s take it step by step:
Now
Most systems holding data are still on an island somewhere. Furthermore, OT systems are typically on premise (local) systems whereas IT systems are almost always cloud based. Many companies are trying to introduce more powerful OT systems/historians capable of creating part of this context. However, we don’t see real “platforms” yet.
When companies use their existing Data Lake for OT use cases, we typically see that only a small part of the data gets duplicated (e.g. for just a handful of use cases). As many existing IT systems are not capable of dealing with time series data, interpolating data is often necessary (the purple dotted line in the first infographic). This implies that you already have made several assumptions on how the data will be used later on.
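To show why interpolation bakes in assumptions, the sketch below resamples the same irregular raw samples onto an equidistant grid in two common ways: linear interpolation and "hold the previous value". The sample data, the 60-second grid and the `resample` function are made up for illustration and are not taken from any specific product.

```python
from bisect import bisect_right

# Hypothetical raw historian samples: (seconds since start, value), irregularly spaced.
raw = [(0, 20.0), (45, 20.0), (75, 80.0), (120, 80.0)]


def resample(samples, step, method):
    """Resample onto an equidistant grid using 'linear' or 'previous'."""
    times = [t for t, _ in samples]
    out = []
    t = times[0]
    while t <= times[-1]:
        i = bisect_right(times, t) - 1  # index of the last sample at or before t
        t0, v0 = samples[i]
        if method == "previous" or i == len(samples) - 1:
            # Step-wise: keep the last known value.
            out.append((t, v0))
        else:
            # Linear: assume the value moved in a straight line to the next sample.
            t1, v1 = samples[i + 1]
            out.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
        t += step
    return out


print(resample(raw, 60, "linear"))
print(resample(raw, 60, "previous"))
```

At the 60-second grid point the two methods disagree (50.0 versus 20.0), although both start from the same raw data. Whoever picks the method at ingestion time has decided for every later user, which is why storing the raw samples matters.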
What is even more troublesome is that current IT systems cannot deal with typical OT-style contextual information such as Asset Context or Meta Data (such as limits or units), and definitely not with the more advanced contexts introduced in the previous section.
Mid Term (1-5 years)
The Operational Data Platform has to be Cloud native (private and public). Not because there is no future for on-premise software, but because it is the only way to exponentially scale future use cases (easily share reports over several locations, use open-source tooling for data science, share data with third parties etc…).
Several vendors are today offering solutions which go somewhat in the direction of an “Operational Data Platform”. Today, those platforms are still at an early stage. They typically focus on a limited number of use cases. The ability to introduce context is often missing or only available in a limited way. However, we expect that this capability will make or break future solutions.
Integration between IT and OT platforms (e.g. access ERP data) will be key in getting the platform accepted by all communities. However, fully integrating the IT infrastructure with OT data sources is still a step too far. Providing the choice between public and private cloud will be a critical acceptance factor for the more security-sensitive industries.
Future (+5 years)
IT and OT technologies and cultures will merge. We don’t know when, but at a certain point in time the differences between IT and OT will really go away. Still sci-fi today, but who knows what the future will bring ;)
Outlook
A fully operational ODP is not available yet, but several vendors and even individual end-users are taking steps in this direction. First use cases can already be built and the best way for you to figure out how the availability of an ODP will help you is by just diving into it! How far you will go will depend on your use cases.
However, once you start, you will find yourself going down the rabbit hole very fast. This is because:
- Dealing with data at scale implies dealing with a lot of different people, different use cases and conflicting requirements. This makes Data Governance (including Master Data Management) an absolute must.
- Storing data is easy, but once you start using it you will without doubt run into Quality and Observability issues: stale data, noise, gaps, wrong units of measurement and many others. How will you make sure that reports which get generated based on this data are trustworthy?
- The balance between duplicating data (e.g. copying it from a source system towards a data platform) versus pointing directly to the source system is still an issue. Duplicating data always creates issues: for example when late data comes in, how do you synchronize it? On the other hand, source systems might be slow or you don’t want people to access them directly because of security reasons.
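Two of the quality issues named above, gaps and stale data, can be illustrated with a very small sketch. The sample values, the thresholds and the function names `find_gaps` and `find_flatlines` are assumptions chosen for the example; real checks would need thresholds tuned per sensor.

```python
# Hypothetical samples: (seconds since start, value).
samples = [
    (0, 180.1), (10, 180.4), (20, 181.0),
    (90, 179.8), (100, 179.8), (110, 179.8), (120, 179.8),
    (130, 180.2),
]


def find_gaps(data, max_interval):
    """Return (start, end) pairs where consecutive samples are too far apart."""
    return [
        (a[0], b[0])
        for a, b in zip(data, data[1:])
        if b[0] - a[0] > max_interval
    ]


def find_flatlines(data, min_repeats):
    """Return (start, end, value) runs where a value repeats at least min_repeats times."""
    runs = []
    start = 0
    for i in range(1, len(data) + 1):
        # A run ends at the end of the data or when the value changes.
        if i == len(data) or data[i][1] != data[start][1]:
            if i - start >= min_repeats:
                runs.append((data[start][0], data[i - 1][0], data[start][1]))
            start = i
    return runs


print(find_gaps(samples, max_interval=30))
print(find_flatlines(samples, min_repeats=4))
```

A repeated value is not always a fault (a well-controlled process can be flat), so a check like this only flags candidates for a human or a smarter rule to review.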
Solving these problems is possible too, and will be covered in Part 3 of this series on Data. Stay tuned and make sure to subscribe!
Can I use your content?
Yes! All our work on this blog is licensed under CC BY-SA 4.0. That means that you can share and adapt, but you must give appropriate credit.
Simplified Diagram
Additional Sources
https://blog.lnsresearch.com/rub-a-dub-dub...its-all-about-the-data-hub
https://www.linkedin.com/feed/update/urn:li:activity:7079852202390917120
https://www.linkedin.com/feed/update/urn:li:activity:6975438989725990912
https://www.linkedin.com/feed/update/urn:li:activity:7026217534919954432
https://www.linkedin.com/feed/update/urn:li:activity:7115986033166438400/
https://www.linkedin.com/feed/update/urn:li:activity:7112064191766622209/
https://www.linkedin.com/feed/update/urn:li:activity:6978356795073327105/ (industry cloud platforms on the rise)
Read this very interesting article on UNS by United Manufacturing Hub; there is quite some overlap: https://learn.umh.app/lesson/chapter-1-the-foundations-of-unified-namespace-in-ot/