Monitoring complex applications requires blocks of data that can measure changes occurring in real time, known as time series data. Therefore, it is time series databases (TSDBs) that manage time series data, which are optimised software systems.
TSDBs are built to manage long streams of data, a tool for tracking bits that flow through websites, applications or IoT devices. These databases act in a way that adds algorithms for fast statistical analysis queries.
In recent years, their use for financial technology was well established, but they are seeing a considerable increase in use in the industrial sector to store and manage real-time data due to the rise of IoT connected devices in supply and production chains.
Increasing digitisation and internet connectivity means that everything has a sensor that generates data and information, so that a stream of metrics or time series data is constantly being output. Therefore, databases have required an evolution to support these new workloads. Time series databases, TSDBs, which are scalable and high performance, are necessary for such situations.
Thus, these databases are designed to record and store data associated with a specific moment in time or using a timestamp. Their purpose is to write data quickly with a compression algorithm and a query engine that is faster than other traditional databases.
They are also especially useful for researching and collecting patterns in which a dataset moves, as well as for detecting data anomalies in stable environments. This is because the platform is natively organised.
As a result of these main characteristics, TSDBs have been assumed to be ideal for industrial applications, enabling massive real-time data monitoring with instantaneous speed and high storage efficiency.
In addition, maintenance tasks are constant and automatically regulated in this type of database, as well as the removal of old data while delivering new statistics or even to give a specific lifetime to groups of data.
TSDBs have recently begun to incorporate specialised compression functions to store time series data in less space, as well as the ability to not store copies of data if the reading has not varied from millisecond to millisecond.
A company that decides to make use of a TSDB will need to develop a data retention policy to automatically delete information that is not relevant. It should also be noted that this type of database requires more complex code, so the staff working with them will need to be more qualified.
In summary, TSDBs are easier to use and provide better write speeds, along with higher query performance than other types of databases, attempting to use a relational or NoSQL database for time series data would result in much slower and less efficient performance.
New companies are emerging as they see a growth opportunity to make queries faster and more efficient. Therefore, traditional providers such as analytics engines are also including tools for time series data or, different cloud services have also started to add this type of storage, as is the case of AWS with Timestream. Some of these services are presented below:
InfluxDB is an open source database management system created by the company InfluxData. It is a free tool that, in turn, offers a paid version that is complemented with maintenance contracts and client access controls.
In its latest version, it introduced a new programming language, Flux, used to optimise the ETL process in this type of database, making it more compact. With this system, only the source of the data, the value itself and the corresponding timestamp are stored.
In addition to the change of programming language, it now works in the cloud without needing its own server infrastructure, but there is the possibility of maintaining its local version.
Timestream is Amazon’s fast, scalable, serverless TSDB approach for operational and IoT applications. It is complemented by tools that make it easy to find patterns and trends in real time, as well as simultaneously access historical and recent data.
Its properties include keeping recent data in memory while historical data is moved to optimised storage layers and can be accessed during queries without the need to specify where it is located.
In addition, it facilitates faster storage and analysis of events than other databases. At a very low cost, it ensures encryption of time series data both in transit and at rest.
TimescaleDB is an engine integrated with PostgreSQL, for managing relational tables and time series data. The advantage of natively supporting SQL language allows developers to adapt to this TSDB without the need to learn a new programming language. It is therefore a relational database managed solely for time series data.
It is also defined as a multi-cloud database service that stores this type of data quickly and easily. This database helps to identify industrial performance anomalies and presents a performance improvement by carrying high data compression rates.
This TSDB is responsible for storing all time series data by returning a query language, PromQL. It allows different modes of data visualisation, as well as storing time series in memory in custom formats. This tool is often used in conjunction with Kubernetes.
In short, Prometheus stores data automatically by providing a set of standard queries to analyse it.
The future trend for this type of database is increasingly to be able to handle larger data streams with more complicated analyses in an efficient way, i.e. to save space while still providing useful summaries.
In addition, the presentation layer of TSDBs is to be extended by graphical dashboards, such tools are gradually being implemented. For example, Grafana is implemented with different TSDBs, for industrial and home automation analysis using visualisation functions.
Finally, a debate has started about the language that should be used to write queries. The SQL language is experiencing a notable increase in its use for analysing time series data, as it is known by a very large community, but there are still languages developed specifically for TSDBs, such as the aforementioned Flux.
With the advent of the public cloud some time ago, data generation has grown massively. In addition, the need to analyse this data in real time has arisen with the implementation of IoT in different sectors, as it is necessary to know patterns and be able to optimise processes on an ongoing basis.
Therefore, TSDBs collect time series data in real time for storage and computation, sending the information to a front-end monitoring system to monitor IoT devices. In short, pattern tracking is key to the personalisation that companies are looking for today to provide excellent and unique CX.
Despite the existence of these databases for years in the banking sector for behavioural prediction, a considerable increase is expected in other sectors to detect disturbances and failures in production systems, as well as general data behaviours through all sensors that are connected to the Internet.