Have you ever struggled to store, process and visualize very large amounts of complex sensor data?

Recently we faced the problem of giving a number of project partners visual access to sensor data collected by a network of chemical sensors based on different technologies. Since we use several sensing technologies and high sampling rates, the data is both complex and large, and difficult to handle and process: approx. 10^9 (= 1,000,000,000) values every two days, i.e. a sustained rate of almost 6,000 values per second. The sheer amount of data made it impossible to use the usual R&D suspects such as CSV files and Matlab.

Therefore, we tried out something else, namely a database solution called TimescaleDB. TimescaleDB (https://www.timescale.com/ – we used the Community edition) is a time-series database layer on top of PostgreSQL (https://www.postgresql.org/), a widely used open-source SQL database. And this really was an eye-opener: very fast access to our time-series data, while at the same time being able to use the usual SQL-centric toolset transparently in our Java/Python environments. Even some time-related analytics could easily be expressed as simple SQL queries, using PostgreSQL's statistical functions (e.g. correlation) and the time-bucketing features of TimescaleDB.
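To make this a bit more concrete, here is a minimal sketch of what such a setup and query can look like. The table and column names (sensor_data, channel_a, channel_b) are made up for illustration and are not our actual schema; only create_hypertable(), time_bucket() and corr() are real TimescaleDB/PostgreSQL features.

```sql
-- Hypothetical sensor table: one row per reading.
CREATE TABLE sensor_data (
    time        TIMESTAMPTZ      NOT NULL,
    sensor_id   INTEGER          NOT NULL,
    channel_a   DOUBLE PRECISION,
    channel_b   DOUBLE PRECISION
);

-- Turn the plain PostgreSQL table into a TimescaleDB hypertable,
-- which is partitioned by time under the hood.
SELECT create_hypertable('sensor_data', 'time');

-- Time-related analytics as plain SQL: 5-minute averages plus the
-- correlation between two channels, using TimescaleDB's time_bucket()
-- and PostgreSQL's corr() aggregate.
SELECT time_bucket('5 minutes', time) AS bucket,
       sensor_id,
       avg(channel_a)                 AS avg_a,
       corr(channel_a, channel_b)     AS corr_ab
FROM   sensor_data
WHERE  time > now() - INTERVAL '1 day'
GROUP BY bucket, sensor_id
ORDER BY bucket, sensor_id;
```

Apart from create_hypertable() and time_bucket(), this is just ordinary PostgreSQL, which is exactly why the usual SQL tooling and drivers keep working unchanged.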

The downside of using full-blown databases in R&D environments is often the complexity of running the necessary infrastructure, i.e. setting up a server and providing access and training to everyone, including the non-IT people (mathematicians, physicists). However, that also turned out to be quite simple thanks to Docker (https://www.docker.com/) and the existing pre-configured TimescaleDB image (see the sketch below). After some initial trial and error (robustness, access, storage, etc.) we ended up using local Docker containers for development on Windows (yes, this works like a charm) and Linux, plus several Linux-based instances for centralized internal and external access (technical consortium partners).
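For illustration, a local development instance can be started with a single docker run command. The following is only a rough sketch, not our exact configuration; the password, volume name and image tag are placeholders.

```
# Start a local TimescaleDB/PostgreSQL instance from the pre-configured
# image, with a named volume so the data survives container restarts.
docker run -d --name timescaledb \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=changeme \
  -v timescale-data:/var/lib/postgresql/data \
  timescale/timescaledb:latest-pg14
```

Using a named volume keeps the data outside the container itself, which helps with the robustness and storage questions mentioned above.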

To provide a simple visualization of the data, we then also looked at the “monitoring community” and found Grafana (https://grafana.com/) to be very helpful. Grafana is a web-based, open-source analytics and monitoring solution that supports several data sources, including PostgreSQL/TimescaleDB, and it comes with a nice 2D visualization component for time-series data. Defining an XY plot essentially consists of writing down the SQL query (with some special time handling) and a bit of formatting; a sketch of such a panel query is shown at the end of this post. Interactively browsing the data after this initial setup works perfectly, even with huge datasets. However, one must be aware that some degree of grouping and aggregation (over time) is involved to achieve this. Again, setting up and running a web-based solution can get a bit tedious, but for Grafana, too, there is a pre-configured Docker image. So this was not a problem at all, once some network access topics had been clarified (host.docker.internal does not work on Linux…).

In summary, the combination of TimescaleDB/PostgreSQL with traditional SQL development environments (just use it as a normal PostgreSQL database) is really helpful. With Grafana we got a visualization layer that we would not have been able to build ourselves within the available time frame. Last but not least, the Docker platform was a fundamental building block that kept everything easy to set up and run.
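As promised above, here is a rough sketch of what a Grafana panel query can look like with the PostgreSQL/TimescaleDB data source. The sensor_data table and channel_a column are the same made-up examples as before; $__timeFilter() is Grafana's macro for the time range selected in the dashboard (the “special time handling”), and time_bucket() provides the grouping and aggregation over time.

```sql
-- Sketch of a Grafana panel query against the hypothetical sensor_data
-- table. Grafana replaces $__timeFilter(time) with a WHERE condition
-- for the currently selected dashboard time range.
SELECT time_bucket('1 minute', time) AS "time",
       avg(channel_a)                AS "channel A"
FROM   sensor_data
WHERE  $__timeFilter(time)
GROUP BY 1
ORDER BY 1;
```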