Data Minimalism: A new philosophy in the era of Big Data

Companies worldwide have made great progress in collecting information and data over the last two years, and have learned how valuable objective, well-classified information is for making better decisions.

As a result, both the quantity and the value of data have skyrocketed. In fact, 90% of the world's data has been created in the last two years. The growing adoption of IoT devices is one of the reasons for this huge increase, as more and more connected devices constantly collect, analyse and share information. It is estimated that by the end of 2020 there will be over 50 million connected smart devices in the world.

Over the next 5 years, the global volume of available data is expected to grow at a rate of 40% per year, reaching 175 zettabytes by 2025. Data is increasingly regarded as a company's most critical asset, yet despite the large investments made to manage the growing inflow of data, the vast majority of companies are still unable to extract the valuable information that would let them take advantage of its potential.
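To put those figures in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration that takes the two numbers quoted above (40% annual growth, 175 zettabytes after 5 years) at face value; the function name is hypothetical and the starting volume it derives is implied by those numbers, not a figure from the original sources.

```python
def implied_start_volume(final_zb: float, annual_rate: float, years: int) -> float:
    """Work backwards from a final volume under compound annual growth."""
    return final_zb / (1 + annual_rate) ** years


# Figures quoted in the text: 175 ZB by 2025, 40% per year, over 5 years.
growth_factor = 1.40 ** 5                      # total multiplication over the period
start_zb = implied_start_volume(175.0, 0.40, 5)

print(f"Growth factor over 5 years: {growth_factor:.2f}x")
print(f"Implied starting volume: {start_zb:.1f} ZB")
```

In other words, under these assumptions the volume of data would multiply more than fivefold in five years, which is why storing everything "just in case" becomes expensive so quickly.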

The main reason is the sheer amount of data that keeps accumulating. The race to collect as much data as possible drives up the costs of infrastructure, storage, processing and analysis, while the quality of that analysis goes down. In fact, most companies only analyse 12% of the data they have.

This large accumulation of data in warehouses and databases rests on the fixed idea that the more data you have, the better. It overlooks the fact that the more information you collect, the more noisy, redundant and obsolete data there will be, and the more difficult it will be to analyse.

For all these reasons, there is an increasing need for good planning in the information collection process, with strategies that ensure the data being collected is actually used, clean and well managed, so as to maximise its value as a strategic asset. This is one of the basic principles of Data Minimalism, in some contexts also called Minimum Viable Data (MVD), a philosophy that companies should consider adopting.

The concept of Data Minimalism

Data Minimalism comes at a time when the capabilities of Big Data, Cloud Computing, data processing and analytical tools are constantly growing, and when companies are trying to generate and store as much data as possible, whether they need it or not.

This new philosophy is beginning to permeate businesses because it helps improve decision-making as well as the quality and sustainability of systems. The key to Data Minimalism is to collect only the information that is needed, in order to maximise the return on a company’s investment in analytical tools and platforms, and to implement data collection strategies that are aligned with business objectives.
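As a rough illustration of "collect only the information that is needed", the idea can be sketched as an allowlist that ties every stored field to a business purpose. This is a minimal sketch, not a reference implementation: the purposes, the field names and the `minimise` function are all hypothetical examples.

```python
from typing import Any, Dict, Set

# Hypothetical mapping from business purpose to the fields it really needs.
REQUIRED_FIELDS: Dict[str, Set[str]] = {
    "order_fulfilment": {"customer_id", "shipping_address", "items"},
    "churn_analysis": {"customer_id", "last_purchase_date"},
}


def minimise(record: Dict[str, Any], purpose: str) -> Dict[str, Any]:
    """Keep only the fields that the stated purpose requires."""
    allowed = REQUIRED_FIELDS[purpose]  # unknown purposes raise KeyError on purpose
    return {key: value for key, value in record.items() if key in allowed}


raw_event = {
    "customer_id": 42,
    "shipping_address": "1 Example Street",
    "items": ["book"],
    "browser_fingerprint": "abc123",   # not needed for any declared purpose
    "last_purchase_date": "2020-05-01",
}

stored = minimise(raw_event, "order_fulfilment")
print(stored)
```

Rejecting unknown purposes outright, rather than storing the full record by default, keeps the collection strategy tied to an explicit objective.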

Data Minimalism also creates the need for standardised data exchange, including greater use of open APIs and new technological solutions such as Artificial Intelligence or Deep Neural Networks. These neural networks allow suppliers to create better individual offers for customers using less data, or even using only the data that the customers themselves allow them to process.

In addition, a Data Governance plan will be fundamental to carrying out this change of strategy. It ensures that data quality is looked after and that all the data collected and processed in the organisation sits within a specific context (operational, regulatory, etc.) and always serves a defined objective.

Benefits of Data Minimalism

Implementing Data Minimalism can be a challenge, but it can also be very beneficial for companies. So much data is recorded every day that companies cannot really tell data that has value from data that does not. This makes decisions harder, because the information that has been collected may have no real value.

With Data Minimalism this is less likely to happen, because specific objectives and strategies are established first, and only then is the necessary data collected and analysed. This philosophy can benefit companies in several ways:

  • Profitability: Although storing data is relatively inexpensive, computational costs increase considerably as large amounts of data are ingested and analysed. These costs can be direct, such as the price of cloud storage, or indirect, such as energy and time. Both can be reduced by shrinking the incoming data set.
  • Social and environmental responsibility: On the one hand, all data that is transferred unnecessarily creates security and privacy risks. On the other hand, every hour a CPU spends processing redundant data means unnecessary power usage and therefore unnecessary CO2 emissions.
  • Higher and better system quality: The widespread belief that a lack of data explains poor system performance is wrong, as redundant and non-valuable data can itself degrade system performance. Data Minimalism removes such toxic data from the equation and focuses only on data that provides value and benefits the system.
  • More stability: Using large amounts of data requires more sophisticated sampling techniques, and while these algorithms often show good and stable performance, stability is not always guaranteed. With Data Minimalism these risks are reduced: less data is collected, less sampling is needed and results tend to be more stable.
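The redundant and obsolete data mentioned in the list above can often be filtered out with very simple rules. The following is a minimal sketch, assuming each record carries an identifier and a last-updated date; the `prune` function, the field names and the one-year retention window are hypothetical choices for illustration.

```python
from datetime import date, timedelta
from typing import Any, Dict, List


def prune(records: List[Dict[str, Any]], today: date, max_age_days: int = 365) -> List[Dict[str, Any]]:
    """Drop obsolete records and keep only the newest copy of each id."""
    cutoff = today - timedelta(days=max_age_days)
    latest: Dict[Any, Dict[str, Any]] = {}
    for rec in records:
        if rec["updated"] < cutoff:
            continue  # obsolete: older than the retention window
        current = latest.get(rec["id"])
        if current is None or rec["updated"] > current["updated"]:
            latest[rec["id"]] = rec  # redundant older copies are replaced
    return list(latest.values())


records = [
    {"id": 1, "updated": date(2020, 5, 1), "value": "a"},
    {"id": 1, "updated": date(2020, 6, 1), "value": "b"},  # newer duplicate of id 1
    {"id": 2, "updated": date(2017, 1, 1), "value": "c"},  # stale record
]

kept = prune(records, today=date(2020, 7, 1))
print(kept)
```

Of three incoming records only one is kept, so every later storage, processing and analysis step works on a smaller and cleaner set.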

As you can see, the advantages of implementing a Data Minimalism philosophy are significant. They can be summarised in one idea: collect only the data that is really necessary, in order to reduce the time and money spent on maintaining that information.

This idea was echoed by researchers at the Department of Energy’s Lawrence Berkeley National Laboratory (Berkeley Lab), who developed a new approach to machine learning aimed at analysing experimental images. In conventional methods, ML models depend on tens or thousands of images. For this new approach, a ‘Mixed-Scale Dense Convolution Neural Network’ (MS-D) was developed that required fewer images and learned faster. The system produced high-resolution images at a higher speed.

Data privacy

Data Minimalism is also a response to privacy and data security concerns, as well as to regulations such as the GDPR. It is therefore fundamental to how customer data is handled, especially in light of increasing regulation and growing concern among customers about the privacy and security of their data. This context is creating a need for organisations to collect only the data necessary to offer their products and services, and to be transparent with their customers.

Customer trust in how data is handled is becoming a paramount issue for most companies, which must ensure that they only collect the data necessary to design or improve their products, and that they do so in a transparent and reliable manner.

This is what has been making the difference between companies in recent years. Users increasingly want more control over their data, and the companies that stand out are those that do not collect data en masse, but collect only what they need and state what they are going to do with it. This adds value to the company and increases user confidence.

For example, with a Data Minimalism philosophy and a Blockchain-based system, users can sell their own data, retain ownership of it and decide whether or not to share it with a company. The company, in turn, can request access to specific data and use it to create or improve its services.

Conclusions

The next few years will be dominated by Data Minimalism, which allows companies to focus on the data they really need to deliver the best customer experience.

Collecting less data means that less time and money is spent on maintaining it. The more information is collected, the more data there is to analyse and maintain, which significantly increases expenditure. Reducing the amount of data therefore also reduces maintenance costs.

For example, one of the big problems in a Business Intelligence project is data preparation: the amount and state of the data make this phase the most expensive one. With a “less data, more efficiency” approach it is easier to reduce costs, while at the same time ensuring consistency and reducing the risk of information exposure.

Data Minimalism makes it possible to make low-cost and highly effective decisions using a small set of data, and this is often not fully understood. The belief that Big Data and indiscriminate data collection will by themselves provide a great deal of valuable information for a company is wrong. In the vast majority of cases, companies do not know what to do with such a large amount of data, and as a result many of the decisions they make are ill-founded.

With Data Minimalism, companies select the data they really need with a specific objective in mind. Thanks to this, they can make better-informed decisions and focus on giving customers a good experience. They can also gain customers' trust by ensuring the security and privacy of their data and by providing clear, transparent information on what data they collect and what they do with it.