At the height of the Big Data boom, and with demand ever increasing, Deep Learning is growing and evolving rapidly thanks to the many applications through which it is revolutionising AI. Computer vision is one such field of artificial intelligence: it aims to enable machines to see their environment as humans do and to use that knowledge to perform tasks such as image recognition, image analysis and classification.
The global market for Deep Learning neural networks is worth around $120 million, with a forecast of $296 million by 2024. Introducing Deep Learning algorithms into computer vision is therefore bringing major advances. Convolutional Neural Networks (CNNs), for example, have revolutionised the field. Thanks to CNNs, applications such as facial recognition in social networks or the detection of diseases from images are possible.
Consider an image containing a specific variety of flower that is to be identified. The pixels of the image are fed, in the form of matrices, to the input layer of the neural network. The hidden layers then perform feature extraction through a series of computations. One of these hidden layers is the convolutional layer, which extracts features from the image; in this example, those features let the network determine whether the flower we want to identify appears in the image and how many flowers of this type there are.
A CNN can thus be defined as a feed-forward neural network used for the analysis of visual images, which processes data with a grid-like topology to detect and classify objects in an image.
Neural networks of this type are a subset of Machine Learning and sit at the core of Deep Learning algorithms. They are composed of layers of nodes: an input layer, one or more hidden layers and an output layer. The nodes are connected to one another, and each has an associated weight and threshold. If the output of an individual node is above that node's threshold value, the node is activated and sends information to the next layer of the network.
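The activation rule described above can be sketched in a few lines of plain Python. This is only an illustration: the function name `node_output`, the weights, the bias and the threshold of 0 are assumptions chosen for the example, not values from any particular network.

```python
def node_output(inputs, weights, bias, threshold=0.0):
    """Return the node's weighted sum if it exceeds the threshold, else 0.0."""
    # Weighted sum of the inputs plus a bias term.
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    # The node only "fires" (passes its value on) above the threshold.
    return weighted_sum if weighted_sum > threshold else 0.0


# Hypothetical inputs and weights, chosen only for illustration.
active = node_output([1.0, 2.0], [0.5, 0.25], bias=0.0)      # weighted sum is 1.0
inactive = node_output([1.0, 2.0], [-0.5, -0.25], bias=0.0)  # weighted sum is -1.0
```

With these numbers the first node passes its value on to the next layer, while the second stays inactive and contributes nothing.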
CNNs provide a more scalable approach to object identification in images than other types of neural networks, drawing on principles of linear algebra, in particular matrix multiplication, to identify patterns.
What sets this type of neural network apart from others is its superior performance on image, speech or audio inputs. To understand how it works, it helps to know the three types of layer that make up the network: the convolutional layer, the pooling layer and the fully connected layer.
The first is the convolutional layer, which can be followed by additional convolutional layers or by pooling layers. The fully connected layer is the final layer. The early layers focus on simple features, such as colours or edges. As the image data progresses through the layers of the CNN, the algorithm is able to recognise more complex elements until it finally recognises the desired object.
The convolutional layer is therefore the core building block of a CNN and requires several components: input data, a filter and a feature map. The filter is usually a 3×3 matrix of weights that is applied to an area of the image, and a scalar (dot) product is calculated between the input pixels in that area and the filter. The result is written to an output matrix. The filter then shifts across the image, repeating the calculation until it has swept the entire image. The resulting series of scalar products between the input and the filter is known as the feature map.
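The sliding-filter calculation described above can be sketched in plain Python. This is a minimal illustration, assuming a single-channel image, a stride of 1 and no padding; the function name `convolve2d`, the 4×4 image and the edge-detecting filter are all hypothetical. As in most Deep Learning libraries, the filter is applied without being flipped, which strictly speaking makes the operation a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            # Scalar (dot) product between the filter and the image patch under it.
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(k_rows)
                for dj in range(k_cols)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 grayscale image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 3x3 filter that responds to vertical edges.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
feature_map = convolve2d(image, kernel)  # [[3, 3], [3, 3]]
```

A 3×3 filter on a 4×4 input fits in only four positions, so the feature map is 2×2. Every position straddles the dark-to-bright edge, which is why each entry has the same positive response.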
Each value in this output matrix does not need to be connected to every input value, only to the area covered by the filter, which is why these layers are referred to as partially connected layers. After each convolution, a rectified linear unit (ReLU) transformation is applied to the feature map, introducing non-linearity into the model. In short, the convolutional layer converts an image into numerical values that make it easier for the neural network to interpret the image and extract relevant patterns.
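The ReLU step is simple enough to show directly: it keeps positive values and replaces negative ones with zero. The sketch below is illustrative only; the function name `relu` and the sample feature map are assumptions.

```python
def relu(feature_map):
    """Apply ReLU element-wise: negative values become 0, the rest pass through."""
    return [[max(0, value) for value in row] for row in feature_map]


# Hypothetical feature map containing both positive and negative responses.
activated = relu([[3, -2], [-1, 0]])  # [[3, 0], [0, 0]]
```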
The pooling layer performs dimensionality reduction, reducing the number of parameters in the input. As in the convolutional layer, a filter sweeps across the entire input, but this filter has no weights: it simply selects or aggregates values from each area (for example, the maximum) to send to the output matrix. This reduces complexity, improves efficiency and limits the risk of overfitting.
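The downsampling step described above can be sketched with max pooling, one common variant, which keeps only the largest value in each window. This is an illustrative sketch under assumed settings: the function name `max_pool2d`, the 2×2 window, the stride of 2 and the sample feature map are not taken from the text.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Downsample by keeping the maximum of each size x size window."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(0, rows - size + 1, stride):
        row = []
        for j in range(0, cols - size + 1, stride):
            # Collect the values under the window and keep only the largest.
            window = [
                feature_map[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature map.
feature_map_in = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 5, 7],
    [1, 1, 8, 3],
]
pooled = max_pool2d(feature_map_in)  # [[6, 2], [2, 8]]
```

The 4×4 input shrinks to 2×2, so later layers process a quarter of the values while the strongest response in each region is preserved.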
Finally, there is the fully connected layer. In partially connected layers, the pixel values of the input image are not directly connected to the output layer. In the fully connected layer, by contrast, each node of the output layer is directly connected to every node of the previous layer. This layer performs the classification based on the features extracted by the previous layers and their different filters. Convolutional and pooling layers tend to use ReLU functions, while fully connected layers usually rely on a softmax activation function to classify the inputs appropriately.
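The classification step can be sketched as a weighted sum per class followed by softmax, which turns the raw scores into probabilities that add up to 1. This is a minimal illustration: the function names `dense` and `softmax`, the two classes, and all weights, biases and feature values are hypothetical.

```python
import math


def dense(features, weights, biases):
    """Fully connected layer: every output node sees every input feature."""
    return [
        sum(w * x for w, x in zip(node_weights, features)) + b
        for node_weights, b in zip(weights, biases)
    ]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical flattened features from earlier layers and made-up parameters.
features = [6.0, 2.0, 2.0, 8.0]
weights = [
    [0.1, 0.0, 0.0, 0.2],  # weights for class 0
    [0.0, 0.3, 0.1, 0.0],  # weights for class 1
]
biases = [0.0, 0.5]

scores = dense(features, weights, biases)
probabilities = softmax(scores)
predicted_class = probabilities.index(max(probabilities))
```

With these made-up parameters, class 0 receives the higher score and therefore the higher probability. In a trained network the weights and biases would be learned from labelled data rather than written by hand.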
The main use of convolutional neural networks is image recognition and classification: deconstructing an image to find its distinguishing features, using a supervised Machine Learning classification algorithm. Another option is to reduce the description of an image's features, which in this case relies on an unsupervised algorithm. These methods can be applied in the following areas:
Image tagging builds on the most basic image classification algorithm. It consists of describing images so that they are easier to find, for use in visual search, object recognition or the analysis of image tone.
Visual search is based on comparing an input image with a visual database: it evaluates the image and searches for other images that have comparable features.
CNN image recognition is also well suited to making recommendations, especially for products matched according to visual criteria. Pinterest, for example, employs this type of CNN recognition, focusing on visual matching of features, such as finding all images that contain red objects.
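One way to picture this comparison is to represent each image as a vector of numbers produced by a CNN and rank the database by how closely each vector points in the same direction as the query. The sketch below assumes cosine similarity as the measure, which is one common choice and not necessarily what any given product uses. The function names, file names and three-value vectors are hypothetical stand-ins for real CNN outputs.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two vectors based on the angle between them (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def visual_search(query, database, top_k=2):
    """Return the names of the `top_k` database images most similar to `query`."""
    ranked = sorted(
        database.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]


# Hypothetical vectors; a real system would obtain them from a trained CNN.
database = {
    "red_shoe.jpg": [0.9, 0.1, 0.0],
    "red_bag.jpg": [0.8, 0.2, 0.1],
    "blue_car.jpg": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]
results = visual_search(query, database)  # ['red_shoe.jpg', 'red_bag.jpg']
```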
Face recognition is a subset of image recognition that deals with more complex images. The distinction between the two lies in operational complexity: face recognition requires an additional layer of work, because the face and its features must be detected first, followed by basic object recognition.
Facial recognition is used on social media platforms to speed up the process of tagging people in photos. It also powers Snapchat and Instagram filters, which start from a basic auto-generated layout of the face and add new elements or effects. In surveillance it is becoming an essential method due to its efficiency and speed.
In medicine, CNN image recognition can be used to detect anomalies in X-ray images, for example, more accurately than the human eye. The classification of medical images relies on massive datasets, such as public health records, which serve as the basis for training the algorithms.
Health risk assessment (HRA), on the other hand, is a predictive application that calculates the probability of specific events related to an individual's health and risks, such as disease progression or complications. A comparable predictive task, to make the idea clearer, is a neural network of this type studying solar activity every day to determine the threat level of radiation.
Finally, within this area, CNNs can support drug discovery, since drug development involves a large amount of data: analysis of observed medical effects, detection of anomalies, and so on. These neural networks streamline drug discovery at critical stages, reducing the time needed to develop drugs by performing predictive analytics.
Although CNNs deliver excellent results when it comes to recognising patterns and small details that may go unnoticed by the human eye, they fall short when it comes to understanding the content of an image. For example, in an image of several people a CNN is able to differentiate ages and features, whereas a human being looking at the same image is also able to infer scenarios and situations. CNNs encounter difficulties especially in practical applications, such as blocking inappropriate images on social networks. In one case, a photo of a 30,000-year-old nude statue was blocked by a CNN on Facebook.
Despite these difficulties, CNNs have led to a breakthrough in artificial intelligence, as they are used in many computer vision applications such as facial recognition, image search and image editing. The augmented reality and virtual reality industries are increasingly exploring this type of neural network, although it is still far from replicating the behaviours of human intelligence.