What Is Anomaly Detection?
In data science, anomaly detection is the task of identifying data samples that are out of the ordinary and do not fit the general pattern of the data. Such samples are called anomalies or outliers, and the deviation can result from a rare occurrence or an outlying event.
Identifying these samples is an integral part of any monitoring system. Traditionally, anomaly detection has been done by manual inspection, a tedious process typically carried out by experts with significant domain knowledge.
Real-life Use Cases of Anomaly Detection
Anomaly detection is used in a wide variety of applications. The following are just a few examples:
- Credit card processing companies widely use anomaly detection to identify fraudulent transactions
- The field of cybersecurity is heavily reliant on anomaly detection to discern between normal behavior and malicious attempts to compromise the system
- eCommerce vendors use anomaly detection to identify unusual shifts in various performance metrics, such as sale conversion, bounce rate, and unusual counts of positive or negative customer reviews
- Detection of spam emails
Critical Factors in Anomaly Detection
An effective anomaly detection model must address the following factors and requirements:
- Minimize the prediction latency
- Minimize false positive predictions
- Effectively deal with highly imbalanced training datasets
For example, a model tasked with distinguishing legitimate from fraudulent credit card transactions must render predictions in real time so that fraudulent transactions can be denied before they complete. At the same time, misclassifying a valid transaction as fraudulent leads to a poor customer experience and embarrassment. On top of these challenges, the model has to be trained on a severely imbalanced dataset, since fraudulent transactions are extremely rare.
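The effect of class imbalance can be made concrete with a little arithmetic. The sketch below uses made-up numbers (10,000 transactions, 10 of them fraudulent) and a deliberately useless "model" that never flags anything. It is only an illustration of why plain accuracy is a misleading metric here, not a recommended evaluation procedure.

```python
# Hypothetical dataset: 10,000 transactions, 10 of which are fraudulent.
labels = [1] * 10 + [0] * 9990        # 1 = fraud, 0 = legitimate

# A useless "model" that predicts "legitimate" for every transaction.
predictions = [0] * len(labels)

pairs = list(zip(labels, predictions))
true_pos = sum(1 for y, p in pairs if y == 1 and p == 1)
false_neg = sum(1 for y, p in pairs if y == 1 and p == 0)
false_pos = sum(1 for y, p in pairs if y == 0 and p == 1)
true_neg = sum(1 for y, p in pairs if y == 0 and p == 0)

# Accuracy looks excellent because almost everything is legitimate.
accuracy = (true_pos + true_neg) / len(labels)

# Recall is the fraction of fraudulent transactions that were caught.
recall = true_pos / (true_pos + false_neg)

# False positive rate is the fraction of valid transactions wrongly denied.
false_positive_rate = false_pos / (false_pos + true_neg)

print(f"accuracy={accuracy:.3f} recall={recall:.3f} fpr={false_positive_rate:.3f}")
```

Under these assumed numbers the model is right 99.9% of the time yet catches no fraud at all, which is why recall and the false positive rate are usually tracked alongside, or instead of, accuracy.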
Machine Learning and Anomaly Detection
Manually inspecting data for anomalies is impractical in the age of “big data.” Fortunately, machine learning techniques for anomaly detection have proven to be highly effective, since they can handle massive datasets and meet the latency requirements of most anomaly detection use cases.
Machine Learning Comes to the Rescue
Machine learning algorithms can recognize complex patterns from massive volumes of data and solve detection tasks that might be impossible to accomplish manually. In the context of spam detection, these algorithms can accurately discern between spam and non-spam emails by examining the content of the messages.
Various machine learning models, such as Artificial Neural Networks (ANN), Support Vector Machines (SVM), Autoencoders, and K-Nearest Neighbors, have been used for anomaly detection. Each has its strengths and weaknesses. One key criterion for model selection is the time and effort needed to prepare the data for that particular model. By this criterion, ANNs are often the most attractive option, since they typically require the least data preprocessing.
Artificial Neural Networks (ANN) for Anomaly Detection
ANNs are among the most popular machine learning models for anomaly detection and for many other use cases, such as computer vision and natural language processing. They are loosely inspired by how neurons in the brain process sensory information. One of the critical advantages of ANNs is that their capacity can be increased simply by adding layers of neurons, which helps the model cope with increasingly complex and fine-grained data patterns. Training networks with many such layers is referred to as “Deep Learning.”
Anomaly detection has improved immensely since the introduction of Deep Learning models. Over the past few years, several deep anomaly detection algorithms have been developed that can solve problems previously considered out of reach.
To better understand deep anomaly detection algorithms, academic researchers have attempted to group them into fine-grained categories. A common categorization is based on how the deep learning model produces the anomaly score (the degree of abnormality). Three distinct types can be identified:
1. Deep Learning for Feature Extraction
These algorithms use a deep learning framework to extract features (independent variables) from the raw data and transform the input into a new representation. A “feature” is simply a measurable characteristic of the data. In the new representation, anomalies can often be detected using a traditional anomaly detection algorithm. In this category, the deep learning step and the anomaly detection step are fully decoupled.
This category of methods is particularly useful when dealing with high-dimensional input data. Deep models can map high-dimensional data into a lower-dimensional space in which anomalies are easier to detect. Furthermore, many of these algorithms can also work with unlabeled data.
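The decoupling of feature extraction from detection can be sketched as a two-stage pipeline. In the example below, `extract_features` is a hypothetical, hand-written stand-in for a trained deep encoder (a real system would use a neural network here), and the detector is a simple distance-to-centroid score chosen only for illustration. The sample data is made up.

```python
import math

def extract_features(raw):
    """Hypothetical stand-in for a trained deep encoder.

    Maps a raw sequence of readings to a 2-D feature vector (mean, spread).
    """
    mean = sum(raw) / len(raw)
    spread = max(raw) - min(raw)
    return (mean, spread)

def centroid(points):
    """Component-wise average of a list of equally sized tuples."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dims))

def anomaly_scores(raw_samples):
    """Stage 1: extract features. Stage 2: score by distance to the centroid."""
    features = [extract_features(raw) for raw in raw_samples]
    center = centroid(features)
    return [math.dist(f, center) for f in features]

# Made-up raw samples; the last one contains a spike.
raw_samples = [
    [1.0, 1.1, 0.9, 1.0],
    [1.2, 1.0, 1.1, 0.9],
    [0.9, 1.0, 1.0, 1.1],
    [1.0, 9.0, 1.0, 1.0],
]

scores = anomaly_scores(raw_samples)
most_anomalous = scores.index(max(scores))
print(most_anomalous, [round(s, 2) for s in scores])
```

Because the two stages only communicate through the feature vectors, either one can be swapped out without touching the other.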
The feature extraction methods themselves can also be grouped into three categories:
- Supervised learning algorithms can only work with labeled data. They train a classifier to predict the label and then discard the last layer. The layer before the last one should ideally have learned an efficient feature set that helps the model identify the class of a sample. Since labeled data is hard to obtain in most cases, these methods are less popular than other learning techniques.
- Unsupervised learning methods can extract a data representation even when no labels are available. To this end, they often attempt to fit a probability distribution to the data (aka generative models) or build a model with a smaller capacity to reconstruct the input (aka reconstruction-based models). An anomaly detector can then find abnormal samples by examining the estimated probability distribution in the former case or the model’s bottleneck representation in the latter.
- Self-supervised learning is a newer methodology in deep learning and is considered the state-of-the-art technique for learning good representations from unlabeled data. Like unsupervised methods, self-supervised models learn from unannotated data and eliminate the expensive and time-consuming labeling process. However, to learn a good representation, these models define proxy tasks, generate pseudo-labels, and try to predict these labels in the output. By learning to solve the proxy task, which is often unrelated to the final task (e.g., anomaly detection or object classification), the model can learn meaningful patterns in the data that indirectly benefit the downstream task. For instance, given an unlabeled dataset of animal images, the network can learn a good representation by rotating the images in different directions and trying to predict the rotation angle. Since the network itself applies the rotation, it knows the angle label and can train a supervised classifier to predict it. Empirically, recent research has shown that a model trained this way can learn a representation in which different animals can be readily identified with the help of a simple classifier.
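The rotation proxy task described above comes down to generating pseudo-labels from unlabeled inputs. The following sketch shows only that data-generation step, using a tiny 2x2 grid of numbers as a stand-in for an image. The function names are illustrative, and no network is trained here.

```python
def rotate90(image):
    """Rotate a 2-D grid (a list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def make_rotation_pretext_samples(image):
    """Return (rotated_image, pseudo_label) pairs for one unlabeled image.

    The pseudo-label is the number of 90-degree clockwise turns applied
    (0, 1, 2, or 3). No human annotation is involved.
    """
    samples = []
    rotated = [list(row) for row in image]
    for label in range(4):
        samples.append((rotated, label))
        rotated = rotate90(rotated)
    return samples

# A tiny stand-in for an unlabeled image.
image = [[1, 2],
         [3, 4]]

samples = make_rotation_pretext_samples(image)
for rotated, label in samples:
    print(label, rotated)
```

A classifier trained to predict `label` from `rotated` receives a supervised signal for free, and its intermediate layers are what would later be reused as the learned representation.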
2. Learning Feature Representations of Normality
This category of algorithms uses deep learning models to measure the abnormality level of a data sample indirectly. They can often learn an efficient score from unlabeled data by assuming a relationship between the network’s output and the degree of abnormality of the samples. An example of such algorithms is the autoencoder, a deep model that attempts to reconstruct its input. If the autoencoder is trained only on normal data, it will not be able to reconstruct abnormal samples well, so the reconstruction error can be used to measure the degree of sample abnormality.
It is worth noting that autoencoders can be used as independent anomaly-scoring models that operate very efficiently, and they can also precede other anomaly detection algorithms to augment their functionality.
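The reconstruction-error idea can be illustrated with a deliberately tiny model. The sketch below is not a deep autoencoder. It is a single-layer linear autoencoder with tied weights and a one-number bottleneck, trained by plain gradient descent on synthetic "normal" 2-D points that lie near the line y = 2x. The data, learning rate, and step count are all assumptions made for this example.

```python
import random

def reconstruct(w, x):
    """Encode x into one number (the bottleneck), then decode it back."""
    z = w[0] * x[0] + w[1] * x[1]        # encoder: 2-D input -> 1-D code
    return (z * w[0], z * w[1])          # decoder: 1-D code -> 2-D output

def reconstruction_error(w, x):
    """Squared distance between the input and its reconstruction."""
    x_hat = reconstruct(w, x)
    return (x[0] - x_hat[0]) ** 2 + (x[1] - x_hat[1]) ** 2

def train(data, lr=0.05, steps=500):
    """Minimize the mean reconstruction error with batch gradient descent."""
    w = [0.5, 0.5]
    n = len(data)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x in data:
            z = w[0] * x[0] + w[1] * x[1]
            r0 = x[0] - z * w[0]          # residual, first coordinate
            r1 = x[1] - z * w[1]          # residual, second coordinate
            rw = r0 * w[0] + r1 * w[1]
            # Gradient of the squared error with respect to w.
            g0 += -2.0 * (rw * x[0] + z * r0)
            g1 += -2.0 * (rw * x[1] + z * r1)
        w[0] -= lr * g0 / n
        w[1] -= lr * g1 / n
    return w

# Synthetic "normal" data: points close to the line y = 2x.
rng = random.Random(0)
normal_data = []
for _ in range(200):
    t = rng.uniform(-1.0, 1.0)
    normal_data.append((t + rng.gauss(0, 0.05), 2 * t + rng.gauss(0, 0.05)))

weights = train(normal_data)

normal_score = reconstruction_error(weights, (0.5, 1.0))    # on the line
anomaly_score = reconstruction_error(weights, (2.0, -1.0))  # far off the line
print(normal_score, anomaly_score)
```

The point that follows the pattern seen in training is reconstructed almost perfectly, while the point that breaks the pattern is not, so its reconstruction error is far larger and can serve directly as an anomaly score.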
3. End-to-End Anomaly Score Learning
Finally, the last category of deep learning algorithms in anomaly detection consists of those that learn an anomaly score directly from the input data. Unlike the second category, this score directly reflects the degree of input abnormality and is often defined by an expert. Therefore, all models in this category can be considered supervised: they require costly dataset labeling, which makes them less popular and limits their use to cases where the datasets are already labeled.
The recent surge in the popularity of deep learning has significantly improved anomaly detection algorithms and opened the door to problems that were previously perceived to be impossible to solve. It is fully expected that further improvements in deep learning models will also benefit the field of anomaly detection.
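As a rough illustration of learning a score directly from labeled data, the sketch below fits a single-neuron logistic model, used here as a minimal stand-in for a deep network, to a handful of made-up labeled values. The output of the model is itself the anomaly score. All data and hyperparameters are assumptions for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_scorer(xs, ys, lr=0.5, steps=2000):
    """Fit score(x) = sigmoid(w * x + b) to labels with gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y   # gradient of the log loss w.r.t. z
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Made-up labeled data: one input value per sample, 1 = anomalous, 0 = normal.
xs = [0.1, 0.2, 0.3, 0.4, 2.0, 2.5]
ys = [0, 0, 0, 0, 1, 1]

w, b = train_scorer(xs, ys)

def anomaly_score(x):
    """Score in (0, 1), learned end to end from the labels."""
    return sigmoid(w * x + b)

print(anomaly_score(0.2), anomaly_score(2.2))
```

Unlike the reconstruction-error approach, nothing here works without the `ys` labels, which is the cost the text refers to.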
About the Author
Al Gharakhanian, Machine Learning Development Director, Cognityze