How do you calculate precision and recall in a machine learning model?
Learn from the community’s knowledge. Experts are adding insights into this AI-powered collaborative article, and you could too.
This is a new type of article that we started with the help of AI, and experts are taking it forward by sharing their thoughts directly into each section.
— The LinkedIn Team
How do you calculate precision and recall in a machine learning model? If you are working with artificial intelligence (AI) or data science, you probably encounter these terms frequently. They are two common metrics for evaluating the performance of a machine learning model, especially for classification problems. In this article, you will learn what precision and recall mean, how to calculate them, and how to interpret them.
Precision is the ratio of correctly predicted positive instances to the total number of predicted positive instances. In other words, it measures how accurate your model is when it classifies something as positive. For example, if your model is trained to identify spam emails, precision is the percentage of emails that are correctly labeled as spam out of all the emails that are labeled as spam by your model. A high precision means that your model produces few false positives, meaning it rarely misclassifies non-spam emails as spam.
-
Santiago Valdarrama
I tell stories about technology and teach hard-core Machine Learning at ml.school.
There are 4 types of outcomes from a classification model:

* True Positives (TP): The number of positive instances correctly predicted as positive.
* True Negatives (TN): The number of negative instances correctly predicted as negative.
* False Positives (FP): The number of negative instances incorrectly predicted as positive.
* False Negatives (FN): The number of positive instances incorrectly predicted as negative.

Precision is the ratio of the number of true positive predictions to the total number of positive predictions: Precision = TP / (TP + FP). Recall is the ratio of the number of true positive predictions to the total number of actual positive instances: Recall = TP / (TP + FN).
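A minimal sketch of these formulas in Python, using made-up counts purely for illustration:

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from raw confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical counts from a spam classifier
precision, recall = precision_recall(tp=45, fp=5, fn=15)
print(f"precision={precision:.2f}, recall={recall:.2f}")  # precision=0.90, recall=0.75
```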
-
Sergio Altares-López
Top Linkedin Community AI • Quantum AI Researcher @CSIC • Executive Board Member @CITAC • Senior Data Scientist & AI - ML Engineer • AI Innovation
Precision refers to how many of the model's positive predictions are actually correct. It is calculated as the number of correct positive predictions divided by the total number of positive predictions. In the confusion matrix, this corresponds to TP / (TP + FP).
-
Frank D. Lawrence, Jr.
AI-Powered UX Designer | Expertise in Generative AI & Conversational AI tools | Prompt Engineer Certified | Content Strategist | Emerging Technology, Software & Data Researcher
Accuracy alone is not enough! ❌ Here are two examples that show why precision should be balanced with recall based on your use case and priorities: for critical applications like cancer detection, high precision is important to avoid unnecessary procedures 🏥, while in anomaly or fraud detection, high precision avoids flagging normal behavior as suspicious 🕵🏾♀️. Precision tells you how reliable a positive classification is 📈, or what percentage of positive identifications are actually correct. A high precision means the model is conservative about labeling positives 🔍
Recall is the ratio of correctly predicted positive instances to the total number of actual positive instances. In other words, it measures how complete your model is when it identifies positive cases. For example, if your model is trained to identify spam emails, recall is the percentage of emails that are correctly labeled as spam out of all the emails that are actually spam. A high recall means that your model has a low false negative rate, which means it does not miss many spam emails.
-
Raghu Etukuru, Ph.D., FRM, PRM
Principal AI Scientist | Author of four books including AI-Driven Time Series Forecasting | AI | ML | Deep Learning
Recall, also known as sensitivity or true positive rate, is a metric that measures the proportion of actual positives that are correctly identified as such, Recall = True Positives / (True Positives + False Negatives). True positives (TPs): These are the cases in which the model predicted positive and the true label was also positive. In other words, the model correctly predicted the positive class. False negatives (FNs): These are the cases in which the model predicted negative, but the true label was positive. Essentially, the model incorrectly predicted the positive class. It gives the percentage of actual positive instances that the model correctly identified.
-
Dakshinamurthy Sivakumar
Drug Discovery Scientist | AI-driven Drug Discovery Enthusiast
Recall is a metric used to measure the ability of a model to correctly identify all positives out of the total actual positive instances (which comprise both true positives and false negatives). In simpler terms, the instances that are actually positive have to be identified as positive rather than negative, so a high recall value means the model is good at identifying the positive cases as positive. Whether recall or precision matters more purely depends on the application.
-
Harshitha Mohanraj Radhika
Actively Looking for Full-Time Job Opportunities | Graduate Student at San Jose State University
In machine learning, recall is a performance metric that measures the ability of a model to identify all relevant instances in a dataset. It is also known as sensitivity, hit rate, or true positive rate. Recall is calculated using the following formula: Recall = TP / (TP + FN), where: True Positives (TP) are the instances that are actually positive and are correctly identified as positive by the model. False Negatives (FN) are the instances that are actually positive but are incorrectly identified as negative by the model.
To calculate precision and recall, you need to compare the predictions of your model with the true labels of the data. You can use a confusion matrix to summarize the results of your model. A confusion matrix is a table that shows the number of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) for your model. Here is an example of a confusion matrix for a binary classification problem:
| | Predicted Positive | Predicted Negative |
|------------|--------------------|--------------------|
| Actual Positive | TP | FN |
| Actual Negative | FP | TN |
To calculate precision, you divide TP by the sum of TP and FP. To calculate recall, you divide TP by the sum of TP and FN. For example, if your model has 80 TP, 20 FP, 10 FN, and 90 TN, then your precision is 80 / (80 + 20) = 0.8 and your recall is 80 / (80 + 10) = 0.89.
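For reference, here is one way to reproduce these numbers with scikit-learn, using synthetic label arrays constructed to match the 80/20/10/90 example:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Synthetic labels matching the worked example: 80 TP, 20 FP, 10 FN, 90 TN
y_true = np.array([1] * 80 + [0] * 20 + [1] * 10 + [0] * 90)
y_pred = np.array([1] * 80 + [1] * 20 + [0] * 10 + [0] * 90)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fp, fn, tn)                   # 80 20 10 90
print(precision_score(y_true, y_pred))  # 0.8
print(recall_score(y_true, y_pred))     # 0.888...
```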
-
Karthik K
AI Engineer @ Litmus7 | AI & Automation | Linkedin Top ML Voice 2023| Public Speaker |
Calculating precision and recall involves comparing your model's predictions with the true labels using a confusion matrix. This matrix summarizes the model's performance, categorizing results into true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). To calculate precision, divide TP by the sum of TP and FP. For recall, divide TP by the sum of TP and FN. For instance, if your model has 80 TP, 20 FP, 10 FN, and 90 TN, then precision is 80 / (80 + 20) = 0.8, and recall is 80 / (80 + 10) = 0.89. These metrics offer valuable insights into your model's accuracy and completeness.
-
Ritesh Kumar
Co-founder of UnReaL-TecE LLP | Fellow at Council for Strategic and Defense Research | Assistant Professor at Dr. Bhimrao Ambedkar University
One point missing here is the calculation of precision and recall for multiclass classification tasks. These are calculated using either micro or macro averaging. For both, we start by iteratively taking one class as the positive class and all other classes as the negative class. There are then two ways of calculating and averaging the scores:

1. We count the true/false positives and true/false negatives at each iteration, add them up, and calculate precision and recall from the totals - this is micro averaging.
2. We calculate precision and recall at each iteration and then take the arithmetic mean of those scores - this is macro averaging.

Macro averaging can also be weighted by the number of samples in each class to get a better picture.
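A small sketch of these averaging options with scikit-learn, on made-up three-class labels:

```python
from sklearn.metrics import precision_score, recall_score

# Made-up 3-class labels, purely to illustrate the averaging options
y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 2, 0, 2]

for avg in ("micro", "macro", "weighted"):
    p = precision_score(y_true, y_pred, average=avg)
    r = recall_score(y_true, y_pred, average=avg)
    print(f"{avg:>8}: precision={p:.2f}, recall={r:.2f}")
```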
-
Davide Camera
Decoding AI for Everyone - Making LLM & GenAI Understandable and Fun - Head of Financial Institutions & Info Management
A confusion matrix is a tabular visualization of a model's performance. It contrasts actual vs. predicted values, revealing true/false positives and negatives. This clarity is instrumental for fine-tuning and steering models towards more accurate outcomes.
Precision and recall are two complementary metrics that reflect different aspects of your model's performance. Depending on your goal and the nature of your problem, you may want to optimize one or both of them. For example, if you are building a model to detect fraudulent transactions, you may want a high recall, because you do not want to miss any fraudulent cases. On the other hand, if you are building a model to recommend products to customers, you may want a high precision, because you do not want to annoy your customers with irrelevant suggestions.
However, there is often a trade-off between precision and recall. If you increase one, you may decrease the other. For example, if you lower the threshold for classifying something as positive, you may increase your recall, but you may also increase your false positives, which lowers your precision. Therefore, you need to balance precision and recall according to your needs and preferences.
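One way to see this trade-off concretely is to sweep the decision threshold over a model's predicted probabilities, for instance with scikit-learn's precision_recall_curve on a toy dataset (purely illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Toy imbalanced dataset and a simple classifier
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]

# Precision and recall at every candidate threshold
precision, recall, thresholds = precision_recall_curve(y_test, probs)
for t, p, r in list(zip(thresholds, precision, recall))[::10]:
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```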
-
Sergio Altares-López
Top Linkedin Community AI • Quantum AI Researcher @CSIC • Executive Board Member @CITAC • Senior Data Scientist & AI - ML Engineer • AI Innovation
To correctly interpret a model's accuracy results, it's important to consider the balance of the dataset. If there is a significant class imbalance during the training phase, the metric can appear high even though the model predicts only one of the classes. That's why it's recommended to perform data splitting in a stratified manner.
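A minimal sketch of stratified splitting with scikit-learn (the imbalanced toy data below is only for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Imbalanced toy data: ~95% negatives, ~5% positives
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

# stratify=y keeps the class proportions the same in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
print(np.bincount(y_train) / len(y_train), np.bincount(y_test) / len(y_test))
```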
-
MIRACLE OLAPADE
Chemist||Data Scientist||Project Manager
Precision and recall are two important metrics used to evaluate the performance of classification models. Precision measures the proportion of positive predictions that are actually correct, while recall measures the proportion of actual positives that are correctly identified. True Positives (TP): The number of instances that were correctly classified as positive. False Positives (FP): The number of instances that were incorrectly classified as positive. True Negatives (TN): The number of instances that were correctly classified as negative. False Negatives (FN): The number of instances that were incorrectly classified as negative. Precision = TP / (TP + FP). Recall = TP / (TP + FN).
-
Kim Killian
Driving Innovation, Business Growth, and Competitive Advantage through ✨AI-driven Solutions
While precision is all about trust in our model's flagging ability, recall makes sure our model doesn't miss any spam. And together, they give a well-rounded view of our model's performance.
Precision and recall are not the only metrics that you can use to evaluate your machine learning model. There are other metrics that combine precision and recall in different ways, such as accuracy, F1-score, and the ROC curve. Accuracy is the ratio of correctly predicted instances to the total number of instances. F1-score is the harmonic mean of precision and recall, which gives more weight to low values. The ROC curve is a plot of the true positive rate (recall) versus the false positive rate (the fraction of actual negatives incorrectly predicted as positive) at different thresholds, which shows how well your model can discriminate between positive and negative classes. You can use these metrics to compare different models or to tune your model parameters. However, you should always understand what each metric means and how it relates to your problem and goal.
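A sketch of how these related metrics can be computed with scikit-learn, on hypothetical labels and scores (roc_auc_score and roc_curve need predicted scores or probabilities rather than hard labels):

```python
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score, roc_curve

# Hypothetical true labels, hard predictions, and predicted probabilities
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.2, 0.6, 0.3, 0.8, 0.9, 0.4, 0.7]

print(accuracy_score(y_true, y_pred))  # fraction of all predictions that are correct
print(f1_score(y_true, y_pred))        # harmonic mean of precision and recall
print(roc_auc_score(y_true, y_prob))   # area under the ROC curve
fpr, tpr, thresholds = roc_curve(y_true, y_prob)  # points of the ROC curve
```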
-
Karthik K
AI Engineer @ Litmus7 | AI & Automation | Linkedin Top ML Voice 2023| Public Speaker |
Precision and recall are just part of the metrics toolkit for assessing machine learning models. Accuracy offers an overall performance view, while the F1-score balances precision and recall. The ROC curve visualizes how well your model separates positive and negative classes. These metrics are handy for model comparison and parameter tuning, but always select the one that aligns with your specific problem and goals, understanding its relevance to your context.
-
Sergio Altares-López
Top Linkedin Community AI • Quantum AI Researcher @CSIC • Executive Board Member @CITAC • Senior Data Scientist & AI - ML Engineer • AI Innovation
When working with imbalanced data, recall is an important metric as it measures a model's ability to identify true positives. However, due to the imbalance, the model may have high recall in the majority class but low recall in the minority class. To address this, it is useful to consider additional metrics like the F1-Score and AUC-ROC, analyze the confusion matrix, implement strategies such as oversampling or undersampling, and adjust classification thresholds to balance performance in both classes. These actions enable a more effective evaluation and improvement of the model's performance on imbalanced data.
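One illustrative way to inspect per-class recall on imbalanced data and try a simple mitigation (class weighting here; resampling is another option), sketched with scikit-learn:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Imbalanced toy data: ~95% majority class, ~5% minority class
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

for class_weight in (None, "balanced"):
    model = LogisticRegression(max_iter=1000, class_weight=class_weight)
    model.fit(X_train, y_train)
    print(f"class_weight={class_weight}")
    # classification_report shows precision, recall, and F1 for each class
    print(classification_report(y_test, model.predict(X_test), digits=2))
```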
-
Raghu Etukuru, Ph.D., FRM, PRM
Principal AI Scientist | Author of four books including AI-Driven Time Series Forecasting | AI | ML | Deep Learning
Individually, precision and recall are not so effective for imbalanced data, which leads to the precision-recall trade-off issue. The F1 score is instrumental when there is an uneven class distribution (imbalanced data), as it seeks a balance between precision and recall, and it is typically employed as a single metric that encapsulates both harmoniously. F1-Score = 2 * (Precision * Recall) / (Precision + Recall), where Precision = True Positives / (True Positives + False Positives) and Recall = True Positives / (True Positives + False Negatives). The F1 score ranges from 0 to 1, where 1 indicates perfect precision and recall, and 0 indicates that either precision or recall is zero.
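Plugging in the numbers from the worked example earlier (precision 0.8, recall 80/90 ≈ 0.89):

```python
precision, recall = 0.8, 80 / 90  # values from the confusion-matrix example above
f1 = 2 * (precision * recall) / (precision + recall)
print(round(f1, 3))  # ≈ 0.842
```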
-
Sergio Altares-López
Top Linkedin Community AI • Quantum AI Researcher @CSIC • Executive Board Member @CITAC • Senior Data Scientist & AI - ML Engineer • AI Innovation
It's important to note that these metrics from the confusion matrix are used for classification models. However, while it is true that a confusion matrix cannot be directly applied to continuous data, it is possible to categorize your continuous data into ranges and then generate a confusion matrix based on these value ranges, treating each range as a separate class.
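A sketch of that idea, assuming a continuous target binned into ranges with pandas.cut before building the confusion matrix (the values and bin edges below are arbitrary):

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix, precision_score

# Hypothetical continuous targets and predictions (e.g., house prices in $100k)
y_true = np.array([1.2, 2.8, 3.5, 0.9, 4.1, 2.2, 3.9, 1.7])
y_pred = np.array([1.0, 3.1, 3.2, 1.4, 3.8, 1.9, 4.2, 2.3])

# Arbitrary bin edges turn the continuous values into ordinal classes
bins = [0, 2, 3, 5]
labels = ["low", "mid", "high"]
true_cls = np.asarray(pd.cut(y_true, bins=bins, labels=labels))
pred_cls = np.asarray(pd.cut(y_pred, bins=bins, labels=labels))

print(confusion_matrix(true_cls, pred_cls, labels=labels))
print(precision_score(true_cls, pred_cls, average="macro", zero_division=0))
```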
-
Kipngeno Kirui
Data Scientist
It is important to note that these metrics also depend on the data you use and the nature of the problem you are classifying. For instance, in loan classification the goal is to make sure you capture as many true positives as possible (customers who are indeed defaulters and whom the model predicts as defaulters). The same applies to health problems, such as classifying patients by whether they have a tumor. We must also mention that the distribution of samples, and whether there is an imbalance, strongly affects these metrics. This is a challenge/trade-off that data scientists try to balance every time they build classification models.
-
Kim Killian
Driving Innovation, Business Growth, and Competitive Advantage through ✨AI-driven Solutions
I think it's good to remember that the goal is to find the appropriate balance among all these metrics in light of the specific goals and constraints of a project. At the end of the day, these are all tools to solve real-world problems. Understanding the real-world implications and cost of the model's predictions is important, and this will often guide which metric is most important to focus on.