Wine Quality Prediction Machine Learning
- Priank Ravichandar
- Sep 16, 2024
Assigning scores to wines by implementing machine learning algorithms to evaluate the factors influencing wine quality.

Summary of Machine Learning
Dataset
The data was downloaded from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/wine+quality
The two datasets are related to red and white variants of the Portuguese “Vinho Verde” wine.
These datasets can be treated as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. It is also unclear whether all input variables are relevant, so feature selection methods may be worth testing.
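A quick way to see the class imbalance is to count how many wines fall into each quality score. The sketch below uses only the standard library; the `toy_quality` list is invented for illustration and is not the real distribution of the dataset.

```python
from collections import Counter


def class_distribution(labels):
    """Return {label: (count, share)} with labels in ascending order."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in sorted(counts.items())}


# Hypothetical quality scores, made up for this example only.
toy_quality = [4] * 3 + [5] * 40 + [6] * 45 + [7] * 10 + [8] * 2

dist = class_distribution(toy_quality)
for label, (n, share) in dist.items():
    print(f"quality {label}: {n:3d} wines ({share:.0%})")
```

On real data, the same function could be applied to the quality column after loading the files.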
Goal
The objective is to classify each wine into its quality score category. Three machine learning models are trained and tested to determine which yields the best results:
Support Vector Machines (SVM)
Decision Trees
Random Forest
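A comparison of these three models could be sketched with scikit-learn as shown below. This is a minimal illustration, not the project's actual code: it assumes scikit-learn is installed, uses synthetic data from `make_classification` as a stand-in for the wine data, and the hyperparameters, the 75/25 split, and the scaling step for the SVM are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the wine data (assumption): 11 numeric features, 3 classes.
X, y = make_classification(
    n_samples=600, n_features=11, n_informative=6, n_classes=3, random_state=0
)

# Stratify so each class keeps the same share in the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    # SVMs are sensitive to feature scale, so standardize the inputs first.
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {scores[name]:.3f}")
```

Accuracy alone can hide poor performance on rare classes, so a per-class metric is usually worth reporting alongside it.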
Tools
Python, Support Vector Machines (SVM), Decision Trees, Random Forest
Insights
Evaluated on a random sample of 100 wines from each dataset, the Random Forest model performed best on the filtered wine dataset across all categories. This is expected, since the model was trained on a subset of that data.
The model excludes wines with quality scores of 3 and 9. These wines can be treated as outliers, since there are too few examples to predict those scores accurately with a machine learning model.
Overall, the Random Forest model predicts wine quality well, regardless of the type of wine.
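The filtering and sampling steps mentioned above can be sketched in plain Python. Everything here is hypothetical: the toy scores, the `min_count` threshold, and the helper names are invented for illustration, and a majority-class baseline stands in for a trained model to show why results are worth checking per quality category.

```python
import random
from collections import Counter, defaultdict


def drop_rare_classes(rows, min_count):
    """Keep only rows whose quality score occurs at least min_count times."""
    counts = Counter(quality for _, quality in rows)
    return [(wine_id, q) for wine_id, q in rows if counts[q] >= min_count]


def per_class_recall(y_true, y_pred):
    """Fraction of wines in each true class that were predicted correctly."""
    hits, totals = defaultdict(int), defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        hits[truth] += int(truth == pred)
    return {label: hits[label] / totals[label] for label in sorted(totals)}


# Hypothetical (wine_id, quality) rows; scores 3 and 9 are deliberately rare.
toy_scores = [3] * 2 + [5] * 40 + [6] * 50 + [7] * 30 + [9] * 1
rows = list(enumerate(toy_scores))

# Step 1: filter out the quality scores with too few examples.
filtered = drop_rare_classes(rows, min_count=5)
kept_labels = sorted({q for _, q in filtered})

# Step 2: draw a random sample of 100 wines for evaluation.
rng = random.Random(0)
sample = rng.sample(filtered, 100)
y_true = [q for _, q in sample]

# Step 3: score a placeholder "model" that always predicts the majority class.
majority = Counter(q for _, q in filtered).most_common(1)[0][0]
y_pred = [majority] * len(y_true)
recall = per_class_recall(y_true, y_pred)

print("kept quality scores:", kept_labels)
print("per-class recall of the baseline:", recall)
```

The baseline scores perfectly on the majority class and zero on every other class, which is why a model's results are better judged category by category than by a single overall number.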