Wine Quality Prediction Machine Learning
- Priank Ravichandar
- Sep 16, 2024
- 5 min read
- Updated: Dec 7, 2025
Assigning quality scores to wines with machine learning algorithms and evaluating the factors that influence wine quality.

Context
Product teams often use insights from product analytics to inform product strategy. This project explores how AI can be applied to improve the quality of insights generated from standard data analysis and visualize the results in a dashboard.
Objective
The overall objective is to use AI to uncover overlooked patterns, deepen analytical rigor, and produce more actionable insights with minimal human context. This is a low-context data analysis meant to understand what types of insights and visualizations AI can generate when given limited context on the analysis goals (why the analysis is being done) and the analysis methodology (what to analyze and how).
Analysis Overview
Dataset scope: 6,497 wines (1,599 red, 4,898 white) from the UCI Machine Learning Repository, each with 11 physicochemical properties and expert quality ratings from 0-10.
ML techniques applied: Claude Code implemented three advanced gradient boosting algorithms (XGBoost, LightGBM, CatBoost) alongside traditional baseline models. The analysis included three problem formulations: multi-class classification (predicting exact scores 4-8), binary classification (Good vs Not Good), and regression (continuous quality prediction).
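The three problem formulations differ only in how the quality column is framed as a target. A minimal sketch with pandas (the toy values are illustrative, and the cutoff of 7 for "Good" is an assumption, since the post doesn't state where the binary split falls):

```python
import pandas as pd

# Toy stand-in for the UCI wine data; the real analysis used 6,497 rows.
df = pd.DataFrame({
    "alcohol": [9.4, 10.2, 12.8, 11.1, 13.0],
    "volatile acidity": [0.70, 0.50, 0.28, 0.44, 0.30],
    "quality": [5, 5, 7, 6, 8],
})

# Multi-class classification: predict the exact score.
y_multiclass = df["quality"]

# Binary classification: "Good" vs "Not Good".
# The >= 7 threshold is an assumption; the post doesn't state the cutoff.
y_binary = (df["quality"] >= 7).astype(int)

# Regression: treat quality as a continuous target.
y_regression = df["quality"].astype(float)
```

The same feature matrix feeds all three targets, which is why the boosting models could be compared across formulations without re-engineering the data.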
AI's analytical approach: Without being told what matters, the AI conducted systematic exploratory analysis, identified and removed outliers using statistical methods, engineered derivative features (total acidity, sulfur ratios, alcohol-to-density), and applied SHAP values to explain model predictions. It also discovered natural wine groupings through unsupervised clustering with PCA and UMAP dimensionality reduction for visualization.
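The post doesn't name the statistical method used to remove outliers; a common choice is the 1.5×IQR rule, sketched here on a single feature (the sample values, including the extreme one, are made up for illustration):

```python
import numpy as np

def iqr_mask(values, k=1.5):
    """Return a boolean mask keeping values within k*IQR of the quartiles."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values >= lower) & (values <= upper)

# One extreme residual-sugar value, like the 10x-median outliers noted below.
residual_sugar = np.array([1.9, 2.6, 2.3, 1.8, 2.1, 45.0])
mask = iqr_mask(residual_sugar)
cleaned = residual_sugar[mask]
```

Applied per feature across the dataset, a rule like this flags the extreme rows before model training.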
Key Insights
1. Gradient boosting dramatically outperforms traditional methods.
LightGBM achieved 72% accuracy on multi-class prediction, beating the original Random Forest baseline by 4 percentage points. This improvement may seem modest, but in wine quality prediction—where expert tasters often disagree—it represents meaningful progress. Binary classification (distinguishing "Good" wines from the rest) reached 80% accuracy, showing that simpler categorization tasks benefit even more from modern algorithms.
2. Alcohol content dominates quality prediction.
The AI identified alcohol as the single strongest predictor across all models, with higher alcohol wines consistently receiving better ratings. This wasn't programmed into the analysis—Claude Code discovered this pattern independently through SHAP value analysis. Volatile acidity emerged as the second most important feature, but with negative correlation: wines with higher volatile acidity scored worse.
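Reproducing the SHAP analysis needs the shap package; as a rough stand-in, scikit-learn's permutation importance asks the same "which feature drives predictions" question. The sketch below uses synthetic data shaped like the reported finding (higher alcohol raises quality, higher volatile acidity lowers it), which is an assumption, not the real dataset:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 400
alcohol = rng.uniform(8, 14, n)
volatile_acidity = rng.uniform(0.1, 1.0, n)

# Synthetic quality mimicking the reported pattern: alcohol helps,
# volatile acidity hurts. Coefficients are invented for illustration.
quality = 3 + 0.4 * alcohol - 2.0 * volatile_acidity + rng.normal(0, 0.3, n)

X = np.column_stack([alcohol, volatile_acidity])
model = GradientBoostingRegressor(random_state=0).fit(X, quality)

result = permutation_importance(model, X, quality, n_repeats=10, random_state=0)
# Column 0 (alcohol) should dominate, echoing the SHAP finding in the post.
```

SHAP goes further than this by attributing each individual prediction to its features, but the ranking question is the same.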
3. Four distinct wine profiles exist in the data.
Unsupervised clustering revealed natural groupings that weren't visible in the raw quality scores. These clusters represent wines with fundamentally different chemical compositions—for example, Cluster 0 averaged higher alcohol and lower volatile acidity, while Cluster 2 showed the opposite pattern. Interestingly, cluster membership didn't perfectly align with quality ratings, suggesting that multiple "paths" to quality exist depending on wine style.
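A minimal version of that clustering pipeline, assuming k-means on standardized features with PCA for the 2-D view (the post also used UMAP, which needs the umap-learn package; random data stands in for the 11 physicochemical columns):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Synthetic stand-in for the 11 physicochemical features.
X = rng.normal(size=(300, 11))

# Standardize so no single feature dominates the distance metric.
X_scaled = StandardScaler().fit_transform(X)

# Four clusters, matching the structure the analysis found.
labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X_scaled)

# 2-D projection for visualization.
coords = PCA(n_components=2).fit_transform(X_scaled)
```

Comparing per-cluster feature means (e.g. mean alcohol and volatile acidity by label) is what surfaces the profile differences described above.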
4. Model interpretability matters for practical application.
SHAP analysis revealed that predictions don't rely on single features—they integrate multiple factors. A high-quality wine typically combines elevated alcohol, low volatile acidity, moderate sulfur dioxide levels, and higher sulphates. The AI generated waterfall plots showing exactly how each feature contributed to individual predictions, making the model's reasoning transparent rather than treating it as a black box.
5. Regression captures nuance that classification misses.
While multi-class classification achieved 72% accuracy, the regression approach produced predictions with an RMSE of 0.58—meaning the typical prediction fell within roughly half a quality point of the actual rating. This continuous prediction captures subtle quality differences that discrete classification obscures, particularly useful for borderline cases between quality tiers.
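RMSE is just the square root of the mean squared error between predicted and actual scores. A toy computation (the values are illustrative, not from the dataset):

```python
import numpy as np

actual = np.array([5.0, 6.0, 7.0, 5.0])
predicted = np.array([5.4, 6.5, 6.6, 5.2])

# Root mean squared error: average size of the prediction miss.
rmse = np.sqrt(np.mean((actual - predicted) ** 2))
```

A continuous prediction of, say, 6.5 carries more information for a borderline wine than a hard "6" or "7" label.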
Patterns & Correlations
Sulfur dioxide shows a "sweet spot" effect. Rather than a simple linear relationship, wines with moderate free sulfur dioxide levels (roughly 20-40 mg/L) outperformed those with very low or very high concentrations. This suggests sulfur dioxide's role in wine preservation creates an optimization challenge—too little fails to protect the wine, too much affects taste perception.
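One simple way to see a sweet-spot effect like this is to bin free sulfur dioxide and compare mean quality per band. A sketch with pandas (the rows and the 20/40 mg/L band edges are illustrative, chosen to match the range quoted above):

```python
import pandas as pd

# Illustrative values only, not rows from the UCI dataset.
df = pd.DataFrame({
    "free sulfur dioxide": [5, 12, 25, 30, 38, 55, 70, 28],
    "quality": [5, 5, 7, 7, 6, 5, 4, 6],
})

bands = pd.cut(
    df["free sulfur dioxide"],
    bins=[0, 20, 40, 200],
    labels=["low (<20)", "moderate (20-40)", "high (>40)"],
)
mean_quality = df.groupby(bands, observed=True)["quality"].mean()
# In a sweet-spot pattern, the moderate band has the highest mean quality.
```

A non-monotonic result like this is exactly what a linear correlation coefficient would understate.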
Wine type (red vs white) influences which features matter most. The analysis combined both wine types, but correlation patterns differ between them. Red wines show stronger relationships between quality and tannin-related compounds, while white wines depend more heavily on residual sugar and acidity balance. The AI captured these nuances through interaction effects in the boosting models without being explicitly told about wine type differences.
Outliers tell a story about exceptional wines. Before removing statistical outliers, the AI identified wines with extreme feature values—some had residual sugar 10x the median, others had volatile acidity approaching vinegar levels. These weren't necessarily bad wines; some high-sugar wines received good ratings, suggesting dessert wine classifications that the dataset doesn't explicitly capture.
Feature engineering revealed hidden relationships. Claude Code created derived features like "total acidity" (combining fixed and volatile) and "alcohol-to-density ratio." The alcohol-to-density feature proved particularly informative—it essentially captures the fermentation efficiency, with higher ratios indicating more complete sugar-to-alcohol conversion. This wasn't in the original dataset but emerged from the AI's exploratory phase.
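The derived features described above are simple column arithmetic. A sketch of plausible formulas (the exact definitions Claude Code used aren't given in the post, so these are assumptions; the two sample rows are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "fixed acidity": [7.4, 6.3],
    "volatile acidity": [0.70, 0.30],
    "free sulfur dioxide": [11.0, 14.0],
    "total sulfur dioxide": [34.0, 132.0],
    "alcohol": [9.4, 10.1],
    "density": [0.9978, 0.9951],
})

# Derived features named in the post; formulas are assumed.
df["total acidity"] = df["fixed acidity"] + df["volatile acidity"]
df["free_so2_ratio"] = df["free sulfur dioxide"] / df["total sulfur dioxide"]
df["alcohol_to_density"] = df["alcohol"] / df["density"]
```

Because density falls as sugar converts to alcohol, the alcohol-to-density ratio grows with more complete fermentation, which is the "fermentation efficiency" reading described above.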
Takeaways
Low-context AI analysis produces technically sound but strategically shallow insights. Claude Code executed sophisticated ML methodology—proper train-test splits, cross-validation, multiple model comparison, interpretability analysis—without being told these were best practices. However, the insights lack business context.
Feature importance reveals what matters, not what can be changed. The top predictors—alcohol content, volatile acidity, chlorides—include some features winemakers can readily manipulate (alcohol via fermentation duration) and others that reflect raw material quality or fermentation conditions harder to control.
SHAP interpretability solves the "why" problem for gradient boosting. One criticism of complex ensemble models is opacity—you get accurate predictions but can't explain them. SHAP analysis addresses this directly by quantifying each feature's contribution to individual predictions. This transforms gradient boosting from a black box into a glass box, crucial for applications where predictions require justification (regulatory compliance, quality disputes).
Unsupervised learning reveals structure that quality ratings obscure. The four-cluster solution suggests wines group by chemical profile independent of their quality scores. This implies quality isn't a single dimension—a wine can be "high-quality red" or "high-quality white" or "high-quality sweet," and these represent different chemical strategies. The clustering captured this multimodal structure that supervised learning targeting a single quality score would miss.
AI-generated visualizations prioritize comprehensiveness over narrative. Claude Code produced 10+ interactive charts covering every angle of the analysis—correlation matrices, cluster dendrograms, SHAP plots, prediction residuals. This data-dump approach differs from human-curated visualization, which would select 2-3 charts telling a specific story. The AI optimized for completeness rather than communication efficiency.
The regression approach deserves more attention than it received. While classification gets the headline accuracy numbers, regression's continuous predictions (RMSE 0.58) might be more useful for real-world applications like blending operations or quality control, where fine-grained distinctions matter. The AI treated all three problem formulations equally, but a domain expert would likely prioritize regression for its practical utility.
Low-context analysis works well for exploratory phases, less well for decision-making. This experiment shows AI can autonomously discover patterns, implement best practices, and generate comprehensive analysis without human direction. Turning those findings into decisions, however, still requires the business context that a low-context setup deliberately withholds.

