Multiple Regression Models Of Wine Rating

4.2 Multiple Regression

A multiple regression model was fitted in R to predict the wine rating from eleven features of the wine (fixed.acidity, volatile.acidity, citric.acid, residual.sugar, chlorides, free.sulfur.dioxide, total.sulfur.dioxide, density, pH, sulfates, alcohol). The initial analysis of the feature distributions showed that many of these features are right-skewed and therefore required a log transformation. A forward selection algorithm was used to find the best predictive model for wine quality. Features are added to the model one at a time: at each step, every variable not already in the model is tested for inclusion, and the most significant of these variables is added, as long as its P-value …
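The forward selection procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the R code used in the analysis: it fits ordinary least squares through the normal equations, uses `math.log1p` as the log transform (an assumption, so that zero values stay finite), and, because the P-value threshold is not given here, it swaps in the classic F-to-enter rule (partial F statistic of at least 4, roughly P < 0.05 for large samples). The helpers `log_transform`, `ols_rss` and `forward_select`, and the synthetic data, are hypothetical.

```python
import math
import random


def log_transform(values):
    """Log-transform a right-skewed feature; log1p (log(1 + x)) keeps zero values finite."""
    return [math.log1p(v) for v in values]


def ols_rss(columns, y):
    """Fit y = b0 + sum_j b_j * x_j by least squares; return the residual sum of squares."""
    n = len(y)
    cols = [[1.0] * n] + list(columns)  # prepend an intercept column
    k = len(cols)
    # Normal equations (X^T X) b = X^T y
    a = [[sum(cols[i][r] * cols[j][r] for r in range(n)) for j in range(k)] for i in range(k)]
    c = [sum(cols[i][r] * y[r] for r in range(n)) for i in range(k)]
    # Gaussian elimination with partial pivoting
    for p in range(k):
        piv = max(range(p, k), key=lambda r: abs(a[r][p]))
        a[p], a[piv] = a[piv], a[p]
        c[p], c[piv] = c[piv], c[p]
        for r in range(p + 1, k):
            factor = a[r][p] / a[p][p]
            for q in range(p, k):
                a[r][q] -= factor * a[p][q]
            c[r] -= factor * c[p]
    # Back substitution for the coefficients b
    b = [0.0] * k
    for p in reversed(range(k)):
        b[p] = (c[p] - sum(a[p][q] * b[q] for q in range(p + 1, k))) / a[p][p]
    return sum((y[r] - sum(b[i] * cols[i][r] for i in range(k))) ** 2 for r in range(n))


def forward_select(features, y, f_to_enter=4.0):
    """Forward selection: add the most significant remaining feature while its partial F >= f_to_enter."""
    n = len(y)
    selected, remaining = [], list(features)
    rss_current = ols_rss([], y)  # intercept-only model

    def rss_with(name):
        return ols_rss([features[s] for s in selected + [name]], y)

    while remaining:
        # With one added term, the smallest RSS means the largest partial F (smallest P-value).
        best = min(remaining, key=rss_with)
        rss_new = rss_with(best)
        df_resid = n - len(selected) - 2  # observations minus (intercept + selected + candidate)
        if df_resid <= 0:
            break
        f_stat = math.inf if rss_new == 0 else (rss_current - rss_new) / (rss_new / df_resid)
        if f_stat < f_to_enter:
            break  # no remaining feature is significant enough to enter
        selected.append(best)
        remaining.remove(best)
        rss_current = rss_new
    return selected


# Usage on synthetic data (not the wine dataset): quality depends on alcohol and log(sulfates).
rng = random.Random(0)
n = 200
alcohol = [rng.uniform(8.0, 15.0) for _ in range(n)]
sulfates = [rng.uniform(0.3, 2.0) for _ in range(n)]
ph = [rng.uniform(2.9, 4.0) for _ in range(n)]
quality = [0.5 * a + 1.5 * math.log1p(s) + rng.gauss(0.0, 0.1) for a, s in zip(alcohol, sulfates)]
features = {"alcohol": alcohol, "sulfates": log_transform(sulfates), "pH": ph}
selected = forward_select(features, quality)
print(selected)  # alcohol and sulfates should enter first; pH is pure noise here
```

Choosing the candidate with the smallest residual sum of squares is equivalent to choosing the largest partial F at that step, since every candidate adds exactly one term.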
PCA creates new feature vectors (principal components) and orders them by a specific parameter, here the eigenvalue associated with each component. The 11 feature vectors were recreated in decreasing order of eigenvalue, i.e. the component with the highest eigenvalue was placed in the first column and the one with the lowest eigenvalue in the last column. After different ranges of feature vectors were tried for the SVM analysis, the best training and test accuracy was obtained when the SVM was trained on the first 6 features, i.e. the first 6 columns, which have the highest eigenvalues. Table 4 lists all the applied methods and …
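The reordering step described above can be sketched as follows: compute the eigenvalues and eigenvectors of the feature covariance matrix, sort them in decreasing order of eigenvalue, and project the centred data onto the first `k` components (the analysis kept 6 of 11). This is a dependency-free Python sketch using a cyclic Jacobi eigenvalue routine; the helpers `covariance_matrix`, `jacobi_eigen` and `pca_top_k` and the tiny dataset are hypothetical, and in practice a library routine (for example R's `prcomp`) would normally be used.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of a list of observations (each a list of d feature values)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)] for i in range(d)]


def jacobi_eigen(matrix, tol=1e-20, max_sweeps=100):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix via cyclic Jacobi rotations."""
    d = len(matrix)
    a = [row[:] for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
        if off <= tol * (sum(x * x for row in a for x in row) or 1.0):
            break
        for p in range(d - 1):
            for q in range(p + 1, d):
                if a[p][q] == 0.0:
                    continue
                # Angle of the plane rotation J that zeroes a[p][q] in J^T A J
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(d):  # A <- A J and V <- V J (columns p and q)
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(d):  # A <- J^T A (rows p and q)
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    return [a[i][i] for i in range(d)], v


def pca_top_k(rows, k):
    """Sort components by decreasing eigenvalue and project centred rows onto the first k."""
    d = len(rows[0])
    eigvals, vecs = jacobi_eigen(covariance_matrix(rows))
    order = sorted(range(d), key=lambda i: eigvals[i], reverse=True)  # highest eigenvalue first
    components = [[vecs[r][i] for r in range(d)] for i in order[:k]]  # eigenvectors are columns
    means = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
    scores = [[sum((row[j] - means[j]) * comp[j] for j in range(d)) for comp in components]
              for row in rows]
    return [eigvals[i] for i in order], scores


# Usage on a tiny made-up dataset with 3 features, keeping the first 2 components.
rows = [[1.0, 2.0, 0.5], [2.0, 4.1, 0.4], [3.0, 6.2, 0.6], [4.0, 7.9, 0.5], [5.0, 10.1, 0.45]]
sorted_eigvals, scores = pca_top_k(rows, k=2)
print(sorted_eigvals)  # in decreasing order
print(len(scores), len(scores[0]))  # 5 rows, 2 retained components
```

Each eigenvalue equals the variance of the data along its component, so keeping the first `k` columns keeps the directions of largest variance.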
However, this significantly reduced the data, from 1599 wine samples to just 280, so training on this subset was discarded. The best test set accuracy for the support vector machine was therefore obtained when PCA was applied beforehand to provide feature vectors sorted in decreasing order of eigenvalue. PCA retains the characteristics of the dataset that contribute most to its variance, and it is highly sensitive to the scaling of the original variables [2]. Although PCA gave a better test accuracy, it was only 2% higher than the test accuracy obtained when the model was trained only on the ‘alcohol’ and ‘sulfates’ attributes. PCA is known to improve the accuracy of models trained on the reduced data, but when few features are present, reducing dimensionality may not influence the fitting of the data as much [2], as is seen …
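The sensitivity of PCA to the scaling of the original variables can be illustrated with two synthetic features: one on a small numeric range (standing in for alcohol) and a hypothetical one on a much larger range (`sulfur`, loosely modeled on a concentration in mg/L). The sketch computes the first principal component of the 2x2 covariance matrix in closed form, once on the raw values and once after z-score standardization. The data, the helper names `standardize` and `leading_component`, and the use of standardization are assumptions for illustration; nothing here states whether the wine features were standardized before PCA.

```python
import math
import random


def standardize(column):
    """Rescale a feature to zero mean and unit sample standard deviation (z-scores)."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd for x in column]


def leading_component(x, y):
    """Largest eigenvalue and its unit eigenvector for the 2x2 sample covariance of (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = sum((u - mx) ** 2 for u in x) / (n - 1)
    d = sum((w - my) ** 2 for w in y) / (n - 1)
    b = sum((u - mx) * (w - my) for u, w in zip(x, y)) / (n - 1)
    lam = (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    # (b, lam - a) solves (C - lam*I) v = 0 when b != 0; otherwise PC1 is a coordinate axis
    vec = (b, lam - a) if b != 0 else ((1.0, 0.0) if a >= d else (0.0, 1.0))
    norm = math.hypot(vec[0], vec[1])
    return lam, (vec[0] / norm, vec[1] / norm)


# Synthetic data: a small-range feature and a hypothetical large-range feature correlated with it.
rng = random.Random(1)
alcohol = [rng.uniform(8.0, 15.0) for _ in range(300)]
sulfur = [60.0 + 6.0 * (a - 11.5) + rng.gauss(0.0, 30.0) for a in alcohol]

_, raw_pc1 = leading_component(alcohol, sulfur)
_, std_pc1 = leading_component(standardize(alcohol), standardize(sulfur))
print("raw PC1 weights (alcohol, sulfur):", raw_pc1)  # dominated by the large-scale feature
print("standardized PC1 weights:", std_pc1)  # both features weighted about equally
```

On the raw values the first component points almost entirely along the large-scale feature simply because its variance is larger; after standardization both features contribute comparably.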
