A regression model was fit in R to predict the wine rating from eleven features of the wine (fixed.acidity, volatile.acidity, citric.acid, residual.sugar, chlorides, free.sulfur.dioxide, total.sulfur.dioxide, density, pH, sulfates, alcohol). An initial analysis of the feature distributions showed that many of these features are right-skewed and thus call for a log transformation. A forward selection algorithm was used to find the best predictive model for wine quality, adding features to the model one at a time: at each step, every variable not already in the model is tested for inclusion, and the most significant of these variables is added to the model, so long as its p-value …
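The forward selection procedure described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the R code used in the analysis. The synthetic data, the helper names `ols_p_values` and `forward_selection`, and the 0.05 entry threshold are all assumptions made for the example, and the p-values use a large-sample normal approximation to the t distribution rather than exact t-tests.

```python
import math
import numpy as np


def ols_p_values(X, y):
    """Fit OLS with an intercept and return a two-sided p-value per column of X.

    Assumption: p-values use the normal approximation to the t distribution,
    which is reasonable only when the sample is large.
    """
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])           # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares coefficients
    resid = y - A @ beta
    dof = n - A.shape[1]
    sigma2 = resid @ resid / dof                   # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)          # coefficient covariance
    t_stats = beta / np.sqrt(np.diag(cov))
    p = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in t_stats])
    return p[1:]                                   # drop the intercept's p-value


def forward_selection(X, y, names, alpha=0.05):
    """Add one feature at a time, always the most significant remaining one,
    and stop when no remaining candidate has a p-value below alpha."""
    selected = []
    remaining = list(range(X.shape[1]))
    while remaining:
        trials = []
        for j in remaining:
            cols = selected + [j]
            # p-value of candidate j given the features already selected
            p = ols_p_values(X[:, cols], y)[-1]
            trials.append((p, j))
        best_p, best_j = min(trials)
        if best_p >= alpha:
            break
        selected.append(best_j)
        remaining.remove(best_j)
    return [names[j] for j in selected]


# Synthetic stand-in data (hypothetical, not the wine dataset).
rng = np.random.default_rng(0)
n = 500
alcohol = rng.normal(10, 1, n)
sugar = rng.lognormal(0.5, 0.6, n)                 # right-skewed feature
noise = rng.normal(0, 1, n)                        # unrelated to the target
quality = 0.8 * alcohol + 0.5 * np.log(sugar) + rng.normal(0, 0.5, n)

# Log-transform the right-skewed feature before fitting.
X = np.column_stack([alcohol, np.log(sugar), noise])
chosen = forward_selection(X, quality, ["alcohol", "log_residual_sugar", "noise"])
print(chosen)
```

In this sketch the order of `chosen` is the order in which features entered the model, so the first entry is the single most significant predictor.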
PCA creates new feature vectors, ordered according to a specific parameter. The 11 feature vectors were recreated in decreasing order of eigenvalue, i.e. the highest eigenvalue was in the first column and the lowest in the last column. After trying different ranges of feature vectors for SVM analysis, the best training and test accuracy was found when the SVM was trained on the first 6 features, i.e. the first 6 columns with the highest eigenvalues. Table 4 gives a complete table with all the applied methods and …
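The eigenvalue ordering described above can be illustrated with a short NumPy sketch. It is an illustration under stated assumptions rather than the code used in the analysis: the random 11-column matrix stands in for the wine features, the function name `pca_by_eigenvalue` is hypothetical, and standardising the columns first is an assumption of this example. The SVM step is omitted.

```python
import numpy as np


def pca_by_eigenvalue(X, n_components):
    """Project X onto its principal components, sorted by decreasing eigenvalue.

    Assumption: columns are standardised first, so the eigenvalues are those
    of the correlation matrix.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    cov = np.cov(Z, rowvar=False)            # 11 x 11 covariance of Z
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to descending
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    scores = Z @ eigvecs                     # new feature vectors, one per column
    return scores[:, :n_components], eigvals


# Hypothetical stand-in for the 11 wine features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 11)) @ rng.normal(size=(11, 11))

# Keep the first 6 columns, i.e. those with the highest eigenvalues.
scores, eigvals = pca_by_eigenvalue(X, n_components=6)
print(scores.shape)
print(np.round(eigvals / eigvals.sum(), 3))  # share of variance per component
```

The columns of `scores` are what would be passed to the classifier. Varying `n_components` corresponds to selecting different ranges of feature vectors.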
However, this led to a significant decrease in data, from 1599 wine samples to just 280. Hence this training run was considered best ignored. So the best test set accuracy for the support vector machine was obtained when PCA was applied beforehand to provide feature vectors sorted in order of decreasing eigenvalue. PCA is highly influenced by the scaling of the original variables, and it retains the characteristics of the dataset that contribute most to its variance [2]. Although PCA provided a better test accuracy, it is only 2% higher than the test accuracy obtained when the data was trained with only the ‘alcohol’ and ‘sulfates’ attributes. While PCA is known to give better results by increasing accuracy when the data is trained afterwards, if not many features are present, reducing dimensionality may not influence the fitting of the data as much [2], as is seen