### Introduction

Clustering, a form of unsupervised learning, is a powerful way to uncover patterns that may not be obvious, such as customer segments that can be targeted in future marketing campaigns. Clustering algorithms broadly fall into centroid-based, distribution-based, and density-based methods. Each algorithm or class of algorithms excels under certain circumstances. A good 2-dimensional visual comparison of these algorithms can be found on the scikit-learn page.

A common thread in most of these algorithms is that they compute pairwise distances to quantify the *closeness* or *similarity* of individual data points. The smaller the distance between two data points, the more *similar* they appear.
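To make the idea of a pairwise distance concrete, here is a minimal sketch using only the Python standard library. The three points and their feature meanings are made up for illustration, and Euclidean distance is assumed; real clustering libraries support many other metrics.

```python
import math
from itertools import combinations

# Three hypothetical customers described by two illustrative features.
points = {
    "a": (1.0, 2.0),
    "b": (1.5, 2.5),
    "c": (8.0, 9.0),
}

def pairwise_distances(data):
    """Return the Euclidean distance for every unordered pair of points."""
    return {
        (p, q): math.dist(data[p], data[q])
        for p, q in combinations(sorted(data), 2)
    }

distances = pairwise_distances(points)

# The pair with the smallest distance is the most "similar" pair.
closest_pair = min(distances, key=distances.get)
```

Here `a` and `b` end up as the closest pair, which is the kind of relationship a clustering algorithm uses when it forms groups.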

### Effect of Multicollinearity

Strong correlation among the features of a design matrix can lead to non-optimal clustering results. Why might this happen? As discussed above, clustering algorithms measure a distance, or *similarity*, between data points, which is in turn used to group similar data points into clusters. When two or more features are highly correlated, they carry largely the same information, yet each one contributes its own term to the distance calculation. That shared information is effectively counted more than once, so it has a stronger influence on the distance than it should and can affect the grouping.
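The effect can be seen in a small, deliberately extreme sketch. The points below are hypothetical, and the "correlated feature" is an exact copy of `x`, which is the limiting case of multicollinearity rather than something typical of real data.

```python
import math

def nearest(query, candidates):
    """Return the name of the candidate closest to query (Euclidean distance)."""
    return min(candidates, key=lambda name: math.dist(query, candidates[name]))

# Two features per point: (x, y).
query = (0.0, 0.0)
candidates = {"q": (3.0, 0.0), "r": (0.0, 4.0)}

# Distances from query: q = 3.0, r = 4.0, so q is the nearest neighbor.
nearest_original = nearest(query, candidates)

def with_duplicate_x(point):
    """Append a perfectly correlated copy of x to the point."""
    x, y = point
    return (x, x, y)

query_dup = with_duplicate_x(query)
candidates_dup = {name: with_duplicate_x(p) for name, p in candidates.items()}

# Distances from query: q = sqrt(18), about 4.24, and r = 4.0, so r is now nearest.
nearest_duplicated = nearest(query_dup, candidates_dup)
```

No new information was added, yet the nearest neighbor changed because the difference in `x` was counted twice.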

Multicollinearity should be removed from the design matrix prior to clustering. Additionally, the features of the design matrix should be standardized. A non-standardized design matrix will also lead to non-optimal clustering results, in which the features with the largest numeric ranges dominate the distance calculations. On a side note, similar scaling issues affect gradient-based optimization methods such as gradient descent.
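As a rough illustration of why standardization matters, the sketch below z-scores each column (subtract the mean, divide by the population standard deviation) and compares how much of the squared distance between two rows comes from the large-scale feature before and after. The rows and the helper names `standardize` and `first_feature_share` are hypothetical; in practice scikit-learn's `StandardScaler` performs the same transformation.

```python
import statistics

def standardize(rows):
    """Z-score each column. Assumes no column is constant (non-zero std)."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stds))
        for row in rows
    ]

def first_feature_share(a, b):
    """Fraction of the squared Euclidean distance contributed by the first feature."""
    squared_diffs = [(u - v) ** 2 for u, v in zip(a, b)]
    return squared_diffs[0] / sum(squared_diffs)

# Hypothetical rows: (annual income in dollars, number of purchases).
rows = [
    (30_000.0, 2.0),
    (31_000.0, 20.0),
    (60_000.0, 3.0),
]

# Before scaling, income accounts for almost all of the distance between rows 0 and 1.
raw_share = first_feature_share(rows[0], rows[1])

# After scaling, both features are on comparable footing.
scaled = standardize(rows)
scaled_share = first_feature_share(scaled[0], scaled[1])
```

On the raw data the purchase counts are nearly invisible to the distance calculation even though the two customers differ markedly on that feature; after standardization that difference is no longer drowned out.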

One way to remove multicollinearity from data is Principal Component Analysis (PCA). This technique creates new features from linear combinations of the original features, such that the new features are uncorrelated with one another. Typically, one or more of the new features is predominantly noise and can be removed from the transformed design matrix; this is why PCA is referred to as a feature reduction technique.
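A minimal sketch of this idea, assuming NumPy is available and using synthetic data in which the second feature is almost a multiple of the first. It computes PCA by eigen-decomposing the covariance matrix of the standardized data; scikit-learn's `PCA` class is the more usual route in practice, and the variable names here are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic design matrix: x2 is nearly a linear function of x1.
x1 = rng.normal(size=200)
x2 = 2.0 * x1 + rng.normal(scale=0.1, size=200)
X = np.column_stack([x1, x2])

# Standardize the features, then eigen-decompose their covariance matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
cov = np.cov(Z, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# eigh returns eigenvalues in ascending order; reorder to descending.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Project the data onto the principal components (the new features).
scores = Z @ eigenvectors
explained_ratio = eigenvalues / eigenvalues.sum()

# The new features are uncorrelated, up to floating-point error.
component_corr = np.corrcoef(scores, rowvar=False)[0, 1]

# Keep only the first component and drop the low-variance one.
X_reduced = scores[:, :1]
```

In this synthetic case the first component captures nearly all of the variance, so dropping the second loses very little. With real data, how many components to keep is a judgment call, often guided by the explained variance ratio.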