Unsupervised Learning

• After supervised learning, the most common form of machine learning is unsupervised learning. In unsupervised learning, we are given data without any output labels $$y$$.
• Data comes with inputs $$x$$ but no outputs $$y$$ and the algorithm has to find structure in this data.
• Let’s take our earlier breast cancer prediction problem for an example.

• We’re not asked here to predict whether the tumor is malignant or benign because we are not given any labels of which tumor is which.
• Instead, our job is to find some pattern, or some data, or just something interesting within this unlabeled dataset.
• The reason this is called unsupervised learning is that we are not asking the algorithm to give us a “right answer”.
• In this example, our unsupervised algorithm might decide there are two clusters, with one group here and one there.
• This is a specific type of unsupervised learning algorithm called clustering algorithm because it places the unlabelled data into different clusters.

• Clustering algorithm

• Clustering groups similar data points together.
• Clustering has many use cases:
• It is used in Google News! Google News looks at 100’s of stories every day and clusters them together.

• It is used in DNA microarray clustering. The red here might represent a gene that affects eye color, or the green here is a gene that affects how tall someone is.
• You can run a clustering algorithm to group different types of individuals together based on categories the algorithm has automatically decided.
• It is used in grouping customers in different market segments to better understand a company’s consumer base. This could help in improving marketing strategies for each group.

Anomaly detection algorithm

• Find unusual data points. This could be used to detect fraud in bank transactions.

Dimensionality reduction

• Compress data using fewer numbers.