What is unsupervised learning?
Unsupervised learning differs from supervised learning in that there is no known output, and therefore no mapping function from inputs to outputs to learn. We say that the data are “unlabeled”, as opposed to the “labeled” data used in supervised learning, where “labeled” means that we already know the output for each data point. Going back to our previous example (cf. previous post, Supervised Learning), we know that one handwritten sample represents an “A”, another a “B”, etc. In unsupervised learning, we do not know the output for the data we feed to the computer. People who use an unsupervised learning algorithm therefore want the computer to find structure in the data, i.e. to detect a pattern, as is commonly said.
An example would be a situation where you have a lot of data but are only interested in some aspects of the data. Let’s say you have information for the last 3 years, day by day, across the EU, about the weather, the date, the oil price, the real-estate market, the growth of each European country, and how many products of all sorts (veggies, glasses, heart pacemakers, sleeping pills, clothes, shower gel, etc.) have been bought. You might not be interested in the correlations between all of these variables, which are also called the “dimensions” of the data.
Therefore, you could write a program to group these data into categories of similar items. This procedure is called “clustering” and the sub-groups of data are called “clusters”. For instance, one could group/cluster all the purchased goods into a single “products” category. In practice, a clustering algorithm forms the groups from similarities it finds in the data rather than from categories given in advance. You could also decide to write a program that analyzes only 2 or 3 dimensions of your data; this is called “dimensionality reduction”. For example, one could choose to consider only the following dimensions: temperature, time of year and real-estate market, and analyze the data according to these 3 variables (3 dimensions). Keeping a few of the original variables is the simplest form of dimensionality reduction; other methods build a small number of new variables by combining the original ones.
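To make the two ideas above more concrete, here is a minimal sketch in Python. It is only an illustration: the records, the helper names `select_dimensions` and `k_means`, and the choice of k-means as the clustering method are all assumptions made for this example, not something prescribed by the scenario above. The code first keeps 2 of the 3 dimensions, then groups the remaining points into 2 clusters.

```python
from math import dist

# Hypothetical daily records: (temperature in °C, oil price, ice creams sold).
records = [
    (30.0, 71.2, 950),
    (28.5, 70.8, 900),
    (31.0, 69.9, 1000),
    (5.0, 70.5, 100),
    (3.5, 71.0, 80),
    (6.0, 70.1, 120),
]


def select_dimensions(rows, keep):
    """Keep only the columns whose indices are listed in `keep`."""
    return [tuple(row[i] for i in keep) for row in rows]


def k_means(points, k, iterations=10):
    """Tiny k-means sketch: returns (centroids, labels)."""
    # Naive initialisation (an assumption): the first k points are the centroids.
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(
                    sum(values) / len(members) for values in zip(*members)
                )
    return centroids, labels


# Dimensionality reduction by selection: keep temperature and ice creams sold.
points = select_dimensions(records, keep=(0, 2))

# Clustering: no labels are given, the groups come from the data alone.
centroids, labels = k_means(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The warm days end up in one cluster and the cold days in the other, although the program was never told which day was which. With real data, the dimensions are usually rescaled first so that a variable with large values (here, the sales) does not dominate the distance.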
Finally, you could also analyze whether a common rule can describe a large part of your data, i.e. whether most of the dataset follows the same rules. In other words, you could ask whether a given variable X that is correlated with A is also correlated with B. Being correlated means that when X increases, A tends either to decrease (in that case we say that A and X are negatively correlated) or to increase as well (in that case we say that A and X are positively correlated). For example, you could check whether a dimension of the data that increases or decreases with the weather is also correlated with another dimension, or whether people who tend to buy one product also tend to buy another. This type of learning also belongs to what is called “Machine learning”.
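Positive and negative correlation can be checked with a few lines of Python. The sketch below computes the Pearson correlation coefficient, one common way to measure correlation (the choice of this measure, the function name `pearson` and the numbers are assumptions made up for the illustration). The coefficient lies between -1 and 1: values close to 1 indicate a positive correlation, values close to -1 a negative one.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables vary together around their means.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own.
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(variation_x * variation_y)


# Hypothetical measurements taken on the same six days.
temperature = [5, 10, 15, 20, 25, 30]
ice_cream_sales = [100, 180, 260, 390, 480, 600]
heating_oil_sales = [900, 760, 610, 450, 330, 150]

r_ice = pearson(temperature, ice_cream_sales)    # close to +1
r_oil = pearson(temperature, heating_oil_sales)  # close to -1

print(f"temperature vs ice cream:   {r_ice:+.2f}")
print(f"temperature vs heating oil: {r_oil:+.2f}")
```

In this made-up data, ice cream sales are positively correlated with temperature and heating oil sales are negatively correlated with it. Both are therefore also negatively correlated with each other, which is the kind of shared rule described above.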
You have reached the end of this post. I hope you liked it. Do not hesitate to leave a comment or ask a question! To read more about “Machine Learning”, you can click here.