Among the hot topics in the (small) world of data science, “Machine Learning” is everywhere. So much so that it has become almost awkward for anyone who doesn’t know what is going on to ask: so, what actually is machine learning? Let’s take the bull by the horns and answer the question once and for all.
It is on everyone’s lips and in every article, and it seems to be part of every revolution… We are of course talking about “Machine Learning”, an expression that is impossible to miss for anyone who spends a little time on the web. But concretely, what is it all about?
Machine Learning, A bit of history
Machine learning is known as “apprentissage automatique” in the language of Molière (but we must admit that “Machine Learning” sounds better, doesn’t it?). The method was not born yesterday, however: the first foundations of statistical learning can be traced back to the mid-18th century and Thomas Bayes.
Why do we speak of “machine learning”? If you have turned to Wikipedia for an answer, you may not have found what you were looking for. Put simply, we talk about learning in the sense that a machine is able to build a model from stored data supplied to it as input, and to use that model to predict information when faced with new data (in some ways, like the human brain).
And the more data it is given, that is, the larger the starting dataset, the better the machine tends to be at producing a relevant response when it faces a previously unseen data point. The mathematicians of a few centuries ago did not have tools as powerful as those available to data scientists today, so it is easy to understand why machine learning methods have been booming since the 2000s…
Machine Learning, Normally
There are two main types of algorithms in machine learning: supervised and unsupervised (yes, there are others, but we will not cover them in this post). We speak of supervised algorithms when the input elements are already classified into categories or groups (in other words, the data already contains examples of the relationship to be learned).
Take shapes of different colors as an example. A supervised algorithm is given, up front, the database together with the classification that matches shapes and colors to their categories. Thus, when it meets a new shape, it can place it in the appropriate category. An unsupervised algorithm, by contrast, does not have the labels in advance: it will group items by color or by shape, without being able to label the shapes other than as groups of similar items. A previously unseen circle will therefore be categorized according to how closely it resembles the shapes in the existing groups.
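To make the difference concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy shapes (described by a number of corners and a hue), the labels, the function names and the distance threshold. It is not the way any particular library does it, just one simple way to show the idea.

```python
import math

# Hypothetical toy data: each shape is (number_of_corners, hue_in_degrees).
# Circles have 0 corners. Features and labels are invented for illustration.
labeled = [
    ((0, 10), "red circle"),
    ((0, 20), "red circle"),
    ((4, 220), "blue square"),
    ((4, 240), "blue square"),
]

def classify_supervised(point, examples):
    """Supervised: return the label of the closest labeled example (1-nearest neighbour)."""
    _, label = min(examples, key=lambda ex: math.dist(point, ex[0]))
    return label

def group_unsupervised(points, threshold):
    """Unsupervised: no labels, only proximity.

    A point joins the first group that has a member closer than `threshold`;
    otherwise it starts a new group.
    """
    groups = []
    for p in points:
        for g in groups:
            if any(math.dist(p, q) < threshold for q in g):
                g.append(p)
                break
        else:
            groups.append([p])
    return groups

new_shape = (0, 15)  # a previously unseen circle

# With labels, the algorithm can name the category.
supervised_answer = classify_supervised(new_shape, labeled)

# Without labels, it can only say which group the new shape resembles.
unlabeled = [features for features, _ in labeled] + [new_shape]
groups = group_unsupervised(unlabeled, threshold=50)

print(supervised_answer)
print(groups)
```

The supervised function can answer with a name because it was given names to begin with; the unsupervised one only returns anonymous groups, and it is up to a human to decide what each group means.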
Machine Learning, What for?
As you can imagine, the challenges of machine learning go beyond simply cataloging colored shapes… The most common ML models are usually grouped into three categories:
– Classification: a technique that predicts the category in which to place new data, for example using a decision tree.
– Regression: here, the goal is to find the function that best describes the relationship between the variables of a data set. Spoiler: linear regression belongs to this class.
– Clustering: a family of algorithms that create groups of data so that all the elements of a group are similar, and that separate groups differ as much as possible. This is where you will find (among others) the k-means method.
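The three families above can each be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not a reference implementation: the data is invented, the classifier is a one-level decision tree (a "decision stump") on a single feature with exactly two classes, the regression is an ordinary least-squares line, and the k-means starts from fixed initial centers instead of random ones so that the result is reproducible. All function names are hypothetical.

```python
import statistics

# --- Classification: a one-level decision tree ("decision stump") ---
def fit_stump(xs, labels):
    """Find the threshold on one feature that misclassifies the fewest points.

    Assumes exactly two classes. Returns a function that predicts a label.
    """
    a, b = sorted(set(labels))
    best = None
    for t in sorted(xs):
        # Try both orientations: (a below, b above) and (b below, a above).
        for below, above in ((a, b), (b, a)):
            errors = sum((below if x <= t else above) != y
                         for x, y in zip(xs, labels))
            if best is None or errors < best[0]:
                best = (errors, t, below, above)
    _, t, below, above = best
    return lambda x: below if x <= t else above

# --- Regression: least-squares straight line y = slope * x + intercept ---
def fit_line(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# --- Clustering: k-means on one-dimensional points ---
def k_means(points, centers, iterations=10):
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [statistics.fmean(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Invented data for illustration.
sizes = [1, 2, 3, 8, 9, 10]
predict = fit_stump(sizes, ["small"] * 3 + ["large"] * 3)
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
centers, clusters = k_means(sizes, centers=[1, 10])

print(predict(2.5), predict(7))
print(slope, intercept)
print(centers, clusters)
```

Note that the classifier and the regression both need known answers (labels or y values) to learn from, whereas k-means only receives the points themselves.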
Clearly, machine learning is a vast subject, and the theme is likely to gain in popularity over the years: the giants of the web see in it (and rightly so) the promise of a glorious future. A good reason to spend a little more time on the web, and in books, learning about machine learning!