A History of AI
We chose key moments in the history of AI and Machine Learning. The first was "Deep Learning" that launched a decade ago that made it accessible and easy to use.
The others were "Visual Machine Learning" and "Machine Learning" to create new methods that could have a better understanding of the neural network.
It is exciting that this is the case. It is a new and innovative approach that will eventually be used on the public-health market.
Note: This intro text is generated with the GPT-2 Model by OpenAI, based on the first sentence: 'We chose key moments in the history of AI and Machine Learning.'
Collected and Compiled by Willem Hendriks & Robin van Tilburg, as part of the Data & AI Summit IBM 2020 - BeNeLux. We had great fun collecting and deciding on the key moments, and hope you will have fun reading about them.
Disclaimer: Many important persons and events didn't make it onto this final short-list; we are aware of this and chose compactness over completeness.
It is difficult to exactly pinpoint where in history the first Linear Regression Model was applied, and who was responsible for this.
An examination of publications of Sir Francis Galton and Karl Pearson revealed that Galton's work on inherited characteristics of sweet peas led to the initial conceptualization of linear regression.
Galton's first regression line was presented at a lecture in 1877. Only under Pearson's later treatment did 'r' come to stand for the correlation coefficient (Pearson 1896).
In 1896, Pearson published his first rigorous treatment of correlation and regression in the Philosophical Transactions of the Royal Society of London. In this paper, Pearson credited Bravais (1846) with ascertaining the initial mathematical formulae for correlation. Pearson noted that Bravais happened upon the product-moment (that is, the “moment” or mean of a set of products) method for calculating the correlation coefficient but failed to prove that this provided the best fit to the data.
Is it really Machine Learning when a computer/algorithm learns to fit a line through points on a flat surface? Or does it become Machine Learning once the dimension is 3, 10, or 100?
Many times we call a system intelligent when it is able to provide us the right solution within a large space of possible answers (search): the right move in a game, the right object in a photo, the right answer to some question formed in natural language.
Fitting the best line through a set of points is perhaps the cleanest and simplest version of this, and maybe that is the reason many Machine Learning courses start with Linear Regression.
The first prototype of an electronic device to perform statistical calculations was built by John Atanasoff. Until then, statistical computing was mostly done in statistical labs, using mechanical tabulators with punched cards.
This first electronic device had a lot of similarities with modern computers. It was electronic and had a memory unit, a central processor, and binary arithmetic. The same as your smart-phone!
Atanasoff built this machine to solve linear equations. An often told story is that Atanasoff set off on a long drive across Iowa to think about this problem sometime during the winter of 1937-1938.
Several hundred miles later, at a roadside bar in Illinois, the basic elements for a machine to solve systems of linear equations were conceived.
Did you know that Linear Regression comes down to solving a system of linear equations? That makes this machine the first Linear Regression solver.
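For the curious, here is a minimal, illustrative sketch in plain Python (not Atanasoff's method; just the modern textbook version) showing that fitting a least-squares line reduces to solving a small system of linear equations, the so-called normal equations:

```python
# Fit y = a + b*x through points by solving the 2x2 normal equations:
#   n*a      + sum(x)*b   = sum(y)
#   sum(x)*a + sum(x^2)*b = sum(x*y)
def fit_line(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx          # determinant of the 2x2 system
    a = (sy * sxx - sx * sxy) / det  # intercept, via Cramer's rule
    b = (n * sxy - sx * sy) / det    # slope, via Cramer's rule
    return a, b

# Points that lie exactly on y = 1 + 2*x are recovered exactly:
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # -> (1.0, 2.0)
```

For higher dimensions the same idea holds, only the linear system gets bigger.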
I propose to consider the question, ‘Can machines think?’ This should begin with definitions of the meaning of the terms ‘machine’ and ‘think’...
This is how Alan Turing started his famous paper in 1950. The 'machine' part of his question has advanced enormously since then, but the 'think' part is still difficult to grasp; we have made only tiny steps towards defining and understanding it, if at all...
To avoid the difficulty of defining 'think', Turing introduced the Imitation Game, later branded as the Turing Test: a test which challenges a subject to guess whether they are interacting with a machine or a real human.
Ever doubted whether the person on the other side is a human or a machine? Try teaching it something new, because learning something new from little information is something humans still excel at. At least, for now...
The first person to use the term Machine Learning was Arthur Samuel, an IBM employee, who coined it in 1959:
“field of study that gives computers the ability to learn without being explicitly programmed”.
Humans are able to understand and explain basic arithmetic and math to computers. But so far no human has been able to lay down simple rules that make a computer see the difference between a cat and a dog. The computer learned this herself.
To a lesser extent the same goes for complex games like Chess, Go, and Poker, where the seemingly impossible task arises of transferring a master's intuition into rules a computer can follow.
Intuition is something we cannot explain, and at the same time the very thing we want a computer to be able to do. Humans can do so much, and can explain so little. This paradox of being capable of far more than you can explain is Polanyi's paradox.
Arthur's definition is still valid today, and many times the computer's learning ability amazes us. Games like Poker, Go, and complex computer games are no longer a challenge. Fortunately there is enough left for computers to learn, like empathy and love for instance, and perhaps the most dangerous and interesting: learning to learn new things by itself...
Can this be considered the first (commercial) Machine-Learning software?
The point where statistics becomes machine learning is not exact, but for sure ML and AI are built on statistical concepts, hence we want to mention the first commercial software packages. Before the 70's! Before the internet!
SAS was developed at North Carolina State University from 1966 until 1976, when SAS Institute was incorporated. SAS was further developed in the 1980s and 1990s with the addition of new statistical procedures, additional components, and the introduction of JMP.
SPSS released its first version in 1968 as the Statistical Package for the Social Sciences (SPSS) after being developed by Norman H. Nie, Dale H. Bent, and C. Hadlai Hull.
Early versions of SPSS Statistics were written in Fortran and designed for batch processing on mainframes, including for example IBM and ICL versions, originally using punched cards for data and program input.
Neural Networks have a long history, going back to before 1950. Early work was mostly experimentation, and creating the mathematical foundations to make them work and learn.
1986 was the year David Rumelhart, Geoffrey Hinton, and Ronald Williams published their paper on backpropagation. Backpropagation is the most popular way to make artificial neural networks learn, and will probably remain so for years to come.
Around the 90's many important concepts for neural networks were formed; only the lack of compute power was holding neural networks back from doing great things.
Many others independently discovered backpropagation or similar learning algorithms; the paper mentioned above is perhaps the most famous, and a key point in history.
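To give a flavour of what backpropagation does, here is a deliberately tiny sketch in plain Python: a network with a single sigmoid hidden unit, where the chain rule carries the error gradient backwards from the output to each weight. The names, numbers, and learning rate are our own illustrative choices, not taken from the 1986 paper:

```python
import math

# Tiny network: y_hat = w2 * sigmoid(w1 * x), trained with backpropagation.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, w1=0.5, w2=0.5, lr=0.5, epochs=2000):
    for _ in range(epochs):
        for x, y in samples:
            h = sigmoid(w1 * x)           # forward pass: hidden activation
            y_hat = w2 * h                # forward pass: output
            d_out = 2 * (y_hat - y)       # dLoss/dy_hat for squared loss
            d_w2 = d_out * h              # chain rule: gradient for w2
            d_h = d_out * w2              # chain rule: gradient at hidden unit
            d_w1 = d_h * h * (1 - h) * x  # sigmoid' = h*(1-h), then into w1
            w2 -= lr * d_w2               # gradient-descent updates
            w1 -= lr * d_w1
    return w1, w2

# Learn a simple mapping; the squared error should end up close to 0.
samples = [(1.0, 0.8), (-1.0, 0.2)]
w1, w2 = train(samples)
err = sum((w2 * sigmoid(w1 * x) - y) ** 2 for x, y in samples)
print(err)
```

Modern frameworks compute exactly these chain-rule gradients automatically, for millions of weights instead of two.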
Yann LeCun used and mentioned the backpropagation learning algorithm in his famous PhD thesis one year later, in 1987, where he explained Convolutional Neural Networks (CNNs) and used CNN-based networks to recognize postal code characters with extremely high accuracy!
Modern Neural Networks (often called Deep Learning) contain many more layers and come in many variations: VGG, Inception, ResNet, ResNeXt, DenseNet; the list is endless! They are all examples of post-2010 architectures. All have two things in common: they are trained with backpropagation, and among their many layers you will find CNN layers.
LSTM, a special kind of neural network which has a notion of memory, was introduced about 10 years later by Sepp Hochreiter and Jürgen Schmidhuber. Notice how all the important foundations for modern Deep Learning were laid before 2000.
Yann LeCun, Yoshua Bengio, and Geoffrey Hinton received the Turing Award in 2018, about 30 years later, in recognition of their contributions.
In classical statistical methods like linear regression, we have a fixed dataset, and a fixed hypothesis to test.
In the era of data mining, we let the computer explore and mine the data itself, usually by crawling subsets of databases. What will she find?!
The step from classical methods like linear regression to data mining can be seen as the first dimension explosion: the space the computer needed to explore and find parameters in got bigger, and our hardware and software could support the needed resources. Databases, CPUs, and software were ready in the 90's.
Association rule learning, DBSCAN clustering, and SVMs are examples of algorithms invented in the 90's that are often presented as common data mining techniques.
From this point on, we see jumps in the complexity of the methods, and innovation in computer hardware to support them. We see the same evolution in animals.
We have difficulty assigning the term intelligence to a single-cell organism, but for sure we call our own species intelligent, or at least the brighter few among us.
Is data mining Machine Learning? Is Data Mining AI?
The first GPU (Graphics Processing Unit) that was widely promoted as a 'GPU' on the consumer market was developed by Nvidia in 1999. This was the GeForce 256, and it was used for vector calculations in video games; shooting with guns in a 3D space!
This technology gave an enormous boost to graphics processing performance, and years later scientists and engineers discovered that it was very useful for speeding up their scientific calculations.
In 2012, AlexNet showed the value of the GPU's calculation power.
AlexNet was a fast GPU implementation of a CNN that won an image recognition contest. AlexNet competed in the ImageNet Image Recognition Challenge on September 30, 2012, and achieved a top-5 error of 15.3%. Human non-expert-level error is around 5%, which took only a few more years to achieve, in 2016. Again with GPUs, like pretty much all its successors as well.
GPUs are very valuable for machine learning. They can have around 200 times more processors per chip than CPUs. From 2017, Nvidia started to develop GPUs with tensor cores, made especially for machine learning.
Who would ever have predicted that shooting in a 3D world has so many computational similarities with Deep Learning, as used in AI? Both are examples of letting a computer do many relatively small matrix multiplications. Which one do you prefer?
Machine Learning in practice was maybe invented, or at least made famous, by Leo Breiman.
Leo Breiman observed two cultures within the statistics community: he saw how some of his peers took a very mathematical and formal approach towards prediction, and some a practical approach.
In his paper, he described what are now standard methods in typical 'Machine Learning' courses. He is also the creator of one of the most popular Machine Learning algorithms, the Random Forest.
Not many know that Leo Breiman was also involved in founding the ideas behind another popular Machine Learning model, XGBoost. The idea of improving a weak model along the derivative of the loss function, by adding a correcting 'booster' on top, was partly Leo's.
If you want to pinpoint a place in history where classical statistics and modern machine learning deviate, this paper could be that point, back in 2001.
Such a simple task, letting a computer answer a question; but interpreting human language is far more difficult for computers than it seems at first sight.
In February 2011, IBM's Watson computer competed on Jeopardy! against the TV quiz show's two biggest all-time champions, and won. This was the first time AI could win at a game which required language skills.
The challenge was in the language: the questions on this show are full of subtlety, puns, and wordplay. Before this, computers were able to answer straightforward questions, but this combination of understanding language in its full subtlety and finding the right answer among several options in a knowledge bank was too much to ask. Until IBM Watson.
IBM Watson was powered by software called DeepQA, developed by IBM Research.
Big Data and yellow elephants. Hadoop 0.1.0 was released in April 2006, which can be marked as the start of the Big Data hype. Big Data was the magic word for many years. It took a while before people realised we might be part of a classic hype.
It is difficult to pinpoint exactly where the Big Data hype was at its highest; we chose March 2014, when Hortonworks raised 100 million dollars in preparation to go public.
According to Hadoop's co-founders, Doug Cutting and Mike Cafarella, the genesis of Hadoop was the Google File System paper that was published in October 2003. This paper spawned another one from Google: "MapReduce: Simplified Data Processing on Large Clusters".
Back in 2014, there were many Hadoop distributions/versions. Hortonworks and Cloudera were considered competitors, and have since merged. IBM's Hadoop distribution, BigInsights, is gone.
2014 is also the year the Big Data Expo started in Utrecht, a yearly event where Hadoop vendors and related companies gathered to show their capabilities.
Years later, the Big Data Expo still exists, but you won't find any Hadoop vendors any more. It has matured, and the focus is no longer on the tooling itself, but more on patterns and on how to make a team and company effective at applying Machine Learning.
Spark extended the Hadoop hype, as an easy-to-use extension to apply Machine Learning on Big Data. It made it even easier to handle GBs, TBs, and PBs of data. Kids nowadays are so spoiled with Spark; the early Hadoop adopters had to write MapReduce jobs in Java code! How hardcore!
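The MapReduce idea itself fits in a few lines. Below is an illustrative word count in the MapReduce style, in plain Python rather than Java (the names map_phase, shuffle, and reduce_phase are ours, not Hadoop's API): the map phase emits key/value pairs, a shuffle groups them by key, and the reduce phase aggregates each group.

```python
from collections import defaultdict

# Word count, MapReduce style: map emits (word, 1) pairs,
# shuffle groups them by word, reduce sums each group.
def map_phase(line):
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)  # group all values under their key
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data big hype", "big fun"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'big': 3, 'data': 1, 'hype': 1, 'fun': 1}
```

The power of the real thing is that map and reduce tasks run in parallel across a whole cluster, with the framework handling the shuffle and any machine failures.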
In the paper "Hidden Technical Debt in Machine Learning Systems", an experienced group of ML practitioners from Google summarized the challenges of operationalizing ML. It turns out it is not as easy as it sounds. Terms like MLOps were introduced, covering methods to support a smooth way to create, deploy, and maintain ML assets.
Docker, a container technology, made it possible to nicely package a piece of software, like machine learning, in a consumable unit. The MLOps philosophy is often supported with container technology. Kubernetes and OpenShift offer container orchestration, making it possible to create, update, and scale any container workload on demand.
Tools, libraries, and methods come and go, but the challenge of operationalizing IT in an easy and robust way will always stay, and Machine Learning is no exception. Thinking of pleasant ways to operationalize ML and AI is still a growing topic, and probably will be for years to come.
The XGBoost algorithm was developed as a research project at the University of Washington. Tianqi Chen and Carlos Guestrin presented their paper at the SIGKDD Conference in 2016 and set the Machine Learning world on fire.
Since its introduction, this algorithm has not only been credited with winning numerous Kaggle competitions but also for being the driving force under the hood for several cutting-edge industry applications.
Kaggle is the World Championship of Data Science: you compete with other analysts on a clear task, given a dataset. In 2017, it reached 1 million users.
Kaggle is competitive Data Science, and this is a different game than solving real-world problems. For one, the dataset and the specific task are given to you.
But for sure, XGBoost winning competition after competition made people curious, and they applied it to real-world problems as well. XGBoost became, and still is, a very popular model for Machine Learning; powerful and versatile.
It belongs to the family of boosters, where weaker models stacked on top of each other produce very sharp end results, as each model has the ability to slightly correct the mistakes of the previous one.
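To illustrate the boosting idea, here is a toy sketch in plain Python (not the actual XGBoost algorithm, which adds regularization and second-order gradients, among much else): each round fits a tiny decision stump to the residual errors of the ensemble built so far, so every new weak model slightly corrects the previous ones.

```python
# Gradient boosting in miniature: each weak learner (a depth-1 "stump")
# is fitted to the residuals of the ensemble so far.
def fit_stump(xs, residuals):
    best = None
    for threshold in xs:  # try splitting at every data point
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - (lmean if x <= threshold else rmean)) ** 2
                  for x, r in zip(xs, residuals))
        if best is None or err < best[0]:
            best = (err, threshold, lmean, rmean)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, rounds=20, lr=0.5):
    stumps = []
    predict = lambda x: sum(lr * s(x) for s in stumps)
    for _ in range(rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))  # correct current mistakes
    return predict

xs, ys = [1, 2, 3, 4], [1.0, 1.0, 3.0, 3.0]
model = boost(xs, ys)
err = sum((model(x) - y) ** 2 for x, y in zip(xs, ys))
print(err)  # should be close to 0
```

The learning rate deliberately makes each stump correct only part of the remaining error; many small corrections stacked together give the sharp final prediction.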
Remember we used to say... "Computers will never be able to... [INSERT CHALLENGE]". Well, for many years Go was the example to cite as a task AI would probably not be able to do at human expert level in the coming 10 years. The folks from DeepMind did it, and used Neural Networks as a key component of their AI, together with self-learning capabilities.
Checkers, Chess, Go, Poker, StarCraft,... All were, at some point in time, seen as impossible for computers to master, for some reason.
If an expert is among the strongest in the world at a mind game, we call that person intelligent. In 1997, Deep Blue defeated Garry Kasparov. The definition of Machine Learning doesn't fit Deep Blue well: all the rules were hard-coded, and it was the sheer quantity of exploration that made Deep Blue seem strong and smart. Artificial Intelligence.
In 2016, a Twitter AI bot was released online, and learned to talk with humans. After just 16 hours, the bot needed to be shut down. It was too good at learning from humans, and humans had far more fun teaching it bad manners than having a serious conversation.
This showed us two things: humans can be mean, and if AI learns from humans, AI can be mean. Our history hasn't always been pretty and fair, and if we are not careful, AI will simply be a human-history amplifier...
Cathy O'Neil observed this, and unfortunately not only in theory. Her book, Weapons of Math Destruction, was published in 2016 as well. She explains the dangers of massively applying Machine Learning in our society, with shameful examples of things that have already happened.
The examples where ML and AI go wrong are endless. Fortunately there is good news: AI can be taught good manners, perhaps even more easily than humans! Let's make use of that!
AI ethics & bias detection are the first steps in this process. In 2018, IBM released AI Fairness 360, an Open Source software package to detect and mitigate bias in AI.
Shortly after Cathy's book, a resume candidate recommender had to be taken offline: it was biased against women.
Also, hackers showed that self-driving cars can be fooled with stickers on the road, invisible to humans, making them take sudden turns.
Another problem with AI: it has become so good at mimicking human behaviour that it can generate realistic human faces, and can even impersonate humans like presidents and celebrities.
Is that LinkedIn request a real person? Was the photo perhaps machine-generated? AI can fool us!
The other way around is also true: humans can fool AI. By changing an image in a way invisible to a human, the image of a cute cat can become a truck in the eyes of an AI model. So....
Humans can fool AI, AI can fool humans. Maybe it is fair after all?
Cathy had just finished her book about the dangers of technology, and people massively paid to put a personal assistant in their homes. We certainly are a weird species!
One of the first virtual assistants was SmarterChild (released in 2001). It was text-only and could play games, check the weather, and look things up. Its conversation capability was limited.
The first modern virtual assistant was Siri, which was released in 2011. It was able to set the time, assist with text messages, and dial somebody. Later on it became able to perform more complex tasks, like giving directions or advice on a restaurant.
The competition followed swiftly. Google introduced Google Now in 2012; Alexa (Amazon) and Cortana (Microsoft) were introduced in 2014. In 2016/2017, Google Assistant was released. Virtual assistants became quite common for consumers, especially on mobile phones and home devices.
A child's imagination of what a robot is became reality when, in 2019, Boston Dynamics released its first commercially available robot, Spot!
Unlike a robot arm doing repetitive tasks in a factory, Spot can self-balance, avoid or climb over obstacles, and has some sense of her surroundings.
If someone gives you a dataset and asks you to predict the 'PRICE' column, it can be quite a challenge. Automated Machine Learning can do exactly this, and in 2019 it became better and accepted as a way of working.
In a way, Automated Machine Learning is letting a computer automatically understand a 'phenomenon' through data, like the weather, or grasp some human 'intuition' through data.
'TPOT' is a well-known Open Source Automated Machine Learning library. IBM has named hers 'AutoAI'. There are many more, and the list is growing...
Automated Machine Learning can return very sharp predictive models, given a fixed dataset. What she cannot (yet) do is automatically combine relevant data from elsewhere, to create a richer context so to speak.
Q: What is your favorite animal?
A: My favorite animal is a dog.
Q: Why?
A: Because dogs are loyal and friendly.
Q: What are two reasons that a dog might be in a bad mood?
A: Two reasons that a dog might be in a bad mood are if it is hungry or if it is hot.
Q: How many eyes does a giraffe have?
A: A giraffe has two eyes.
Q: How many legs does a frog have?
A: A frog has four legs.
Q: Are there any animals with three legs?
A: No, there are no animals with three legs.
What you saw was a conversation with the GPT-3 model. It can generate text based on a single sentence, do Q&A like the above, translate and transform; actually quite a lot! It is version 3, the latest and greatest, larger and stronger than all its predecessors. Version 3 is called 'scary good' by some.
GPT-2 is quite OK, and possible to run on a fairly fast computer. For version 3, you need a computer of at least $100K. Not the average desktop computer in our living room.
But if history repeats itself, you will have the same computing power in the palm of your hand within a few years.
Will we start relationships with AI like in the movie Her? Will we start a war like in the Terminator?
Or, maybe both?
Will we have Artificial General Intelligence - computers that actually think, learn, and act themselves - in our lifetime?
Fortunately, AI and Machine Learning have already shown themselves to be useful without being self-aware (whatever that may be). And if you are afraid of wars, killing, and destruction of our world: humans have shown themselves to be pretty good at that without AI.
Maybe what we are really scared of is something on Earth that is more intelligent than us. Would we still call it artificial intelligence at that point?