ML is getting automated and easy: What does it mean for your career?

Adrien
4 min read · Jun 1, 2021

I was looking around for material to help people move into ML. I came across Harvard's CS50 AI course on edX and thought I would try it.

Most of the projects take only a couple of hours at most. But the one that struck me is the German traffic sign recognition project. Like many people, I learned TensorFlow when it was difficult to program, when coding neural networks was more complex and every single parameter needed to be defined and controlled.

CS50 project Now

The harsh truth is that building ML has become fairly easy. Completing this project: https://cs50.harvard.edu/ai/2020/projects/5/traffic/ will not take more than 20 minutes for someone who knows TensorFlow.

The project is simply about building an algorithm that recognises German road signs and classifies them. When building it, I realised that there is no real complexity left in doing so. The code fits in 50 lines using Keras and you can achieve great accuracy.

latest code

The progress of Keras and TensorFlow made this project really easy and compact.
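To give a sense of how compact this has become, here is a minimal sketch of a Keras CNN for this kind of task. It is not the code from my notebook. The 30x30 input size, the layer sizes and the `get_model` name are assumptions chosen for illustration; the 43 output classes match the German traffic sign dataset.

```python
import tensorflow as tf

# Assumed for illustration: images resized to 30x30 RGB.
IMG_WIDTH, IMG_HEIGHT = 30, 30
# The German traffic sign dataset has 43 sign categories.
NUM_CATEGORIES = 43


def get_model():
    """Build and compile a small CNN classifier (illustrative layer sizes)."""
    model = tf.keras.models.Sequential([
        tf.keras.layers.Input(shape=(IMG_WIDTH, IMG_HEIGHT, 3)),
        # One convolution + pooling stage to extract local features.
        tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Flatten(),
        # Dense hidden layer with dropout to limit overfitting.
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.5),
        # One probability per traffic sign category.
        tf.keras.layers.Dense(NUM_CATEGORIES, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",  # expects one-hot encoded labels
        metrics=["accuracy"],
    )
    return model
```

Training is then a single call such as `get_model().fit(x_train, y_train, epochs=10)`, where `x_train` and `y_train` are placeholders for the loaded images and their one-hot labels.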

As shown in the notebook in the GitHub project, high accuracy is easy to reach even with limited knowledge of CNNs or deep learning: https://github.com/adfr/traffic.

TF code

I remember that when I was first learning deep learning in 2015, the code was definitely not as neat. TensorFlow needed a lot of variables to be defined and the same model would look like the screenshot on the left side [credit to https://github.com/hparik11/German-Traffic-Sign-Recognition]

And if you want to see what creating a network from scratch, hyper-focused on one problem, looks like, you can look here: https://gist.github.com/jamesloyys/ff7a7bb1540384f709856f9cdcdee70d for a very simple neural network.
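For contrast with the Keras version, here is a rough sketch of what "from scratch" means: a tiny two-layer network trained on XOR in plain Python, with the forward pass, backpropagation and weight updates all written by hand. It is my own illustration, not the code from the gist above, and the function name, hidden size, learning rate and seed are arbitrary assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_xor(hidden=3, epochs=5000, lr=0.5, seed=0):
    """Train a 2-layer sigmoid network on XOR; return the loss per epoch."""
    rng = random.Random(seed)
    inputs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
    targets = [0.0, 1.0, 1.0, 0.0]
    n = len(inputs)

    # w1[j] = two input weights plus a bias for hidden unit j.
    w1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(hidden)]
    # w2 = one weight per hidden unit plus a bias for the output unit.
    w2 = [rng.uniform(-1, 1) for _ in range(hidden + 1)]

    losses = []
    for _ in range(epochs):
        g1 = [[0.0] * 3 for _ in range(hidden)]
        g2 = [0.0] * (hidden + 1)
        loss = 0.0
        for (x1, x2), t in zip(inputs, targets):
            # Forward pass.
            h = [sigmoid(w[0] * x1 + w[1] * x2 + w[2]) for w in w1]
            out = sigmoid(sum(w2[j] * h[j] for j in range(hidden)) + w2[hidden])
            loss += (out - t) ** 2

            # Backward pass: chain rule through the squared error and sigmoids.
            d_out = 2 * (out - t) * out * (1 - out)
            for j in range(hidden):
                g2[j] += d_out * h[j]
                d_h = d_out * w2[j] * h[j] * (1 - h[j])
                g1[j][0] += d_h * x1
                g1[j][1] += d_h * x2
                g1[j][2] += d_h
            g2[hidden] += d_out

        losses.append(loss / n)  # mean squared error before this update

        # Gradient descent step using gradients averaged over the 4 samples.
        for j in range(hidden):
            for k in range(3):
                w1[j][k] -= lr * g1[j][k] / n
            w2[j] -= lr * g2[j] / n
        w2[hidden] -= lr * g2[hidden] / n
    return losses
```

Every line here is something a framework now does for you, which is exactly why the same task shrinks to a few layer declarations in Keras.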

More or less, most projects you find online can be completed in such an easy manner and there is no real need for:

  • Deep understanding of the algorithms
  • Deep understanding of the mathematics
  • Deep understanding of the engineering

GCP, AWS embedded ML

Going even further, one does not even need to code at all thanks to the use of cloud service providers. Most of them allow for codeless machine learning.

Since you can outsource most of the thinking to the cloud service provider and they build highly scalable solutions, why would we even need to have data scientists trained to build and analyse these algorithms?

The truth is, many “AI software” products can detect correlations and other patterns in the variables better than data analysts and data scientists, and they can often propose a model for them.

What does it mean for Data Science?

The value is no longer in applying an algorithm that can be learned in one online course. So many people can do this. Sometimes it feels like everyone is doing an ML certificate, which is good, but it does mean that the value of data science lies elsewhere.

You must find the value elsewhere

  • Scalability still poses a problem, especially when dealing with complex systems. Despite all the advances platforms have made in handling scalability, implementation still carries a fair amount of complexity, especially with legacy or other existing systems. In addition, the project above does not deal with building a decent data and ML pipeline, which would be required to handle this kind of data set.
  • Business understanding. Being able to drive a data strategy and see where data can move the business is what creates business value.
  • Mathematical and statistical accuracy. Having almost anyone able to run an ML algorithm does not mean the application is correct. Multiple issues can still happen. Some are very classical, but there are many cases where untrained people won’t be able to detect mistakes. In addition, many models require a more sophisticated approach that needs a deeper mathematical understanding. This is where mathematicians and statisticians still shine: in their ability to ensure the correctness of what is happening.
  • In-depth knowledge is still required. Most projects in the online course are easy… True. Let's think about the example above. Did the project work? Yes, it recognized the traffic signs with 98% accuracy, but does that make it useful? The answer is obviously no. The pictures were well framed, the data fairly limited and the quality good. If you were to use this in real life, just for capturing the speed limit for instance, the car would have to handle real-time processing and the camera would see a million things on the road. Solving a data science problem does not mean the real problem is solved.
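One classical example of why a headline accuracy number is not enough: on imbalanced data, a model that never predicts the rare class can still score very high. The sketch below uses made-up labels (98 negatives, 2 positives) and a hypothetical "always predict the majority class" baseline; the helper names are mine, not from any library.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model found."""
    actual_positives = sum(1 for t in y_true if t == positive)
    true_positives = sum(
        1 for t, p in zip(y_true, y_pred) if t == positive and p == positive
    )
    return true_positives / actual_positives if actual_positives else 0.0


# Hypothetical, made-up data: 2 rare events among 100 samples.
y_true = [0] * 98 + [1] * 2
# A "model" that always predicts the majority class.
y_pred = [0] * 100

baseline_accuracy = accuracy(y_true, y_pred)  # high, yet the model is useless
baseline_recall = recall(y_true, y_pred)      # it never finds a rare event
```

Someone who only looks at the accuracy would ship this model. Spotting that the recall is zero is the kind of check that still needs a trained eye.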

What does it mean for Data Scientists & Data Analysts?

Sharpen your skills. You cannot just be the person producing nice notebooks with out-of-the-box models.

Don’t get stuck in the middle


Adrien

Strategy/Data/Leadership head of DS at OCBC ~~ exTwitter ~~ ex-gojek