In 2001, William S. Cleveland proposed combining statistics with computer science, taking advantage of computing power to expand the possibilities of the field. He called this combination “Data Science”.
Data science has become a trending field in recent years because of the sheer amount of data we constantly create in this digital age, coupled with the computing power made available by advances in technology.
Jobs in data science are projected to have one of the best industry growth rates over the next decade, and data science positions remain among the most difficult for employers to fill.
The question now is: how do you become a data scientist?
The Journal of Data Science describes data science as almost everything that has something to do with data: collecting, analyzing, modeling and so on. Yet the most important part is its applications.
Data scientists are a new breed of analytical data experts who have the technical skills to solve complex problems – and the curiosity to explore what problems need to be solved.
According to Jonathan Ma, a data scientist at Facebook, “being a good data scientist is not about how advanced your models are, but how your work can solve problems.”
To understand the role of a data scientist, we have to understand the data science hierarchy of needs.
At the bottom of this pyramid are collecting, storing and transforming data. These three needs fall to data engineers and software developers.
Data engineers are the data professionals who prepare the “big data” infrastructure for data scientists to analyze. Together with software engineers, they design and build the systems that integrate data from various sources, and they manage big data.
Data scientists normally have three main tasks: analyzing, aggregating and optimizing the company's data. Despite all the hype about AI and deep learning, that work is normally undertaken by research scientists or machine learning engineers.
If you want to be a data scientist, here are some of the skills and topics you will need to pick up.
Structured Query Language (SQL) is a critical tool in data science, used mostly to prepare and extract datasets and to analyze data stored in databases.
With SQL you can extract data, join tables together, and perform aggregations. At NEXT Academy’s Full Stack Web Development bootcamp, you will learn to write SQL queries to create, read, update and delete data in a database.
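As a minimal sketch of those operations, the Python snippet below uses the standard-library `sqlite3` module with an in-memory database. The `customers` and `orders` tables and all their values are hypothetical, chosen only to show create, read, update and delete statements alongside a join and an aggregation; real schemas and database engines will differ.

```python
import sqlite3

# In-memory SQLite database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")

# Create: insert rows (order ids are assigned automatically as 1, 2, 3, 4).
cur.executemany(
    "INSERT INTO customers (id, name, country) VALUES (?, ?, ?)",
    [(1, "Aina", "MY"), (2, "Ben", "SG"), (3, "Chen", "MY")],
)
cur.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1, 120.0), (1, 80.0), (2, 50.0), (3, 200.0)],
)

# Update: correct the amount of order 3.
cur.execute("UPDATE orders SET amount = ? WHERE id = ?", (60.0, 3))

# Delete: remove a cancelled order.
cur.execute("DELETE FROM orders WHERE id = ?", (2,))
conn.commit()

# Read: join orders to customers and aggregate revenue per country.
query = """
SELECT c.country, COUNT(o.id) AS n_orders, SUM(o.amount) AS revenue
FROM orders AS o
JOIN customers AS c ON c.id = o.customer_id
GROUP BY c.country
ORDER BY revenue DESC
"""
rows = cur.execute(query).fetchall()
for country, n_orders, revenue in rows:
    print(country, n_orders, revenue)
conn.close()
```

The final query returns one row per country with the number of remaining orders and their total amount, which is the kind of summary table a data scientist typically extracts before further analysis.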
Data scientists at data-driven companies need to understand business metrics.
They need to analyze and explore the data and readily map their findings to potential increases in revenue or profitability, or to reductions in risk.
These metrics fall into three categories: revenue, profitability and risk. Revenue metrics relate to sales and marketing, profitability metrics relate to the efficiency of operations, and risk metrics relate to sustainability given present cash-flow conditions.
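To make the three categories more concrete, here is a small Python sketch with one illustrative metric per category: month-over-month revenue growth, profit margin and cash runway. The `MonthlyFigures` class, the function names and all the figures are hypothetical, and companies define and compute these metrics in many different ways.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MonthlyFigures:
    """Hypothetical monthly summary for a business."""
    revenue: float          # sales income for the month
    operating_costs: float  # total cost of running the business that month


def revenue_growth(previous: MonthlyFigures, current: MonthlyFigures) -> float:
    """Revenue metric: month-over-month growth as a fraction of last month."""
    return (current.revenue - previous.revenue) / previous.revenue


def profit_margin(month: MonthlyFigures) -> float:
    """Profitability metric: share of revenue left after operating costs."""
    return (month.revenue - month.operating_costs) / month.revenue


def cash_runway_months(cash_on_hand: float, months: List[MonthlyFigures]) -> float:
    """Risk metric: how many months current cash lasts at the average net loss.

    Returns infinity when the business is not losing money on average.
    """
    avg_net = sum(m.revenue - m.operating_costs for m in months) / len(months)
    if avg_net >= 0:
        return float("inf")
    return cash_on_hand / -avg_net


# Hypothetical figures for three consecutive months.
months = [
    MonthlyFigures(revenue=100_000, operating_costs=120_000),
    MonthlyFigures(revenue=110_000, operating_costs=125_000),
    MonthlyFigures(revenue=120_000, operating_costs=130_000),
]

growth = revenue_growth(months[-2], months[-1])  # (120k - 110k) / 110k
margin = profit_margin(months[-1])               # (120k - 130k) / 120k
runway = cash_runway_months(90_000, months)      # 90k / 15k average monthly loss
print(f"growth={growth:.1%} margin={margin:.1%} runway={runway:.1f} months")
```

Read together, these hypothetical numbers tell a story a single metric would miss: revenue is growing, but the business is still unprofitable and has about six months of cash left.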
A/B testing is a useful way to harness the power of controlled experiments to uncover causal relationships between product changes and business metrics. The statistical methods are straightforward, and the range of applications is diverse.
A/B testing matters because external factors, such as the day of the week or changes in incoming traffic, can sometimes affect business metrics more than a specific product change does. Randomly splitting users between the current version (A) and the changed version (B) over the same period exposes both groups to the same external factors, so a difference between the groups can be attributed to the change itself.
These benefits are why A/B testing is a crucial part of how data scientists make decisions. For example, data scientists at Scribd used A/B testing to help the product, design and engineering teams improve their service.
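A minimal sketch of the statistics behind a simple A/B test on a conversion rate, using only the Python standard library. It assumes users are assigned to variant A or B by hashing a user ID (one common approach, not the only one), and it compares the two conversion rates with a two-proportion z-test using a pooled standard error. The conversion counts are hypothetical, and a real experiment would also fix the sample size and significance level before collecting any data.

```python
import hashlib
import math


def assign_variant(user_id: str) -> str:
    """Deterministically split users roughly 50/50 by hashing their ID.

    Both groups are drawn from the same traffic over the same period, so
    external factors such as the day of the week affect A and B alike.
    """
    first_byte = hashlib.sha256(user_id.encode("utf-8")).digest()[0]
    return "B" if first_byte % 2 else "A"


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (lift, z, two-sided p-value) for the conversion rate of B vs A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


# Hypothetical results: 4,000 users per variant.
lift, z, p_value = two_proportion_z_test(conv_a=200, n_a=4000, conv_b=260, n_b=4000)
print(f"lift={lift:.3f} z={z:.2f} p={p_value:.4f}")
```

With these made-up counts, variant B converts 1.5 percentage points better and the p-value falls well below 0.01, so under the usual conventions the difference would be treated as unlikely to be due to chance alone.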
It is important to learn R or Python to manipulate, visualize and interpret data. According to a Data Science Survey conducted by O’Reilly, Python and R are the two most popular programming tools for data science work.
Data science is only a small part of the diverse Python ecosystem. Python’s suite of specialized deep learning and other machine learning libraries includes popular tools like scikit-learn, Keras, and TensorFlow, which enable data scientists to develop sophisticated data models that plug directly into a production system.
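Libraries such as scikit-learn wrap common models behind ready-made classes. As a minimal, dependency-free sketch of what one of the simplest such models does, the snippet below fits an ordinary least squares line using Python's standard `statistics` module. The advertising-spend and sales figures are hypothetical, and `fit_line` is an illustrative helper rather than part of any library; in practice you would usually reach for a library implementation.

```python
from statistics import mean
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical monthly ad spend and resulting sales, both in thousands.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

intercept, slope = fit_line(ad_spend, sales)
# Interpretation: each extra thousand spent on ads is associated with about
# `slope` thousand in additional sales (an association, not proof of causation).
predicted = intercept + slope * 6.0
print(f"sales = {intercept:.2f} + {slope:.2f} * ad_spend; prediction at 6.0: {predicted:.2f}")
```

The interpretation comment matters as much as the fit: establishing that the relationship is causal would require something like the A/B testing described earlier.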
We spoke to Dr. Gloria Teng, a data researcher who is also a NEXT Academy alumna! Gloria is passionate about using technology to improve the teaching of statistics, and her research includes data analytics and statistical modelling. For anyone who is new to programming and would like to venture into machine learning, AI or data science, she recommends Python as the first programming language to learn.
Dr. Gloria Teng recommends mastering Python to learn data science for the following reasons:
1) Learning Python as a general-purpose high level programming language helps to develop logical and computational thinking.
2) Python has comprehensive libraries that use less statistical jargon, which makes them easy for beginners to learn.
3) Python is increasingly used to design machine learning algorithms.
If you are deciding between Python and R for data science, learning both and using each for its strengths can only make you a better data scientist.
Start by learning Python syntax and important programming concepts, such as object-oriented programming and encapsulation, at NEXT Academy’s Full Stack Web Development bootcamp.
Effective communication skills are essential for a data scientist: being able to communicate with multiple stakeholders using data is a key attribute.
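To show what encapsulation looks like in practice, here is a minimal Python sketch of a hypothetical `Dataset` class. It keeps its values in an internal attribute (the leading underscore is a Python naming convention, not an enforced access rule) and exposes only a small interface: adding a value with validation, reading the size through a property, and computing the mean. The class is purely illustrative and is not taken from the bootcamp curriculum.

```python
from typing import Iterable, List


class Dataset:
    """Hypothetical container that encapsulates a list of numeric values."""

    def __init__(self, values: Iterable[float] = ()) -> None:
        # Internal state; callers are expected to use the methods below.
        self._values: List[float] = []
        for value in values:
            self.add(value)

    def add(self, value: float) -> None:
        """Append a value after checking that it is numeric (bools rejected)."""
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"expected a number, got {value!r}")
        self._values.append(value)

    @property
    def size(self) -> int:
        """Read-only number of stored values."""
        return len(self._values)

    def mean(self) -> float:
        """Arithmetic mean of the stored values."""
        if not self._values:
            raise ValueError("mean of an empty dataset is undefined")
        return sum(self._values) / len(self._values)


data = Dataset([2, 4])
data.add(6)
print(data.size, data.mean())  # prints: 3 4.0
```

Because other code only calls `add`, `size` and `mean`, the internal list could later be replaced by a different structure without changing any caller, which is the practical benefit of encapsulation.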
“Being at the intersection of business, technology, and data, data scientists need to be adept at telling a story to each of the stakeholders,” says Anand Rao, global artificial intelligence and innovation lead for data and analytics at consulting firm PwC.
This includes communicating the business benefits of data to business executives, the challenges with data quality, privacy and confidentiality, as well as other areas of interest to the organization.
“A great data scientist involves finding someone who has somewhat contradictory skill sets: intelligence to handle data processing and create useful models; and an intuitive understanding of the business problem they’re trying to solve, the structure and nuances of the data, and how the models work,” says Lee Barnes, head of Paytronix Data Insights at business software provider Paytronix Systems.
After you have mastered these skills, it’s time to go out there and apply for a data science position.