by Dr Aditya Narvekar, Assistant Professor, Data Science at S P Jain School of Global Management.
Data science is a sunrise field that everyone from students to industry experts is talking about. It is a broad discipline that encompasses machine learning, natural language processing and Big Data, to name a few areas. However, when most people refer to data science, they generally mean machine learning, which is also loosely called “artificial intelligence”. Artificial intelligence, or AI, is itself a broad term that covers several fields. In this article we will focus on machine learning.
Machine learning is basically a collection of algorithms that, when combined with big data, produce models that learn patterns not easily discernible to humans or even to data analytics software that operates on smaller data sets. In most organisations, machine learning algorithms are built, tested and deployed by a new type of employee called the “data scientist”. A data scientist is part computer scientist, part mathematician and part statistician. If this sounds difficult, you are right: it is hard to find people who have mastered all three fields, and that is why data scientist roles are some of the highest-paying roles in the industry.
The U.S. Bureau of Labor Statistics sees strong growth for data science skills, predicting that the field will grow about 28% through 2026. With rapid advancements in technology, even non-technology firms are now able to use machine learning to improve their business processes and decisions. This means two things: an increase in the demand for data scientists and associated roles, and an increase in the demand for industry-specific roles. The increased demand is already evident in the job market, with data scientists earning an average salary of $111,000. While these trends are reported in the United States, the same trends apply in all major economies of the world. Data science and its applications are no longer the preserve of Big Tech firms.
Non-tech firms have also begun to apply machine learning in their own industries.
This increased demand among non-technology firms, such as those in pharma and airlines, means that data scientists who specialize in a particular industry are likely to earn a premium for their skills. The question is how educational institutions can train students to be ready for such a job market. We look at this problem from the perspective of undergraduate and postgraduate students. Undergraduate students are in the process of earning their first degree, while postgraduate students are likely to have spent a few years in the industry.
With undergraduates, we cannot expect students to know which industry they want to enter and build a career in. Our experience has been that undergraduates are open-minded and are still forming their opinions. Therefore, it is imperative to create a curriculum that is broad while remaining focused on the common skills demanded by the data science industry. As mentioned earlier, a data scientist is someone who has advanced knowledge of mathematics and statistics and of their application in the world of data science, so an undergraduate must learn not only the basics of mathematics and statistics but also how these concepts are applied in the industry. Such application can be demonstrated through case studies and examples drawn from real-world settings. A curriculum should include not only the theoretical concepts but also the opportunity to apply them in the industry. Capstone projects are an effective way to allow students to apply their theoretical learning to a real-world problem. Similarly, internships included in the curriculum are an effective way to gain experience in applying machine learning. The advantage of such an industry-based curriculum is that it gives undergraduates exposure to the application of common concepts across several real-world problems. From an industry perspective, hiring students who have some experience of application helps reduce the time required to train fresh graduates to a level where they become productive employees.
At the master’s level, students are expected to have spent some time in the industry and are therefore a bit clearer about their career goals and about the industry they prefer to work in. The curriculum for master’s level programs must therefore include several electives that cover major industries such as finance, pharma and marketing. Industry-specific electives with capstone projects will ensure that students apply the concepts learnt in class in a real-world setting that interests them. While undergraduate students are not expected to know anything about a specific industry, graduate students are expected to know a bit about the industry that interests them or that they want to work in. The curriculum must facilitate a mapping between the industry that the student is interested in and the application of data science in that industry. Once again, case studies, capstone projects and internships are imperative to enable this mapping.
Irrespective of whether the student is a graduate or an undergraduate, the curriculum should be based on skills that can be directly applied to industry problems. The data science industry tends to be extremely hands-on. Knowing the mathematics, statistics and computer science that power machine learning algorithms is obviously important, but it is not enough by itself. Data science practitioners are expected to apply these concepts to build software applications that can find obscure patterns in data. Such an application must also work in a real-world setting and deliver tangible benefits to the business. For example, a clustering application should be able to find natural clusters in a firm’s customer database so that the firm can launch marketing campaigns that are custom designed for each group of customers. Therefore, a curriculum that is hands-on and gives students the ability to build such products, which can be applied across industries and over several different problem domains, is a must. Curriculums that include the latest tools, such as scikit-learn for machine learning and TensorFlow or PyTorch for deep learning, are a must for enabling students not only to get hired in the industry but also to transition smoothly into productive employees.
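To make the customer clustering example concrete, here is a minimal sketch of the kind of hands-on exercise a student might build with scikit-learn. It is only an illustration under stated assumptions: the customer data is synthetic, the two features (annual spend and visits per month), the three segments and the helper names `make_customers` and `segment_customers` are all hypothetical, and k-means is just one of several clustering algorithms that could be used here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def make_customers(seed=0):
    """Generate a synthetic (hypothetical) customer table.

    Each row is one customer: [annual_spend, visits_per_month].
    Three made-up segments of 50 customers each are drawn from
    normal distributions around different typical values.
    """
    rng = np.random.default_rng(seed)
    segments = [
        ((200.0, 2.0), (30.0, 0.5)),     # occasional shoppers
        ((1000.0, 8.0), (100.0, 1.0)),   # regular shoppers
        ((3000.0, 20.0), (200.0, 1.5)),  # high-value shoppers
    ]
    return np.vstack(
        [rng.normal(loc=mean, scale=std, size=(50, 2)) for mean, std in segments]
    )


def segment_customers(features, n_segments=3, seed=0):
    """Assign each customer to a cluster using k-means."""
    # Put both features on the same scale so that spend (in the
    # thousands) does not dominate visits (in the tens).
    scaled = StandardScaler().fit_transform(features)
    model = KMeans(n_clusters=n_segments, n_init=10, random_state=seed)
    return model.fit_predict(scaled)


customers = make_customers()
labels = segment_customers(customers)

# Summarise each discovered segment so a marketing team could act on it.
for segment in range(3):
    members = customers[labels == segment]
    mean_spend, mean_visits = members.mean(axis=0)
    print(
        f"Segment {segment}: {len(members)} customers, "
        f"average spend {mean_spend:.0f}, average visits {mean_visits:.1f}"
    )
```

In a real project the number of segments would not be known in advance, and the features would come from the firm's own customer database rather than a random number generator. Choosing and validating those is exactly the kind of judgment that capstone projects and internships help students develop.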