Data science practitioners course

Introduction

Data science is the practice of extracting knowledge from massive amounts of data, using methods such as statistics, machine learning, data mining, and predictive analytics.

IBM SkillsBuild for Academia

This course challenges you to take on the different roles involved in a data science team, solving end-to-end real-world scenarios across different industries.

Objectives

Data science practitioners

Use advanced Data science methods and tools, leveraging statistical sciences, machine learning technologies and industry-specific datasets, to implement unique data models that can solve challenging problems across all industries.

Learning objectives:

  • Understand the evolution and relevance of Data science in the world today
  • Explore end-to-end Data science industry use cases using the data analytics lifecycle
  • Understand the scientific method employed in projects, and the Data Science team’s key roles
  • Acquire technical expertise using popular open-source Data Science frameworks including Jupyter notebooks and Python
  • Gain a competitive edge using a low-code, cloud-based platform for Data Science – IBM Watson Studio
  • Understand data engineering and data modeling practices using machine learning
  • Explore Data science industry case studies: transportation, automotive, human resources, aerospace, banking, and healthcare
  • Experience teamwork and agile industry practices using design thinking
  • Engage in role-playing challenge-based scenarios to propose real-world solutions.

Data science is revolutionizing the way organizations solve problems and gain competitive advantage.

What is Data science?

In the Data Science domain, solving problems and answering questions through data analysis is standard practice. Often, data scientists construct a model to predict outcomes or discover underlying patterns, with the aim of gaining better insights.

Organizations can act on these insights to improve future outcomes. Numerous rapidly evolving technologies help analyze data and build models. In a remarkably short time, the field has progressed from desktop analysis to massively parallel data warehouses holding huge volumes of data, and from in-database analytics in relational databases to big data tools for unstructured data.

Analytics on unstructured or semi-structured data is becoming increasingly important to incorporate sentiment and other useful information written in natural language into predictive models; this often leads to significant improvements in model quality and accuracy.

Emerging analytics approaches seek to automate the steps in model building and application, making machine learning (ML) technology a necessary evolution towards modern Data science.

Successful ML projects require a combination of algorithms + data + team, and a very powerful computing infrastructure.

Data Scientist ranks among the top three emerging jobs

Although Data science as a field has existed for several decades, the rapid growth of artificial intelligence (AI) in business in the last five years has generated a demand for data scientists that far surpasses the availability of trained professionals. Today, 63% of executives cite a lack of talent as a prime barrier to adopting AI technology[1]. This talent gap is an opportunity for aspiring professionals and a challenge for companies striving for a competitive advantage in the market.

According to the 2020 LinkedIn Emerging Jobs report[2], Data Scientist has topped the ‘Emerging Jobs’ list for three years running and is projected to grow at 37% annually. It is a specialty that continues to grow significantly across all industries, a trend attributed to the evolution of previously existing jobs and an increased emphasis on data in academic research.

What skills does a Data scientist need to be successful?

Data Science is a cross-disciplinary set of skills found at the intersection of statistics, computer programming, and domain expertise. It comprises three distinct but overlapping areas:

  • Statistics, to model and summarize data sets
  • Computer science, to design and use algorithms to store, process and visualize data
  • Domain expertise, necessary to formulate the right questions and to put the answers in context

Other skills that are often overlooked are:

  1. Leadership
  2. Teamwork
  3. Communication

[1] Francesco Brenna, Giorgio Danesi, Glenn Finch, Brian Goehring and Manish Goyal. “Shifting toward Enterprise-grade AI: Resolving data and skills gaps to realize value.” IBM Institute for Business Value, September 2018. https://ibm.com/downloads/cas/QQ5KZLEL

[2] “LinkedIn U.S. Emerging Jobs Report”, LinkedIn, 2020. https://business.linkedin.com/content/dam/me/business/en-us/talent-solutions/emerging-jobs-report/Emerging_Jobs_Report_U.S._FINAL.pdf

Case study

Wunderman Thompson + IBM: Elevating Machine learning with data and AI

Advertising giant Wunderman Thompson engaged IBM to help employ machine learning for better discovery of human insights – insights that help increase ROI for its clients. With the aid of IBM Watson Studio and open-source tools, the company and its clients now spend more time on discovery and hypothesis creation and less time on mundane tasks.

A customized algorithm to detect potential fraud

With an AI-driven fraud-detection algorithm on IBM Cloud, Thélem assurances, a France-based insurance company, was able to detect five times more potential fraud. This resulted in reduced costs, greater flexibility, and the ability to pre-empt possible fraud. ibm.com/case-studies/thelem-assurances-hybrid-cloud-services

Journey

  • Expanding the knowledge and understanding of the topic through lectures, training, examples, videos, and quizzes.
    Lectures: approx. 90 min.

    Lecture 1 – Data science landscape

    • Data science overview
    • Data science domains
    • Data science roles

    Lecture 2 – Data science methodology

    • Data analytics in practice
    • Data analytics methodologies
    • Data science method

    Lecture 3 – Data science on the cloud

    • Integrated environment for Data science projects
    • Cloud-based Data science lifecycle
    • Data science capabilities on the cloud

    Lecture 4 – Explore and prepare data

    • Business understanding
    • Explore data
    • Prepare data
    • Understanding data

    Lecture 5 – Represent and transform data

    • Statistics and representation techniques
    • Data transformation
    • Represent and transform unstructured data
    • Data transformation tools

    Lecture 6 – Data visualization and presentation

    • Decision-centered visualizations
    • Fundamentals of visualizations
    • Common graphs
    • Common tools

    Lecture 7 – Data modeling

    • Overview of modeling techniques
    • Machine learning techniques
    • Accuracy, precision, and recall
    • Model deployment

    Lecture 8 – Machine learning algorithms

    • About machine learning
    • From regression to neural nets
    • Decision tree classifier
    • Machine learning framework
  • Actual implementation of the concepts learned through simulations, hands-on labs, and games.
    Lab session: approx. 120 min.

    Lab 1 – IBM Cloud and Watson Studio

    • Create an IBM Cloud account
    • Create a Watson Studio service

    Lab 2 – Explore and understand data

    • Explore data with Watson Studio
    • Understand data with Data Refinery

    Lab 3 – Data preparation and conversion

    • Cleanse data
    • Prepare and transform data

    Lab 4 – Visualization

    • Data Refinery visualization

    Lab 5 – Building and deploying with AutoAI

    • AutoAI process

    Lab 6 – Analyze insurance fraud

    • Create Jupyter notebook
    • Import libraries
    • Understand the data
    • Feature engineering

    Lab 7 – Predict insurance fraud

    • Work with visual recognition
    • Use prebuilt models
    • Understand custom classifiers
    • Utilize object detection
  • Realization of real-world impact of topics covered through exposure to industry case studies.
    Group work session: 16 hrs.

    Use case

    Use of Enterprise Design Thinking for a Data science project.

    Use case: San Francisco Crime Statistics

    Scenario steps:

    • Ideation
    • Empathize with three personas:
      1. Retail Manager
      2. Crime Analyst
      3. Police Lieutenant
    • Envision the solution

    Industry Scenarios

    Banking

    Ensure loan fairness

    How can a bank formulate fair and unbiased predictions about prospective loan applicants? Train a machine learning model to identify and overcome bias to help generate a fair outcome.
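
As a minimal sketch of what identifying bias can look like in practice, the snippet below computes the disparate impact ratio: the approval rate of one applicant group divided by the approval rate of another. This is one common fairness measure and is not necessarily the one used in the course; the group labels "A" and "B", the function names, and the sample decisions are all made up for illustration.

```python
from collections import defaultdict


def approval_rates(decisions):
    """Return the share of approved applications per group.

    `decisions` is an iterable of (group, approved) pairs,
    where `approved` is True or False.
    """
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        if ok:
            approved[group] += 1
    return {group: approved[group] / totals[group] for group in totals}


def disparate_impact(decisions, unprivileged, privileged):
    """Approval rate of the unprivileged group divided by that of the privileged group."""
    rates = approval_rates(decisions)
    return rates[unprivileged] / rates[privileged]


# Hypothetical model outputs: (applicant group, loan approved?)
sample = [
    ("A", True), ("A", True), ("A", True), ("A", False),
    ("B", True), ("B", False), ("B", False), ("B", False),
]

ratio = disparate_impact(sample, unprivileged="B", privileged="A")
print(f"Disparate impact ratio: {ratio:.2f}")
```

A ratio near 1.0 means both groups are approved at similar rates. A commonly cited rule of thumb, the "four-fifths rule", treats values below 0.8 as a signal that the model's decisions deserve a closer look. The sketch assumes the privileged group has at least one approval; otherwise the division fails.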

    Analyze bank marketing data

    How can a bank predict if a customer will buy a certificate of deposit? Use ML-trained algorithms to anticipate a customer’s banking behavior.
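
One way to judge such a predictor is with the accuracy, precision, and recall measures listed under Lecture 7. The sketch below computes them in plain Python from actual and predicted labels. The function name and the sample labels are hypothetical and do not come from the course's bank marketing dataset.

```python
def classification_metrics(actual, predicted):
    """Compute accuracy, precision, and recall for binary labels (1 = positive)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when there are no positive predictions or labels.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical labels: 1 = customer bought a certificate of deposit, 0 = did not
actual = [1, 0, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 0, 1, 0]

accuracy, precision, recall = classification_metrics(actual, predicted)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Precision answers "of the customers the model flagged as buyers, how many actually bought?", while recall answers "of the customers who bought, how many did the model find?". Which one matters more depends on the cost of a wasted marketing contact versus a missed sale.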

    Advertising

    Discover hidden Facebook usage insights

    How can an advertising firm stay up to date about public perceptions of its advertising methodologies? Enrich Facebook data to extract keywords, entities, sentiment, and tone in posts mentioning the ads.

    Natural disasters

    Predict wildfire intensity

    How can the intensity of a wildfire be predicted in order to determine where firefighting assets should be staged? Use NASA satellite data and machine learning to map wildfire occurrences and indicate where firefighting equipment should be placed.

Tools

This course uses the following tools:

  • AutoAI
  • IBM Cloud
  • IBM Data Refinery
  • IBM Object Storage
  • IBM Watson Machine Learning
  • IBM Watson Studio
  • IBM Watson Visual Recognition
  • Jupyter Notebook
  • Matplotlib
  • Node.js
  • NumPy
  • Pandas
  • PixieDust
  • Python
  • scikit-learn
  • XGBoost

Prerequisites

Instructor prerequisites

Facilitators delivering this course have taken the course previously and successfully passed the exam.

  • Avid speaker with good presentation skills
  • Pedagogical group management skills
  • Encourage critical thinking and domain exploration
  • Experience handling data sets and IP copyrights

Learner prerequisites

Individuals with an active interest in applying for entry-level jobs in data science-related fields.

  • Familiarity with statistics
  • Basic IT Literacy skills*

*Basic IT Literacy – Refers to the skills required to operate a graphical operating system environment, such as Microsoft Windows® or Linux Ubuntu®, at the user level. This includes performing basic operating commands such as launching an application, copying and pasting information, and using menus, windows, and peripheral devices such as a mouse and keyboard. Additionally, users should be familiar with internet browsers, search engines, page navigation, and forms.

Digital credential

Practitioner Certificate

IBM Data Science Practitioner Certificate

About this certificate

Through validated, instructor-led Data science training, this credential earner has acquired skills in and an understanding of foundational data science concepts and technologies. They have demonstrated proficiency in and understanding of Data Science technical topics and design thinking. The earner has gained the ability to apply the concepts and technology of Data Science, using open-source tools that are relevant to real-world Data science scenarios and suitable for educational purposes.

Skills

Collaboration, Communication, Data cleansing, Data collection, Data engineering, Data operations, Data refinery, Data Science, Data science foundations, Data science methodology, Data visualization, Data wrangling, Deep learning, Design Thinking, Empathy, Experience design, IBM Cloud, IBM Watson, Ideation, Machine learning, Matplotlib, Model deployment, Model visualization, Natural language understanding, pandas, Personas, Problem-solving, Storyboarding, Teamwork, Use cases, User-centered design, User-centric, User experience, User research, UX, Visual recognition, Watson discovery, Watson Studio.

Criteria

  • Must attend a training session at a higher education institution implementing the IBM Skills Academy program.
  • Must have completed the instructor-led Data science practitioners training.
  • Must have earned the Enterprise Design Thinking Practitioner Badge.
  • Must pass the Data science practitioners exam and satisfactorily complete the group exercise.

Instructor Certificate

IBM Data Science Practitioner Certificate: Instructor

About this certificate

Through an IBM instructor-led workshop, this credential earner has gained skills in data science concepts, technology, and use cases. They have demonstrated proficiency in these topics: Data science foundations, Data gathering, Data Understanding, Data Modeling and Optimization, Design Thinking for Data Science, and Data Science industry use cases. The earner demonstrates a capacity to teach the data science course by applying pedagogical skills to drive the group work using challenge-based scenarios.

Skills

Advisor, Communication, Data cleansing, Data collection, Data Engineering, Data operations, Data refinery, Data Science, Data science foundations, Data science methodology, Data visualization, Data wrangling, Deep learning, Design Thinking, Empathy, Experience design, IBM Cloud, IBM Watson, Ideation, Lecturer, Machine learning, Matplotlib, Model deployment, Model visualization, Natural language understanding, pandas, Personas, Problem-solving, Storyboarding, Teamwork, Trainer, Use cases, User-centered design, User-centric, User experience, User research, UX, Visual recognition, Watson discovery, Watson Studio.

Criteria

  • Must be a designated instructor from a higher education institution that has implemented or is implementing the IBM Skills Academy program.
  • Must have completed the IBM Data science practitioners – Instructors workshop.
  • Must have earned the Enterprise Design Thinking Practitioner Badge.
  • Must fulfill the requirements of the IBM Skills Academy teaching validation process.