Modeling: Introduction

A machine learning model is a mathematical representation of the values contained in a dataset and their relationships to each other. Often stored as a simple file, it is the output of a “training” process in which a machine learning algorithm is optimized to extract certain patterns from a dataset. In supervised learning, the model is used to make predictions on new data, while in unsupervised learning it captures the natural groupings (clusters, associations, etc.) in the data.
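To make the idea of a model as a simple file concrete, here is a minimal sketch in plain Python. It assumes the simplest possible supervised case: a straight line fitted by ordinary least squares to a tiny hypothetical dataset. The names `train_linear_model`, `predict` and `linear_model.json` are illustrative assumptions, not part of any library or of this methodology.

```python
import json
import tempfile
from pathlib import Path


def train_linear_model(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def predict(model, x):
    """Apply the learned parameters to a new input."""
    return model["slope"] * x + model["intercept"]


# Hypothetical training data that follows y = 2x + 1 exactly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
model = train_linear_model(xs, ys)

# Persist the learned parameters: here the "model" is just a small JSON file.
path = Path(tempfile.gettempdir()) / "linear_model.json"
path.write_text(json.dumps(model))

# Later, e.g. in production, reload the file and predict on new data.
loaded = json.loads(path.read_text())
print(predict(loaded, 10.0))  # 21.0
```

Real projects usually rely on a machine learning library and its own serialization format, but the principle is the same: training produces parameters, and those parameters are persisted so they can be reloaded to make predictions on new data.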

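According to the “No Free Lunch Theorem”, no single machine learning algorithm performs best on every problem: averaged over all possible problems, all algorithms perform equally well. Since it is therefore impossible to know in advance which algorithm will perform best for a given problem, the practical approach is to try out and compare as many candidate algorithms as the project allows.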
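Trying out several algorithms amounts to fitting each candidate on the same training data and comparing them on data held out from training. The plain-Python sketch below assumes three deliberately simple, hand-written regressors (a mean baseline, a least-squares line and 1-nearest-neighbor) and a hypothetical dataset following y = x²; the function names and data are illustrative, not taken from any library.

```python
def mean_baseline(train_x, train_y):
    """Ignore the input and always predict the training mean."""
    mean_y = sum(train_y) / len(train_y)
    return lambda x: mean_y


def least_squares(train_x, train_y):
    """Fit a straight line by ordinary least squares."""
    n = len(train_x)
    mx = sum(train_x) / n
    my = sum(train_y) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(train_x, train_y))
    var = sum((x - mx) ** 2 for x in train_x)
    slope = cov / var
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def nearest_neighbor(train_x, train_y):
    """Predict the target of the closest training point (1-NN)."""
    pairs = list(zip(train_x, train_y))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mean_squared_error(model, xs, ys):
    """Average squared difference between predictions and true targets."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data following y = x^2, split into training and held-out points.
train_x = [0, 1, 2, 3, 4, 5, 6]
train_y = [x * x for x in train_x]
test_x = [0.4, 2.6, 4.3]
test_y = [x * x for x in test_x]

# Each candidate "algorithm" takes training data and returns a fitted model.
candidates = {"mean": mean_baseline, "linear": least_squares, "1-nn": nearest_neighbor}
scores = {
    name: mean_squared_error(fit(train_x, train_y), test_x, test_y)
    for name, fit in candidates.items()
}
best = min(scores, key=scores.get)
print(best)  # 1-nn
```

On this curved data the nearest-neighbor model wins; on data that really is linear, the least-squares line would win instead, which is why the choice cannot be made in advance.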

Goals

  • Train the best-performing model possible within the constraints of the project, e.g., available data, IT resources, time, etc.
  • Build a reusable software artifact that produces reliable results in the future

Some algorithms impose certain requirements on the form of the data (for example, accepting only numeric inputs), often making it necessary to go back to the data preparation phase.
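As an illustration of such requirements: linear models and nearest-neighbor methods, for example, expect numeric features on comparable scales, so categorical columns are typically one-hot encoded and numeric columns standardized. The plain-Python sketch below assumes a hypothetical dataset with one numeric column (`income`) and one categorical column (`city`); the helper names `fit_preparation` and `transform` are illustrative. The statistics are learned once and then reused for every row, which is why the data preparation step has to be saved alongside the model.

```python
def fit_preparation(rows, numeric_col, categorical_col):
    """Learn the statistics the transformation needs from the training rows."""
    values = [row[numeric_col] for row in rows]
    mean = sum(values) / len(values)
    # Population standard deviation; assumes the column is not constant (std > 0).
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    categories = sorted({row[categorical_col] for row in rows})
    return {"mean": mean, "std": std, "categories": categories}


def transform(prep, row, numeric_col, categorical_col):
    """Standardize the numeric column and one-hot encode the categorical column."""
    scaled = (row[numeric_col] - prep["mean"]) / prep["std"]
    one_hot = [1.0 if row[categorical_col] == c else 0.0 for c in prep["categories"]]
    return [scaled] + one_hot


# Hypothetical raw data mixing a numeric and a categorical column.
rows = [
    {"income": 30000, "city": "Berlin"},
    {"income": 50000, "city": "Paris"},
    {"income": 70000, "city": "Berlin"},
]
prep = fit_preparation(rows, "income", "city")
vectors = [transform(prep, r, "income", "city") for r in rows]
print(vectors[1])  # [0.0, 0.0, 1.0]
```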

Tasks

  • Decide the type of machine learning problem, i.e., supervised learning, unsupervised learning or reinforcement learning
  • Devise a set of modeling experiments, including the model validation procedure
  • Train and evaluate models
  • Compare competing models
  • Select, tune and debug the most suitable model
  • Save deployable artifacts, e.g., the data preparation pipeline, models, etc.
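One common model validation procedure for the evaluation, comparison and selection tasks listed above is k-fold cross-validation: the data is split into k folds, each candidate is trained k times on k - 1 folds and scored on the remaining fold, and the averaged scores decide which model to keep. The sketch below assumes plain Python, two hypothetical candidates (a mean baseline and a least-squares line) and a noise-free synthetic dataset; it illustrates the procedure rather than a production implementation.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k folds of near-equal size."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Baseline candidate: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Second candidate: ordinary least-squares straight line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = cov / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)


def cross_validate(fit, xs, ys, k=5):
    """Mean validation MSE of `fit` across k train/validation splits."""
    fold_errors = []
    for val_idx in k_fold_indices(len(xs), k):
        held_out = set(val_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)
        errors = [(model(xs[i]) - ys[i]) ** 2 for i in val_idx]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / k


# Hypothetical synthetic data following y = 3x - 2.
xs = list(range(10))
ys = [3 * x - 2 for x in xs]

scores = {"mean": cross_validate(fit_mean, xs, ys), "line": cross_validate(fit_line, xs, ys)}
best = min(scores, key=scores.get)
print(best)  # line
```

Shuffling before splitting (with a fixed seed for reproducibility) avoids folds that only contain one end of the data; after selection, the chosen model is usually retrained on all available data before its artifacts are saved.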

Roles

  • Data Scientist
  • Data Engineer
  • Business User
  • Domain Expert
  • Project Sponsor