The first step in a machine learning project is to understand the business (organizational) problem in its totality. This is essential because it reduces the possibility of spending a lot of effort producing the correct answers to the wrong questions. This step corresponds to the “Business Understanding” phase of the CRISP-DM model.
The goal is to define the business problem (or to identify a data-intensive business problem), propose a potential machine learning solution, and build a preliminary implementation plan. The main tasks in this step are:
- Clearly and concisely state the problem from a business perspective, e.g., “The business is losing $60m annually due to fraudulent claims.”
- Define the objective of the project from a business perspective, e.g., “Reduce fraud by 20% in the coming 6 months.” Business objectives typically fall into the categories of increasing revenue, reducing risk, reducing costs, and gaining competitive advantage.
- Acquire the domain (subject matter) expertise necessary to create a strategy for solving the problem
- Determine whether machine learning is the best method for solving the problem: some problems can be solved more easily with traditional software or a spreadsheet, while others require equipment and expertise that may be out of reach
- Restate the business problem in a way that can be solved with machine learning methods (see Types of Problems That Machine Learning Can Solve), e.g., the business objective “Reduce fraudulent medical insurance claims” can be restated as “Predict whether a medical insurance claim is fraudulent, given the procedure performed, diagnosis, prescribed drugs, size of claim, and patient data (age, BMI, occupation, gender)”; in this case, classification would be a candidate machine learning method. This task helps identify an initial set of machine learning methods for solving the business problem.
- Define success and failure criteria from a business perspective as well as from a machine learning perspective. These criteria help teams avoid wasted effort and stay aligned with the project goals. Criteria should ideally be specific and readily measurable, e.g., “Reduce fraud by 18% by the end of 6 months”; they can, however, be stated in subjective terms, e.g., “Find useful insights from this data”, in which case the person who will make the subjective judgment must be identified. Machine learning criteria include the choice of performance metric and the minimum acceptable performance, e.g., “Precision of at least 0.85”.
- Perform a survey of similar use cases in your industry or adjacent industries
- Define how the solution will be integrated into the existing business workflow
- Develop a high-level implementation plan that can be revised as more information is discovered; it should cover the project schedule, tools and techniques, risk assessment, team members, and resource allocation
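Success criteria are easiest to act on when they can be checked mechanically once a model has been evaluated. The Python sketch below is illustrative only: `SuccessCriteria`, `precision`, `fraud_reduction`, and `evaluate` are hypothetical names, the thresholds (18% fraud reduction, precision of 0.85) and the $60m baseline loss are borrowed from the examples in the list above, and the confusion counts and post-deployment loss in the usage example are made-up numbers, not results from any real project.

```python
from dataclasses import dataclass


@dataclass
class SuccessCriteria:
    """Hypothetical container for criteria agreed with stakeholders."""
    min_fraud_reduction: float  # business criterion, e.g. 0.18 for 18%
    min_precision: float        # machine learning criterion, e.g. 0.85


def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of claims flagged as fraudulent that really were fraudulent."""
    flagged = true_positives + false_positives
    if flagged == 0:
        # Nothing was flagged, so no flagged claim was correct.
        return 0.0
    return true_positives / flagged


def fraud_reduction(baseline_loss: float, current_loss: float) -> float:
    """Relative drop in fraud losses compared with the baseline period."""
    if baseline_loss <= 0:
        raise ValueError("baseline_loss must be positive")
    return (baseline_loss - current_loss) / baseline_loss


def evaluate(criteria: SuccessCriteria, tp: int, fp: int,
             baseline_loss: float, current_loss: float) -> dict:
    """Check the ML criterion and the business criterion side by side."""
    p = precision(tp, fp)
    r = fraud_reduction(baseline_loss, current_loss)
    return {
        "precision": p,
        "precision_ok": p >= criteria.min_precision,
        "fraud_reduction": r,
        "fraud_reduction_ok": r >= criteria.min_fraud_reduction,
    }


if __name__ == "__main__":
    criteria = SuccessCriteria(min_fraud_reduction=0.18, min_precision=0.85)
    # Illustrative numbers only: 180 correct flags, 20 false alarms,
    # annual fraud losses falling from $60m to $48m.
    report = evaluate(criteria, tp=180, fp=20,
                      baseline_loss=60e6, current_loss=48e6)
    print(report)
```

Reporting the business criterion and the machine learning criterion together makes it visible when a model reaches its metric target but does not move the business outcome, or the reverse.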
A typical project team may include the following roles:
- Project Sponsor
- Domain Expert
- Business User
- Database Administrator
- Data Scientist/Machine Learning Engineer
- Data Developer/Software Engineer
The size and composition of the team depend chiefly on the scope of the project, the capabilities of the team members, and the available software, hardware, and data.