Unlike theoretical models, small projects or data science competitions, enterprise machine learning is carried out on machine learning platforms. Platforms increase productivity, boost team collaboration, improve time-to-value and security, and provide full lifecycle tools for creating, deploying and managing machine learning models.
There are three main categories of enterprise machine learning platforms: code-first, automation-focused and workflow-focused. The choice of platform should be guided by business needs, use cases and machine learning talent strategy. The platforms are not mutually exclusive; it is common for organizations to use platforms in each category and from different vendors at the same time. It is also common for a platform to have characteristics of two or three categories.
Code-first Platforms
These tools are built primarily for data scientists and use programming interfaces (known as notebooks) as the primary interface. They provide workbench tools for coding in R, Python, and other programming languages using open source Jupyter, Zeppelin, RStudio, or a proprietary interface that makes coding more efficient.
Pros: they allow users to optimize each step of their model lifecycle and to incorporate the latest machine learning algorithms as soon as they are available, enabling them to build the most sophisticated models immediately
Cons: they require programming skill that takes significant time and effort to learn
Major vendors include Microsoft, Cloudera, Domino Data Lab, Google, MathWorks, Amazon Web Services, RStudio, Oracle, Anaconda, OpenText, Databricks, and Civis Analytics.
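To make the code-first style concrete, here is a minimal sketch of what a single notebook cell might contain. It assumes a hypothetical toy dataset and a hand-written one-feature least-squares fit, chosen only so that the example runs with the Python standard library; in practice a data scientist would more likely use an established modelling library. The point is that every step is explicit code that can be inspected and tuned.

```python
import random
from statistics import mean

# Hypothetical toy data: y is roughly 3*x + 2 with a little noise.
random.seed(0)
xs = [i / 10 for i in range(100)]
ys = [3.0 * x + 2.0 + random.gauss(0, 0.1) for x in xs]

# Step 1: split into training and hold-out sets (every fifth row is held out).
train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % 5 != 0]
test = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % 5 == 0]

# Step 2: fit ordinary least squares for a single feature by hand.
def fit_line(rows):
    x_bar = mean(x for x, _ in rows)
    y_bar = mean(y for _, y in rows)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    sxx = sum((x - x_bar) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

slope, intercept = fit_line(train)

# Step 3: evaluate on the hold-out set with mean absolute error.
mae = mean(abs(y - (slope * x + intercept)) for x, y in test)
print(f"slope={slope:.2f} intercept={intercept:.2f} holdout MAE={mae:.3f}")
```

Because each step is ordinary code, the split, the model or the metric can be swapped independently, which is the flexibility (and the skill requirement) described above.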
Automation-focused Platforms
Also referred to as automated machine learning (AutoML), these tools implement machine learning best practices in software, enabling users to build and deploy models by configuring parameters in each step instead of coding, visual programming or drawing a complex workflow.
Pros: they empower non-data scientists to do end-to-end machine learning while boosting the productivity of existing data scientists; very short time-to-value – users with prepared data can train, evaluate and deploy models in less than one hour
Cons: they limit users to the capabilities built in by the tool developer (however, some provide code interfaces that allow data scientists to incorporate their own code into workflows or to customize some of the processes within the workflow)
Major vendors include DataRobot, H2O.ai, dotData, EdgeVerve, Aible, Big Squid, Bell Integrator, Squark, and DMway Analytics.
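The idea of building models by configuring parameters rather than coding can be sketched in a few lines. This is a toy illustration and not any vendor's API: the `config` keys, the candidate models and the `auto_train` helper are all hypothetical. The user only edits settings; the software fits each candidate and ranks them by validation error.

```python
from statistics import mean, median

# Hypothetical candidate "models": each fit function returns a predict function.
def fit_mean(rows):
    m = mean(y for _, y in rows)
    return lambda x: m

def fit_median(rows):
    m = median(y for _, y in rows)
    return lambda x: m

def fit_line(rows):
    x_bar = mean(x for x, _ in rows)
    y_bar = mean(y for _, y in rows)
    sxx = sum((x - x_bar) ** 2 for x, _ in rows)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in rows) / sxx
    return lambda x: y_bar + slope * (x - x_bar)

CANDIDATES = {"mean": fit_mean, "median": fit_median, "linear": fit_line}

def auto_train(rows, config):
    """Fit every configured candidate; return (MAE, name) pairs, best first."""
    cut = int(len(rows) * (1 - config["validation_fraction"]))
    train, valid = rows[:cut], rows[cut:]
    leaderboard = []
    for name in config["candidates"]:
        predict = CANDIDATES[name](train)
        mae = mean(abs(y - predict(x)) for x, y in valid)
        leaderboard.append((mae, name))
    leaderboard.sort()
    return leaderboard

# The user configures parameters instead of writing modelling code.
config = {"validation_fraction": 0.25, "candidates": ["mean", "median", "linear"]}
rows = [(x, 2.0 * x + 1.0) for x in range(20)]  # hypothetical prepared data
leaderboard = auto_train(rows, config)
print("best model:", leaderboard[0][1])
```

The sketch also shows the limitation noted above: a user can only choose among the candidates the tool developer registered in `CANDIDATES`.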
Workflow-focused Platforms
These platforms provide a workbench that includes tools such as a graphical user interface, configuration wizards, data science steps packaged as graphical nodes, automation of data science steps, and coding environments. Together these allow users to build the end-to-end life cycle of a machine learning project through visual programming or drawing workflows. They allow non-data scientists to build data pipelines and machine learning models as well as collaborate with data scientists.
Pros: similar to automation-focused platforms; in addition they allow greater control of the processes within a workflow
Cons: users of workflow-focused platforms need more skill and take longer to build models than users of AutoML
Major vendors include IBM, SAS, RapidMiner, Dataiku, TIBCO Software, Alteryx, KNIME, Samsung SDS, BigML, Altair and Minitab.
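Conceptually, a visual workflow is a chain of reusable nodes, each wrapping one data science step. The sketch below is a toy illustration of that idea, assuming a hypothetical `Node` class and three made-up steps; it does not reproduce the internals of any vendor's product.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class Node:
    """One box on the canvas: a named step that transforms its input."""
    name: str
    run: Callable[[Any], Any]

def run_workflow(nodes: List[Node], data: Any) -> Any:
    """Execute nodes in order, passing each node's output to the next one."""
    for node in nodes:
        data = node.run(data)
        print(f"[{node.name}] done")
    return data

# Hypothetical steps a user might drag onto the canvas and connect.
drop_missing = Node("Drop missing", lambda rows: [r for r in rows if r is not None])
scale = Node("Scale to 0-1", lambda rows: [r / max(rows) for r in rows])
summarise = Node("Average", lambda rows: sum(rows) / len(rows))

result = run_workflow([drop_missing, scale, summarise], [2, None, 4, 8])
print(result)
```

Reordering or replacing a node changes the pipeline without touching the other steps, which is the extra control over individual processes that distinguishes this category from AutoML.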
Selecting an Enterprise Machine Learning Platform
With thousands of vendors in the world of machine learning, AI and data, as shown in the landscape by Matt Turck, choosing the right platform can be a daunting task. The ideas presented above can be used as a starting point, together with the following selection criteria:
Product roadmap: Vendor has a strong product roadmap for innovation and financial resources to support growth. Check sources such as Crunchbase, Gartner Magic Quadrant for Data Science and Machine Learning Platforms, Forrester Wave™: Multimodal Predictive Analytics And Machine Learning, and Forrester Wave™: Notebook-Based Predictive Analytics And Machine Learning.
Support for diverse users: Tools for engineers, executives and business users to collaborate and be productive across the entire machine learning lifecycle
Support for the complete machine learning lifecycle: From data access to model management
Best practice: Inbuilt testing and validation; produces transparent, clear, and easy to interpret machine learning models
Scalable: Handles everything from small data to big data to accommodate current and future data
Flexible: Runs both on-premises and in the cloud; works on commodity hardware
Open source: Leverages open source innovation and takes upgrades seamlessly
Algorithms: Has an up-to-date set of algorithms to accommodate different types of data and different use cases
Quick deployment: Enables quick deployment of models to maximize business value
Solution accelerators: Has inbuilt business solutions (templates) that make it easier and faster to build machine learning solutions
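One way to apply criteria such as these is a weighted scoring matrix. The sketch below assumes hypothetical weights, placeholder vendor names and 1-5 ratings purely for illustration; real values would come from an organization's own evaluation.

```python
# Hypothetical weights for a subset of the criteria (they sum to 1.0).
weights = {
    "product_roadmap": 0.25,
    "lifecycle_support": 0.25,
    "scalability": 0.20,
    "flexibility": 0.15,
    "quick_deployment": 0.15,
}

# Hypothetical 1-5 ratings from an internal evaluation; not real vendor data.
ratings = {
    "Vendor A": {"product_roadmap": 4, "lifecycle_support": 5, "scalability": 3,
                 "flexibility": 4, "quick_deployment": 2},
    "Vendor B": {"product_roadmap": 3, "lifecycle_support": 4, "scalability": 5,
                 "flexibility": 3, "quick_deployment": 5},
}

def weighted_score(vendor_ratings, weights):
    """Sum of rating * weight over all weighted criteria."""
    return sum(vendor_ratings[c] * w for c, w in weights.items())

scores = {v: weighted_score(r, weights) for v, r in ratings.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
for vendor in ranking:
    print(f"{vendor}: {scores[vendor]:.2f}")
```

Adjusting the weights to reflect business needs, use cases and talent strategy changes the ranking, so the weights themselves deserve as much scrutiny as the ratings.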
Introduction to Machine Learning Platforms: Videos
The videos listed below illustrate the steps involved in building machine learning models using enterprise machine learning platforms.
RapidMiner Studio: https://youtu.be/Gg01mmR3j-g
RapidMiner Auto Model: https://rapidminer.com/products/auto-model/?wvideo=jd8esbqgfh
Dataiku Data Science Studio: https://youtu.be/kHF-vFxzNGE
KNIME Analytics Platform: https://youtu.be/RpkSGvQ5voo
Microsoft Azure Machine Learning: https://youtu.be/csFDLUYnq4w
Jupyter Notebook: https://youtu.be/jZ952vChhuI
Cloudera Machine Learning Platform: https://youtu.be/ELyipT8IpLg
Orange Data Mining – though not an enterprise platform, it is great for learning and understanding machine learning: https://www.youtube.com/channel/UClKKWBe2SCAEyv7ZNGhIe4g
Online references were accessed on 7 March 2022