What algorithms are available to build machine learning models on the Engine?

The AI & Analytics Engine offers a variety of machine learning algorithms for each problem type it supports.

Learn more about machine learning problem types: clustering, classification and regression.

The model templates are grouped by:

Supervised learning
Unsupervised learning

Supervised learning

Supervised machine learning trains algorithms on labeled datasets, in which each input is paired with a known label. The labels “supervise” the algorithm as it learns to classify data or predict continuous outcomes. Because the correct labels are known, these models can be evaluated against them and improved as more data becomes available over time.
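
The idea of learning from paired inputs and labels, then evaluating on held-out labeled data, can be sketched in a few lines of plain Python. The example below is only an illustration using a tiny k-nearest neighbors classifier. The data, the function names (knn_predict, accuracy) and the choice of k are hypothetical, and this is not how the Engine implements its KNN template.

```python
# Illustrative sketch only: a tiny k-nearest neighbors classifier in pure Python.
# All names and data here are hypothetical and are not part of the Engine.
from collections import Counter
import math


def knn_predict(train_X, train_y, point, k=3):
    """Predict a label for `point` by majority vote among its k nearest training rows."""
    # Pair each training row's distance to `point` with that row's label, nearest first.
    distances = sorted(
        (math.dist(row, point), label) for row, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of held-out rows whose predicted label matches the true label."""
    correct = sum(
        knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y)
    )
    return correct / len(test_y)


# Labeled training data: each input (two numeric features) is paired with a label.
train_X = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
train_y = ["low", "low", "low", "high", "high", "high"]

# Held-out labeled data, used only for evaluation.
test_X = [(1.1, 1.0), (4.9, 5.1)]
test_y = ["low", "high"]

score = accuracy(train_X, train_y, test_X, test_y)
print(f"Held-out accuracy: {score:.2f}")
```

Because the held-out rows also carry labels, the predictions can be scored directly. That is what makes supervised models straightforward to evaluate and to re-evaluate as more labeled data arrives.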

Regression

AdaBoost regressor
Bayesian ridge regression
Decision tree regressor
Extremely randomized trees (Extra-trees) regressor
Gradient boosting regressor
K-Nearest neighbors (KNN) regressor
LightGBM regressor
Linear regression (with support of GPU, multi-GPU)
Ridge regression (with support of multi-GPU)
Mini-batch SGD (stochastic gradient descent) regressor (with support of GPU)
Random forest regressor (with support of GPU, multi-GPU)
XGBoost regressor (with support of GPU, multi-node, multi-GPU)

Classification

AdaBoost classifier
Random forest classifier (with support of GPU, multi-GPU)
K-Nearest neighbors (KNN) classifier
Logistic regression (with support of GPU)
Mini-batch SGD (stochastic gradient descent) classifier (with support of GPU)
Decision tree classifier
Extremely randomized trees (Extra-trees) classifier
Gradient boosting classifier
LightGBM classifier
Gaussian Naive Bayes
XGBoost classifier (with support of GPU, multi-node, multi-GPU)

Unsupervised learning

Unsupervised learning is a branch of machine learning in which algorithms (model templates) are trained on unlabeled data to identify hidden patterns. The main problem types in this category are clustering, anomaly detection, association, dimensionality reduction, and topic modeling. The Engine supports the following algorithms (templates):

Clustering

Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) with dimensionality reduction achieved via Uniform Manifold Approximation and Projection (UMAP)
Gaussian Mixture Model (Spark ML)
K-Means (Spark ML)
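
To make the idea of finding structure in unlabeled data concrete, here is a minimal sketch of K-Means clustering (Lloyd's algorithm) in plain Python. It is only an illustration and is not the Spark ML implementation that the Engine uses. The function name kmeans, the sample points, and the choice to seed the centroids with the first k points are assumptions made for this example.

```python
# Illustrative sketch only: K-Means (Lloyd's algorithm) in pure Python.
# This is not the Spark ML implementation; names and data are hypothetical.
import math


def kmeans(points, k, iterations=10):
    """Group `points` into k clusters; returns (centroids, assignments)."""
    # Simplifying assumption: seed the centroids with the first k points.
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Unlabeled data: only the inputs are given, with no target column.
points = [(1.0, 1.1), (5.0, 5.2), (1.2, 0.9), (5.1, 4.9), (0.8, 1.0), (4.8, 5.0)]

centroids, assignments = kmeans(points, k=2)
print("Cluster index per point:", assignments)
print("Centroids:", centroids)
```

No labels are supplied at any point. The algorithm groups the rows purely by how close they are to each other, which is the kind of hidden pattern that clustering templates look for.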