The AI & Analytics Engine offers a variety of machine learning algorithms for each problem type available within the Engine.
Learn more about machine learning problem types: clustering, classification and regression.
The model templates are grouped by:
- Supervised learning
- Unsupervised learning
Supervised learning
Supervised machine learning is an approach that uses labeled datasets. These datasets are used to train, or “supervise”, algorithms to classify data or predict continuous outcomes. Because every input is paired with a label, these models can be evaluated against known answers and improved as more data becomes available.
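To illustrate how paired inputs and labels make evaluation possible, here is a minimal sketch of a K-nearest neighbors classifier in plain Python. It is not the Engine's implementation or API: the function name `knn_predict` and the toy dataset are made up for this example.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Rank training points by Euclidean distance to x.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy labeled dataset (hypothetical): every input is paired with a label.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_y = ["low", "low", "low", "high", "high", "high"]

# Held-out labeled examples are what allow the model to be evaluated.
test_X = [(1.1, 0.9), (4.0, 4.0), (0.8, 1.2), (3.9, 4.1)]
test_y = ["low", "high", "low", "high"]

predictions = [knn_predict(train_X, train_y, x) for x in test_X]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(f"accuracy = {accuracy:.2f}")
```

Because the test labels are known, the predictions can be scored directly. The same idea applies to any of the supervised templates listed below.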
Regression
- AdaBoost regressor
- Bayesian ridge regression
- Decision tree regressor
- Extremely randomized trees (Extra-trees) regressor
- Gradient boosting regressor
- K-Nearest neighbors (KNN) regressor
- LightGBM regressor
- Linear regression (with support of GPU, multi-GPU)
- Ridge regression (with support of multi-GPU)
- Mini-batch SGD (stochastic gradient descent) regressor (with support of GPU)
- Random forest regressor (with support of GPU, multi-GPU)
- XGBoost regressor (with support of GPU, multi-node, multi-GPU)
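As a rough illustration of what a regression template does, the sketch below fits an ordinary least squares line to one feature and scores it with mean squared error. It is plain Python, not the Engine's code: the function name `fit_linear_regression` and the data values are assumptions made for this example.

```python
def fit_linear_regression(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical labeled data: a continuous target for each input value.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_linear_regression(xs, ys)
predictions = [slope * x + intercept for x in xs]

# Mean squared error compares predictions with the known targets.
mse = sum((p - y) ** 2 for p, y in zip(predictions, ys)) / len(ys)
print(f"slope={slope:.3f} intercept={intercept:.3f} mse={mse:.4f}")
```

In practice the error would be measured on held-out data rather than on the training points used here.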
Classification
- AdaBoost classifier
- Random forest classifier (with support of GPU, multi-GPU)
- K-Nearest neighbors (KNN) classifier
- Logistic regression (with support of GPU)
- Mini-batch SGD (stochastic gradient descent) classifier (with support of GPU)
- Decision tree classifier
- Extremely randomized trees (Extra-trees) classifier
- Gradient boosting classifier
- LightGBM classifier
- Gaussian Naive Bayes
- XGBoost classifier (with support of GPU, multi-node, multi-GPU)
Unsupervised learning
Unsupervised learning is a branch of machine learning where algorithms (model templates) are trained on unlabeled data to identify hidden patterns. The main types of problems under this category are clustering, anomaly detection, association, dimensionality reduction, and topic modeling. The Engine supports the following algorithms (templates):
Clustering
- Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) with dimensionality reduction achieved via Uniform Manifold Approximation and Projection (UMAP)
- Gaussian Mixture Model (Spark ML)
- K-Means (Spark ML)
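To show how a clustering algorithm finds groups in unlabeled data, here is a minimal K-Means sketch in plain Python. The Engine's K-Means template is based on Spark ML, so this is only an illustration of the general technique. The function name `kmeans`, the naive initialization, and the sample points are assumptions made for this example.

```python
import math

def kmeans(points, k, iterations=10):
    """Minimal k-means: returns (centroids, assignments)."""
    # Assumption for this sketch: use the first k points as initial centroids.
    centroids = list(points[:k])
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, assignments

# Hypothetical unlabeled data: no target column, only feature values.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.1, 4.9), (0.9, 1.1), (4.8, 5.2)]

centroids, assignments = kmeans(points, k=2)
print(assignments)
print(centroids)
```

No labels are supplied at any point: the grouping comes only from distances between the points. Production implementations typically use a more careful initialization and a convergence check instead of a fixed number of iterations.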