Create Batch Predictions

Make batch predictions on an unlabeled dataset

Overview

You can make a large number of predictions on an unlabeled dataset through the Prediction resource. Creating a Prediction instantiates a workload that downloads your dataset from a shared location and runs each row through a specified Predictor. The workload can save the results to a file on a Virtual Bucket or directly to a database table.

Create a new Prediction by navigating to the Inference section on the Data Product sidebar.

Upload Data

Select the remote location of the dataset to run predictions on.

The source for your dataset can be a flat file or a SQL statement to be executed against a database table. When the Prediction is created, the prediction workload downloads a snapshot of the dataset and begins the batch prediction.

Output Location

Select the output location, which can be a file or database table.

After the Prediction’s workload is complete, it will upload the results to the output location with the specified file type. This file can be downloaded by using the Download method of the Prediction API, or you can access the file on the bucket directly.
All output datasets can also be downloaded through the Modela frontend UI at the Prediction list page.

Result Processing

Specify how the prediction results are processed before storage.

  • Update Strategy: If the output location already exists, the update strategy denotes how to modify the file/database table
    • Update/Insert: Insert new records and update existing ones (update all)
    • Insert: Insert new records and do not update existing ones
    • Update: Update existing records and do not insert new ones
  • Create Table If Not Exist: If the output location is a database table, denotes whether the table should be created when it does not exist
  • Add Features to the Output: Indicates if the features of the input dataset will be included in the output dataset. If false, the output dataset will include only the prediction result.
  • Add Explanation: If true, SHAP (Shapley) values for each predicted row will be included as an additional column of the dataset
  • Detect Outlier: If true, an additional column will be included indicating if each predicted row is an outlier
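
The three update strategies can be illustrated with a minimal Python sketch. It is not Modela's implementation: it assumes each record is identified by a key, which this page does not specify, and the function name `apply_update_strategy` and the strategy strings are hypothetical.

```python
from typing import Dict, Hashable

# A row is a mapping of column name to value (illustrative type only).
Row = Dict[str, object]


def apply_update_strategy(
    existing: Dict[Hashable, Row],
    new_rows: Dict[Hashable, Row],
    strategy: str,
) -> Dict[Hashable, Row]:
    """Merge new_rows into existing and return the resulting table.

    Both tables map a record key to a row. Matching records by key is an
    assumption made for this sketch.
    """
    result = dict(existing)  # leave the input table untouched
    for key, row in new_rows.items():
        present = key in existing
        if strategy == "upsert":        # Update/Insert: write every record
            result[key] = row
        elif strategy == "insert":      # Insert: only records not yet present
            if not present:
                result[key] = row
        elif strategy == "update":      # Update: only records already present
            if present:
                result[key] = row
        else:
            raise ValueError(f"unknown update strategy: {strategy!r}")
    return result


existing = {1: {"prediction": "a"}, 2: {"prediction": "b"}}
new_rows = {2: {"prediction": "B"}, 3: {"prediction": "c"}}

upserted = apply_update_strategy(existing, new_rows, "upsert")
inserted = apply_update_strategy(existing, new_rows, "insert")
updated = apply_update_strategy(existing, new_rows, "update")
```

With these inputs, Update/Insert yields records 1, 2 (overwritten) and 3; Insert keeps the original record 2 and adds record 3; Update overwrites record 2 and does not add record 3.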

Scheduled Predictions

The Cron Prediction resource provides an interface for creating repeatable Prediction workloads using cron-based time intervals or API-based triggers. When the Cron Prediction fires, it creates a new Prediction resource based on the PredictionTemplate field included in its specification. The resource tracks the execution of the Prediction and saves the results to its status.
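
The firing behavior described above can be pictured with a small Python sketch. This is a conceptual model only, not the Modela resource schema: the class layout, the `runs` list standing in for the status, and the naming scheme for created Predictions are all assumptions made for illustration.

```python
import copy
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass
class CronPrediction:
    """Hypothetical stand-in for a Cron Prediction resource."""

    name: str
    prediction_template: Dict[str, Any]
    # Stands in for the resource status, which tracks each execution.
    runs: List[Dict[str, Any]] = field(default_factory=list)

    def fire(self, now: datetime) -> Dict[str, Any]:
        # Each firing creates an independent Prediction from the template,
        # so later changes to one run never leak into the template.
        prediction = copy.deepcopy(self.prediction_template)
        # Assumed naming scheme: cron name plus a timestamp suffix.
        prediction["name"] = f"{self.name}-{now:%Y%m%d%H%M%S}"
        self.runs.append(
            {"prediction": prediction["name"], "firedAt": now.isoformat()}
        )
        return prediction


cron = CronPrediction(
    name="nightly",
    prediction_template={"predictor": "my-predictor", "output": {"table": "results"}},
)
created = cron.fire(datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
```

The template is copied rather than reused so that every run starts from the same specification, while the run history accumulates on the Cron Prediction itself.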

Create Cron Prediction

To create a new Cron Prediction, navigate to the Cron Predictions tab on the Data Product sidebar. Creating a Cron Prediction is the same as creating a normal Prediction, with the added option to define the schedule on which the Cron Prediction is executed.

Specify the schedule on which the Cron Prediction is executed as a cron string.

CronPredictionSchedule
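
As a reading aid for cron strings, the Python sketch below splits a standard five-field expression into its named fields. It assumes the common five-field cron format (minute, hour, day of month, month, day of week); the exact syntax accepted by the Cron Prediction schedule is not stated on this page, and the helper `split_cron` is hypothetical.

```python
from typing import Dict

# Field order of the common five-field cron format (an assumption here).
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expr: str) -> Dict[str, str]:
    """Split a five-field cron string into a field-name -> value mapping."""
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expr!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


# "0 2 * * *" reads as: minute 0 of hour 2, every day, every month,
# every weekday, i.e. once a day at 02:00.
nightly = split_cron("0 2 * * *")
```

Under that format, `0 2 * * *` schedules one run per day at 02:00, and `*/15 * * * *` schedules a run every 15 minutes.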