Data Pipeline Run

Data Pipeline Run custom resource

The Data Pipeline Run resource represents an execution of a single Data Pipeline. When a Data Pipeline Run resource is created, the system will create a Job which executes the pipeline. This resource is used internally by the Data Pipeline resource controller and should not be created by an end-user; instead, use the DataPipelineService API.

Relationships to Other Resources

The execution of a Data Pipeline Run creates one or more Recipe Run resources; their names are recorded in the `recipeRuns` field of the run's status.
A Data Pipeline Run is created for each execution of a Data Pipeline.

DataPipelineRun API Reference

DataPipelineRun represents one execution of a data pipeline.

| Name | Type | Description | Required |
|------|------|-------------|----------|
| apiVersion | string | data.modela.ai/v1alpha1 | true |
| kind | string | DataPipelineRun | true |
| metadata | object | Refer to the Kubernetes API documentation for the fields of the `metadata` field. | true |
| spec | object | DataPipelineRunSpec defines the desired state of a DataPipelineRun | true |
| status | object | DataPipelineRunStatus defines the observed state of DataPipelineRun | false |
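
As an illustration of the top-level shape described above, the following is a minimal sketch that assembles a DataPipelineRun manifest as a plain Python dictionary. The helper name `new_data_pipeline_run` and the example name and namespace are hypothetical; only `apiVersion`, `kind`, and the four top-level field names come from this reference. Because this resource is meant to be created by the Data Pipeline controller, the sketch is intended for reading or inspecting manifests, not as a recommended way to create runs.

```python
import json


def new_data_pipeline_run(name, namespace, spec=None):
    """Return a DataPipelineRun manifest as a plain dict (illustrative only)."""
    return {
        "apiVersion": "data.modela.ai/v1alpha1",
        "kind": "DataPipelineRun",
        # Standard Kubernetes object metadata; name and namespace are made up here.
        "metadata": {"name": name, "namespace": namespace},
        "spec": dict(spec or {}),
        # "status" is left out: it is not required and holds observed state.
    }


run = new_data_pipeline_run("example-run", "example-namespace")
print(json.dumps(run, indent=2))
```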

DataPipelineRun.spec

DataPipelineRunSpec defines the desired state of a DataPipelineRun

| Name | Type | Description | Required |
|------|------|-------------|----------|
| aborted | boolean | Set to true to abort the pipeline run. Default: false | false |
| datapipelineName | string | The data product. Default: empty string | false |
| labRef | object | The Lab where the data pipeline run executes. | false |
| owner | string | The owner of the run, set to the owner of the pipeline. Default: no-one | false |
| paused | boolean | Set to true to pause the pipeline run. Default: false | false |
| priority | enum | The priority of this data pipeline. Enum: low, medium, high, urgent. Default: medium | false |
| resources | object | Specify the resources for the data pipeline run | false |
| versionName | string | The data product version of the run. Default: empty string | false |
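
The defaults and the `priority` enum listed above can be sketched as a small builder. This is an assumption-laden illustration: the function `build_spec` and the example argument values are hypothetical, and the validation shown here is done client-side in the sketch, not taken from the controller.

```python
# Allowed values for spec.priority, as listed in the table above.
PRIORITIES = ("low", "medium", "high", "urgent")


def build_spec(datapipeline_name="", version_name="", owner="no-one",
               priority="medium", aborted=False, paused=False):
    """Build a spec dict using the documented defaults (hypothetical helper)."""
    if priority not in PRIORITIES:
        raise ValueError(f"priority must be one of {PRIORITIES}, got {priority!r}")
    return {
        "datapipelineName": datapipeline_name,
        "versionName": version_name,
        "owner": owner,
        "priority": priority,
        "aborted": aborted,
        "paused": paused,
    }


spec = build_spec("example-pipeline", "v0.0.1", priority="high")
```

Calling `build_spec()` with no arguments yields the documented defaults, and an unknown priority raises `ValueError` before any manifest is built.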

DataPipelineRun.spec.labRef

The Lab where the data pipeline run executes.

| Name | Type | Description | Required |
|------|------|-------------|----------|
| apiVersion | string | API version of the referent. | false |
| fieldPath | string | If referring to a piece of an object instead of an entire object, this string should contain a valid JSON/Go field access statement, such as desiredState.manifest.containers[2]. For example, if the object reference is to a container within a pod, this would take on a value like: "spec.containers{name}" (where "name" refers to the name of the container that triggered the event) or if no container name is specified "spec.containers[2]" (container with index 2 in this pod). This syntax is chosen only to have some well-defined way of referencing a part of an object. TODO: this design is not final and this field is subject to change in the future. | false |
| kind | string | Kind of the referent. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds | false |
| name | string | Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names | false |
| namespace | string | Namespace of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/ | false |
| resourceVersion | string | Specific resourceVersion to which this reference is made, if any. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#concurrency-control-and-consistency | false |
| uid | string | UID of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids | false |

DataPipelineRun.spec.resources

Specify the resources for the data pipeline run

| Name | Type | Description | Required |
|------|------|-------------|----------|
| cpuImage | object | Reference to the managed CPU trainer image, used internally | false |
| gpuImage | object | Reference to the managed GPU trainer image, used internally | false |
| requirements | object | The custom resource requirements for the workload, which are used if `WorkloadName` is not set | false |
| workloadName | string | The name of a WorkloadClass. If set, the system uses the resource requirements described by that WorkloadClass | false |
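
The relationship between `workloadName` and `requirements` can be sketched as a small selection function. The helper `resource_source` and the sample values are hypothetical; the sketch only mirrors the rule stated above (custom requirements apply when no workload name is set) and does not claim to reproduce the controller's actual logic.

```python
def resource_source(resources):
    """Return which part of spec.resources supplies the requirements.

    The result is a (source, value) tuple: ("workloadClass", name) when
    workloadName is set, otherwise ("requirements", dict).
    """
    workload = resources.get("workloadName")
    if workload:
        return ("workloadClass", workload)
    return ("requirements", resources.get("requirements", {}))


# Sample values below are invented for illustration.
by_class = resource_source({"workloadName": "example-workload"})
custom = resource_source({"requirements": {"limits": {"cpu": "2"}}})
```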

DataPipelineRun.spec.resources.cpuImage

Reference to the managed CPU trainer image, used internally

| Name | Type | Description | Required |
|------|------|-------------|----------|
| apiVersion | string | API version of the referent. | false |
| fieldPath | string | If referring to a piece of an object instead of an entire object, this string should contain a valid JSON/Go field access statement, such as desiredState.manifest.containers[2]. For example, if the object reference is to a container within a pod, this would take on a value like: "spec.containers{name}" (where "name" refers to the name of the container that triggered the event) or if no container name is specified "spec.containers[2]" (container with index 2 in this pod). This syntax is chosen only to have some well-defined way of referencing a part of an object. TODO: this design is not final and this field is subject to change in the future. | false |
| kind | string | Kind of the referent. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds | false |
| name | string | Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names | false |
| namespace | string | Namespace of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/ | false |
| resourceVersion | string | Specific resourceVersion to which this reference is made, if any. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#concurrency-control-and-consistency | false |
| uid | string | UID of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids | false |

DataPipelineRun.spec.resources.gpuImage

Reference to the managed GPU trainer image, used internally

| Name | Type | Description | Required |
|------|------|-------------|----------|
| apiVersion | string | API version of the referent. | false |
| fieldPath | string | If referring to a piece of an object instead of an entire object, this string should contain a valid JSON/Go field access statement, such as desiredState.manifest.containers[2]. For example, if the object reference is to a container within a pod, this would take on a value like: "spec.containers{name}" (where "name" refers to the name of the container that triggered the event) or if no container name is specified "spec.containers[2]" (container with index 2 in this pod). This syntax is chosen only to have some well-defined way of referencing a part of an object. TODO: this design is not final and this field is subject to change in the future. | false |
| kind | string | Kind of the referent. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds | false |
| name | string | Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names | false |
| namespace | string | Namespace of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/ | false |
| resourceVersion | string | Specific resourceVersion to which this reference is made, if any. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#concurrency-control-and-consistency | false |
| uid | string | UID of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids | false |

DataPipelineRun.spec.resources.requirements

The custom resource requirements for the workload, which are used if WorkloadName is not set

| Name | Type | Description | Required |
|------|------|-------------|----------|
| limits | map[string]int or string | Limits describes the maximum amount of compute resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ | false |
| requests | map[string]int or string | Requests describes the minimum amount of compute resources required. If Requests is omitted for a container, it defaults to Limits if that is explicitly specified, otherwise to an implementation-defined value. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ | false |
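
The defaulting rule for `requests` described above can be sketched as follows. The helper `effective_requests` and the sample quantities are hypothetical, and the sketch applies the rule to the map as a whole, which is a simplification; the implementation-defined fallback is represented here by returning `None`.

```python
def effective_requests(requirements):
    """Return the requests that apply, following the rule stated above.

    - requests present: use them as given
    - requests omitted, limits present: requests default to limits
    - both omitted: the value is implementation-defined, so return None
    """
    requests = requirements.get("requests")
    if requests is not None:
        return dict(requests)
    limits = requirements.get("limits")
    if limits is not None:
        return dict(limits)
    return None


# Quantities are example values in the usual Kubernetes notation.
only_limits = effective_requests({"limits": {"cpu": "2", "memory": "4Gi"}})
both = effective_requests({
    "limits": {"cpu": "2", "memory": "4Gi"},
    "requests": {"cpu": "500m", "memory": "1Gi"},
})
```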

DataPipelineRun.status

DataPipelineRunStatus defines the observed state of DataPipelineRun

| Name | Type | Description | Required |
|------|------|-------------|----------|
| conditions | []object | | false |
| endTime | string | The end time of the pipeline run. Format: date-time | false |
| failureMessage | string | Message describing a terminal failure, set only if one occurred | false |
| failureReason | string | Reason for a terminal failure, set only if one occurred (borrowed from the Cluster API controller) | false |
| lastUpdated | string | Last time the object was updated. Format: date-time | false |
| logs | object | Holds the location of log paths | false |
| observedGeneration | integer | ObservedGeneration is the last generation that was acted on. Format: int64 | false |
| output | object | The resulting dataset from the flow | false |
| phase | string | The phase of the run. Default: Pending | false |
| progress | integer | Pipeline progress in percent, taking into account the different stages of the pipeline. Format: int32 | false |
| recipeRuns | []string | RecipeRuns lists the names of the recipe runs that occur while the pipeline runs. | false |
| startTime | string | StartTime is the start time of the pipeline run. Format: date-time | false |
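
To show how the status fields above might be read together, here is a minimal sketch that computes a run's duration from `startTime` and `endTime` and produces a one-line summary. The helpers `parse_time`, `run_duration_seconds`, and `summarize` are hypothetical, the timestamps are invented, and the sketch assumes the `date-time` format is an RFC 3339 string such as `2023-01-01T10:00:00Z`.

```python
from datetime import datetime


def parse_time(value):
    """Parse an RFC 3339 timestamp (assumed format of date-time fields)."""
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def run_duration_seconds(status):
    """Return the run duration in seconds, or None if the run has not ended."""
    start, end = status.get("startTime"), status.get("endTime")
    if not start or not end:
        return None
    return (parse_time(end) - parse_time(start)).total_seconds()


def summarize(status):
    """Return a short human-readable summary of a status dict."""
    line = status.get("phase", "Pending")  # Pending is the documented default
    if "progress" in status:
        line += f" ({status['progress']}%)"
    reason = status.get("failureReason")
    if reason:
        line += f", failed: {reason}: {status.get('failureMessage', '')}"
    return line


status = {
    "startTime": "2023-01-01T10:00:00Z",
    "endTime": "2023-01-01T10:05:30Z",
    "progress": 100,
}
duration = run_duration_seconds(status)
```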

DataPipelineRun.status.conditions[index]

DataPipelineRunCondition describes the state of a data pipeline run at a certain point.

| Name | Type | Description | Required |
|------|------|-------------|----------|
| status | string | Status of the condition, one of True, False, Unknown. | true |
| type | string | Type of the condition. | true |
| lastTransitionTime | string | Last time the condition transitioned from one status to another. Format: date-time | false |
| message | string | A human-readable message indicating details about the transition. | false |
| reason | string | The reason for the condition's last transition. | false |
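
Conditions are a list, so a consumer typically looks one up by its `type`. The sketch below shows one way to do that; the helpers `get_condition` and `is_condition_true` are hypothetical, and the condition type names used in the sample (`ExampleReady`, `ExampleSaved`) are placeholders, since this reference does not list the actual condition types.

```python
def get_condition(status, condition_type):
    """Return the first condition with the given type, or None."""
    for cond in status.get("conditions", []):
        if cond.get("type") == condition_type:
            return cond
    return None


def is_condition_true(status, condition_type):
    """True only when the condition exists and its status is the string "True"."""
    cond = get_condition(status, condition_type)
    return cond is not None and cond.get("status") == "True"


# Placeholder condition types; real type names are not given in this reference.
sample_status = {
    "conditions": [
        {"type": "ExampleReady", "status": "True"},
        {"type": "ExampleSaved", "status": "Unknown", "reason": "NotStarted"},
    ]
}
```

Note that condition `status` values are strings (`True`, `False`, `Unknown`), not booleans, which is why the comparison is against `"True"`.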

DataPipelineRun.status.logs

Holds the location of log paths

| Name | Type | Description | Required |
|------|------|-------------|----------|
| bucketName | string | The name of the VirtualBucket resource where the logs are stored | false |
| containers | []object | The collection of ContainerLog objects that describe the location of logs per container | false |

DataPipelineRun.status.logs.containers[index]

ContainerLog describes the location of logs for a single Job

| Name | Type | Description | Required |
|------|------|-------------|----------|
| container | string | The container name | false |
| job | string | The name of the Job | false |
| key | string | The path to the log in the bucket | false |
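
Combining `status.logs.bucketName` with each ContainerLog entry gives the full location of every log. The following is a minimal sketch of that join; the helper `log_locations` and all sample names and keys are hypothetical, and fetching the log content from the bucket is out of scope here.

```python
def log_locations(logs):
    """Return (bucket, job, container, key) tuples for every ContainerLog."""
    bucket = logs.get("bucketName", "")
    return [
        (bucket, entry.get("job", ""), entry.get("container", ""), entry.get("key", ""))
        for entry in logs.get("containers", [])
    ]


# Invented sample data in the shape described by the tables above.
sample_logs = {
    "bucketName": "example-bucket",
    "containers": [
        {"job": "example-job", "container": "main", "key": "logs/example-job/main.log"},
    ],
}
locations = log_locations(sample_logs)
```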

DataPipelineRun.status.output

The resulting dataset from the flow

| Name | Type | Description | Required |
|------|------|-------------|----------|
| bucketName | string | When the location type is an object storage system, BucketName is the name of the VirtualBucket resource that exists in the same tenant as the resource specifying the DataLocation. Modela will connect to the external object storage system and access the file at the path specified by the Path field. Default: empty string | false |
| connectionName | string | When the location type is a database, ConnectionName specifies the name of the Connection resource that exists in the same tenant as the resource specifying the DataLocation. Modela will attempt to connect to the database using the credentials specified in the Connection, and will execute the query specified by the SQL field. Default: empty string | false |
| database | string | The name of a database inside the database system specified by the ConnectionName field. Default: empty string | false |
| path | string | The path to a flat file inside an object storage system. When using the Modela API to upload files (through the FileService API), Modela will upload the data to a predetermined path based on the Tenant, DataProduct, DataProductVersion, and resource type of the resource in relation to the file being uploaded. The path does not need to adhere to this format; you can still pass the path of a file inside a bucket not managed by Modela. Default: empty string | false |
| sql | string | The SQL statement which will be executed to query data from the table specified by Table. Default: empty string | false |
| table | string | The name of a table inside a database, if applicable. Default: empty string | false |
| topic | string | The name of the streaming topic (currently unsupported). Default: empty string | false |
| type | string | The type of location where the data resides, which can be an object inside an object storage system (e.g. MinIO), a SQL location like a table or a view, a data stream (e.g. Kafka, currently unsupported), or a web location (currently unsupported). Default: object | false |
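
Which output fields are meaningful depends on the location type. The sketch below illustrates that dependency under stated assumptions: the helper `describe_output` is hypothetical, only the `object` type value is taken from this reference (the string values for the other types are not listed here), and a database location is recognised simply by a non-empty `connectionName`.

```python
def describe_output(output):
    """Return a short description of where the resulting dataset lives."""
    loc_type = output.get("type") or "object"  # "object" is the documented default
    if loc_type == "object":
        # Object storage: the VirtualBucket name plus the path inside it.
        return "object '{}' in bucket '{}'".format(
            output.get("path", ""), output.get("bucketName", ""))
    if output.get("connectionName"):
        # Database: the Connection, the database, and the SQL statement or table.
        target = output.get("sql") or output.get("table", "")
        return "database '{}' via connection '{}': {}".format(
            output.get("database", ""), output["connectionName"], target)
    return "location type '{}' (not handled by this sketch)".format(loc_type)


# Invented sample values for illustration.
description = describe_output({"bucketName": "example-bucket", "path": "data/out.csv"})
```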