Data Pipeline Run

Data Pipeline Run custom resource

The Data Pipeline Run resource represents an execution of a single Data Pipeline. When a Data Pipeline Run resource is created, the system will create a Job which executes the pipeline. This resource is used internally by the Data Pipeline resource controller and should not be created by an end-user; instead, use the DataPipelineService API.

Relationships to Other Resources

The execution of a Data Pipeline Run creates one or more Recipe Run resources.
A Data Pipeline Run is created for the execution of a Data Pipeline.

DataPipelineRun API Reference

DataPipelineRun represents one execution of a data pipeline.

Name	Type	Description	Required
apiVersion	string	data.modela.ai/v1alpha1	true
kind	string	DataPipelineRun	true
metadata	object	Refer to the Kubernetes API documentation for the fields of the `metadata` field.	true
spec	object	DataPipelineRunSpec defines the desired state of a DataPipelineRun	true
status	object	DataPipelineRunStatus defines the observed state of DataPipelineRun	false

DataPipelineRun.spec

DataPipelineRunSpec defines the desired state of a DataPipelineRun

Name	Type	Description	Required
aborted	boolean	Set to true to abort the pipeline run. Default: false	false
datapipelineName	string	The data product. Default:	false
labRef	object	The Lab where the data pipeline run executes.	false
owner	string	The owner of the run, set to the owner of the pipeline. Default: no-one	false
paused	boolean	Set to true to pause the pipeline run. Default: false	false
priority	enum	The priority of this data pipeline run. Enum: low, medium, high, urgent. Default: medium	false
resources	object	Specify the resources for the data pipeline run	false
versionName	string	The data product version of the run. Default:	false
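
To make the shape of the spec concrete, the sketch below assembles a DataPipelineRun as a plain Python dictionary, using the defaults listed in the table above. The helper name `build_data_pipeline_run` and the sample names passed to it are hypothetical. Since this resource is meant to be created by the controller rather than by hand, treat this as an aid for reading or validating a manifest, not as a recommended way to start runs.

```python
# Allowed values for spec.priority, taken from the table above.
ALLOWED_PRIORITIES = ("low", "medium", "high", "urgent")


def build_data_pipeline_run(name, namespace, pipeline_name, version_name,
                            priority="medium", owner="no-one"):
    """Hypothetical helper: return a dict shaped like a DataPipelineRun."""
    if priority not in ALLOWED_PRIORITIES:
        raise ValueError(
            f"priority must be one of {ALLOWED_PRIORITIES}, got {priority!r}"
        )
    return {
        "apiVersion": "data.modela.ai/v1alpha1",
        "kind": "DataPipelineRun",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "datapipelineName": pipeline_name,
            "versionName": version_name,
            "owner": owner,        # documented default: no-one
            "priority": priority,  # documented default: medium
            "aborted": False,      # documented default: false
            "paused": False,       # documented default: false
        },
    }


# All names below are placeholders for illustration only.
run = build_data_pipeline_run("demo-run", "demo-ns", "demo-pipeline", "v0.0.1")
print(run["spec"]["priority"])
```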

DataPipelineRun.spec.labRef

The Lab where the data pipeline run executes.

Name	Type	Description	Required
apiVersion	string	API version of the referent.	false
fieldPath	string	If referring to a piece of an object instead of an entire object, this string should contain a valid JSON/Go field access statement, such as desiredState.manifest.containers[2]. For example, if the object reference is to a container within a pod, this would take on a value like: "spec.containers{name}" (where "name" refers to the name of the container that triggered the event) or if no container name is specified "spec.containers[2]" (container with index 2 in this pod). This syntax is chosen only to have some well-defined way of referencing a part of an object. TODO: this design is not final and this field is subject to change in the future.	false
kind	string	Kind of the referent. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds	false
name	string	Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names	false
namespace	string	Namespace of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/	false
resourceVersion	string	Specific resourceVersion to which this reference is made, if any. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#concurrency-control-and-consistency	false
uid	string	UID of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids	false

DataPipelineRun.spec.resources

Specify the resources for the data pipeline run

Name	Type	Description	Required
cpuImage	object	Reference to the managed CPU trainer image, used internally	false
gpuImage	object	Reference to the managed GPU trainer image, used internally	false
requirements	object	The custom resource requirements for the workload, which are used if `WorkloadName` is not set	false
workloadName	string	The name of a WorkloadClass. If set, the system uses the resource requirements described by that WorkloadClass.	false
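
The relationship between `workloadName` and `requirements` can be sketched as a small selection function: a named WorkloadClass is used when one is given, and the custom requirements apply otherwise. The function name `resource_source` and the sample values are hypothetical, and the sketch only mirrors the rule stated in the table, not the controller's actual code.

```python
def resource_source(resources):
    """Return ("workloadClass", name) or ("custom", requirements)
    for a dict shaped like spec.resources."""
    workload = resources.get("workloadName")
    if workload:
        # A WorkloadClass is named, so its requirements are used.
        return ("workloadClass", workload)
    # No WorkloadClass: fall back to the custom requirements, if any.
    return ("custom", resources.get("requirements", {}))


# "example-workload" is a placeholder WorkloadClass name.
print(resource_source({"workloadName": "example-workload"}))
print(resource_source({"requirements": {"limits": {"cpu": "2"}}}))
```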

DataPipelineRun.spec.resources.cpuImage

Reference to the managed CPU trainer image, used internally

Name	Type	Description	Required
apiVersion	string	API version of the referent.	false
fieldPath	string	If referring to a piece of an object instead of an entire object, this string should contain a valid JSON/Go field access statement, such as desiredState.manifest.containers[2]. For example, if the object reference is to a container within a pod, this would take on a value like: "spec.containers{name}" (where "name" refers to the name of the container that triggered the event) or if no container name is specified "spec.containers[2]" (container with index 2 in this pod). This syntax is chosen only to have some well-defined way of referencing a part of an object. TODO: this design is not final and this field is subject to change in the future.	false
kind	string	Kind of the referent. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds	false
name	string	Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names	false
namespace	string	Namespace of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/	false
resourceVersion	string	Specific resourceVersion to which this reference is made, if any. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#concurrency-control-and-consistency	false
uid	string	UID of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids	false

DataPipelineRun.spec.resources.gpuImage

Reference to the managed GPU trainer image, used internally

Name	Type	Description	Required
apiVersion	string	API version of the referent.	false
fieldPath	string	If referring to a piece of an object instead of an entire object, this string should contain a valid JSON/Go field access statement, such as desiredState.manifest.containers[2]. For example, if the object reference is to a container within a pod, this would take on a value like: "spec.containers{name}" (where "name" refers to the name of the container that triggered the event) or if no container name is specified "spec.containers[2]" (container with index 2 in this pod). This syntax is chosen only to have some well-defined way of referencing a part of an object. TODO: this design is not final and this field is subject to change in the future.	false
kind	string	Kind of the referent. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds	false
name	string	Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names	false
namespace	string	Namespace of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/	false
resourceVersion	string	Specific resourceVersion to which this reference is made, if any. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#concurrency-control-and-consistency	false
uid	string	UID of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids	false

DataPipelineRun.spec.resources.requirements

The custom resource requirements for the workload, which are used if WorkloadName is not set

Name	Type	Description	Required
limits	map[string]int or string	Limits describes the maximum amount of compute resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/	false
requests	map[string]int or string	Requests describes the minimum amount of compute resources required. If Requests is omitted for a container, it defaults to Limits if that is explicitly specified, otherwise to an implementation-defined value. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/	false
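
The defaulting rule for `requests` described above can be illustrated with a short sketch. It assumes the fallback is applied per resource name (a request that is missing takes the value of the matching limit), and the function name `effective_requests` is hypothetical. Resources with neither a request nor a limit are left out, since their value is implementation-defined.

```python
def effective_requests(requirements):
    """Return the requests map after filling gaps from limits."""
    limits = requirements.get("limits") or {}
    requests = dict(requirements.get("requests") or {})
    for resource, limit in limits.items():
        # Only fill in a request that was not given explicitly.
        requests.setdefault(resource, limit)
    return requests


example = {
    "limits": {"cpu": "2", "memory": "4Gi"},
    "requests": {"cpu": "500m"},
}
print(effective_requests(example))
```

In this example the explicit CPU request is kept and the memory request falls back to the memory limit.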

DataPipelineRun.status

DataPipelineRunStatus defines the observed state of DataPipelineRun

Name	Type	Description	Required
conditions	[]object		false
endTime	string	EndTime is the end time of the pipeline run. Format: date-time	false
failureMessage	string	A message describing the failure, set in case of terminal failure.	false
failureReason	string	The reason for the failure, set in case of terminal failure. Borrowed from the Cluster API controller.	false
lastUpdated	string	Last time the object was updated. Format: date-time	false
logs	object	Holds the location of log paths	false
observedGeneration	integer	ObservedGeneration is the last generation that was acted on. Format: int64	false
output	object	The resulting dataset from the flow	false
phase	string	The phase of the run. Default: Pending	false
progress	integer	Pipeline progress in percent, taking into account the different stages of the pipeline. Format: int32	false
recipeRuns	[]string	RecipeRuns lists the names of the recipe runs that occur while the pipeline is running.	false
startTime	string	StartTime is the start time of the pipeline run. Format: date-time	false
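
As a reading aid for the status fields, the sketch below condenses a status dictionary into a short summary, including the run duration computed from `startTime` and `endTime`. The function names and the sample status are hypothetical. In particular, `Completed` is a placeholder phase value, since only the default `Pending` is documented here.

```python
from datetime import datetime


def parse_time(value):
    """Parse a date-time string such as 2024-01-01T10:00:00Z."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_status(status):
    """Hypothetical helper: condense a DataPipelineRun status dict."""
    duration = None
    if status.get("startTime") and status.get("endTime"):
        delta = parse_time(status["endTime"]) - parse_time(status["startTime"])
        duration = delta.total_seconds()
    return {
        "phase": status.get("phase", "Pending"),  # documented default
        "progress": status.get("progress", 0),
        "durationSeconds": duration,
        "failed": bool(status.get("failureReason")
                       or status.get("failureMessage")),
        "recipeRuns": list(status.get("recipeRuns", [])),
    }


# Sample values are invented for illustration.
example_status = {
    "phase": "Completed",  # placeholder phase name
    "progress": 100,
    "startTime": "2024-01-01T10:00:00Z",
    "endTime": "2024-01-01T10:05:30Z",
    "recipeRuns": ["recipe-run-a", "recipe-run-b"],
}
summary = summarize_status(example_status)
print(summary["durationSeconds"])
```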

DataPipelineRun.status.conditions[index]

DataPipelineRunCondition describes the state of a data pipeline run at a certain point.

Name	Type	Description	Required
status	string	Status of the condition, one of True, False, Unknown.	true
type	string	Type of the condition.	true
lastTransitionTime	string	Last time the condition transitioned from one status to another. Format: date-time	false
message	string	A human readable message indicating details about the transition.	false
reason	string	The reason for the condition's last transition.	false
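
Conditions are usually consumed by looking one up by `type` and checking whether its `status` is the string `True`. A minimal sketch of that lookup follows. The function names are hypothetical, and so are the condition type `Completed` and the reason in the sample, because the condition types are not listed on this page.

```python
def get_condition(status, condition_type):
    """Return the first condition with the given type, or None."""
    for condition in status.get("conditions", []):
        if condition.get("type") == condition_type:
            return condition
    return None


def is_condition_true(status, condition_type):
    """Condition status is a string: "True", "False" or "Unknown"."""
    condition = get_condition(status, condition_type)
    return condition is not None and condition.get("status") == "True"


# Placeholder condition, invented for illustration.
example_status = {
    "conditions": [
        {"type": "Completed", "status": "False", "reason": "StillRunning"},
    ]
}
print(is_condition_true(example_status, "Completed"))
```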

DataPipelineRun.status.logs

Holds the location of log paths

Name	Type	Description	Required
bucketName	string	The name of the VirtualBucket resource where the logs are stored	false
containers	[]object	The collection of ContainerLog objects that describe the location of logs per container	false

DataPipelineRun.status.logs.containers[index]

ContainerLog describes the location of logs for a single Job

Name	Type	Description	Required
container	string	The container name	false
job	string	The name of the Job	false
key	string	The path to the log in the bucket	false
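
Taken together, `bucketName` and each ContainerLog entry identify where one log object lives. The sketch below flattens a `status.logs` dictionary into one record per container. The function name `log_locations` and the sample bucket, job, and key values are hypothetical. Fetching the object from the bucket is out of scope here.

```python
def log_locations(logs):
    """Return one dict per container log, each carrying the shared bucket."""
    bucket = logs.get("bucketName")
    return [
        {
            "bucket": bucket,
            "job": entry.get("job"),
            "container": entry.get("container"),
            "key": entry.get("key"),
        }
        for entry in logs.get("containers", [])
    ]


# Sample values are placeholders for illustration.
example_logs = {
    "bucketName": "example-bucket",
    "containers": [
        {"job": "example-job", "container": "main", "key": "logs/example-job/main.log"},
    ],
}
for location in log_locations(example_logs):
    print(location["bucket"], location["key"])
```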

DataPipelineRun.status.output

The resulting dataset from the flow

Name	Type	Description	Required
bucketName	string	In the case of the location type being an object storage system, BucketName is the name of the VirtualBucket resource that exists in the same tenant as the resource specifying the DataLocation. Modela will connect to the external object storage system, and will access the file from the path specified by the Path field. Default:	false
connectionName	string	In the case of the type of location being a database, ConnectionName specifies the name of the Connection resource that exists in the same tenant as the resource specifying the DataLocation. Modela will attempt to connect to the database using the credentials specified in the Connection, and will execute the query specified by the SQL field. Default:	false
database	string	The name of a database inside the database system specified by the ConnectionName field. Default:	false
path	string	The path to a flat-file inside an object storage system. When using the Modela API to upload files (through the FileService API), Modela will upload the data to a predetermined path based on the Tenant, DataProduct, DataProductVersion, and resource type of the resource in relation to the file being uploaded. The path does not need to adhere to this format; you can still pass the path of a file inside a bucket not managed by Modela. Default:	false
sql	string	The SQL statement which will be executed to query data from the table specified by Table. Default:	false
table	string	The name of a table inside a database, if applicable. Default:	false
topic	string	The name of the streaming topic (currently unsupported). Default:	false
type	string	The type of location where the data resides, which can either be an object inside an object storage system (e.g. Minio), a SQL location like a table or a view, a data stream (e.g. Kafka, currently unsupported), or a web location (currently unsupported). Default: object	false
