Data Pipeline Run
The Data Pipeline Run resource represents an execution of a single Data Pipeline. When a Data Pipeline Run resource is created, the system will create a Job which executes the pipeline. This resource is used internally by the Data Pipeline resource controller and should not be created by an end-user; instead, use the DataPipelineService API.
Relationships to Other Resources
The execution of a Data Pipeline Run creates one or more Recipe Run resources.
A Data Pipeline Run is created for the execution of a Data Pipeline.
DataPipelineRun API Reference
DataPipelineRun represents one execution of a data pipeline.
Name | Type | Description | Required |
---|---|---|---|
apiVersion | string | data.modela.ai/v1alpha1 | true |
kind | string | DataPipelineRun | true |
metadata | object | Refer to the Kubernetes API documentation for the fields of the `metadata` field. | true |
spec | object | DataPipelineRunSpec defines the desired state of a DataPipelineRun | true |
status | object | DataPipelineRunStatus defines the observed state of DataPipelineRun | false |
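As a rough illustration of the top-level shape described in the table above, the following sketch checks a manifest (represented as a plain Python dictionary) for the fields marked as required. The helper name `missing_required_fields` and the metadata values are hypothetical, and the object is shown only to illustrate the layout, since this resource is meant to be created by the controller rather than by hand.

```python
# Required top-level fields, taken from the "Required" column of the table above.
REQUIRED_FIELDS = ("apiVersion", "kind", "metadata", "spec")


def missing_required_fields(manifest: dict) -> list:
    """Return the required top-level fields that are absent from a manifest."""
    return [name for name in REQUIRED_FIELDS if name not in manifest]


# Hypothetical object; the metadata values are made up for illustration.
example_run = {
    "apiVersion": "data.modela.ai/v1alpha1",
    "kind": "DataPipelineRun",
    "metadata": {"name": "example-run", "namespace": "example-namespace"},
    "spec": {},
}
```

Calling `missing_required_fields(example_run)` returns an empty list, while a manifest that only sets `kind` would report the other three fields as missing.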
DataPipelineRun.spec
DataPipelineRunSpec defines the desired state of a DataPipelineRun
Name | Type | Description | Required |
---|---|---|---|
aborted | boolean | Set to true to abort the pipeline run. Default: false | false |
datapipelineName | string | The data product. Default: "" | false |
labRef | object | The Lab where the data pipeline runs. | false |
owner | string | The owner of the run, set to the owner of the pipeline. Default: no-one | false |
paused | boolean | Set to true to pause the pipeline run. Default: false | false |
priority | enum | The priority of this data pipeline run. Enum: low, medium, high, urgent. Default: medium | false |
resources | object | Specify the resources for the data pipeline run. | false |
versionName | string | The data product version of the run. Default: "" | false |
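The defaults and the `priority` enum listed above can be mirrored in a small client-side sketch. This is only an illustration for reading or pre-checking a spec: the names `SPEC_DEFAULTS` and `with_spec_defaults` are hypothetical, and how the system itself applies defaults is not described on this page.

```python
# Allowed priority values, copied from the Enum list in the table above.
PRIORITIES = ("low", "medium", "high", "urgent")

# Non-empty defaults documented for DataPipelineRun.spec.
SPEC_DEFAULTS = {
    "aborted": False,
    "owner": "no-one",
    "paused": False,
    "priority": "medium",
}


def with_spec_defaults(spec: dict) -> dict:
    """Return a copy of spec with documented defaults filled in.

    Raises ValueError if priority is not one of the documented enum values.
    """
    merged = {**SPEC_DEFAULTS, **spec}
    if merged["priority"] not in PRIORITIES:
        raise ValueError(f"unknown priority: {merged['priority']!r}")
    return merged
```

For example, `with_spec_defaults({"priority": "urgent"})` keeps the explicit priority and fills in `aborted`, `owner`, and `paused` with their documented defaults.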
DataPipelineRun.spec.labRef
The Lab where the data pipeline runs.
Name | Type | Description | Required |
---|---|---|---|
apiVersion | string | API version of the referent. | false |
fieldPath | string | If referring to a piece of an object instead of an entire object, this string should contain a valid JSON/Go field access statement, such as desiredState.manifest.containers[2]. For example, if the object reference is to a container within a pod, this would take on a value like: "spec.containers{name}" (where "name" refers to the name of the container that triggered the event) or if no container name is specified "spec.containers[2]" (container with index 2 in this pod). This syntax is chosen only to have some well-defined way of referencing a part of an object. TODO: this design is not final and this field is subject to change in the future. | false |
kind | string | Kind of the referent. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds | false |
name | string | Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names | false |
namespace | string | Namespace of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/ | false |
resourceVersion | string | Specific resourceVersion to which this reference is made, if any. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#concurrency-control-and-consistency | false |
uid | string | UID of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids | false |
DataPipelineRun.spec.resources
Specify the resources for the data pipeline run
Name | Type | Description | Required |
---|---|---|---|
cpuImage | object | Reference to the managed CPU trainer image, used internally | false |
gpuImage | object | Reference to the managed GPU trainer image, used internally | false |
requirements | object | The custom resource requirements for the workload, which are used if `WorkloadName` is not set | false |
workloadName | string | If this resource is based on a workload, this field contains the name of the WorkloadClass. The system will use the resource requirements described by that WorkloadClass. | false |
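The relationship between `workloadName` and `requirements` described above can be sketched as a simple precedence rule. The function name `resolve_resources` and the returned tuple shape are hypothetical; the sketch only encodes the statement that custom requirements are used when no workload name is set.

```python
def resolve_resources(resources: dict) -> tuple:
    """Decide which source of resource requirements applies.

    Returns ("workloadClass", name) when workloadName is set, otherwise
    ("requirements", custom_requirements).
    """
    workload = resources.get("workloadName")
    if workload:
        # A named WorkloadClass takes precedence over custom requirements.
        return ("workloadClass", workload)
    return ("requirements", resources.get("requirements", {}))
```

With a hypothetical workload name such as `"example-workload"`, the first branch applies; with only a `requirements` object present, that object is returned instead.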
DataPipelineRun.spec.resources.cpuImage
Reference to the managed CPU trainer image, used internally
Name | Type | Description | Required |
---|---|---|---|
apiVersion | string | API version of the referent. | false |
fieldPath | string | If referring to a piece of an object instead of an entire object, this string should contain a valid JSON/Go field access statement, such as desiredState.manifest.containers[2]. For example, if the object reference is to a container within a pod, this would take on a value like: "spec.containers{name}" (where "name" refers to the name of the container that triggered the event) or if no container name is specified "spec.containers[2]" (container with index 2 in this pod). This syntax is chosen only to have some well-defined way of referencing a part of an object. TODO: this design is not final and this field is subject to change in the future. | false |
kind | string | Kind of the referent. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds | false |
name | string | Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names | false |
namespace | string | Namespace of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/ | false |
resourceVersion | string | Specific resourceVersion to which this reference is made, if any. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#concurrency-control-and-consistency | false |
uid | string | UID of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids | false |
DataPipelineRun.spec.resources.gpuImage
Reference to the managed GPU trainer image, used internally
Name | Type | Description | Required |
---|---|---|---|
apiVersion | string | API version of the referent. | false |
fieldPath | string | If referring to a piece of an object instead of an entire object, this string should contain a valid JSON/Go field access statement, such as desiredState.manifest.containers[2]. For example, if the object reference is to a container within a pod, this would take on a value like: "spec.containers{name}" (where "name" refers to the name of the container that triggered the event) or if no container name is specified "spec.containers[2]" (container with index 2 in this pod). This syntax is chosen only to have some well-defined way of referencing a part of an object. TODO: this design is not final and this field is subject to change in the future. | false |
kind | string | Kind of the referent. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds | false |
name | string | Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names | false |
namespace | string | Namespace of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/ | false |
resourceVersion | string | Specific resourceVersion to which this reference is made, if any. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#concurrency-control-and-consistency | false |
uid | string | UID of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids | false |
DataPipelineRun.spec.resources.requirements
The custom resource requirements for the workload, which are used if `WorkloadName` is not set
Name | Type | Description | Required |
---|---|---|---|
limits | map[string]int or string | Limits describes the maximum amount of compute resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ | false |
requests | map[string]int or string | Requests describes the minimum amount of compute resources required. If Requests is omitted for a container, it defaults to Limits if that is explicitly specified, otherwise to an implementation-defined value. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ | false |
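The defaulting rule in the `requests` description can be read as the following sketch. It follows the wording above literally (the whole `requests` map falls back to `limits` when omitted); the function name `effective_requests` is hypothetical, and the empty dictionary returned in the last case merely stands in for the implementation-defined value, which this page does not specify.

```python
def effective_requests(requirements: dict) -> dict:
    """Return the requests that would apply under the documented fallback."""
    requests = requirements.get("requests")
    if requests is not None:
        # Explicit requests are used as given.
        return dict(requests)
    limits = requirements.get("limits")
    if limits:
        # Requests omitted: fall back to the explicitly specified limits.
        return dict(limits)
    # Neither is set: placeholder for the implementation-defined value.
    return {}
```

For instance, a requirements object that only sets `limits` to `{"cpu": "2", "memory": "4Gi"}` yields the same map as its effective requests.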
DataPipelineRun.status
DataPipelineRunStatus defines the observed state of DataPipelineRun
Name | Type | Description | Required |
---|---|---|---|
conditions | []object | List of DataPipelineRunCondition objects describing the state of the run. | false |
endTime | string | The end time of the pipeline run. Format: date-time | false |
failureMessage | string | Message describing the terminal failure, updated when the run fails. | false |
failureReason | string | Reason for the terminal failure, updated when the run fails. Borrowed from the Cluster API controller. | false |
lastUpdated | string | Last time the object was updated. Format: date-time | false |
logs | object | Holds the location of log paths | false |
observedGeneration | integer | ObservedGeneration is the last generation that was acted on. Format: int64 | false |
output | object | The resulting dataset from the flow | false |
phase | string | The phase of the run. Default: Pending | false |
progress | integer | Pipeline progress in percent; the progress takes into account the different stages of the pipeline. Format: int32 | false |
recipeRuns | []string | RecipeRuns is the names of the recipe runs that occur while the pipeline runs. | false |
startTime | string | StartTime is the start time of the pipeline. Format: date-time | false |
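Because `startTime` and `endTime` are date-time strings, the duration of a run can be derived from a status object. The sketch below assumes the timestamps are RFC 3339 strings such as `2023-01-01T10:00:00Z`, which is the usual serialization for Kubernetes timestamps; the helper names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def parse_time(value: str) -> datetime:
    """Parse an RFC 3339 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def run_duration(status: dict) -> Optional[timedelta]:
    """Return endTime - startTime, or None if either timestamp is missing."""
    start = status.get("startTime")
    end = status.get("endTime")
    if not start or not end:
        return None
    return parse_time(end) - parse_time(start)
```

A run that is still in progress has no `endTime`, so `run_duration` returns `None` for it rather than guessing.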
DataPipelineRun.status.conditions[index]
DataPipelineRunCondition describes the state of a data pipeline run at a certain point.
Name | Type | Description | Required |
---|---|---|---|
status | string | Status of the condition, one of True, False, Unknown. | true |
type | string | Type of the condition. | true |
lastTransitionTime | string | Last time the condition transitioned from one status to another. Format: date-time | false |
message | string | A human readable message indicating details about the transition. | false |
reason | string | The reason for the condition's last transition. | false |
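A common way to consume a condition list like this one is to look up a condition by its `type` and compare its `status` with the string `True`. The sketch below does that for a status dictionary; the function names are hypothetical, and the condition type `Completed` used in the usage note is only an assumed example, since this page does not list the condition types.

```python
from typing import Optional


def find_condition(status: dict, condition_type: str) -> Optional[dict]:
    """Return the first condition with the given type, or None."""
    for condition in status.get("conditions", []):
        if condition.get("type") == condition_type:
            return condition
    return None


def is_condition_true(status: dict, condition_type: str) -> bool:
    """Report whether the named condition exists and has status 'True'."""
    condition = find_condition(status, condition_type)
    return condition is not None and condition.get("status") == "True"
```

For example, `is_condition_true(status, "Completed")` is false both when the condition is absent and when its status is `False` or `Unknown`.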
DataPipelineRun.status.logs
Holds the location of log paths
Name | Type | Description | Required |
---|---|---|---|
bucketName | string | The name of the VirtualBucket resource where the logs are stored | false |
containers | []object | The collection of ContainerLog objects that describe the location of logs per container | false |
DataPipelineRun.status.logs.containers[index]
ContainerLog describes the location of logs for a single Job
Name | Type | Description | Required |
---|---|---|---|
container | string | The container name | false |
job | string | The name of the Job | false |
key | string | The path to the log in the bucket | false |
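Taken together, `bucketName` and the per-container `key` identify where each log lives. The following sketch flattens a `logs` object into one record per container; the function name `log_locations` and the record layout are hypothetical, and actually fetching the log from the bucket is out of scope here.

```python
def log_locations(logs: dict) -> list:
    """Return one (bucket, job, container, key) tuple per ContainerLog entry."""
    bucket = logs.get("bucketName", "")
    return [
        (bucket, entry.get("job", ""), entry.get("container", ""), entry.get("key", ""))
        for entry in logs.get("containers", [])
    ]
```

An empty or missing `containers` list simply produces an empty result.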
DataPipelineRun.status.output
The resulting dataset from the flow
Name | Type | Description | Required |
---|---|---|---|
bucketName | string | In the case of the location type being an object storage system, BucketName is the name of the VirtualBucket resource that exists in the same tenant as the resource specifying the DataLocation. Modela will connect to the external object storage system, and will access the file from the path specified by the Path field. Default: "" | false |
connectionName | string | In the case of the type of location being a database, ConnectionName specifies the name of the Connection resource that exists in the same tenant as the resource specifying the DataLocation. Modela will attempt to connect to the database using the credentials specified in the Connection, and will execute the query specified by the SQL field. Default: "" | false |
database | string | The name of a database inside the database system specified by the ConnectionName field. Default: "" | false |
path | string | The path to a flat-file inside an object storage system. When using the Modela API to upload files (through the FileService API), Modela will upload the data to a predetermined path based on the Tenant, DataProduct, DataProductVersion, and resource type of the resource in relation to the file being uploaded. The path does not need to adhere to this format; you can still pass the path of a file inside a bucket not managed by Modela. Default: "" | false |
sql | string | The SQL statement which will be executed to query data from the table specified by Table. Default: "" | false |
table | string | The name of a table inside a database, if applicable. Default: "" | false |
topic | string | The name of the streaming topic (currently unsupported). Default: "" | false |
type | string | The type of location where the data resides, which can either be an object inside an object storage system (e.g. Minio), a SQL location like a table or a view, a data stream (e.g. Kafka, currently unsupported), or a web location (currently unsupported). Default: object | false |
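Which fields of the output location matter depends on its `type`. The sketch below summarizes an output object accordingly. Only the value `object` is confirmed by this page (as the default); any other type value is treated generically as a database-style location, which is an assumption, and the function name `describe_output` is hypothetical.

```python
def describe_output(output: dict) -> str:
    """Return a short human-readable summary of an output data location."""
    # "object" is the documented default for the type field.
    location_type = output.get("type", "object")
    if location_type == "object":
        return "object {} in bucket {}".format(
            output.get("path", ""), output.get("bucketName", "")
        )
    # Assumption: other types are reached through a Connection resource.
    return "{} location via connection {}".format(
        location_type, output.get("connectionName", "")
    )
```

When `type` is omitted, the summary falls back to the object-storage form and reports the `path` and `bucketName` fields.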