DataSource

Reference documentation for DataSource

modela.data.DataSource module

class DataSource

– Bases: Resource

__init__(item=, client=None, namespace='', name='', version='v0.0.1', bucket='default-minio-bucket', infer_file=None, infer_dataframe=None, infer_bytes=None, target_column='', file_type=None, task_type=None, csv_config=None, excel_config=None)
Parameters
- client – The Data Source client repository, which can be obtained through an instance of Modela.
- namespace – The target namespace of the resource.
- name – The name of the resource.
- version – The version of the resource.
- bucket (str) – Required when data is provided for schema inference (through infer_file, infer_dataframe, or infer_bytes).
- infer_file (Optional[str]) – If specified, the SDK will attempt to read the file at the given path and upload it to analyse the columns and generate a schema that will be applied to the resource.
- infer_dataframe (Optional[DataFrame]) – If specified, the Pandas DataFrame will be serialized and uploaded to analyse the columns and generate a schema that will be applied to the resource.
- infer_bytes (Optional[bytes]) – If specified, the raw byte data will be uploaded to analyse the columns and generate a schema that will be applied to the resource.
- target_column (str) – The name of the target column used when training a model. This parameter only has effect when data is uploaded to infer a schema.
- file_type (Optional[FlatFileType]) – The file type of raw data, used when ingesting a Dataset from a file, or creating a data snapshot from a database source. If inferring from a dataframe, the file type will default to CSV.
- task_type (Optional[TaskType]) – The target task type in relation to the data being used.
- csv_config (Optional[CsvFileFormat]) – The CSV file format of the raw data.
- excel_config (Optional[ExcelNotebookFormat]) – The Excel file format of the raw data.
column(name) → Column – Get the column with the specified name from the schema
default()
schema
spec: DataSourceSpec
target_column: Column
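
The infer_bytes parameter above expects raw byte data. A minimal sketch of producing such bytes with the Python standard library follows. The helper name rows_to_csv_bytes, the column names, and the sample rows are hypothetical. The commented-out constructor call uses only keyword arguments listed in the signature above, and it assumes that a client has already been obtained from a Modela instance, which this page does not show.

```python
import csv
import io


def rows_to_csv_bytes(header, rows):
    """Serialize a header row and data rows to UTF-8 encoded CSV bytes."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


# Hypothetical sample data; "churned" plays the role of the target column.
raw = rows_to_csv_bytes(
    ["age", "income", "churned"],
    [[34, 52000, 0], [51, 61000, 1]],
)

# Assumed usage (not executed here; requires the Modela SDK and a client):
# from modela.data.DataSource import DataSource
# data_source = DataSource(
#     client=client,                  # assumed: obtained from a Modela instance
#     namespace="default",            # hypothetical namespace
#     name="churn-source",            # hypothetical resource name
#     bucket="default-minio-bucket",  # a bucket must be provided for inference
#     infer_bytes=raw,
#     target_column="churned",
# )
```

Building the bytes in memory avoids writing a temporary file. When the data already lives on disk or in a Pandas DataFrame, infer_file or infer_dataframe would presumably be the more direct choice.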

class DataSourceClient

– Bases: object

__init__(stub, modela)
create(datasource) → bool
delete(namespace, name) → bool
get(namespace, name) → Union[DataSource, bool]
infer(namespace, location, file_type=FlatFileType.Csv, data_source=None, version='v0.0.1') → List[ColumnProfile]
list(namespace) → Union[List[DataSource], bool]
update(datasource) → bool
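
Because get and list are annotated as returning either a result or a bool, callers may want to normalise the return value before using it. The sketch below assumes that a bool return signals that the resource was not found or that the call failed; this page does not state that explicitly. The unwrap helper and the StubClient class are hypothetical: StubClient only mimics the documented return shapes so that the example runs without the SDK.

```python
from typing import TypeVar, Union

T = TypeVar("T")


def unwrap(result: Union[T, bool], what: str) -> T:
    """Return result unchanged, or raise if the client handed back a bool."""
    if isinstance(result, bool):
        raise LookupError(f"could not retrieve {what}")
    return result


class StubClient:
    """Hypothetical stand-in for DataSourceClient (return shapes only)."""

    def __init__(self, items):
        # Maps (namespace, name) to a stored object.
        self._items = dict(items)

    def get(self, namespace, name):
        # Assumption: False means the resource was not found.
        return self._items.get((namespace, name), False)

    def list(self, namespace):
        return [v for (ns, _), v in self._items.items() if ns == namespace]


client = StubClient({("default", "churn-source"): "churn-source-object"})

found = unwrap(client.get("default", "churn-source"), "default/churn-source")

try:
    unwrap(client.get("default", "missing"), "default/missing")
    message = ""
except LookupError as err:
    message = str(err)
```

Raising an exception at the call site keeps later code from accidentally treating False as a DataSource. Whether that is preferable to checking the bool inline depends on how the surrounding application handles errors.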
