DataSource

Reference documentation for DataSource

modela.data.DataSource module

class DataSource

– Bases: Resource

  • __init__(item=, client=None, namespace='', name='', version='v0.0.1', bucket='default-minio-bucket', infer_file=None, infer_dataframe=None, infer_bytes=None, target_column='', file_type=None, task_type=None, csv_config=None, excel_config=None)

  • Parameters

    • client – The Data Source client repository, which can be obtained through an instance of Modela.

    • namespace – The target namespace of the resource.

    • name – The name of the resource.

    • version – The version of the resource.

    • bucket (str) – Required when data is provided for schema inference through infer_file, infer_dataframe, or infer_bytes.

    • infer_file (Optional[str]) – If specified, the SDK will attempt to read the file at the given path and upload it to analyse the columns and generate a schema that will be applied to the resource.

    • infer_dataframe (Optional[DataFrame]) – If specified, the Pandas DataFrame will be serialized and uploaded to analyse the columns and generate a schema that will be applied to the resource.

    • infer_bytes (Optional[bytes]) – If specified, the raw byte data will be uploaded to analyse the columns and generate a schema that will be applied to the resource.

    • target_column (str) – The name of the target column used when training a model. This parameter only takes effect when data is uploaded to infer a schema.

    • file_type (Optional[FlatFileType]) – The file type of raw data, used when ingesting a Dataset from a file, or creating a data snapshot from a database source. If inferring from a dataframe, the file type will default to CSV.

    • task_type (Optional[TaskType]) – The target task type in relation to the data being used.

    • csv_config (Optional[CsvFileFormat]) – The CSV file format of the raw data.

    • excel_config (Optional[ExcelNotebookFormat]) – The Excel file format of the raw data.

  • column(name) → Column – Get the column with the specified name from the schema.

  • default()

  • schema

  • spec: DataSourceSpec

  • target_column: Column
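The infer_file, infer_dataframe, and infer_bytes parameters above all upload data so that its columns can be analysed and a schema generated. The following is a rough local illustration of that idea using only the Python standard library. It is not the SDK's inference code: the function names sketch_schema and guess_type, the type labels, and the sample data are assumptions made for this example.

```python
import csv
import io


def guess_type(values):
    """Return a coarse type label for a list of non-empty string cells."""
    def all_parse(cast):
        # True when every cell can be converted with the given constructor.
        try:
            for value in values:
                cast(value)
            return True
        except ValueError:
            return False

    if not values:
        return "string"
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    return "string"


def sketch_schema(raw: bytes, target_column: str = ""):
    """Describe each column of CSV bytes (local illustration only)."""
    reader = csv.DictReader(io.StringIO(raw.decode("utf-8")))
    rows = list(reader)
    schema = []
    for name in reader.fieldnames or []:
        # Skip empty or missing cells so they do not affect the guess.
        cells = [row[name] for row in rows if row[name]]
        schema.append({
            "name": name,
            "type": guess_type(cells),
            "target": name == target_column,
        })
    return schema


# Hypothetical sample data standing in for the bytes passed as infer_bytes.
raw = b"age,income,churned\n34,52000.5,yes\n41,61000,no\n"
schema = sketch_schema(raw, target_column="churned")
for column in schema:
    print(column)
```

In this sketch the target flag plays the role that target_column plays in the constructor: it only has meaning once a schema has been derived from data.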

class DataSourceClient

– Bases: object

  • __init__(stub, modela)

  • create(datasource) → bool

  • delete(namespace, name) → bool

  • get(namespace, name) → Union[DataSource, bool]

  • infer(namespace, location, file_type=FlatFileType.Csv, data_source=None, version='v0.0.1') → List[ColumnProfile]

  • list(namespace) → Union[List[DataSource], bool]

  • update(datasource) → bool
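Because get and list are documented as returning either a resource or a bool, callers may want to check the result before using it. The sketch below shows one way to do that. It assumes that a bool return value signals a failed lookup, which the signatures suggest but the reference does not state. StubDataSource, StubDataSourceClient, and unwrap are hypothetical stand-ins written for this example so that it runs without the SDK or a server; only the method names mirror the documented client.

```python
from dataclasses import dataclass
from typing import TypeVar, Union

T = TypeVar("T")


def unwrap(result: Union[T, bool], action: str) -> T:
    """Return the payload, or raise if the call answered with a bool."""
    # Assumption: a bool here means no resource was produced.
    if isinstance(result, bool):
        raise LookupError(f"{action} failed (returned {result})")
    return result


@dataclass
class StubDataSource:
    """Hypothetical stand-in carrying only the identifying fields."""
    namespace: str
    name: str


class StubDataSourceClient:
    """Hypothetical in-memory client that mirrors the documented method names."""

    def __init__(self):
        self._items = {}

    def create(self, datasource) -> bool:
        key = (datasource.namespace, datasource.name)
        if key in self._items:
            return False
        self._items[key] = datasource
        return True

    def get(self, namespace, name):
        # Mirrors the Union[DataSource, bool] return shape.
        return self._items.get((namespace, name), False)

    def list(self, namespace):
        return [d for d in self._items.values() if d.namespace == namespace]

    def delete(self, namespace, name) -> bool:
        return self._items.pop((namespace, name), None) is not None


client = StubDataSourceClient()
client.create(StubDataSource("default", "iris"))

found = unwrap(client.get("default", "iris"), "get")
print(found.name)

try:
    unwrap(client.get("default", "missing"), "get")
except LookupError as err:
    print(err)
```

Raising on a bool result keeps the calling code from accidentally treating False as a DataSource, at the cost of one extra call per lookup.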