DataSource
modela.data.DataSource module
class DataSource
– Bases: Resource
-
__init__(item=, client=None, namespace='', name='', version='v0.0.1', bucket='default-minio-bucket', infer_file=None, infer_dataframe=None, infer_bytes=None, target_column='', file_type=None, task_type=None, csv_config=None, excel_config=None)
Parameters
-
client – The Data Source client repository, which can be obtained through an instance of Modela.
-
namespace – The target namespace of the resource.
-
name – The name of the resource.
-
version – The version of the resource.
-
bucket (str) – The bucket to use for uploaded data. A bucket must be provided if data is provided for schema inference.
-
infer_file (Optional[str]) – If specified, the SDK will attempt to read the file at the given path and upload it to analyse the columns and generate a schema that will be applied to the resource.
-
infer_dataframe (Optional[DataFrame]) – If specified, the Pandas DataFrame will be serialized and uploaded to analyse the columns and generate a schema that will be applied to the resource.
-
infer_bytes (Optional[bytes]) – If specified, the raw byte data will be uploaded to analyse the columns and generate a schema that will be applied to the resource.
-
target_column (str) – The name of the target column used when training a model. This parameter only has effect when data is uploaded to infer a schema.
-
file_type (Optional[FlatFileType]) – The file type of the raw data, used when ingesting a Dataset from a file or creating a data snapshot from a database source. When inferring from a DataFrame, the file type defaults to CSV.
-
task_type (Optional[TaskType]) – The target task type in relation to the data being used.
-
csv_config (Optional[CsvFileFormat]) – The CSV file format of the raw data.
-
excel_config (Optional[ExcelNotebookFormat]) – The Excel file format of the raw data.
-
-
column(name) → Column – Get the column with the specified name from the schema.
-
default()
-
schema
-
spec: DataSourceSpec
-
target_column: Column
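The inference parameters above can be illustrated with a minimal sketch. It only prepares the inputs: `rows_to_csv_bytes` and `datasource_kwargs` are hypothetical helpers, not part of the SDK, and the namespace, resource name, and column names are made up for illustration. The sketch assumes that `infer_bytes` accepts UTF-8 encoded CSV data, which this reference does not confirm.

```python
import csv
import io


def rows_to_csv_bytes(header, rows):
    """Serialize a header and rows into UTF-8 encoded CSV bytes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def datasource_kwargs(namespace, name, data, target_column,
                      bucket="default-minio-bucket"):
    """Collect keyword arguments named after the documented parameters.

    Hypothetical helper for illustration; it is not part of the SDK.
    """
    if not bucket:
        # The reference states that a bucket must be provided
        # whenever data is provided for inference.
        raise ValueError("a bucket is required when data is provided for inference")
    return {
        "namespace": namespace,
        "name": name,
        "bucket": bucket,
        "infer_bytes": data,
        "target_column": target_column,
    }


# Made-up sample data; "churned" plays the role of the target column.
data = rows_to_csv_bytes(["age", "churned"], [[31, "yes"], [45, "no"]])
kwargs = datasource_kwargs("demo-namespace", "churn-source", data, "churned")
```

With the SDK installed and a Data Source client obtained from an instance of Modela, these arguments would presumably be passed as `DataSource(client=client, **kwargs)`. Keeping the argument preparation separate from the constructor call makes the bucket requirement easy to check before any upload is attempted.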
class DataSourceClient
– Bases: object
-
__init__(stub, modela)
create(datasource) → bool
-
delete(namespace, name) → bool
-
get(namespace, name) → Union[DataSource, bool]
-
infer(namespace, location, file_type=FlatFileType.Csv, data_source=None, version='v0.0.1') → List[ColumnProfile]
-
list(namespace) → Union[List[DataSource], bool]
-
update(datasource) → bool
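Because `get` is documented as returning `Union[DataSource, bool]`, callers may want to normalise the result before using it. The sketch below is one possible approach, assuming a `bool` return signals that the resource could not be fetched; the reference does not say when a `bool` is returned. `get_or_none` is a hypothetical helper and `FakeClient` is a stand-in so the example runs without the SDK.

```python
def get_or_none(client, namespace, name):
    """Return the fetched resource, or None if the client returned a bool."""
    result = client.get(namespace, name)
    if isinstance(result, bool):
        # Assumption: a bool means the lookup did not yield a resource.
        return None
    return result


class FakeClient:
    """Stand-in for DataSourceClient, used only to exercise the helper."""

    def __init__(self, stored):
        self._stored = stored

    def get(self, namespace, name):
        # Mirrors the documented Union[..., bool] return shape.
        return self._stored.get((namespace, name), False)


client = FakeClient({("demo-namespace", "churn-source"): {"name": "churn-source"}})
found = get_or_none(client, "demo-namespace", "churn-source")
missing = get_or_none(client, "demo-namespace", "absent")
```

The same pattern could be applied to `list`, which is documented with a similar `Union[List[DataSource], bool]` return type.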