Dataset
- class labelbox.schema.dataset.Dataset(client, field_values)
Bases:
DbObject, Updateable, Deletable
A Dataset is a collection of DataRows.
- name
- Type:
str
- description
- Type:
str
- updated_at
- Type:
datetime
- created_at
- Type:
datetime
- row_count
The number of rows in the dataset. This value is cached; fetch the dataset again to refresh it.
- Type:
int
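Because row_count is cached, a re-fetch is needed to see rows added since the Dataset object was loaded. The helper below is a minimal sketch of that step, not part of the SDK: `fresh_row_count` is a hypothetical name, and it assumes the Dataset exposes its ID as `uid` and that `client.get_dataset` accepts that ID.

```python
def fresh_row_count(client, dataset):
    """Re-fetch the dataset and return its current row_count.

    Hypothetical helper; assumes `dataset.uid` holds the dataset ID.
    """
    # The already-loaded object keeps its cached value, so read the
    # count from a newly fetched copy instead.
    refreshed = client.get_dataset(dataset.uid)
    return refreshed.row_count
```

The original `dataset` object is left untouched, so callers that need the new count should use the returned value.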
- created_by
ToOne relationship to User
- Type:
Relationship
- organization
ToOne relationship to Organization
- Type:
Relationship
- add_iam_integration(iam_integration: str | IAMIntegration) -> IAMIntegration
Sets the IAM integration for the dataset. IAM integration is used to sign URLs for data row assets.
- Parameters:
iam_integration – IAM integration object or IAM integration id.
- create_data_row(items=None, **kwargs) -> DataRow
Creates a single DataRow belonging to this dataset.
>>> dataset.create_data_row(row_data="http://my_site.com/photos/img_01.jpg")
- Parameters:
items – Dictionary containing new DataRow data. At a minimum, must contain row_data or DataRow.row_data.
**kwargs – Key-value arguments containing new DataRow data. At a minimum, must contain row_data.
- Raises:
InvalidQueryError – If both dictionary and kwargs are provided as inputs
InvalidQueryError – If DataRow.row_data field value is not provided in kwargs.
InvalidAttributeError – in case the DB object type does not contain any of the field names given in kwargs.
ResourceCreationError – If data row creation failed on the server side.
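The argument rules above (a dictionary or keyword arguments but not both, and row_data always present) can be checked locally before calling the SDK. This is a sketch under stated assumptions, not the library's own validation: `check_create_data_row_args` is a hypothetical helper, it raises ValueError rather than the SDK's exceptions, and it only recognizes the string key "row_data" (not the DataRow.row_data field object).

```python
def check_create_data_row_args(items=None, **kwargs):
    """Return the payload as a dict, or raise ValueError if a rule is broken.

    Hypothetical pre-flight check; mirrors the documented rules only.
    """
    # Rule 1: a dictionary and keyword arguments must not be combined.
    if items is not None and kwargs:
        raise ValueError("pass either a dictionary or keyword arguments, not both")
    payload = dict(items) if items is not None else dict(kwargs)
    # Rule 2: row_data is the one field that must always be present.
    if "row_data" not in payload:
        raise ValueError("row_data is required")
    return payload


# Possible usage with a real dataset object:
# dataset.create_data_row(**check_create_data_row_args(row_data="http://my_site.com/photos/img_01.jpg"))
```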
- create_data_rows(items, file_upload_thread_count=20) -> DataUpsertTask
Asynchronously bulk uploads data rows.
- Parameters:
items (iterable of (dict or str)) –
- Returns:
Task representing the data import on the server side. The Task can be used for inspecting task progress and waiting until it’s done.
- Raises:
InvalidQueryError – If the items parameter does not conform to the specification above or if the server did not accept the DataRow creation request (unknown reason).
ResourceNotFoundError – If unable to retrieve the Task for the import process. This could imply that the import failed.
InvalidAttributeError – If there are fields in items not valid for a DataRow.
ValueError – When the upload parameters are invalid
Note: dict and str items cannot be mixed in the same call. It is the caller's responsibility to ensure that all items are of the same type.
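Since the caller must keep dict and str items apart, one option is to check the batch before uploading. The following is a minimal sketch, assuming only the calls shown in this reference (`create_data_rows` and the task's `wait_till_done`); `upload_rows` is a hypothetical wrapper and the TypeError is raised by the sketch, not by the SDK.

```python
def upload_rows(dataset, items):
    """Upload a homogeneous batch and block until the import task finishes.

    Hypothetical wrapper around dataset.create_data_rows.
    """
    # Materialize the iterable so it can be inspected and then uploaded.
    items = list(items)
    all_dicts = all(isinstance(item, dict) for item in items)
    all_strs = all(isinstance(item, str) for item in items)
    if not (all_dicts or all_strs):
        raise TypeError("items must be all dicts or all strings, not a mix")
    task = dataset.create_data_rows(items)
    # The upload is asynchronous; wait for the server-side task to complete.
    task.wait_till_done()
    return task
```

Failing early on a mixed batch avoids sending a request that the server would not accept.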
- data_row_for_external_id(external_id) -> DataRow
Convenience method for getting a single DataRow belonging to this Dataset that has the given external_id.
- Parameters:
external_id (str) – External ID of the sought DataRow.
- Returns:
A single DataRow with the given external ID.
- Raises:
lbox.exceptions.ResourceNotFoundError – If there is no DataRow in this Dataset with the given external ID, or if there are multiple DataRows for it.
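Callers that treat a missing row as a normal outcome may prefer a None result over an exception. A minimal sketch of that pattern follows; `data_row_or_none` is a hypothetical helper, and the exception class is passed in as a parameter (in real use it would presumably be lbox.exceptions.ResourceNotFoundError) so the example does not depend on the SDK being installed.

```python
def data_row_or_none(dataset, external_id, not_found_error):
    """Return the matching DataRow, or None if the lookup raises not_found_error.

    Hypothetical helper; `not_found_error` is the exception class to catch.
    """
    try:
        return dataset.data_row_for_external_id(external_id)
    except not_found_error:
        # Per the reference, this covers both "no match" and "multiple
        # matches", so None does not tell the two cases apart.
        return None
```

When duplicates are possible, data_rows_for_external_id is the safer call, since it returns a list instead of a single row.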
- data_rows(from_cursor: str | None = None, where: Comparison | None = None) -> PaginatedCollection
Custom method to paginate data_rows via cursor.
- Parameters:
from_cursor (str) – Cursor (data row ID) to start from. If None, retrieval starts from the beginning.
where (dict(str,str)) – Filter to apply to data rows. Each key is a data row column name and each value is the value to filter on. Example: {'external_id': 'my_external_id'} returns the data row with external_id = 'my_external_id'.
Note
Data rows are retrieved newest first. Deleted and failed data rows are not retrieved. Data rows still in progress may be retrieved.
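Because retrieval is newest first, taking the first few items of the collection yields the most recently created rows. The sketch below assumes the returned PaginatedCollection can be iterated lazily like any Python iterable; `newest_rows` is a hypothetical helper name.

```python
from itertools import islice


def newest_rows(dataset, n, where=None):
    """Return up to n data rows, newest first.

    Hypothetical helper; assumes dataset.data_rows() returns an iterable.
    """
    # islice stops after n items, so later pages are never requested
    # if the collection fetches pages lazily.
    return list(islice(dataset.data_rows(where=where), n))
```

A filter such as {'external_id': 'my_external_id'} can be passed through `where` unchanged.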
- data_rows_for_external_id(external_id, limit=10) -> List[DataRow]
Convenience method for getting multiple DataRows belonging to this Dataset that have the given external_id.
- Parameters:
external_id (str) – External ID of the sought DataRow.
limit (int) – The maximum number of data rows to return for the given external_id
- Returns:
A list of DataRows with the given external ID.
- Raises:
lbox.exceptions.ResourceNotFoundError – If there is no DataRow in this Dataset with the given external ID, or if there are multiple DataRows for it.
- export(task_name: str | None = None, filters: DatasetExportFilters | None = None, params: CatalogExportParams | None = None) -> ExportTask
Creates a dataset export task with the given params and returns the task.
>>> dataset = client.get_dataset(DATASET_ID)
>>> task = dataset.export(
>>>     filters={
>>>         "last_activity_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
>>>         "label_created_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
>>>         "data_row_ids": [DATA_ROW_ID_1, DATA_ROW_ID_2, ...] # or global_keys: [DATA_ROW_GLOBAL_KEY_1, DATA_ROW_GLOBAL_KEY_2, ...]
>>>     },
>>>     params={
>>>         "performance_details": False,
>>>         "label_details": True
>>>     })
>>> task.wait_till_done()
>>> task.result
- export_v2(task_name: str | None = None, filters: DatasetExportFilters | None = None, params: CatalogExportParams | None = None) -> Task | ExportTask
Creates a dataset export task with the given params and returns the task.
>>> dataset = client.get_dataset(DATASET_ID)
>>> task = dataset.export_v2(
>>>     filters={
>>>         "last_activity_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
>>>         "label_created_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
>>>         "data_row_ids": [DATA_ROW_ID_1, DATA_ROW_ID_2, ...] # or global_keys: [DATA_ROW_GLOBAL_KEY_1, DATA_ROW_GLOBAL_KEY_2, ...]
>>>     },
>>>     params={
>>>         "performance_details": False,
>>>         "label_details": True
>>>     })
>>> task.wait_till_done()
>>> task.result
- remove_iam_integration() -> None
Unsets the IAM integration for the dataset.
- Parameters:
None –
- Returns:
None
- Raises:
LabelboxError – If the IAM integration can’t be unset.
Examples
>>> dataset.remove_iam_integration()
- upsert_data_rows(items, file_upload_thread_count=20) -> DataUpsertTask
Upserts data rows in this dataset. When “key” is provided, and it references an existing data row, an update will be performed. When “key” is not provided a new data row will be created.
>>> task = dataset.upsert_data_rows([
>>>     # create new data row
>>>     {
>>>         "row_data": "http://my_site.com/photos/img_01.jpg",
>>>         "global_key": "global_key1",
>>>         "external_id": "ex_id1",
>>>         "attachments": [
>>>             {"type": AttachmentType.RAW_TEXT, "name": "att1", "value": "test1"}
>>>         ],
>>>         "metadata": [
>>>             {"name": "tag", "value": "tag value"},
>>>         ]
>>>     },
>>>     # update global key of data row by existing global key
>>>     {
>>>         "key": GlobalKey("global_key1"),
>>>         "global_key": "global_key1_updated"
>>>     },
>>>     # update data row by ID
>>>     {
>>>         "key": UniqueId(dr.uid),
>>>         "external_id": "ex_id1_updated"
>>>     },
>>> ])
>>> task.wait_till_done()
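The presence of "key" decides whether an item is treated as an update candidate or as a new data row, so it can help to preview a batch before sending it. The function below is a small sketch of that split in plain Python; `split_upsert_items` is a hypothetical helper that does not call the SDK, and it cannot tell whether a given key actually references an existing data row.

```python
def split_upsert_items(items):
    """Partition upsert items into (creates, update_candidates).

    Hypothetical helper; inspects only whether each dict has a "key" entry.
    """
    # Items without "key" will be created as new data rows.
    creates = [item for item in items if "key" not in item]
    # Items with "key" target an existing data row if the key resolves.
    update_candidates = [item for item in items if "key" in item]
    return creates, update_candidates
```

Logging the two counts before calling upsert_data_rows is a cheap way to catch a batch where keys were dropped by mistake.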