Dataset

class labelbox.schema.dataset.Dataset(client, field_values)

Bases: DbObject, Updateable, Deletable

A Dataset is a collection of DataRows.

name

Type: str

description

Type: str

updated_at

Type: datetime

created_at

Type: datetime

row_count

The number of rows in the dataset. This value is cached, so fetch the dataset again to get an updated count.

Type: int

created_by

ToOne relationship to User

Type: Relationship

organization

ToOne relationship to Organization

Type: Relationship
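
Because row_count is cached on the fetched object, reading a current value means fetching the dataset again. The following is a minimal sketch, assuming `client` is a connected Labelbox client and that the dataset exposes its ID as `uid`. The helper name `fresh_row_count` is hypothetical and not part of the SDK.

```python
def fresh_row_count(client, dataset):
    """Re-fetch the dataset by ID and return the row count of the fresh copy."""
    # get_dataset returns a newly fetched Dataset object, so its row_count
    # reflects the server state at the time of this call.
    refreshed = client.get_dataset(dataset.uid)
    return refreshed.row_count
```

Usage would look like `count = fresh_row_count(client, dataset)` after data rows have been added or removed.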

add_iam_integration(iam_integration: str | IAMIntegration) → IAMIntegration

Sets the IAM integration for the dataset. The IAM integration is used to sign URLs for data row assets.

Parameters: iam_integration – IAM integration object or IAM integration ID.

create_data_row(items=None, **kwargs) → DataRow

Creates a single DataRow belonging to this dataset.

>>> dataset.create_data_row(row_data="http://my_site.com/photos/img_01.jpg")

Parameters:

items – Dictionary containing new DataRow data. At a minimum, must contain row_data or DataRow.row_data.
**kwargs – Key-value arguments containing new DataRow data. At a minimum, must contain row_data.

Raises:

InvalidQueryError – If both a dictionary and kwargs are provided as inputs.
InvalidQueryError – If the DataRow.row_data field value is not provided in kwargs.
InvalidAttributeError – If the DB object type does not contain one of the field names given in kwargs.
ResourceCreationError – If data row creation failed on the server side.
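
The input rules above (either a dictionary or keyword arguments, never both, and row_data always present) can be checked locally before calling the SDK. This is only an illustrative sketch: `data_row_payload` is a hypothetical helper, it raises a plain ValueError rather than the SDK's exceptions, and it only recognizes the string key "row_data".

```python
def data_row_payload(items=None, **kwargs):
    """Return a single payload dict, mirroring the documented input rules."""
    if items is not None and kwargs:
        raise ValueError("pass either a dictionary or keyword arguments, not both")
    payload = dict(items) if items is not None else dict(kwargs)
    # Assumption: only the string key is checked here; the documentation also
    # allows the DataRow.row_data field object as a key.
    if "row_data" not in payload:
        raise ValueError("row_data is required")
    return payload

# Usage, given an existing Dataset object named `dataset`:
# dataset.create_data_row(
#     data_row_payload(row_data="http://my_site.com/photos/img_01.jpg"))
```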

create_data_rows(items, file_upload_thread_count=20) → DataUpsertTask

Asynchronously bulk uploads data rows.

Parameters:

items (iterable of (dict or str)) –

Returns:

Task representing the data import on the server side. The Task can be used for inspecting task progress and waiting until it’s done.

Raises:

InvalidQueryError – If the items parameter does not conform to the specification above or if the server did not accept the DataRow creation request (unknown reason).
ResourceNotFoundError – If unable to retrieve the Task for the import process. This could imply that the import failed.
InvalidAttributeError – If there are fields in items not valid for a DataRow.
ValueError – If the upload parameters are invalid.

NOTE: dict and string items cannot be mixed in the same call. It is the caller's responsibility to ensure that all items are of the same type.
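
Since the caller is responsible for keeping items homogeneous, a small guard can run before the upload. The sketch below is an assumption about how one might do this; `check_homogeneous` is a hypothetical helper, not an SDK function, and it raises TypeError rather than any SDK exception.

```python
def check_homogeneous(items):
    """Return items as a list, raising if dicts and strings are mixed."""
    items = list(items)
    all_dicts = all(isinstance(item, dict) for item in items)
    all_strs = all(isinstance(item, str) for item in items)
    if not (all_dicts or all_strs):
        raise TypeError("items must be all dicts or all strings, not a mix")
    return items

# Usage, given an existing Dataset object named `dataset`:
# task = dataset.create_data_rows(check_homogeneous(rows))
# task.wait_till_done()
```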

data_row_for_external_id(external_id) → DataRow

Convenience method for getting a single DataRow belonging to this Dataset that has the given external_id.

Parameters: external_id (str) – External ID of the sought DataRow.
Returns: A single DataRow with the given external ID.
Raises: lbox.exceptions.ResourceNotFoundError – If there is no DataRow in this Dataset with the given external ID, or if there are multiple DataRows for it.

data_rows(from_cursor: str | None = None, where: Comparison | None = None) → PaginatedCollection

Custom method to paginate data_rows via cursor.

Parameters:

from_cursor (str) – Cursor (data row ID) to start from. If None, pagination starts from the beginning.
where (dict(str,str)) – Filter to apply to data rows. Each key is a data row column name and each value is the value to filter on.
example – {'external_id': 'my_external_id'} to get a data row with external_id = 'my_external_id'

Note

Data rows are retrieved newest first. Deleted data rows and failed data rows are not retrieved. Data rows that are still in progress may be retrieved.
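
Because retrieval is newest first, taking the first few items of the returned collection yields the most recent data rows. The sketch below assumes only that the returned PaginatedCollection can be iterated like any Python iterable; `newest_rows` is a hypothetical helper name.

```python
from itertools import islice

def newest_rows(dataset, n, where=None):
    """Return up to n data rows, relying on the documented newest-first order."""
    # islice stops consuming the collection once n items have been yielded.
    return list(islice(dataset.data_rows(where=where), n))
```

For example, `newest_rows(dataset, 5)` would return at most five data rows, or fewer if the dataset is smaller.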

data_rows_for_external_id(external_id, limit=10) → List[DataRow]

Convenience method for getting multiple DataRows belonging to this Dataset that have the given external_id.

Parameters:

external_id (str) – External ID of the sought DataRows.
limit (int) – The maximum number of data rows to return for the given external_id.

Returns:

A list of DataRows with the given external ID.

Raises:

lbox.exceptions.ResourceNotFoundError – If there is no DataRow in this Dataset with the given external ID, or if there are multiple DataRows for it.

export(task_name: str | None = None, filters: DatasetExportFilters | None = None, params: CatalogExportParams | None = None) → ExportTask

Creates a dataset export task with the given params and returns the task.

>>>     dataset = client.get_dataset(DATASET_ID)
>>>     task = dataset.export(
>>>         filters={
>>>             "last_activity_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
>>>             "label_created_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
>>>             "data_row_ids": [DATA_ROW_ID_1, DATA_ROW_ID_2, ...] # or global_keys: [DATA_ROW_GLOBAL_KEY_1, DATA_ROW_GLOBAL_KEY_2, ...]
>>>         },
>>>         params={
>>>             "performance_details": False,
>>>             "label_details": True
>>>         })
>>>     task.wait_till_done()
>>>     task.result
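
The date filters in the example above are pairs of strings in "YYYY-MM-DD HH:MM:SS" form. When the bounds start out as datetime objects, they can be formatted as in the following sketch. `export_date_range` is a hypothetical helper, and the sketch assumes naive datetimes are formatted as written, with no time zone conversion.

```python
from datetime import datetime

def export_date_range(start, end):
    """Format two datetimes as the [start, end] string pair used in export filters."""
    if start > end:
        raise ValueError("start must not be after end")
    fmt = "%Y-%m-%d %H:%M:%S"
    return [start.strftime(fmt), end.strftime(fmt)]

# Build a filters dict in the same shape as the example above.
filters = {
    "last_activity_at": export_date_range(datetime(2000, 1, 1), datetime(2050, 1, 1)),
}
```

The resulting `filters` dict can then be passed as the `filters` argument of `dataset.export`.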

export_v2(task_name: str | None = None, filters: DatasetExportFilters | None = None, params: CatalogExportParams | None = None) → Task | ExportTask

Creates a dataset export task with the given params and returns the task.

>>>     dataset = client.get_dataset(DATASET_ID)
>>>     task = dataset.export_v2(
>>>         filters={
>>>             "last_activity_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
>>>             "label_created_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
>>>             "data_row_ids": [DATA_ROW_ID_1, DATA_ROW_ID_2, ...] # or global_keys: [DATA_ROW_GLOBAL_KEY_1, DATA_ROW_GLOBAL_KEY_2, ...]
>>>         },
>>>         params={
>>>             "performance_details": False,
>>>             "label_details": True
>>>         })
>>>     task.wait_till_done()
>>>     task.result

remove_iam_integration() → None

Unsets the IAM integration for the dataset.

Parameters: None
Returns: None
Raises: LabelboxError – If the IAM integration can’t be unset.

Examples

>>> dataset.remove_iam_integration()

upsert_data_rows(items, file_upload_thread_count=20) → DataUpsertTask

Upserts data rows in this dataset. When “key” is provided and references an existing data row, that data row is updated. When “key” is not provided, a new data row is created.

>>>     task = dataset.upsert_data_rows([
>>>         # create new data row
>>>         {
>>>             "row_data": "http://my_site.com/photos/img_01.jpg",
>>>             "global_key": "global_key1",
>>>             "external_id": "ex_id1",
>>>             "attachments": [
>>>                 {"type": AttachmentType.RAW_TEXT, "name": "att1", "value": "test1"}
>>>             ],
>>>             "metadata": [
>>>                 {"name": "tag", "value": "tag value"},
>>>             ]
>>>         },
>>>         # update global key of data row by existing global key
>>>         {
>>>             "key": GlobalKey("global_key1"),
>>>             "global_key": "global_key1_updated"
>>>         },
>>>         # update data row by ID
>>>         {
>>>             "key": UniqueId(dr.uid),
>>>             "external_id": "ex_id1_updated"
>>>         },
>>>     ])
>>>     task.wait_till_done()
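
To preview which items in a batch are intended as updates and which as creations, the items can be split by whether they carry a "key" entry, following the rule described above. This is a local illustration only: `split_upsert_items` is a hypothetical helper, and it does not check whether a key actually references an existing data row.

```python
def split_upsert_items(items):
    """Split items into (creates, updates) by the presence of a "key" entry."""
    creates, updates = [], []
    for item in items:
        # Items with "key" target an existing data row; the rest create new rows.
        (updates if "key" in item else creates).append(item)
    return creates, updates
```

Calling `creates, updates = split_upsert_items(batch)` before `dataset.upsert_data_rows(batch)` gives a quick count of each kind without contacting the server.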