Dataset

class labelbox.schema.dataset.Dataset(client, field_values)[source]

Bases: DbObject, Updateable, Deletable

A Dataset is a collection of DataRows.

name
Type:

str

description
Type:

str

updated_at
Type:

datetime

created_at
Type:

datetime

row_count

The number of rows in the dataset. This value is cached; fetch the dataset again to refresh it.

Type:

int
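Because row_count is cached on the object, reading a current count means fetching the dataset again. A minimal sketch, assuming `client` is an authenticated client exposing `get_dataset(dataset_id)` as used in the export example on this page; the helper name `fresh_row_count` is hypothetical and not part of the SDK.

```python
def fresh_row_count(client, dataset_id):
    """Re-fetch the dataset and return its row_count.

    Assumption: `client.get_dataset(dataset_id)` returns a newly fetched
    Dataset whose cached fields reflect the server state at fetch time.
    This helper is illustrative only; it is not an SDK function.
    """
    dataset = client.get_dataset(dataset_id)  # new object, newly cached fields
    return dataset.row_count
```

Calling the helper after an import finishes avoids reading a stale count from a Dataset object that was created before the import.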

created_by

ToOne relationship to User

Type:

Relationship

organization

ToOne relationship to Organization

Type:

Relationship

add_iam_integration(iam_integration: str | IAMIntegration) IAMIntegration[source]

Sets the IAM integration for the dataset. IAM integration is used to sign URLs for data row assets.

Parameters:

iam_integration – IAM integration object or IAM integration id.

create_data_row(items=None, **kwargs) DataRow[source]

Creates a single DataRow belonging to this dataset.

>>> dataset.create_data_row(row_data="http://my_site.com/photos/img_01.jpg")

Parameters:
  • items – Dictionary containing new DataRow data. At a minimum, must contain row_data or DataRow.row_data.

  • **kwargs – Key-value arguments containing new DataRow data. At a minimum, must contain row_data.

Raises:
create_data_rows(items, file_upload_thread_count=20) DataUpsertTask[source]

Asynchronously bulk upload data rows

Parameters:

items (iterable of (dict or str)) –

Returns:

Task representing the data import on the server side. The Task can be used for inspecting task progress and waiting until it’s done.

Raises:
  • InvalidQueryError – If the items parameter does not conform to the specification above or if the server did not accept the DataRow creation request (unknown reason).

  • ResourceNotFoundError – If unable to retrieve the Task for the import process. This could imply that the import failed.

  • InvalidAttributeError – If there are fields in items not valid for a DataRow.

  • ValueError – When the upload parameters are invalid

NOTE: dict and str items cannot be mixed in the same call. It is the caller's responsibility to ensure that all items are of the same type.
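Since the caller is responsible for keeping item types uniform, a small pre-flight check can catch a mixed list before anything is uploaded. The sketch below is a hypothetical helper (`check_homogeneous` is not an SDK function) and makes no claim about the validation the SDK itself performs.

```python
def check_homogeneous(items):
    """Return items as a list if they are all dicts or all strings.

    Hypothetical pre-flight check, not part of the SDK. Raises ValueError
    when dict and str items are mixed or another type is present.
    """
    items = list(items)  # materialize so generators can be checked and reused
    all_dicts = all(isinstance(item, dict) for item in items)
    all_strs = all(isinstance(item, str) for item in items)
    if not (all_dicts or all_strs):
        raise ValueError("items must be all dicts or all strings, not a mix")
    return items
```

A call site might then read `task = dataset.create_data_rows(check_homogeneous(items))`.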

data_row_for_external_id(external_id) DataRow[source]

Convenience method for getting a single DataRow belonging to this Dataset that has the given external_id.

Parameters:

external_id (str) – External ID of the sought DataRow.

Returns:

A single DataRow with the given external ID.

Raises:

lbox.exceptions.ResourceNotFoundError – If there is no DataRow in this Dataset with the given external ID, or if there are multiple DataRows for it.

data_rows(from_cursor: str | None = None, where: Comparison | None = None) PaginatedCollection[source]

Custom method to paginate data_rows via cursor.

Parameters:
  • from_cursor (str) – Cursor (data row ID) to start from. If None, starts from the beginning.

  • where (dict(str,str)) – Filter to apply to data rows. Each key is a data row column name and each value is the value to filter on.

  • example – {'external_id': 'my_external_id'} to get a data row with external_id = 'my_external_id'

Note

Order of retrieval is newest data row first. Deleted data rows are not retrieved. Failed data rows are not retrieved. Data rows in progress may be retrieved.
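Because rows come back newest first, taking only the first few results is a cheap way to look at the most recent data rows. A minimal sketch, assuming the object returned by `data_rows()` can be iterated lazily; `first_data_rows` is a hypothetical helper name.

```python
from itertools import islice


def first_data_rows(dataset, n, where=None):
    """Return up to n data rows in retrieval order (newest first).

    Assumption: dataset.data_rows(...) returns a lazily iterable
    collection, so islice stops consuming it after n items.
    Illustrative helper, not part of the SDK.
    """
    rows = dataset.data_rows(where=where)
    return list(islice(rows, n))
```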

data_rows_for_external_id(external_id, limit=10) List[DataRow][source]

Convenience method for getting multiple DataRows belonging to this Dataset that have the given external_id.

Parameters:
  • external_id (str) – External ID of the sought DataRow.

  • limit (int) – The maximum number of data rows to return for the given external_id

Returns:

A list of DataRows with the given external ID.

Raises:

lbox.exceptions.ResourceNotFoundError – If there is no DataRow in this Dataset with the given external ID, or if there are multiple DataRows for it.

export(task_name: str | None = None, filters: DatasetExportFilters | None = None, params: CatalogExportParams | None = None) ExportTask[source]

Creates a dataset export task with the given params and returns the task.

>>>     dataset = client.get_dataset(DATASET_ID)
>>>     task = dataset.export(
>>>         filters={
>>>             "last_activity_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
>>>             "label_created_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
>>>             "data_row_ids": [DATA_ROW_ID_1, DATA_ROW_ID_2, ...] # or global_keys: [DATA_ROW_GLOBAL_KEY_1, DATA_ROW_GLOBAL_KEY_2, ...]
>>>         },
>>>         params={
>>>             "performance_details": False,
>>>             "label_details": True
>>>         })
>>>     task.wait_till_done()
>>>     task.result
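
The date filters in the example above are two-element lists of "YYYY-MM-DD HH:MM:SS" strings. The sketch below builds such a range from datetime objects; the `date_range` helper is hypothetical, and how the server interprets time zones is not covered here, so the datetimes are formatted exactly as given.

```python
from datetime import datetime

# Timestamp layout copied from the filter strings in the export example.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def date_range(start, end):
    """Format a [start, end] pair as used by the export date filters.

    Hypothetical helper, not part of the SDK. No time zone conversion
    is applied.
    """
    if start > end:
        raise ValueError("start must not be later than end")
    return [start.strftime(TIMESTAMP_FORMAT), end.strftime(TIMESTAMP_FORMAT)]


filters = {
    "last_activity_at": date_range(datetime(2000, 1, 1), datetime(2050, 1, 1)),
}
```

The resulting `filters` dictionary can be passed as the `filters` argument in the same way as the literal dictionary in the example.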
export_v2(task_name: str | None = None, filters: DatasetExportFilters | None = None, params: CatalogExportParams | None = None) Task | ExportTask[source]

Creates a dataset export task with the given params and returns the task.

>>>     dataset = client.get_dataset(DATASET_ID)
>>>     task = dataset.export_v2(
>>>         filters={
>>>             "last_activity_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
>>>             "label_created_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
>>>             "data_row_ids": [DATA_ROW_ID_1, DATA_ROW_ID_2, ...] # or global_keys: [DATA_ROW_GLOBAL_KEY_1, DATA_ROW_GLOBAL_KEY_2, ...]
>>>         },
>>>         params={
>>>             "performance_details": False,
>>>             "label_details": True
>>>         })
>>>     task.wait_till_done()
>>>     task.result
remove_iam_integration() None[source]

Unsets the IAM integration for the dataset.

Parameters:

None

Returns:

None

Raises:

LabelboxError – If the IAM integration can’t be unset.

Examples

>>> dataset.remove_iam_integration()
upsert_data_rows(items, file_upload_thread_count=20) DataUpsertTask[source]

Upserts data rows in this dataset. When “key” is provided and references an existing data row, an update is performed. When “key” is not provided, a new data row is created.

>>>     task = dataset.upsert_data_rows([
>>>         # create new data row
>>>         {
>>>             "row_data": "http://my_site.com/photos/img_01.jpg",
>>>             "global_key": "global_key1",
>>>             "external_id": "ex_id1",
>>>             "attachments": [
>>>                 {"type": AttachmentType.RAW_TEXT, "name": "att1", "value": "test1"}
>>>             ],
>>>             "metadata": [
>>>                 {"name": "tag", "value": "tag value"},
>>>             ]
>>>         },
>>>         # update global key of data row by existing global key
>>>         {
>>>             "key": GlobalKey("global_key1"),
>>>             "global_key": "global_key1_updated"
>>>         },
>>>         # update data row by ID
>>>         {
>>>             "key": UniqueId(dr.uid),
>>>             "external_id": "ex_id1_updated"
>>>         },
>>>     ])
>>>     task.wait_till_done()
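
Since the presence of "key" decides whether an item targets an existing data row, it can help to inspect a payload before submitting it. A minimal sketch with a hypothetical helper, `split_by_key`, that only partitions the list locally; it does not check whether a key actually references an existing data row.

```python
def split_by_key(items):
    """Partition upsert items into (keyed, unkeyed) lists.

    keyed:   items carrying "key", intended to reference existing rows.
    unkeyed: items without "key", which create new data rows.
    Hypothetical helper, not part of the SDK; no server lookup is done.
    """
    keyed = [item for item in items if "key" in item]
    unkeyed = [item for item in items if "key" not in item]
    return keyed, unkeyed
```

Logging `len(keyed)` and `len(unkeyed)` before calling `upsert_data_rows` gives a quick check that a batch has the intended mix of updates and creations.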