Iterators

Base Iterator

class udao.data.iterators.base_iterator.BaseIterator(keys: Sequence[str])

Bases: Dataset, Generic[T, ST]

Base class for all dataset iterators. Inherits from torch.utils.data.Dataset.

T is the type of the iterator output. ST is the type of the iterator output shape.

See FeatureIterator for an example.

abstract static collate(items: List[T]) -> T

Collates the items into a batch. Used in the dataloader.

get_dataloader(batch_size: int, shuffle: bool = False, num_workers: int = 0, **kwargs: Any) -> DataLoader

Returns a torch DataLoader over the iterator that can be used for training. Batches are built with the collate static method.
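The relationship between collate and the dataloader can be illustrated with a minimal sketch. It deliberately avoids torch and udao: SketchIterator, RowIterator and the batches method are hypothetical stand-ins, and plain Python lists play the role of tensors. With torch, the equivalent wiring would presumably pass the static method as the DataLoader's collate_fn argument, but that is an assumption about the implementation rather than something stated above.

```python
from abc import ABC, abstractmethod
from typing import Dict, Generic, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


class SketchIterator(ABC, Generic[T]):
    """Hypothetical stand-in for BaseIterator (no torch dependency)."""

    def __init__(self, keys: Sequence[str]) -> None:
        self.keys = list(keys)

    def __len__(self) -> int:
        return len(self.keys)

    @abstractmethod
    def __getitem__(self, idx: int) -> T:
        ...

    @staticmethod
    @abstractmethod
    def collate(items: List[T]) -> T:
        ...

    def batches(self, batch_size: int) -> Iterator[T]:
        # Plays the role of get_dataloader: group items, then collate them.
        for start in range(0, len(self), batch_size):
            stop = min(start + batch_size, len(self))
            yield self.collate([self[i] for i in range(start, stop)])


class RowIterator(SketchIterator[List[List[float]]]):
    """Each item is a 1-row table; a batch is a table with one row per item."""

    def __init__(self, keys: Sequence[str], table: Dict[str, List[float]]) -> None:
        super().__init__(keys)
        self.table = table

    def __getitem__(self, idx: int) -> List[List[float]]:
        return [self.table[self.keys[idx]]]

    @staticmethod
    def collate(items: List[List[List[float]]]) -> List[List[float]]:
        # Concatenate the rows of all items into a single table.
        return [row for item in items for row in item]


table = {"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [5.0, 6.0]}
iterator = RowIterator(["a", "b", "c"], table)
all_batches = list(iterator.batches(batch_size=2))
```

Because collate is static, it depends only on the items it receives, which is what allows it to be handed to a dataloader as a standalone function.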

classmethod get_parameter_names() -> List[str]

Returns the names of the container parameters of the iterator. Useful for creating dynamic parameters for related parts of the pipeline (feature extractors, preprocessors).
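One plausible way to obtain such names is to inspect the constructor signature. The sketch below is an assumption about the approach, not the library's actual code: SketchBase and SketchUdaoIterator are hypothetical classes, and the rule "every constructor argument except keys, *args and **kwargs is a container" is a guess made for illustration.

```python
import inspect
from typing import Any, List, Sequence


class SketchBase:
    """Hypothetical stand-in for BaseIterator."""

    def __init__(self, keys: Sequence[str]) -> None:
        self.keys = list(keys)

    @classmethod
    def get_parameter_names(cls) -> List[str]:
        # Assumption: anything that is not self, keys, *args or **kwargs
        # is treated as a container parameter.
        skipped = {"self", "keys", "args", "kwargs"}
        parameters = inspect.signature(cls.__init__).parameters
        return [name for name in parameters if name not in skipped]


class SketchUdaoIterator(SketchBase):
    """Hypothetical subclass mirroring the UdaoIterator constructor arguments."""

    def __init__(self, keys: Sequence[str], tabular_features: Any, objectives: Any) -> None:
        super().__init__(keys)
        self.tabular_features = tabular_features
        self.objectives = objectives


names = SketchUdaoIterator.get_parameter_names()
print(names)  # ['tabular_features', 'objectives']
```

A pipeline could then use these names to decide which feature extractors or preprocessors it needs to configure for a given iterator class.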

set_augmentations(augmentations: List[Callable[[T], T]]) -> None

Sets the augmentations to apply to the iterator output.
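Since each augmentation maps an item of type T to another item of type T, a list of them can be chained. The sketch below assumes the augmentations are applied in list order, which the description above does not state; apply_augmentations, scale and shift are hypothetical names, and lists of floats stand in for tensors.

```python
from typing import Callable, List

Item = List[float]


def apply_augmentations(item: Item, augmentations: List[Callable[[Item], Item]]) -> Item:
    # Assumption: augmentations are applied one after another, in list order.
    for augment in augmentations:
        item = augment(item)
    return item


def scale(item: Item) -> Item:
    return [2.0 * x for x in item]


def shift(item: Item) -> Item:
    return [x + 1.0 for x in item]


augmented = apply_augmentations([1.0, 2.0], [scale, shift])
print(augmented)  # [3.0, 5.0]
```

Under this assumption the order of the list matters: applying shift before scale to the same item would give [4.0, 6.0] instead.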

set_tensors_dtype(dtype: dtype) -> None

Sets the dtype of the tensors returned by the iterator. Useful for mixed precision training.

abstract property shape: ST

Returns the shape of the iterator output.

class udao.data.iterators.base_iterator.UdaoIterator(keys: Sequence[str], tabular_features: TabularContainer, objectives: TabularContainer)

Bases: BaseIterator[Tuple[UT, Tensor], UST], Generic[UT, UST]

Base iterator for the Udao use case, where the iterator returns a FeatureInput object. It is expected to accept:
  • a TabularContainer representing the tabular features, which can be set as variables by the user in the optimization pipeline

  • a TabularContainer representing the objectives

UST: Type of the iterator output shape. In the Udao case, it is restricted to FeatureInputShape and its subclasses.

UT: Type of the iterator output. In the Udao case, it is restricted to th.Tensor and its subclasses. This results in the type Tuple[UT, th.Tensor] for the iterator output.

Parameters:
  • keys (Sequence[str]) – Keys of the dataset, used for accessing all features

  • tabular_features (TabularContainer) – Tabular features of the iterator

  • objectives (TabularContainer) – Objectives of the iterator
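The Tuple[UT, th.Tensor] item type means each item pairs an input with its objectives, so a batch has to keep the two sides separate. The following sketch shows that pairing with plain lists instead of tensors; Row, Item and collate_pairs are hypothetical names and do not come from udao.

```python
from typing import List, Tuple

Row = List[float]
Item = Tuple[Row, Row]  # (tabular features, objectives) for a single key


def collate_pairs(items: List[Item]) -> Tuple[List[Row], List[Row]]:
    # Split the pairs, then batch features and objectives independently.
    features = [feature_row for feature_row, _ in items]
    objectives = [objective_row for _, objective_row in items]
    return features, objectives


items: List[Item] = [([1.0, 2.0], [10.0]), ([3.0, 4.0], [20.0])]
batch_features, batch_objectives = collate_pairs(items)
```

Row i of batch_features and row i of batch_objectives still refer to the same key, which is the property a training loop relies on.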

Query Plan Iterator

class udao.data.iterators.query_plan_iterator.QueryPlanInput(features: Tensor, embedding_input: T)

Bases: UdaoEmbedInput[DGLGraph]

The embedding input is a dgl.DGLGraph

class udao.data.iterators.query_plan_iterator.QueryPlanIterator(keys: Sequence[str], tabular_features: TabularContainer, objectives: TabularContainer, query_structure: QueryStructureContainer, **kwargs: TabularContainer)

Bases: UdaoIterator[QueryPlanInput, UdaoEmbedItemShape]

Iterator that returns a dgl.DGLGraph for each key, with associated node features. The features are stored in the graph.ndata dictionary. The features are expected to be float tensors, and to be of the same length as the number of nodes in the graph.

Parameters:
  • keys (Sequence[str]) – Keys of the dataset, used for accessing all features

  • tabular_features (TabularContainer) – Container for the tabular features associated with the plan

  • objectives (TabularContainer) – Container for the objectives associated with the plan

  • query_structure (QueryStructureContainer) – Wrapper around the graph structure and the features for each query plan

  • kwargs (TabularContainer) – Variable number of other features to add to the graph, e.g. embeddings

static collate(items: List[Tuple[QueryPlanInput, Tensor]]) -> Tuple[QueryPlanInput, Tensor]

Collates a list of (QueryPlanInput, Tensor) items into a single batch, combining the individual graphs into a single graph.
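Combining several graphs into one is usually done as a disjoint union: node ids of later graphs are shifted by the number of nodes seen so far, and node features are concatenated in the same order. The sketch below shows that idea in pure Python. TinyGraph, batch_graphs and the "cbo" feature name are hypothetical; DGL offers dgl.batch for this purpose, though whether collate relies on it is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TinyGraph:
    """Hypothetical minimal graph: node count, edge list, per-node features."""

    num_nodes: int
    edges: List[Tuple[int, int]]
    ndata: Dict[str, List[float]]


def batch_graphs(graphs: List[TinyGraph]) -> TinyGraph:
    # Disjoint union: relabel nodes with a running offset, concatenate features.
    edges: List[Tuple[int, int]] = []
    ndata: Dict[str, List[float]] = {name: [] for name in graphs[0].ndata}
    offset = 0
    for graph in graphs:
        edges.extend((src + offset, dst + offset) for src, dst in graph.edges)
        for name in ndata:
            ndata[name].extend(graph.ndata[name])
        offset += graph.num_nodes
    return TinyGraph(num_nodes=offset, edges=edges, ndata=ndata)


g1 = TinyGraph(2, [(0, 1)], {"cbo": [0.1, 0.2]})
g2 = TinyGraph(3, [(0, 1), (1, 2)], {"cbo": [0.3, 0.4, 0.5]})
batched = batch_graphs([g1, g2])
print(batched.edges)  # [(0, 1), (2, 3), (3, 4)]
```

The batched graph has no edges between the original graphs, so a graph neural network processes each query plan independently while still operating on a single structure.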

property shape: UdaoEmbedItemShape[Dict[str, int]]

Returns the dimensions of the iterator inputs and outputs.

Tabular Iterator

class udao.data.iterators.tabular_iterator.TabularIterator(keys: Sequence[str], tabular_feature: TabularContainer)

Bases: BaseIterator[Tensor, Dict[str, Any]]

Iterator on tabular data.

Parameters:
  • keys (Sequence[str]) – Keys of the dataset, used for accessing all features

  • tabular_feature (TabularContainer) – Container for the tabular data

static collate(items: List[Tensor]) -> Tensor

Collates the items into a batch. Used in the dataloader.

property shape: Any

Returns the shape of the iterator output.