Iterators
Base Iterator
- class udao.data.iterators.base_iterator.BaseIterator(keys: Sequence[str])
Bases: Dataset, Generic[T, ST]
Base class for all dataset iterators. Inherits from torch.utils.data.Dataset.
T is the type of the iterator output. ST is the type of the iterator output shape.
See FeatureIterator for an example.
- abstract static collate(items: List[T]) → T
Collates the items into a batch. Used in the dataloader.
- get_dataloader(batch_size: int, shuffle: bool = False, num_workers: int = 0, **kwargs: Any) → DataLoader
Returns a torch DataLoader for the iterator that can be used for training. It uses the static collate method to collate the items into a batch.
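The relationship between collate and get_dataloader can be illustrated without torch. The following is a minimal, dependency-free sketch, not udao's implementation: ToyIterator and iter_batches are hypothetical names, and plain lists stand in for tensors. In the real class, get_dataloader returns a torch.utils.data.DataLoader which, per the description above, relies on the static collate method.

```python
from typing import Iterator, List, Sequence


class ToyIterator:
    """Hypothetical stand-in for a BaseIterator subclass (no torch needed)."""

    def __init__(self, keys: Sequence[str]) -> None:
        self.keys = list(keys)

    def __len__(self) -> int:
        return len(self.keys)

    def __getitem__(self, idx: int) -> List[float]:
        # One "item" per key; here a fake two-value feature row.
        return [float(idx), float(len(self.keys[idx]))]

    @staticmethod
    def collate(items: List[List[float]]) -> List[List[float]]:
        # Stack the items into one batch (a list of rows stands in for a 2-D tensor).
        return [list(item) for item in items]

    def iter_batches(self, batch_size: int) -> Iterator[List[List[float]]]:
        # Mimics what a dataloader does: group items, then call collate on each group.
        for start in range(0, len(self), batch_size):
            stop = min(start + batch_size, len(self))
            yield self.collate([self[i] for i in range(start, stop)])


batches = list(ToyIterator(["a", "bb", "ccc"]).iter_batches(batch_size=2))
```

Because collate is static, it needs no access to the iterator instance, which is what lets a dataloader call it on any group of items.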
- classmethod get_parameter_names() → List[str]
Returns the names of the container parameters of the iterator. Useful to create dynamic parameters for related parts of the pipeline (feature extractors, preprocessors).
- set_augmentations(augmentations: List[Callable[[T], T]]) → None
Sets the augmentations to apply to the iterator output.
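As an illustration of the List[Callable[[T], T]] signature, the sketch below chains augmentations over a single item. It assumes the callables are applied in list order, which the description above does not state. apply_augmentations, scale, and shift are hypothetical helpers, and a list of floats stands in for the iterator output.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def apply_augmentations(item: T, augmentations: List[Callable[[T], T]]) -> T:
    # Assumption: each augmentation receives the previous one's output, in list order.
    for augment in augmentations:
        item = augment(item)
    return item


def scale(row: List[float]) -> List[float]:
    # Hypothetical augmentation: double every value.
    return [2.0 * value for value in row]


def shift(row: List[float]) -> List[float]:
    # Hypothetical augmentation: add one to every value.
    return [value + 1.0 for value in row]


augmented = apply_augmentations([1.0, 2.0], [scale, shift])
```

Each augmentation maps an item of type T to another item of type T, so any number of them can be composed without changing the iterator's output type.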
- set_tensors_dtype(dtype: dtype) → None
Sets the dtype of the iterator. Useful for mixed precision training.
- abstract property shape: ST
Returns the shape of the iterator output.
- class udao.data.iterators.base_iterator.UdaoIterator(keys: Sequence[str], tabular_features: TabularContainer, objectives: TabularContainer)
Bases: BaseIterator[Tuple[UT, Tensor], UST], Generic[UT, UST]
Base iterator for the Udao use case, where the iterator returns a FeatureInput object. It is expected to accept a TabularContainer representing the tabular features, which the user can set as variables in the optimization pipeline, and a TabularContainer representing the objectives.
UST: Type of the iterator output shape. In the Udao case, restricted to FeatureInputShape and its subclasses.
UT: Type of the iterator output. In the Udao case, restricted to th.Tensor and its subclasses. This results in a type Tuple[UT, th.Tensor] for the iterator output.
- Parameters:
keys (Sequence[str]) – Keys of the dataset, used for accessing all features
tabular_features (TabularContainer) – Tabular features of the iterator
objectives (TabularContainer) – Objectives of the iterator
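The Tuple[UT, Tensor] output described above can be pictured as one (input, objectives) pair per key. The sketch below is a dependency-free illustration under stated assumptions, not udao's implementation: ToyInput and ToyUdaoIterator are hypothetical names, plain dicts keyed by dataset key stand in for TabularContainer (whose API is not shown in this section), and lists stand in for tensors.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class ToyInput:
    """Hypothetical stand-in for the UT output type."""

    features: List[float]


class ToyUdaoIterator:
    """Hypothetical iterator pairing tabular features with objectives by key."""

    def __init__(
        self,
        keys: Sequence[str],
        tabular_features: Dict[str, List[float]],
        objectives: Dict[str, List[float]],
    ) -> None:
        self.keys = list(keys)
        self.tabular_features = tabular_features
        self.objectives = objectives

    def __len__(self) -> int:
        return len(self.keys)

    def __getitem__(self, idx: int) -> Tuple[ToyInput, List[float]]:
        # The key selects the matching rows in both containers.
        key = self.keys[idx]
        return ToyInput(self.tabular_features[key]), self.objectives[key]


iterator = ToyUdaoIterator(
    keys=["q1", "q2"],
    tabular_features={"q1": [4.0, 8.0], "q2": [2.0, 16.0]},
    objectives={"q1": [12.5], "q2": [30.0]},
)
model_input, target = iterator[1]
```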
Query Plan Iterator
- class udao.data.iterators.query_plan_iterator.QueryPlanInput(features: Tensor, embedding_input: T)
Bases: UdaoEmbedInput[DGLGraph]
The embedding input is a dgl.DGLGraph.
- class udao.data.iterators.query_plan_iterator.QueryPlanIterator(keys: Sequence[str], tabular_features: TabularContainer, objectives: TabularContainer, query_structure: QueryStructureContainer, **kwargs: TabularContainer)
Bases: UdaoIterator[QueryPlanInput, UdaoEmbedItemShape]
Iterator that returns a dgl.DGLGraph for each key, with associated node features. The features are stored in the graph.ndata dictionary. The features are expected to be float tensors, and to be of the same length as the number of nodes in the graph.
- Parameters:
keys (Sequence[str]) – Keys of the dataset, used for accessing all features
tabular_features (TabularContainer) – Container for the tabular features associated with the plan
objectives (TabularContainer) – Container for the objectives associated with the plan
query_structure (QueryStructureContainer) – Wrapper around the graph structure and the features for each query plan
kwargs (BaseContainer) – Variable number of other features to add to the graph, e.g. embeddings
- static collate(items: List[Tuple[QueryPlanInput, Tensor]]) → Tuple[QueryPlanInput, Tensor]
Collate a list of FeatureItem into a single graph.
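Collating several graphs into a single graph is commonly done by taking their disjoint union, which is what dgl.batch does for DGLGraph objects. Whether QueryPlanIterator.collate uses dgl.batch internally is an assumption not confirmed by this page. The sketch below shows the idea in plain Python. ToyGraph and batch_graphs are hypothetical names, and a graph is represented as a node count plus a list of directed edges.

```python
from typing import List, Tuple

Edge = Tuple[int, int]
ToyGraph = Tuple[int, List[Edge]]  # (number of nodes, directed edges)


def batch_graphs(graphs: List[ToyGraph]) -> ToyGraph:
    # Disjoint union: shift each graph's node ids by the number of nodes already placed.
    total_nodes = 0
    batched_edges: List[Edge] = []
    for num_nodes, edges in graphs:
        batched_edges.extend((src + total_nodes, dst + total_nodes) for src, dst in edges)
        total_nodes += num_nodes
    return total_nodes, batched_edges


batched = batch_graphs([(2, [(0, 1)]), (3, [(0, 1), (1, 2)])])
```

No edge connects nodes from different input graphs, so the batched graph can be processed in one pass while each query plan stays independent.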
- property shape: UdaoEmbedItemShape[Dict[str, int]]
Returns the dimensions of the iterator inputs and outputs.
Tabular Iterator
- class udao.data.iterators.tabular_iterator.TabularIterator(keys: Sequence[str], tabular_feature: TabularContainer)
Bases: BaseIterator[Tensor, Dict[str, Any]]
Iterator on tabular data.
- Parameters:
keys (Sequence[str]) – Keys of the dataset, used for accessing all features
tabular_feature (TabularContainer) – Container for the tabular data
- static collate(items: List[Tensor]) → Tensor
Collates the items into a batch. Used in the dataloader.
- property shape: Any
Returns the shape of the iterator output.