aidsorb.datamodules
LightningDataModules for use with PyTorch Lightning.
- class aidsorb.datamodules.DataModule(path_to_X, *, path_to_Y=None, index_col=None, labels=None, train_size=None, train_transform_x=None, eval_transform_x=None, transform_y=None, shuffle=False, drop_last=False, train_batch_size=32, eval_batch_size=32, config_dataloaders=None)
Bases: LightningDataModule
LightningDataModule for supervised/unsupervised learning.
Given the following directory structure:
project_root
├── source <-- path_to_X
│   ├── foo.npy
│   ├── ...
│   └── bar.npy
├── test.json
├── train.json
└── validation.json
train, validation, and test datasets are set up, all of which are instances of Dataset.
Note
A comma (,) is assumed as the field separator in the .csv file.
Warning
For validation and test dataloaders, shuffle=False and drop_last=False.
If train_size is specified, the first train_size names from train.json will be used. If the data were not split with prepare_data(), ensure that the names in train.json don't follow a particular order.
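The warning about name order can be handled by shuffling the names before train.json is written. The following is a minimal sketch, not part of aidsorb: it assumes that train.json holds a flat JSON list of names, and the helper write_shuffled_names and the example names are hypothetical.

```python
import json
import random
import tempfile
from pathlib import Path


def write_shuffled_names(names, path, seed=0):
    """Write names to a JSON file in a seeded random order (hypothetical helper)."""
    shuffled = list(names)  # copy, so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    Path(path).write_text(json.dumps(shuffled))
    return shuffled


# Names that follow a particular (sorted) order, as warned about above.
names = [f"structure_{i:03d}" for i in range(10)]

with tempfile.TemporaryDirectory() as tmp:
    # Assumption: train.json is a flat JSON list of names.
    train_json = Path(tmp) / "train.json"
    written = write_shuffled_names(names, train_json)
    loaded = json.loads(train_json.read_text())

# Taking the first train_size names now yields a random subset.
train_size = 4
subset = loaded[:train_size]
```

With a fixed seed the order is reproducible, so the same subset is selected every time the same train_size is used.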
Todo
Add support for predict_dataloader.
- Parameters:
path_to_X (str) – Absolute or relative path to the directory holding the inputs.
path_to_Y (str, optional) – Absolute or relative path to the .csv file holding the labels of the inputs.
index_col (str, optional) – Column name of the .csv file to be used for indexing. Must match file names in path_to_X (e.g. foo.npy → foo). No effect if path_to_Y=None.
labels (list, optional) – Column names of the .csv file containing the properties to be predicted.
train_size (int, default=None) – Number of training samples. If None, all training samples are used.
train_transform_x (callable, optional) – Transformation to apply to input during training.
eval_transform_x (callable, optional) – Transformation to apply to input during validation and testing.
transform_y (callable, optional) – Transformation to apply to label.
shuffle (bool, default=False) – Only for train dataloader.
drop_last (bool, default=False) – Only for train dataloader.
train_batch_size (int, default=32) – Batch size for train dataloader.
eval_batch_size (int, default=32) – Batch size for validation and test dataloaders.
config_dataloaders (dict, optional) – Dictionary for configuring all dataloaders. For example:
config_dataloaders = {
    'pin_memory': True,
    'num_workers': 2,
}
Note
The dictionary is not copied. To avoid side effects, consider passing a copy.
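To illustrate why passing a copy can matter, here is a small sketch of the aliasing issue. The function stand_in_consumer is a hypothetical stand-in for any code that edits the dictionary it receives; it is not aidsorb's implementation, and the key extra_option is invented for the example.

```python
def stand_in_consumer(config):
    """Hypothetical stand-in: modifies the received dictionary in place."""
    config["extra_option"] = True
    return config


# Passing the dictionary itself: the caller's object is modified.
shared_config = {"pin_memory": True, "num_workers": 2}
stand_in_consumer(shared_config)

# Passing a copy: the caller's dictionary stays as it was.
original_config = {"pin_memory": True, "num_workers": 2}
used_config = stand_in_consumer(dict(original_config))
```

A shallow copy via dict(...) is enough for a flat dictionary like this one; copy.deepcopy is the safer choice if the values are themselves mutable.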
See also
DataLoader
For a description of shuffle, drop_last and valid options for config_dataloaders.
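Since index_col must match the file names in path_to_X with the extension removed, it can help to check the two against each other before training. The sketch below uses only the standard library and is not part of aidsorb; the column names id and property, the file names, and both helper functions are hypothetical.

```python
import csv
import io
from pathlib import Path


def names_from_files(filenames):
    """Strip the extension from each file name (foo.npy -> foo)."""
    return {Path(name).stem for name in filenames}


def names_from_csv(csv_text, index_col):
    """Collect the values of index_col from comma-separated text."""
    reader = csv.DictReader(io.StringIO(csv_text))  # comma is the default delimiter
    return {row[index_col] for row in reader}


# Hypothetical contents of path_to_X and of the labels file.
filenames = ["foo.npy", "bar.npy"]
csv_text = "id,property\nfoo,1.5\nbar,2.5\nbaz,3.5\n"

file_names = names_from_files(filenames)
label_names = names_from_csv(csv_text, index_col="id")

missing_labels = file_names - label_names  # inputs without a label row
unused_labels = label_names - file_names   # label rows without an input
```

Here every input has a label, while the row for baz has no matching input file.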
- setup(stage=None)
Set up train, validation and test datasets.
Tip
Datasets are accessible via self.{train,validation,test}_dataset.
- Parameters:
stage ({None, 'fit', 'validate', 'test'}, default=None) – Which datasets to set up.
If 'fit', only the train and validation datasets are set up.
If 'validate' or 'test', only the corresponding dataset is set up.
If None, all datasets are set up.
- Return type:
None
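The stage rules described for setup() can be summarised as a lookup table. This is only an illustrative sketch of the documented behaviour, not aidsorb's implementation; the function datasets_for_stage is hypothetical.

```python
def datasets_for_stage(stage=None):
    """Return the names of the datasets set up for a given stage (hypothetical helper)."""
    mapping = {
        None: ("train", "validation", "test"),  # all datasets
        "fit": ("train", "validation"),         # train and validation only
        "validate": ("validation",),            # only the corresponding dataset
        "test": ("test",),                      # only the corresponding dataset
    }
    if stage not in mapping:
        raise ValueError(f"Unsupported stage: {stage!r}")
    return mapping[stage]


fit_datasets = datasets_for_stage("fit")
all_datasets = datasets_for_stage()
```

Read this way, calling train_dataloader() after setup('test') would have no train dataset to draw from, which is consistent with the stage restrictions stated for the dataloader methods.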
- test_dataloader()
Return the test dataloader.
Can be called only after setup() has been called and stage is one of {None, 'test'}.
- Return type:
- train_dataloader()
Return the train dataloader.
Can be called only after setup() has been called and stage is one of {None, 'fit'}.
- Return type: