aidsorb.datamodules

LightningDataModules for use with PyTorch Lightning.

class aidsorb.datamodules.DataModule(path_to_X, *, path_to_Y=None, index_col=None, labels=None, train_size=None, train_transform_x=None, eval_transform_x=None, transform_y=None, shuffle=False, drop_last=False, train_batch_size=32, eval_batch_size=32, config_dataloaders=None)

Bases: LightningDataModule

LightningDataModule for supervised/unsupervised learning.

Given the following directory structure:

project_root
├── source      <-- path_to_X
│   ├── foo.npy
│   ├── ...
│   └── bar.npy
├── test.json
├── train.json
└── validation.json

the train, validation, and test datasets are set up from train.json, validation.json, and test.json, respectively. All three are instances of Dataset.

Note

A comma (,) is assumed as the field separator in the .csv file.
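
To illustrate the note above, the sketch below parses a small comma-separated labels file with the standard csv module and checks that every input file name (without its extension) has a matching row. The column names `id` and `y`, the file names, and the helper `missing_labels` are hypothetical; DataModule itself may read the file differently.

```python
import csv
import io
from pathlib import PurePath

# Hypothetical labels file: 'id' plays the role of index_col, 'y' of a label.
csv_text = "id,y\nfoo,1.5\nbar,0.3\n"

# Hypothetical input file names as they would appear under path_to_X.
input_files = ["foo.npy", "bar.npy", "baz.npy"]


def missing_labels(csv_text, input_files, index_col):
    """Return names of inputs (without extension) that have no row in the .csv."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=",")
    indexed = {row[index_col] for row in reader}
    stems = [PurePath(name).stem for name in input_files]  # foo.npy -> foo
    return [stem for stem in stems if stem not in indexed]


missing = missing_labels(csv_text, input_files, index_col="id")
```

Here `missing` lists the inputs that lack a label row, which is a cheap check to run before training.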

Warning

For validation and test dataloaders, shuffle=False and drop_last=False.
If train_size is specified, the first train_size names from train.json will be used. If the data were not split with prepare_data(), ensure that the names in train.json do not follow a particular order.
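
Because the first train_size names are taken in file order, one way to guard against an ordered train.json is to shuffle the names before writing the file. The sketch below assumes that train.json stores a flat JSON list of sample names; that layout, the helper `write_shuffled_names`, and the seed are illustrative assumptions rather than part of the aidsorb API.

```python
import json
import random
import tempfile
from pathlib import Path


def write_shuffled_names(names, path, seed=0):
    """Shuffle sample names reproducibly and write them as a JSON list."""
    shuffled = list(names)  # Work on a copy so the caller's list is untouched.
    random.Random(seed).shuffle(shuffled)
    Path(path).write_text(json.dumps(shuffled, indent=4))
    return shuffled


# Names sorted alphabetically, i.e. following a particular order.
names = [f"sample_{i:03d}" for i in range(10)]

with tempfile.TemporaryDirectory() as tmp:
    train_json = Path(tmp) / "train.json"
    shuffled = write_shuffled_names(names, train_json)

    # Mimic what train_size=4 would select: the first 4 names in the file.
    first_four = json.loads(train_json.read_text())[:4]
```

Fixing the seed keeps the subset reproducible across runs while still breaking any original ordering.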

Todo

Add support for predict_dataloader.

Parameters:

path_to_X (str) – Absolute or relative path to the directory holding the inputs.
path_to_Y (str, optional) – Absolute or relative path to the .csv file holding the labels of the inputs.
index_col (str, optional) – Column name of the .csv file to be used for indexing. Must match file names in path_to_X (e.g. foo.npy → foo). No effect if path_to_Y=None.
labels (list, optional) – Column names of the .csv file containing the properties to be predicted.
train_size (int, default=None) – Number of training samples. If None, all training samples are used.
train_transform_x (callable, optional) – Transformation to apply to input during training.
eval_transform_x (callable, optional) – Transformation to apply to input during validation and testing.
transform_y (callable, optional) – Transformation to apply to label.
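shuffle (bool, default=False) – Whether to reshuffle the data at every epoch. Applies only to the train dataloader.
drop_last (bool, default=False) – Whether to drop the last incomplete batch. Applies only to the train dataloader.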
train_batch_size (int, default=32) – Batch size for train dataloader.
eval_batch_size (int, default=32) – Batch size for validation and test dataloaders.
config_dataloaders (dict, optional) –
Dictionary for configuring all dataloaders. For example:
```
config_dataloaders = {
    'pin_memory': True,
    'num_workers': 2,
    }
```
Note

The dictionary is not copied. To avoid side effects, consider passing a copy.
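
Putting the parameters together, a minimal usage sketch might look like the following. The paths, the column names `id` and `y`, the chosen sizes, and the helpers `build_kwargs` and `build_datamodule` are hypothetical; only keyword arguments from the signature above are used, and config_dataloaders is copied before being passed, as the note suggests.

```python
def build_kwargs(path_to_X, path_to_Y, config_dataloaders):
    """Collect keyword arguments for DataModule, copying the dataloader config."""
    return {
        "path_to_X": path_to_X,
        "path_to_Y": path_to_Y,
        "index_col": "id",   # hypothetical column name
        "labels": ["y"],     # hypothetical column name
        "train_size": 100,
        "shuffle": True,
        "train_batch_size": 32,
        "eval_batch_size": 64,
        # Shallow copy, so later changes to it leave the caller's dict alone.
        "config_dataloaders": dict(config_dataloaders),
    }


def build_datamodule(kwargs):
    """Instantiate the datamodule; requires aidsorb to be installed."""
    from aidsorb.datamodules import DataModule  # imported lazily on purpose

    return DataModule(**kwargs)


shared_config = {"pin_memory": True, "num_workers": 2}
kwargs = build_kwargs("project_root/source", "project_root/labels.csv", shared_config)

# Changing the copy does not affect the dictionary that was shared.
kwargs["config_dataloaders"]["num_workers"] = 0
```

With aidsorb installed and the data in place, `build_datamodule(kwargs)` would return a datamodule that can be handed to a Lightning Trainer, for example via `trainer.fit(model, datamodule=...)`.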