aidsorb.datamodules

LightningDataModules for use with PyTorch Lightning.

class aidsorb.datamodules.PCDDataModule(path_to_X, *, path_to_Y=None, index_col=None, labels=None, train_size=None, train_transform_x=None, eval_transform_x=None, transform_y=None, shuffle=False, drop_last=False, train_batch_size=32, eval_batch_size=32, config_dataloaders=None)

Bases: LightningDataModule

LightningDataModule for supervised/unsupervised learning on point clouds.

Given the following directory structure:

project_root
├── source      <-- path_to_X
│   ├── foo.npy
│   ├── ...
│   └── bar.npy
├── test.json
├── train.json
└── validation.json

the train, validation, and test datasets are set up from train.json, validation.json, and test.json, respectively. Each dataset is an instance of PCDDataset.

Note

A comma (,) is assumed as the field separator in the .csv file.

Warning

  • For validation and test dataloaders, shuffle=False and drop_last=False.

  • If train_size is specified, the first train_size point clouds from train.json will be used. If the data were not split with prepare_data(), ensure that names in train.json don’t follow a particular order.
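The second warning can be illustrated with a short sketch. It assumes that train.json holds a JSON list of point cloud names (an assumption about the file layout, not something stated above) and uses a hypothetical helper, shuffle_names, which is not part of aidsorb. Shuffling the stored names once means that the first train_size names form a random subset.

```python
import json
import random
import tempfile
from pathlib import Path


def shuffle_names(path_to_json, seed=1):
    """Shuffle the names stored in a .json file and write them back.

    Hypothetical helper: assumes the file holds a JSON list of names.
    """
    path = Path(path_to_json)
    names = json.loads(path.read_text())
    random.Random(seed).shuffle(names)  # Seeded for reproducibility.
    path.write_text(json.dumps(names, indent=4))
    return names


# Demonstration on a throwaway file whose names are alphabetically ordered.
with tempfile.TemporaryDirectory() as tmp:
    train_json = Path(tmp) / 'train.json'
    ordered = [f'structure_{i:03d}' for i in range(100)]
    train_json.write_text(json.dumps(ordered))

    shuffled = shuffle_names(train_json)
    stored = json.loads(train_json.read_text())

# With train_size=10, the first 10 names no longer follow the original order.
subset = shuffled[:10]
```

Running such a helper once, before creating the datamodule, keeps the selection reproducible across runs because the shuffled order is stored in the file itself.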

Todo

Add support for predict_dataloader.

Parameters:
  • path_to_X (str) – Absolute or relative path to the directory holding the point clouds.

  • path_to_Y (str, optional) – Absolute or relative path to the .csv file holding the labels of the point clouds.

  • index_col (str, optional) – Column name of the .csv file to be used for indexing.

  • labels (list, optional) – Column names of the .csv file containing the properties to be predicted.

  • train_size (int, default=None) – Number of training samples. If None, all training samples are used.

  • train_transform_x (callable, optional) – Transformation to apply to point cloud during training.

  • eval_transform_x (callable, optional) – Transformation to apply to point cloud during validation and testing.

  • transform_y (callable, optional) – Transformation to apply to label.

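  • shuffle (bool, default=False) – Whether to reshuffle the training data at every epoch. Applies only to the train dataloader.

  • drop_last (bool, default=False) – Whether to drop the last incomplete batch. Applies only to the train dataloader.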

  • train_batch_size (int, default=32) – Batch size for train dataloader.

  • eval_batch_size (int, default=32) – Batch size for validation and test dataloaders.

  • config_dataloaders (dict, optional) –

    Dictionary for configuring all dataloaders. For example:

    config_dataloaders = {
        'pin_memory': True,
        'num_workers': 2,
        }
    

    Note

    The dictionary is not copied. To avoid side effects, consider passing a copy.

See also

DataLoader

For a description of shuffle, drop_last and valid options for config_dataloaders.
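As a rough sketch of how the parameters above fit together, the snippet below assembles keyword arguments for PCDDataModule. The labels file, the column names, and the helper datamodule_kwargs are hypothetical and serve only as illustration. The dataloader configuration is passed as a copy, following the note about side effects. The final call is left commented out so that the sketch runs without aidsorb installed.

```python
def datamodule_kwargs(shared_config):
    """Assemble keyword arguments for PCDDataModule (hypothetical helper)."""
    return {
        'path_to_X': 'project_root/source',
        'path_to_Y': 'project_root/labels.csv',  # Hypothetical labels file.
        'index_col': 'id',  # Hypothetical column name.
        'labels': ['y1'],  # Hypothetical column name.
        'train_size': 1000,
        'shuffle': True,  # Affects only the train dataloader.
        'drop_last': True,  # Affects only the train dataloader.
        'train_batch_size': 64,
        'eval_batch_size': 128,
        # Shallow copy, so the caller's dictionary is never shared.
        'config_dataloaders': dict(shared_config),
    }


shared_config = {'pin_memory': True, 'num_workers': 2}
kwargs = datamodule_kwargs(shared_config)

# Requires aidsorb to be installed:
# from aidsorb.datamodules import PCDDataModule
# dm = PCDDataModule(**kwargs)
```

A shallow copy is enough here because the example dictionary holds only immutable values.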

setup(stage=None)

Set up train, validation and test datasets.

Tip

Datasets are accessible via self.{train,validation,test}_dataset.

Parameters:

stage ({None, 'fit', 'validate', 'test'}, default=None) –

Which datasets to set up.

  • If 'fit', only the train and validation datasets are set up.

  • If 'validate' or 'test', only the corresponding dataset is set up.

  • If None, all datasets are set up.

Return type:

None
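The stage rules listed above can be summarized as a small lookup table. The function below is a hypothetical illustration and not the library's implementation. In particular, raising ValueError for an unknown stage is a choice made for this sketch only.

```python
def datasets_for_stage(stage=None):
    """Return the names of the datasets that setup(stage) prepares.

    Hypothetical helper that mirrors the documented rules.
    """
    mapping = {
        None: ('train', 'validation', 'test'),
        'fit': ('train', 'validation'),
        'validate': ('validation',),
        'test': ('test',),
    }
    if stage not in mapping:
        # Behavior chosen for this sketch, not taken from the library.
        raise ValueError(f'Unsupported stage: {stage!r}')
    return mapping[stage]


fit_datasets = datasets_for_stage('fit')
all_datasets = datasets_for_stage()
```

The same table also indicates which dataloader methods are safe to call after a given setup() call.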

test_dataloader()

Return the test dataloader.

Can be called only after setup() has been called with stage=None or stage='test'.

Return type:

DataLoader

train_dataloader()

Return the train dataloader.

Can be called only after setup() has been called with stage=None or stage='fit'.

Return type:

DataLoader

val_dataloader()

Return the validation dataloader.

Can be called only after setup() has been called with stage=None, stage='fit', or stage='validate'.

Return type:

DataLoader
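The call order required by the dataloader methods can be sketched as follows. StandInDataModule is a stand-in written for this example so that it runs without aidsorb. It only mimics the documented order of calls, and the RuntimeError it raises is an assumption of the sketch, not the behavior of PCDDataModule. The helper peek_fit_batches is likewise hypothetical and works with any object exposing setup(), train_dataloader(), and val_dataloader().

```python
class StandInDataModule:
    """Stand-in that mimics the documented call order (not the real class)."""

    def __init__(self):
        self.train_dataset = None
        self.validation_dataset = None

    def setup(self, stage=None):
        # Only the 'fit'-related datasets are modeled in this stand-in.
        if stage in (None, 'fit'):
            self.train_dataset = ['pcd_a', 'pcd_b', 'pcd_c']
            self.validation_dataset = ['pcd_d']

    def train_dataloader(self):
        if self.train_dataset is None:
            raise RuntimeError('Call setup() before train_dataloader().')
        return [self.train_dataset]  # A "dataloader" with a single batch.

    def val_dataloader(self):
        if self.validation_dataset is None:
            raise RuntimeError('Call setup() before val_dataloader().')
        return [self.validation_dataset]


def peek_fit_batches(datamodule):
    """Set up the 'fit' stage, then fetch one train and one validation batch."""
    datamodule.setup('fit')
    train_batch = next(iter(datamodule.train_dataloader()))
    val_batch = next(iter(datamodule.val_dataloader()))
    return train_batch, val_batch


dm = StandInDataModule()
train_batch, val_batch = peek_fit_batches(dm)
```

When the datamodule is passed to a PyTorch Lightning Trainer, the trainer calls setup() with the appropriate stage itself, so the manual call is needed only when inspecting dataloaders directly.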