aidsorb.datamodules

This module provides LightningDataModules for use with PyTorch Lightning.

class aidsorb.datamodules.PCDDataModule(path_to_X, path_to_Y, index_col, labels, train_size=None, train_transform_x=None, eval_transform_x=None, transform_y=None, shuffle=False, train_batch_size=32, eval_batch_size=32, config_dataloaders=None)[source]

Bases: LightningDataModule

LightningDataModule for point clouds.

Note

The following directory structure is assumed:

pcd_data
├──pcds.npz        <-- path_to_X
├──train.json
├──validation.json
└──test.json

Tip

Assuming pcd_data/pcds.npz already exists, you can create the above directory structure with prepare_data().
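
For illustration only, the split files in the layout above can be mimicked with the standard library. The sketch below is not prepare_data() itself: the helper name write_splits, the split ratios, the toy names, and the assumption that each .json file holds a plain list of point cloud names are all hypothetical.

```python
import json
import random
import tempfile
from pathlib import Path


def write_splits(names, outdir, ratios=(0.8, 0.1, 0.1), seed=1):
    """Write train/validation/test name lists as JSON files in ``outdir``.

    Hypothetical helper: assumes each split file is a JSON list of names.
    """
    names = sorted(names)
    random.Random(seed).shuffle(names)  # reproducible shuffle

    n_train = int(ratios[0] * len(names))
    n_val = int(ratios[1] * len(names))
    splits = {
        "train": names[:n_train],
        "validation": names[n_train:n_train + n_val],
        "test": names[n_train + n_val:],  # whatever remains
    }

    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    for mode, subset in splits.items():
        with open(outdir / f"{mode}.json", "w") as fhand:
            json.dump(subset, fhand, indent=4)
    return splits


# Toy names standing in for the names stored in pcds.npz.
toy_names = [f"pcd_{i}" for i in range(10)]
pcd_dir = Path(tempfile.mkdtemp()) / "pcd_data"
splits = write_splits(toy_names, pcd_dir)
```

In practice prepare_data() should be preferred, since it is the documented way to produce this layout.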

Todo

  • Add support for predict_dataloader.

  • Add option drop_last for train_dataloader.

Parameters:
  • path_to_X (str) – Absolute or relative path to the .npz file holding the point clouds.

  • path_to_Y (str) –

    Absolute or relative path to the .csv file holding the labels of the point clouds.

    Warning

    The comma , is assumed as the field separator.

  • index_col (str) – Column name of the .csv file to be used as row labels. The names (values) under this column must follow the same naming scheme as in pcds.npz.

  • labels (list) – List containing the names of the properties to be predicted. No effect if path_to_Y=None.

  • train_size (int, optional) – The number of training samples. By default, all training samples are used.

  • train_transform_x (callable, optional) – Transforms applied to input during training.

  • eval_transform_x (callable, optional) – Transforms applied to input during validation and testing.

  • transform_y (callable, optional) – Transforms applied to output.

  • shuffle (bool, default=False) – Whether to shuffle the training data. Applies only to train_dataloader.

  • train_batch_size (int, default=32) – batch_size for the train dataloader.

  • eval_batch_size (int, default=32) – batch_size for the validation and test dataloaders.

  • config_dataloaders (dict, optional) –

    Dictionary for configuring the DataLoaders. This is applied to all dataloaders, i.e., {train,validation,test}_dataloader. For example:

    config_dataloaders = {
        'pin_memory': True,
        'num_workers': 2,
        }
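
To make the roles of path_to_Y, index_col and labels concrete, here is a minimal sketch using only the standard library. The column names (id, y1, y2) and the values are made up for illustration, and the helper load_labels is hypothetical; it does not reflect how the module reads the file internally.

```python
import csv
import io


def load_labels(csv_text, index_col, labels):
    """Map each name under ``index_col`` to its values for ``labels``."""
    reader = csv.DictReader(io.StringIO(csv_text))  # comma-separated by default
    return {
        row[index_col]: [float(row[label]) for label in labels]
        for row in reader
    }


# Made-up table: the names under "id" must match the names in pcds.npz.
csv_text = """id,y1,y2
pcd_0,1.5,10
pcd_1,2.5,20
"""

# Only the requested property "y1" is kept as a target.
targets = load_labels(csv_text, index_col="id", labels=["y1"])
```

Here index_col="id" selects the column that identifies each point cloud, while labels=["y1"] restricts the targets to a single property.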
    

See also

PCDDataset

DataLoader

For a description of shuffle, batch_size and valid **kwargs passed to config_dataloaders.
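
One plausible way a shared config_dataloaders dictionary could be combined with the per-dataloader settings is plain keyword unpacking. This is only a sketch of the idea, not the module's actual implementation; loader_kwargs is a hypothetical helper and the batch sizes are arbitrary.

```python
def loader_kwargs(batch_size, shuffle=False, config_dataloaders=None):
    """Combine per-loader settings with the shared configuration."""
    config = config_dataloaders or {}  # treat None as "no extra options"
    return {"batch_size": batch_size, "shuffle": shuffle, **config}


config_dataloaders = {"pin_memory": True, "num_workers": 2}

# Shuffling is requested only for the train dataloader.
train_kwargs = loader_kwargs(32, shuffle=True, config_dataloaders=config_dataloaders)
eval_kwargs = loader_kwargs(64, config_dataloaders=config_dataloaders)
```

Dictionaries built this way could then be unpacked into a PyTorch DataLoader, e.g. DataLoader(dataset, **train_kwargs).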

set_test_dataset()[source]

Set up the test dataset.

set_train_dataset()[source]

Set up the train dataset.

set_validation_dataset()[source]

Set up the validation dataset.

setup(stage=None)[source]

Set up the train, validation and test datasets.
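
Which dataloaders can be used afterwards depends on stage. The sketch below merely tabulates the stage requirements documented for train_dataloader(), val_dataloader() and test_dataloader() on this page; the names STAGES and available_dataloaders are hypothetical and not part of the module.

```python
# Stages under which each dataloader is documented to be available.
STAGES = {
    "train_dataloader": {None, "fit"},
    "val_dataloader": {None, "fit", "validate"},
    "test_dataloader": {None, "test"},
}


def available_dataloaders(stage=None):
    """Return the names of the dataloaders usable after ``setup(stage)``."""
    return sorted(name for name, stages in STAGES.items() if stage in stages)


after_fit = available_dataloaders("fit")
after_default = available_dataloaders()  # stage=None sets up everything
```

For example, after setup(stage="fit") only the train and validation dataloaders are expected to be available, whereas the default stage=None makes all three usable.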

test_dataloader()[source]

Return the test dataloader.

Can be called only after setup() has been called with stage={None|test}.

property test_names

The names of point clouds used for testing.

train_dataloader()[source]

Return the train dataloader.

Can be called only after setup() has been called with stage={None|fit}.

property train_names

The names of point clouds used for training.

val_dataloader()[source]

Return the validation dataloader.

Can be called only after setup() has been called with stage={None|fit|validate}.

property val_names

The names of point clouds used for validation.