aidsorb.data
Helper functions and classes for creating datasets and handling point clouds of variable sizes.
- class aidsorb.data.Collator(*, channels_first, mode='upsample', return_mask=False)
Bases: object
Collate a sequence of samples into a batch.
Point clouds are padded before collation, so they can form a batch.
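Shapes
Input: sequence of samples
Each sample is a tuple (pcd, label).
pcd: tensor of shape (N_i, C).
label: tensor of shape (n_outputs,) or (), or None.
Output: tuple (x, y), or ((x, mask), y) if return_mask=True.
x: tensor of shape (B, T, C) if channels_first=False, else (B, C, T).
y: tensor of shape (B, n_outputs) or (B,), or None if the samples are unlabeled.
mask: boolean tensor of shape (B, T), where True indicates padding.
B is the batch size and T is the size of the largest point cloud in the sequence.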
See also
pad_pcds(): For a description of the parameters.
Examples
>>> sample1 = (torch.tensor([[1, 4, 5, 2]]), torch.tensor([1., 2.]))
>>> sample2 = (torch.tensor([[0, 4, 0, 2], [2, 4, 1, 8]]), torch.tensor([7., 3.]))

>>> collate_fn = Collator(channels_first=True)
>>> x, y = collate_fn((sample1, sample2))
>>> x
tensor([[[1, 1], [4, 4], [5, 5], [2, 2]], [[0, 2], [4, 4], [0, 1], [2, 8]]])
>>> y
tensor([[1., 2.], [7., 3.]])

>>> collate_fn = Collator(channels_first=False, mode='zeropad')
>>> x, y = collate_fn((sample1, sample2))
>>> x
tensor([[[1, 4, 5, 2], [0, 0, 0, 0]], [[0, 4, 0, 2], [2, 4, 1, 8]]])
>>> y
tensor([[1., 2.], [7., 3.]])

>>> # Label has shape (), i.e. is scalar.
>>> sample1 = (torch.tensor([[3, 4, 3, 2]]), torch.tensor(0))
>>> sample2 = (torch.tensor([[2, 4, 8, 2], [9, 4, 1, 8]]), torch.tensor(1))
>>> collate_fn = Collator(channels_first=False, mode='zeropad')
>>> x, y = collate_fn((sample1, sample2))
>>> x
tensor([[[3, 4, 3, 2], [0, 0, 0, 0]], [[2, 4, 8, 2], [9, 4, 1, 8]]])
>>> y
tensor([0, 1])
>>> # Label is None, i.e. unlabeled data.
>>> sample1 = (torch.tensor([[1., 0., 1., 0.]]), None)
>>> sample2 = (torch.tensor([[5., 2., 2., 0.], [9., 0., 0., 1.]]), None)
>>> collate_fn = Collator(channels_first=True, mode='zeropad')
>>> x, y = collate_fn((sample1, sample2))
>>> x
tensor([[[1., 0.], [0., 0.], [1., 0.], [0., 0.]], [[5., 9.], [2., 0.], [2., 0.], [0., 1.]]])
>>> print(y)
None
>>> # Collate and return padding mask.
>>> sample1 = (torch.tensor([[4, 2, 1, 4], [2, 0, 0, 1]]), torch.tensor(1))
>>> sample2 = (torch.tensor([[1, 2, 3, 1]]), torch.tensor(4))
>>> collate_fn = Collator(channels_first=False, mode='zeropad', return_mask=True)
>>> (x, mask), y = collate_fn((sample1, sample2))
>>> x
tensor([[[4, 2, 1, 4], [2, 0, 0, 1]], [[1, 2, 3, 1], [0, 0, 0, 0]]])
>>> y
tensor([1, 4])
>>> mask
tensor([[False, False], [False, True]])
>>> # Batch a single unlabeled sample.
>>> sample = (torch.tensor([[2, 3, 4]]), None)
>>> collate_fn = Collator(channels_first=False)
>>> x, y = collate_fn([sample])
>>> x
tensor([[[2, 3, 4]]])
>>> print(y)
None

>>> # Batch a single labeled sample.
>>> sample = (torch.tensor([[1, 1, 2]]), torch.tensor(10))
>>> collate_fn = Collator(channels_first=True, mode='zeropad')
>>> x, y = collate_fn([sample])
>>> x
tensor([[[1], [1], [2]]])
>>> y
tensor([10])
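To make the batching rules above concrete, here is a minimal pure-Python sketch of a zero-padding collator that works on nested lists instead of tensors. The function collate_sketch is hypothetical and is not aidsorb's implementation: it only mimics mode='zeropad' (not the default 'upsample'), and it assumes that either every label is None or none is.

```python
def collate_sketch(samples, channels_first=False, return_mask=False):
    """Hypothetical sketch (not aidsorb's code): zero-pad and batch (pcd, label) pairs."""
    pcds = [pcd for pcd, _ in samples]
    labels = [label for _, label in samples]

    size = max(len(pcd) for pcd in pcds)  # T: size of the largest point cloud
    n_channels = len(pcds[0][0])          # C: assumed equal for all samples

    batch, mask = [], []
    for pcd in pcds:
        n_pad = size - len(pcd)
        # Append all-zero points until the cloud has T points.
        padded = [list(p) for p in pcd] + [[0] * n_channels for _ in range(n_pad)]
        if channels_first:
            padded = [list(col) for col in zip(*padded)]  # (T, C) -> (C, T)
        batch.append(padded)
        mask.append([False] * len(pcd) + [True] * n_pad)  # True marks padding

    # Unlabeled data: if every label is None, the batch carries no labels.
    y = None if all(label is None for label in labels) else list(labels)

    x = (batch, mask) if return_mask else batch
    return x, y
```

Called on list versions of the padding-mask example above, collate_sketch returns the same padded points, mask and labels as Collator(channels_first=False, mode='zeropad', return_mask=True).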
- class aidsorb.data.PCDDataset(pcd_names, path_to_X, *, path_to_Y=None, index_col=None, labels=None, transform_x=None, transform_y=None)
Bases: Dataset
Dataset for point clouds.
Indexing the dataset returns (x, None) if the data are unlabeled, i.e. path_to_Y=None, else (x, y), where x and y are the results of transform_x and transform_y, respectively.
- Parameters:
pcd_names (sequence) – Point cloud names.
path_to_X (str) – Absolute or relative path to the directory holding the point clouds.
path_to_Y (str, optional) – Absolute or relative path to the .csv file holding the labels of the point clouds.
index_col (str, optional) – Column name of the .csv file to be used for indexing. This column must include pcd_names. No effect if path_to_Y=None.
labels (list, optional) – List of column names from the .csv file containing the properties to be predicted. No effect if path_to_Y=None.
transform_x (callable, optional) – Transformation to apply to the point cloud.
transform_y (callable, optional) – Transformation to apply to the label. No effect if path_to_Y=None.
See also
aidsorb.transforms: For available point cloud transformations.
- Y
Dataframe holding the labels. Its columns follow the order in labels.
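The indexing contract of PCDDataset can be illustrated with a small in-memory stand-in. InMemoryPCDDataset is a hypothetical class, not part of aidsorb: it replaces the point cloud files under path_to_X and the .csv file at path_to_Y with plain dictionaries keyed by point cloud name, and keeps only the (x, None) versus (x, y) behavior and the two transforms.

```python
class InMemoryPCDDataset:
    """Hypothetical stand-in for PCDDataset; data live in dicts, not on disk."""

    def __init__(self, pcd_names, pcds, targets=None, transform_x=None, transform_y=None):
        self.pcd_names = list(pcd_names)
        self.pcds = pcds          # name -> point cloud (replaces the files in path_to_X)
        self.targets = targets    # name -> label (replaces the .csv at path_to_Y)
        self.transform_x = transform_x
        self.transform_y = transform_y

    def __len__(self):
        return len(self.pcd_names)

    def __getitem__(self, idx):
        name = self.pcd_names[idx]
        x = self.pcds[name]
        if self.transform_x is not None:
            x = self.transform_x(x)
        if self.targets is None:  # unlabeled data, analogous to path_to_Y=None
            return x, None
        y = self.targets[name]
        if self.transform_y is not None:
            y = self.transform_y(y)
        return x, y
```

With targets omitted, every item comes back as (x, None), which is the shape of sample that Collator accepts for unlabeled data.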
- aidsorb.data.pad_pcds(pcds, *, channels_first, mode='upsample', return_mask=False)
Pad a sequence of variable size point clouds.
Each point cloud must have shape (N_i, C).
Shapes
batch: tensor of shape (B, T, C) if channels_first=False, else (B, C, T).
mask: boolean tensor of shape (B, T), where True indicates padding.
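B is the batch size and T is the size of the largest point cloud in the sequence.
- Parameters:
pcds (sequence of tensors) – Point clouds to pad, each of shape (N_i, C).
channels_first (bool) – If True, the batch has shape (B, C, T), else (B, T, C).
mode (str, default='upsample') – Padding mode, either 'upsample' or 'zeropad'.
return_mask (bool, default=False) – If True, also return the padding mask.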
- Returns:
batch if return_mask=False, else (batch, mask).
- Return type:
tensor or tuple of tensors
See also
upsample_pcd(): For a description of 'upsample' mode.
torch.nn.utils.rnn.pad_sequence(): For a description of 'zeropad' mode.
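The 'upsample' mode is delegated to upsample_pcd(), whose algorithm is not described on this page. One plausible strategy, assumed here purely for illustration, is to keep every original point and append points drawn at random, with replacement, from the same cloud until it reaches T points; this agrees with the examples in this section, where a single point is simply repeated. upsample_sketch is hypothetical and may differ from upsample_pcd().

```python
import random


def upsample_sketch(pcd, size, seed=None):
    """Hypothetical upsampling (assumption, not aidsorb's upsample_pcd):
    keep all points, then append random duplicates until len == size."""
    if size < len(pcd):
        raise ValueError("size must be >= the number of points in pcd")
    rng = random.Random(seed)
    extra = rng.choices(pcd, k=size - len(pcd))  # sampling with replacement
    return [list(p) for p in pcd] + [list(p) for p in extra]
```

Unlike zero padding, every added point is a real point of the cloud, so no all-zero rows enter the batch.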
Examples
>>> x1 = torch.tensor([[1, 2, 3, 4]])
>>> x2 = torch.tensor([[2, 5, 3, 8], [0, 2, 8, 9]])

>>> batch = pad_pcds((x1, x2), channels_first=False)
>>> batch
tensor([[[1, 2, 3, 4], [1, 2, 3, 4]], [[2, 5, 3, 8], [0, 2, 8, 9]]])

>>> batch = pad_pcds((x1, x2), channels_first=True)
>>> batch
tensor([[[1, 1], [2, 2], [3, 3], [4, 4]], [[2, 0], [5, 2], [3, 8], [8, 9]]])

>>> batch = pad_pcds((x1, x2), channels_first=False, mode='zeropad')
>>> batch
tensor([[[1, 2, 3, 4], [0, 0, 0, 0]], [[2, 5, 3, 8], [0, 2, 8, 9]]])

>>> batch = pad_pcds((x1, x2), channels_first=True, mode='zeropad')
>>> batch
tensor([[[1, 0], [2, 0], [3, 0], [4, 0]], [[2, 0], [5, 2], [3, 8], [8, 9]]])

>>> # Pad and return padding mask (useful for attention-based architectures).
>>> batch, mask = pad_pcds((x1, x2), channels_first=False, return_mask=True)
>>> batch
tensor([[[1, 2, 3, 4], [1, 2, 3, 4]], [[2, 5, 3, 8], [0, 2, 8, 9]]])
>>> mask
tensor([[False, True], [False, False]])

>>> # Pad a single point cloud.
>>> pad_pcds([x1], channels_first=False, mode='zeropad')
tensor([[[1, 2, 3, 4]]])
>>> pad_pcds([x1], channels_first=True, mode='upsample')
tensor([[[1], [2], [3], [4]]])
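The padding mask exists so that padded points can be kept from influencing the model. As a sketch, assuming per-point attention scores of shape (B, T) and a mask from pad_pcds(..., return_mask=True), both converted to nested lists, a masked softmax gives padded positions exactly zero weight. masked_softmax is a hypothetical helper, not part of aidsorb, and it assumes every point cloud has at least one real point.

```python
import math


def masked_softmax(scores, mask):
    """Hypothetical helper: row-wise softmax with zero weight where mask is True."""
    weights = []
    for row, row_mask in zip(scores, mask):
        # Subtract the max over real points only, for numerical stability.
        m = max(s for s, is_pad in zip(row, row_mask) if not is_pad)
        exps = [0.0 if is_pad else math.exp(s - m) for s, is_pad in zip(row, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

For example, masked_softmax([[0.0, 5.0], [1.0, 1.0]], [[False, True], [False, False]]) puts all weight on the single real point of the first cloud, even though the padded position has the larger score.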
- aidsorb.data.prepare_data(source, split_ratio=None, seed=1)
Split point clouds into train, validation and test sets.
Each .json file that is created stores the names of the point clouds that will be used for training, validation and testing.
Warning
All .json files are stored under the parent directory of source.
Splitting doesn’t support stratification. If your dataset is small and you want to perform classification, consider using train_test_split.
- Parameters:
source (str) – Absolute or relative path to the directory holding the point clouds.
split_ratio (sequence, default=None) – Absolute sizes or fractions of the splits, of the form (train, val, test). If None, it is set to (0.8, 0.1, 0.1).
seed (int, default=1) – Controls the randomness of the rng used for splitting.
- Return type:
None
Examples
Before the split:
project_root
└── source
    ├── foo.npy
    ├── ...
    └── bar.npy
>>> prepare_data('path/to/source')
After the split:
project_root
├── source
│   ├── foo.npy
│   ├── ...
│   └── bar.npy
├── test.json
├── train.json
└── validation.json
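A minimal sketch of the splitting described above, using only the standard library. prepare_data_sketch is hypothetical and is not aidsorb's implementation. In particular, it assumes that point cloud names are the file stems (foo.npy becomes foo). It accepts only fractions for split_ratio, not absolute sizes, rounds the train and validation sizes with round() and gives the remainder to the test set, and it returns the splits instead of None.

```python
import json
import random
from pathlib import Path


def prepare_data_sketch(source, split_ratio=(0.8, 0.1, 0.1), seed=1):
    """Hypothetical sketch: shuffle point cloud names and write train/validation/test JSON files."""
    source = Path(source)
    # Assumption: names are file stems; sorting makes the shuffle reproducible.
    names = sorted(p.stem for p in source.iterdir() if p.is_file())
    random.Random(seed).shuffle(names)

    n = len(names)
    n_train = round(split_ratio[0] * n)
    n_val = round(split_ratio[1] * n)
    splits = {
        "train": names[:n_train],
        "validation": names[n_train:n_train + n_val],
        "test": names[n_train + n_val:],  # remainder goes to the test set
    }
    # Like prepare_data, write the .json files to the parent directory of source.
    for split, split_names in splits.items():
        with open(source.parent / f"{split}.json", "w") as f:
            json.dump(split_names, f)
    return splits
```

Because the names are sorted before shuffling with a seeded random.Random, the same seed yields the same split regardless of the order in which the filesystem lists the files.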