aidsorb.data
This module provides helper functions and classes for creating datasets and handling point clouds of variable sizes.
- class aidsorb.data.Collator(channels_first=True, mode='upsample')
Bases: object
Collate a sequence of samples into a batch.
Point clouds are padded before collation, so they can form a batch.
Shapes
Input: sequence of samples
Each sample is a tuple of tensors (pcd, label), where pcd has shape (N_i, C) and label has shape (n_outputs,) or ().
Output: tuple of length 2
batch[0] == x with shape (B, C, T) if channels_first=True, otherwise (B, T, C). B is the batch size and T is the size of the largest point cloud in the sequence.
batch[1] == y with shape (B, n_outputs) or (B,).
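To make the shape conventions above concrete, here is a minimal sketch of zero-padded collation written with plain Python lists instead of tensors, so that it runs without PyTorch. The function `collate_zeropad` is a hypothetical name and is not part of aidsorb; the actual `Collator` operates on tensors and also supports `'upsample'` mode.

```python
def collate_zeropad(samples, channels_first=True):
    """Collate (pcd, label) pairs; nested lists stand in for tensors.

    Hypothetical helper for illustration only, not the aidsorb implementation.
    """
    pcds, labels = zip(*samples)        # separate inputs from targets
    T = max(len(pcd) for pcd in pcds)   # size of the largest point cloud
    C = len(pcds[0][0])                 # number of channels per point

    # Append all-zero points until every cloud has T points: (B, T, C).
    x = [
        [list(point) for point in pcd] + [[0] * C for _ in range(T - len(pcd))]
        for pcd in pcds
    ]
    if channels_first:
        # Swap the last two axes: (B, T, C) -> (B, C, T).
        x = [[list(channel) for channel in zip(*pcd)] for pcd in x]

    y = list(labels)                    # (B, n_outputs) or (B,)
    return x, y


sample1 = ([[1, 4, 5, 2]], [1.0, 2.0])
sample2 = ([[0, 4, 0, 2], [2, 4, 1, 8]], [7.0, 3.0])

x, y = collate_zeropad((sample1, sample2), channels_first=False)
print(x)  # [[[1, 4, 5, 2], [0, 0, 0, 0]], [[0, 4, 0, 2], [2, 4, 1, 8]]]
print(y)  # [[1.0, 2.0], [7.0, 3.0]]

x_cf, _ = collate_zeropad((sample1, sample2), channels_first=True)
print(len(x_cf), len(x_cf[0]), len(x_cf[0][0]))  # 2 4 2, i.e. (B, C, T)
```

With `channels_first=False` the padded point of the first sample appears as a row of zeros; with `channels_first=True` the same data is laid out per channel, which is the layout the tip below recommends for PointNet.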
Tip
Use an instance of this class as collate_fn with channels_first=True, if your model is PointNet.
Todo
Add functionality for collating only point clouds (useful when the dataset is unlabeled).
- Parameters:
channels_first (bool, default=True)
mode ({'zeropad', 'upsample'}, default='upsample')
See also
pad_pcds(): For a description of the parameters.
upsample_pcd(): For a description of the parameters.
Examples
>>> import torch
>>> from aidsorb.data import Collator
>>> sample1 = (torch.tensor([[1, 4, 5, 2]]), torch.tensor([1., 2.]))
>>> sample2 = (torch.tensor([[0, 4, 0, 2], [2, 4, 1, 8]]), torch.tensor([7., 3.]))

>>> collate_fn = Collator()
>>> x, y = collate_fn((sample1, sample2))
>>> x.shape
torch.Size([2, 4, 2])
>>> y.shape
torch.Size([2, 2])
>>> x
tensor([[[1, 1], [4, 4], [5, 5], [2, 2]],
        [[0, 2], [4, 4], [0, 1], [2, 8]]])
>>> y
tensor([[1., 2.],
        [7., 3.]])

>>> collate_fn = Collator(channels_first=False, mode='zeropad')
>>> x, y = collate_fn((sample1, sample2))
>>> x
tensor([[[1, 4, 5, 2], [0, 0, 0, 0]],
        [[0, 4, 0, 2], [2, 4, 1, 8]]])
>>> y
tensor([[1., 2.],
        [7., 3.]])

>>> # Label has shape (), i.e. is scalar.
>>> sample1 = (torch.tensor([[3, 4, 3, 2]]), torch.tensor(0))
>>> sample2 = (torch.tensor([[2, 4, 8, 2], [9, 4, 1, 8]]), torch.tensor(1))
>>> collate_fn = Collator(channels_first=False, mode='zeropad')
>>> x, y = collate_fn((sample1, sample2))
>>> x
tensor([[[3, 4, 3, 2], [0, 0, 0, 0]],
        [[2, 4, 8, 2], [9, 4, 1, 8]]])
>>> y
tensor([0, 1])
- class aidsorb.data.PCDDataset(pcd_names, path_to_X, path_to_Y=None, index_col=None, labels=None, transform_x=None, transform_y=None)
Bases: Dataset
Dataset for point clouds.
Tip
For implementing your own transforms, have a look at the transforms tutorial. For more flexibility, consider implementing them as callable instances of classes.
- Parameters:
pcd_names (list) – List containing the names of the point clouds.
path_to_X (str) – Absolute or relative path to the .npz file holding the point clouds.
path_to_Y (str, optional) – Absolute or relative path to the .csv file holding the labels of the point clouds.
Warning
The comma , is assumed as the field separator.
index_col (str, optional) – Column name of the .csv file to be used as row labels. The names (values) under this column must follow the same naming scheme as in pcd_names.
labels (list, optional) – List containing the names of the properties to be predicted. No effect if path_to_Y=None.
transform_x (callable, optional) – Transforms applied to input, i.e. to each point cloud.
transform_y (callable, optional) – Transforms applied to output. No effect if path_to_Y=None.
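The relationship between pcd_names, index_col and labels can be illustrated with the standard `csv` module. This is only a sketch of the lookup that the parameters describe, not the way `PCDDataset` reads the file; the column names (`name`, `prop_1`, `prop_2`), the point cloud names and the values are all made up for the example.

```python
import csv
import io

# Stand-in for the .csv file at path_to_Y; the comma is the field separator.
csv_text = """name,prop_1,prop_2
pcd_a,1.5,3400
pcd_b,4.2,1800
pcd_c,2.1,1200
"""

index_col = "name"             # column whose values match pcd_names
labels = ["prop_1"]            # properties to be predicted
pcd_names = ["pcd_c", "pcd_a"]  # any subset of the names, in any order

# Index the rows by name, keeping only the requested label columns.
table = {
    row[index_col]: [float(row[col]) for col in labels]
    for row in csv.DictReader(io.StringIO(csv_text))
}

# One label vector per point cloud, in the order given by pcd_names.
y = [table[name] for name in pcd_names]
print(y)  # [[2.1], [1.5]]
```

The lookup only works if the values under the index column use the same naming scheme as pcd_names; a name missing from the table raises a `KeyError` in this sketch.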
See also
aidsorb.transforms: For available point cloud transformations.
- property pcd_names
The names of the point clouds.
- aidsorb.data.pad_pcds(pcds, channels_first=True, mode='upsample')
Pad a sequence of variable size point clouds.
Each point cloud must have shape (N_i, C).
- Parameters:
pcds (sequence of tensors)
mode ({'zeropad', 'upsample'}, default='upsample')
channels_first (bool, default=True)
- Returns:
batch – If channels_first=False, then batch has shape (B, T, C), where B == len(pcds) is the batch size and T is the size of the largest point cloud in pcds. Otherwise, (B, C, T).
- Return type:
tensor of shape (B, T, C) or (B, C, T)
See also
upsample_pcd(): For a description of 'upsample' mode.
torch.nn.utils.rnn.pad_sequence(): For a description of 'zeropad' mode.
Examples
>>> import torch
>>> from aidsorb.data import pad_pcds
>>> x1 = torch.tensor([[1, 2, 3, 4]])
>>> x2 = torch.tensor([[2, 5, 3, 8], [0, 2, 8, 9]])

>>> batch = pad_pcds((x1, x2), channels_first=False)
>>> batch
tensor([[[1, 2, 3, 4], [1, 2, 3, 4]],
        [[2, 5, 3, 8], [0, 2, 8, 9]]])

>>> batch = pad_pcds((x1, x2), channels_first=True)
>>> batch
tensor([[[1, 1], [2, 2], [3, 3], [4, 4]],
        [[2, 0], [5, 2], [3, 8], [8, 9]]])

>>> batch = pad_pcds((x1, x2), channels_first=False, mode='zeropad')
>>> batch
tensor([[[1, 2, 3, 4], [0, 0, 0, 0]],
        [[2, 5, 3, 8], [0, 2, 8, 9]]])

>>> batch = pad_pcds((x1, x2), channels_first=True, mode='zeropad')
>>> batch
tensor([[[1, 0], [2, 0], [3, 0], [4, 0]],
        [[2, 0], [5, 2], [3, 8], [8, 9]]])
- aidsorb.data.prepare_data(source, split_ratio=(0.8, 0.1, 0.1), seed=1)
Split a source of point clouds into train, validation and test sets.
Each .json file that is created stores the names of the point clouds that will be used for training, validation and testing.
Warning
No directory is created by prepare_data(). All .json files are stored under the directory containing source.
Splitting doesn't support stratification. If your dataset is small and you want to perform classification, consider using train_test_split.
- Parameters:
source (str) – Absolute or relative path to the file holding the point clouds.
split_ratio (sequence, default=(0.8, 0.1, 0.1)) – The sizes or fractions of splits to be produced.
split_ratio[0] == train.
split_ratio[1] == validation.
split_ratio[2] == test.
seed (int, default=1) – Controls the randomness of the rng used for splitting.
Examples
Before the split:
pcd_data
└──source.npz
>>> prepare_data('path/to/pcd_data/source.npz')
After the split:
pcd_data
├──source.npz
├──train.json
├──validation.json
└──test.json
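The idea of a seeded, non-stratified split that ends up as three .json files can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code of `prepare_data()`: the helper `split_names` is hypothetical, it handles only fractional ratios, and the point cloud names are invented. A temporary directory stands in for the directory containing the source.

```python
import json
import random
import tempfile
from pathlib import Path


def split_names(names, split_ratio=(0.8, 0.1, 0.1), seed=1):
    """Shuffle names with a seeded RNG and cut them into three parts.

    Hypothetical helper; the test split receives whatever remains.
    """
    names = sorted(names)        # fixed starting order for reproducibility
    rng = random.Random(seed)
    rng.shuffle(names)

    n_train = int(split_ratio[0] * len(names))
    n_val = int(split_ratio[1] * len(names))
    return {
        "train": names[:n_train],
        "validation": names[n_train:n_train + n_val],
        "test": names[n_train + n_val:],
    }


names = [f"pcd_{i}" for i in range(10)]  # invented point cloud names
splits = split_names(names)

with tempfile.TemporaryDirectory() as tmp:
    for mode, subset in splits.items():
        # One .json file per split, holding only the names.
        (Path(tmp) / f"{mode}.json").write_text(json.dumps(subset))
    created = sorted(p.name for p in Path(tmp).iterdir())

print(created)  # ['test.json', 'train.json', 'validation.json']
print({mode: len(subset) for mode, subset in splits.items()})
# {'train': 8, 'validation': 1, 'test': 1}
```

Because the split is a plain shuffle, class proportions are not preserved, which is why the warning above points to a stratified alternative for small classification datasets.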
- aidsorb.data.upsample_pcd(pcd, size)
Upsample pcd to a new size by sampling with replacement from pcd.
- Parameters:
pcd (tensor of shape (N, C)) – The original point cloud of size N.
size (int) – The size of the new point cloud.
- Returns:
new_pcd
- Return type:
tensor of shape (size, C).
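Sampling with replacement can be sketched in plain Python, with a list of points standing in for the tensor. The helper `upsample_points` is hypothetical and not part of aidsorb. Keeping the original points first and appending the sampled ones is an assumption of this sketch, as is raising an error when size is smaller than the number of points.

```python
import random


def upsample_points(pcd, size, seed=None):
    """Return `size` points: the originals followed by random repeats.

    Hypothetical stand-in for illustration; the library may differ in detail.
    """
    if size < len(pcd):
        raise ValueError("size must be at least the number of points")
    rng = random.Random(seed)
    # Sampling with replacement: the same point may be drawn more than once.
    extra = rng.choices(pcd, k=size - len(pcd))
    return list(pcd) + extra


print(upsample_points([[2, 4, 5, 6]], 3))
# [[2, 4, 5, 6], [2, 4, 5, 6], [2, 4, 5, 6]]

pcd = [[i, i + 1, i + 2, i + 3] for i in range(10)]
new_pcd = upsample_points(pcd, 20, seed=0)
print(len(new_pcd))                       # 20
print(all(p in pcd for p in new_pcd))     # True: no new points are invented
print(upsample_points(pcd, len(pcd)) == pcd)  # True: nothing to add
```

Every point in the result is a copy of an existing point, so upsampling changes the size of the cloud but not the set of distinct points it contains.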
Examples
>>> import torch
>>> from aidsorb.data import upsample_pcd
>>> pcd = torch.tensor([[2, 4, 5, 6]])
>>> upsample_pcd(pcd, 3)
tensor([[2, 4, 5, 6],
        [2, 4, 5, 6],
        [2, 4, 5, 6]])

>>> # New points must be from pcd.
>>> pcd = torch.randn(10, 4)
>>> new_pcd = upsample_pcd(pcd, 20)
>>> (new_pcd[-1] == pcd).all(1).any()  # Check for last point.
tensor(True)

>>> # No upsampling.
>>> pcd = torch.randn(100, 4)
>>> new_pcd = upsample_pcd(pcd, len(pcd))
>>> torch.equal(pcd, new_pcd)
True