Tutorial

Note

This tutorial covers the most common use cases of AIdsorb. For more advanced usage, you should consult the API Reference.

Introduction

What is a point cloud?

A point cloud is a set of 3D data points, i.e. a set of 3D coordinates and (optionally) associated features. More formally:

\[\mathcal{P} = \{\mathbf{p}_1, \mathbf{p}_2, \dots, \mathbf{p}_N\} \quad \text{and} \quad \mathbf{p}_i \in \mathbb{R}^{3+C}\]

where \(N\) is the number of points in the point cloud and \(C\) is the number of (per-point) features.

In AIdsorb, a point cloud is represented as an ndarray or Tensor of shape (N, 3+C):

\[\begin{split}\mathcal{P} = \begin{bmatrix} \mathbf{p}_1 \\ \mathbf{p}_2 \\ \vdots \\ \mathbf{p}_N \end{bmatrix} = \begin{bmatrix} x_1 & y_1 & z_1 & f_{1}^1 & \dots & f_1^C \\ x_2 & y_2 & z_2 & f_{2}^1 & \dots & f_2^C \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ x_N & y_N & z_N & f_{N}^1 & \dots & f_N^C \\ \end{bmatrix}\end{split}\]

What is a molecular point cloud?

It is a point cloud where coordinates correspond to atomic positions, and features correspond to atomic numbers and any additional information.

In AIdsorb, a molecular point cloud (pcd) is represented as an ndarray or Tensor of shape (N, 4+C), where N is the number of atoms, pcd[:, :3] are the atomic coordinates, pcd[:, 3] are the atomic numbers and pcd[:, 4:] are any additional features. If C == 0, then the only features are the atomic numbers.
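As a minimal sketch of this layout, the snippet below builds a molecular point cloud for a water molecule with plain NumPy and slices it as described above. The coordinates are illustrative values and the variable names are hypothetical; nothing here relies on AIdsorb itself.

```python
import numpy as np

# One row per atom: x, y, z, atomic number (so C == 0 here).
# Coordinates are illustrative values, not reference data.
pcd = np.array(
    [
        [0.000, 0.000, 0.117, 8],    # O
        [0.000, 0.757, -0.471, 1],   # H
        [0.000, -0.757, -0.471, 1],  # H
    ],
    dtype=np.float32,
)

coords = pcd[:, :3]          # atomic coordinates, shape (N, 3)
atomic_numbers = pcd[:, 3]   # atomic numbers, shape (N,)
extra_features = pcd[:, 4:]  # additional features, shape (N, C)

print(pcd.shape, coords.shape, extra_features.shape)
```

Since C == 0 in this example, `extra_features` is an empty slice with zero columns.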

Questions

Using point clouds not created with AIdsorb?

Yes! The only requirement is to store them under a directory in .npy format (see numpy.save()) and respect the shapes described in the Introduction. Then, you can proceed as described earlier (skipping the point cloud creation step).
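A minimal sketch of storing externally generated point clouds is shown below. The random data, the temporary directory (a stand-in for a real location such as path/to/pcd_data/) and the file names are hypothetical. Naming each file after its structure is an assumption made for illustration, so check the API Reference for how names are matched to labels.

```python
import tempfile
from pathlib import Path

import numpy as np

# Stand-in for the directory that will hold the point clouds.
pcd_dir = Path(tempfile.mkdtemp())

rng = np.random.default_rng(0)

# Hypothetical structures: (name, number of atoms).
for name, n_atoms in [("structure_a", 5), ("structure_b", 8)]:
    coords = rng.uniform(0.0, 10.0, size=(n_atoms, 3))
    atomic_numbers = rng.integers(1, 9, size=(n_atoms, 1))

    # Shape (N, 4): coordinates followed by atomic numbers.
    pcd = np.hstack([coords, atomic_numbers]).astype(np.float32)

    # Assumption: one .npy file per structure, named after the structure.
    np.save(pcd_dir / f"{name}.npy", pcd)

# Reload one file to confirm the stored shape.
loaded = np.load(pcd_dir / "structure_a.npy")
print(loaded.shape)
```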

Deep learning without the CLI?

Of course! Although you are encouraged to use the CLI, you can also use AIdsorb with plain PyTorch or PyTorch Lightning.

PyTorch

from torch.utils.data import DataLoader
from aidsorb.data import PCDDataset, Collator, get_names
from aidsorb.modules import PointNet

# Create the datasets.
train_set = PCDDataset(
    pcd_names=get_names('path/to/project_root/train.json'),
    path_to_X='path/to/pcd_data/',
    path_to_Y='path/to/labels.csv',
    ...
    )
val_set = PCDDataset(
    pcd_names=get_names('path/to/project_root/validation.json'),
    path_to_X='path/to/pcd_data/',
    path_to_Y='path/to/labels.csv',
    ...
    )

# Create the dataloaders.
train_loader = DataLoader(train_set, ..., collate_fn=Collator(channels_first=True))
val_loader = DataLoader(val_set, ..., collate_fn=Collator(channels_first=True))

# Create the model.
model = PointNet(...)

# Your code goes here.
...

PyTorch Lightning

import lightning as L
from aidsorb.data import Collator
from aidsorb.datamodules import PCDDataModule
from aidsorb.modules import PointNet
from aidsorb.litmodules import PCDLit

# Create the datamodule.
dm = PCDDataModule(
    path_to_X='path/to/pcd_data',
    path_to_Y='path/to/labels.csv',
    ...,
    config_dataloaders=dict(collate_fn=Collator(channels_first=True), ...),
    )

# Create the litmodel.
litmodel = PCDLit(model=PointNet(...), ...)

# Create the trainer.
trainer = L.Trainer(...)

# Your code goes here.
...

Predicting directly from the CLI?

Currently, this feature is not available (see TODO).

Further questions

We warmly encourage you to share any questions or ideas in the Discussions.

Note

Before asking "How do I do X?", please read the documentation carefully.
