🚀 Getting started#

Note

This section introduces the framework, its core concepts, and the main components of its workflow. It provides a starting point for understanding the framework and prepares you for the complete end-to-end Tutorial. For advanced usage, consult the 📖 API Reference.

Introduction#

At its core, AIdsorb automates the end-to-end workflow of training deep learning models for porous materials.

The process starts from a directory of molecular structures together with a labels.csv file containing the target properties. The structures are first converted into one of the built-in representations. Alternatively, users can supply their own precomputed representations stored as .npy files. The resulting data are then split into training, validation, and test sets, after which the entire training pipeline is orchestrated through a single .yaml configuration file.
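The splitting step can be pictured with a minimal sketch. This is not AIdsorb's own splitting routine; the helper `split_names`, the 80/10/10 fractions, and the structure names are all hypothetical and serve only to illustrate the idea of partitioning structure names into three disjoint sets.

```python
import numpy as np


def split_names(names, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle structure names and split them into train/val/test lists.

    Hypothetical helper for illustration; not part of AIdsorb.
    """
    rng = np.random.default_rng(seed)
    # Sort first so the result depends only on the seed, not on input order.
    shuffled = rng.permutation(sorted(names))
    n_train = int(train_frac * len(shuffled))
    n_val = int(val_frac * len(shuffled))
    train = shuffled[:n_train].tolist()
    val = shuffled[n_train:n_train + n_val].tolist()
    test = shuffled[n_train + n_val:].tolist()  # whatever remains
    return train, val, test


# Hypothetical structure names, e.g. file stems from the structures directory.
names = [f"structure_{i}" for i in range(10)]
train, val, test = split_names(names)
```

Fixing the seed keeps the split reproducible, which matters when comparing models trained on the same data.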

The general workflow is illustrated below.

Representations#

Tip

The representations described below are built into AIdsorb, but you are not limited to them. You can train models using your own representations, as long as they are stored as .npy files (see numpy.save()) in a directory.
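As a minimal sketch of preparing custom representations, the snippet below writes one .npy file per structure with `numpy.save()` and reads one back with `numpy.load()`. The directory name, the structure names, the array shapes, and the one-file-per-structure naming scheme are assumptions made for illustration; check the Tutorial for the layout AIdsorb actually expects.

```python
import tempfile
from pathlib import Path

import numpy as np

# Temporary location for the example; in practice use your own directory.
out_dir = Path(tempfile.mkdtemp()) / "my_representations"
out_dir.mkdir()

# Hypothetical precomputed representations, keyed by structure name.
representations = {
    "structure_a": np.random.rand(5, 4),
    "structure_b": np.random.rand(8, 4),
}

# Assumption: one .npy file per structure, named after the structure.
for name, array in representations.items():
    np.save(out_dir / f"{name}.npy", array)

# Round-trip check: load one file back.
loaded = np.load(out_dir / "structure_a.npy")
```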

Point clouds#

What is a point cloud?

A point cloud is a set of 3D data points, i.e. a set of 3D coordinates and (optionally) associated features. More formally:

\[\mathcal{P} = \{\mathbf{p}_1, \mathbf{p}_2, \dots, \mathbf{p}_N\} \quad \text{and} \quad \mathbf{p}_i \in \mathbb{R}^{3+C}\]

where \(N\) is the number of points in the point cloud and \(C\) is the number of (per-point) features.

In AIdsorb, a point cloud is represented as an ndarray or Tensor of shape (N, 3+C):

\[\begin{split}\mathcal{P} = \begin{bmatrix} \mathbf{p}_1 \\ \mathbf{p}_2 \\ \vdots \\ \mathbf{p}_N \end{bmatrix} = \begin{bmatrix} x_1 & y_1 & z_1 & f_{1}^1 & \dots & f_1^C \\ x_2 & y_2 & z_2 & f_{2}^1 & \dots & f_2^C \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ x_N & y_N & z_N & f_{N}^1 & \dots & f_N^C \\ \end{bmatrix}\end{split}\]

What is a molecular point cloud?

A molecular point cloud is a point cloud whose coordinates correspond to atomic positions and whose features comprise the atomic numbers plus any additional per-atom information.

In AIdsorb, a molecular point cloud is represented as an ndarray or Tensor of shape (N, 4+C), where N is the number of atoms, pcd[:, :3] are the atomic coordinates, pcd[:, 3] are the atomic numbers, and pcd[:, 4:] are any additional features. If C == 0, then the only features are the atomic numbers.
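The layout described above can be illustrated with plain NumPy. The sketch below builds a molecular point cloud for a single water molecule by hand; the coordinates are approximate and the extra feature column (`charges`) holds made-up values, both chosen purely for illustration.

```python
import numpy as np

# Water molecule, shape (N, 4) with N == 3 atoms and C == 0 extra features.
# Columns: x, y, z, atomic number (approximate coordinates in angstrom).
pcd = np.array([
    [0.000, 0.000, 0.000, 8.0],   # O
    [0.757, 0.586, 0.000, 1.0],   # H
    [-0.757, 0.586, 0.000, 1.0],  # H
])

coords = pcd[:, :3]          # atomic coordinates, shape (3, 3)
atomic_numbers = pcd[:, 3]   # atomic numbers, shape (3,)
extra = pcd[:, 4:]           # additional features, shape (3, 0) since C == 0

# Appending one hypothetical per-atom feature gives shape (N, 4 + 1).
charges = np.array([-0.8, 0.4, 0.4])  # illustrative values only
pcd_with_feature = np.hstack([pcd, charges[:, np.newaxis]])
```

Note that `pcd[:, 4:]` is a valid (empty) slice when C == 0, so the same indexing works regardless of how many extra features are present.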

Why molecular point clouds?

Molecular point clouds are a fast, generic, and flexible representation that can be applied to a wide range of molecular and material systems. They enable deep learning directly from raw structural information, but typically require more training data than more specialized representations.

Energy voxels#

What are energy voxels?

Energy voxels are the voxelized potential energy surface of the material, i.e. a 3D energy image representing the landscape of host-guest interactions.

In AIdsorb, energy voxels are represented as an ndarray or Tensor of shape (C, D, H, W) (multi-channel image) or (D, H, W) (single-channel image).
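The two shapes can be related with a short NumPy sketch. The grid size of 25 and the placeholder values are assumptions for illustration; real energy voxels would hold computed interaction energies rather than zeros.

```python
import numpy as np

# Single-channel energy image of shape (D, H, W); placeholder values only.
single = np.zeros((25, 25, 25), dtype=np.float32)

# Adding a leading channel axis gives a one-channel image of shape (1, D, H, W).
one_channel = single[np.newaxis, ...]

# Stacking several grids (e.g. one per hypothetical probe) along a new first
# axis gives a multi-channel image of shape (C, D, H, W), here with C == 2.
multi_channel = np.stack([single, single + 1.0])
```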

Why energy voxels?

Energy voxels are a physics-informed representation tailored for adsorption in porous materials. By explicitly encoding host-guest interaction energies, they often achieve good predictive performance with less training data than more generic representations, at the cost of reduced generality.

Energy voxels example

The above energy image represents IRMOF-1. You can hover 🖱️ over the figure to play with it.

Questions#

We warmly encourage you to share any questions or ideas in the Discussions. Before asking "how do I do X?", please read the documentation carefully.
