aidsorb.utils

This module provides helper functions for creating and handling molecular point clouds.

aidsorb.utils.pcd_from_dir(dirname, outname, features=None)

Create molecular point clouds from the files in a directory and store them.

The point clouds are stored in .npz format as key-value pairs. For more information on this format, see numpy.savez().
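As a rough illustration of this storage format (not aidsorb's own code), the sketch below writes a dictionary of point clouds to a .npz file with numpy.savez(), using each molecule name as the key, and reads one back with numpy.load(). The molecule names and array contents are made up.

```python
import os
import tempfile

import numpy as np

# Hypothetical point clouds keyed by molecule name; shapes follow (N, 4+C) with C = 0.
pcds = {
    'mol1': np.zeros((3, 4)),
    'mol2': np.ones((5, 4)),
}

with tempfile.TemporaryDirectory() as tmpdir:
    outname = os.path.join(tmpdir, 'pcds.npz')
    # Each keyword argument becomes a key of the archive.
    np.savez(outname, **pcds)

    # np.load returns an NpzFile; indexing it by key reads that array into memory.
    with np.load(outname) as data:
        keys = sorted(data.files)
        mol1_pcd = data['mol1']

print(keys)            # ['mol1', 'mol2']
print(mol1_pcd.shape)  # (3, 4)
```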

Parameters:
  • dirname (str) – Absolute or relative path to the directory.

  • outname (str) – Name of the file where point clouds will be stored.

  • features (list, optional) – See pcd_from_file().

Notes

Molecules that can’t be processed are omitted.
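The note above says that molecules that can't be processed are omitted. A minimal sketch of that skip-on-failure pattern follows; `collect_pcds` and `toy_reader` are hypothetical stand-ins, and catching every Exception is an assumption, since the documentation does not say which errors cause a molecule to be skipped.

```python
import os
import tempfile
from pathlib import Path

import numpy as np


def collect_pcds(dirname, reader):
    """Hypothetical helper: map molecule name -> point cloud, skipping failures."""
    pcds = {}
    for entry in sorted(os.listdir(dirname)):
        filename = os.path.join(dirname, entry)
        try:
            name, pcd = reader(filename)
        except Exception:
            # Assumption: any error while reading means the molecule is omitted.
            continue
        pcds[name] = pcd
    return pcds


def toy_reader(filename):
    """Hypothetical reader: fails on files ending in '.bad'."""
    path = Path(filename)
    if path.suffix == '.bad':
        raise ValueError(f'cannot parse {path.name}')
    return path.stem, np.zeros((2, 4))


with tempfile.TemporaryDirectory() as tmpdir:
    for fname in ['mol1.xyz', 'mol2.cif', 'broken.bad']:
        Path(tmpdir, fname).touch()
    pcds = collect_pcds(tmpdir, toy_reader)

print(sorted(pcds))  # ['mol1', 'mol2']
```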

Examples

>>> import numpy as np
>>> from aidsorb.utils import pcd_from_dir
>>> # Create and store the point clouds.
>>> outname = 'path/to/pcds.npz'
>>> pcd_from_dir('path/to/dir', outname=outname)
>>> # Load back and access the point clouds.
>>> pcds = np.load(outname)
>>> mol1_pcd = pcds['mol1']

aidsorb.utils.pcd_from_file(filename, features=None)

Create molecular point cloud from a file.

The molecular point cloud (pcd) has shape (N, 4+C), where N is the number of atoms and C is the number of additional features: pcd[:, :3] are the atomic coordinates, pcd[:, 3] are the atomic numbers and pcd[:, 4:] are the additional features. If features=None, then C = 0 and the atomic numbers are the only features.
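To make the column layout concrete, the sketch below assembles a point cloud for water by hand with NumPy, with one additional feature (Pauling electronegativity, so C = 1). The coordinates are approximate and only illustrative, and this does not call pcd_from_file().

```python
import numpy as np

# Approximate Cartesian coordinates (in angstrom) of H2O, in the order O, H, H.
coords = np.array([
    [0.000, 0.000, 0.000],
    [0.757, 0.586, 0.000],
    [-0.757, 0.586, 0.000],
])
atomic_numbers = np.array([8, 1, 1])
en_pauling = np.array([3.44, 2.20, 2.20])  # One additional feature, so C = 1.

# Columns: x, y, z | atomic number | additional features.
pcd = np.column_stack([coords, atomic_numbers, en_pauling])

print(pcd.shape)         # (3, 5), i.e. (N, 4+C) with N = 3 and C = 1
print(pcd[:, 3])         # [8. 1. 1.]
print(pcd[:, 4:].shape)  # (3, 1)
```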

Todo

Add option to drop hydrogen atoms for reducing size of point clouds.
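The documentation lists this option as a Todo, so it should not be assumed to exist. As a hypothetical sketch only, hydrogen atoms could be dropped from an existing point cloud by masking on the atomic-number column (pcd[:, 3] == 1); the function name `drop_hydrogens` is made up.

```python
import numpy as np


def drop_hydrogens(pcd):
    """Hypothetical helper: keep only the rows whose atomic number is not 1."""
    # Boolean indexing returns a new array, so pcd itself is left unchanged.
    return pcd[pcd[:, 3] != 1]


# Toy water point cloud: x, y, z, atomic number.
pcd = np.array([
    [0.000, 0.000, 0.000, 8],
    [0.757, 0.586, 0.000, 1],
    [-0.757, 0.586, 0.000, 1],
])
heavy = drop_hydrogens(pcd)

print(heavy.shape)  # (1, 4)
```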

Parameters:
  • filename (str) – Absolute or relative path to the file.

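  • features (list of str, optional) – Names of atomic properties to append as additional features, e.g. 'en_pauling' or 'atomic_radius'. All float-valued properties from the periodic table are supported.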

Returns:

name_and_pcd

  • name_and_pcd[0] == name, the name of the molecule (see Notes).

  • name_and_pcd[1] == pcd, array of shape (N, 4+C).

Return type:

tuple of length 2

Notes

  • The name of the molecule is the basename of filename with its suffix removed.

  • For a list of the supported chemical file formats, see ase.io.read(). Alternatively, list them from the command line with ase info --formats.
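The naming rule in the first note matches what pathlib.Path.stem returns, and is consistent with the 'mol1' key in the pcd_from_files() example; whether aidsorb uses pathlib internally is an assumption. With Path.stem only the last suffix is removed, so 'mol3.tar.gz' becomes 'mol3.tar'.

```python
from pathlib import Path

# Basename of each file with its (last) suffix removed.
filenames = ['path/to/mol1.xyz', '/abs/path/mol2.cif', 'mol3.tar.gz']
names = [Path(f).stem for f in filenames]

print(names)  # ['mol1', 'mol2', 'mol3.tar']
```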

Examples

>>> from aidsorb.utils import pcd_from_file
>>> # xyz coordinates + atomic number + Pauling electronegativity + atomic radius.
>>> name, pcd = pcd_from_file('path/to/file', features=['en_pauling', 'atomic_radius'])

aidsorb.utils.pcd_from_files(filenames, outname, features=None)

Create molecular point clouds from a list of files and store them.

The point clouds are stored in .npz format as key-value pairs. For more information on this format, see numpy.savez().

Parameters:
  • filenames (iterable) – An iterable providing the filenames. Absolute or relative paths can be used.

  • outname (str) – Name of the file where the point clouds will be stored.

  • features (list, optional) – See pcd_from_file().
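Because filenames only needs to be an iterable, it can be built lazily, for example with pathlib globbing. The sketch below collects the .cif files from a throwaway directory; the file names are hypothetical, the paths are converted to str to be safe (the documentation does not say whether Path objects are accepted), and the actual call to pcd_from_files() is left as a comment.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    for fname in ['mol1.cif', 'mol2.cif', 'notes.txt']:
        Path(tmpdir, fname).touch()

    # A generator expression is a valid iterable of filenames.
    filenames = (str(p) for p in sorted(Path(tmpdir).glob('*.cif')))
    # pcd_from_files(filenames, outname='path/to/pcds.npz')  # aidsorb call, not run here
    selected = [Path(f).name for f in filenames]

print(selected)  # ['mol1.cif', 'mol2.cif']
```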

Notes

Molecules that can’t be processed are omitted.

Examples

>>> import numpy as np
>>> from aidsorb.utils import pcd_from_files
>>> # Create and store the point clouds.
>>> outname = 'path/to/pcds.npz'
>>> pcd_from_files(['path/to/mol1.xyz', 'path/to/mol2.cif'], outname=outname)
>>> # Load back and access the point clouds.
>>> pcds = np.load(outname)
>>> mol1_pcd = pcds['mol1']

aidsorb.utils.split_pcd(pcd)

Split a point cloud into coordinates and features.

Note

The returned arrays are copies.

Parameters:

pcd (array of shape (N, 3+C))

Returns:

coords_and_feats

  • coords_and_feats[0] == coords, array of shape (N, 3).

  • coords_and_feats[1] == feats, array of shape (N, C).

Return type:

tuple of length 2
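The behavior described above can be reproduced with plain NumPy slicing. The sketch below is an equivalent under the documented contract, not aidsorb's source; `split_pcd_sketch` is a hypothetical name, and the explicit .copy() calls mirror the note that the returned arrays are copies.

```python
import numpy as np


def split_pcd_sketch(pcd):
    """Split an (N, 3+C) array into (N, 3) coordinates and (N, C) features."""
    coords = pcd[:, :3].copy()  # Copy, so changes to the result don't touch pcd.
    feats = pcd[:, 3:].copy()
    return coords, feats


pcd = np.arange(60, dtype=float).reshape(10, 6)  # 3 coordinates + 3 features.
coords, feats = split_pcd_sketch(pcd)
coords[0, 0] = 123.0  # Modifying the copy leaves the original untouched.

print(coords.shape, feats.shape)  # (10, 3) (10, 3)
print(pcd[0, 0])                  # 0.0
```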

Examples

>>> import numpy as np
>>> from aidsorb.utils import split_pcd
>>> pcd = np.random.randn(25, 7)  # Point cloud with 4 features.
>>> coords, feats = split_pcd(pcd)
>>> coords.shape
(25, 3)
>>> feats.shape
(25, 4)
>>> pcd = np.random.randn(15, 3)  # Point cloud with no features.
>>> coords, feats = split_pcd(pcd)
>>> coords.shape
(15, 3)
>>> feats.shape
(15, 0)