aidsorb.utils

Helper functions for creating molecular point clouds.

Todo

Add support for optional transform before storing the point cloud.

aidsorb.utils.pcd_from_dir(dirname, outname, features=None)

Create molecular point clouds from a directory of structure files and store them.

Point clouds are stored under outname as .npy files.

Tip

To get a list of the supported chemical file formats see ase.io.read(). Alternatively, you can list them from the command line with: ase info --formats.

Parameters:
  • dirname (str) – Absolute or relative path to the directory.

  • outname (str) – Directory name where the point clouds will be stored. The directory is created if it does not exist.

  • features (list of str, optional) – Names of elemental properties from the periodic table to append as additional per-atom features, e.g. 'en_pauling'.

Return type:

None

Notes

Molecules that can’t be processed are omitted.
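The store-and-skip behaviour described above can be pictured with a minimal sketch. This is not the aidsorb implementation: `toy_reader` and `store_point_clouds` are hypothetical names, and the reader parses plain numeric text rather than real structure files. It only illustrates creating the output directory on demand, writing one .npy file per input, and silently omitting inputs that fail.

```python
import tempfile
from pathlib import Path

import numpy as np


def toy_reader(path):
    """Hypothetical reader: parse whitespace-separated numeric rows."""
    pcd = np.loadtxt(path, ndmin=2)
    if pcd.shape[1] < 4:
        raise ValueError(f'{path} has fewer than 4 columns')
    return pcd


def store_point_clouds(dirname, outname, reader=toy_reader):
    """Save one .npy file per readable file in dirname; return the stored names."""
    out = Path(outname)
    out.mkdir(parents=True, exist_ok=True)  # create the output directory if missing
    stored = []
    for path in sorted(Path(dirname).iterdir()):
        if not path.is_file():
            continue
        try:
            pcd = reader(path)
        except Exception:
            continue  # omit anything that cannot be processed
        np.save(out / f'{path.stem}.npy', pcd)
        stored.append(path.stem)
    return stored


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / 'structures'
    src.mkdir()
    (src / 'good.txt').write_text('0.0 0.0 0.0 8\n0.7 0.5 0.0 1\n')
    (src / 'bad.txt').write_text('not a structure\n')
    stored = store_point_clouds(src, Path(tmp) / 'pcd_data')
    n_saved = len(list((Path(tmp) / 'pcd_data').glob('*.npy')))
```

Here only `good.txt` yields a stored array; `bad.txt` raises inside the reader and is skipped without interrupting the loop.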

Examples

>>> dirname = 'path/to/structures'
>>> outname = 'path/to/pcd_data'
>>> # xyz coordinates + atomic number + electronegativity
>>> pcd_from_dir(dirname, outname, features=['en_pauling'])

aidsorb.utils.pcd_from_file(filename, features=None)

Create molecular point cloud from a structure file.

The molecular pcd has shape (N, 4+C), where N is the number of atoms and C is the number of additional features. pcd[:, :3] holds the atomic coordinates, pcd[:, 3] the atomic numbers, and pcd[:, 4:] any additional features. If features=None, then C = 0 and the only features are the atomic numbers.
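To make the column layout concrete, here is a small sketch that builds an array with the same layout by hand using NumPy alone. The coordinates are illustrative values for a water-like molecule and are not produced by aidsorb; the single extra column (one additional feature) holds Pauling electronegativities.

```python
import numpy as np

# Illustrative coordinates for a water-like molecule (not aidsorb output).
coords = np.array([
    [0.000, 0.000, 0.000],   # O
    [0.757, 0.586, 0.000],   # H
    [-0.757, 0.586, 0.000],  # H
])
atomic_numbers = np.array([8, 1, 1])
# One additional feature column: Pauling electronegativity.
electronegativity = np.array([3.44, 2.20, 2.20])

# Columns: x, y, z, atomic number, then the additional features.
pcd = np.column_stack([coords, atomic_numbers, electronegativity])

xyz = pcd[:, :3]    # atomic coordinates
z = pcd[:, 3]       # atomic numbers
extra = pcd[:, 4:]  # additional features
```

Slicing with `pcd[:, 4:]` keeps the result two-dimensional, so it works unchanged when there are no additional features: the slice is then an empty array of shape (N, 0).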

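Parameters:
  • filename (str) – Absolute or relative path to the structure file.

  • features (list of str, optional) – See pcd_from_dir().

Returns: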

data – Molecular point cloud and its name as (name, pcd).

Return type:

tuple

Notes

The name of the molecule is the basename of filename with its suffix removed.
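The naming rule can be reproduced with the standard library. The sketch below assumes that "suffix" means only the final extension, which is what `pathlib.Path.stem` removes; `molecule_name` is a hypothetical helper, not part of aidsorb.

```python
from pathlib import Path


def molecule_name(filename):
    """Return the basename of filename without its final suffix."""
    return Path(filename).stem


names = [molecule_name(f) for f in ('path/to/foo.xyz', 'bar.cif', 'dir/baz.tar.gz')]
```

Under that assumption, a file with several extensions such as `baz.tar.gz` keeps everything before the last dot (`baz.tar`).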

Examples

>>> # xyz coordinates + atomic number + electronegativity + radius
>>> name, pcd = pcd_from_file('path/to/file', features=['en_pauling', 'atomic_radius'])
...

aidsorb.utils.pcd_from_files(filenames, outname, features=None)

Create molecular point clouds from a list of structure files and store them.

Point clouds are stored under outname as .npy files.

Parameters:
  • filenames (iterable) – An iterable providing the filenames. Absolute or relative paths can be used.

  • outname (str) – Directory name where the point clouds will be stored.

  • features (list of str, optional) – See pcd_from_dir().

Return type:

None

Notes

Molecules that can’t be processed are omitted.

Examples

>>> import numpy as np
>>> # Create and store the point clouds.
>>> outname = 'path/to/pcd_data'
>>> pcd_from_files(['path/to/foo.xyz', 'path/to/bar.cif'], outname)
>>> # Load back a point cloud.
>>> pcd = np.load(f'{outname}/foo.npy')
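
Since the point clouds are stored under outname as .npy files, a whole directory can be read back in one pass. The following is a minimal sketch, assuming one `<name>.npy` file per molecule as in the example above; `load_point_clouds` is a hypothetical helper, and the arrays written here are placeholders rather than real point clouds.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_point_clouds(outname):
    """Map each molecule name to its array, assuming one <name>.npy per molecule."""
    return {p.stem: np.load(p) for p in sorted(Path(outname).glob('*.npy'))}


with tempfile.TemporaryDirectory() as outname:
    # Placeholder arrays standing in for stored point clouds.
    np.save(Path(outname) / 'foo.npy', np.zeros((3, 4)))
    np.save(Path(outname) / 'bar.npy', np.zeros((5, 4)))
    pcds = load_point_clouds(outname)
    shapes = {name: pcd.shape for name, pcd in pcds.items()}
```

A dictionary is used instead of a stacked array because molecules generally have different numbers of atoms, so their point clouds do not share a common shape.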