aidsorb.utils
This module provides helper functions for creating and handling molecular point clouds.
- aidsorb.utils.pcd_from_dir(dirname, outname, features=None)
Create molecular point clouds from a directory and store them.
The point clouds are stored in .npz format as key-value pairs. For more information on this format, see numpy.savez().
- Parameters:
dirname (str) – Absolute or relative path to the directory.
outname (str) – Name of the file where point clouds will be stored.
features (list, optional) – See pcd_from_file().
Notes
Molecules that can’t be processed are omitted.
Examples
>>> import numpy as np
>>> from aidsorb.utils import pcd_from_dir
>>> # Create and store the point clouds.
>>> outname = 'path/to/pcds.npz'
>>> pcd_from_dir('path/to/dir', outname=outname)
>>> # Load back and access the point clouds.
>>> pcds = np.load(outname)
>>> mol1_pcd = pcds['mol1']
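The key-value layout of the .npz file can be illustrated with NumPy alone. The following is a minimal sketch, not the library's implementation: the molecule names and array contents are made up, and a temporary directory stands in for a real output path.

```python
import os
import tempfile

import numpy as np

# Two made-up point clouds with different numbers of atoms (rows).
pcds = {
    "mol1": np.zeros((5, 4)),
    "mol2": np.ones((8, 4)),
}

with tempfile.TemporaryDirectory() as tmpdir:
    outname = os.path.join(tmpdir, "pcds.npz")

    # Each keyword argument becomes one named array inside the archive.
    np.savez(outname, **pcds)

    # np.load returns a dict-like NpzFile; arrays are read on access.
    with np.load(outname) as loaded:
        names = sorted(loaded.files)
        mol1_pcd = loaded["mol1"]
        mol2_shape = loaded["mol2"].shape
```

Because each array is stored under its own key, point clouds with different numbers of atoms can share one file without padding.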
- aidsorb.utils.pcd_from_file(filename, features=None)
Create a molecular point cloud from a file.
The molecular pcd has shape (N, 4+C), where N is the number of atoms, pcd[:, :3] are the atomic coordinates, pcd[:, 3] are the atomic numbers, and pcd[:, 4:] are any additional features. If features=None, then the only features are the atomic numbers.
Todo
Add option to drop hydrogen atoms for reducing size of point clouds.
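The column layout described above, and the hydrogen filtering mentioned in the Todo, can be sketched with plain NumPy indexing. This is an illustration under assumptions: the water coordinates are made-up values, and drop_hydrogens is a hypothetical helper, not part of aidsorb.utils.

```python
import numpy as np

# Hand-built point cloud for a water molecule: x, y, z, atomic number.
# The coordinates are illustrative values, not read from any file.
pcd = np.array([
    [0.000, 0.000, 0.119, 8.0],    # O
    [0.000, 0.763, -0.477, 1.0],   # H
    [0.000, -0.763, -0.477, 1.0],  # H
])

coords = pcd[:, :3]         # atomic coordinates, shape (N, 3)
atomic_numbers = pcd[:, 3]  # atomic numbers, shape (N,)
extra = pcd[:, 4:]          # additional features, shape (N, C); here C == 0


def drop_hydrogens(pcd):
    """Hypothetical helper: keep only rows whose atomic number is not 1."""
    # Boolean-mask indexing returns a new array, so the input is untouched.
    return pcd[pcd[:, 3] != 1]


heavy = drop_hydrogens(pcd)
```

Filtering on pcd[:, 3] works for any number of extra feature columns, since the atomic number always sits in the fourth column.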
- Parameters:
filename (str) – Absolute or relative path to the file.
features (list of str, optional) – All float properties from periodic table are supported.
- Returns:
name_and_pcd – name_and_pcd[0] == name and name_and_pcd[1] == pcd.
- Return type:
tuple of length 2
Notes
The name of the molecule is the basename of filename with its suffix removed.
To get a list of the supported chemical file formats, see ase.io.read(). Alternatively, you can list them from the command line with: ase info --formats.
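The naming rule can be approximated with the standard library. This is a minimal sketch: molecule_name is a hypothetical helper, and it assumes that only the final suffix is stripped, which the text above does not spell out for names with several dots.

```python
from pathlib import Path


def molecule_name(filename):
    """Hypothetical helper: basename of ``filename`` without its final suffix."""
    return Path(filename).stem


name_xyz = molecule_name("path/to/mol1.xyz")
name_cif = molecule_name("IRMOF-1.cif")
```

Under this rule, two files that differ only in directory or suffix map to the same name, so they would also map to the same key when stored together.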
Examples
>>> from aidsorb.utils import pcd_from_file
>>> # xyz coordinates + atomic number + electronegativity + radius.
>>> name, pcd = pcd_from_file('path/to/file', features=['en_pauling', 'atomic_radius'])
- aidsorb.utils.pcd_from_files(filenames, outname, features=None)
Create molecular point clouds from a list of files and store them.
The point clouds are stored in .npz format as key-value pairs. For more information on this format, see numpy.savez().
- Parameters:
filenames (iterable) – An iterable providing the filenames. Absolute or relative paths can be used.
outname (str) – Filename where the data will be stored.
features (list, optional) – See pcd_from_file().
Notes
Molecules that can’t be processed are omitted.
Examples
>>> import numpy as np
>>> from aidsorb.utils import pcd_from_files
>>> # Create and store the point clouds.
>>> outname = 'path/to/pcds.npz'
>>> pcd_from_files(['path/to/mol1.xyz', 'path/to/mol2.cif'], outname=outname)
>>> # Load back and access the point clouds.
>>> pcds = np.load(outname)
>>> mol1_pcd = pcds['mol1']
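The note that unprocessable molecules are omitted suggests a skip-on-failure loop. The sketch below shows one way such behaviour could look; it is not the library's code. Both collect_point_clouds and fake_loader are hypothetical, and catching every Exception is an assumption about which failures count as "can't be processed".

```python
def collect_point_clouds(filenames, loader):
    """Map molecule names to point clouds, skipping files that fail to load.

    ``loader`` is a hypothetical callable that returns ``(name, pcd)`` or
    raises an exception when the file cannot be processed.
    """
    pcds = {}
    skipped = []
    for filename in filenames:
        try:
            name, pcd = loader(filename)
        except Exception:
            # Assumption: any failure means the molecule is left out.
            skipped.append(filename)
            continue
        pcds[name] = pcd
    return pcds, skipped


def fake_loader(filename):
    """Stand-in loader for illustration only: fails on '.bad' files."""
    if filename.endswith(".bad"):
        raise ValueError(f"cannot parse {filename}")
    name = filename.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    return name, [[0.0, 0.0, 0.0, 1.0]]


pcds, skipped = collect_point_clouds(
    ["path/to/mol1.xyz", "path/to/broken.bad", "path/to/mol2.cif"],
    loader=fake_loader,
)
```

Returning the skipped filenames alongside the results makes it easy to check afterwards which molecules are missing from the stored file.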
- aidsorb.utils.split_pcd(pcd)
Split a point cloud into coordinates and features.
Note
The returned arrays are copies.
- Parameters:
pcd (array of shape (N, 3+C))
- Returns:
coords_and_feats –
coords_and_feats[0] == coords, array of shape (N, 3).
coords_and_feats[1] == feats, array of shape (N, C).
- Return type:
tuple of length 2
Examples
>>> import numpy as np
>>> from aidsorb.utils import split_pcd
>>> pcd = np.random.randn(25, 7)  # Point cloud with 4 features.
>>> coords, feats = split_pcd(pcd)
>>> coords.shape
(25, 3)
>>> feats.shape
(25, 4)

>>> pcd = np.random.randn(15, 3)  # Point cloud with no features.
>>> coords, feats = split_pcd(pcd)
>>> coords.shape
(15, 3)
>>> feats.shape
(15, 0)
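The documented behaviour (first three columns as coordinates, the rest as features, both returned as copies) can be reproduced with a few lines of NumPy. This is a minimal sketch under that reading of the documentation; split_pcd_sketch is a hypothetical stand-in and may differ from the actual implementation.

```python
import numpy as np


def split_pcd_sketch(pcd):
    """Hypothetical stand-in: return copies of the coordinate and feature columns."""
    coords = pcd[:, :3].copy()  # shape (N, 3)
    feats = pcd[:, 3:].copy()   # shape (N, C), with C == 0 if no features
    return coords, feats


pcd = np.arange(20, dtype=float).reshape(4, 5)
coords, feats = split_pcd_sketch(pcd)

# Modifying a copy leaves the original point cloud unchanged.
coords[0, 0] = -1.0
```

Returning copies rather than views means the coordinates can be transformed, for example centered or rotated, without altering the source array.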