API documentation#

Core API#

Most users will only need to deal with the MOF class.

Defining the main representation of a MOF.

class moffragmentor.mof.MOF(structure, structure_graph)[source]#

Main representation for a MOF structure.

This container holds a structure and its associated graph. It also provides some convenience methods for getting neighbors or results of the fragmentation.

Internally, this code typically uses IStructure objects to avoid bugs due to the mutability of Structure objects (e.g. the fragmentation code performs operations on the structure and we want to be sure that there is no impact on the input).

Examples

>>> from moffragmentor import MOF
>>> mof = MOF(structure, structure_graph)
>>> # alternatively, read the structure from a CIF file
>>> mof = MOF.from_cif(cif_file)
>>> # visualize the structure
>>> mof.show_structure()
>>> # get the neighbors of a site
>>> mof.get_neighbor_indices(0)
>>> # perform fragmentation
>>> fragments = mof.fragment()

property bridges: Dict[int, int]#

Get a dictionary of bridges.

Bridges are edges in a graph that, if deleted, increase the number of connected components.

Returns:: dictionary of bridges
Return type:: Dict[int, int]
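The bridge definition above can be illustrated with a small, self-contained sketch. It is not the library's implementation: the helper names `count_components` and `find_bridges` and the toy adjacency dictionary are assumptions made for illustration, and the brute-force approach (remove each edge, recount components) is chosen for clarity rather than speed.

```python
from typing import Dict, List, Set, Tuple


def count_components(adjacency: Dict[int, Set[int]]) -> int:
    """Count connected components with an iterative depth-first search."""
    seen: Set[int] = set()
    components = 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(adjacency[node] - seen)
    return components


def find_bridges(adjacency: Dict[int, Set[int]]) -> List[Tuple[int, int]]:
    """Return every edge whose removal increases the component count."""
    baseline = count_components(adjacency)
    edges = sorted({(min(u, v), max(u, v)) for u in adjacency for v in adjacency[u]})
    bridges = []
    for u, v in edges:
        # Copy the graph and delete the edge (u, v) from the copy.
        trimmed = {node: set(neighbors) for node, neighbors in adjacency.items()}
        trimmed[u].discard(v)
        trimmed[v].discard(u)
        if count_components(trimmed) > baseline:
            bridges.append((u, v))
    return bridges


# Triangle 0-1-2 with a pendant atom 3 attached to atom 2.
toy_graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
bridges = find_bridges(toy_graph)
```

Only the edge between atoms 2 and 3 is a bridge here, because every edge inside the triangle has an alternative path around the ring.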

dump(path)[source]#

Dump this object as a pickle file.

Return type:: None
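This page does not document a matching loader, so the following is only a sketch of a pickle round trip under the assumption that the dumped file is a standard pickle. The `payload` dictionary is a stand-in for the dumped object, and the file name is hypothetical.

```python
import os
import pickle
import tempfile

# Stand-in object; in practice this would be whatever object was dumped.
payload = {"name": "toy", "indices": [0, 1, 2]}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "object.pkl")
    # Write the object to disk ...
    with open(path, "wb") as handle:
        pickle.dump(payload, handle)
    # ... and read it back.
    with open(path, "rb") as handle:
        restored = pickle.load(handle)
```

Pickle files can execute arbitrary code when loaded, so only unpickle files from sources you trust.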

property frac_coords: ndarray#

Return fractional coordinates of the structure.

We cache this call as pymatgen seems to re-compute this.

Returns:

fractional coordinates of the structure in an array of shape (n_sites, 3)

Return type:

np.ndarray
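The caching mentioned above can be sketched with `functools.cached_property`. This is an illustration of the general pattern, not the library's code: `ToyStructure`, its cubic `cell_length`, and the `n_computations` counter are all hypothetical.

```python
from functools import cached_property
from typing import List, Tuple


class ToyStructure:
    """Hypothetical container that caches a derived coordinate list."""

    def __init__(self, cart_coords: List[Tuple[float, float, float]], cell_length: float):
        self._cart_coords = cart_coords
        self._cell_length = cell_length
        self.n_computations = 0

    @cached_property
    def frac_coords(self) -> List[Tuple[float, ...]]:
        # Computed on first access only; later accesses reuse the stored value.
        self.n_computations += 1
        return [tuple(x / self._cell_length for x in site) for site in self._cart_coords]


toy = ToyStructure([(0.0, 5.0, 2.5)], cell_length=10.0)
first = toy.frac_coords
second = toy.frac_coords
```

Caching of this kind is only safe when the underlying data does not change, which fits the immutable IStructure objects described at the top of this page.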

fragment(check_dimensionality=True, create_single_metal_bus=False, break_organic_nodes_at_metal=True)[source]#

Split the MOF into building blocks.

The building blocks are linkers, nodes, bound and unbound solvent, and the net embedding of those building blocks.

Parameters:

check_dimensionality (bool) – Check if the node is 0D. If not, split into isolated metals. Defaults to True.
create_single_metal_bus (bool) – Create single-metal building units (BUs). Defaults to False.
break_organic_nodes_at_metal (bool) – Break nodes into single-metal BUs if they appear “too organic”. Defaults to True.

Returns:

FragmentationResult object.

Return type:

FragmentationResult

classmethod from_cif(cif, symprec=None, angle_tolerance=None, get_primitive=True)[source]#

Initialize a MOF object from a cif file.

Note that this method, by default, symmetrizes the structure.

Parameters:

cif (str) – path to the cif file
symprec (float, optional) – Symmetry precision
angle_tolerance (float, optional) – Angle tolerance
get_primitive (bool) – Whether to get the primitive cell

Returns:

MOF object

Return type:

MOF

get_neighbor_indices(site)[source]#

Get list of indices of neighboring sites.

Return type:: List[int]

get_symbol_of_site(site)[source]#

Get the elemental symbol of the site with index site.

Return type:: str

property nx_graph#: Structure graph as networkx graph object

show_adjacency_matrix(highlight_metals=False)[source]#: Plot structure graph as adjacency matrix

show_structure()[source]#: Visualize structure using nglview.

property terminal_indices: List[int]#

Return the indices of the terminal sites.

A terminal site is a site that has only one neighbor and is connected via a bridge to the rest of the structure. That means that splitting the bond between the terminal site and the rest of the structure increases the number of connected components.

Typical examples of terminal sites are hydrogen atoms or halogen functional groups.

Returns:: indices of the terminal sites
Return type:: List[int]
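As a rough illustration of the definition above, terminal sites can be found by looking for sites with exactly one neighbor. The function name and the toy adjacency dictionary are assumptions for this sketch; the library works on a structure graph instead.

```python
from typing import Dict, List, Set


def terminal_sites(adjacency: Dict[int, Set[int]]) -> List[int]:
    """Return the indices of sites that have exactly one neighbor."""
    return sorted(site for site, neighbors in adjacency.items() if len(neighbors) == 1)


# Three-membered ring 0-1-2 with hydrogen-like atoms 3 and 4
# attached to ring atoms 0 and 1.
toy_graph = {0: {1, 2, 3}, 1: {0, 2, 4}, 2: {0, 1}, 3: {0}, 4: {1}}
terminals = terminal_sites(toy_graph)
```

The single bond of a one-neighbor site is necessarily a bridge, since removing it isolates that site, so the degree check alone is enough in this toy setting.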

write_cif(filename)[source]#

Write the structure to a CIF file.

Return type:: None

Command line interface#

Command line interfaces

SBU subpackage#

Defines data structures for the building blocks as well as collections of building blocks.

Representation for a secondary building block.

class moffragmentor.sbu.sbu.SBU(molecule, molecule_graph, graph_branching_indices, binding_indices, molecule_original_indices_mapping=None, dummy_molecule=None, dummy_molecule_graph=None, dummy_molecule_indices_mapping=None, dummy_branching_indices=None)[source]#

Representation for a secondary building block.

It also acts as container for site indices:

graph_branching_indices: the branching indices according to the graph-based definition. They might not be part of the molecule.

binding_indices: the indices of the sites between the branching index and the metal.

original_indices: the complete original set of indices that has been selected for this building block.

Note

The coordinates in the molecule object are not the ones directly extracted from the MOF. They are the coordinates of sites unwrapped to ensure that there are no “broken molecules”.

To obtain the “original” coordinates, use the _coordinates attribute.

Note

Dummy molecules

In dummy molecules, the binding and branching sites are replaced by dummy atoms (noble gases). They also have special properties that indicate the original species.

Examples

>>> # visualize the molecule
>>> sbu_object.show_molecule()
>>> # search pubchem for the molecule
>>> sbu_object.search_pubchem()

get_neighbor_indices(site)[source]#

Get list of indices of neighboring sites

Return type:: List[int]

property hash: str#

Return hash.

The hash is a combination of Weisfeiler-Lehman graph hash and center.

Returns:: Hash.
Return type:: str
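To give a feel for the Weisfeiler-Lehman part of the hash, here is a minimal pure-Python sketch. It is an assumption-laden illustration, not the library's implementation: `wl_hash`, the adjacency dictionaries, and the element labels are hypothetical, and the "center" component mentioned above is not modeled at all.

```python
import hashlib
from collections import Counter
from typing import Dict, Set


def wl_hash(adjacency: Dict[int, Set[int]], labels: Dict[int, str], iterations: int = 3) -> str:
    """Toy Weisfeiler-Lehman hash of a labeled graph."""
    current = dict(labels)
    for _ in range(iterations):
        updated = {}
        for node, neighbors in adjacency.items():
            # Combine a node's label with the sorted labels of its neighbors.
            signature = current[node] + "|" + ",".join(sorted(current[n] for n in neighbors))
            updated[node] = hashlib.sha256(signature.encode()).hexdigest()[:16]
        current = updated
    # The multiset of final labels does not depend on how nodes are numbered.
    histogram = sorted(Counter(current.values()).items())
    return hashlib.sha256(repr(histogram).encode()).hexdigest()[:16]


chain = {0: {1}, 1: {0, 2}, 2: {1}}
# Two numberings of the same C-C-O chain, and a different O-C-O chain.
chain_a = wl_hash(chain, {0: "C", 1: "C", 2: "O"})
chain_b = wl_hash(chain, {0: "O", 1: "C", 2: "C"})
chain_c = wl_hash(chain, {0: "O", 1: "C", 2: "O"})
```

Relabeling the atoms leaves the hash unchanged, while changing the composition changes it, which is the property that makes such hashes useful for detecting duplicate building blocks.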

search_pubchem(listkey_counts=10, **kwargs)[source]#

Search for a molecule in PubChem.

The second element of the returned tuple is True if there was an identity match.

Parameters:

listkey_counts (int) – Number of list keys to return (relevant for substructure search). Defaults to 10.
kwargs – Additional arguments to pass to PubChem.search

Returns:

List of pubchem ids and whether there was an identity match

Return type:

Tuple[List[str], bool]

property smiles: str#

Return canonical SMILES.

Use openbabel to compute the SMILES, then obtain the canonical version with RDKit, because we observed that openbabel sometimes produces different canonical SMILES for the same molecule. If RDKit cannot produce a canonical SMILES (which can happen with organometallics), the openbabel version is used.

Returns:: Canonical SMILES
Return type:: str
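The fallback logic described above follows a common "try the second tool, otherwise keep the first result" pattern. The sketch below shows only that pattern and calls neither openbabel nor RDKit: `canonical_or_original` and `toy_canonicalizer` are hypothetical names, and the stand-in canonicalizer merely strips whitespace.

```python
from typing import Callable, Optional


def canonical_or_original(smiles: str, canonicalize: Callable[[str], Optional[str]]) -> str:
    """Return the canonical form, or the input if canonicalization fails."""
    try:
        canonical = canonicalize(smiles)
    except Exception:
        # Treat any failure of the second toolkit as "keep the first result".
        return smiles
    return canonical or smiles


def toy_canonicalizer(smiles: str) -> Optional[str]:
    """Hypothetical stand-in: refuses bracket atoms, strips whitespace otherwise."""
    if "[" in smiles:
        return None
    return smiles.strip()


organic = canonical_or_original(" CCO ", toy_canonicalizer)
organometallic = canonical_or_original("[Zn]O", toy_canonicalizer)
```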

Collection for MOF building blocks

class moffragmentor.sbu.sbucollection.SBUCollection(sbus)[source]#

Container for a collection of SBUs

property smiles#

Return a list of the SMILES strings of the SBUs.

Returns:: A list of smiles strings.
Return type:: List[str]

Create Python containers for node building blocks.

Here we understand metal clusters as nodes.

class moffragmentor.sbu.node.Node(molecule, molecule_graph, graph_branching_indices, binding_indices, molecule_original_indices_mapping=None, dummy_molecule=None, dummy_molecule_graph=None, dummy_molecule_indices_mapping=None, dummy_branching_indices=None)[source]#

Container for metal cluster building blocks.

Will typically be constructed automatically by the fragmentor.

classmethod from_mof_and_indices(mof, node_indices, branching_indices, binding_indices)[source]#

Build a node object from a MOF and some intermediate outputs of the fragmentation.

Parameters:

mof (MOF) – The MOF to build the node from.
node_indices (Set[int]) – The indices of the nodes in the MOF.
branching_indices (Set[int]) – The indices of the branching points in the MOF that belong to this node.
binding_indices (Set[int]) – The indices of the binding points in the MOF that belong to this node.

Returns:

A node object.

Describing a collection of nodes

class moffragmentor.sbu.nodecollection.NodeCollection(sbus)[source]#: Collection of node building blocks

Describing the organic building blocks, i.e., linkers.

class moffragmentor.sbu.linker.Linker(molecule, molecule_graph, graph_branching_indices, binding_indices, molecule_original_indices_mapping=None, dummy_molecule=None, dummy_molecule_graph=None, dummy_molecule_indices_mapping=None, dummy_branching_indices=None)[source]#: Describe a linker in a MOF

Describing a collection of linkers

class moffragmentor.sbu.linkercollection.LinkerCollection(sbus)[source]#

Collection of linker building blocks

property building_block_composition: List[str]#

Return a list of strings of building blocks.

Strings are of the form of L{i} where i is an integer.

Returns:: List of strings of building blocks.
Return type:: List[str]

molecule subpackage#

Defines data structures for the non-building-block molecules (e.g. solvent) as well as collections of such molecules.

Dealing with molecules that are not part of a secondary building unit.

class moffragmentor.molecule.nonsbumolecule.NonSbuMolecule(molecule, molecule_graph, indices, connecting_index=None)[source]#

Class to handle solvent or other non-SBU molecules.

classmethod from_structure_graph_and_indices(structure_graph, indices)[source]#

Create a new NonSbuMolecule from a part of a structure graph.

Parameters:

structure_graph (StructureGraph) – Structure graph with structure attribute
indices (List[int]) – Indices that label nodes in the structure graph, indexing the molecule of interest

Returns:

Instance of NonSbuMolecule

Return type:

NonSbuMolecule

show_molecule()[source]#: Use nglview to show the molecule.

Collections of molecules, e.g. bound solvents and non-bound solvents.

class moffragmentor.molecule.nonsbumoleculecollection.NonSbuMoleculeCollection(non_sbu_molecules)[source]#

Class to handle collections of molecules.

For example, bound solvents and non-bound solvents.

property composition: str#

Get a string describing the composition.

Return type:: str

Fragmentor subpackage#

This subpackage is not optimized for end-users. It is intended for developers who wish to customize the behavior of the fragmentor.

Routines for finding branching points in a structure graph of a MOF.

Note that those routines do not work for other reticular materials as they assume the presence of a metal.

moffragmentor.fragmentor.branching_points.get_branch_points(mof)[source]#

Get all branching points in the MOF.

Parameters:: mof (MOF) – MOF object.
Returns:: List of indices of branching points.
Return type:: List[int]

Some pure functions that are used to perform the node identification.

Node classification techniques described in https://pubs.acs.org/doi/pdf/10.1021/acs.cgd.8b00126.

class moffragmentor.fragmentor.nodelocator.NodelocationResult(nodes, branching_indices, connecting_paths, binding_indices, to_terminal_from_branching)#

binding_indices#: Alias for field number 3

branching_indices#: Alias for field number 1

connecting_paths#: Alias for field number 2

nodes#: Alias for field number 0

to_terminal_from_branching#: Alias for field number 4
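The "alias for field number" entries above mean that each attribute can also be reached by position. The following sketch rebuilds a local stand-in with the same field names using `collections.namedtuple`; the values are toy placeholders, not real fragmentation output.

```python
from collections import namedtuple

# Field names copied from the signature above; this is a local stand-in,
# not an import from the library.
NodelocationResult = namedtuple(
    "NodelocationResult",
    ["nodes", "branching_indices", "connecting_paths", "binding_indices", "to_terminal_from_branching"],
)

result = NodelocationResult(
    nodes=[{0, 1}],
    branching_indices={2},
    connecting_paths={1},
    binding_indices={1},
    to_terminal_from_branching={},
)

# Positional unpacking and named access refer to the same slots.
nodes, branching_indices, *_ = result
```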

moffragmentor.fragmentor.nodelocator.find_node_clusters(mof, unbound_solvent_indices=None, forbidden_indices=None)[source]#

Locate the branching indices, and node clusters in MOFs.

Starting from the metal indices it performs depth first search on the structure graph up to branching points.

Parameters:

mof (MOF) – moffragmentor MOF instance
unbound_solvent_indices (List[int], optional) – indices of unbound solvent atoms. Defaults to None.
forbidden_indices (List[int], optional) – indices not considered as metals, for instance, because they are part of a linker. Defaults to None.

Returns:

namedtuple with the fields “nodes”, “branching_indices”, “connecting_paths”, “binding_indices” and “to_terminal_from_branching”

Return type:

NodelocationResult
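The search strategy described above (depth-first search from the metals, stopping at branching points) can be sketched on a plain adjacency dictionary. This is a simplified illustration under stated assumptions: the branching points are taken as given, `grow_clusters` is a hypothetical name, and whether the library counts the branching point itself as part of the cluster is not specified on this page.

```python
from typing import Dict, Iterable, List, Set


def grow_clusters(
    adjacency: Dict[int, Set[int]], metal_indices: Iterable[int], branching_indices: Set[int]
) -> List[Set[int]]:
    """Depth-first search from each metal, halting at branching points."""
    clusters: List[Set[int]] = []
    assigned: Set[int] = set()
    for metal in metal_indices:
        if metal in assigned:
            continue
        cluster: Set[int] = set()
        stack = [metal]
        while stack:
            site = stack.pop()
            if site in cluster:
                continue
            cluster.add(site)
            if site in branching_indices:
                # Include the branching point but do not walk past it.
                continue
            stack.extend(adjacency[site] - cluster)
        assigned |= cluster
        clusters.append(cluster)
    return clusters


# Chain: metal 0 - oxygen 1 - branching carbon 2 - linker carbons 3 and 4.
toy_graph = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
clusters = grow_clusters(toy_graph, metal_indices=[0], branching_indices={2})
```

The search collects the metal, the oxygen, and the branching carbon, and never reaches the linker atoms beyond the branching point.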

Based on the node location, locate the linkers

moffragmentor.fragmentor.linkerlocator.create_linker_collection(mof, node_location_result, node_collection, unbound_solvents, bound_solvents)[source]#

Based on the MOF, node location, and unbound solvent location, locate the linkers.

Return type:: Tuple[LinkerCollection, dict]

Functions that can be used to locate bound and unbound solvent

moffragmentor.fragmentor.solventlocator.get_all_bound_solvent_molecules(mof, node_atom_sets)[source]#

Identify all bound solvent molecules.

Bound solvent is defined as being connected via one bridge to one metal center.

Parameters:

mof (MOF) – instance of a MOF object
node_atom_sets (List[Set[int]]) – List of indices for the MOF nodes

Returns:

Collection of NonSbuMolecule objects containing the bound solvent molecules

Return type:

NonSbuMoleculeCollection

moffragmentor.fragmentor.solventlocator.get_floating_solvent_molecules(mof)[source]#

Create a collection of NonSbuMolecules from a MOF.

Parameters:: mof (MOF) – instance of MOF
Returns:: collection of NonSbuMolecules
Return type:: NonSbuMoleculeCollection

Extraction of pymatgen Molecules from a structure for which we know the branching points.

This module contains functions that perform filtering on indices or fragments.

Those fragments are typically obtained from the other fragmentation modules.

moffragmentor.fragmentor.filter.bridges_across_cell(mof, indices)[source]#

Check if the molecule defined by indices bridges across the cell.

Return type:: bool

moffragmentor.fragmentor.filter.in_hull(pointcloud, hull)[source]#

Test if the points in pointcloud are in hull.

Taken from https://stackoverflow.com/a/16898636

Parameters:

pointcloud (np.array) – points to test (NxK coordinates of N points in K dimensions)
hull (np.array) – Either a scipy.spatial.Delaunay object or the MxK array of the coordinates of M points in K dimensions for which a Delaunay triangulation will be computed

Returns:

True if all points are in the hull, False otherwise

Return type:

bool

Generate molecules as the subgraphs from graphs

moffragmentor.fragmentor.molfromgraph.wrap_molecule(mol_idxs, mof, starting_index=None, add_additional_site=True)[source]#

Wrap a molecule in the cell of the MOF by walking along the structure graph.

For this we perform BFS from the starting index. That is, we use a queue to keep track of the indices of the atoms that we still need to visit (the neighbors of the current index). We then compute new coordinates by computing the Cartesian coordinates of the neighbor image closest to the new coordinates of the current atom.

To then create a Molecule with the correct ordering of sites, we walk through the hash table in the order of the original indices.

Parameters:

mol_idxs (Iterable[int]) – The indices of the atoms in the molecule in the MOF.
mof (MOF) – MOF object that contains the mol_idxs.
starting_index (int, optional) – Starting index for the walk. Defaults to 0.
add_additional_site (bool) – Whether to add an additional site

Returns:

wrapped molecule

Return type:

Molecule
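The breadth-first walk described above can be sketched in fractional coordinates. This is a simplification, not the library's routine: it shifts each neighbor by whole lattice vectors using rounding in fractional space, which picks the closest image exactly only for orthogonal cells, whereas the description above works with Cartesian coordinates. The function name and the toy data are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Frac = Tuple[float, float, float]


def unwrap_fragment(
    frac_coords: Dict[int, Frac], adjacency: Dict[int, Set[int]], start: int
) -> List[Frac]:
    """Unwrap a bonded fragment across periodic boundaries via BFS."""
    new_coords: Dict[int, Frac] = {start: frac_coords[start]}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor in new_coords:
                continue
            # Shift the neighbor by whole lattice vectors so that it lands
            # next to the already-placed current atom.
            new_coords[neighbor] = tuple(
                n - round(n - c) for n, c in zip(frac_coords[neighbor], new_coords[current])
            )
            queue.append(neighbor)
    # Emit coordinates in the order of the original indices.
    return [new_coords[index] for index in sorted(new_coords)]


# A three-atom chain that is split by the cell boundary along x.
toy_coords = {0: (0.95, 0.5, 0.5), 1: (0.05, 0.5, 0.5), 2: (0.15, 0.5, 0.5)}
toy_bonds = {0: {1}, 1: {0, 2}, 2: {1}}
unwrapped = unwrap_fragment(toy_coords, toy_bonds, start=0)
```

After unwrapping, atoms 1 and 2 sit just outside the unit cell next to atom 0, so the chain is contiguous instead of being split across the boundary.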

Utils subpackage#

The utils subpackage is also not optimized for end-users.

Helper functions.

class moffragmentor.utils.IStructure(lattice, species, coords, charge=None, validate_proximity=False, to_unit_cell=False, coords_are_cartesian=False, site_properties=None)[source]#

pymatgen IStructure with faster equality comparison.

This dramatically speeds up lookups in the LRU cache when an object with the same __hash__ is already in the cache.
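Why equality speed matters for an LRU cache can be shown with a small sketch. On a cache lookup, Python first compares hashes and then calls `__eq__` on objects whose hashes match, so a cheap `__eq__` keeps cache hits fast. The `FastKey` class below is a hypothetical stand-in and does not reflect how the library's IStructure implements its comparison.

```python
from functools import lru_cache


class FastKey:
    """Hypothetical immutable wrapper with a cheap equality check."""

    def __init__(self, values):
        self.values = tuple(values)
        self._hash = hash(self.values)

    def __hash__(self) -> int:
        return self._hash

    def __eq__(self, other) -> bool:
        if not isinstance(other, FastKey):
            return NotImplemented
        # Compare the precomputed hashes first; touch the data only on a match.
        return self._hash == other._hash and self.values == other.values


@lru_cache(maxsize=None)
def expensive_sum(key: FastKey) -> float:
    return sum(key.values)


# Two distinct but equal keys: the second call is served from the cache.
expensive_sum(FastKey([1.0, 2.0]))
expensive_sum(FastKey([1.0, 2.0]))
info = expensive_sum.cache_info()
```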

moffragmentor.utils.enable_logging()[source]#

Set up the moffragmentor logging with sane defaults.

Return type:: List[int]

moffragmentor.utils.get_sub_structure(mof, indices)[source]#

Return a sub-structure of the structure with only the sites with the given indices.

Parameters:

mof (MOF) – MOF object
indices (Collection[int]) – Collection of integers

Returns:

sub-structure of the structure with only the sites with the given indices

Return type:

Structure

moffragmentor.utils.is_tool(name)[source]#

Check whether name is on PATH and marked as executable.

https://stackoverflow.com/questions/11210104/check-if-a-program-exists-from-a-python-script

Parameters:: name (str) – The name of the tool to check for.
Returns:: True if the tool is on PATH and marked as executable.
Return type:: bool
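A check of this kind can be written with the standard library's `shutil.which`, which is the approach given in the linked Stack Overflow thread. The sketch below is a standalone illustration and may differ from the library's own function body; the tool name used in the example is deliberately nonsensical.

```python
import shutil


def is_tool(name: str) -> bool:
    """Return True if `name` resolves to an executable on PATH."""
    return shutil.which(name) is not None


# A name that should not exist on any system.
missing = is_tool("definitely-not-a-real-tool-1234")
```

A check like this is typically used before calling an external program, for example to raise a clear error when a required executable such as Java is absent.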

Errors reused across the moffragmentor package

exception moffragmentor.utils.errors.JavaNotFoundError[source]#: Raised if Java executable could not be found

exception moffragmentor.utils.errors.NoMetalError[source]#: Raised if structure contains no metal

Methods to rank molecules according to some measure of similarity.

moffragmentor.utils.mol_compare.mcs_rank(smiles_reference, smiles, additional_attributes=None)[source]#: Rank SMILES based on the maximum common substructure to the reference smiles.

moffragmentor.utils.mol_compare.tanimoto_rank(smiles_reference, smiles, additional_attributes=None)[source]#: Rank SMILES based on the Tanimoto similarity to the reference smiles.
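The idea behind a Tanimoto ranking can be sketched without any cheminformatics toolkit. The example below is an assumption-heavy illustration: it uses character bigrams of the SMILES string as a crude stand-in for a molecular fingerprint, and `rank_by_tanimoto` is a hypothetical function, not the signature documented above.

```python
from typing import List, Set, Tuple


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto (Jaccard) similarity of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def bigrams(text: str) -> Set[str]:
    """Character bigrams as a crude stand-in for a molecular fingerprint."""
    return {text[i : i + 2] for i in range(len(text) - 1)}


def rank_by_tanimoto(reference: str, candidates: List[str]) -> List[Tuple[str, float]]:
    """Sort candidates by decreasing similarity to the reference."""
    ref = bigrams(reference)
    scored = [(candidate, tanimoto(ref, bigrams(candidate))) for candidate in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_by_tanimoto("c1ccccc1C(=O)O", ["CCO", "c1ccccc1C(=O)O", "c1ccccc1"])
```

The identical string ranks first with a similarity of 1.0, the benzene ring shares some bigrams with the reference, and ethanol shares none.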

Methods on structure graphs

Methods for running systre