BaseDatasetFactory

pydantic model openff.qcsubmit.factories.BaseDatasetFactory

The Base factory which all other dataset factories should inherit from.
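
The top-level fields and defaults listed in the schema for this model can be mirrored in a few lines of plain Python. This is a minimal sketch, assuming only the values shown in the schema; it does not use or reproduce the openff.qcsubmit API, and `DEFAULT_SETTINGS`, `ALLOWED_DRIVERS` and `build_settings` are hypothetical names introduced for illustration.

```python
import copy

# Values copied from the DriverEnum definition in the schema.
ALLOWED_DRIVERS = ("energy", "gradient", "hessian", "properties")

# Top-level defaults copied from the schema (qc_specifications omitted for brevity).
DEFAULT_SETTINGS = {
    "driver": "energy",
    "priority": "normal",
    "dataset_tags": ["openff"],
    "compute_tag": "openff",
    "type": "BaseDatasetFactory",
    "workflow": [],
}


def build_settings(**overrides):
    """Hypothetical helper: merge overrides into a fresh copy of the defaults."""
    unknown = sorted(set(overrides) - set(DEFAULT_SETTINGS))
    if unknown:
        raise KeyError(f"unknown fields: {unknown}")
    # Deep copy so the mutable defaults (lists) are never shared between results.
    settings = copy.deepcopy(DEFAULT_SETTINGS)
    settings.update(overrides)
    if settings["driver"] not in ALLOWED_DRIVERS:
        raise ValueError(f"driver must be one of {ALLOWED_DRIVERS}")
    return settings


settings = build_settings(driver="gradient", dataset_tags=["openff", "example"])
print(settings["driver"], settings["dataset_tags"])
```

The real model performs this kind of validation through pydantic; the sketch only shows which values the schema accepts for `driver` and which defaults apply when a field is left unset.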

JSON schema
{
   "title": "BaseDatasetFactory",
   "description": "The Base factory which all other dataset factories should inherit from.",
   "type": "object",
   "properties": {
      "qc_specifications": {
         "title": "Qc Specifications",
         "description": "The QCSpecifications which will be computed for this dataset.",
         "default": {
            "default": {
               "method": "B3LYP-D3BJ",
               "basis": "DZVP",
               "program": "psi4",
               "spec_name": "default",
               "spec_description": "Standard OpenFF optimization quantum chemistry specification.",
               "store_wavefunction": "none",
               "implicit_solvent": null,
               "maxiter": 200,
               "scf_properties": [
                  "dipole",
                  "quadrupole",
                  "wiberg_lowdin_indices",
                  "mayer_indices"
               ],
               "keywords": null
            }
         },
         "type": "object",
         "additionalProperties": {
            "$ref": "#/definitions/QCSpec"
         }
      },
      "driver": {
         "description": "The type of single point calculations which will be computed. Note some services require certain calculations for example optimizations require graident calculations.",
         "default": "energy",
         "allOf": [
            {
               "$ref": "#/definitions/DriverEnum"
            }
         ]
      },
      "priority": {
         "title": "Priority",
         "description": "The priority the dataset should be computed at compared to other datasets currently running.",
         "default": "normal",
         "type": "string"
      },
      "dataset_tags": {
         "title": "Dataset Tags",
         "description": "The dataset tags which help identify the dataset.",
         "default": [
            "openff"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "compute_tag": {
         "title": "Compute Tag",
         "description": "The tag the computes tasks will be assigned to, managers wishing to execute these tasks should use this compute tag.",
         "default": "openff",
         "type": "string"
      },
      "type": {
         "title": "Type",
         "description": "The type of dataset factory which corresponds to the dataset made.",
         "default": "BaseDatasetFactory",
         "enum": [
            "BaseDatasetFactory"
         ],
         "type": "string"
      },
      "workflow": {
         "title": "Workflow",
         "description": "The set of workflow components and their settings which will be executed in order on the input molecules to make the dataset.",
         "default": [],
         "type": "array",
         "items": {
            "anyOf": [
               {
                  "$ref": "#/definitions/StandardConformerGenerator"
               },
               {
                  "$ref": "#/definitions/RMSDCutoffConformerFilter"
               },
               {
                  "$ref": "#/definitions/CoverageFilter"
               },
               {
                  "$ref": "#/definitions/ElementFilter"
               },
               {
                  "$ref": "#/definitions/MolecularWeightFilter"
               },
               {
                  "$ref": "#/definitions/RotorFilter"
               },
               {
                  "$ref": "#/definitions/SmartsFilter"
               },
               {
                  "$ref": "#/definitions/EnumerateTautomers"
               },
               {
                  "$ref": "#/definitions/EnumerateProtomers"
               },
               {
                  "$ref": "#/definitions/EnumerateStereoisomers"
               },
               {
                  "$ref": "#/definitions/WBOFragmenter"
               },
               {
                  "$ref": "#/definitions/PfizerFragmenter"
               },
               {
                  "$ref": "#/definitions/ChargeFilter"
               },
               {
                  "$ref": "#/definitions/ScanFilter"
               },
               {
                  "$ref": "#/definitions/ScanEnumerator"
               }
            ]
         }
      }
   },
   "definitions": {
      "WavefunctionProtocolEnum": {
         "title": "WavefunctionProtocolEnum",
         "description": "Wavefunction to keep from a computation.",
         "enum": [
            "all",
            "orbitals_and_eigenvalues",
            "return_results",
            "none"
         ],
         "type": "string"
      },
      "PCMSettings": {
         "title": "PCMSettings",
         "description": "A class to handle PCM settings which can be used with PSi4.",
         "type": "object",
         "properties": {
            "units": {
               "title": "Units",
               "description": "The units used in the input options atomic units are used by default.",
               "type": "string"
            },
            "codata": {
               "title": "Codata",
               "description": "The set of fundamental physical constants to be used in the module.",
               "default": 2010,
               "type": "integer"
            },
            "cavity_Type": {
               "title": "Cavity Type",
               "description": "Completely specifies type of molecular surface and its discretization.",
               "default": "GePol",
               "type": "string"
            },
            "cavity_Area": {
               "title": "Cavity Area",
               "description": "Average area (weight) of the surface partition for the GePol cavity in the specified units. By default this is in AU.",
               "default": 0.3,
               "type": "number"
            },
            "cavity_Scaling": {
               "title": "Cavity Scaling",
               "description": "If true, the radii for the spheres will be scaled by 1.2. For finer control on the scaling factor for each sphere, select explicit creation mode.",
               "default": true,
               "type": "boolean"
            },
            "cavity_RadiiSet": {
               "title": "Cavity Radiiset",
               "description": "Select set of atomic radii to be used. Currently Bondi-Mantina Bondi, UFF  and Allinger\u2019s MM3 sets available. Radii in Allinger\u2019s MM3 set are obtained by dividing the value in the original paper by 1.2, as done in the ADF COSMO implementation We advise to turn off scaling of the radii by 1.2 when using this set.",
               "default": "Bondi",
               "type": "string"
            },
            "cavity_MinRadius": {
               "title": "Cavity Minradius",
               "description": "Minimal radius for additional spheres not centered on atoms. An arbitrarily big value is equivalent to switching off the use of added spheres, which is the default in AU.",
               "default": 100,
               "type": "number"
            },
            "cavity_Mode": {
               "title": "Cavity Mode",
               "description": "How to create the list of spheres for the generation of the molecular surface.",
               "default": "Implicit",
               "type": "string"
            },
            "medium_SolverType": {
               "title": "Medium Solvertype",
               "description": "Type of solver to be used. All solvers are based on the Integral Equation Formulation of the Polarizable Continuum Model.",
               "default": "IEFPCM",
               "type": "string"
            },
            "medium_Nonequilibrium": {
               "title": "Medium Nonequilibrium",
               "description": "Initializes an additional solver using the dynamic permittivity. To be used in response calculations.",
               "default": false,
               "type": "boolean"
            },
            "medium_Solvent": {
               "title": "Medium Solvent",
               "description": "Specification of the dielectric medium outside the cavity. Note this will always be converted to the molecular formula to aid parsing via PCM.",
               "type": "string"
            },
            "medium_MatrixSymm": {
               "title": "Medium Matrixsymm",
               "description": "If True, the PCM matrix obtained by the IEFPCM collocation solver is symmetrized.",
               "default": true,
               "type": "boolean"
            },
            "medium_Correction": {
               "title": "Medium Correction",
               "description": "Correction, k for the apparent surface charge scaling factor in the CPCM solver.",
               "default": 0.0,
               "minimum": 0,
               "type": "number"
            },
            "medium_DiagonalScaling": {
               "title": "Medium Diagonalscaling",
               "description": "Scaling factor for diagonal of collocation matrices, values commonly used in the literature are 1.07 and 1.0694.",
               "default": 1.07,
               "minimum": 0,
               "type": "number"
            },
            "medium_ProbeRadius": {
               "title": "Medium Proberadius",
               "description": "Radius of the spherical probe approximating a solvent molecule. Used for generating the solvent-excluded surface (SES) or an approximation of it. Overridden by the built-in value for the chosen solvent. Default in AU.",
               "default": 1.0,
               "type": "number"
            }
         },
         "required": [
            "units",
            "medium_Solvent"
         ]
      },
      "SCFProperties": {
         "title": "SCFProperties",
         "description": "The type of SCF property that should be extracted from a single point calculation.",
         "enum": [
            "dipole",
            "quadrupole",
            "mulliken_charges",
            "lowdin_charges",
            "wiberg_lowdin_indices",
            "mayer_indices",
            "mbis_charges"
         ],
         "type": "string"
      },
      "QCSpec": {
         "title": "QCSpec",
         "description": "A basic config class for results structures.",
         "type": "object",
         "properties": {
            "method": {
               "title": "Method",
               "description": "The name of the computational model used to execute the calculation. This could be the QC method or the forcefield name.",
               "default": "B3LYP-D3BJ",
               "type": "string"
            },
            "basis": {
               "title": "Basis",
               "description": "The name of the basis that should be used with the given method, outside of QC this can be the parameterization ie antechamber or None.",
               "default": "DZVP",
               "type": "string"
            },
            "program": {
               "title": "Program",
               "description": "The name of the program that will be used to perform the calculation.",
               "default": "psi4",
               "type": "string"
            },
            "spec_name": {
               "title": "Spec Name",
               "description": "The name the specification will be stored under in QCArchive.",
               "default": "default",
               "type": "string"
            },
            "spec_description": {
               "title": "Spec Description",
               "description": "The description of the specification which will be stored in QCArchive.",
               "default": "Standard OpenFF optimization quantum chemistry specification.",
               "type": "string"
            },
            "store_wavefunction": {
               "description": "The level of wavefunction detail that should be saved in QCArchive. Note that this is done for every calculation and should not be used with optimizations.",
               "default": "none",
               "allOf": [
                  {
                     "$ref": "#/definitions/WavefunctionProtocolEnum"
                  }
               ]
            },
            "implicit_solvent": {
               "title": "Implicit Solvent",
               "description": "If PCM is to be used with psi4 this is the full description of the settings that should be used.",
               "allOf": [
                  {
                     "$ref": "#/definitions/PCMSettings"
                  }
               ]
            },
            "maxiter": {
               "title": "Maxiter",
               "description": "The maximum number of SCF iterations in QM calculations this will be ignored by programs where this does not make sense.",
               "default": 200,
               "exclusiveMinimum": 0,
               "type": "integer"
            },
            "scf_properties": {
               "description": "The SCF properties which should be extracted after every single point calculation.",
               "default": [
                  "dipole",
                  "quadrupole",
                  "wiberg_lowdin_indices",
                  "mayer_indices"
               ],
               "type": "array",
               "items": {
                  "$ref": "#/definitions/SCFProperties"
               }
            },
            "keywords": {
               "title": "Keywords",
               "description": "An optional set of program specific computational keywords that should be passed to the program. These may include, for example, DFT grid settings.",
               "type": "object",
               "additionalProperties": {
                  "anyOf": [
                     {
                        "type": "string"
                     },
                     {
                        "type": "integer"
                     },
                     {
                        "type": "number"
                     },
                     {
                        "type": "boolean"
                     },
                     {
                        "type": "array",
                        "items": {
                           "type": "number"
                        }
                     }
                  ]
               }
            }
         }
      },
      "DriverEnum": {
         "title": "DriverEnum",
         "description": "The type of calculation that is being performed (e.g., energy, gradient, Hessian, ...).",
         "enum": [
            "energy",
            "gradient",
            "hessian",
            "properties"
         ],
         "type": "string"
      },
      "StandardConformerGenerator": {
         "title": "StandardConformerGenerator",
         "description": "Standard conformer generator using the OFFTK and the back end toolkits.",
         "type": "object",
         "properties": {
            "type": {
               "title": "Type",
               "default": "StandardConformerGenerator",
               "enum": [
                  "StandardConformerGenerator"
               ],
               "type": "string"
            },
            "rms_cutoff": {
               "title": "Rms Cutoff",
               "description": "The rms cut off in angstroms to be used when generating the conformers. Passing None will use the default in toolkit of 1.",
               "type": "number"
            },
            "max_conformers": {
               "title": "Max Conformers",
               "description": "The maximum number of conformers to be generated per molecule.",
               "default": 10,
               "type": "integer"
            },
            "clear_existing": {
               "title": "Clear Existing",
               "description": "If any pre-existing conformers should be kept.",
               "default": true,
               "type": "boolean"
            }
         }
      },
      "RMSDCutoffConformerFilter": {
         "title": "RMSDCutoffConformerFilter",
         "description": "Prunes conformers from a molecule that are less than a specified RMSD from\nall other conformers",
         "type": "object",
         "properties": {
            "type": {
               "title": "Type",
               "default": "RMSDCutoffConformerFilter",
               "enum": [
                  "RMSDCutoffConformerFilter"
               ],
               "type": "string"
            },
            "cutoff": {
               "title": "Cutoff",
               "description": "The RMSD cut off in angstroms.",
               "default": -1.0,
               "type": "number"
            }
         }
      },
      "CoverageFilter": {
         "title": "CoverageFilter",
         "description": "Filters molecules based on the requested force field parameter ids.\n\nNote:\n    * The options ``allowed_ids`` and ``filtered_ids`` are mutually exclusive.",
         "type": "object",
         "properties": {
            "type": {
               "title": "Type",
               "default": "CoverageFilter",
               "enum": [
                  "CoverageFilter"
               ],
               "type": "string"
            },
            "allowed_ids": {
               "title": "Allowed Ids",
               "description": "The SMIRKS parameter ids of the parameters which are allowed to be exercised by the molecules. Molecules should use at least one of these ids to be passed by the component.",
               "type": "array",
               "items": {
                  "type": "string"
               },
               "uniqueItems": true
            },
            "filtered_ids": {
               "title": "Filtered Ids",
               "description": "The SMIRKS parameter ids of the parameters which are not allowed to be exercised by the molecules.",
               "type": "array",
               "items": {
                  "type": "string"
               },
               "uniqueItems": true
            },
            "forcefield": {
               "title": "Forcefield",
               "description": "The name of the force field which we want to filter against.",
               "default": "openff_unconstrained-1.0.0.offxml",
               "type": "string"
            }
         }
      },
      "ElementFilter": {
         "title": "ElementFilter",
         "description": "Filter the molecules based on a list of allowed elements.\n\nNote:\n    The `allowed_elements` attribute can take a list of either symbols or atomic numbers and will resolve them to a\n    common internal format as required.\n\nExample:\n    Using atomic symbols or atomic numbers in components.\n\n    ```python\n    >>> from openff.qcsubmit.workflow_components import ElementFilter\n    >>> efil = ElementFilter()\n    # set the allowed elements to H,C,N,O\n    >>> efil.allowed_elements = ['H', 'C', 'N', 'O']\n    >>> efil.allowed_elements = [1, 6, 7, 8]\n    ```",
         "type": "object",
         "properties": {
            "type": {
               "title": "Type",
               "default": "ElementFilter",
               "enum": [
                  "ElementFilter"
               ],
               "type": "string"
            },
            "allowed_elements": {
               "title": "Allowed Elements",
               "description": "The list of allowed elements as symbols or atomic number ints.",
               "default": [
                  "H",
                  "C",
                  "N",
                  "O",
                  "F",
                  "P",
                  "S",
                  "Cl",
                  "Br",
                  "I"
               ],
               "type": "array",
               "items": {
                  "anyOf": [
                     {
                        "type": "integer"
                     },
                     {
                        "type": "string"
                     }
                  ]
               }
            }
         }
      },
      "MolecularWeightFilter": {
         "title": "MolecularWeightFilter",
         "description": "Filters molecules based on the minimum and maximum allowed molecular weights.",
         "type": "object",
         "properties": {
            "type": {
               "title": "Type",
               "default": "MolecularWeightFilter",
               "enum": [
                  "MolecularWeightFilter"
               ],
               "type": "string"
            },
            "minimum_weight": {
               "title": "Minimum Weight",
               "description": "The minimum allowed molecule weight  default value taken from the openeye blockbuster filter",
               "default": 130,
               "type": "integer"
            },
            "maximum_weight": {
               "title": "Maximum Weight",
               "description": "The maximum allow molecule weight, default taken from the openeye blockbuster filter.",
               "default": 781,
               "type": "integer"
            }
         }
      },
      "RotorFilter": {
         "title": "RotorFilter",
         "description": "Filters molecules based on the maximum and or minimum allowed number of rotatable bonds.\n\nNote:\n    Rotatable bonds are torsions found using the `find_rotatable_bonds` method of the\n    openforcefield.topology.Molecule class.",
         "type": "object",
         "properties": {
            "type": {
               "title": "Type",
               "default": "RotorFilter",
               "enum": [
                  "RotorFilter"
               ],
               "type": "string"
            },
            "maximum_rotors": {
               "title": "Maximum Rotors",
               "description": "The maximum number of rotatable bonds allowed in the molecule, if `None` the molecule has no maximum limit on rotatable bonds.",
               "default": 4,
               "type": "integer"
            },
            "minimum_rotors": {
               "title": "Minimum Rotors",
               "description": "The minimum number of rotatble bonds allowed in the molecule, if `None` the molecule has no limit to the minimum number of rotatble bonds.",
               "type": "integer"
            }
         }
      },
      "SmartsFilter": {
         "title": "SmartsFilter",
         "description": "Filters molecules based on if they contain certain smarts substructures.\n\nNote:\n    * The smarts tags used for filtering should be numerically tagged in order to work with the toolkit.\n    * The options ``allowed_substructures`` and ``filtered_substructures`` are mutually exclusive.",
         "type": "object",
         "properties": {
            "type": {
               "title": "Type",
               "default": "SmartsFilter",
               "enum": [
                  "SmartsFilter"
               ],
               "type": "string"
            },
            "allowed_substructures": {
               "title": "Allowed Substructures",
               "description": "The list of allowed substructures which should be tagged with indices.",
               "type": "array",
               "items": {
                  "type": "string"
               }
            },
            "filtered_substructures": {
               "title": "Filtered Substructures",
               "description": "The list of substructures which should be filtered.",
               "type": "array",
               "items": {
                  "type": "string"
               }
            }
         }
      },
      "EnumerateTautomers": {
         "title": "EnumerateTautomers",
         "description": "Enumerate the tautomers of a molecule using the backend toolkits through the OFFTK.",
         "type": "object",
         "properties": {
            "type": {
               "title": "Type",
               "default": "EnumerateTautomers",
               "enum": [
                  "EnumerateTautomers"
               ],
               "type": "string"
            },
            "max_tautomers": {
               "title": "Max Tautomers",
               "description": "The maximum number of tautomers that should be generated.",
               "default": 20,
               "type": "integer"
            }
         }
      },
      "EnumerateProtomers": {
         "title": "EnumerateProtomers",
         "description": "Enumerate the formal charges of the input molecule using the backend toolkits through the OFFTK.\n\nImportant:\n    Only Openeye is supported so far.",
         "type": "object",
         "properties": {
            "type": {
               "title": "Type",
               "default": "EnumerateProtomers",
               "enum": [
                  "EnumerateProtomers"
               ],
               "type": "string"
            },
            "max_states": {
               "title": "Max States",
               "description": "The maximum number of states that should be generated.",
               "default": 10,
               "type": "integer"
            }
         }
      },
      "EnumerateStereoisomers": {
         "title": "EnumerateStereoisomers",
         "description": "Enumerate the stereo centers and bonds of a molecule using the backend toolkits through the OFFTK, only well defined\nmolecules are returned by this component, this is check via a OFFTK round trip.",
         "type": "object",
         "properties": {
            "type": {
               "title": "Type",
               "default": "EnumerateStereoisomers",
               "enum": [
                  "EnumerateStereoisomers"
               ],
               "type": "string"
            },
            "undefined_only": {
               "title": "Undefined Only",
               "description": "If we should only enumerate parts of the molecule with undefined stereochemistry or all stereochemistry.",
               "default": false,
               "type": "boolean"
            },
            "max_isomers": {
               "title": "Max Isomers",
               "description": "The maximum number of stereoisomers to be generated.",
               "default": 20,
               "type": "integer"
            },
            "rationalise": {
               "title": "Rationalise",
               "description": "If we should check that the resulting molecules are physically possible by attempting to generate conformers for them.",
               "default": true,
               "type": "boolean"
            }
         }
      },
      "WBOFragmenter": {
         "title": "WBOFragmenter",
         "description": "Fragment molecules using the WBO fragmenter class of the fragmenter module.\nFor more information see <https://github.com/openforcefield/fragmenter>.",
         "type": "object",
         "properties": {
            "type": {
               "title": "Type",
               "default": "WBOFragmenter",
               "enum": [
                  "WBOFragmenter"
               ],
               "type": "string"
            },
            "threshold": {
               "title": "Threshold",
               "description": "The WBO error threshold between the parent and the fragment value, the fragmentation will stop when the difference between the fragment and parent is less than this value.",
               "default": 0.03,
               "type": "number"
            },
            "keep_non_rotor_ring_substituents": {
               "title": "Keep Non Rotor Ring Substituents",
               "description": "If any non rotor ring substituents should be kept during the fragmentation resulting in smaller fragments when `False`.",
               "default": false,
               "type": "boolean"
            },
            "heuristic": {
               "title": "Heuristic",
               "description": "The path fragmenter should take when fragment needs to be grown out. The options are ``['wbo', 'path_length']``.",
               "default": "path_length",
               "enum": [
                  "path_length",
                  "wbo"
               ],
               "type": "string"
            }
         }
      },
      "PfizerFragmenter": {
         "title": "PfizerFragmenter",
         "description": "The openff.fragmenter implementation of the Pfizer fragmenation method as described here\n(doi: 10.1021/acs.jcim.9b00373)",
         "type": "object",
         "properties": {
            "type": {
               "title": "Type",
               "default": "PfizerFragmenter",
               "enum": [
                  "PfizerFragmenter"
               ],
               "type": "string"
            }
         }
      },
      "ChargeFilter": {
         "title": "ChargeFilter",
         "description": "Filter molecules if their formal charge is not in the `charges_to_include` list or is in the `charges_to_exclude` list.",
         "type": "object",
         "properties": {
            "type": {
               "title": "Type",
               "default": "ChargeFilter",
               "enum": [
                  "ChargeFilter"
               ],
               "type": "string"
            },
            "charges_to_include": {
               "title": "Charges To Include",
               "description": "The list of net molecule formal charges which are allowed in the dataset.This option is mutually exclusive with ``charges_to_exclude``.",
               "type": "array",
               "items": {
                  "type": "integer"
               }
            },
            "charges_to_exclude": {
               "title": "Charges To Exclude",
               "description": "The list of net molecule formal charges which are to be removed from the dataset.This option is mutually exclusive with ``charges_to_include``.",
               "type": "array",
               "items": {
                  "type": "integer"
               }
            }
         }
      },
      "ScanFilter": {
         "title": "ScanFilter",
         "description": "A filter to remove/include molecules from the workflow who have scans targeting the specified SMARTS.\n\nImportant:\n    Currently only checks against 1D scans.",
         "type": "object",
         "properties": {
            "type": {
               "title": "Type",
               "default": "ScanFilter",
               "enum": [
                  "ScanFilter"
               ],
               "type": "string"
            },
            "scans_to_include": {
               "title": "Scans To Include",
               "description": "Only molecules with SCANs covering these SMARTspatterns should be kept. This option is mutuallyexclusive with ``scans_to_exclude``.",
               "type": "array",
               "items": {
                  "type": "string"
               }
            },
            "scans_to_exclude": {
               "title": "Scans To Exclude",
               "description": "Any molecules with scans covering these SMARTs willbe removed from the dataset. This option is mutallyexclusive with ``scans_to_include``.",
               "type": "array",
               "items": {
                  "type": "string"
               }
            }
         }
      },
      "Scan1D": {
         "title": "Scan1D",
         "description": "A class to hold information on 1D scans to be computed.",
         "type": "object",
         "properties": {
            "smarts1": {
               "title": "Smarts1",
               "description": "The numerically tagged SMARTs pattern used to select the torsion.",
               "type": "string"
            },
            "scan_range1": {
               "title": "Scan Range1",
               "description": "The scan range that should be given to this torsion drive.",
               "type": "array",
               "items": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ]
            },
            "scan_increment": {
               "title": "Scan Increment",
               "description": "The angle in degrees between each grid point in the scan.",
               "default": 15,
               "type": "array",
               "items": {
                  "type": "integer"
               }
            }
         },
         "required": [
            "smarts1"
         ]
      },
      "Scan2D": {
         "title": "Scan2D",
         "description": "A class to hold information on 2D scans to be computed.",
         "type": "object",
         "properties": {
            "smarts1": {
               "title": "Smarts1",
               "description": "The numerically tagged SMARTs pattern used to select the torsion.",
               "type": "string"
            },
            "scan_range1": {
               "title": "Scan Range1",
               "description": "The scan range that should be given to this torsion drive.",
               "type": "array",
               "items": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ]
            },
            "scan_increment": {
               "title": "Scan Increment",
               "default": [
                  15,
                  15
               ],
               "type": "array",
               "items": {
                  "type": "integer"
               }
            },
            "smarts2": {
               "title": "Smarts2",
               "description": "The second numerically tagged SMARTs pattern used to select a torsion.",
               "type": "string"
            },
            "scan_range2": {
               "title": "Scan Range2",
               "description": "The scan range which should be given to the second torsion drive.",
               "type": "array",
               "items": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ]
            }
         },
         "required": [
            "smarts1",
            "smarts2"
         ]
      },
      "ImproperScan": {
         "title": "ImproperScan",
         "description": "A class to hold information on Improper scans to be computed.",
         "type": "object",
         "properties": {
            "smarts": {
               "title": "Smarts",
               "description": "The numerically tagged SMARTs pattern used to select the improper torsion.",
               "type": "string"
            },
            "central_smarts": {
               "title": "Central Smarts",
               "description": "The numerically tagged SMARTS pattern used to select the central atom of the improper torsion.",
               "type": "string"
            },
            "scan_range": {
               "title": "Scan Range",
               "description": "The scan range which should be used for the improper.",
               "type": "array",
               "items": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ]
            },
            "scan_increment": {
               "title": "Scan Increment",
               "description": "The angle in degrees between each grid point in the scan.",
               "default": 15,
               "type": "array",
               "items": {
                  "type": "integer"
               }
            }
         },
         "required": [
            "smarts",
            "central_smarts"
         ]
      },
      "ScanEnumerator": {
         "title": "ScanEnumerator",
         "description": "This module will tag any matching substructures for scanning, useful for torsiondrive datasets.",
         "type": "object",
         "properties": {
            "type": {
               "title": "Type",
               "default": "ScanEnumerator",
               "enum": [
                  "ScanEnumerator"
               ],
               "type": "string"
            },
            "torsion_scans": {
               "title": "Torsion Scans",
               "description": "A list of scan objects which describe the scan range and scan increment that should be used with the associated SMARTS pattern.",
               "default": [],
               "type": "array",
               "items": {
                  "$ref": "#/definitions/Scan1D"
               }
            },
            "double_torsion_scans": {
               "title": "Double Torsion Scans",
               "description": "A list of double scan objects which describe the scan ranges and scan increments that should be used with each of the SMARTS patterns.",
               "default": [],
               "type": "array",
               "items": {
                  "$ref": "#/definitions/Scan2D"
               }
            },
            "improper_scans": {
               "title": "Improper Scans",
               "description": "A list of improper scan objects which describe the scan range and scan increment that should be used with the SMARTS pattern.",
               "default": [],
               "type": "array",
               "items": {
                  "$ref": "#/definitions/ImproperScan"
               }
            }
         }
      }
   }
}
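
The scan-related definitions above can be exercised with plain dictionaries. The following is a minimal sketch, not the library's own validation: `check_scan_filter` is a hypothetical helper, the SMARTS patterns are illustrative only, and the top-level `"workflow"` key is assumed from the `workflow` field of the factory. It mirrors the rule that `scans_to_include` and `scans_to_exclude` are mutually exclusive and builds a `ScanEnumerator`-shaped entry where only `smarts1` is required.

```python
import json


def check_scan_filter(settings: dict) -> dict:
    """Hypothetical check: reject settings that set both mutually exclusive options."""
    include = settings.get("scans_to_include")
    exclude = settings.get("scans_to_exclude")
    if include and exclude:
        raise ValueError(
            "scans_to_include and scans_to_exclude are mutually exclusive"
        )
    return settings


# A ScanEnumerator-shaped dictionary holding one Scan1D entry.
# The schema types scan_range1 as a pair of integers and scan_increment as an array.
scan_enumerator = {
    "type": "ScanEnumerator",
    "torsion_scans": [
        {
            "smarts1": "[*:1]~[#6:2]-[#6:3]~[*:4]",  # illustrative pattern
            "scan_range1": [-150, 180],
            "scan_increment": [15],
        }
    ],
}

# A ScanFilter-shaped dictionary that only sets the exclude option.
scan_filter = check_scan_filter(
    {"type": "ScanFilter", "scans_to_exclude": ["[*:1]~[#7:2]-[#7:3]~[*:4]"]}
)

# Serialise both components in workflow order.
serialised = json.dumps({"workflow": [scan_enumerator, scan_filter]}, indent=2)
```

In the real package these settings would normally be expressed through the `ScanEnumerator`, `Scan1D`, and `ScanFilter` models listed in the schema, which perform their own validation.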

Config
  • allow_mutation: bool = True

  • arbitrary_types_allowed: bool = True

  • json_encoders: dict = {<class 'numpy.ndarray'>: <function DatasetConfig.Config.<lambda> at 0x7f536e5e08b0>, <enum 'Enum'>: <function DatasetConfig.Config.<lambda> at 0x7f536e5e0940>}

  • validate_assignment: bool = True

Fields
field type: Literal['BaseDatasetFactory'] = 'BaseDatasetFactory'

The type of dataset factory which corresponds to the dataset made.

field workflow: List[Union[openff.qcsubmit.workflow_components.conformer_generation.StandardConformerGenerator, openff.qcsubmit.workflow_components.filters.RMSDCutoffConformerFilter, openff.qcsubmit.workflow_components.filters.CoverageFilter, openff.qcsubmit.workflow_components.filters.ElementFilter, openff.qcsubmit.workflow_components.filters.MolecularWeightFilter, openff.qcsubmit.workflow_components.filters.RotorFilter, openff.qcsubmit.workflow_components.filters.SmartsFilter, openff.qcsubmit.workflow_components.state_enumeration.EnumerateTautomers, openff.qcsubmit.workflow_components.state_enumeration.EnumerateProtomers, openff.qcsubmit.workflow_components.state_enumeration.EnumerateStereoisomers, openff.qcsubmit.workflow_components.fragmentation.WBOFragmenter, openff.qcsubmit.workflow_components.fragmentation.PfizerFragmenter, openff.qcsubmit.workflow_components.filters.ChargeFilter, openff.qcsubmit.workflow_components.filters.ScanFilter, openff.qcsubmit.workflow_components.state_enumeration.ScanEnumerator]] = []

The set of workflow components and their settings which will be executed in order on the input molecules to make the dataset.
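
As a rough illustration of what "executed in order" means, the sketch below runs a list of stand-in components over a list of SMILES strings, feeding the output of each component into the next. The components `drop_long` and `deduplicate` and the helper `run_workflow` are hypothetical and are not part of openff-qcsubmit; real components operate on molecule objects and carry their own settings.

```python
from typing import Callable, List

# A stand-in component: takes a list of SMILES strings and returns a new list.
Component = Callable[[List[str]], List[str]]


def drop_long(smiles: List[str]) -> List[str]:
    """Hypothetical filter: keep SMILES strings of at most 6 characters."""
    return [s for s in smiles if len(s) <= 6]


def deduplicate(smiles: List[str]) -> List[str]:
    """Hypothetical component: remove duplicates while preserving order."""
    return list(dict.fromkeys(smiles))


def run_workflow(workflow: List[Component], molecules: List[str]) -> List[str]:
    """Apply each component in list order; each one sees the previous output."""
    for component in workflow:
        molecules = component(molecules)
    return molecules


result = run_workflow([drop_long, deduplicate], ["CCO", "CCO", "c1ccccc1O", "CC"])
```

Because each component only sees what the previous one returned, reordering the list can change both the result and the amount of work done, for example when a cheap filter is placed before an expensive enumeration step.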

classmethod from_file(file_name)[source]

Create a factory from the serialised model file.

Parameters

file_name (str) –

provenance(toolkit_registry)[source]

Create the provenance of openff-qcsubmit that created the molecule input data.

Returns

A dict of the provenance information.

Parameters

toolkit_registry (openff.toolkit.utils.toolkit_registry.ToolkitRegistry) –

Return type

Dict[str, str]

Important

We cannot check which toolkit was used to generate the CMILES data, but we know that OpenEye will always be used first when available.

clear_workflow()[source]

Reset the workflow to be empty.

Return type

None

add_workflow_components(*components)[source]

Validate the given workflow components, then insert them into the workflow.

Parameters

components (Union[openff.qcsubmit.workflow_components.conformer_generation.StandardConformerGenerator, openff.qcsubmit.workflow_components.filters.RMSDCutoffConformerFilter, openff.qcsubmit.workflow_components.filters.CoverageFilter, openff.qcsubmit.workflow_components.filters.ElementFilter, openff.qcsubmit.workflow_components.filters.MolecularWeightFilter, openff.qcsubmit.workflow_components.filters.RotorFilter, openff.qcsubmit.workflow_components.filters.SmartsFilter, openff.qcsubmit.workflow_components.state_enumeration.EnumerateTautomers, openff.qcsubmit.workflow_components.state_enumeration.EnumerateProtomers, openff.qcsubmit.workflow_components.state_enumeration.EnumerateStereoisomers, openff.qcsubmit.workflow_components.fragmentation.WBOFragmenter, openff.qcsubmit.workflow_components.fragmentation.PfizerFragmenter, openff.qcsubmit.workflow_components.filters.ChargeFilter, openff.qcsubmit.workflow_components.filters.ScanFilter, openff.qcsubmit.workflow_components.state_enumeration.ScanEnumerator]) – One or more workflow components to be validated and added to the current workflow.

Raises

InvalidWorkflowComponentError – If an attempt is made to add an invalid workflow component to the workflow.

Return type

None

get_workflow_components(component_name)[source]

Find any workflow components with this component name.

Parameters

component_name (str) – The name of the component to be gathered from the workflow.

Returns

A list of instances of the requested component from the workflow.

Raises

MissingWorkflowComponentError – If the component could not be found by its component name in the workflow.

Return type

List[Union[openff.qcsubmit.workflow_components.conformer_generation.StandardConformerGenerator, openff.qcsubmit.workflow_components.filters.RMSDCutoffConformerFilter, openff.qcsubmit.workflow_components.filters.CoverageFilter, openff.qcsubmit.workflow_components.filters.ElementFilter, openff.qcsubmit.workflow_components.filters.MolecularWeightFilter, openff.qcsubmit.workflow_components.filters.RotorFilter, openff.qcsubmit.workflow_components.filters.SmartsFilter, openff.qcsubmit.workflow_components.state_enumeration.EnumerateTautomers, openff.qcsubmit.workflow_components.state_enumeration.EnumerateProtomers, openff.qcsubmit.workflow_components.state_enumeration.EnumerateStereoisomers, openff.qcsubmit.workflow_components.fragmentation.WBOFragmenter, openff.qcsubmit.workflow_components.fragmentation.PfizerFragmenter, openff.qcsubmit.workflow_components.filters.ChargeFilter, openff.qcsubmit.workflow_components.filters.ScanFilter, openff.qcsubmit.workflow_components.state_enumeration.ScanEnumerator]]
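
The lookup described above can be sketched as follows. This is only an illustration under stated assumptions: `Component` is a stand-in dataclass, the exception is a local class that borrows the documented name, and matching on the component's `type` value is assumed rather than confirmed for this method.

```python
from dataclasses import dataclass
from typing import List


class MissingWorkflowComponentError(Exception):
    """Local stand-in for the library exception of the same name."""


@dataclass
class Component:
    """Stand-in for a workflow component; only the ``type`` field is modelled."""

    type: str


def get_components(workflow: List[Component], component_name: str) -> List[Component]:
    """Return every component whose ``type`` equals ``component_name``."""
    found = [component for component in workflow if component.type == component_name]
    if not found:
        raise MissingWorkflowComponentError(
            f"No component named {component_name!r} in the workflow."
        )
    return found


workflow = [
    Component("ElementFilter"),
    Component("RotorFilter"),
    Component("ElementFilter"),
]
matches = get_components(workflow, "ElementFilter")
```

Returning a list rather than a single item reflects that the same component type may appear more than once in a workflow.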

remove_workflow_component(component_name)[source]

Find and remove any components with this name via their type attribute.

Parameters

component_name (str) – The name of the component to be removed from the workflow.

Raises

MissingWorkflowComponentError – If the component could not be found by its component name in the workflow.

Return type

None

import_workflow(workflow, clear_existing=True)[source]

Instantiate the workflow from a workflow dictionary or from an input file.

Parameters
  • workflow (Union[str, Dict]) – The name of the file the workflow should be created from or a workflow dictionary.

  • clear_existing (bool) – If True, the current workflow is deleted and replaced; if False, it is extended.

Return type

None
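
The two behaviours of `clear_existing`, together with the `Union[str, Dict]` input, can be sketched with plain dictionaries. The helper `load_components`, the top-level `"workflow"` key, and the use of JSON for the file case are assumptions made for this illustration and are not taken from the library.

```python
import json
from typing import Dict, List, Union


def load_components(
    workflow: Union[str, Dict],
    existing: List[dict],
    clear_existing: bool = True,
) -> List[dict]:
    """Build the new component list from a JSON file name or a dictionary."""
    if isinstance(workflow, str):
        # A string is treated as a file name and read as JSON.
        with open(workflow) as handle:
            workflow = json.load(handle)
    incoming = list(workflow["workflow"])  # assumed key for the component list
    # Replace the existing components, or append to them.
    return incoming if clear_existing else existing + incoming


existing = [{"type": "ElementFilter"}]
incoming = {"workflow": [{"type": "RotorFilter"}]}

replaced = load_components(incoming, existing)
extended = load_components(incoming, existing, clear_existing=False)
```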

export_workflow(file_name)[source]

Export the workflow components and their settings to file so that they can be loaded later.

Parameters

file_name (str) – The name of the file the workflow should be exported to.

Raises

UnsupportedFiletypeError – If the file type is not supported.

Return type

None
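
One way such an export could dispatch on the file type is sketched below. It handles only `.json` and raises for anything else; the formats the library actually supports are not stated here, `export_components` is a hypothetical helper, and the exception is a local class that borrows the documented name.

```python
import json
import os
import tempfile
from typing import List


class UnsupportedFiletypeError(Exception):
    """Local stand-in for the library exception of the same name."""


def export_components(components: List[dict], file_name: str) -> None:
    """Write the components to ``file_name`` if its extension is handled."""
    extension = os.path.splitext(file_name)[1].lower()
    if extension != ".json":
        raise UnsupportedFiletypeError(
            f"Cannot export to {extension or 'an extensionless file'}; "
            "only .json is handled in this sketch."
        )
    with open(file_name, "w") as handle:
        json.dump({"workflow": components}, handle, indent=2)


# Round trip through a temporary directory to show the file can be read back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "workflow.json")
    export_components([{"type": "ElementFilter"}], path)
    with open(path) as handle:
        round_trip = json.load(handle)
```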

export(file_name)[source]

Export the whole factory to file including settings and workflow.

Parameters

file_name (str) – The name of the file the factory should be exported to.

Return type

None

export_settings(file_name)[source]

Export the current model to file. This includes the workflow along with each component's settings.

Parameters

file_name (str) – The name of the file the settings and workflow should be exported to.

Raises

UnsupportedFiletypeError – When the file type requested is not supported.

Return type

None

import_settings(settings, clear_workflow=True)[source]

Import settings and workflow from a file.

Parameters
  • settings (Union[str, Dict]) – The name of the file the settings should be extracted from or the reference to a settings dictionary.

  • clear_workflow (bool) – If True, the current workflow is replaced; if False, it is extended.

Return type

None

create_dataset(dataset_name, molecules, description, tagline, metadata=None, processors=None, toolkit_registry=None, verbose=True)[source]

Process the input molecules through the given workflow, then create and populate the corresponding dataset class, which acts as a local representation of the collection and the tasks to be performed in QCArchive.

Parameters
  • dataset_name (str) – The name that will be given to the collection on submission to an archive instance.

  • molecules (Union[str, List[openff.toolkit.topology.molecule.Molecule], openff.toolkit.topology.molecule.Molecule]) – The list of molecules which should be processed by the workflow and added to the dataset. This can also be a file name which is to be unpacked by the openforcefield toolkit.

  • description (str) – A string describing the dataset. This should detail the purpose of the dataset and outline the selection method of the molecules.

  • tagline (str) – A short tagline description which will be displayed with the collection name in the QCArchive.

  • metadata (Optional[openff.qcsubmit.common_structures.Metadata]) – Any metadata which should be associated with this dataset. This can be changed from the default after making the dataset.

  • processors (Optional[int]) – The number of processors available to the workflow. Note that None will use all available processors.

  • toolkit_registry (Optional[openff.toolkit.utils.toolkit_registry.ToolkitRegistry]) – The openff.toolkit.utils.ToolkitRegistry which declares the available toolkits and the order in which they should be queried for functionality. If None is passed, the default global registry will be used with all installed toolkits.

  • verbose (bool) – If True a progress bar for each workflow component will be shown.

Returns

A dataset instance populated with the molecules that have passed through the workflow.

Return type

openff.qcsubmit.factories.T

create_index(molecule)[source]

Create an index for the current molecule.

Parameters

molecule (openff.toolkit.topology.molecule.Molecule) – The molecule for which the dataset index will be generated.

Returns

The molecule name or the canonical isomeric smiles for the molecule if the name is not assigned or is blank.

Return type

str

Important

Each dataset can have a different indexing system depending on the data. In this basic dataset, each conformer of a molecule is expanded into its own separately indexed entry. This is handled by the dataset, however, so we just generate a general index for the molecule before adding it to the dataset.
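
The fallback rule in the Returns section above can be sketched without the toolkit. In this illustration `make_index` is a hypothetical helper that receives the name and the canonical isomeric SMILES as plain strings, and treating a whitespace-only name as blank is an assumption.

```python
from typing import Optional


def make_index(name: Optional[str], canonical_isomeric_smiles: str) -> str:
    """Return the molecule name, or the SMILES when the name is unset or blank."""
    if name is not None and name.strip():
        return name
    return canonical_isomeric_smiles


named = make_index("ethanol", "CCO")
unnamed = make_index("", "CCO")
```

Here `named` keeps the molecule name, while `unnamed` falls back to the SMILES string.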