BaseDatasetFactory

class openff.qcsubmit.factories.BaseDatasetFactory(*, qc_specifications={'default': QCSpec(method='B3LYP-D3BJ', basis='DZVP', program='psi4', spec_name='default', spec_description='Standard OpenFF optimization quantum chemistry specification.', store_wavefunction=<WavefunctionProtocolEnum.none: 'none'>, implicit_solvent=None, maxiter=200, scf_properties=[<SCFProperties.Dipole: 'dipole'>, <SCFProperties.Quadrupole: 'quadrupole'>, <SCFProperties.WibergLowdinIndices: 'wiberg_lowdin_indices'>, <SCFProperties.MayerIndices: 'mayer_indices'>], keywords={})}, driver=SinglepointDriver.energy, priority='normal', dataset_tags=['openff'], compute_tag='openff', type='BaseDatasetFactory', workflow=[])

The Base factory which all other dataset factories should inherit from.

Parameters

qc_specifications (Dict[str, openff.qcsubmit.common_structures.QCSpec]) –
driver (qcportal.singlepoint.record_models.SinglepointDriver) –
priority (str) –
dataset_tags (List[str]) –
compute_tag (str) –
type (Literal['BaseDatasetFactory']) –
workflow (List[Union[openff.qcsubmit.workflow_components.conformer_generation.StandardConformerGenerator, openff.qcsubmit.workflow_components.filters.RMSDCutoffConformerFilter, openff.qcsubmit.workflow_components.filters.CoverageFilter, openff.qcsubmit.workflow_components.filters.ElementFilter, openff.qcsubmit.workflow_components.filters.MolecularWeightFilter, openff.qcsubmit.workflow_components.filters.RotorFilter, openff.qcsubmit.workflow_components.filters.SmartsFilter, openff.qcsubmit.workflow_components.state_enumeration.EnumerateTautomers, openff.qcsubmit.workflow_components.state_enumeration.EnumerateProtomers, openff.qcsubmit.workflow_components.state_enumeration.EnumerateStereoisomers, openff.qcsubmit.workflow_components.fragmentation.WBOFragmenter, openff.qcsubmit.workflow_components.fragmentation.PfizerFragmenter, openff.qcsubmit.workflow_components.filters.ChargeFilter, openff.qcsubmit.workflow_components.filters.ScanFilter, openff.qcsubmit.workflow_components.state_enumeration.ScanEnumerator]]) –

Return type

None

__init__(**data)

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Parameters: data (Any) –
Return type: None

Methods

`__init__`(**data)	Create a new model by parsing and validating input data from keyword arguments.
`add_qc_spec`(method, basis, program, ...[, ...])	Add a new QC specification to the factory, which will be applied to the dataset.
`add_workflow_components`(*components)	Validate the workflow components, then insert them into the workflow.
`clear_qcspecs`()	Clear out any current QCSpecs.
`clear_workflow`()	Reset the workflow to be empty.
`construct`([_fields_set])	Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data.
`copy`(*[, include, exclude, update, deep])	Duplicate a model, optionally choose which fields to include, exclude and change.
`create_dataset`(dataset_name, molecules, ...)	Process the input molecules through the workflow, then create and populate the corresponding dataset class, which acts as a local representation of the collection and the tasks to be performed in QCArchive.
`create_index`(molecule)	Create an index for the current molecule.
`dict`(*args, **kwargs)	Overwrite the dict method to handle any enums when saving to yaml/json via a dict call.
`export`(file_name)	Export the whole factory to file including settings and workflow.
`export_settings`(file_name)	Export the current model to file; this includes the workflow along with each component's settings.
`export_workflow`(file_name)	Export the workflow components and their settings to file so that they can be loaded later.
`from_file`(file_name)	Create a factory from the serialised model file.
`from_orm`(obj)
`get_workflow_components`(component_name)	Find any workflow components with this component name.
`import_settings`(settings[, clear_workflow])	Import settings and workflow from a file.
`import_workflow`(workflow[, clear_existing])	Instantiate the workflow from a workflow object or from an input file.
`json`(*[, include, exclude, by_alias, ...])	Generate a JSON representation of the model, include and exclude arguments as per dict().
`parse_file`(path, *[, content_type, ...])
`parse_obj`(obj)
`parse_raw`(b, *[, content_type, encoding, ...])
`provenance`(toolkit_registry)	Create the provenance of openff-qcsubmit that created the molecule input data.
`remove_qcspec`(spec_name)	Remove a QCSpec from the dataset.
`remove_workflow_component`(component_name)	Find and remove any components whose type attribute matches the given name.
`schema`([by_alias, ref_template])
`schema_json`(*[, by_alias, ref_template])
`update_forward_refs`(**localns)	Try to update ForwardRefs on fields based on this Model, globalns and localns.
`validate`(value)

Attributes

`n_qc_specs`	Return the number of QCSpecs on this dataset.
`type`
`workflow`

classmethod from_file(file_name)

Create a factory from the serialised model file.

Parameters: file_name (str) –

provenance(toolkit_registry)

Create the provenance of openff-qcsubmit that created the molecule input data.

Returns: A dict of the provenance information.
Parameters: toolkit_registry (openff.toolkit.utils.toolkit_registry.ToolkitRegistry) –
Return type: Dict[str, str]

Important

We cannot check which toolkit was used to generate the CMILES data, but OpenEye is always used first when it is available.
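`provenance` returns a plain `Dict[str, str]`. The sketch below shows one way to gather such a mapping with the standard library by recording installed distribution versions. Several details are assumptions for illustration only: the helper name `sketch_provenance`, the package list, and the `"not installed"` placeholder. They are not the keys or values that openff-qcsubmit actually emits.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable


def sketch_provenance(packages: Iterable[str]) -> Dict[str, str]:
    """Hypothetical helper: map distribution names to installed version strings."""
    provenance: Dict[str, str] = {}
    for name in packages:
        try:
            provenance[name] = version(name)
        except PackageNotFoundError:
            # Assumption: record a placeholder instead of failing on missing packages.
            provenance[name] = "not installed"
    return provenance


info = sketch_provenance(["openff-qcsubmit", "openff-toolkit", "rdkit"])
print(info)
```

Because every value is a string, the result can be stored directly alongside dataset metadata.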

clear_workflow()

Reset the workflow to be empty.

Return type: None

add_workflow_components(*components)

Validate the workflow components, then insert them into the workflow.

Parameters: components (Union[openff.qcsubmit.workflow_components.conformer_generation.StandardConformerGenerator, openff.qcsubmit.workflow_components.filters.RMSDCutoffConformerFilter, openff.qcsubmit.workflow_components.filters.CoverageFilter, openff.qcsubmit.workflow_components.filters.ElementFilter, openff.qcsubmit.workflow_components.filters.MolecularWeightFilter, openff.qcsubmit.workflow_components.filters.RotorFilter, openff.qcsubmit.workflow_components.filters.SmartsFilter, openff.qcsubmit.workflow_components.state_enumeration.EnumerateTautomers, openff.qcsubmit.workflow_components.state_enumeration.EnumerateProtomers, openff.qcsubmit.workflow_components.state_enumeration.EnumerateStereoisomers, openff.qcsubmit.workflow_components.fragmentation.WBOFragmenter, openff.qcsubmit.workflow_components.fragmentation.PfizerFragmenter, openff.qcsubmit.workflow_components.filters.ChargeFilter, openff.qcsubmit.workflow_components.filters.ScanFilter, openff.qcsubmit.workflow_components.state_enumeration.ScanEnumerator]) – One or more workflow components to be validated and added to the current workflow.
Raises: InvalidWorkflowComponentError – If an object that is not a valid workflow component is passed.
Return type: None
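The following toy stand-in shows the validate-then-insert behaviour described above. `ToyComponent` and `ToyFactory` are hypothetical classes, and the exception is a local stand-in rather than the real qcsubmit class. This sketch checks every argument before modifying the workflow. That is a design choice of the sketch and is not necessarily what the library does.

```python
from dataclasses import dataclass, field
from typing import List


class InvalidWorkflowComponentError(Exception):
    """Local stand-in for the qcsubmit exception of the same name."""


@dataclass
class ToyComponent:
    """Hypothetical component: only the ``type`` tag is modelled."""
    type: str


@dataclass
class ToyFactory:
    workflow: List[ToyComponent] = field(default_factory=list)

    def add_workflow_components(self, *components: ToyComponent) -> None:
        # Check every argument first so that one bad component leaves the
        # existing workflow untouched (a choice made for this sketch).
        for component in components:
            if not isinstance(component, ToyComponent):
                raise InvalidWorkflowComponentError(
                    f"{component!r} is not a valid workflow component"
                )
        # Components are appended in the order they were passed.
        self.workflow.extend(components)


factory = ToyFactory()
factory.add_workflow_components(ToyComponent("ElementFilter"), ToyComponent("RotorFilter"))
print([c.type for c in factory.workflow])  # ['ElementFilter', 'RotorFilter']
```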

get_workflow_components(component_name)

Find any workflow components with this component name.

Parameters: component_name (str) – The name of the component to be gathered from the workflow.
Returns: A list of instances of the requested component from the workflow.
Raises: MissingWorkflowComponentError – If the component could not be found by its component name in the workflow.
Return type: List[Union[openff.qcsubmit.workflow_components.conformer_generation.StandardConformerGenerator, openff.qcsubmit.workflow_components.filters.RMSDCutoffConformerFilter, openff.qcsubmit.workflow_components.filters.CoverageFilter, openff.qcsubmit.workflow_components.filters.ElementFilter, openff.qcsubmit.workflow_components.filters.MolecularWeightFilter, openff.qcsubmit.workflow_components.filters.RotorFilter, openff.qcsubmit.workflow_components.filters.SmartsFilter, openff.qcsubmit.workflow_components.state_enumeration.EnumerateTautomers, openff.qcsubmit.workflow_components.state_enumeration.EnumerateProtomers, openff.qcsubmit.workflow_components.state_enumeration.EnumerateStereoisomers, openff.qcsubmit.workflow_components.fragmentation.WBOFragmenter, openff.qcsubmit.workflow_components.fragmentation.PfizerFragmenter, openff.qcsubmit.workflow_components.filters.ChargeFilter, openff.qcsubmit.workflow_components.filters.ScanFilter, openff.qcsubmit.workflow_components.state_enumeration.ScanEnumerator]]

remove_workflow_component(component_name)

Find and remove any components whose type attribute matches the given name.

Parameters: component_name (str) – The name of the component to be removed from the workflow.
Raises: MissingWorkflowComponentError – If the component could not be found by its component name in the workflow.
Return type: None
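Lookup and removal both match on a component's `type` attribute, and a missing name raises an error. The sketch below models that pairing with hypothetical toy classes and a local stand-in exception. It is not the library's implementation.

```python
from dataclasses import dataclass, field
from typing import List


class MissingWorkflowComponentError(Exception):
    """Local stand-in for the qcsubmit exception of the same name."""


@dataclass
class ToyComponent:
    type: str  # matched against component_name
    setting: int = 0


@dataclass
class ToyFactory:
    workflow: List[ToyComponent] = field(default_factory=list)

    def get_workflow_components(self, component_name: str) -> List[ToyComponent]:
        # Every component whose type matches is returned, so duplicates are kept.
        found = [c for c in self.workflow if c.type == component_name]
        if not found:
            raise MissingWorkflowComponentError(component_name)
        return found

    def remove_workflow_component(self, component_name: str) -> None:
        # Look up first so an unknown name raises the same error as above.
        self.get_workflow_components(component_name)
        self.workflow = [c for c in self.workflow if c.type != component_name]


factory = ToyFactory([ToyComponent("RotorFilter", 3), ToyComponent("ElementFilter"),
                      ToyComponent("RotorFilter", 5)])
print([c.setting for c in factory.get_workflow_components("RotorFilter")])  # [3, 5]
factory.remove_workflow_component("RotorFilter")
print([c.type for c in factory.workflow])  # ['ElementFilter']
```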

import_workflow(workflow, clear_existing=True)

Instantiate the workflow from a workflow object or from an input file.

Parameters

workflow (Union[str, Dict]) – The name of the file the workflow should be created from or a workflow dictionary.
clear_existing (bool) – If True, the current workflow is deleted and replaced; if False, it is extended.

Return type

None
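The `clear_existing` flag selects between replacing and extending the workflow. The helper below is hypothetical and operates on plain dicts rather than real components. It also assumes a workflow layout keyed by position; that layout is not the documented qcsubmit file format.

```python
import json
from typing import Dict, List, Union


def import_workflow(current: List[dict], workflow: Union[str, Dict[str, dict]],
                    clear_existing: bool = True) -> List[dict]:
    """Hypothetical helper returning the updated workflow as a new list.

    ``workflow`` is either a JSON file name or a dict of component settings
    keyed by position (an assumed layout).
    """
    if isinstance(workflow, str):
        with open(workflow) as handle:
            workflow = json.load(handle)
    incoming = list(workflow.values())
    # clear_existing=True replaces the workflow; False appends to it.
    return incoming if clear_existing else current + incoming


existing = [{"type": "ElementFilter"}]
loaded = {"0": {"type": "RotorFilter"}}
print(import_workflow(existing, loaded))
print(import_workflow(existing, loaded, clear_existing=False))
```

Returning a new list keeps `existing` unchanged, which makes the two modes easy to compare side by side.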

export_workflow(file_name)

Export the workflow components and their settings to file so that they can be loaded later.

Parameters: file_name (str) – The name of the file the workflow should be exported to.
Raises: UnsupportedFiletypeError – If the file type is not supported.
Return type: None
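`export_workflow` raises `UnsupportedFiletypeError` for an unsupported file type. The hypothetical helper below shows one way to dispatch on the file extension before writing anything. It handles only `.json`, and that choice is an assumption of this sketch, since the supported types are not listed above. The exception is a local stand-in.

```python
import json
import os
import tempfile
from typing import List


class UnsupportedFiletypeError(Exception):
    """Local stand-in for the qcsubmit exception of the same name."""


def export_workflow(workflow: List[dict], file_name: str) -> None:
    """Hypothetical helper: write component settings to a JSON file."""
    suffix = os.path.splitext(file_name)[1].lower()
    # This sketch only knows JSON; the real factory's supported types may differ.
    if suffix != ".json":
        raise UnsupportedFiletypeError(f"cannot export a workflow to {suffix!r} files")
    # Key each component by its position so the order survives a reload.
    data = {str(i): component for i, component in enumerate(workflow)}
    with open(file_name, "w") as handle:
        json.dump(data, handle, indent=2)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "workflow.json")
    export_workflow([{"type": "RotorFilter"}], path)
    with open(path) as handle:
        reloaded = json.load(handle)
print(reloaded)  # {'0': {'type': 'RotorFilter'}}
```

Checking the suffix first means an unsupported name fails before any file is created.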

export(file_name)

Export the whole factory to file including settings and workflow.

Parameters: file_name (str) – The name of the file the factory should be exported to.
Return type: None

export_settings(file_name)

Export the current model to file; this includes the workflow along with each component's settings.

Parameters: file_name (str) – The name of the file the settings and workflow should be exported to.
Raises: UnsupportedFiletypeError – When the file type requested is not supported.
Return type: None

import_settings(settings, clear_workflow=True)

Import settings and workflow from a file.

Parameters

settings (Union[str, Dict]) – The name of the file the settings should be extracted from or the reference to a settings dictionary.
clear_workflow (bool) – If True, the current workflow is replaced; if False, it is extended.

Return type

None

create_dataset(dataset_name, molecules, description, tagline, metadata=None, processors=None, toolkit_registry=None, verbose=True)

Process the input molecules through the workflow, then create and populate the corresponding dataset class, which acts as a local representation of the collection and the tasks to be performed in QCArchive.

Parameters

dataset_name (str) – The name that will be given to the collection on submission to an archive instance.
molecules (Union[str, List[openff.toolkit.topology.molecule.Molecule], openff.toolkit.topology.molecule.Molecule]) – The list of molecules which should be processed by the workflow and added to the dataset; this can also be a file name, which is unpacked by the OpenFF Toolkit.
description (str) – A string describing the dataset; this should detail the purpose of the dataset and outline how the molecules were selected.
tagline (str) – A short tagline description which will be displayed with the collection name in QCArchive.
metadata (Optional[openff.qcsubmit.common_structures.Metadata]) – Any metadata that should be associated with this dataset; this can be changed from the default after making the dataset.
processors (Optional[int]) – The number of processors available to the workflow; if None, all available processors are used.
toolkit_registry (Optional[openff.toolkit.utils.toolkit_registry.ToolkitRegistry]) – The openff.toolkit.utils.ToolkitRegistry which declares the available toolkits and the order in which they should be queried for functionality. If None is passed, the default global registry will be used with all installed toolkits.
verbose (bool) – If True a progress bar for each workflow component will be shown.

Returns

A dataset instance populated with the molecules that have passed through the workflow.

Return type

openff.qcsubmit.factories.T
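Before the dataset is built, the molecules pass through the workflow components in order. The toy model below shows only that sequencing, under stated assumptions. Here a "component" is any callable that maps a list of SMILES strings to a new list. `run_workflow`, `drop_chlorine` and `deduplicate` are hypothetical names, and the real components operate on OpenFF `Molecule` objects and may run in parallel.

```python
from typing import Callable, Iterable, List

# Hypothetical model: each workflow component maps a list of molecules
# (plain SMILES strings here) to a new list, dropping or expanding entries.
Component = Callable[[List[str]], List[str]]


def run_workflow(molecules: Iterable[str], workflow: List[Component]) -> List[str]:
    """Apply the components in order; each one sees the previous one's output."""
    current = list(molecules)
    for component in workflow:
        current = component(current)
    return current


def drop_chlorine(mols: List[str]) -> List[str]:
    # Toy element filter: crude substring test, for illustration only.
    return [m for m in mols if "Cl" not in m]


def deduplicate(mols: List[str]) -> List[str]:
    # Keep the first occurrence of each SMILES while preserving order.
    return list(dict.fromkeys(mols))


print(run_workflow(["CCO", "CCCl", "CCO", "c1ccccc1"], [drop_chlorine, deduplicate]))
# ['CCO', 'c1ccccc1']
```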

create_index(molecule)

Create an index for the current molecule.

Parameters: molecule (openff.toolkit.topology.molecule.Molecule) – The molecule for which the dataset index will be generated.
Returns: The molecule name, or the canonical isomeric SMILES of the molecule if the name is not assigned or is blank.
Return type: str

Important

Each dataset can have a different indexing system depending on the data. In this basic dataset, each conformer of a molecule is expanded into its own, separately indexed entry. That expansion is handled by the dataset, however, so the factory only generates a general index for the molecule before it is added to the dataset.
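The name-or-SMILES fallback can be sketched with a hypothetical `ToyMolecule` that carries only a name and a precomputed SMILES string. The sketch treats a whitespace-only name as "blank", which is an assumption. With the OpenFF Toolkit, the SMILES would come from something like `Molecule.to_smiles(isomeric=True)`, though the exact options the factory uses are not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyMolecule:
    """Hypothetical stand-in for an OpenFF Molecule: only name and SMILES."""
    name: Optional[str]
    canonical_isomeric_smiles: str


def create_index(molecule: ToyMolecule) -> str:
    # Use the name when it is set and not just whitespace (assumed meaning of "blank").
    if molecule.name and molecule.name.strip():
        return molecule.name
    # Otherwise fall back to the canonical isomeric SMILES.
    return molecule.canonical_isomeric_smiles


print(create_index(ToyMolecule("ethanol", "CCO")))  # ethanol
print(create_index(ToyMolecule("", "CCO")))  # CCO
```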

add_qc_spec(method, basis, program, spec_name, spec_description, store_wavefunction='none', overwrite=False, implicit_solvent=None, maxiter=200, scf_properties=None, keywords=None)

Add a new QC specification to the factory, which will be applied to the dataset.

Parameters

method (str) – The name of the method to use, e.g. B3LYP-D3BJ.
basis (Optional[str]) – The name of the basis set to use; this can also be None.
program (str) – The name of the program that will execute the computation.
spec_name (str) – The name the spec should be stored under.
spec_description (str) – The description of the spec.
store_wavefunction (str) – Which parts of the wavefunction should be saved.
overwrite (bool) – If a spec is already stored under this name, overwrite it.
implicit_solvent (Optional[openff.qcsubmit.common_structures.PCMSettings]) – The implicit solvent settings if it is to be used.
maxiter (pydantic.v1.types.PositiveInt) – The maximum number of SCF iterations that should be done.
scf_properties (Optional[List[openff.qcsubmit.common_structures.SCFProperties]]) – The list of SCF properties that should be extracted from the calculation.
keywords (Optional[Dict[str, Union[pydantic.v1.types.StrictStr, pydantic.v1.types.StrictInt, pydantic.v1.types.StrictFloat, pydantic.v1.types.StrictBool, List[pydantic.v1.types.StrictFloat]]]]) – Program specific computational keywords that should be passed to the program

Return type

None
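The toy holder below shows how `spec_name` and `overwrite` interact. The class names and `SpecExistsError` are hypothetical. The documentation above does not say what happens when the name is taken and `overwrite` is False, so raising an error in that case is this sketch's own choice. The `maxiter` check mirrors the `PositiveInt` type in the signature.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class SpecExistsError(Exception):
    """Hypothetical error for a spec name that is already taken."""


@dataclass(frozen=True)
class ToySpec:
    method: str
    basis: Optional[str]  # None is allowed, e.g. for semi-empirical methods
    program: str
    spec_name: str
    spec_description: str
    maxiter: int = 200


class ToySpecHolder:
    def __init__(self) -> None:
        self.qc_specifications: Dict[str, ToySpec] = {}

    def add_qc_spec(self, method: str, basis: Optional[str], program: str,
                    spec_name: str, spec_description: str,
                    overwrite: bool = False, maxiter: int = 200) -> None:
        # The documented maxiter is a PositiveInt; mirror that constraint here.
        if maxiter <= 0:
            raise ValueError("maxiter must be a positive integer")
        if spec_name in self.qc_specifications and not overwrite:
            raise SpecExistsError(f"a spec named {spec_name!r} already exists")
        self.qc_specifications[spec_name] = ToySpec(
            method, basis, program, spec_name, spec_description, maxiter
        )


holder = ToySpecHolder()
holder.add_qc_spec("B3LYP-D3BJ", "DZVP", "psi4", "default", "Standard spec")
holder.add_qc_spec("gfn2xtb", None, "xtb", "xtb", "Semi-empirical spec")
holder.add_qc_spec("HF", "6-31G*", "psi4", "default", "Replacement", overwrite=True)
print(sorted(holder.qc_specifications))  # ['default', 'xtb']
```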

clear_qcspecs()

Clear out any current QCSpecs.

Return type: None

dict(*args, **kwargs)

Override the dict method to handle any enums when saving to YAML/JSON via a dict call.
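Plain `Enum` members cannot be passed directly to `json.dumps`. Converting them to their values first is the general idea behind an enum-aware `dict()`. The recursive helper below is a minimal sketch of that idea. `Driver`, `Wavefunction` and `enums_to_values` are hypothetical stand-ins, not qcsubmit's own code.

```python
import json
from enum import Enum


class Driver(Enum):
    # Hypothetical plain enums standing in for SinglepointDriver and friends.
    energy = "energy"
    gradient = "gradient"


class Wavefunction(Enum):
    none = "none"
    orbitals_and_eigenvalues = "orbitals_and_eigenvalues"


def enums_to_values(obj):
    """Recursively replace Enum members with their ``.value``."""
    if isinstance(obj, Enum):
        return obj.value
    if isinstance(obj, dict):
        return {key: enums_to_values(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Tuples become lists, which is what JSON would produce anyway.
        return [enums_to_values(item) for item in obj]
    return obj


settings = {"driver": Driver.energy, "specs": [{"store_wavefunction": Wavefunction.none}]}
# json.dumps(settings) would raise TypeError: Enum members are not JSON serializable.
print(json.dumps(enums_to_values(settings)))
# {"driver": "energy", "specs": [{"store_wavefunction": "none"}]}
```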

property n_qc_specs: int

Return the number of QCSpecs on this dataset.

remove_qcspec(spec_name)

Remove a QCSpec from the dataset.

Parameters: spec_name (str) – The name of the spec that should be removed.
Return type: None

Note

The QCSpec settings are immutable, so to change a spec it must be removed and a new one added; this ensures the new settings are fully validated.
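The remove-then-add pattern can be illustrated with a frozen dataclass. In-place assignment is refused, while building a new instance re-runs validation. `ToySpec` and its `__post_init__` check are hypothetical. The real QCSpec is a pydantic model, and its validation rules are not reproduced here.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ToySpec:
    """Hypothetical immutable spec; fields cannot be reassigned after creation."""
    spec_name: str
    method: str
    basis: str

    def __post_init__(self) -> None:
        # Validation runs whenever a new spec is constructed.
        if not self.method:
            raise ValueError("method must not be empty")


specs = {"default": ToySpec("default", "B3LYP-D3BJ", "DZVP")}

try:
    specs["default"].basis = "def2-TZVP"  # in-place edits are refused
except FrozenInstanceError:
    print("spec is immutable")

# Remove the old spec, then add a freshly constructed (and re-validated) one.
old = specs.pop("default")
specs["default"] = replace(old, basis="def2-TZVP")
print(specs["default"].basis, len(specs))  # def2-TZVP 1
```

`dataclasses.replace` calls `__init__`, so the replacement passes through the same validation as any newly added spec.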