Usage


The /scripts directory contains scripts that run each step with default values; the only input they need is the dataset to use, passed through the --dataset argument.

NOTE: Configure the paths to the datasets by editing the file qdf/settings.py:

DATASET_PATH = ".../QuantumDeepField_molecule/dataset"
SAVE_PATH = ".../QuantumDeepField_molecule/output"
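
Before running the steps below, it can help to confirm that both configured directories exist. The following is a minimal sketch, assuming the two settings are plain path strings as shown above; `missing_dirs` is a hypothetical helper and not part of the repository.

```python
from pathlib import Path


def missing_dirs(*paths):
    """Return the given paths that are not existing directories."""
    return [p for p in paths if not Path(p).is_dir()]


# Hypothetical usage, assuming the qdf package is importable:
#   from qdf.settings import DATASET_PATH, SAVE_PATH
#   print(missing_dirs(DATASET_PATH, SAVE_PATH))  # an empty list means both exist
```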

1. Preprocessing (for training):

python preprocess_train.py --dataset=$dataset_trained

e.g. python preprocess_train.py --dataset=QM9under7atoms_homolumo_eV

Options:

  • dataset [required]: [string] dataset to be used in pre-training. Of the datasets that can be installed directly from the cloned repository, the options are:

    • “QM9under14atoms_atomizationenergy_eV”

    • “QM9full_atomizationenergy_eV”

    • “QM9full_homolumo_eV” Note: Two properties (homo and lumo)


2. Training:

python train.py --dataset=$dataset_trained --num_workers=$num_workers --seed=$seed --device=$device

e.g. python train.py --dataset=QM9under7atoms_homolumo_eV

Options:

  • dataset [required]: [string] dataset to be used in pre-training. Of the datasets that can be installed directly from the cloned repository, the options are:

    • “QM9under14atoms_atomizationenergy_eV”

    • “QM9full_atomizationenergy_eV”

    • “QM9full_homolumo_eV” Note: Two properties (homo and lumo)

  • num_workers: [int] number of workers to use for the dataloader. Defaults to 1.

  • seed: [int] integer used to specify the seed for the model initialization. Defaults to 1729.

  • device: [string] device to use for training and inference. Options are “cuda” and “cpu”. If none is specified, “cuda” is used when it is available on your system; otherwise “cpu” is used (slower).
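
The options and defaults listed above can be sketched with `argparse`. This is an illustration of the documented behavior, not the actual contents of train.py: `build_parser` and `resolve_device` are hypothetical names, and the CUDA check is passed in as a plain boolean so the example runs without a GPU library installed.

```python
import argparse


def build_parser():
    """Parser mirroring the documented train.py options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of the train.py options")
    parser.add_argument("--dataset", type=str, required=True)
    parser.add_argument("--num_workers", type=int, default=1)
    parser.add_argument("--seed", type=int, default=1729)
    parser.add_argument("--device", type=str, default=None, choices=["cuda", "cpu"])
    return parser


def resolve_device(requested, cuda_available):
    """Apply the documented fallback: an explicit choice wins; otherwise
    prefer "cuda" when it is available and fall back to "cpu"."""
    if requested is not None:
        return requested
    return "cuda" if cuda_available else "cpu"


args = build_parser().parse_args(["--dataset=QM9full_homolumo_eV"])
# Assumption: in a PyTorch project the flag would come from torch.cuda.is_available().
device = resolve_device(args.device, cuda_available=False)
print(args.num_workers, args.seed, device)  # 1 1729 cpu
```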


3. Preprocessing (for prediction):

python preprocess_predict.py --dataset_train=$dataset_trained --dataset_predict=$dataset_predict

e.g. python preprocess_predict.py --dataset_train=QM9under7atoms_homolumo_eV --dataset_predict=QM9full_homolumo_eV

Options:

  • dataset_train [required]: [string] dataset that was used in pre-training. It is used to look for and load the appropriate orbital dictionaries, so that the preprocessing of the prediction dataset is consistent with the preprocessing of the dataset the model was trained on.

  • dataset_predict [required]: [string] dataset to be used in prediction.
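
To see why the training-time dictionaries need to be reused, consider a minimal sketch. It assumes an orbital dictionary maps each orbital label to an integer index assigned in order of first appearance; the labels and the `encode` helper are hypothetical, and the repository's actual dictionary format and handling of unseen orbitals may differ.

```python
def encode(labels, orbital_dict, frozen=False):
    """Map orbital labels to integer indices using orbital_dict.

    When frozen is False (training-time preprocessing), unseen labels get
    the next free index. When frozen is True (prediction-time
    preprocessing), unseen labels raise a KeyError instead of silently
    receiving an index the trained model has never seen.
    """
    indices = []
    for label in labels:
        if label not in orbital_dict:
            if frozen:
                raise KeyError(f"orbital {label!r} not seen in training")
            orbital_dict[label] = len(orbital_dict)
        indices.append(orbital_dict[label])
    return indices


# Training-time preprocessing builds the dictionary.
train_dict = {}
print(encode(["C_1s", "H_1s", "C_2s"], train_dict))  # [0, 1, 2]

# Reusing it at prediction time keeps the indices aligned with training.
print(encode(["H_1s", "C_1s"], train_dict, frozen=True))  # [1, 0]

# A fresh dictionary would assign the same orbitals different indices.
print(encode(["H_1s", "C_1s"], {}))  # [0, 1]
```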


4. Prediction (Inference):

python predict.py --dataset_train=$dataset_trained --dataset_predict=$dataset_predict --model_path=$model_path --num_workers=$num_workers --seed=$seed --device=$device

e.g. python predict.py --dataset_train=QM9under7atoms_homolumo_eV --dataset_predict=QM9full_homolumo_eV --model_path="../pretrained/model"

Options:

  • dataset_train [required]: [string] dataset that was used in pre-training. It is used to look for and load the appropriate orbital dictionaries, so that the preprocessing of the prediction dataset is consistent with the preprocessing of the dataset the model was trained on.

  • dataset_predict [required]: [string] dataset to be used in prediction.

  • model_path [required]: [string] path to file where the pre-trained model is saved.

  • num_workers: [int] number of workers to use for the dataloader. Defaults to 1.

  • seed: [int] integer used to specify the seed for the model initialization. Defaults to 1729.

  • device: [string] device to use for training and inference. Options are “cuda” and “cpu”. If none is specified, “cuda” is used when it is available on your system; otherwise “cpu” is used (slower).
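
The four steps can be chained from a small driver. The sketch below only assembles and prints the documented commands in order; `pipeline_commands` is a hypothetical helper, optional flags are left at their defaults, and the model path is taken from the example above rather than from wherever train.py actually saves its output.

```python
import shlex


def pipeline_commands(dataset_train, dataset_predict, model_path):
    """Return the four documented commands, in order, as argument lists."""
    return [
        ["python", "preprocess_train.py", f"--dataset={dataset_train}"],
        ["python", "train.py", f"--dataset={dataset_train}"],
        ["python", "preprocess_predict.py",
         f"--dataset_train={dataset_train}",
         f"--dataset_predict={dataset_predict}"],
        ["python", "predict.py",
         f"--dataset_train={dataset_train}",
         f"--dataset_predict={dataset_predict}",
         f"--model_path={model_path}"],
    ]


commands = pipeline_commands(
    "QM9under7atoms_homolumo_eV", "QM9full_homolumo_eV", "../pretrained/model"
)
for cmd in commands:
    # Print only; to execute, pass each list to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```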