# Test Suite

A test suite plays a vital role in open-source software use and development.

• For a PSI4 user, tests provide models of inputs that should work “as-is” and a searchable collection of syntax and capabilities. The test suite also allows high-quality development snapshots of the codebase to be built automatically for users.

• For a user who has PSI4 as part of a complex computational molecular software environment, a test suite alongside installed PSI4 can be used to show that the PSI4 piece is working.

• For a feature developer, adding tests provides confidence that you can leave your code untouched and still advertise that the feature works years later. With tests, proposed changes to PSI4 that break your code fall upon the change proposer to fix, rather than being merged silently, lying in wait for a conscientious user to detect and report, and then likely falling upon you to fix.

• For a general developer, the test suite allows confidence in refactoring, switching out underlying libraries, maintenance, and upgrading.

## CTest and pytest, PSIthon and PsiAPI

In designing a test, sometimes you want it to be a model input for the user in a single file, or you don’t want a lot of psi4. prefixes or other Python syntax cluttering the input. In this case, follow Adding PSIthon Test Cases to prepare as PSIthon (psi4 input.dat) for, roughly speaking, running through ctest. The PSIthon/CTest test suite occupies the whole of psi4/tests except psi4/tests/pytests.

At other times you want the test to check several variations of a template job or you want to test error handling or you want to focus on PsiAPI rather than PSIthon or you want to control the compute conditions with environment variables. In this case, follow Adding PsiAPI Test Cases to prepare as PsiAPI (import psi4) for, roughly speaking, running through pytest. The PsiAPI/pytest test suite occupies psi4/tests/pytests.

The above description sounds as if there are two disjoint test suites, and you have to run both ctest and pytest to fully test PSI4. This was indeed the case until March 2022. The difficulties were that (1) having two test suites is unexpected, so some developers don’t know to run both; and (2) there are important tests in the PSIthon suite that can’t be run on a PSI4 installation since CTest only works in a build directory. Now, by adding an extra file to the test directory (test_input.py), PSIthon tests can also be run through pytest. This hasn’t rolled out to all ~500 PSIthon tests (help wanted), but eventually PSI4 will be testable with a single command from a build or from an installation. Therefore, in designing a test, choose its mode based on whether PSIthon or PsiAPI suits it better and whether it’s a simple model for users (probably PSIthon) or for expert users (probably PsiAPI). Both will continue to work in the future, and neither has limitations.

## Test Contents

• Most PSI4 tests will be integration tests focusing on non-regression of user input to answers, and we insist on having these. But if you find unit tests helpful, by all means add them to the test suite.

• Most tests should store reference results (from literature or another implementation or a carefully run PSI4 calculation), run quantum chemistry, then apply one or more of the Comparison Functions so that the test will fail if the answer is unexpected. The functions are the same in CTest and pytest, but in the former they are, for example, compare_matrices(refmat, mat, ...) while in the latter it’s asserted, like assert compare_matrices(refmat, mat, ...). The main advantage of the testing functions is that they provide helpful error printing upon failure. Deep down, they’re NumPy functions.

• In preparing the test case reference values, aim for the converged value rather than many digits from your computer under default convergence conditions. This will make the test more robust across different OSes, different BLAS libraries, and variations in SCF cycles. Turn energy, density, amplitude, and geometry convergence criteria to very tight levels, and use these results for reference energies, reference geometries, reference cube files, etc. Then, either remove or relax the convergence settings, if these are not a vital part of the test. In choosing the number of digits for compare_values() and other compare_* functions, select a number looser than the convergence set in the test or the default convergence for the calculation type (energy, gradient, etc.).

• Keep tests as short as possible without sacrificing coverage and variety. Under 30 seconds is a good aim.

## Adding PSIthon Test Cases

To create a new test case, first make a folder in psi4/tests or, for an addon, a subfolder under the addon folder. Use hyphens, not spaces or underscores, in the directory name. Add the directory name to the list of tests in psi4/tests/CMakeLists.txt or, for an addon, tests/<addon>/CMakeLists.txt. The test directory will need at least two files, CMakeLists.txt and input.dat.

### CMakeLists.txt

This file adds the test case to the suite. It should have at least the following two uncommented lines:

include(TestingMacros)

# if extra files needed
# file(COPY grid.dat DESTINATION \${CMAKE_CURRENT_BINARY_DIR})

# if minutes long
# set_tests_properties(isapt1 PROPERTIES COST 300)


The labels specify which groups of tests include the test case for ctest -L label purposes. The psi label should always be added, but the other labels are test-specific. The method tested should always be included, and this is often sufficient. If adding a test for an already existing module, the labels for other tests of the module will suggest other labels to add. Labels have been added as developers needed, so they are not systematic or thorough. If you see labels to add or rename, please do.

A test requiring over 15 minutes should be labeled longtests. A short test under 30 seconds used for general bug checking should be labeled quicktests. A test that confirms PSI4 is operational should be labeled smoketests.

If a test needs extra input files like grid.dat or extra reference files for checking against, like fchk, specify these in the CMakeLists.txt as shown above. Such tests must be run through ctest and don’t usually work when run “by hand” from the objdir via stage/bin/psi4 ../tests/directory_name/input.dat.

If a test is multiple minutes long, load-balancing a parallel CTest run requires the test to be started early. Use the COST line as shown above to set a weighting to about the number of seconds the test takes.
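As a rough illustration of why the COST weighting helps, the sketch below orders a few tests by descending cost, which is how CTest is documented to start tests when costs are known. Only isapt1 and its COST of 300 come from the text above; the other test names and timings are made up for illustration.

```python
# Hypothetical (name, COST in seconds) pairs; only isapt1/300 appears in the text above.
tests = [("short-test", 5), ("isapt1", 300), ("medium-test", 20), ("other-long-test", 120)]


def start_order(tests):
    """Return test names ordered so the most expensive tests start first."""
    ranked = sorted(tests, key=lambda item: item[1], reverse=True)
    return [name for name, _cost in ranked]


order = start_order(tests)
print(order)
```

Starting the multi-minute test first keeps it from being the last job left running while the other cores sit idle.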

### input.dat

The other necessary file is the input file itself, input.dat. The input file should be just a simple input file to run the test, with small additions.

#! RI-SCF cc-pVTZ energy of water, with Z-matrix input and cc-pVTZ-RI auxiliary basis.
#! Also a bit more to force a second line.

nucenergy = 8.801466202085710  #TEST
refenergy = -76.05098402733282  #TEST

molecule h2o {
symmetry c1
O
H 1 1.0
H 1 1.0 2 104.5
}

set {
basis cc-pVTZ
scf_type df
df_basis_scf cc-pVTZ-RI
e_convergence 10
}

thisenergy = energy("hf")

compare_values(nucenergy, h2o.nuclear_repulsion_energy(), 9, "Nuclear repulsion energy")  #TEST
compare_values(refenergy, thisenergy, 9, "Reference energy")  #TEST
compare_values(refenergy, variable('scf total energy'), 9, "Reference energy")  #TEST


Of those small modifications, first, note the special comment at the top (starting with the #! comment marker). This should be descriptive since it is inlined into the manual (unless !nosample in the comment) as a sample input.

Reference values are often assigned to variables for later use. The compare_values function (along with several relatives in psi4/psi4/driver/p4util/testing.py for comparing strings, matrices, etc.) checks that the computed values match these reference values to suitable precision. This function prints an error message and signals that the test failed to the make system, if the values don’t match. Any lines of the input associated with the validation process should be flagged with #TEST at the end of each line, so that they can be removed when copying from the tests to the samples directory.
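To make the role of the #TEST flag concrete, here is a minimal sketch of how flagged lines could be filtered out when turning a test input into a sample input. The helper strip_test_lines is hypothetical and is not the script the project actually uses; it only assumes that validation lines end with #TEST as described above.

```python
def strip_test_lines(text):
    """Drop every line whose trailing comment is the #TEST flag (hypothetical helper)."""
    kept = [line for line in text.splitlines() if not line.rstrip().endswith("#TEST")]
    return "\n".join(kept)


sample_input = """\
refenergy = -76.05098402733282  #TEST
thisenergy = energy("hf")
compare_values(refenergy, thisenergy, 9, "Reference energy")  #TEST
"""

cleaned = strip_test_lines(sample_input)
print(cleaned)
```

Because whole lines are dropped, a validation statement that spans several lines needs the flag on each of its lines.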

### output.ref

When your test case is in final form, run it locally, rename the output to output.ref, and check it into the repository alongside input.dat. While this isn’t used for any testing machinery (except for the nearly decommissioned psi4/tests/psitest.pl for CC tests; full decommission expected by v1.6), it can be handy for users or developers to consult.

### test_input.py

Starting March 2022, one can also run tests designed as above for CTest through pytest. To bring the test to pytest’s notice, add a file named test_input.py to the directory. Below is an example, psi4/tests/ci-property/test_input.py:

from addons import *

@ctest_labeler("quick;ci;cas;properties;cart;noc1")
def test_ci_property():
    ctest_runner(__file__, ["grid.dat"])


This file contains much the same information as the CMakeLists.txt. The def test_ci_property contains the name of the test, now with underscores rather than hyphens. The def test_ prefix identifies it to pytest as a test. That part of the function name and the name of the file, test_input.py, are required, but no further registration with CMake is necessary. Most tests need only the simple form of the runner line, ctest_runner(__file__). This uses QCEngine machinery to execute python psi4 input.dat. If additional input files are needed from the test directory, their names can be added to the second argument list as shown above. Those additional input files do need to be registered in psi4/psi4/CMakeLists.txt.
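The naming rule can be sketched in a couple of lines. The helper pytest_function_name is hypothetical and only encodes the convention described above (the test_ prefix, with hyphens becoming underscores).

```python
def pytest_function_name(directory_name):
    """Map a hyphenated test directory name to a pytest-collectable function name."""
    return "test_" + directory_name.replace("-", "_")


name = pytest_function_name("ci-property")
print(name)  # test_ci_property
```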

Finally, the label string passed to CTest is here handed to pytest, with a few changes:

• psi added automatically, so exclude it when copying from CTest CMakeLists.txt

• cli added automatically to distinguish CTest origin from deliberate pytest origin, which has api added

• smoke used instead of CTest smoketests

• quick used instead of CTest quicktests

• long used instead of CTest longtests

• addon and <name-of-addon> added automatically when @uusing("<name-of-addon>") decorates the test or marks=using("<name-of-addon>") marks the test

CTest “labels” are called “marks” in pytest. Any new marks should be added to psi4/pytest.ini.
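The label-to-mark translation rules listed above can be summarized in a short sketch. Both helper names are hypothetical and this is not the project's own machinery; the sketch only covers the psi, cli, smoke, quick, and long rules and leaves the addon handling out.

```python
# Renames taken from the list above: CTest label -> pytest mark.
RENAMES = {"smoketests": "smoke", "quicktests": "quick", "longtests": "long"}


def ctest_labels_to_marks(label_string):
    """Translate a semicolon-separated CTest label string into explicit pytest marks."""
    marks = []
    for label in label_string.split(";"):
        if label == "psi":
            continue  # psi is added automatically, so it is not listed explicitly
        marks.append(RENAMES.get(label, label))
    return marks


def effective_marks(label_string):
    """Marks a PSIthon test carries once the automatic psi and cli marks are included."""
    return ["psi", "cli"] + ctest_labels_to_marks(label_string)


explicit = ctest_labels_to_marks("psi;quicktests;ci;cas;properties;cart;noc1")
print(";".join(explicit))
```

For the label string used here, the joined result is the same string that appears in the ctest_labeler decorator of the ci-property example.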

### Running for Debugging

• PSIthon tests that don’t need extra files to run are easily run from <objdir> via stage/bin/psi4 ../tests/<test-name>/input.dat, with the output appearing in ../tests/<test-name>/input.out.

• All PSIthon tests are runnable through CTest; output files appear in <objdir>/tests/<test-name>/output.dat and stdout results appear in <objdir>/Testing/Temporary/LastTest.log*.

## Adding PsiAPI Test Cases

To create a new test case, either create a new file or add to an existing file under psi4/tests/pytests.

• Test must be in the psi4/tests/pytests/ directory.

• Test file name must start with test_. This is how pytest knows to collect it.

• A test file may contain many tests, each of which is an ordinary Python function with name starting test_. This is how pytest knows to collect it.

• No registration required to bring a test to pytest’s attention.

• No registration required to bring a test to CMake’s attention. If a test needs additional files, register them in psi4/psi4/CMakeLists.txt.

A few notes on test contents:

• Import testing functions from utils and use Python assert: assert compare_values(expected, computed, ...).

• Don’t worry about cleaning up files or resetting options. A function in psi4/tests/pytests/conftest.py does this automatically between every test.

• Especially if using data or functions from outside a test, run a variety of tests at different pytest -n <N> levels to mix up test ordering. If tests fail that pass when run alone, you’ve got a function of the same name changing state or some similar correctable phenomenon.

A few notes on test labels:

• For every new test file, add pytestmark = [pytest.mark.psi, pytest.mark.api] at the top. This ensures that every test has the psi mark and every PsiAPI test has the api mark to contrast with PSIthon tests with cli mark.

• There are individual “marks” that can be added to whole tests or parts of parameterized tests so that they can be run by category (pytest -m <mark> vs. ctest -L <mark>) rather than just by name (pytest -k <name_fragment> vs. ctest -R <name_fragment>). Far more complicated logic is allowed than for CTest: pytest -m "dftd3 and not api and not long".

• The most important marks are “quick” and “long” that opt tests into the quick CI suite or out of the normal full suite. Mark with a decorator for the full test or the marks argument in a parameterized test. Search “mark” in the test suite for examples. Use “quick” freely for tests that cover functionality and are under 15s. Use “long” sparingly to winnow out the longest examples, particularly those over a minute.
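To show how a mark expression such as "dftd3 and not api and not long" selects tests, the sketch below evaluates an expression against the set of marks on one test. It is a simplified stand-in for illustration, not pytest's actual expression parser, and it only handles plain identifiers joined by and, or, not, and parentheses.

```python
import re

LOGIC_WORDS = {"and", "or", "not"}


def selected(expression, marks):
    """Return True if a test carrying `marks` matches a -m style expression."""

    def to_bool(match):
        word = match.group(0)
        # Keep logic keywords; replace every mark name by whether the test has it.
        return word if word in LOGIC_WORDS else str(word in marks)

    python_expr = re.sub(r"[A-Za-z_]\w*", to_bool, expression)
    # Only True/False literals and logic keywords remain, so no builtins are needed.
    return eval(python_expr, {"__builtins__": {}})


psithon_test = {"psi", "cli", "dftd3", "quick"}
psiapi_test = {"psi", "api", "dftd3"}
print(selected("dftd3 and not api and not long", psithon_test))
print(selected("dftd3 and not api and not long", psiapi_test))
```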

### Running for Debugging

There are many ways to run pytest (see How to test a Psi4 installation) and three different copies of the test file (i.e., psi4/tests/pytests/test_mp2.py, <objdir>/stage/lib/PYMOD_INSTALL_LIBDIR/psi4/tests/test_mp2.py, CMAKE_INSTALL_PREFIX/lib/PYMOD_INSTALL_LIBDIR/psi4/tests/test_mp2.py). For developing a pytest test, you probably want to use the first so you can edit it in place rather than running cmake --build after each change.

• Easiest is from <objdir>, run pytest ../tests. Add any filters (-k test_name_fragment) or parallelism (-n <N> or -n auto if pytest-xdist installed) or print test names (-v) or print warnings (-rws).

• An important point is that because they’re PsiAPI, import psi4 is happening, so the <objdir> PSI4 module must be in PYTHONPATH. Also, any call to QCEngine is using which psi4, so the <objdir> PSI4 executable must be in PATH. The easiest way to prepare your local environment is to execute the printout of <objdir>/stage/bin/psi4 --psiapi.

• To see stdout output from an otherwise passing test, easiest to add assert 0 at its end to trigger failure.

• If stdout printing is insufficient, and you really need to see output.dat or other files, comment out their deletion in psi4/tests/pytests/conftest.py and run the single test, deleting the file each time (since it appends).

## Comparison Functions

### Plain Old Data

psi4.compare_values(expected, computed, atol_exponent, label[, *, **kwargs])
psi4.compare_values(expected, computed[, label, *, **kwargs])

Comparison function for float or float array-like data structures. See qcelemental.testing.compare_values() for details.

psi4.compare_arrays is an old comparison function for float NumPy arrays that is now an alias to this.

Handles both Psi4-style signatures ((expected, computed, atol_exponent, label); see atol_exponent parameter below) and QCA-style signatures ((expected, computed, label)).

Parameters:

atol_exponent (int or float) – Absolute tolerance (see formula in qcelemental.testing.compare_values() notes). Values less than one are taken literally; one or greater taken as decimal digits for comparison. So 1 means atol=0.1 and 2 means atol=0.01, but 0.04 means atol=0.04. Note that the largest expressible processed atol will be ~0.99.
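The atol_exponent rule and the two call styles can be sketched as follows. Both helpers are hypothetical illustrations of the behavior described above, not the library's implementation; the 1e-06 fallback is the default atol shown in the qcelemental.testing.compare_values() signature.

```python
def atol_from_exponent(atol_exponent):
    """Values below one are literal tolerances; one or greater are decimal digits."""
    if atol_exponent < 1:
        return atol_exponent
    return 10.0 ** -atol_exponent


def split_signature(*trailing_args):
    """Return (atol, label) from the arguments that follow expected and computed."""
    if trailing_args and isinstance(trailing_args[0], (int, float)):
        # Psi4-style: (expected, computed, atol_exponent, label)
        label = trailing_args[1] if len(trailing_args) > 1 else None
        return atol_from_exponent(trailing_args[0]), label
    # QCA-style: (expected, computed, label); assumed default atol of 1e-06
    label = trailing_args[0] if trailing_args else None
    return 1.0e-6, label


print(atol_from_exponent(2), atol_from_exponent(0.04))
print(split_signature(9, "Reference energy"))
print(split_signature("Reference energy"))
```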

qcelemental.testing.compare_values(expected, computed, label=None, *, atol=1e-06, rtol=1e-16, equal_nan=False, equal_phase=False, passnone=False, quiet=False, return_message=False, return_handler=None)

Returns True if two floats or float arrays are element-wise equal within a tolerance.

Returns:

• allclose (bool) – Returns True if expected and computed are equal within tolerance; False otherwise.

• message (str) – When return_message=True, also return passed or error message.

Notes

absolute(computed - expected) <= (atol + rtol * absolute(expected))
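For a pair of scalars, the tolerance formula above reads as follows in plain Python. This is a sketch with the default atol and rtol from the signature; the function name within_tolerance and the perturbed energy are made up for illustration.

```python
def within_tolerance(expected, computed, atol=1.0e-6, rtol=1.0e-16):
    """Scalar form of: absolute(computed - expected) <= (atol + rtol * absolute(expected))."""
    return abs(computed - expected) <= atol + rtol * abs(expected)


reference = -76.05098402733282
perturbed = reference - 1.0e-7  # hypothetical result that differs in the 7th decimal

print(within_tolerance(reference, perturbed))               # default atol of 1e-06
print(within_tolerance(reference, perturbed, atol=1.0e-9))  # tighter tolerance
```

With the tiny default rtol, the absolute term dominates, which is why the Psi4-style signature exposes only an absolute tolerance.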

psi4.compare_integers(expected, computed[, label, *, **kwargs])

Comparison function for integers, strings, booleans, or integer array-like data structures. See qcelemental.testing.compare() for details.

psi4.compare_strings is an alias to this.

qcelemental.testing.compare(expected, computed, label=None, *, equal_phase=False, quiet=False, return_message=False, return_handler=None)

Returns True if two integers, strings, booleans, or integer arrays are element-wise equal.

Returns:

• allclose (bool) – Returns True if expected and computed are equal; False otherwise.

• message (str) – When return_message=True, also return passed or error message.

### Objects

psi4.compare_matrices(expected, computed, atol_exponent, label[, *, check_name=False, **kwargs])
psi4.compare_matrices(expected, computed[, label, *, check_name=False, **kwargs])

Comparison function for psi4.core.Matrix objects. Compares Matrix properties of name (optional through check_name), nirrep, symmetry, and number of rows and columns for each irrep. For comparing actual numerical contents, the matrices are serialized to NumPy array format and passed to qcelemental.testing.compare_recursive().

Handles both Psi4-style signatures ((expected, computed, atol_exponent, label); see atol_exponent parameter below) and QCA-style signatures ((expected, computed, label)).

Parameters:

atol_exponent (int or float) – Absolute tolerance (see formula in qcelemental.testing.compare_values() notes). Values less than one are taken literally; one or greater taken as decimal digits for comparison. So 1 means atol=0.1 and 2 means atol=0.01, but 0.04 means atol=0.04. Note that the largest expressible processed atol will be ~0.99.

qcelemental.testing.compare_recursive(expected, computed, label=None, *, atol=1e-06, rtol=1e-16, forgive=None, equal_phase=False, quiet=False, return_message=False, return_handler=None)

Recursively compares nested structures such as dictionaries and lists.

Returns:

• allclose (bool) – Returns True if expected and computed are equal within tolerance; False otherwise.

• message (str) – When return_message=True, also return passed or error message.

Notes

absolute(computed - expected) <= (atol + rtol * absolute(expected))
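The idea of a recursive comparison can be illustrated with a small standalone function. This is a simplified sketch, not the qcelemental implementation: compare_nested is a hypothetical name, only an absolute tolerance is applied, and options such as forgive and equal_phase are left out.

```python
def compare_nested(expected, computed, atol=1.0e-6):
    """Walk dicts and lists in parallel; compare floats within atol, the rest exactly."""
    if isinstance(expected, dict):
        return (
            isinstance(computed, dict)
            and expected.keys() == computed.keys()
            and all(compare_nested(expected[k], computed[k], atol) for k in expected)
        )
    if isinstance(expected, (list, tuple)):
        return (
            isinstance(computed, (list, tuple))
            and len(expected) == len(computed)
            and all(compare_nested(e, c, atol) for e, c in zip(expected, computed))
        )
    if isinstance(expected, float):
        return isinstance(computed, (int, float)) and abs(computed - expected) <= atol
    return expected == computed


# Made-up nested records for illustration.
ref = {"method": "hf", "energies": [-76.0509840, -76.0268], "nirrep": 1}
new = {"method": "hf", "energies": [-76.0509841, -76.0268], "nirrep": 1}
print(compare_nested(ref, new))
print(compare_nested(ref, new, atol=1.0e-9))
```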

psi4.compare_vectors(expected, computed, atol_exponent, label[, *, check_name=False, **kwargs])
psi4.compare_vectors(expected, computed[, label, *, check_name=False, **kwargs])

Comparison function for psi4.core.Vector objects. Compares Vector properties of name (optional through check_name), nirrep, and dimension of each irrep. For comparing actual numerical contents, the vectors are serialized to NumPy array format and passed to qcelemental.testing.compare_recursive().

Handles both Psi4-style signatures ((expected, computed, atol_exponent, label); see atol_exponent parameter below) and QCA-style signatures ((expected, computed, label)).

Parameters:

atol_exponent (int or float) – Absolute tolerance (see formula in qcelemental.testing.compare_values() notes). Values less than one are taken literally; one or greater taken as decimal digits for comparison. So 1 means atol=0.1 and 2 means atol=0.01, but 0.04 means atol=0.04. Note that the largest expressible processed atol will be ~0.99.

psi4.compare_wavefunctions(expected, computed, atol_exponent, label[, *, check_name=False, **kwargs])
psi4.compare_wavefunctions(expected, computed[, label, *, check_name=False, **kwargs])

Comparison function for psi4.core.Wavefunction objects. Compares over 30 Wavefunction properties, including nirrep, nso, molecule geometry, basis set nbf, density matrices, gradient results, etc.

Handles both Psi4-style signatures ((expected, computed, atol_exponent, label); see atol_exponent parameter below) and QCA-style signatures ((expected, computed, label)).

Parameters:

atol_exponent (int or float) – Absolute tolerance (see formula in qcelemental.testing.compare_values() notes). Values less than one are taken literally; one or greater taken as decimal digits for comparison. So 1 means atol=0.1 and 2 means atol=0.01, but 0.04 means atol=0.04. Note that the largest expressible processed atol will be ~0.99.

psi4.compare_molrecs(expected, computed[, label, *, check_name=False, **kwargs])

Comparison function for psi4.core.Molecule.to_dict() objects. See qcelemental.testing.compare_molrecs() for details.

Note only QCA-style signature ((expected, computed, label)) available.

qcelemental.testing.compare_molrecs(expected, computed, label=None, *, atol=1e-06, rtol=1e-16, forgive=None, verbose=1, relative_geoms='exact', return_message=False, return_handler=None)

Function to compare Molecule dictionaries.

Return type:

bool

### File Formats

psi4.compare_cubes(expected, computed[, label, *, check_name=False, **kwargs])

Comparison function for volumetric data in cube file format. Compares only the volumetric data, not the voxel data or molecular geometry or other header contents. The volumetric data is passed to qcelemental.testing.compare_values().

Note only QCA-style signature ((expected, computed, label)) available.

Parameters:
• expected – Reference cube file against which computed is compared. Read by numpy.genfromtxt() so expected can be any of file, str, pathlib.Path, list of str, generator.

• computed – Input cube file to compare against expected. Read by numpy.genfromtxt() so computed can be any of file, str, pathlib.Path, list of str, generator.

psi4.compare_fchkfiles(expected, computed, atol_exponent, label)

Comparison function for output data in FCHK (formatted checkpoint) file format. Compares many fields including number of electrons, highest angular momentum, basis set exponents, densities, final gradient.

Note only Psi4-style signature ((expected, computed, atol_exponent, label)) available.

An older format description can be found at http://wild.life.nctu.edu.tw/~jsyu/compchem/g09/g09ur/f_formchk.htm and lists more fields (logical, character) that are not included in this test function. They should be covered by the string comparison. This function is only meant to work with PSI4’s FCHK files.

Parameters:
• expected (str) – Path to reference FCHK file against which computed is compared.

• computed (str) – Path to input FCHK file to compare against expected.

• atol_exponent (Union[int, float]) – Absolute tolerance for high accuracy fields – 1.e-8 or 1.e-9 is suitable. Values less than one are taken literally; one or greater taken as decimal digits for comparison. So 1 means atol=0.1 and 2 means atol=0.01, but 0.04 means atol=0.04. Note that the largest expressible processed atol will be ~0.99.

• label (str) – Label for passed and error messages.

psi4.compare_fcidumps(expected, computed, label)

Comparison function for FCIDUMP files. Compares the first six below, then computes energies from MO integrals and compares the last four.

• ‘norb’ : number of basis functions

• ‘nelec’ : number of electrons

• ‘ms2’ : spin polarization of the system

• ‘isym’ : symmetry of state (if present in FCIDUMP)

• ‘orbsym’ : list of symmetry labels of each orbital

• ‘uhf’ : whether restricted or unrestricted

• ‘ONE-ELECTRON ENERGY’ : SCF one-electron energy

• ‘TWO-ELECTRON ENERGY’ : SCF two-electron energy

• ‘SCF TOTAL ENERGY’ : SCF total energy

• ‘MP2 CORRELATION ENERGY’ : MP2 correlation energy

psi4.compare_moldenfiles(expected, computed, atol_exponent=1e-07, label='Compare Molden')

Comparison function for output data in Molden file format. Compares many fields including geometry, basis set, occupations, symmetries, energies.

Note only Psi4-style signature ((expected, computed, atol_exponent, label)) available.

A format description is found at https://www3.cmbi.umcn.nl/molden/molden_format.html

Parameters:
• expected (str) – Path to reference Molden file against which computed is compared.

• computed (str) – Path to input Molden file to compare against expected.

• atol_exponent (Union[int, float]) – Absolute tolerance for high accuracy fields – 1.e-8 or 1.e-9 is suitable. Values less than one are taken literally; one or greater taken as decimal digits for comparison. So 1 means atol=0.1 and 2 means atol=0.01, but 0.04 means atol=0.04. Note that the largest expressible processed atol will be ~0.99.

• label (str) – Label for passed and error messages.

qcdb.compare_vibinfos(expected, computed, tol, label, verbose=1, forgive=None, required=None, toldict=None)

Returns True if two dictionaries of vibration Datum objects are equivalent within a tolerance.

Returns:

allclose – Returns True if expected and computed are equal within tolerance; False otherwise.

Return type:

bool

### Extra QCA Functions

psi4.compare(expected, computed, label=None, *, equal_phase=False, quiet=False, return_message=False, return_handler=<function _psi4_true_raise_handler>)

Returns True if two integers, strings, booleans, or integer arrays are element-wise equal.

Parameters:
• expected (Union[int, bool, str, List[int], ndarray]) – int, bool, str or array-like of same. Reference value against which computed is compared.

• computed (Union[int, bool, str, List[int], ndarray]) – int, bool, str or array-like of same. Input value to compare against expected.

• label (str) – Label for passed and error messages. Defaults to calling function name.

• equal_phase (bool) – Compare computed or its opposite as equal.

• quiet (bool) – Whether to log the return message.

• return_message (bool) – Whether to return tuple. See below.

• return_handler (Callable) – Function to control printing, logging, raising, and returning. Specialized interception for interfacing testing systems.

Returns:

• allclose (bool) – Returns True if expected and computed are equal; False otherwise.

• message (str) – When return_message=True, also return passed or error message.

Return type:

Union[bool, Tuple[bool, str]]
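The equal_phase option ("compare computed or its opposite as equal") can be pictured with a short sketch for integer sequences. The function equal_up_to_phase is a hypothetical illustration of that description, not the library's code.

```python
def equal_up_to_phase(expected, computed):
    """Treat computed as matching if it, or its element-wise negation, equals expected."""
    expected = list(expected)
    computed = list(computed)
    if computed == expected:
        return True
    return [-value for value in computed] == expected


print(equal_up_to_phase([1, -2, 3], [-1, 2, -3]))
print(equal_up_to_phase([1, -2, 3], [1, 2, 3]))
```

Only a global sign flip is accepted; flipping the sign of individual elements still counts as a mismatch.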

psi4.compare_recursive(expected, computed, *args, **kwargs)

Comparison function for recursively comparing mixed-type and nested structures such as dictionaries and lists. See qcelemental.testing.compare_recursive() for details.