Using data
objects¶
Any pandas.DataFrame
indexed by names of chemical species is a valid data
object in pyrrole [1]:
>>> import pandas as pd
>>> data = pd.DataFrame(
... [{'name': 'CO3-2(aq)', 'freeenergy': -527.8},
... {'name': 'HCO3-(aq)', 'freeenergy': -586.85},
... {'name': 'H2CO3(aq)', 'freeenergy': -623.1},
... {'name': 'OH-(aq)', 'freeenergy': -157.2},
... {'name': 'H2O(l)', 'freeenergy': -237.14}])
>>> data = data.set_index('name')
>>> data # doctest: +NORMALIZE_WHITESPACE
freeenergy
name
CO3-2(aq) -527.80
HCO3-(aq) -586.85
H2CO3(aq) -623.10
OH-(aq) -157.20
H2O(l) -237.14
The pandas library, a dependency of pyrrole, can be used to create data
objects.
Below are examples of creating data
objects from different sources.
Reading local files¶
Pandas can read data sets in various formats, such as comma-separated values (CSV), Google BigQuery, Hierarchical Data Format (HDF), JavaScript Object Notation (JSON), Microsoft Excel, and many other supported format types:
>>> data = pd.read_hdf("data/acetate/data.h5")
>>> data[['jobfilename', 'freeenergy', 'enthalpy']]
jobfilename freeenergy enthalpy
0 data/acetate/acetate.out -228.000450 -227.969431
1 data/acetate/acetate@water.out -228.120113 -228.089465
2 data/acetate/acetic_acid.out -228.564509 -228.533374
3 data/acetate/acetic_acid@water.out -228.575268 -228.544332
Pyrrole requires indices to represent names of chemical species, which is, like above, not always the case.
Setting meaningful indices can be accomplished by feeding a custom function to data.apply
:
>>> def update(series):
... """Compute a new column 'name' and add it to row."""
... series['name'] = (series['jobfilename']
... .replace('data/acetate/', '')
... .replace('.out', ''))
... series['name'] = (series['name']
... .replace('acetate', 'AcO-')
... .replace('acetic_acid', 'AcOH'))
... series['name'] = series['name'].replace('@water', '(aq)')
... if '(aq)' not in series['name']:
... series['name'] += "(g)"
... return series
The function above should be applied to the data
object, which can then be reindexed:
>>> data = data.apply(update, axis='columns').set_index('name')
>>> data[['jobfilename', 'freeenergy', 'enthalpy']] # doctest: +NORMALIZE_WHITESPACE
jobfilename freeenergy enthalpy
name
AcO-(g) data/acetate/acetate.out -228.000450 -227.969431
AcO-(aq) data/acetate/acetate@water.out -228.120113 -228.089465
AcOH(g) data/acetate/acetic_acid.out -228.564509 -228.533374
AcOH(aq) data/acetate/acetic_acid@water.out -228.575268 -228.544332
The data
object is now ready to be used:
>>> from pyrrole import ChemicalSystem
>>> system = ChemicalSystem(['AcO-(g) <=> AcO-(aq)',
... 'AcOH(g) <=> AcOH(aq)'],
... data['freeenergy'])
>>> system.to_dataframe() # doctest: +NORMALIZE_WHITESPACE
freeenergy
chemical_equation
AcO-(g) <=> AcO-(aq) -0.119663
AcOH(g) <=> AcOH(aq) -0.010759
In Getting started, we showed how to use create_data
to produce a data
object by reading output files from computational chemistry programs.
Reading lots of logfiles is slow, which is why storing the data in a file translates to faster retrievals later.
This can be accomplished with ccframe, a command-line tool that is part of cclib (a dependency of pyrrole).
In fact, the file data.h5
used in the example above was produced using ccframe:
$ ccframe -O data/acetate/data.h5 data/acetate*out \
data/acetic_acid*out
Learn more about ccframe in both its help page ($ ccframe -h
) and documentation.
Reading the web¶
There’s a lot of freely available data on the internet. For instance, NIST offers enthalpies of formation at 0K (in kJ/mol). Luckily, pandas supports reading HTML tables directly:
>>> url = "https://cccbdb.nist.gov/hf0k.asp"
>>> data = pd.read_html(url, header=0)[3] # fourth table in page
>>> data = data.set_index("Species")
>>> data = data[["Name", "Hfg 0K", "DOI"]]
>>> data.head() # doctest: +NORMALIZE_WHITESPACE
Name Hfg 0K DOI
Species
D Deuterium atom 219.8 NaN
H Hydrogen atom 216.0 10.1002/bbpc.19900940121
H+ Hydrogen atom cation 1528.1 NaN
D2 Deuterium diatomic 0.0 NaN
H2 Hydrogen diatomic 0.0 10.1002/bbpc.19900940121
This data allows us to calculate the bond-dissociation enthalpy of the hydrogen molecule at 0K, for instance:
>>> from pyrrole import ChemicalEquation
>>> equation = ChemicalEquation("H2 -> 2 H", data)
>>> equation.to_series()
Hfg 0K 432.0
Name: H2 -> 2 H, dtype: float64
That’s 432 kJ/mol, or 103.3 kcal/mol.
It’s time to take a deeper look at Systems and equations.
[1] | Obtained from standard Gibbs free energy of formation. |