********************
Using `data` objects
********************

Any `pandas.DataFrame` whose index holds names of chemical species is a valid
`data` object in pyrrole [#standard-gibbs-free-energy-of-formation]_:

>>> import pandas as pd
>>> data = pd.DataFrame(
...     [{'name': 'CO3-2(aq)', 'freeenergy': -527.8},
...      {'name': 'HCO3-(aq)', 'freeenergy': -586.85},
...      {'name': 'H2CO3(aq)', 'freeenergy': -623.1},
...      {'name': 'OH-(aq)', 'freeenergy': -157.2},
...      {'name': 'H2O(l)', 'freeenergy': -237.14}])
>>> data = data.set_index('name')
>>> data  # doctest: +NORMALIZE_WHITESPACE
           freeenergy
name
CO3-2(aq)     -527.80
HCO3-(aq)     -586.85
H2CO3(aq)     -623.10
OH-(aq)       -157.20
H2O(l)        -237.14

The `pandas library `_, a dependency of pyrrole, can be used to create `data`
objects. The sections below show how to create `data` objects from different
sources.

Reading local files
===================

Pandas can read data sets from various formats and sources, such as
`comma-separated values (CSV) `_, `Google BigQuery `_,
`Hierarchical Data Format (HDF) `_, `JavaScript Object Notation (JSON) `_,
`Microsoft Excel `_, and many `other supported format types `_:

>>> data = pd.read_hdf("data/acetate/data.h5")
>>> data[['jobfilename', 'freeenergy', 'enthalpy']]
                          jobfilename  freeenergy    enthalpy
0            data/acetate/acetate.out -228.000450 -227.969431
1      data/acetate/acetate@water.out -228.120113 -228.089465
2        data/acetate/acetic_acid.out -228.564509 -228.533374
3  data/acetate/acetic_acid@water.out -228.575268 -228.544332

Pyrrole requires the index to hold names of chemical species, which, as in the
example above, is not always the case. Meaningful indices can be set by passing
a custom function to `data.apply`:

>>> def update(series):
...     """Compute a new column 'name' and add it to the row."""
...     series['name'] = (series['jobfilename']
...                       .replace('data/acetate/', '')
...                       .replace('.out', ''))
...     series['name'] = (series['name']
...                       .replace('acetate', 'AcO-')
...                       .replace('acetic_acid', 'AcOH'))
...     series['name'] = series['name'].replace('@water', '(aq)')
...     if '(aq)' not in series['name']:
...         series['name'] += "(g)"
...     return series

Apply the function above to each row of the `data` object, then use the new
``name`` column as the index:

>>> data = data.apply(update, axis='columns').set_index('name')
>>> data[['jobfilename', 'freeenergy', 'enthalpy']]  # doctest: +NORMALIZE_WHITESPACE
                                 jobfilename  freeenergy    enthalpy
name
AcO-(g)             data/acetate/acetate.out -228.000450 -227.969431
AcO-(aq)      data/acetate/acetate@water.out -228.120113 -228.089465
AcOH(g)         data/acetate/acetic_acid.out -228.564509 -228.533374
AcOH(aq)  data/acetate/acetic_acid@water.out -228.575268 -228.544332

The `data` object is now ready to be used:

>>> from pyrrole import ChemicalSystem
>>> system = ChemicalSystem(['AcO-(g) <=> AcO-(aq)',
...                          'AcOH(g) <=> AcOH(aq)'],
...                         data['freeenergy'])
>>> system.to_dataframe()  # doctest: +NORMALIZE_WHITESPACE
                      freeenergy
chemical_equation
AcO-(g) <=> AcO-(aq)   -0.119663
AcOH(g) <=> AcOH(aq)   -0.010759

In `getting-started`, we showed how to use `create_data` to produce a `data`
object by reading output files from computational chemistry programs. Reading
many logfiles is slow, so storing the parsed data in a single file makes later
retrievals much faster. This can be done with `ccframe `_, a command-line tool
that is part of `cclib `_ (a dependency of pyrrole). In fact, the file
``data.h5`` used in the example above was produced with ccframe:

.. code-block:: console

   $ ccframe -O data/acetate/data.h5 data/acetate/acetate*out \
                                     data/acetate/acetic_acid*out

Learn more about ccframe from its help page (``$ ccframe -h``) and its
`documentation `_.

Reading the web
===============

There's a lot of freely available data on the internet. For instance,
`NIST `_ offers `enthalpies of formation at 0 K `_ (in kJ/mol).
Luckily, pandas supports `reading HTML tables `_ directly:

>>> url = "https://cccbdb.nist.gov/hf0k.asp"
>>> data = pd.read_html(url, header=0)[3]  # fourth table in page
>>> data = data.set_index("Species")
>>> data = data[["Name", "Hfg 0K", "DOI"]]
>>> data.head()  # doctest: +NORMALIZE_WHITESPACE
                      Name  Hfg 0K                       DOI
Species
D           Deuterium atom   219.8                       NaN
H            Hydrogen atom   216.0  10.1002/bbpc.19900940121
H+    Hydrogen atom cation  1528.1                       NaN
D2      Deuterium diatomic     0.0                       NaN
H2       Hydrogen diatomic     0.0  10.1002/bbpc.19900940121

With this data we can, for instance, calculate the
`bond-dissociation enthalpy `_ of the hydrogen molecule at 0 K:

>>> from pyrrole import ChemicalEquation
>>> equation = ChemicalEquation("H2 -> 2 H", data)
>>> equation.to_series()
Hfg 0K    432.0
Name: H2 -> 2 H, dtype: float64

That's 432 kJ/mol, or 103.3 kcal/mol. It's time to take a deeper look at
`systems-and-equations`.

.. [#standard-gibbs-free-energy-of-formation] Obtained from `standard Gibbs free energy of formation `_.
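
As a cross-check that needs neither pyrrole nor network access, the arithmetic
behind the 432 kJ/mol bond-dissociation enthalpy of H2 can be reproduced with
plain pandas. The sketch below is not how pyrrole computes it; it assumes the
two 0 K enthalpies of formation from the NIST table above (216.0 kJ/mol for H
and 0.0 kJ/mol for H2) and the thermochemical calorie (1 kcal = 4.184 kJ).
``reaction_enthalpy`` is a hypothetical helper written for illustration only:

.. code-block:: python

   import pandas as pd

   # Enthalpies of formation at 0 K (kJ/mol), copied by hand from the NIST
   # table above; only the species needed for H2 -> 2 H are included.
   hf0k = pd.Series({"H": 216.0, "H2": 0.0}, name="Hfg 0K")

   # Assumption: thermochemical calorie, 1 kcal = 4.184 kJ.
   KJ_PER_KCAL = 4.184


   def reaction_enthalpy(hf, reactants, products):
       """Hypothetical helper (not part of pyrrole).

       Applies Hess's law: the coefficient-weighted sum of the enthalpies of
       formation of the products minus that of the reactants.
       `reactants` and `products` map species names to coefficients.
       """
       h_products = sum(coef * hf[name] for name, coef in products.items())
       h_reactants = sum(coef * hf[name] for name, coef in reactants.items())
       return float(h_products - h_reactants)


   # H2 -> 2 H: breaking the H-H bond.
   bde_kj = reaction_enthalpy(hf0k, reactants={"H2": 1}, products={"H": 2})
   bde_kcal = bde_kj / KJ_PER_KCAL

   print(f"{bde_kj:.1f} kJ/mol")      # prints "432.0 kJ/mol"
   print(f"{bde_kcal:.1f} kcal/mol")  # prints "103.3 kcal/mol"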