Main API Interface (cheminfo_api)

This module is the main entry point for all user interactions. It is designed around two core philosophies to cater to different use cases:

  1. Bulk Data Retrieval (Engine): The get_properties() function is the workhorse of this library. It fetches multiple properties for multiple compounds in a single, consolidated API call and returns a Pandas DataFrame, making it well suited to data analysis, scripting, and integration with the scientific computing ecosystem (e.g., RDKit, Scikit-learn).

  2. Convenience and Object-Oriented Access (Interface): A series of get_<property>() functions provide direct access to individual data points. For scenarios requiring comprehensive, type-safe data, get_compound() returns a fully validated Pydantic Compound object.

Core Bulk Processing Function

ChemInformant.cheminfo_api.get_properties(identifiers: int | str | List[int] | List[str], properties: str | List[str] | None = None, *, namespace: str = 'cid', include_3d: bool = False, all_properties: bool = False, **kwargs) -> DataFrame

Retrieve chemical properties for one or more compounds from PubChem.

This function is the core interface for fetching molecular properties. It accepts various types of chemical identifiers and returns data in a standardized snake_case format with consistent column ordering and error handling.

Parameters:
  • identifiers – Chemical identifier(s) to look up. Can be:
      - Single identifier: string name, CID number, or SMILES
      - List of identifiers: mixed types allowed
    Examples: “aspirin”, 2244, “CC(=O)OC1=CC=CC=C1C(=O)O”

  • properties – Specific properties to retrieve. Can be:
      - None: Returns core property set (default)
      - String: Single property name or comma-separated list
      - List: Multiple property names
    Supports both snake_case (“molecular_weight”) and CamelCase (“MolecularWeight”)

  • namespace – Input identifier namespace (currently only “cid” supported)

  • include_3d – If True and properties=None, includes 3D molecular descriptors in addition to core properties. Ignored when properties is specified.

  • all_properties – If True, retrieves all ~40 available properties from PubChem. Mutually exclusive with properties and include_3d parameters.

  • **kwargs – Additional keyword arguments (for future compatibility)

Returns:

Results with columns:
  • input_identifier: Original input as provided

  • cid: PubChem Compound ID (string type)

  • status: “OK” for success, exception name for failures

  • [property columns]: Requested properties in snake_case format

Column order preserves the original properties parameter order. Failed lookups return rows with status != “OK” and missing property values.

Return type:

pd.DataFrame
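
The status column described above makes it straightforward to separate successful lookups from failures. The following is a minimal sketch that works on plain dictionaries, such as those produced by pandas' df.to_dict("records"). The helper name split_by_status and the sample rows are illustrative assumptions, not part of the library, and the exact values stored in failed rows may differ.

```python
def split_by_status(rows):
    """Split result rows into (ok, failed) lists based on the status column."""
    ok = [row for row in rows if row["status"] == "OK"]
    failed = [row for row in rows if row["status"] != "OK"]
    return ok, failed


# Hand-written sample rows that only mimic the documented column layout.
rows = [
    {"input_identifier": 2244, "cid": "2244", "status": "OK",
     "molecular_weight": 180.16},
    {"input_identifier": "invalid_name", "cid": None,
     "status": "NotFoundError", "molecular_weight": None},
]

ok, failed = split_by_status(rows)
for row in failed:
    # Report which inputs could not be resolved and why.
    print(f"{row['input_identifier']!r} failed with {row['status']}")
```

With a real DataFrame the same split can be expressed as a boolean mask on the status column.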

Raises:

ValueError – If unsupported properties are requested, or if all_properties=True is used with other property selection parameters
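
The selection rules above (string or list input, both naming styles, and the mutual exclusivity of all_properties) can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: select_properties and to_snake_case are hypothetical names, and the naive conversion rule would turn a name such as “XLogP” into “x_log_p”, so a real implementation presumably relies on an explicit name mapping for such cases.

```python
import re


def to_snake_case(name: str) -> str:
    """Naive CamelCase -> snake_case: put '_' before each inner capital."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def select_properties(properties=None, *, include_3d=False, all_properties=False):
    """Hypothetical helper that mirrors the documented selection rules."""
    if all_properties and (properties is not None or include_3d):
        raise ValueError(
            "all_properties cannot be combined with properties or include_3d"
        )
    if properties is None:
        return None  # the caller would fall back to the default property set
    if isinstance(properties, str):
        # Accept a single name or a comma-separated list.
        properties = [p.strip() for p in properties.split(",") if p.strip()]
    # Names already in snake_case pass through unchanged.
    return [to_snake_case(p) for p in properties]


print(select_properties("MolecularWeight, xlogp"))
```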

Property Categories:
Core properties (default): molecular_weight, molecular_formula, canonical_smiles, isomeric_smiles, iupac_name, cas, synonyms, xlogp, tpsa, complexity, h_bond_donor_count, h_bond_acceptor_count, rotatable_bond_count, heavy_atom_count, charge, atom_stereo_count, bond_stereo_count, covalent_unit_count, in_ch_i, in_ch_i_key

3D properties: volume_3d, feature_count_3d, conformer_count_3d, etc.

Special properties: cas (CAS Registry Number), synonyms (list of names)

Examples

>>> # Get core properties for a single compound
>>> df = get_properties("aspirin")
>>> print(df.columns.tolist())
['input_identifier', 'cid', 'status', 'molecular_weight', ...]
>>> # Get specific properties for multiple compounds
>>> df = get_properties(["aspirin", "caffeine"], properties=["molecular_weight", "xlogp"])
>>> print(df[["input_identifier", "molecular_weight", "xlogp"]])
>>> # Get all available properties
>>> df = get_properties("aspirin", all_properties=True)
>>> print(f"Retrieved {len(df.columns)} columns")
>>> # Include 3D descriptors with core set
>>> df = get_properties("aspirin", include_3d=True)
>>> # Handle mixed input types and failures
>>> df = get_properties([2244, "invalid_name", "CC(=O)O"])
>>> print(df[["input_identifier", "status"]])

Notes

  • Uses intelligent fallbacks for SMILES properties (canonical_smiles falls back to connectivity_smiles if canonical is unavailable)

  • Automatically handles API pagination for large batch requests

  • Results are cached to improve performance on repeated queries

  • All property names in output use snake_case format for consistency

  • CID column is returned as string type to handle large compound IDs
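
To illustrate what splitting a large batch into smaller requests can look like, here is a minimal sketch of a chunking generator. The function name chunked and the default batch size of 100 are assumptions made for this example. The library's actual batching strategy and limits are not documented here.

```python
from itertools import islice


def chunked(identifiers, size=100):
    """Yield successive lists of at most `size` identifiers."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(identifiers)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return  # input exhausted
        yield batch


# Each batch could then be sent as one request.
for batch in chunked([2244, 2519, 962, 702, 887], size=2):
    print(batch)
```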

Object-Oriented Fetchers

ChemInformant.cheminfo_api.get_compound(identifier: str | int) -> Compound

Retrieve a complete Compound object with all available properties.

This function fetches all properties for a single compound and returns a structured Compound object with type validation and convenient access to all molecular data.

Parameters:

identifier – Chemical identifier (name, CID, or SMILES)

Returns:

Compound object with all available properties as attributes

Raises:

Examples

>>> compound = get_compound("aspirin")
>>> print(compound.MolecularWeight)
180.16
>>> print(compound.CanonicalSMILES)
CC(=O)OC1=CC=CC=C1C(=O)O

Note

This function uses CamelCase property names to match the Compound model. For DataFrame output with snake_case names, use get_properties() instead.

ChemInformant.cheminfo_api.get_compounds(identifiers: Iterable[str | int]) -> List[Compound]

Retrieve multiple Compound objects for a list of identifiers.

This function processes multiple chemical identifiers and returns a list of Compound objects. Failed lookups will raise exceptions.

Parameters:

identifiers – Iterable of chemical identifiers (names, CIDs, or SMILES)

Returns:

List of Compound objects in the same order as input identifiers

Raises:

Examples

>>> compounds = get_compounds(["aspirin", "caffeine"])
>>> for comp in compounds:
...     print(f"{comp.InputIdentifier}: {comp.MolecularWeight}")
aspirin: 180.16
caffeine: 194.19

Note

For batch processing with error handling, consider using get_properties() which returns a DataFrame with status information for failed lookups.
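
When object-style results are still preferred, one option is to fetch identifiers one at a time and record failures instead of letting the first exception abort the loop. The sketch below assumes a caller-supplied fetch callable (in practice this could be get_compound). fetch_all and fake_fetch are hypothetical names used only for illustration, and fake_fetch stands in for a real lookup so the example runs without network access.

```python
def fetch_all(identifiers, fetch):
    """Call fetch() per identifier; return (results, errors by identifier)."""
    results = []
    errors = {}
    for identifier in identifiers:
        try:
            results.append(fetch(identifier))
        except Exception as exc:  # broad on purpose: record any failure type
            errors[identifier] = type(exc).__name__
    return results, errors


def fake_fetch(identifier):
    """Stand-in for a real lookup; fails for one known-bad name."""
    if identifier == "invalid_name":
        raise LookupError(identifier)
    return {"input": identifier}


results, errors = fetch_all(["aspirin", "invalid_name", "caffeine"], fake_fetch)
print(len(results), errors)
```

This trades the single consolidated request of get_properties() for one call per identifier, so it is better suited to small lists.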

Convenience Lookups

Basic Properties

ChemInformant.cheminfo_api.get_weight(id_: str | int) -> float | None

Get the molecular weight of a compound.

Parameters:

id_ – Chemical identifier (name, CID, or SMILES)

Returns:

Molecular weight in g/mol, or None if compound not found

Examples

>>> get_weight("aspirin")
180.16
>>> get_weight(2244)  # Same as above using CID
180.16

ChemInformant.cheminfo_api.get_formula(id_: str | int) -> str | None

Get the molecular formula of a compound.

Parameters:

id_ – Chemical identifier (name, CID, or SMILES)

Returns:

Molecular formula string, or None if compound not found

Examples

>>> get_formula("aspirin")
'C9H8O4'
>>> get_formula("water")
'H2O'

ChemInformant.cheminfo_api.get_cas(id_: str | int) -> str | None

Get the CAS Registry Number of a compound.

CAS Registry Numbers are unique identifiers assigned to chemical substances by the Chemical Abstracts Service (CAS), a division of the American Chemical Society.

Parameters:

id_ – Chemical identifier (name, CID, or SMILES)

Returns:

CAS Registry Number as string, or None if not found

Examples

>>> get_cas("aspirin")
'50-78-2'
>>> get_cas("water")
'7732-18-5'
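
A returned CAS Registry Number can be sanity-checked locally, because its last digit is a checksum: each preceding digit is multiplied by its position counted from the right, and the sum modulo 10 must equal the check digit. The helper below is a small sketch of that rule. is_valid_cas is a name chosen for this example and is not part of the library.

```python
import re


def is_valid_cas(cas: str) -> bool:
    """Check the format and check digit of a CAS Registry Number."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas)
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight digits 1, 2, 3, ... starting from the rightmost body digit.
    total = sum(
        int(digit) * weight
        for weight, digit in enumerate(reversed(body), start=1)
    )
    return total % 10 == check_digit


print(is_valid_cas("50-78-2"))    # aspirin, from the example above
print(is_valid_cas("7732-18-5"))  # water, from the example above
```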

ChemInformant.cheminfo_api.get_iupac_name(id_: str | int) -> str | None

Get the IUPAC (systematic) name of a compound.

Parameters:

id_ – Chemical identifier (name, CID, or SMILES)

Returns:

IUPAC name string, or None if compound not found

Examples

>>> get_iupac_name("aspirin")
'2-acetyloxybenzoic acid'
>>> get_iupac_name("water")
'oxidane'

SMILES and Identifiers

ChemInformant.cheminfo_api.get_canonical_smiles(id_: str | int) -> str | None

Get the canonical SMILES representation of a compound.

Canonical SMILES provide a unique string representation of molecular structure with consistent atom ordering and standardized conventions.

Parameters:

id_ – Chemical identifier (name, CID, or SMILES)

Returns:

Canonical SMILES string, or None if compound not found

Examples

>>> get_canonical_smiles("aspirin")
'CC(=O)OC1=CC=CC=C1C(=O)O'
>>> get_canonical_smiles(2244)
'CC(=O)OC1=CC=CC=C1C(=O)O'

ChemInformant.cheminfo_api.get_isomeric_smiles(id_: str | int) -> str | None

Get the isomeric SMILES representation of a compound.

Isomeric SMILES include stereochemical information and isotope specifications, providing more detailed structural information than canonical SMILES.

Parameters:

id_ – Chemical identifier (name, CID, or SMILES)

Returns:

Isomeric SMILES string, or None if compound not found

Examples

>>> get_isomeric_smiles("glucose")
'C([C@@H]1[C@H]([C@@H]([C@H]([C@H](O1)O)O)O)O)O'

ChemInformant.cheminfo_api.get_inchi(id_: str | int) -> str | None

Get the InChI (International Chemical Identifier) of a compound.

InChI is a standardized string representation developed by IUPAC for uniquely identifying chemical substances across databases.

Parameters:

id_ – Chemical identifier (name, CID, or SMILES)

Returns:

InChI string, or None if compound not found

Examples

>>> get_inchi("aspirin")
'InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)'

ChemInformant.cheminfo_api.get_inchi_key(id_: str | int) -> str | None

Get the InChI Key (hashed version of InChI) of a compound.

InChI Key is a fixed-length (27 character) hash of the InChI, designed for database searching and web queries.

Parameters:

id_ – Chemical identifier (name, CID, or SMILES)

Returns:

InChI Key string, or None if compound not found

Examples

>>> get_inchi_key("aspirin")
'BSYNRYMUTXBXSQ-UHFFFAOYSA-N'
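
Because the InChI Key has a fixed 27-character layout (blocks of 14, 10, and 1 uppercase letters joined by hyphens), a quick format check can catch truncated or mistyped keys before they are used in a search. The sketch below checks only the shape of the string, not whether the key corresponds to a real compound. looks_like_inchi_key is a hypothetical helper name.

```python
import re

# 14 letters, hyphen, 10 letters, hyphen, 1 letter: 27 characters in total.
_INCHI_KEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchi_key(value: str) -> bool:
    """Return True if `value` has the standard InChI Key layout."""
    return _INCHI_KEY_PATTERN.fullmatch(value) is not None


print(looks_like_inchi_key("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))
```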

Molecular Descriptors

ChemInformant.cheminfo_api.get_xlogp(id_: str | int) -> float | None

Get the XLogP value (a computed estimate of the octanol-water partition coefficient, log P) of a compound.

XLogP is a key descriptor for drug discovery, indicating lipophilicity and membrane permeability. Values typically range from -3 to +10.

Parameters:

id_ – Chemical identifier (name, CID, or SMILES)

Returns:

XLogP value (log units), or None if compound not found

Examples

>>> get_xlogp("aspirin")
1.2
>>> get_xlogp("water")
-0.7

ChemInformant.cheminfo_api.get_tpsa(id_: str | int) -> float | None

Get the Topological Polar Surface Area (TPSA) of a compound.

TPSA is a key descriptor for drug discovery, predicting membrane permeability and blood-brain barrier penetration. Values below about 140 Å² are commonly associated with good oral bioavailability, and values below about 90 Å² with blood-brain barrier penetration.

Parameters:

id_ – Chemical identifier (name, CID, or SMILES)

Returns:

TPSA value in Å² (square Angstroms), or None if compound not found

Examples

>>> get_tpsa("aspirin")
63.6

ChemInformant.cheminfo_api.get_complexity(id_: str | int) -> float | None

Get the molecular complexity score of a compound.

Complexity is a measure of structural intricacy based on symmetry, branching, and ring systems. Higher values indicate more complex structures.

Parameters:

id_ – Chemical identifier (name, CID, or SMILES)

Returns:

Complexity score (unitless), or None if compound not found

Examples

>>> get_complexity("aspirin")
212

Mass Properties

ChemInformant.cheminfo_api.get_exact_mass(id_: str | int) -> float | None

Get the exact mass of a compound.

Exact mass is the sum of atomic masses using the most abundant isotopes. Used in mass spectrometry for precise compound identification.

Parameters:

id_ – Chemical identifier (name, CID, or SMILES)

Returns:

Exact mass in Da (atomic mass units), or None if compound not found

Examples

>>> get_exact_mass("aspirin")
180.04225873

ChemInformant.cheminfo_api.get_monoisotopic_mass(id_: str | int) -> float | None

Get the monoisotopic mass of a compound.

Monoisotopic mass is calculated using the most abundant isotope of each element. Important for mass spectrometry and structural analysis.

Parameters:

id_ – Chemical identifier (name, CID, or SMILES)

Returns:

Monoisotopic mass in Da, or None if compound not found

Examples

>>> get_monoisotopic_mass("aspirin")
180.04225873
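
The value above can be reproduced by hand from the molecular formula by summing the mass of the most abundant isotope of each element. The following sketch does this for simple formulas without parentheses or charges. The function name and the small isotope table (rounded to eight decimals, covering only C, H, N, and O) are assumptions made for this example.

```python
import re

# Mass of the most abundant isotope of each element, in Da (rounded).
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307400,
    "O": 15.99491462,
}


def monoisotopic_mass_from_formula(formula: str) -> float:
    """Sum isotope masses for a simple formula such as 'C9H8O4'."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A missing count means one atom; unknown elements raise KeyError.
        total += MONOISOTOPIC_MASS[symbol] * (int(count) if count else 1)
    return total


print(round(monoisotopic_mass_from_formula("C9H8O4"), 5))
```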

Molecular Counts

ChemInformant.cheminfo_api.get_h_bond_donor_count(id_: str | int) -> int | None

Get the number of hydrogen bond donors in a compound.

Counts atoms that can donate hydrogen bonds (typically N, O with H). Important for drug design and predicting molecular interactions.

Parameters:

id_ – Chemical identifier (name, CID, or SMILES)

Returns:

Number of H-bond donors, or None if compound not found

Examples

>>> get_h_bond_donor_count("aspirin")
1

ChemInformant.cheminfo_api.get_h_bond_acceptor_count(id_: str | int) -> int | None

Get the number of hydrogen bond acceptors in a compound.

Counts atoms that can accept hydrogen bonds (typically N, O). Key descriptor for drug-like properties and solubility prediction.

Parameters:

id_ – Chemical identifier (name, CID, or SMILES)

Returns:

Number of H-bond acceptors, or None if compound not found

Examples

>>> get_h_bond_acceptor_count("aspirin")
4

ChemInformant.cheminfo_api.get_rotatable_bond_count(id_: str | int) -> int | None

Get the number of rotatable bonds in a compound.

Rotatable bonds are acyclic single bonds between non-terminal heavy atoms. Indicates molecular flexibility, important for drug binding and bioavailability.

Parameters:

id_ – Chemical identifier (name, CID, or SMILES)

Returns:

Number of rotatable bonds, or None if compound not found

Examples

>>> get_rotatable_bond_count("aspirin")
3

ChemInformant.cheminfo_api.get_heavy_atom_count(id_: str | int) -> int | None

Get the number of heavy atoms (non-hydrogen atoms) in a compound.

Heavy atoms include all atoms except hydrogen. This is a basic measure of molecular size and complexity.

Parameters:

id_ – Chemical identifier (name, CID, or SMILES)

Returns:

Number of heavy atoms, or None if compound not found

Examples

>>> get_heavy_atom_count("aspirin")
13

ChemInformant.cheminfo_api.get_charge(id_: str | int) -> int | None

Get the formal charge of a compound.

The total formal charge of the molecule, indicating whether it is neutral (0), positively charged (+), or negatively charged (-).

Parameters:

id_ – Chemical identifier (name, CID, or SMILES)

Returns:

Formal charge (integer), or None if compound not found

Examples

>>> get_charge("aspirin")
0

Stereochemistry

ChemInformant.cheminfo_api.get_atom_stereo_count(id_: str | int) -> int | None

Get the number of stereocenters (chiral centers) in a compound.

Counts atoms with defined stereochemistry, important for understanding the three-dimensional structure and potential biological activity.

Parameters:

id_ – Chemical identifier (name, CID, or SMILES)

Returns:

Number of stereocenters, or None if compound not found

Examples

>>> get_atom_stereo_count("glucose")
4

ChemInformant.cheminfo_api.get_bond_stereo_count(id_: str | int) -> int | None

Get the number of stereo bonds (E/Z double bonds) in a compound.

Counts double bonds with defined stereochemistry (cis/trans or E/Z). Important for understanding molecular geometry and reactivity.

Parameters:

id_ – Chemical identifier (name, CID, or SMILES)

Returns:

Number of stereo bonds, or None if compound not found

Examples

>>> get_bond_stereo_count("retinol")
4

ChemInformant.cheminfo_api.get_covalent_unit_count(id_: str | int) -> int | None

Get the number of covalently bonded units in a compound.

For most organic molecules this is 1. Higher values indicate multiple separate molecular components or fragments.

Parameters:

id_ – Chemical identifier (name, CID, or SMILES)

Returns:

Number of covalent units, or None if compound not found

Examples

>>> get_covalent_unit_count("aspirin")
1

Synonyms and Names

ChemInformant.cheminfo_api.get_synonyms(id_: str | int) -> List[str]

Get all known synonyms (alternative names) for a compound.

Returns a comprehensive list of names including common names, brand names, systematic names, and other identifiers used for the compound.

Parameters:

id_ – Chemical identifier (name, CID, or SMILES)

Returns:

List of synonym strings, empty list if compound not found

Examples

>>> synonyms = get_synonyms("aspirin")
>>> print(synonyms[:3])  # First few names
['aspirin', 'acetylsalicylic acid', '2-acetyloxybenzoic acid']

Visualization Functions

ChemInformant.cheminfo_api.draw_compound(identifier: str | int)

Draw the 2D chemical structure of a compound.

This function fetches the chemical structure image from PubChem and displays it using matplotlib. Requires matplotlib and PIL to be installed.

Parameters:

identifier – A compound identifier (name, CID, or SMILES)

Raises:
  • NotFoundError – If the identifier cannot be resolved to a valid compound

  • ImportError – If required dependencies (matplotlib, PIL) are not installed