Main API Interface (cheminfo_api)
This module is the main entry point for all user interactions. It is designed around two core philosophies to cater to different use cases:
- Bulk Data Retrieval (Engine): The get_properties() function is the workhorse of this library. It minimizes network round trips by fetching multiple properties for multiple compounds in consolidated batch requests. It returns a pandas DataFrame, making it well suited to data analysis, scripting, and integration with the scientific Python ecosystem (e.g., RDKit, scikit-learn).
- Convenience and Object-Oriented Access (Interface): A series of get_<property>() functions provide direct access to individual data points. For scenarios requiring a curated, type-safe view of common compound fields, get_compound() returns a validated Pydantic Compound object.
Core Bulk Processing Function
- ChemInformant.cheminfo_api.get_properties(identifiers: int | str | Sequence[int | str], properties: str | list[str] | None = None, *, include_3d: bool = False, all_properties: bool = False) → DataFrame
Retrieve chemical properties for one or more compounds from PubChem.
This function is the core interface for fetching molecular properties. It accepts several kinds of chemical identifiers and returns a DataFrame with snake_case column names, a consistent column order, and a per-row status for lookup failures.
- Parameters:
identifiers: Chemical identifier(s) to look up. Can be:
  - a single identifier: a name, a CID, or a SMILES string
  - a sequence of identifiers: names, CIDs, and SMILES may be mixed in one call
  Examples: 'aspirin', 2244, ['aspirin', 'caffeine', 1983]
properties: Specific properties to retrieve. Can be:
  - None: returns the core property set (default)
  - a string: a single property name or a comma-separated list
  - a list: multiple property names
  Both snake_case ('molecular_weight') and CamelCase ('MolecularWeight') names are accepted.
include_3d: If True and properties=None, adds 3D molecular descriptors to the core properties. Ignored when properties is specified.
all_properties: If True, retrieves all ~40 available properties from PubChem. Mutually exclusive with the properties and include_3d parameters.
- Returns:
- Results with columns:
input_identifier: Original input as provided
cid: PubChem Compound ID (string type)
status: 'OK' for success, the exception name for failures
[property columns]: Requested properties in snake_case format
Column order follows the order of the properties parameter. Failed lookups return rows with status != 'OK' and missing property values.
- Return type:
pd.DataFrame
- Raises:
ValueError: If unsupported properties are requested, or if all_properties=True is combined with other property selection parameters
- Property Categories:
- Core properties (default): molecular_weight, molecular_formula, canonical_smiles,
isomeric_smiles, iupac_name, cas, synonyms, xlogp, tpsa, complexity, h_bond_donor_count, h_bond_acceptor_count, rotatable_bond_count, heavy_atom_count, charge, atom_stereo_count, bond_stereo_count, covalent_unit_count, in_ch_i, in_ch_i_key
3D properties: volume_3d, feature_count_3d, conformer_count_3d, etc.
Special properties: cas (CAS Registry Number), synonyms (list of names)
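The interplay of properties, include_3d, and all_properties can be summarized as a small resolution step. The sketch below is a hypothetical reconstruction from the rules documented above, not ChemInformant's source: the property lists are abbreviated placeholders, and to_snake, resolve_properties, and the ALIASES table are illustrative names. The generic CamelCase splitter it uses yields in_ch_i_key for InChIKey, the spelling used in the core list above, but it needs explicit aliases for names such as XLogP.

```python
import re
from typing import List, Optional, Sequence, Union

# Hypothetical, abbreviated property lists; the real library supports ~40 properties.
CORE = ["molecular_weight", "molecular_formula", "xlogp", "tpsa"]
THREE_D = ["volume_3d", "feature_count_3d", "conformer_count_3d"]
SUPPORTED = set(CORE + THREE_D + ["exact_mass", "monoisotopic_mass"])
# Names that the generic splitter below would get wrong need explicit aliases.
ALIASES = {"XLogP": "xlogp", "Volume3D": "volume_3d"}


def to_snake(name: str) -> str:
    """CamelCase -> snake_case; snake_case input passes through unchanged."""
    name = name.strip()
    if name in ALIASES:
        return ALIASES[name]
    s = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s).lower()


def resolve_properties(
    properties: Optional[Union[str, Sequence[str]]] = None,
    include_3d: bool = False,
    all_properties: bool = False,
) -> List[str]:
    """Return the snake_case property names to fetch, in output column order."""
    if all_properties:
        if properties is not None or include_3d:
            raise ValueError("all_properties=True cannot be combined with properties or include_3d")
        return sorted(SUPPORTED)
    if properties is None:
        return CORE + THREE_D if include_3d else list(CORE)
    if isinstance(properties, str):
        properties = properties.split(",")  # "MolecularWeight, xlogp" -> two names
    names = [to_snake(p) for p in properties]  # include_3d is ignored on this path
    unsupported = [n for n in names if n not in SUPPORTED]
    if unsupported:
        raise ValueError(f"Unsupported properties: {unsupported}")
    return names


print(resolve_properties("MolecularWeight, xlogp"))  # ['molecular_weight', 'xlogp']
```

Requested names keep their input order, which mirrors the documented rule that column order follows the properties parameter.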
Examples
>>> # Get core properties for a single compound
>>> df = get_properties("aspirin")
>>> print(df.columns.tolist())
['input_identifier', 'cid', 'status', 'molecular_weight', ...]
>>> # Get specific properties for multiple compounds
>>> df = get_properties(["aspirin", "caffeine"], properties=["molecular_weight", "xlogp"])
>>> print(df[["input_identifier", "molecular_weight", "xlogp"]])
>>> # Get all available properties
>>> df = get_properties("aspirin", all_properties=True)
>>> print(f"Retrieved {len(df.columns)} columns")
>>> # Include 3D descriptors with core set
>>> df = get_properties("aspirin", include_3d=True)
>>> # Handle mixed input types and failures
>>> df = get_properties([2244, "invalid_name", "CC(=O)O"])
>>> print(df[["input_identifier", "status"]])
Notes
Falls back between SMILES properties: canonical_smiles is filled from connectivity_smiles when the canonical value is unavailable
Automatically handles API pagination for large batch requests
Results are cached to improve performance on repeated queries
All property names in output use snake_case format for consistency
CID column is returned as string type to handle large compound IDs
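One way to batch a large request is to split the CIDs into chunks and issue one PubChem PUG REST property call per chunk, using the comma-separated CID form of the endpoint. This is a hedged sketch: the chunk size of 100 is an arbitrary assumption, chunked and property_urls are hypothetical helpers, and the library's actual batching strategy is not described in this reference. The code only builds URLs; it does not contact PubChem.

```python
from typing import Iterator, List, Sequence

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def chunked(items: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def property_urls(cids: Sequence[int], properties: Sequence[str], batch_size: int = 100) -> List[str]:
    """Build one PUG REST property URL per batch of CIDs (no requests are sent)."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    props = ",".join(properties)
    return [
        f"{PUG_REST}/compound/cid/{','.join(str(cid) for cid in batch)}/property/{props}/JSON"
        for batch in chunked(cids, batch_size)
    ]


print(property_urls([2244, 2519], ["MolecularWeight", "XLogP"])[0])
# https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244,2519/property/MolecularWeight,XLogP/JSON
print(len(property_urls(list(range(1, 251)), ["MolecularWeight"])))  # 3
```

PubChem expects its own CamelCase property names (e.g., MolecularWeight) in the URL, so snake_case names would be translated back before a request like this is built.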
Object-Oriented Fetchers
- ChemInformant.cheminfo_api.get_compound(identifier: str | int) → Compound
Retrieve a validated Compound object for a single identifier.
The returned object is a curated wrapper around the most commonly used fields (molecular formula, molecular weight, SMILES, IUPAC name, XLogP, CAS, synonyms). For the full set of ~40 PubChem properties as a DataFrame, use get_properties() with all_properties=True.
- Parameters:
identifier: Chemical identifier (name, CID, or SMILES)
- Returns:
A Compound object. Attributes use snake_case (e.g. compound.molecular_weight, compound.canonical_smiles); CamelCase aliases are accepted as input only.
- Raises:
NotFoundError: If the identifier cannot be resolved to any CID.
AmbiguousIdentifierError: If the identifier maps to multiple CIDs.
RuntimeError: If data retrieval itself fails for a resolved CID.
Examples
>>> compound = get_compound("aspirin")
>>> print(compound.molecular_weight)
180.16
>>> print(compound.canonical_smiles)
CC(=O)OC1=CC=CC=C1C(=O)O
- ChemInformant.cheminfo_api.get_compounds(identifiers: Iterable[str | int]) → list[Compound]
Retrieve a list of Compound objects, one per identifier.
- Parameters:
identifiers: Iterable of chemical identifiers (names, CIDs, or SMILES).
- Returns:
A list of Compound objects in input order.
- Raises:
NotFoundError: If any identifier cannot be resolved.
AmbiguousIdentifierError: If any identifier maps to multiple CIDs.
RuntimeError: If data retrieval fails for any resolved CID.
Examples
>>> compounds = get_compounds(["aspirin", "caffeine"])
>>> for comp in compounds:
...     print(f"{comp.input_identifier}: {comp.molecular_weight}")
aspirin: 180.16
caffeine: 194.19
Note
For batch processing that tolerates failures, use get_properties(), which returns a DataFrame with per-row status information instead of raising on the first failure.
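The difference between the two error-handling styles can be shown without network access. In this sketch, NotFoundError is a local stand-in for the library's exception, fake_lookup replaces a real PubChem query with an in-memory table, and both fetch_* functions are hypothetical illustrations of the documented behavior rather than library code.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


class NotFoundError(Exception):
    """Local stand-in for the library's NotFoundError."""


# Hypothetical in-memory table replacing real PubChem lookups
KNOWN: Dict[str, Record] = {
    "aspirin": {"molecular_weight": 180.16},
    "caffeine": {"molecular_weight": 194.19},
}


def fake_lookup(identifier: str) -> Record:
    if identifier not in KNOWN:
        raise NotFoundError(identifier)
    return KNOWN[identifier]


def fetch_all_or_raise(ids: Iterable[str], lookup: Callable[[str], Record]) -> List[Record]:
    """get_compounds()-style: the first failure aborts the whole batch."""
    return [lookup(i) for i in ids]


def fetch_with_status(ids: Iterable[str], lookup: Callable[[str], Record]) -> List[Record]:
    """get_properties()-style: one row per input, failures recorded in 'status'."""
    rows: List[Record] = []
    for i in ids:
        try:
            rows.append({"input_identifier": i, "status": "OK", **lookup(i)})
        except Exception as exc:
            rows.append({"input_identifier": i, "status": type(exc).__name__})
    return rows


rows = fetch_with_status(["aspirin", "invalid_name", "caffeine"], fake_lookup)
print([row["status"] for row in rows])  # ['OK', 'NotFoundError', 'OK']
```

Recording the exception name in the status column keeps every input in the result, so a long batch is not lost to a single unresolvable identifier.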
Convenience Lookups
Basic Properties
- ChemInformant.cheminfo_api.get_weight(id_: str | int) → float | None
Get the molecular weight of a compound.
- Parameters:
id_: Chemical identifier (name, CID, or SMILES)
- Returns:
Molecular weight in g/mol, or None if compound not found
Examples
>>> get_weight("aspirin")
180.16
>>> get_weight(2244)  # Same as above using CID
180.16
- ChemInformant.cheminfo_api.get_formula(id_: str | int) → str | None
Get the molecular formula of a compound.
- Parameters:
id_: Chemical identifier (name, CID, or SMILES)
- Returns:
Molecular formula string, or None if compound not found
Examples
>>> get_formula("aspirin")
'C9H8O4'
>>> get_formula("water")
'H2O'
- ChemInformant.cheminfo_api.get_cas(id_: str | int) → str | None
Get the CAS Registry Number of a compound.
CAS Registry Numbers are unique identifiers assigned to chemical substances by the Chemical Abstracts Service (CAS), a division of the American Chemical Society.
- Parameters:
id_: Chemical identifier (name, CID, or SMILES)
- Returns:
CAS Registry Number as string, or None if not found
Examples
>>> get_cas("aspirin")
'50-78-2'
>>> get_cas("water")
'7732-18-5'
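CAS Registry Numbers end in a check digit, so values returned by get_cas() can be sanity-checked locally. The check digit is the sum of the preceding digits, each multiplied by its position counted from the right (starting at 1), modulo 10; for 50-78-2 that is (8×1 + 7×2 + 0×3 + 5×4) mod 10 = 42 mod 10 = 2. A minimal sketch; is_valid_cas is a hypothetical helper, not part of ChemInformant.

```python
import re

# Two to seven digits, two digits, one check digit, e.g. 7732-18-5
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas(cas: str) -> bool:
    """Return True if cas has CAS Registry Number format and a correct check digit."""
    match = CAS_PATTERN.fullmatch(cas)
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    # Weight each digit by its position counted from the right, starting at 1
    total = sum(pos * int(digit) for pos, digit in enumerate(reversed(body), start=1))
    return total % 10 == int(match.group(3))


print(is_valid_cas("50-78-2"))    # True (aspirin)
print(is_valid_cas("7732-18-5"))  # True (water)
print(is_valid_cas("50-78-3"))    # False (wrong check digit)
```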
- ChemInformant.cheminfo_api.get_iupac_name(id_: str | int) → str | None
Get the IUPAC (systematic) name of a compound.
- Parameters:
id_: Chemical identifier (name, CID, or SMILES)
- Returns:
IUPAC name string, or None if compound not found
Examples
>>> get_iupac_name("aspirin")
'2-acetyloxybenzoic acid'
>>> get_iupac_name("water")
'oxidane'
SMILES and Identifiers
- ChemInformant.cheminfo_api.get_canonical_smiles(id_: str | int) → str | None
Get the canonical SMILES representation of a compound.
Canonical SMILES provide a unique string representation of molecular structure with consistent atom ordering and standardized conventions.
- Parameters:
id_: Chemical identifier (name, CID, or SMILES)
- Returns:
Canonical SMILES string, or None if compound not found
Examples
>>> get_canonical_smiles("aspirin")
'CC(=O)OC1=CC=CC=C1C(=O)O'
>>> get_canonical_smiles(2244)
'CC(=O)OC1=CC=CC=C1C(=O)O'
- ChemInformant.cheminfo_api.get_isomeric_smiles(id_: str | int) → str | None
Get the isomeric SMILES representation of a compound.
Isomeric SMILES include stereochemical information and isotope specifications, providing more detailed structural information than canonical SMILES.
- Parameters:
id_: Chemical identifier (name, CID, or SMILES)
- Returns:
Isomeric SMILES string, or None if compound not found
Examples
>>> get_isomeric_smiles("glucose") 'C([C@@H]1[C@H]([C@@H]([C@H]([C@H](O1)O)O)O)O)O'
- ChemInformant.cheminfo_api.get_inchi(id_: str | int) → str | None
Get the InChI (International Chemical Identifier) of a compound.
InChI is a standardized string representation developed by IUPAC for uniquely identifying chemical substances across databases.
- Parameters:
id_: Chemical identifier (name, CID, or SMILES)
- Returns:
InChI string, or None if compound not found
Examples
>>> get_inchi("aspirin") 'InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)'
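A standard InChI is a sequence of '/'-separated layers: a version ('1S' for standard InChI), the molecular formula, then prefixed layers such as c (atom connections) and h (hydrogen atoms). The hypothetical inchi_layers helper below splits a string returned by get_inchi() into those parts; it assumes a simple single-component InChI and would overwrite layers whose prefix repeats (as can happen in fixed-hydrogen sublayers).

```python
from typing import Dict


def inchi_layers(inchi: str) -> Dict[str, str]:
    """Split a simple standard InChI into version, formula, and prefixed layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError(f"not an InChI string: {inchi!r}")
    version, formula, *rest = inchi[len(prefix):].split("/")
    layers = {"version": version, "formula": formula}
    for layer in rest:
        if layer:
            # Each remaining layer starts with a one-letter prefix, e.g. c = connections, h = hydrogens
            layers[layer[0]] = layer[1:]
    return layers


aspirin = "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"
parts = inchi_layers(aspirin)
print(parts["formula"])  # C9H8O4
print(parts["h"])        # 2-5H,1H3,(H,11,12)
```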
- ChemInformant.cheminfo_api.get_inchi_key(id_: str | int) → str | None
Get the InChI Key (hashed version of InChI) of a compound.
InChI Key is a fixed-length (27 character) hash of the InChI, designed for database searching and web queries.
- Parameters:
id_: Chemical identifier (name, CID, or SMILES)
- Returns:
InChI Key string, or None if compound not found
Examples
>>> get_inchi_key("aspirin") 'BSYNRYMUTXBXSQ-UHFFFAOYSA-N'
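The 27 characters of an InChIKey follow a fixed 14-10-1 layout: a 14-character hash of the connectivity layer; then 8 characters hashing the remaining layers (such as stereochemistry), one flag character ('S' for standard), and one version character ('A' for version 1); and finally a single protonation character. The sketch below (parse_inchikey is hypothetical) validates that layout and splits the key.

```python
import re
from typing import Dict

# 14 uppercase letters, hyphen, 10 uppercase letters, hyphen, 1 uppercase letter (27 chars)
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def parse_inchikey(key: str) -> Dict[str, str]:
    """Validate the 14-10-1 InChIKey layout and split it into its parts."""
    if INCHIKEY_PATTERN.fullmatch(key) is None:
        raise ValueError(f"not a valid InChIKey: {key!r}")
    skeleton, middle, protonation = key.split("-")
    return {
        "skeleton_hash": skeleton,        # hash of the connectivity layer
        "other_layers_hash": middle[:8],  # hash of stereo, isotope, and other layers
        "standard_flag": middle[8],       # 'S' = standard, 'N' = non-standard
        "version": middle[9],             # 'A' = InChI version 1
        "protonation": protonation,       # 'N' = no protons added or removed
    }


parts = parse_inchikey("BSYNRYMUTXBXSQ-UHFFFAOYSA-N")  # aspirin
print(parts["skeleton_hash"], parts["standard_flag"])  # BSYNRYMUTXBXSQ S
```

Because the first block depends only on connectivity, comparing skeleton_hash values is a quick way to group stereoisomers that share a skeleton.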
Molecular Descriptors
- ChemInformant.cheminfo_api.get_xlogp(id_: str | int) → float | None
Get the XLogP value of a compound, a computed estimate of log P (the logarithm of the octanol-water partition coefficient).
XLogP is a key descriptor for drug discovery, indicating lipophilicity and membrane permeability. Values typically range from -3 to +10.
- Parameters:
id_: Chemical identifier (name, CID, or SMILES)
- Returns:
XLogP value (log units), or None if compound not found
Examples
>>> get_xlogp("aspirin")
1.2
>>> get_xlogp("water")
-0.7
- ChemInformant.cheminfo_api.get_tpsa(id_: str | int) → float | None
Get the Topological Polar Surface Area (TPSA) of a compound.
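TPSA is a key descriptor in drug discovery for predicting membrane permeability and blood-brain barrier penetration. As common rules of thumb, values of at most 140 Å² are associated with good oral bioavailability, and values below about 90 Å² with blood-brain barrier penetration.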
- Parameters:
id_: Chemical identifier (name, CID, or SMILES)
- Returns:
TPSA value in Å² (square angstroms), or None if compound not found
Examples
>>> get_tpsa("aspirin") 63.6
- ChemInformant.cheminfo_api.get_complexity(id_: str | int) → float | None
Get the molecular complexity score of a compound.
Complexity is a measure of structural intricacy based on symmetry, branching, and ring systems. Higher values indicate more complex structures.
- Parameters:
id_: Chemical identifier (name, CID, or SMILES)
- Returns:
Complexity score (unitless), or None if compound not found
Examples
>>> get_complexity("aspirin") 212
Mass Properties
- ChemInformant.cheminfo_api.get_exact_mass(id_: str | int) → float | None
Get the exact mass of a compound.
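Exact mass is the mass of the most likely isotopic composition of a single molecule, corresponding to the most intense peak in its mass spectrum; for small organic molecules it usually coincides with the monoisotopic mass. Used in mass spectrometry for precise compound identification.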
- Parameters:
id_: Chemical identifier (name, CID, or SMILES)
- Returns:
Exact mass in Da (atomic mass units), or None if compound not found
Examples
>>> get_exact_mass("aspirin") 180.04225873
- ChemInformant.cheminfo_api.get_monoisotopic_mass(id_: str | int) → float | None
Get the monoisotopic mass of a compound.
Monoisotopic mass is calculated using the most abundant isotope of each element. Important for mass spectrometry and structural analysis.
- Parameters:
id_: Chemical identifier (name, CID, or SMILES)
- Returns:
Monoisotopic mass in Da, or None if compound not found
Examples
>>> get_monoisotopic_mass("aspirin") 180.04225873
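Monoisotopic mass is plain arithmetic over the molecular formula. For aspirin, C9H8O4: 9 × 12 + 8 × 1.00782503223 + 4 × 15.99491461957 ≈ 180.042259 Da, in agreement with the get_monoisotopic_mass() example above. The sketch below (monoisotopic_mass is a hypothetical helper) covers only H, C, N, and O and simple formulas without brackets, isotope labels, or charges.

```python
import re

# Mass (in u) of the most abundant isotope of each element; deliberately tiny table
MOST_ABUNDANT_ISOTOPE_MASS = {
    "H": 1.00782503223,   # 1H
    "C": 12.0,            # 12C (exact by definition)
    "N": 14.00307400443,  # 14N
    "O": 15.99491461957,  # 16O
}


def monoisotopic_mass(formula: str) -> float:
    """Sum isotope masses over a simple formula such as 'C9H8O4'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MOST_ABUNDANT_ISOTOPE_MASS[element] * (int(count) if count else 1)
    return total


print(round(monoisotopic_mass("C9H8O4"), 6))  # 180.042259 (aspirin)
print(round(monoisotopic_mass("H2O"), 6))     # 18.010565 (water)
```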
Molecular Counts
- ChemInformant.cheminfo_api.get_h_bond_donor_count(id_: str | int) → int | None
Get the number of hydrogen bond donors in a compound.
Counts atoms that can donate hydrogen bonds (typically N, O with H). Important for drug design and predicting molecular interactions.
- Parameters:
id_: Chemical identifier (name, CID, or SMILES)
- Returns:
Number of H-bond donors, or None if compound not found
Examples
>>> get_h_bond_donor_count("aspirin") 1
- ChemInformant.cheminfo_api.get_h_bond_acceptor_count(id_: str | int) → int | None
Get the number of hydrogen bond acceptors in a compound.
Counts atoms that can accept hydrogen bonds (typically N, O). Key descriptor for drug-like properties and solubility prediction.
- Parameters:
id_: Chemical identifier (name, CID, or SMILES)
- Returns:
Number of H-bond acceptors, or None if compound not found
Examples
>>> get_h_bond_acceptor_count("aspirin") 4
- ChemInformant.cheminfo_api.get_rotatable_bond_count(id_: str | int) → int | None
Get the number of rotatable bonds in a compound.
Rotatable bonds are acyclic single bonds between non-terminal heavy atoms. Indicates molecular flexibility, important for drug binding and bioavailability.
- Parameters:
id_: Chemical identifier (name, CID, or SMILES)
- Returns:
Number of rotatable bonds, or None if compound not found
Examples
>>> get_rotatable_bond_count("aspirin") 3
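Several descriptors in this section feed common drug-likeness filters. This sketch applies Lipinski's rule of five (molecular weight ≤ 500, logP ≤ 5, at most 5 H-bond donors, at most 10 H-bond acceptors) and Veber's criteria (at most 10 rotatable bonds, TPSA ≤ 140 Å²), using XLogP as the logP estimate. The Descriptors class and both functions are hypothetical; the aspirin values are copied from the examples in this reference rather than fetched.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Descriptors:
    molecular_weight: float
    xlogp: float
    h_bond_donor_count: int
    h_bond_acceptor_count: int
    rotatable_bond_count: int
    tpsa: float


def lipinski_violations(d: Descriptors) -> List[str]:
    """Return the rule-of-five criteria the compound fails (empty list = none)."""
    checks = {
        "molecular_weight > 500": d.molecular_weight > 500,
        "xlogp > 5": d.xlogp > 5,
        "h_bond_donor_count > 5": d.h_bond_donor_count > 5,
        "h_bond_acceptor_count > 10": d.h_bond_acceptor_count > 10,
    }
    return [rule for rule, failed in checks.items() if failed]


def passes_veber(d: Descriptors) -> bool:
    """Veber criteria: at most 10 rotatable bonds and TPSA of at most 140 Å²."""
    return d.rotatable_bond_count <= 10 and d.tpsa <= 140.0


# Aspirin values copied from the examples in this reference (not fetched)
aspirin = Descriptors(180.16, 1.2, 1, 4, 3, 63.6)
print(lipinski_violations(aspirin))  # []
print(passes_veber(aspirin))  # True
```

In practice the six values could come from one get_properties() call with the matching property names, rather than six separate convenience lookups.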
- ChemInformant.cheminfo_api.get_heavy_atom_count(id_: str | int) → int | None
Get the number of heavy atoms (non-hydrogen atoms) in a compound.
Heavy atoms include all atoms except hydrogen. This is a basic measure of molecular size and complexity.
- Parameters:
id_: Chemical identifier (name, CID, or SMILES)
- Returns:
Number of heavy atoms, or None if compound not found
Examples
>>> get_heavy_atom_count("aspirin") 13
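For a simple molecular formula, the heavy atom count is the sum of all non-hydrogen element counts: aspirin's C9H8O4 gives 9 + 4 = 13, matching the example above. heavy_atom_count is a hypothetical helper that ignores brackets, charges, and isotope labels such as D.

```python
import re


def heavy_atom_count(formula: str) -> int:
    """Count non-hydrogen atoms in a simple formula such as 'C9H8O4'."""
    return sum(
        int(count) if count else 1
        for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula)
        if element != "H"
    )


print(heavy_atom_count("C9H8O4"))  # 13 (aspirin)
print(heavy_atom_count("H2O"))     # 1 (water)
```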
- ChemInformant.cheminfo_api.get_charge(id_: str | int) → int | None
Get the formal charge of a compound.
The total formal charge of the molecule, indicating whether it's neutral (0), positively charged (+), or negatively charged (-).
- Parameters:
id_: Chemical identifier (name, CID, or SMILES)
- Returns:
Formal charge (integer), or None if compound not found
Examples
>>> get_charge("aspirin") 0
Stereochemistry
- ChemInformant.cheminfo_api.get_atom_stereo_count(id_: str | int) → int | None
Get the number of stereocenters (chiral centers) in a compound.
Counts atoms with defined stereochemistry, important for understanding the three-dimensional structure and potential biological activity.
- Parameters:
id_: Chemical identifier (name, CID, or SMILES)
- Returns:
Number of stereocenters, or None if compound not found
Examples
>>> get_atom_stereo_count("glucose") 4
- ChemInformant.cheminfo_api.get_bond_stereo_count(id_: str | int) → int | None
Get the number of stereo bonds (E/Z double bonds) in a compound.
Counts double bonds with defined stereochemistry (cis/trans or E/Z). Important for understanding molecular geometry and reactivity.
- Parameters:
id_: Chemical identifier (name, CID, or SMILES)
- Returns:
Number of stereo bonds, or None if compound not found
Examples
>>> get_bond_stereo_count("retinol") 4
- ChemInformant.cheminfo_api.get_covalent_unit_count(id_: str | int) → int | None
Get the number of covalently bonded units in a compound.
For most organic molecules this is 1. Higher values indicate multiple separate molecular components or fragments.
- Parameters:
id_: Chemical identifier (name, CID, or SMILES)
- Returns:
Number of covalent units, or None if compound not found
Examples
>>> get_covalent_unit_count("aspirin") 1
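In SMILES notation, separate covalent units are written as dot-separated components, so a salt such as sodium chloride is [Na+].[Cl-]. The hypothetical helper below counts those components as a rough cross-check; it is only an approximation, since SMILES permits ring-closure digits to bond atoms across a dot (for example, C1.C1 is ethane).

```python
def covalent_unit_estimate(smiles: str) -> int:
    """Count dot-separated components of a SMILES string (rough approximation)."""
    return sum(1 for part in smiles.split(".") if part)


print(covalent_unit_estimate("CC(=O)OC1=CC=CC=C1C(=O)O"))  # 1 (aspirin)
print(covalent_unit_estimate("[Na+].[Cl-]"))               # 2 (sodium chloride)
```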
Synonyms and Names
- ChemInformant.cheminfo_api.get_synonyms(id_: str | int) → list[str]
Get all known synonyms (alternative names) for a compound.
Returns a comprehensive list of names including common names, brand names, systematic names, and other identifiers used for the compound.
- Parameters:
id_: Chemical identifier (name, CID, or SMILES)
- Returns:
List of synonym strings, empty list if compound not found
Examples
>>> synonyms = get_synonyms("aspirin")
>>> print(synonyms[:3])  # First few names
['aspirin', 'acetylsalicylic acid', '2-acetyloxybenzoic acid']
Visualization Functions
- ChemInformant.cheminfo_api.draw_compound(identifier: str | int) → None
Draw the 2D chemical structure of a compound.
Fetches the structure image from PubChem with full rate-limiting, retry logic, and caching, then displays it with matplotlib.
- Parameters:
identifier β A compound identifier (name, CID, or SMILES).
- Raises:
NotFoundError: If the identifier cannot be resolved to a valid compound.
AmbiguousIdentifierError: If the identifier maps to multiple CIDs.
ImportError: If the optional plotting dependencies (matplotlib, Pillow) are not installed.
RuntimeError: If the PubChem structure endpoint returns a non-success status or if the request fails after all retries.
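The retry behaviour mentioned for draw_compound() is commonly implemented with exponential backoff. The generic helper below is a sketch of that pattern, not ChemInformant's implementation: with_retries, its defaults, and the retry_on parameter are assumptions, and sleep is injectable so the logic can be exercised without waiting or network access. In real use, retry_on would usually be narrowed to transient errors such as timeouts.

```python
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on retry_on errors with delays of base_delay * 2**n."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")  # the loop always returns or raises


calls: List[int] = []


def flaky_fetch() -> bytes:
    # Fails twice, then succeeds, to exercise the retry path
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("temporary failure")
    return b"\x89PNG..."


delays: List[float] = []
image = with_retries(flaky_fetch, sleep=delays.append)
print(len(calls), delays)  # 3 [0.5, 1.0]
```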