Main API Interface (cheminfo_api)
This module is the main entry point for all user interactions. It is designed around two core philosophies to cater to different use cases:
- Bulk Data Retrieval (Engine): The get_properties() function is the workhorse of this library. It keeps network round trips low by fetching multiple properties for multiple compounds in a single, consolidated API call. It returns a Pandas DataFrame, making it ideal for data analysis, scripting, and integration with the scientific computing ecosystem (e.g., RDKit, Scikit-learn).
- Convenience and Object-Oriented Access (Interface): A series of get_<property>() functions provide direct access to individual data points. For scenarios requiring comprehensive, type-safe data, get_compound() returns a fully validated Pydantic Compound object.
Core Bulk Processing Function
- ChemInformant.cheminfo_api.get_properties(identifiers: int | str | List[int] | List[str], properties: str | List[str] | None = None, *, namespace: str = 'cid', include_3d: bool = False, all_properties: bool = False, **kwargs) → DataFrame
Retrieve chemical properties for one or more compounds from PubChem.
This function is the core interface for fetching molecular properties. It accepts various types of chemical identifiers and returns data in a standardized snake_case format with consistent column ordering and error handling.
- Parameters:
identifiers – Chemical identifier(s) to look up. Can be:
  - Single identifier: string name, CID number, or SMILES
  - List of identifiers: mixed types allowed
  Examples: 'aspirin', 2244, 'CC(=O)OC1=CC=CC=C1C(=O)O'
properties – Specific properties to retrieve. Can be:
  - None: Returns core property set (default)
  - String: Single property name or comma-separated list
  - List: Multiple property names
  Supports both snake_case ('molecular_weight') and CamelCase ('MolecularWeight').
namespace – Input identifier namespace (currently only 'cid' supported)
include_3d – If True and properties=None, includes 3D molecular descriptors in addition to core properties. Ignored when properties is specified.
all_properties – If True, retrieves all ~40 available properties from PubChem. Mutually exclusive with the properties and include_3d parameters.
**kwargs – Additional keyword arguments (for future compatibility)
- Returns:
- Results with columns:
input_identifier: Original input as provided
cid: PubChem Compound ID (string type)
status: 'OK' for success, exception name for failures
[property columns]: Requested properties in snake_case format
Column order preserves the original properties parameter order. Failed lookups return rows with status != 'OK' and missing property values.
- Return type:
pd.DataFrame
- Raises:
ValueError – If unsupported properties are requested, or if all_properties=True is used with other property selection parameters
- Property Categories:
- Core properties (default): molecular_weight, molecular_formula, canonical_smiles, isomeric_smiles, iupac_name, cas, synonyms, xlogp, tpsa, complexity, h_bond_donor_count, h_bond_acceptor_count, rotatable_bond_count, heavy_atom_count, charge, atom_stereo_count, bond_stereo_count, covalent_unit_count, in_ch_i, in_ch_i_key
3D properties: volume_3d, feature_count_3d, conformer_count_3d, etc.
Special properties: cas (CAS Registry Number), synonyms (list of names)
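The properties parameter accepts a string, a comma-separated string, or a list, in either CamelCase or snake_case. The sketch below shows one way such input could be normalized to the snake_case names listed above. It is not the library's own code: camel_to_snake, normalize_properties, and the _OVERRIDES table are hypothetical names introduced for illustration.

```python
import re

# Hypothetical override table: a plain regex conversion would turn "XLogP"
# into "x_log_p", while the documented column name is "xlogp".
_OVERRIDES = {"XLogP": "xlogp"}


def camel_to_snake(name):
    """Convert a CamelCase property name to snake_case."""
    if name in _OVERRIDES:
        return _OVERRIDES[name]
    # First pass splits before capitalized words, second pass before
    # any capital that directly follows a lowercase letter or digit.
    partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial).lower()


def normalize_properties(properties):
    """Return a list of snake_case names from None, a string, or a list."""
    if properties is None:
        return []  # a caller would fall back to the core property set
    if isinstance(properties, str):
        properties = properties.split(",")
    return [camel_to_snake(p.strip()) for p in properties if p.strip()]


print(normalize_properties("MolecularWeight, xlogp"))
# ['molecular_weight', 'xlogp']
print(normalize_properties(["InChIKey", "HBondDonorCount"]))
# ['in_ch_i_key', 'h_bond_donor_count']
```

Note that the regex conversion reproduces the in_ch_i and in_ch_i_key spellings that appear in the core property list.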
Examples
>>> # Get core properties for a single compound
>>> df = get_properties("aspirin")
>>> print(df.columns.tolist())
['input_identifier', 'cid', 'status', 'molecular_weight', ...]

>>> # Get specific properties for multiple compounds
>>> df = get_properties(["aspirin", "caffeine"], properties=["molecular_weight", "xlogp"])
>>> print(df[["input_identifier", "molecular_weight", "xlogp"]])

>>> # Get all available properties
>>> df = get_properties("aspirin", all_properties=True)
>>> print(f"Retrieved {len(df.columns)} columns")

>>> # Include 3D descriptors with core set
>>> df = get_properties("aspirin", include_3d=True)

>>> # Handle mixed input types and failures
>>> df = get_properties([2244, "invalid_name", "CC(=O)O"])
>>> print(df[["input_identifier", "status"]])
Notes
Uses intelligent fallbacks for SMILES properties (canonical_smiles falls back to connectivity_smiles if canonical is unavailable)
Automatically handles API pagination for large batch requests
Results are cached to improve performance on repeated queries
All property names in output use snake_case format for consistency
CID column is returned as string type to handle large compound IDs
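Because failed lookups come back as rows rather than exceptions, a common follow-up step is to separate the rows by status. The sketch below works on plain dictionaries, as produced by df.to_dict("records") on a result DataFrame. The rows here are hand-written stand-ins, the cid value for the failed row is a placeholder, and split_by_status is a hypothetical helper.

```python
def split_by_status(rows):
    """Separate successful lookups from failed ones using the status column."""
    ok = [row for row in rows if row["status"] == "OK"]
    failed = [row for row in rows if row["status"] != "OK"]
    return ok, failed


# Stand-in rows shaped like the documented columns (not real API output).
rows = [
    {"input_identifier": 2244, "cid": "2244", "status": "OK",
     "molecular_weight": 180.16},
    {"input_identifier": "invalid_name", "cid": None, "status": "NotFoundError",
     "molecular_weight": None},
]

ok, failed = split_by_status(rows)
print([row["input_identifier"] for row in ok])
# [2244]
print({row["input_identifier"]: row["status"] for row in failed})
# {'invalid_name': 'NotFoundError'}
```

With a DataFrame in hand, the equivalent filter is df[df["status"] == "OK"].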
Object-Oriented Fetchers
- ChemInformant.cheminfo_api.get_compound(identifier: str | int) → Compound
Retrieve a complete Compound object with all available properties.
This function fetches all properties for a single compound and returns a structured Compound object with type validation and convenient access to all molecular data.
- Parameters:
identifier – Chemical identifier (name, CID, or SMILES)
- Returns:
Compound object with all available properties as attributes
- Raises:
RuntimeError – If the compound cannot be found or data retrieval fails
NotFoundError – If the identifier cannot be resolved
AmbiguousIdentifierError – If the identifier matches multiple compounds
Examples
>>> compound = get_compound("aspirin")
>>> print(compound.MolecularWeight)
180.16
>>> print(compound.CanonicalSMILES)
CC(=O)OC1=CC=CC=C1C(=O)O
Note
This function uses CamelCase property names to match the Compound model. For DataFrame output with snake_case names, use get_properties() instead.
- ChemInformant.cheminfo_api.get_compounds(identifiers: Iterable[str | int]) → List[Compound]
Retrieve multiple Compound objects for a list of identifiers.
This function processes multiple chemical identifiers and returns a list of Compound objects. Failed lookups will raise exceptions.
- Parameters:
identifiers – Iterable of chemical identifiers (names, CIDs, or SMILES)
- Returns:
List of Compound objects in the same order as input identifiers
- Raises:
RuntimeError – If any compound cannot be found or data retrieval fails
NotFoundError – If any identifier cannot be resolved
AmbiguousIdentifierError – If any identifier matches multiple compounds
Examples
>>> compounds = get_compounds(["aspirin", "caffeine"])
>>> for comp in compounds:
...     print(f"{comp.InputIdentifier}: {comp.MolecularWeight}")
aspirin: 180.16
caffeine: 194.19
Note
For batch processing with error handling, consider using get_properties() which returns a DataFrame with status information for failed lookups.
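When Compound objects are still wanted but a single bad identifier should not abort the batch, the fetcher can be called per identifier with the errors collected on the side. This is a minimal sketch, not part of the library: fetch_each is a hypothetical helper, and fake_fetch stands in for get_compound so the example runs without network access.

```python
def fetch_each(identifiers, fetch_one):
    """Call fetch_one per identifier; collect results and errors separately."""
    results, errors = {}, {}
    for identifier in identifiers:
        try:
            results[identifier] = fetch_one(identifier)
        except Exception as exc:  # broad on purpose: record any failure type
            errors[identifier] = type(exc).__name__
    return results, errors


def fake_fetch(identifier):
    """Stand-in for get_compound(); fails for one known-bad name."""
    if identifier == "invalid_name":
        raise LookupError(identifier)
    return {"name": identifier}


results, errors = fetch_each(["aspirin", "invalid_name"], fake_fetch)
print(sorted(results))
# ['aspirin']
print(errors)
# {'invalid_name': 'LookupError'}
```

In real use, fetch_one would be get_compound. This trades the batching efficiency of get_properties() for one call per identifier.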
Convenience Lookups
Basic Properties
- ChemInformant.cheminfo_api.get_weight(id_: str | int) → float | None
Get the molecular weight of a compound.
- Parameters:
id_ – Chemical identifier (name, CID, or SMILES)
- Returns:
Molecular weight in g/mol, or None if compound not found
Examples
>>> get_weight("aspirin")
180.16
>>> get_weight(2244)  # Same as above using CID
180.16
- ChemInformant.cheminfo_api.get_formula(id_: str | int) → str | None
Get the molecular formula of a compound.
- Parameters:
id_ – Chemical identifier (name, CID, or SMILES)
- Returns:
Molecular formula string, or None if compound not found
Examples
>>> get_formula("aspirin")
'C9H8O4'
>>> get_formula("water")
'H2O'
- ChemInformant.cheminfo_api.get_cas(id_: str | int) → str | None
Get the CAS Registry Number of a compound.
CAS Registry Numbers are unique identifiers assigned to chemical substances by CAS (Chemical Abstracts Service), a division of the American Chemical Society.
- Parameters:
id_ – Chemical identifier (name, CID, or SMILES)
- Returns:
CAS Registry Number as string, or None if not found
Examples
>>> get_cas("aspirin")
'50-78-2'
>>> get_cas("water")
'7732-18-5'
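A CAS Registry Number carries its own check digit, so a returned string can be sanity-checked locally. The last digit equals the weighted sum of the preceding digits (weights 1, 2, 3, ... counted from the right) modulo 10. The helper is_valid_cas below is a hypothetical name and not part of this module.

```python
import re


def is_valid_cas(cas):
    """Check the format and check digit of a CAS Registry Number string."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    # Weight each digit by its 1-based position counted from the right.
    total = sum(int(d) * weight
                for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


print(is_valid_cas("50-78-2"))    # True  (8*1 + 7*2 + 0*3 + 5*4 = 42)
print(is_valid_cas("7732-18-5"))  # True
print(is_valid_cas("50-78-3"))    # False (wrong check digit)
```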
- ChemInformant.cheminfo_api.get_iupac_name(id_: str | int) → str | None
Get the IUPAC (systematic) name of a compound.
- Parameters:
id_ – Chemical identifier (name, CID, or SMILES)
- Returns:
IUPAC name string, or None if compound not found
Examples
>>> get_iupac_name("aspirin")
'2-acetyloxybenzoic acid'
>>> get_iupac_name("water")
'oxidane'
SMILES and Identifiers
- ChemInformant.cheminfo_api.get_canonical_smiles(id_: str | int) → str | None
Get the canonical SMILES representation of a compound.
Canonical SMILES provide a unique string representation of molecular structure with consistent atom ordering and standardized conventions.
- Parameters:
id_ – Chemical identifier (name, CID, or SMILES)
- Returns:
Canonical SMILES string, or None if compound not found
Examples
>>> get_canonical_smiles("aspirin")
'CC(=O)OC1=CC=CC=C1C(=O)O'
>>> get_canonical_smiles(2244)
'CC(=O)OC1=CC=CC=C1C(=O)O'
- ChemInformant.cheminfo_api.get_isomeric_smiles(id_: str | int) → str | None
Get the isomeric SMILES representation of a compound.
Isomeric SMILES include stereochemical information and isotope specifications, providing more detailed structural information than canonical SMILES.
- Parameters:
id_ – Chemical identifier (name, CID, or SMILES)
- Returns:
Isomeric SMILES string, or None if compound not found
Examples
>>> get_isomeric_smiles("glucose")
'C([C@@H]1[C@H]([C@@H]([C@H]([C@H](O1)O)O)O)O)O'
- ChemInformant.cheminfo_api.get_inchi(id_: str | int) → str | None
Get the InChI (International Chemical Identifier) of a compound.
InChI is a standardized string representation developed by IUPAC for uniquely identifying chemical substances across databases.
- Parameters:
id_ – Chemical identifier (name, CID, or SMILES)
- Returns:
InChI string, or None if compound not found
Examples
>>> get_inchi("aspirin")
'InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)'
- ChemInformant.cheminfo_api.get_inchi_key(id_: str | int) → str | None
Get the InChI Key (hashed version of InChI) of a compound.
InChI Key is a fixed-length (27 character) hash of the InChI, designed for database searching and web queries.
- Parameters:
id_ – Chemical identifier (name, CID, or SMILES)
- Returns:
InChI Key string, or None if compound not found
Examples
>>> get_inchi_key("aspirin")
'BSYNRYMUTXBXSQ-UHFFFAOYSA-N'
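The fixed 27-character layout makes an InChI Key easy to recognize: three hyphen-separated blocks of 14, 10, and 1 uppercase letters. The sketch below checks only that shape, not whether the key corresponds to a real compound. looks_like_inchikey is a hypothetical helper, not a function of this module.

```python
import re

# 14 letters, hyphen, 10 letters, hyphen, 1 letter = 27 characters in total.
_INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(text):
    """Return True if text has the standard InChI Key layout."""
    return _INCHIKEY_PATTERN.fullmatch(text) is not None


key = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
print(len(key), looks_like_inchikey(key))  # 27 True
print(looks_like_inchikey("aspirin"))      # False
```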
Molecular Descriptors
- ChemInformant.cheminfo_api.get_xlogp(id_: str | int) → float | None
Get the XLogP value (octanol-water partition coefficient) of a compound.
XLogP is a key descriptor for drug discovery, indicating lipophilicity and membrane permeability. Values typically range from -3 to +10.
- Parameters:
id_ – Chemical identifier (name, CID, or SMILES)
- Returns:
XLogP value (log units), or None if compound not found
Examples
>>> get_xlogp("aspirin")
1.2
>>> get_xlogp("water")
-0.7
- ChemInformant.cheminfo_api.get_tpsa(id_: str | int) → float | None
Get the Topological Polar Surface Area (TPSA) of a compound.
TPSA is a key descriptor for drug discovery, used to estimate membrane permeability and blood-brain barrier penetration. As a rule of thumb, values below about 140 Å² are associated with good oral bioavailability, and values below about 90 Å² with blood-brain barrier penetration.
- Parameters:
id_ – Chemical identifier (name, CID, or SMILES)
- Returns:
TPSA value in Å² (square Angstroms), or None if compound not found
Examples
>>> get_tpsa("aspirin")
63.6
- ChemInformant.cheminfo_api.get_complexity(id_: str | int) → float | None
Get the molecular complexity score of a compound.
Complexity is a measure of structural intricacy based on symmetry, branching, and ring systems. Higher values indicate more complex structures.
- Parameters:
id_ – Chemical identifier (name, CID, or SMILES)
- Returns:
Complexity score (unitless), or None if compound not found
Examples
>>> get_complexity("aspirin")
212
Mass Properties
- ChemInformant.cheminfo_api.get_exact_mass(id_: str | int) → float | None
Get the exact mass of a compound.
Exact mass is the sum of atomic masses using the most abundant isotopes. Used in mass spectrometry for precise compound identification.
- Parameters:
id_ – Chemical identifier (name, CID, or SMILES)
- Returns:
Exact mass in Da (atomic mass units), or None if compound not found
Examples
>>> get_exact_mass("aspirin")
180.04225873
- ChemInformant.cheminfo_api.get_monoisotopic_mass(id_: str | int) → float | None
Get the monoisotopic mass of a compound.
Monoisotopic mass is calculated using the most abundant isotope of each element. Important for mass spectrometry and structural analysis.
- Parameters:
id_ – Chemical identifier (name, CID, or SMILES)
- Returns:
Monoisotopic mass in Da, or None if compound not found
Examples
>>> get_monoisotopic_mass("aspirin")
180.04225873
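To make the definition concrete, the monoisotopic mass of a simple formula can be recomputed by summing the mass of the most abundant isotope of each element. The sketch below is an independent illustration, not how the library or PubChem computes the value. It handles only flat formulas such as C9H8O4 (no parentheses, charges, or isotope labels), and the small isotope table covers just four elements.

```python
import re

# Masses (in Da) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "H": 1.00782503223,
    "C": 12.0,
    "N": 14.00307400443,
    "O": 15.99491461957,
}


def monoisotopic_mass(formula):
    """Sum isotope masses for a flat formula like 'C9H8O4'."""
    mass = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A missing count means a single atom of that element.
        mass += MONOISOTOPIC[symbol] * (int(count) if count else 1)
    return mass


print(round(monoisotopic_mass("C9H8O4"), 5))  # 180.04226
```

The result agrees with the aspirin value shown in the examples above to the displayed precision.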
Molecular Counts
- ChemInformant.cheminfo_api.get_h_bond_donor_count(id_: str | int) → int | None
Get the number of hydrogen bond donors in a compound.
Counts atoms that can donate hydrogen bonds (typically N, O with H). Important for drug design and predicting molecular interactions.
- Parameters:
id_ – Chemical identifier (name, CID, or SMILES)
- Returns:
Number of H-bond donors, or None if compound not found
Examples
>>> get_h_bond_donor_count("aspirin")
1
- ChemInformant.cheminfo_api.get_h_bond_acceptor_count(id_: str | int) → int | None
Get the number of hydrogen bond acceptors in a compound.
Counts atoms that can accept hydrogen bonds (typically N, O). Key descriptor for drug-like properties and solubility prediction.
- Parameters:
id_ – Chemical identifier (name, CID, or SMILES)
- Returns:
Number of H-bond acceptors, or None if compound not found
Examples
>>> get_h_bond_acceptor_count("aspirin")
4
- ChemInformant.cheminfo_api.get_rotatable_bond_count(id_: str | int) → int | None
Get the number of rotatable bonds in a compound.
Rotatable bonds are acyclic single bonds between non-terminal heavy atoms. Indicates molecular flexibility, important for drug binding and bioavailability.
- Parameters:
id_ – Chemical identifier (name, CID, or SMILES)
- Returns:
Number of rotatable bonds, or None if compound not found
Examples
>>> get_rotatable_bond_count("aspirin")
3
- ChemInformant.cheminfo_api.get_heavy_atom_count(id_: str | int) → int | None
Get the number of heavy atoms (non-hydrogen atoms) in a compound.
Heavy atoms include all atoms except hydrogen. This is a basic measure of molecular size and complexity.
- Parameters:
id_ – Chemical identifier (name, CID, or SMILES)
- Returns:
Number of heavy atoms, or None if compound not found
Examples
>>> get_heavy_atom_count("aspirin")
13
- ChemInformant.cheminfo_api.get_charge(id_: str | int) → int | None
Get the formal charge of a compound.
The total formal charge of the molecule, indicating whether it is neutral (0), positively charged (+), or negatively charged (-).
- Parameters:
id_ – Chemical identifier (name, CID, or SMILES)
- Returns:
Formal charge (integer), or None if compound not found
Examples
>>> get_charge("aspirin")
0
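Several of the descriptors above are described as relevant to drug-likeness. One widely used way to combine them is Lipinski's rule of five (molecular weight at most 500, logP at most 5, at most 5 H-bond donors, at most 10 H-bond acceptors). The sketch below counts violations from already-fetched values. lipinski_violations is a hypothetical helper, and the inputs are the aspirin values shown in the examples on this page; XLogP is used as the logP estimate.

```python
def lipinski_violations(weight, xlogp, h_donors, h_acceptors):
    """Count rule-of-five violations; return None if any input is missing."""
    values = (weight, xlogp, h_donors, h_acceptors)
    if any(value is None for value in values):
        return None  # a lookup returned None, so no verdict is possible
    checks = [weight > 500, xlogp > 5, h_donors > 5, h_acceptors > 10]
    return sum(checks)


# Aspirin values as documented above: 180.16 g/mol, XLogP 1.2, 1 donor, 4 acceptors.
print(lipinski_violations(180.16, 1.2, 1, 4))   # 0
print(lipinski_violations(180.16, None, 1, 4))  # None
```

In practice the four arguments would come from get_weight(), get_xlogp(), get_h_bond_donor_count(), and get_h_bond_acceptor_count(), or from one get_properties() call.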
Stereochemistry
- ChemInformant.cheminfo_api.get_atom_stereo_count(id_: str | int) → int | None
Get the number of stereocenters (chiral centers) in a compound.
Counts atoms with defined stereochemistry, important for understanding the three-dimensional structure and potential biological activity.
- Parameters:
id_ – Chemical identifier (name, CID, or SMILES)
- Returns:
Number of stereocenters, or None if compound not found
Examples
>>> get_atom_stereo_count("glucose")
4
- ChemInformant.cheminfo_api.get_bond_stereo_count(id_: str | int) → int | None
Get the number of stereo bonds (E/Z double bonds) in a compound.
Counts double bonds with defined stereochemistry (cis/trans or E/Z). Important for understanding molecular geometry and reactivity.
- Parameters:
id_ – Chemical identifier (name, CID, or SMILES)
- Returns:
Number of stereo bonds, or None if compound not found
Examples
>>> get_bond_stereo_count("retinol")
4
- ChemInformant.cheminfo_api.get_covalent_unit_count(id_: str | int) → int | None
Get the number of covalently bonded units in a compound.
For most organic molecules this is 1. Higher values indicate multiple separate molecular components or fragments.
- Parameters:
id_ – Chemical identifier (name, CID, or SMILES)
- Returns:
Number of covalent units, or None if compound not found
Examples
>>> get_covalent_unit_count("aspirin")
1
Synonyms and Names
- ChemInformant.cheminfo_api.get_synonyms(id_: str | int) → List[str]
Get all known synonyms (alternative names) for a compound.
Returns a comprehensive list of names including common names, brand names, systematic names, and other identifiers used for the compound.
- Parameters:
id_ – Chemical identifier (name, CID, or SMILES)
- Returns:
List of synonym strings, or an empty list if compound not found
Examples
>>> synonyms = get_synonyms("aspirin")
>>> print(synonyms[:3])  # First few names
['aspirin', 'acetylsalicylic acid', '2-acetyloxybenzoic acid']
Visualization Functions
- ChemInformant.cheminfo_api.draw_compound(identifier: str | int)
Draw the 2D chemical structure of a compound.
This function fetches the chemical structure image from PubChem and displays it using matplotlib. Requires matplotlib and PIL (Pillow) to be installed.
- Parameters:
identifier – A compound identifier (name, CID, or SMILES)
- Raises:
NotFoundError – If the identifier cannot be resolved to a valid compound
ImportError – If required dependencies (matplotlib, PIL) are not installed
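Since draw_compound() depends on optional packages, a script can check for them up front instead of catching ImportError afterwards. The sketch below uses only the standard library. missing_modules is a hypothetical helper, and the module names matplotlib and PIL are taken from the description above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(["matplotlib", "PIL"])
if missing:
    print("Install before calling draw_compound():", ", ".join(missing))
else:
    print("Plotting dependencies are available")
```

find_spec() only locates a module without importing it, so the check is cheap and has no side effects on the plotting backend.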