Internal API Helpers (api_helpers)
This module serves as the low-level networking foundation for the ChemInformant library. It is responsible for handling direct communication with the PubChem API, encapsulating complexities such as HTTP session management, persistent caching, request rate limiting, and robust retry logic against network or server failures.
Functions within this module are internal implementation details and are not intended to be called directly by end users. Use the high-level functions defined in cheminfo_api instead.
Note
Private functions within this module (those starting with an underscore, such as _fetch_with_ratelimit_and_retry) are not documented here because they are internal implementation details.
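The rate limiting and retry behavior mentioned in the module overview can be pictured with a minimal sketch. This is not the library's implementation: the class name RateLimitedRetrier, the TransientError exception, and every default value below are assumptions chosen for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a retryable network or server failure."""


class RateLimitedRetrier:
    """Spaces calls at least `min_interval` seconds apart and retries failures."""

    def __init__(self, min_interval: float = 0.2, max_retries: int = 3,
                 backoff: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval  # assumed spacing between requests
        self.max_retries = max_retries    # retries after the first attempt
        self.backoff = backoff            # base delay, doubled on each retry
        self._sleep = sleep
        self._last_call: Optional[float] = None

    def _wait_for_slot(self) -> None:
        # Sleep only for the part of the interval that has not yet elapsed.
        if self._last_call is not None:
            remaining = self.min_interval - (time.monotonic() - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
        self._last_call = time.monotonic()

    def call(self, fetch: Callable[[], T]) -> T:
        for attempt in range(self.max_retries + 1):
            self._wait_for_slot()
            try:
                return fetch()
            except TransientError:
                if attempt == self.max_retries:
                    raise  # retries exhausted, surface the error
                self._sleep(self.backoff * 2 ** attempt)
        raise RuntimeError("unreachable")
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; a caller would wrap each HTTP request in `retrier.call(lambda: ...)`.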
—
Session and Cache Management
Functions for setting up and managing the cached HTTP session.
- ChemInformant.api_helpers.setup_cache(cache_name: str = 'pubchem_cache', backend: str = 'sqlite', expire_after: int = 604800, **kw: Any) -> None
Configures and initializes the persistent cache for API requests.
This function sets up a requests_cache session that stores API responses on disk, significantly speeding up repeated queries and reducing network traffic.
- Parameters:
cache_name (str) – The name of the cache file (without extension).
backend (str) – The cache backend to use (e.g., 'sqlite', 'redis').
expire_after (int) – Cache expiration time in seconds. Defaults to one week (604800).
kw – Additional keyword arguments passed directly to the requests_cache.CachedSession constructor.
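As a small worked example of the expire_after parameter, the sketch below derives second counts from a number of days and collects the documented arguments into a dictionary. The helper name cache_settings is hypothetical, and the commented-out call assumes ChemInformant is installed.

```python
from datetime import timedelta
from typing import Any, Dict

# The documented default: one week expressed in seconds.
ONE_WEEK_SECONDS = int(timedelta(weeks=1).total_seconds())


def cache_settings(days: int = 7, cache_name: str = "pubchem_cache",
                   backend: str = "sqlite") -> Dict[str, Any]:
    """Hypothetical helper: build keyword arguments for setup_cache()."""
    return {
        "cache_name": cache_name,
        "backend": backend,
        "expire_after": int(timedelta(days=days).total_seconds()),
    }


settings = cache_settings(days=1, cache_name="pubchem_daily")

# With ChemInformant installed, the settings could be applied like this:
# from ChemInformant.api_helpers import setup_cache
# setup_cache(**settings)
```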
- ChemInformant.api_helpers.get_session() -> CachedSession
Gets the current cached session, initializing it with defaults if necessary.
This ensures a single, consistent session is used for all API calls.
- Returns:
The active requests_cache.CachedSession instance.
- Return type:
requests_cache.CachedSession
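The "initialize with defaults if necessary" behavior is a lazy singleton. The sketch below shows the pattern with a stub class in place of requests_cache.CachedSession; the names StubSession, setup_cache_sketch, and get_session_sketch are hypothetical, and the real module may organize its state differently.

```python
from typing import Optional


class StubSession:
    """Hypothetical stand-in for requests_cache.CachedSession."""

    def __init__(self, cache_name: str = "pubchem_cache") -> None:
        self.cache_name = cache_name


# Module-level slot holding the single shared session.
_session: Optional[StubSession] = None


def setup_cache_sketch(cache_name: str = "pubchem_cache") -> None:
    """Create (or replace) the shared session with explicit settings."""
    global _session
    _session = StubSession(cache_name)


def get_session_sketch() -> StubSession:
    """Return the shared session, creating it with defaults on first use."""
    if _session is None:
        setup_cache_sketch()
    assert _session is not None
    return _session
```

Because every caller goes through the accessor, a later call to the setup function changes the session seen by all subsequent requests.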
—
Data Fetching Functions
These functions wrap specific PubChem API endpoints to retrieve data.
- ChemInformant.api_helpers.get_cids_by_name(name: str) -> List[int] | None
Fetches PubChem Compound IDs (CIDs) for a given chemical name.
This function searches PubChem’s database for compounds matching the provided chemical name. It can return multiple CIDs if the name matches multiple compounds.
- Parameters:
name – The chemical name to search for (e.g., “aspirin”, “acetylsalicylic acid”)
- Returns:
List of integer CIDs matching the name, or None if not found
Examples
>>> get_cids_by_name("aspirin")
[2244]
>>> get_cids_by_name("glucose")
[5793, 64689, ...]  # Multiple isomers/forms
Note
This function is used internally by get_properties() for name-to-CID resolution. End users should typically use get_properties() instead.
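For orientation, a name lookup against PubChem's PUG REST service can be sketched as building a URL and unpacking the JSON reply. The helper names below are hypothetical and the code makes no network call; whether the library builds its requests this way is an assumption.

```python
from typing import Any, Dict, List, Optional
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def name_to_cids_url(name: str) -> str:
    """Build a PUG REST URL that asks for the CIDs matching a name."""
    # Percent-encode the name so spaces and punctuation survive in the path.
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/cids/JSON"


def parse_cids(payload: Dict[str, Any]) -> Optional[List[int]]:
    """Return the CID list from a decoded reply, or None if absent."""
    cids = payload.get("IdentifierList", {}).get("CID")
    return [int(c) for c in cids] if cids else None


url = name_to_cids_url("acetylsalicylic acid")
found = parse_cids({"IdentifierList": {"CID": [2244]}})
missing = parse_cids({"Fault": {"Code": "PUGREST.NotFound"}})
```

Returning None for a missing or empty list mirrors the documented "None if not found" contract.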
- ChemInformant.api_helpers.get_cids_by_smiles(smiles: str) -> List[int] | None
Fetches PubChem Compound IDs (CIDs) for a given SMILES string.
This function searches PubChem for compounds with structures matching the provided SMILES representation. It may return multiple CIDs for stereoisomers or different representations of the same molecule.
- Parameters:
smiles – The SMILES string representing the molecule (e.g., “CC(=O)OC1=CC=CC=C1C(=O)O” for aspirin)
- Returns:
List of integer CIDs matching the SMILES, or None if not found
Examples
>>> get_cids_by_smiles("CC(=O)OC1=CC=CC=C1C(=O)O")
[2244]
>>> get_cids_by_smiles("CCO")  # Ethanol
[702]
Note
This function is used internally by get_properties() for SMILES-to-CID resolution. End users should typically use get_properties() instead.
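SMILES strings contain characters such as `=`, `(`, `/`, and `+` that carry special meaning in URLs, so they need encoding before they are sent. One way to do this, sketched below, is to pass the SMILES as a query parameter. The helper name is hypothetical, and it is an assumption that the library sends SMILES this way rather than in the URL path or a POST body.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def smiles_to_cids_url(smiles: str) -> str:
    """Build a CID lookup URL with the SMILES as an encoded query parameter."""
    # urlencode escapes reserved characters, including '/', '=' and '+'.
    return f"{PUG_REST}/compound/smiles/cids/JSON?{urlencode({'smiles': smiles})}"


aspirin = "CC(=O)OC1=CC=CC=C1C(=O)O"
url = smiles_to_cids_url(aspirin)

# Decoding the query string recovers the original SMILES unchanged.
round_trip = parse_qs(urlsplit(url).query)["smiles"][0]
```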
- ChemInformant.api_helpers.get_batch_properties(cids: List[int], props: List[str]) -> Dict[int, Dict[str, Any]]
Fetches multiple properties for a batch of CIDs in a single call, handling API pagination automatically.
This is the core function for efficient bulk property retrieval from PubChem. Large batches are paginated transparently, and rate limiting and retry logic are applied for reliable data fetching.
- Parameters:
cids – List of compound IDs to query
props – List of property names in CamelCase format (e.g., ["MolecularWeight", "XLogP", "CanonicalSMILES"]). Names must match the PubChem API property names exactly.
- Returns:
Dictionary mapping each CID to its properties. CIDs with no data or failed lookups map to empty dictionaries. Properties are returned using the original CamelCase names from PubChem.
Examples
>>> get_batch_properties([2244, 702], ["MolecularWeight", "XLogP"])
{2244: {"CID": 2244, "MolecularWeight": 180.16, "XLogP": 1.2},
 702: {"CID": 702, "MolecularWeight": 46.07, "XLogP": -0.31}}
Note
This function is used internally by get_properties().
It uses PubChem's CamelCase property names, not snake_case.
It automatically handles pagination for requests with more than 1000 compounds.
End users should use get_properties(), which provides snake_case output.
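The pagination and "empty dictionary for missing CIDs" behavior can be illustrated with a short sketch. The function names and the `fetch_page` callback are hypothetical stand-ins for the real request logic, and the page size of 1000 is taken from the note above rather than from the library's source.

```python
from typing import Any, Callable, Dict, Iterator, List

Row = Dict[str, Any]


def chunked(items: List[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_in_batches(cids: List[int],
                     fetch_page: Callable[[List[int]], List[Row]],
                     page_size: int = 1000) -> Dict[int, Row]:
    """Query CIDs page by page and merge the rows into one mapping."""
    # Start every CID at an empty dict so failed lookups are still present.
    results: Dict[int, Row] = {cid: {} for cid in cids}
    for page in chunked(cids, page_size):
        for row in fetch_page(page):
            results[row["CID"]] = row
    return results
```

Pre-filling the result keeps the output keyed by every requested CID, so callers can tell "no data" apart from "never asked".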
- ChemInformant.api_helpers.get_cas_for_cid(cid: int) -> str | None
Fetches the primary CAS Registry Number for a single CID using the PUG-View endpoint.
This function accesses PubChem’s detailed compound records to extract CAS numbers, which are unique chemical identifiers assigned by the Chemical Abstracts Service. It uses the PUG-View API and parses the “Names and Identifiers” section of the record.
- Parameters:
cid – The PubChem compound ID to look up
- Returns:
The first found CAS Registry Number as a string (e.g., “50-78-2”), or None if no CAS number is found
Examples
>>> get_cas_for_cid(2244)  # Aspirin
'50-78-2'
>>> get_cas_for_cid(702)  # Ethanol
'64-17-5'
Note
This function is used internally by get_properties() and get_cas(). It may be slower than standard property queries as it accesses detailed compound records rather than the property API.
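Picking "the first found CAS Registry Number" out of a record implies recognizing which strings are CAS numbers. The sketch below does this with the standard CAS format and check digit; the helper names are hypothetical, and it is an assumption that the library validates candidates at all rather than trusting the record's section labels.

```python
import re
from typing import Iterable, Optional

# Two to seven digits, two digits, then one check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(candidate: str) -> bool:
    """Check the CAS format and its check digit."""
    match = CAS_PATTERN.match(candidate.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    # Weight digits 1, 2, 3, ... from the right; the sum mod 10 is the check digit.
    total = sum(int(d) * weight
                for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def first_cas(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate that is a valid CAS number, else None."""
    for candidate in candidates:
        if is_valid_cas(candidate):
            return candidate.strip()
    return None
```

For aspirin, 50-78-2, the weighted sum is 8×1 + 7×2 + 0×3 + 5×4 = 42, and 42 mod 10 gives the check digit 2.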
- ChemInformant.api_helpers.get_synonyms_for_cid(cid: int) -> List[str]
Fetches all known synonyms (alternative names) for a given CID.
This function retrieves the comprehensive list of names associated with a compound, including common names, systematic names, brand names, and other identifiers from PubChem’s synonyms database.
- Parameters:
cid – The PubChem compound ID to look up
- Returns:
List of synonym strings in order of preference/frequency. Returns empty list if no synonyms are found.
Examples
>>> get_synonyms_for_cid(2244)  # Aspirin
['aspirin', 'acetylsalicylic acid', '2-acetyloxybenzoic acid', ...]
>>> get_synonyms_for_cid(702)  # Ethanol
['ethanol', 'ethyl alcohol', 'grain alcohol', ...]
Note
This function is used internally by get_properties() and get_synonyms(). The first synonym in the list is typically the preferred/most common name.
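The "empty list if no synonyms are found" contract can be sketched as defensive unpacking of a PUG REST synonyms reply. The function name parse_synonyms is hypothetical, the sample payload is hand-written for illustration, and the library's actual parsing may differ.

```python
from typing import Any, Dict, List


def parse_synonyms(payload: Dict[str, Any]) -> List[str]:
    """Return the synonym list from a decoded reply, or [] if none exist."""
    information = payload.get("InformationList", {}).get("Information", [])
    if not information:
        return []
    # The first entry corresponds to the single CID that was requested.
    return list(information[0].get("Synonym", []))


sample = {
    "InformationList": {
        "Information": [
            {"CID": 2244, "Synonym": ["aspirin", "acetylsalicylic acid"]}
        ]
    }
}
names = parse_synonyms(sample)
preferred = names[0] if names else None
```

Returning a list in every case lets callers iterate or index without a separate None check.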