Usage Guide
This guide demonstrates how to use the ChemInformant
library to retrieve chemical information from PubChem easily and robustly.
Importing the Library
The recommended way to import ChemInformant is with the alias ci:
import ChemInformant as ci
Retrieving Compound Information
The primary function for retrieving comprehensive data is ci.info(). You can provide either a compound name (string) or a PubChem CID (integer).
By Name:
try:
    # Retrieve data for Aspirin by its common name
    aspirin = ci.info("Aspirin")
    print(f"Successfully retrieved data for CID: {aspirin.cid}")
    # Expected output: Successfully retrieved data for CID: 2244
except ci.NotFoundError:
    print("Aspirin not found.")
except ci.AmbiguousIdentifierError as e:
    # This block would run if "Aspirin" mapped to multiple CIDs
    print(f"Aspirin is ambiguous: {e.cids}")
By CID:
try:
    # Retrieve data for Ethanol using its PubChem CID
    ethanol = ci.info(702)
    print(f"Successfully retrieved data for compound with formula: {ethanol.molecular_formula}")
    # Expected output: Successfully retrieved data for compound with formula: C2H6O
except ci.NotFoundError:
    print("CID 702 not found.")
# AmbiguousIdentifierError is not expected for CID lookups,
# but other errors (network, etc.) could potentially occur.
except Exception as e:
    print(f"An unexpected error occurred: {e}")
Accessing Retrieved Data
The ci.info() function returns a CompoundData object, which is a Pydantic model. This means the data is structured, validated, and easily accessible via attributes.
If a specific piece of information couldn’t be fetched or doesn’t exist for a compound, the corresponding attribute will usually be None (or an empty list [] for synonyms).
# Assuming 'aspirin' is the CompoundData object from the previous example
if aspirin:
    print(f"CID: {aspirin.cid}")
    print(f"Input Identifier Used: {aspirin.input_identifier}")  # Shows what you passed to info()
    print(f"Common Name: {aspirin.common_name}")  # Often the input name or first synonym
    print(f"CAS: {aspirin.cas}")
    print(f"UNII: {aspirin.unii}")
    print(f"Molecular Formula: {aspirin.molecular_formula}")
    # Molecular weight is automatically converted to float or None
    print(f"Molecular Weight: {aspirin.molecular_weight}")
    print(f"Canonical SMILES: {aspirin.canonical_smiles}")
    print(f"IUPAC Name: {aspirin.iupac_name}")
    print(f"Description: {aspirin.description}")
    print(f"Synonyms (first 5): {aspirin.synonyms[:5]}")
    # Access the computed PubChem URL
    print(f"PubChem URL: {aspirin.pubchem_url}")
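Because missing values come back as None (or [] for synonyms), display code usually needs a fallback. The sketch below shows one way to handle that. It does not call PubChem: FakeCompound is a hypothetical stand-in that mimics only a few of the attribute names shown above, describe() is an illustrative helper rather than part of ChemInformant, and the example values are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeCompound:
    """Hypothetical stand-in for a CompoundData object (not the real model)."""
    cid: int
    cas: Optional[str] = None
    molecular_weight: Optional[float] = None
    synonyms: List[str] = field(default_factory=list)


def describe(compound, missing="n/a"):
    """Return printable lines, substituting a placeholder for missing values."""
    cas_text = compound.cas if compound.cas is not None else missing
    weight = compound.molecular_weight
    weight_text = f"{weight:.2f}" if weight is not None else missing
    # An empty synonym list joins to "", which is falsy, so 'or' supplies the placeholder
    synonyms_text = ", ".join(compound.synonyms[:5]) or missing
    return [
        f"CID: {compound.cid}",
        f"CAS: {cas_text}",
        f"Molecular Weight: {weight_text}",
        f"Synonyms: {synonyms_text}",
    ]


# Illustrative object with only some fields populated
partial = FakeCompound(cid=1, molecular_weight=46.07)
for line in describe(partial):
    print(line)
```

Checking "is not None" rather than truthiness keeps legitimate falsy values, such as a weight of 0.0, from being reported as missing.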
Handling Potential Errors
ChemInformant raises specific exceptions for common scenarios, allowing you to handle them gracefully:
* NotFoundError: Raised when the provided identifier (name or CID) cannot be found in PubChem.
* AmbiguousIdentifierError: Raised only when a provided name maps to multiple PubChem CIDs. The error object has an attribute cids containing the list of potential matches.
It’s good practice to wrap calls, especially those using names, in try...except blocks:
identifier = "glucose"  # This name is often ambiguous
try:
    compound_data = ci.info(identifier)
    print(f"Found {compound_data.common_name} (CID: {compound_data.cid})")
except ci.NotFoundError:
    print(f"Identifier '{identifier}' was not found.")
except ci.AmbiguousIdentifierError as e:
    print(f"Identifier '{identifier}' is ambiguous. Potential CIDs: {e.cids}")
    # Example: Decide how to proceed, e.g., query the first potential CID
    try:
        first_cid_info = ci.info(e.cids[0])
        print(f"Info for first ambiguous CID ({e.cids[0]}): {first_cid_info.iupac_name}")
    except ci.NotFoundError:
        print(f"Could not retrieve info for CID {e.cids[0]}")
except Exception as e:
    # Catch other potential issues like network errors, validation errors
    print(f"An unexpected error occurred: {e}")
Using Convenience Functions
For quickly retrieving just a single piece of information, ChemInformant provides several convenience functions (like ci.cas(), ci.wgt(), ci.syn(), etc.).
These functions are essentially wrappers around ci.info() but simplify error handling:
* They return the requested value upon success.
* They return None if the compound is not found, the name is ambiguous, or the specific property is missing/couldn’t be fetched.
* ci.syn() returns an empty list [] in case of failure.
# Get CAS for Aspirin by name
aspirin_cas = ci.cas("Aspirin")
print(f"Aspirin CAS: {aspirin_cas}")
# Expected output: Aspirin CAS: 50-78-2
# Get weight for Ethanol by CID
ethanol_weight = ci.wgt(702)
print(f"Ethanol Weight: {ethanol_weight}")
# Expected output: Ethanol Weight: 46.07
# Get synonyms for water by name
water_synonyms = ci.syn("Water")
print(f"Water Synonyms (first 3): {water_synonyms[:3]}")
# Expected output: Water Synonyms (first 3): ['Water', 'H2O', ...]
# Example of failure (NotFound) - returns None
notfound_cas = ci.cas("NonExistentCompound")
print(f"CAS for NonExistentCompound: {notfound_cas}")
# Expected output: CAS for NonExistentCompound: None
# Example of failure (Ambiguous) - returns None
ambiguous_weight = ci.wgt("glucose")
print(f"Weight for glucose: {ambiguous_weight}")
# Expected output: Weight for glucose: None
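The wrapper behavior described in this section can be sketched roughly as follows. This is only an illustration of the pattern, not ChemInformant's actual implementation: the exception classes, fake_info(), the lookup table, and cas_or_none() are all hypothetical stand-ins so that the example runs without network access, and the CIDs attached to the ambiguity error are made up.

```python
class NotFoundError(Exception):
    """Stand-in for ci.NotFoundError."""


class AmbiguousIdentifierError(Exception):
    """Stand-in for ci.AmbiguousIdentifierError; carries candidate CIDs."""

    def __init__(self, cids):
        super().__init__(f"ambiguous identifier, candidates: {cids}")
        self.cids = cids


# Hypothetical lookup table standing in for PubChem
_FAKE_DB = {"aspirin": {"cas": "50-78-2"}, "mystery": {"cas": None}}


def fake_info(identifier):
    """Stand-in for ci.info(): returns a record or raises a lookup error."""
    key = str(identifier).lower()
    if key == "glucose":
        raise AmbiguousIdentifierError([1, 2])  # made-up CIDs
    if key not in _FAKE_DB:
        raise NotFoundError(identifier)
    return _FAKE_DB[key]


def cas_or_none(identifier):
    """Return the CAS string, or None on not-found, ambiguity, or missing data."""
    try:
        return fake_info(identifier)["cas"]
    except (NotFoundError, AmbiguousIdentifierError):
        return None


print(cas_or_none("Aspirin"))
print(cas_or_none("NonExistentCompound"))
print(cas_or_none("glucose"))
```

Note that a None result does not say which of the three failure cases occurred. When that distinction matters, call ci.info() directly and handle the exceptions yourself.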
Batch Data Retrieval
To efficiently retrieve data for multiple compounds, use ci.get_multiple_compounds(). This function optimizes lookups by using PubChem’s batch API capabilities where possible and integrating with the cache.
It accepts a list containing a mix of compound names (str) and CIDs (int). It returns a dictionary where:
* Keys: The original identifiers you provided in the input list.
* Values: Either:
  * A CompoundData object if the lookup for that identifier was successful.
  * An Exception object (e.g., NotFoundError, AmbiguousIdentifierError, ValueError for invalid input, or potentially network errors) if the lookup failed for that specific identifier.
identifiers_list = ["Water", 2244, "NonExistent", "glucose", -5, 702]  # Mix of names, CIDs, invalid inputs
batch_results = ci.get_multiple_compounds(identifiers_list)
print(f"--- Batch Results ({len(batch_results)} entries) ---")
for identifier, result in batch_results.items():
    print(f"Identifier: {repr(identifier)}")  # Use repr() to see type clearly
    if isinstance(result, ci.CompoundData):
        print(f" Result: Success! CID={result.cid}, Formula={result.molecular_formula}")
    elif isinstance(result, ci.NotFoundError):
        print(" Result: Failed - Not Found")
    elif isinstance(result, ci.AmbiguousIdentifierError):
        print(f" Result: Failed - Ambiguous (CIDs: {result.cids})")
    elif isinstance(result, ValueError):
        print(f" Result: Failed - Invalid Input ({result})")
    else:
        # Catch other potential errors like network issues during batch fetch
        print(f" Result: Failed - Unexpected Error ({type(result).__name__}: {result})")
print("--- End of Batch Results ---")
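Since the returned dictionary mixes successes and exception objects, it is often convenient to separate the two before further processing. The following is a minimal sketch of that idea. partition_results() is a hypothetical helper, not part of ChemInformant, and fake_batch is a hand-built stand-in whose placeholder strings take the place of real CompoundData objects.

```python
def partition_results(results):
    """Split a {identifier: value-or-exception} dict into successes and failures."""
    successes, failures = {}, {}
    for identifier, result in results.items():
        if isinstance(result, Exception):
            failures[identifier] = result
        else:
            successes[identifier] = result
    return successes, failures


# Hand-built stand-in for a batch result (no network access needed)
fake_batch = {
    "Water": "placeholder for a CompoundData object",
    "NonExistent": LookupError("not found"),
    -5: ValueError("invalid CID"),
}

ok, failed = partition_results(fake_batch)
print(f"{len(ok)} succeeded, {len(failed)} failed")
for identifier, error in failed.items():
    print(f" {identifier!r}: {type(error).__name__}: {error}")
```

With real results, the same check against Exception should apply, because the failure values described above are exception instances.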
Caching API Responses
A core feature of ChemInformant is its built-in automatic caching, powered by requests-cache.
* Default Behavior: API responses are automatically cached to a SQLite database (pubchem_cache.sqlite in your current working directory). Cached entries expire after 7 days by default. Repeated requests for the same information are then served from the local cache instead of the network, which makes them faster and more resilient to temporary network problems.
* Configuration: You can customize the caching behavior (e.g., change the cache location, backend, or expiration time) using ci.setup_cache(). Important: Call setup_cache() before making any other ChemInformant calls if you want to change the defaults.
import ChemInformant as ci
import tempfile
import os
import time
# --- Example 1: Use an in-memory cache (fast, but lost when script ends) ---
print("Configuring in-memory cache...")
ci.setup_cache(backend='memory', expire_after=60) # Cache for 60 seconds
start_time = time.time()
water_info1 = ci.info("Water")
print(f"First call took: {time.time() - start_time:.4f}s")
start_time = time.time()
water_info2 = ci.info("Water") # Should be faster
print(f"Second call (cached) took: {time.time() - start_time:.4f}s")
print("-" * 20)
# --- Example 2: Use a specific file and longer expiry ---
# Must call setup_cache again to change settings
temp_dir = tempfile.gettempdir()
cache_file = os.path.join(temp_dir, "my_chem_cache")
print(f"Configuring file cache: {cache_file}.sqlite")
ci.setup_cache(cache_name=cache_file, backend='sqlite', expire_after=3600) # 1 hour
start_time = time.time()
aspirin_info1 = ci.info("Aspirin")
print(f"First call took: {time.time() - start_time:.4f}s")
start_time = time.time()
aspirin_info2 = ci.info("Aspirin") # Should be faster
print(f"Second call (cached) took: {time.time() - start_time:.4f}s")
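To make the expire_after idea concrete, here is a small conceptual sketch of a time-limited cache using only the standard library. It is not how requests-cache or ChemInformant implement caching. TTLCache, get_or_fetch(), and slow_lookup() are hypothetical names introduced purely for illustration.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after a fixed number of seconds."""

    def __init__(self, expire_after, clock=time.monotonic):
        self.expire_after = expire_after
        self._clock = clock  # injectable so expiry can be exercised without sleeping
        self._store = {}     # key -> (stored_at, value)

    def get_or_fetch(self, key, fetch):
        """Return a fresh cached value, or call fetch(key) and cache the result."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.expire_after:
            return entry[1]
        value = fetch(key)
        self._store[key] = (now, value)
        return value


calls = []


def slow_lookup(name):
    """Hypothetical stand-in for a network request."""
    calls.append(name)
    return f"data for {name}"


cache = TTLCache(expire_after=60)
cache.get_or_fetch("Water", slow_lookup)
cache.get_or_fetch("Water", slow_lookup)  # served from the cache, no second lookup
print(f"Lookups performed: {len(calls)}")
```

Once an entry is older than expire_after, the next request fetches it again, which mirrors the expiry behavior described in this section.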
Further Information
For detailed information on specific functions and the CompoundData model, please refer to the API Reference documentation.