Caching Guide
This guide provides a comprehensive overview of the caching mechanism in ChemInformant. You will learn how to use the setup_cache()
function to enable, configure, and manage the cache, significantly improving script performance and ensuring reproducible results.
1. Why Caching is Essential
When interacting with web APIs like PubChem, you may encounter rate limits or temporary server issues (like 503 Service Unavailable
errors). A persistent cache solves these problems by storing API responses on your local disk.
Speed: Subsequent identical requests are served instantly from the cache, reducing execution time from seconds to milliseconds.
Reliability: It makes your scripts robust against temporary network failures or API downtime.
Reproducibility: Your analysis scripts can be re-run offline, yielding the exact same data.
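To make the idea concrete, here is a minimal sketch of a disk-style cache with expiry, built only on the standard library. It is not ChemInformant's implementation (which relies on requests_cache); the class TinyDiskCache and the function fake_fetch are hypothetical names used for illustration.

```python
import sqlite3
import time


class TinyDiskCache:
    """Illustrative key/value cache with per-entry expiry (not ChemInformant's code)."""

    def __init__(self, path=":memory:", expire_after=604800):
        # Pass a file name instead of ":memory:" to keep entries between runs.
        self.expire_after = expire_after
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS responses "
            "(key TEXT PRIMARY KEY, body TEXT, stored_at REAL)"
        )

    def get(self, key, fetch):
        """Return the cached body for key, calling fetch(key) only on a miss."""
        row = self.db.execute(
            "SELECT body, stored_at FROM responses WHERE key = ?", (key,)
        ).fetchone()
        if row is not None and time.time() - row[1] < self.expire_after:
            return row[0]  # cache hit: no network call needed
        body = fetch(key)  # cache miss or expired entry
        self.db.execute(
            "INSERT OR REPLACE INTO responses VALUES (?, ?, ?)",
            (key, body, time.time()),
        )
        self.db.commit()
        return body


calls = []


def fake_fetch(name):
    """Stand-in for a slow PubChem request; records each call."""
    calls.append(name)
    return f"record for {name}"


cache = TinyDiskCache()
first = cache.get("Aspirin", fake_fetch)   # triggers fake_fetch
second = cache.get("Aspirin", fake_fetch)  # served from the cache
```

The second lookup never reaches fake_fetch, which is the behavior that gives a real cache its speed and offline reproducibility.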
2. Quick Start
Enabling the cache is straightforward. Simply call the setup_cache() function once at the beginning of your script.
import ChemInformant as ci
# Call this once to enable the default 7-day cache
ci.setup_cache()
# All subsequent API calls will now automatically use the cache
info = ci.get_compound("Aspirin")
3. Core Configuration: Understanding setup_cache()
The setup_cache() function provides a rich set of parameters, allowing you to fine-tune caching behavior.
ChemInformant.api_helpers.setup_cache(cache_name: str = 'pubchem_cache', backend: str = 'sqlite', expire_after: int = 604800, **kw: Any) -> None
Configures and initializes the persistent cache for API requests.
This function sets up a requests_cache session that will store API responses on disk, significantly speeding up repeated queries and reducing network traffic.
Parameters:
cache_name (str): The name of the cache file (without extension).
backend (str): The cache backend to use (e.g., 'sqlite', 'redis').
expire_after (int): Cache expiration time in seconds. Defaults to one week.
kw: Additional keyword arguments passed directly to the requests_cache.CachedSession constructor.
3.1. expire_after: Setting Cache Freshness
This parameter controls how long cached responses remain valid. You can provide the value in several convenient formats.
from datetime import timedelta
import ChemInformant as ci
# Method 1: Use a human-readable string ("d" for day, "h" for hour)
ci.setup_cache(expire_after="1d")
# Method 2: Use seconds directly (e.g., 1 hour = 3600 seconds)
ci.setup_cache(expire_after=3600)
# Special values: -1 means the cache never expires, 0 disables caching
ci.setup_cache(expire_after=-1)
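As a rough illustration of how these formats relate to one another, the sketch below converts each of them to whole seconds. The helper to_seconds is hypothetical and is not part of ChemInformant; how the library itself parses strings such as "1d" is not shown in this guide, so treat the string handling here as an assumption.

```python
from datetime import timedelta

# Only the units mentioned in this guide: "h" for hour, "d" for day.
UNIT_SECONDS = {"h": 3600, "d": 86400}


def to_seconds(value):
    """Convert an int, a timedelta, or a '<number><unit>' string to whole seconds."""
    if isinstance(value, timedelta):
        return int(value.total_seconds())
    if isinstance(value, str):
        number, unit = value[:-1], value[-1]
        return int(number) * UNIT_SECONDS[unit]
    return int(value)


one_day = to_seconds("1d")
one_hour = to_seconds(3600)
one_week = to_seconds(timedelta(days=7))  # matches the documented default of 604800
```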
3.2. backend: Choosing a Storage Backend
You can choose the most suitable storage backend for your use case.
Backend | Dependencies | Use Case |
---|---|---|
sqlite | (none) | Default option. The most versatile file-based cache, ideal for single-machine environments. |
memory | (none) | In-memory cache. The fastest option, but the cache is discarded when the script finishes. |
redis | redis | Use when you need to share a cache across multiple processes or machines. |
4. Managing Your Cache
4.1. Clearing the Cache
Manual Deletion: The most direct way is to delete the entire cache directory.
# On Linux/macOS
rm -rf ~/.data/cheminformant/cache/
Programmatic Clearing: Get the session object via get_session() and call clear() on its cache.
session = ci.get_session()
session.cache.clear()  # This will delete all cached entries
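If you prefer a cross-platform alternative to rm -rf, the manual deletion can be sketched with the standard library alone. The function remove_cache_dir is a hypothetical helper, and the default path assumes you have not redirected the cache as described in section 5.

```python
import shutil
from pathlib import Path

# Default location stated in this guide; adjust it if you customized the path.
DEFAULT_CACHE_DIR = Path.home() / ".data" / "cheminformant" / "cache"


def remove_cache_dir(cache_dir=DEFAULT_CACHE_DIR):
    """Delete cache_dir and its contents; return True if something was removed."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return False  # nothing to delete
    shutil.rmtree(cache_dir)
    return True
```

Calling remove_cache_dir() with no arguments targets the default directory, so run it only when no script is actively using the cache.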
4.2. Temporarily Disabling the Cache
If you want to run a script without using the persistent cache, simply switch the backend to 'memory'.
ci.setup_cache(backend="memory") # The cache for this session will be discarded on exit
5. Advanced Usage: Fine-Grained Path Control with PyStow
For advanced users who need precise control over data storage locations, ChemInformant leverages the powerful PyStow library for path management. This allows you to easily redirect the cache directory to other disks or adhere to specific operating system standards.
5.1. Understanding How PyStow Works
PyStow's core mission is to provide a unified, predictable, and configurable storage location for Python applications.
Default Behavior: It creates a .data directory in your user's home folder and then organizes data in subdirectories named after the application (in this case, cheminformant).
* Linux/macOS Default Path: ~/.data/cheminformant/cache/
* Windows Default Path: %USERPROFILE%\.data\cheminformant\cache\
5.2. Customizing Paths via Environment Variables
This is the most flexible and recommended way to configure PyStow, as it requires no code changes.
Environment Variable | Description & Effect |
---|---|
CHEMINFORMANT_HOME | Recommended Usage. Sets a new base directory for ChemInformant only. This is the most precise and non-interfering way to configure paths. |
PYSTOW_HOME | Global Configuration. Sets a new global base directory for all applications that use PyStow. A good choice if you want to centralize all your scientific data. |
PYSTOW_USE_APPDIRS | Follow System Standards. Setting this variable to true makes PyStow use the operating system's standard application data directory instead of ~/.data. |
PYSTOW_NAME | Modify Default Directory Name. Replaces the default .data directory name with a name of your choice. |
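Environment variables are normally set in your shell, but they can also be set from Python. The sketch below assumes the variable is read when the cache is first set up, so it has to run before ChemInformant is imported; the target directory is a hypothetical placeholder.

```python
import os
import tempfile

# Hypothetical target directory; substitute a path on your own disk.
custom_home = os.path.join(tempfile.gettempdir(), "cheminformant_data")

# Set the application-specific variable before importing ChemInformant,
# on the assumption that the path is resolved when the cache is set up.
os.environ["CHEMINFORMANT_HOME"] = custom_home
```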
5.3. Configuration Priority
PyStow follows a clear priority order to determine the final path:
The application-specific variable (CHEMINFORMANT_HOME) has the highest priority.
If not set, the global variable (PYSTOW_HOME) is used.
If neither is set, the default path is used (based on ~/.data or the XDG specification).
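The priority order above can be sketched as a small function. This is a simplified illustration, not PyStow's actual code: resolve_base_dir is a hypothetical name, the XDG branch is omitted, and the sketch assumes the application name is appended as a subdirectory when only the global variable is set.

```python
import os
from pathlib import Path


def resolve_base_dir(env=None, home=None):
    """Simplified sketch of the priority order; not PyStow's actual code."""
    env = os.environ if env is None else env
    home = Path(home) if home is not None else Path.home()
    if env.get("CHEMINFORMANT_HOME"):
        # 1. The application-specific variable wins.
        return Path(env["CHEMINFORMANT_HOME"])
    if env.get("PYSTOW_HOME"):
        # 2. Global base directory, with the application name appended (assumed).
        return Path(env["PYSTOW_HOME"]) / "cheminformant"
    # 3. Default location under the home folder.
    return home / ".data" / "cheminformant"
```

Passing a plain dictionary as env makes it easy to check which location would be chosen without touching your real environment.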
Diving Deeper into PyStow
This guide only covers the most common features of PyStow as used in ChemInformant. If you want to explore more advanced use cases, such as dynamic configuration via the Python API, path lookups, and more, we highly recommend reading PyStow's official GitHub repository.