Caching Guide

This guide provides a comprehensive overview of the caching mechanism in ChemInformant. You will learn how to use the setup_cache() function to enable, configure, and manage the cache, significantly improving script performance and ensuring reproducible results.


1. Why Caching is Essential

When interacting with web APIs like PubChem, you may encounter rate limits or temporary server issues (like 503 Service Unavailable errors). A persistent cache solves these problems by storing API responses on your local disk.

  • Speed: Subsequent identical requests are served instantly from the cache, reducing execution time from seconds to milliseconds.

  • Reliability: It makes your scripts robust against temporary network failures or API downtime.

  • Reproducibility: Your analysis scripts can be re-run offline, yielding the exact same data.


2. Quick Start

Enabling the cache is straightforward. Simply call the setup_cache() function once at the beginning of your script.

Enabling the default cache
import ChemInformant as ci

# Call this once to enable the default 7-day cache
ci.setup_cache()

# All subsequent API calls will now automatically use the cache
info = ci.get_compound("Aspirin")
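
One way to see the effect of the cache is to time the same request twice. The sketch below is illustrative only: compare_cold_and_warm and fake_fetch are hypothetical names, and fake_fetch merely simulates a slow first lookup so that the example runs without network access.

```python
import time
from typing import Any, Callable, Dict, Tuple


def compare_cold_and_warm(fetch: Callable[[str], Any], key: str) -> Tuple[float, float]:
    """Call fetch(key) twice and return both durations in seconds."""
    start = time.perf_counter()
    fetch(key)
    first = time.perf_counter() - start

    start = time.perf_counter()
    fetch(key)
    second = time.perf_counter() - start
    return first, second


# Stand-in for a cached API call: slow on a miss, immediate on a hit.
_fake_cache: Dict[str, str] = {}


def fake_fetch(name: str) -> str:
    if name not in _fake_cache:
        time.sleep(0.05)  # simulate network latency on a cache miss
        _fake_cache[name] = f"record for {name}"
    return _fake_cache[name]


first, second = compare_cold_and_warm(fake_fetch, "Aspirin")
print(f"first call: {first:.4f}s, second call: {second:.4f}s")
```

With ChemInformant installed and ci.setup_cache() already called, passing ci.get_compound in place of fake_fetch should show the same pattern, although the actual timings depend on your network and on PubChem.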

3. Core Configuration: Understanding setup_cache()

The setup_cache() function provides a rich set of parameters, allowing you to fine-tune caching behavior.

ChemInformant.api_helpers.setup_cache(cache_name: str = 'pubchem_cache', backend: str = 'sqlite', expire_after: int = 604800, **kw: Any) -> None

Configures and initializes the persistent cache for API requests.

This function sets up a requests_cache session that will store API responses on disk, significantly speeding up repeated queries and reducing network traffic.

Parameters:
  • cache_name (str) – The name of the cache file (without extension).

  • backend (str) – The cache backend to use (e.g., 'sqlite', 'redis').

  • expire_after (int) – Cache expiration time in seconds. Defaults to one week.

  • kw – Additional keyword arguments passed directly to the requests_cache.CachedSession constructor.

3.1. expire_after: Setting Cache Freshness

This parameter controls how long cached responses remain valid. You can provide the value in several convenient formats.

Different ways to set expiration
from datetime import timedelta
import ChemInformant as ci

# Method 1: Use a human-readable string ("d" for day, "h" for hour)
ci.setup_cache(expire_after="1d")

# Method 2: Use seconds directly (e.g., 1 hour = 3600 seconds)
ci.setup_cache(expire_after=3600)

# Special values: -1 means the cache never expires, 0 disables caching
ci.setup_cache(expire_after=-1)
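
Because the documented signature types expire_after as an integer number of seconds, it can be convenient to express the lifetime as a timedelta and convert it explicitly. The helper below is a minimal sketch; to_expire_seconds is a hypothetical name and is not part of ChemInformant.

```python
from datetime import timedelta


def to_expire_seconds(delta: timedelta) -> int:
    """Convert a positive timedelta to whole seconds for expire_after."""
    seconds = int(delta.total_seconds())
    if seconds <= 0:
        # -1 and 0 are special values; pass them explicitly instead.
        raise ValueError("expiration must be a positive duration")
    return seconds


one_week = to_expire_seconds(timedelta(days=7))
one_hour = to_expire_seconds(timedelta(hours=1))
print(one_week, one_hour)
```

The result can then be passed as ci.setup_cache(expire_after=one_week), which matches the default of 604800 seconds.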

3.2. backend: Choosing a Storage Backend

You can choose the most suitable storage backend for your use case.

Available Cache Backends

Backend     Dependencies   Use Case
--------    ------------   --------
'sqlite'    (none)         Default option. The most versatile file-based cache, ideal for single-machine environments.
'memory'    (none)         In-memory cache. The fastest option, but the cache is discarded when the script finishes.
'redis'     redis          Use when you need to share a cache across multiple processes or machines.
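
The choice between these backends can be written down as a small decision function. This is only a sketch of the guidance above; choose_backend is a hypothetical helper, not a ChemInformant function.

```python
def choose_backend(shared: bool = False, persistent: bool = True) -> str:
    """Pick a backend name following the use cases listed above."""
    if shared:
        # Shared across processes or machines; needs the redis dependency.
        return "redis"
    if persistent:
        # Single machine, results kept on disk between runs.
        return "sqlite"
    # Fastest, but discarded when the script finishes.
    return "memory"


print(choose_backend())
print(choose_backend(shared=True))
print(choose_backend(persistent=False))
```

The returned string can be passed directly as ci.setup_cache(backend=choose_backend(...)).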


4. Managing Your Cache

4.1. Clearing the Cache

  • Manual Deletion: The most direct way is to delete the entire cache directory.

    # On Linux/macOS
    rm -rf ~/.data/cheminformant/cache/
    
  • Programmatic Clearing: Get the session object via get_session() and call its clear method.

    import ChemInformant as ci

    session = ci.get_session()
    session.cache.clear()  # Deletes all cached entries
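
The manual deletion shown for Linux/macOS can also be done from Python, which works on Windows as well. The following is a minimal sketch using only the standard library; remove_cache_dir is a hypothetical helper, and the default path assumes no PyStow overrides are in effect. The demonstration runs against a throwaway directory rather than your real cache.

```python
import shutil
import tempfile
from pathlib import Path


def remove_cache_dir(cache_dir: Path) -> bool:
    """Delete cache_dir and its contents; return True if it existed."""
    if not cache_dir.is_dir():
        return False
    shutil.rmtree(cache_dir)
    return True


# Documented default location (assumes no PyStow overrides are set).
default_cache_dir = Path.home() / ".data" / "cheminformant" / "cache"

# Demonstrate on a temporary directory instead of the real cache.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "cache"
    demo_dir.mkdir()
    (demo_dir / "placeholder.sqlite").write_text("placeholder")
    removed_first = remove_cache_dir(demo_dir)   # directory existed
    removed_again = remove_cache_dir(demo_dir)   # already gone

print(removed_first, removed_again)
```

To clear the real cache, call remove_cache_dir(default_cache_dir), adjusting the path if you have customized the storage location.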
    

4.2. Temporarily Disabling the Cache

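To run a script without reading from or writing to the persistent on-disk cache, switch the backend to 'memory'. Responses are still cached in memory for the duration of the run and are discarded when the script exits.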

ci.setup_cache(backend="memory") # The cache for this session will be discarded on exit


5. Advanced Usage: Fine-Grained Path Control with PyStow

For advanced users who need precise control over data storage locations, ChemInformant leverages the powerful PyStow library for path management. This allows you to easily redirect the cache directory to other disks or adhere to specific operating system standards.

5.1. Understanding How PyStow Works

PyStow’s core mission is to provide a unified, predictable, and configurable storage location for Python applications.

  • Default Behavior: It creates a .data directory in your user’s home folder and then organizes data in subdirectories named after the application (in this case, cheminformant).

      • Linux/macOS Default Path: ~/.data/cheminformant/cache/

      • Windows Default Path: %USERPROFILE%\.data\cheminformant\cache\

5.2. Customizing Paths via Environment Variables

This is the most flexible and recommended way to configure PyStow, as it requires no code changes.

Core PyStow Environment Variables

  • CHEMINFORMANT_HOME: Recommended usage. Sets a new base directory for ChemInformant only. This is the most precise and non-interfering way to configure paths.
    Example: export CHEMINFORMANT_HOME=/mnt/ssd/data

  • PYSTOW_HOME: Global configuration. Sets a new global base directory for all applications that use PyStow. A good choice if you want to centralize all your scientific data.
    Example: export PYSTOW_HOME=/data/

  • PYSTOW_USE_APPDIRS: Follow system standards. Setting this variable to true makes PyStow follow the XDG Base Directory Specification.

      • Linux/macOS: The cache will be stored in ~/.cache/cheminformant/

      • Windows: The cache will be stored in %LOCALAPPDATA%\ChemInformant\Cache\

  • PYSTOW_NAME: Modify the default directory name. Replaces the default .data directory name with one you specify.
    Example: export PYSTOW_NAME=.my_apps_data
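
In environments where exporting a shell variable is awkward, such as a notebook, the same variable can be set from Python. This is a sketch under the assumption that the variable is read when ChemInformant resolves its storage path, so setting it before the import is the conservative choice. The path is the illustrative one from the example above.

```python
import os

# setdefault keeps any value already exported in the shell.
os.environ.setdefault("CHEMINFORMANT_HOME", "/mnt/ssd/data")

# Import ChemInformant only after this point so the setting is visible
# when the storage path is resolved (an assumption, see the text above).
print(os.environ["CHEMINFORMANT_HOME"])
```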

5.3. Configuration Priority

PyStow follows a clear priority order to determine the final path:

  1. The application-specific variable (CHEMINFORMANT_HOME) has the highest priority.

  2. If not set, the global variable (PYSTOW_HOME) is used.

  3. If neither is set, the default path is used (based on ~/.data or the XDG specification).
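
The priority order above can be sketched as a small lookup function. This is an illustration of the documented rules, not PyStow's actual implementation: resolve_cache_dir is a hypothetical name, the trailing cache subdirectory is assumed from the default paths shown earlier, and PYSTOW_USE_APPDIRS is deliberately left out.

```python
from pathlib import Path
from typing import Mapping


def resolve_cache_dir(env: Mapping[str, str], home: Path) -> Path:
    """Return the cache directory implied by the given environment."""
    # 1. The application-specific variable has the highest priority.
    app_home = env.get("CHEMINFORMANT_HOME")
    if app_home:
        return Path(app_home) / "cache"

    # 2. Otherwise the global base directory is used.
    global_home = env.get("PYSTOW_HOME")
    if global_home:
        return Path(global_home) / "cheminformant" / "cache"

    # 3. Otherwise fall back to the default under the home folder,
    #    honoring PYSTOW_NAME if it renames the .data directory.
    base_name = env.get("PYSTOW_NAME", ".data")
    return home / base_name / "cheminformant" / "cache"


home = Path("/home/alice")  # example home folder for the demonstration
print(resolve_cache_dir({}, home))
print(resolve_cache_dir({"PYSTOW_HOME": "/data/"}, home))
print(resolve_cache_dir(
    {"PYSTOW_HOME": "/data/", "CHEMINFORMANT_HOME": "/mnt/ssd/data"}, home
))
```

In real use you would pass os.environ and Path.home(). Taking them as parameters keeps the function easy to test.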

Diving Deeper into PyStow

This guide covers only the PyStow features most commonly used with ChemInformant. To explore more advanced use cases, such as dynamic configuration via the Python API or path lookups, we recommend reading the documentation in PyStow’s official GitHub repository.