Core API Reference

Main Functions

process_references()

The primary function for processing citations.

Signature:

def process_references(
    input_content: str,
    input_type: str,
    template_name: str,
    output_format: str,
    interactive_callback: Callable[[List[Dict]], int]
) -> Dict[str, Any]

Parameters:

  • input_content (str): The reference content to process (required)

  • input_type (str): Type of input - “txt” or “bib” (required)

  • template_name (str): Template name to use (e.g., “journal_article_full”) (required)

  • output_format (str): Output format - “bibtex”, “apa”, or “mla” (required)

  • interactive_callback (Callable): Function to handle ambiguous matches. Takes a list of candidate dicts and returns the selected index (0-based), or -1 to skip (required)
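
The callback contract described above (a list of candidate dicts in, a 0-based index or -1 out) can be sketched as follows. This is a minimal illustration, not part of the library: the function name choose_candidate is hypothetical, and the "doi" key is an assumption, since the fields carried by each candidate dict are not listed here.

```python
from typing import Dict, List


def choose_candidate(candidates: List[Dict]) -> int:
    """Return the index of the first candidate that carries a DOI, or -1 to skip.

    The "doi" key is an assumption made for this example only.
    """
    for index, candidate in enumerate(candidates):
        if candidate.get("doi"):
            return index
    return -1  # No acceptable candidate: ask the pipeline to skip this entry


# Stand-in candidate list, invented for this example
sample = [
    {"title": "Deep learning"},
    {"title": "Deep learning", "doi": "10.1038/nature14539"},
]
print(choose_candidate(sample))  # 1
print(choose_candidate([]))      # -1
```

A function like this could be passed as interactive_callback in place of the lambda used in the examples.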

Returns:

A dictionary with keys:

  • results (List[str]): List of formatted citation strings

  • report (dict): Processing report containing:

    • total (int): Total number of entries processed

    • succeeded (int): Number of successfully processed entries

    • failed_entries (List[Dict]): List of failed entries with error details
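
As a sketch of how the returned dictionary might be consumed, the helper below reads only the keys documented above. The helper name summarize and the sample value are illustrative; the contents of each failed-entry dict are not specified here, so the example leaves them empty.

```python
from typing import Any, Dict


def summarize(result: Dict[str, Any]) -> str:
    """Build a one-line summary from the documented return structure."""
    report = result["report"]
    failed_count = len(report["failed_entries"])
    return f"{report['succeeded']}/{report['total']} succeeded, {failed_count} failed"


# Stand-in value shaped like the documented return dictionary
sample_result = {
    "results": ["formatted citation string"],
    "report": {"total": 2, "succeeded": 1, "failed_entries": [{}]},
}
print(summarize(sample_result))  # 1/2 succeeded, 1 failed
```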

Example:

from onecite import process_references

result = process_references(
    input_content="10.1038/nature14539",
    input_type="txt",
    template_name="journal_article_full",
    output_format="bibtex",
    interactive_callback=lambda candidates: 0  # Auto-select first match
)

# Access results
for citation in result['results']:
    print(citation)

# Check report
print(f"Succeeded: {result['report']['succeeded']}/{result['report']['total']}")

Data Classes

RawEntry

A TypedDict representing an unprocessed reference entry (Stage 1).

Type Definition:

class RawEntry(TypedDict, total=False):
    id: int
    raw_text: str
    doi: Optional[str]
    url: Optional[str]
    query_string: Optional[str]
    original_entry: Optional[Dict[str, Any]]

Attributes:

  • id (int): Entry identifier

  • raw_text (str): The raw reference text

  • doi (str, optional): Digital Object Identifier if detected

  • url (str, optional): URL if detected

  • query_string (str, optional): Search query string

  • original_entry (dict, optional): Preserved original BibTeX entry fields
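
Because the type is declared with total=False, any key may be omitted when an entry is built. The sketch below redeclares the type locally and fills it from raw text. The helper make_raw_entry and the simplified DOI pattern are assumptions for illustration and are not the library's parser.

```python
import re
from typing import Any, Dict, Optional, TypedDict


class RawEntry(TypedDict, total=False):
    id: int
    raw_text: str
    doi: Optional[str]
    url: Optional[str]
    query_string: Optional[str]
    original_entry: Optional[Dict[str, Any]]


# Simplified DOI pattern, for illustration only
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def make_raw_entry(entry_id: int, text: str) -> RawEntry:
    """Hypothetical helper: wrap raw text and record a DOI if one is visible."""
    entry: RawEntry = {"id": entry_id, "raw_text": text}
    match = DOI_PATTERN.search(text)
    if match:
        entry["doi"] = match.group(0)
    else:
        entry["query_string"] = text  # Assumed fallback: search by the raw text
    return entry


with_doi = make_raw_entry(1, "10.1038/nature14539")
without_doi = make_raw_entry(2, "LeCun, Bengio, Hinton. Deep learning. Nature, 2015.")
print(with_doi)
print(without_doi)
```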

IdentifiedEntry

A TypedDict representing an entry after identification from data sources (Stage 2).

Type Definition:

class IdentifiedEntry(TypedDict, total=False):
    id: int
    raw_text: str
    doi: Optional[str]
    arxiv_id: Optional[str]
    url: Optional[str]
    metadata: Optional[Dict[str, Any]]
    status: str

Attributes:

  • id (int): Entry identifier

  • raw_text (str): Original raw text

  • doi (str, optional): Digital Object Identifier

  • arxiv_id (str, optional): arXiv identifier

  • url (str, optional): Conference or other URL

  • metadata (dict, optional): Additional metadata from various sources

  • status (str): Status - ‘identified’ or ‘identification_failed’
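
The status field is what separates usable entries from failures after Stage 2. A minimal sketch of that split, using plain dicts shaped like IdentifiedEntry (the helper name split_by_status and the sample entries are invented for this example):

```python
from typing import Any, Dict, List, Tuple

Entry = Dict[str, Any]


def split_by_status(entries: List[Entry]) -> Tuple[List[Entry], List[Entry]]:
    """Partition entries on the two documented status values."""
    identified = [e for e in entries if e.get("status") == "identified"]
    failed = [e for e in entries if e.get("status") == "identification_failed"]
    return identified, failed


# Stand-in entries shaped like IdentifiedEntry
sample_entries = [
    {"id": 1, "raw_text": "10.1038/nature14539", "doi": "10.1038/nature14539", "status": "identified"},
    {"id": 2, "raw_text": "unknown reference", "status": "identification_failed"},
]
identified, failed = split_by_status(sample_entries)
print([e["id"] for e in identified], [e["id"] for e in failed])  # [1] [2]
```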

CompletedEntry

A TypedDict representing a fully processed entry with all metadata (Stage 3).

Type Definition:

class CompletedEntry(TypedDict, total=False):
    id: int
    doi: str
    status: str
    bib_key: str
    bib_data: Dict[str, Any]

Attributes:

  • id (int): Entry identifier

  • doi (str): Digital Object Identifier

  • status (str): Status - ‘completed’ or ‘enrichment_failed’

  • bib_key (str): BibTeX citation key (e.g., “LeCun2015Deep”)

  • bib_data (dict): Complete bibliographic data with all fields

Note: CompletedEntry is a TypedDict without methods. Use the FormatterModule from pipeline.py to convert entries to different output formats.
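
In practice this means an entry is an ordinary dict at runtime and is read with key access. The sketch below redeclares the type locally to show this; the field names inside bib_data are illustrative, since the full set of fields is not listed here.

```python
from typing import Any, Dict, TypedDict


class CompletedEntry(TypedDict, total=False):
    id: int
    doi: str
    status: str
    bib_key: str
    bib_data: Dict[str, Any]


entry: CompletedEntry = {
    "id": 1,
    "doi": "10.1038/nature14539",
    "status": "completed",
    "bib_key": "LeCun2015Deep",
    "bib_data": {"title": "Deep learning", "year": "2015"},  # Illustrative field names
}

print(isinstance(entry, dict))                  # True: a TypedDict value is a plain dict
print(entry["bib_key"])                         # LeCun2015Deep
print(entry["bib_data"].get("journal", "n/a"))  # n/a: .get() guards against missing fields
```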

Classes

TemplateLoader

Manages citation templates by loading YAML template files.

Constructor:

TemplateLoader(templates_dir: Optional[str] = None)

Parameters:

  • templates_dir (str, optional): Custom template directory path. If None, uses the built-in onecite/templates/ directory.

Methods:

  • load_template(template_name: str) -> Dict[str, Any]: Load a YAML template by name. Returns the template dictionary, or a default template if the named template is not found.

Example:

from onecite import TemplateLoader

# Use default templates directory
loader = TemplateLoader()
template = loader.load_template("journal_article_full")
print(template['name'])

# Use custom templates directory
custom_loader = TemplateLoader(templates_dir="/path/to/templates")
custom_template = custom_loader.load_template("my_custom_template")

PipelineController

Manages the 4-stage processing pipeline (Parse → Identify → Enrich → Format).

Constructor:

PipelineController(use_google_scholar: bool = False)

Parameters:

  • use_google_scholar (bool): Whether to enable Google Scholar as a data source. Default is False.

Methods:

  • process(input_content: str, input_type: str, template_name: str, output_format: str, interactive_callback: Callable) -> Dict[str, Any]: Execute the complete 4-stage processing pipeline

Note: Most users should use the process_references() function instead, which provides a simpler interface. PipelineController is a lower-level API for advanced use cases.

Example:

from onecite import PipelineController

controller = PipelineController()
result = controller.process(
    input_content="10.1038/nature14539",
    input_type="txt",
    template_name="journal_article_full",
    output_format="bibtex",
    interactive_callback=lambda candidates: 0
)

print(result['results'])

Exceptions

All exceptions inherit from OneCiteError.
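
Because of this shared base class, one handler for OneCiteError also catches every more specific error. The sketch below uses local stand-in classes that mirror the documented hierarchy so it runs without the library installed; in real code the classes would be imported instead of defined.

```python
# Local stand-ins mirroring the documented hierarchy (not the library's classes)
class OneCiteError(Exception):
    """Base exception."""


class ValidationError(OneCiteError):
    """Entry validation failed."""


class ParseError(OneCiteError):
    """Input could not be parsed."""


class ResolverError(OneCiteError):
    """Data source resolution failed."""


try:
    raise ResolverError("lookup failed")
except OneCiteError as exc:  # Catches ResolverError through the base class
    handled = f"{type(exc).__name__}: {exc}"

print(handled)  # ResolverError: lookup failed
```

Catching the specific subclasses first and OneCiteError last keeps targeted handling while still covering any remaining library error.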

OneCiteError

Base exception for all OneCite errors.

ValidationError

Raised when entry validation fails.

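from onecite import ValidationError, process_references  # assumes package-level exports
try:
    # All five parameters are required, so each is passed explicitly
    result = process_references(
        "", "txt", "journal_article_full", "bibtex", lambda candidates: 0
    )
except ValidationError as e:
    print(f"Validation failed: {e}")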

ParseError

Raised when parsing input fails.

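from onecite import ParseError, process_references  # assumes package-level exports
try:
    # All five parameters are required, so each is passed explicitly
    result = process_references(
        "invalid input", "bib", "journal_article_full", "bibtex", lambda candidates: 0
    )
except ParseError as e:
    print(f"Parse failed: {e}")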

ResolverError

Raised when data source resolution fails.

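from onecite import ResolverError, process_references  # assumes package-level exports
try:
    # All five parameters are required, so each is passed explicitly
    result = process_references(
        "nonexistent doi", "txt", "journal_article_full", "bibtex", lambda candidates: 0
    )
except ResolverError as e:
    print(f"Resolver failed: {e}")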

Advanced Usage

Custom Data Processing

from onecite import process_references

# For most use cases, use process_references directly
result = process_references(
    input_content="10.1038/nature14539",
    input_type="txt",
    template_name="journal_article_full",
    output_format="bibtex",
    interactive_callback=lambda candidates: 0
)

print('\n\n'.join(result['results']))

# Access the processing report
report = result['report']
print(f"Total: {report['total']}, Succeeded: {report['succeeded']}")

Working with Templates

from onecite import TemplateLoader, process_references

# Load a template to inspect it
loader = TemplateLoader()
template = loader.load_template("journal_article_full")
print(f"Template name: {template['name']}")
print(f"Entry type: {template['entry_type']}")

# To use a custom template, place it in the onecite/templates/ directory,
# then reference it by name
result = process_references(
    input_content="10.1038/nature14539",
    input_type="txt",
    template_name="your_custom_template",  # without .yaml extension
    output_format="bibtex",
    interactive_callback=lambda candidates: 0
)

Next Steps