Core API Reference
Main Functions
process_references()
The primary function for processing citations.
Signature:
def process_references(
    input_content: str,
    input_type: str,
    template_name: str,
    output_format: str,
    interactive_callback: Callable[[List[Dict]], int]
) -> Dict[str, Any]
Parameters:
input_content (str): The reference content to process
input_type (str): Type of input, "txt" or "bib" (required)
template_name (str): Template name to use (e.g., "journal_article_full") (required)
output_format (str): Output format, one of "bibtex", "apa", or "mla" (required)
interactive_callback (Callable): Function to handle ambiguous matches. Takes a list of candidate dicts and returns the selected index (0-based), or -1 to skip (required)
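The callback contract described above (receive a list of candidate dicts, return a 0-based index or -1 to skip) can be sketched as follows. The keys of a candidate dict are not documented in this section, so the `title` key and the `pick_by_title` helper are hypothetical and serve only as an illustration.

```python
from typing import Callable, Dict, List


def pick_by_title(expected: str) -> Callable[[List[Dict]], int]:
    """Build a callback that selects the candidate whose title matches.

    Assumption: each candidate dict carries a 'title' key. That key is
    hypothetical; inspect the real candidates before relying on it.
    """
    wanted = expected.strip().lower()

    def callback(candidates: List[Dict]) -> int:
        for index, candidate in enumerate(candidates):
            title = str(candidate.get("title", "")).strip().lower()
            if title == wanted:
                return index  # 0-based index of the chosen candidate
        return -1  # no confident match: skip this entry

    return callback


choose = pick_by_title("Deep learning")
print(choose([{"title": "Shallow learning"}, {"title": "Deep Learning"}]))
print(choose([{"title": "Something else"}]))
```

A callback built this way could be passed as interactive_callback in place of the lambda used in the examples, which always selects the first match.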
Returns:
A dictionary with keys:
results (List[str]): List of formatted citation strings
report (dict): Processing report containing:
    total (int): Total number of entries processed
    succeeded (int): Number of successfully processed entries
    failed_entries (List[Dict]): List of failed entries with error details
Example:
from onecite import process_references
result = process_references(
    input_content="10.1038/nature14539",
    input_type="txt",
    template_name="journal_article_full",
    output_format="bibtex",
    interactive_callback=lambda candidates: 0  # Auto-select first match
)
# Access results
for citation in result['results']:
    print(citation)
# Check report
print(f"Succeeded: {result['report']['succeeded']}/{result['report']['total']}")
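As a small illustration of the return structure, the sketch below condenses a result dict into a one-line summary. It relies only on the documented keys (`report`, `total`, `succeeded`, `failed_entries`); the `summarize` helper and the sample values are made up for the example, and the contents of each failed entry are not assumed.

```python
from typing import Any, Dict


def summarize(result: Dict[str, Any]) -> str:
    """Return a one-line summary of a process_references() result dict."""
    report = result["report"]
    failed = report["failed_entries"]
    line = f"{report['succeeded']}/{report['total']} succeeded"
    if failed:
        line += f", {len(failed)} failed"
    return line


# Stand-in result with the documented keys; the values are made up.
sample = {
    "results": ["@article{Example2020, ...}"],
    "report": {
        "total": 2,
        "succeeded": 1,
        "failed_entries": [{"error": "hypothetical error detail"}],
    },
}
print(summarize(sample))
```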
Data Classes
RawEntry
A TypedDict representing an unprocessed reference entry (Stage 1).
Type Definition:
class RawEntry(TypedDict, total=False):
    id: int
    raw_text: str
    doi: Optional[str]
    url: Optional[str]
    query_string: Optional[str]
    original_entry: Optional[Dict[str, Any]]
Attributes:
id (int): Entry identifier
raw_text (str): The raw reference text
doi (str, optional): Digital Object Identifier if detected
url (str, optional): URL if detected
query_string (str, optional): Search query string
original_entry (dict, optional): Preserved original BibTeX entry fields
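To make the shape of a RawEntry concrete, here is a minimal sketch that builds one from a line of text. The TypedDict is redeclared locally so the snippet runs without the package installed. The `make_raw_entry` helper and the simplified DOI pattern are assumptions for illustration, not the library's actual parsing logic.

```python
import re
from typing import Any, Dict, Optional, TypedDict


class RawEntry(TypedDict, total=False):
    # Local copy of the documented definition, so the sketch is self-contained.
    id: int
    raw_text: str
    doi: Optional[str]
    url: Optional[str]
    query_string: Optional[str]
    original_entry: Optional[Dict[str, Any]]


# Simplified DOI pattern for illustration only (an assumption, not the
# library's actual detection logic).
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def make_raw_entry(entry_id: int, text: str) -> RawEntry:
    """Hypothetical helper: wrap raw text and record a DOI if one is found."""
    entry: RawEntry = {"id": entry_id, "raw_text": text}
    match = DOI_PATTERN.search(text)
    if match:
        entry["doi"] = match.group(0)
    else:
        entry["query_string"] = text  # fall back to a free-text query
    return entry


print(make_raw_entry(1, "10.1038/nature14539"))
print(make_raw_entry(2, "LeCun, Bengio, Hinton. Deep learning. Nature, 2015."))
```

Because the class is declared with total=False, keys such as doi or url may be absent entirely, so readers of a RawEntry should use .get() rather than indexing.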
IdentifiedEntry
A TypedDict representing an entry after identification from data sources (Stage 2).
Type Definition:
class IdentifiedEntry(TypedDict, total=False):
    id: int
    raw_text: str
    doi: Optional[str]
    arxiv_id: Optional[str]
    url: Optional[str]
    metadata: Optional[Dict[str, Any]]
    status: str
Attributes:
id (int): Entry identifier
raw_text (str): Original raw text
doi (str, optional): Digital Object Identifier
arxiv_id (str, optional): arXiv identifier
url (str, optional): Conference or other URL
metadata (dict, optional): Additional metadata from various sources
status (str): Status, either 'identified' or 'identification_failed'
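The status field is what distinguishes usable entries from failed ones. A minimal sketch of separating the two, using plain dicts shaped like IdentifiedEntry (the `split_by_status` helper and the sample entries are made up for the example):

```python
from typing import Any, Dict, List, Tuple


def split_by_status(
    entries: List[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Separate entries with status 'identified' from all others."""
    identified = [e for e in entries if e.get("status") == "identified"]
    failed = [e for e in entries if e.get("status") != "identified"]
    return identified, failed


# Made-up entries shaped like IdentifiedEntry.
entries = [
    {
        "id": 1,
        "raw_text": "10.1038/nature14539",
        "doi": "10.1038/nature14539",
        "status": "identified",
    },
    {"id": 2, "raw_text": "unknown reference", "status": "identification_failed"},
]
ok, failed = split_by_status(entries)
print([e["id"] for e in ok], [e["id"] for e in failed])
```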
CompletedEntry
A TypedDict representing a fully processed entry with all metadata (Stage 3).
Type Definition:
class CompletedEntry(TypedDict, total=False):
    id: int
    doi: str
    status: str
    bib_key: str
    bib_data: Dict[str, Any]
Attributes:
id (int): Entry identifier
doi (str): Digital Object Identifier
status (str): Status, either 'completed' or 'enrichment_failed'
bib_key (str): BibTeX citation key (e.g., "LeCun2015Deep")
bib_data (dict): Complete bibliographic data with all fields
Note: CompletedEntry is a TypedDict without methods. Use the FormatterModule from pipeline.py to convert entries to different output formats.
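To show how bib_key and bib_data relate to the final output, the sketch below renders a CompletedEntry-shaped dict as a BibTeX string by hand. It is not the FormatterModule, whose interface is not documented in this section. The `render_bibtex_sketch` helper is hypothetical, and it assumes that bib_data is a flat mapping of field names to values and that the caller supplies the entry type.

```python
from typing import Any, Dict


def render_bibtex_sketch(entry: Dict[str, Any], entry_type: str = "article") -> str:
    """Render a CompletedEntry-shaped dict as a BibTeX string.

    Assumptions: bib_data is a flat mapping of field names to values, and
    the entry type is supplied by the caller. The real FormatterModule may
    work differently.
    """
    fields = ",\n".join(
        f"  {name} = {{{value}}}" for name, value in entry["bib_data"].items()
    )
    return f"@{entry_type}{{{entry['bib_key']},\n{fields}\n}}"


# Made-up entry shaped like CompletedEntry.
completed = {
    "id": 1,
    "doi": "10.1038/nature14539",
    "status": "completed",
    "bib_key": "LeCun2015Deep",
    "bib_data": {"title": "Deep learning", "year": 2015},
}
print(render_bibtex_sketch(completed))
```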
Classes
TemplateLoader
Manages citation templates by loading YAML template files.
Constructor:
TemplateLoader(templates_dir: Optional[str] = None)
Parameters:
templates_dir (str, optional): Custom template directory path. If None, uses the built-in onecite/templates/ directory.
Methods:
load_template(template_name: str) -> Dict[str, Any]: Load a YAML template by name. Returns the template dictionary, or a default template if the named template is not found.
Example:
from onecite import TemplateLoader
# Use default templates directory
loader = TemplateLoader()
template = loader.load_template("journal_article_full")
print(template['name'])
# Use custom templates directory
custom_loader = TemplateLoader(templates_dir="/path/to/templates")
custom_template = custom_loader.load_template("my_custom_template")
PipelineController
Manages the 4-stage processing pipeline (Parse → Identify → Enrich → Format).
Constructor:
PipelineController(use_google_scholar: bool = False)
Parameters:
use_google_scholar (bool): Whether to enable Google Scholar as a data source. Default is False.
Methods:
process(input_content: str, input_type: str, template_name: str, output_format: str, interactive_callback: Callable) -> Dict[str, Any]: Execute the complete 4-stage processing pipeline.
Note: Most users should use the process_references() function instead, which provides a simpler interface. PipelineController is a lower-level API for advanced use cases.
Example:
from onecite import PipelineController
controller = PipelineController()
result = controller.process(
    input_content="10.1038/nature14539",
    input_type="txt",
    template_name="journal_article_full",
    output_format="bibtex",
    interactive_callback=lambda candidates: 0
)
print(result['results'])
Exceptions
All exceptions inherit from OneCiteError.
OneCiteError
Base exception for all OneCite errors.
ValidationError
Raised when entry validation fails.
try:
    result = process_references(
        "", "txt", "journal_article_full", "bibtex",
        lambda candidates: 0,
    )
except ValidationError as e:
    print(f"Validation failed: {e}")
ParseError
Raised when parsing input fails.
try:
    result = process_references(
        "invalid input", "bib", "journal_article_full", "bibtex",
        lambda candidates: 0,
    )
except ParseError as e:
    print(f"Parse failed: {e}")
ResolverError
Raised when data source resolution fails.
try:
    result = process_references(
        "nonexistent doi", "txt", "journal_article_full", "bibtex",
        lambda candidates: 0,
    )
except ResolverError as e:
    print(f"Resolver failed: {e}")
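Because every exception derives from OneCiteError, one handler on the base class can act as a catch-all after any more specific handlers. The sketch below demonstrates that ordering. The four classes are local stand-ins that mirror the documented hierarchy so the snippet runs on its own (this section does not show where the real classes are imported from), and `run_safely` is a hypothetical wrapper.

```python
from typing import Callable, Optional


# Stand-ins mirroring the documented hierarchy; in real code, import the
# library's own classes instead of redefining them.
class OneCiteError(Exception):
    pass


class ValidationError(OneCiteError):
    pass


class ParseError(OneCiteError):
    pass


class ResolverError(OneCiteError):
    pass


def run_safely(task: Callable[[], str]) -> Optional[str]:
    """Hypothetical wrapper: report parse errors first, then any other OneCite error."""
    try:
        return task()
    except ParseError as exc:
        print(f"Input could not be parsed: {exc}")
    except OneCiteError as exc:  # catches ValidationError, ResolverError, ...
        print(f"OneCite error ({type(exc).__name__}): {exc}")
    return None


def failing_task() -> str:
    raise ResolverError("no data source returned a match")


print(run_safely(failing_task))
print(run_safely(lambda: "ok"))
```

The specific handler has to come before the base-class handler; otherwise the base-class clause would intercept ParseError as well.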
Advanced Usage
Custom Data Processing
from onecite import process_references
# For most use cases, use process_references directly
result = process_references(
    input_content="10.1038/nature14539",
    input_type="txt",
    template_name="journal_article_full",
    output_format="bibtex",
    interactive_callback=lambda candidates: 0
)
print('\n\n'.join(result['results']))
# Access the processing report
report = result['report']
print(f"Total: {report['total']}, Succeeded: {report['succeeded']}")
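The blank-line join used above also works for saving citations to a file. A minimal sketch, with a hypothetical `write_results` helper and made-up citation strings standing in for result['results']:

```python
import tempfile
from pathlib import Path
from typing import List


def write_results(results: List[str], path: Path) -> int:
    """Write citations separated by blank lines; return how many were written."""
    path.write_text("\n\n".join(results) + "\n", encoding="utf-8")
    return len(results)


# Made-up citation strings standing in for result['results'].
sample_results = [
    "@article{A2020, title = {First}}",
    "@article{B2021, title = {Second}}",
]
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "references.bib"
    count = write_results(sample_results, target)
    saved = target.read_text(encoding="utf-8")
print(count)
```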
Working with Templates
from onecite import TemplateLoader, process_references
# Load a template to inspect it
loader = TemplateLoader()
template = loader.load_template("journal_article_full")
print(f"Template name: {template['name']}")
print(f"Entry type: {template['entry_type']}")
# To use a custom template, place it in the onecite/templates/ directory,
# then reference it by name
result = process_references(
    input_content="10.1038/nature14539",
    input_type="txt",
    template_name="your_custom_template",  # without .yaml extension
    output_format="bibtex",
    interactive_callback=lambda candidates: 0
)
Next Steps
See Python API Reference for usage examples
Check Advanced Usage for complex scenarios
Review Custom Templates for custom formatting