Python API Reference
====================

This guide covers using OneCite as a Python library in your own code.

Basic Usage
-----------

Simple Citation Processing
~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    from onecite import process_references

    # Process a simple reference
    result = process_references(
        input_content="10.1038/nature14539",
        input_type="txt",
        template_name="journal_article_full",
        output_format="bibtex",
        interactive_callback=lambda candidates: 0  # Auto-select first match
    )

    # Print results
    for citation in result['results']:
        print(citation)

The Result Dictionary
~~~~~~~~~~~~~~~~~~~~~

The ``process_references`` function returns a dictionary containing:

- ``results`` (List[str]): List of formatted citation strings
- ``report`` (dict): Processing report with the following keys:

  - ``total`` (int): Total number of entries processed
  - ``succeeded`` (int): Number of successfully processed entries
  - ``failed_entries`` (List[Dict]): List of failed entries with error details

::

    result = process_references(
        input_content="10.1038/nature14539",
        input_type="txt",
        template_name="journal_article_full",
        output_format="bibtex",
        interactive_callback=lambda candidates: 0
    )

    print(f"Total: {result['report']['total']}")
    print(f"Succeeded: {result['report']['succeeded']}")
    print(f"Failed: {len(result['report']['failed_entries'])}")

Processing Different Input Formats
----------------------------------

Plain Text Input
~~~~~~~~~~~~~~~~

::

    from onecite import process_references

    txt_content = """
    10.1038/nature14539
    Vaswani et al., 2017, Attention is all you need
    Smith (2020) Neural Architecture Search
    """

    result = process_references(
        input_content=txt_content,
        input_type="txt",
        template_name="journal_article_full",
        output_format="bibtex",
        interactive_callback=lambda candidates: 0
    )

    # Access results
    print('\n\n'.join(result['results']))

BibTeX Input
~~~~~~~~~~~~

::

    from onecite import process_references

    bibtex_content = """
    @article{LeCun2015,
      title = {Deep Learning},
      author = {LeCun, Yann and Bengio, Yoshua and Hinton, Geoffrey},
      journal = {Nature},
      year = {2015}
    }
    """

    result = process_references(
        input_content=bibtex_content,
        input_type="bib",
        template_name="journal_article_full",
        output_format="bibtex",
        interactive_callback=lambda candidates: 0
    )

    print('\n\n'.join(result['results']))

Output Formats
--------------

::

    # BibTeX format
    result = process_references(
        input_content="10.1038/nature14539",
        input_type="txt",
        template_name="journal_article_full",
        output_format="bibtex",
        interactive_callback=lambda candidates: 0
    )

    # APA format
    result = process_references(
        input_content="10.1038/nature14539",
        input_type="txt",
        template_name="journal_article_full",
        output_format="apa",
        interactive_callback=lambda candidates: 0
    )

    # MLA format
    result = process_references(
        input_content="10.1038/nature14539",
        input_type="txt",
        template_name="journal_article_full",
        output_format="mla",
        interactive_callback=lambda candidates: 0
    )

Interactive Selection with Callbacks
------------------------------------

For handling ambiguous references programmatically, use a callback function:

::

    from onecite import process_references

    def auto_select_best(candidates):
        """Always select the first (best match) candidate."""
        return 0  # Return the index of the selected candidate (0-based)

    result = process_references(
        input_content="Deep learning Hinton",
        input_type="txt",
        template_name="journal_article_full",
        output_format="bibtex",
        interactive_callback=auto_select_best
    )

    print('\n\n'.join(result['results']))
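The callback does not have to be fully automatic. The sketch below keeps a human in the loop by printing every candidate and asking for an index. It assumes each candidate is a dict of metadata; the ``'title'`` key used for display is illustrative, not a guaranteed field name.

::

    from onecite import process_references

    def prompt_user(candidates):
        """Print each candidate and let the user choose one by index."""
        for idx, candidate in enumerate(candidates):
            # 'title' is an assumed metadata key; fall back to the raw dict
            label = candidate.get('title', candidate)
            print(f"[{idx}] {label}")
        return int(input("Select a candidate: "))  # 0-based index

    result = process_references(
        input_content="Attention is all you need",
        input_type="txt",
        template_name="journal_article_full",
        output_format="bibtex",
        interactive_callback=prompt_user
    )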
Custom Callback Logic
~~~~~~~~~~~~~~~~~~~~~

::

    def smart_selector(candidates):
        """Select the candidate with the most complete metadata."""
        best_idx = 0
        best_score = 0
        for idx, candidate in enumerate(candidates):
            # Score based on the number of non-empty fields
            score = sum(1 for v in candidate.values() if v)
            if score > best_score:
                best_score = score
                best_idx = idx
        return best_idx

    result = process_references(
        input_content="Deep learning nature 2015",
        input_type="txt",
        template_name="journal_article_full",
        output_format="bibtex",
        interactive_callback=smart_selector
    )

    print('\n\n'.join(result['results']))

Advanced Data Structures
------------------------

OneCite defines three TypedDict classes representing different stages of the processing pipeline:

RawEntry
~~~~~~~~

A TypedDict representing an unprocessed reference entry (Stage 1):

::

    from onecite import RawEntry
    from typing import Dict, Any, Optional

    # RawEntry is a TypedDict with these fields:
    entry: RawEntry = {
        'id': 1,
        'raw_text': "10.1038/nature14539",
        'doi': "10.1038/nature14539",
        'url': None,
        'query_string': None,
        'original_entry': None
    }

IdentifiedEntry
~~~~~~~~~~~~~~~

A TypedDict representing an entry after identification from data sources (Stage 2):

::

    from onecite import IdentifiedEntry

    # IdentifiedEntry includes fields like:
    # id, raw_text, doi, arxiv_id, url, metadata, status

CompletedEntry
~~~~~~~~~~~~~~

A TypedDict representing a fully processed entry with all metadata (Stage 3):

::

    from onecite import CompletedEntry

    # CompletedEntry includes fields like:
    # id, doi, status, bib_key, bib_data

**Note:** These are TypedDict classes without methods. They are primarily used internally by the pipeline. Most users should interact with OneCite through the ``process_references()`` function.

Working with Templates
----------------------

Load and inspect templates::

    from onecite import TemplateLoader

    loader = TemplateLoader()

    # Load a specific template
    template = loader.load_template("journal_article_full")
    print(f"Template name: {template['name']}")
    print(f"Entry type: {template['entry_type']}")
    print(f"Fields: {[f['name'] for f in template['fields']]}")

    # Use a custom templates directory
    custom_loader = TemplateLoader(templates_dir="/path/to/templates")
    custom_template = custom_loader.load_template("my_template")

Using the Pipeline Controller
-----------------------------

For advanced use cases requiring more control over the processing pipeline:

::

    from onecite import PipelineController

    # Create the controller (optionally enable Google Scholar)
    controller = PipelineController(use_google_scholar=False)

    # Process with full control
    result = controller.process(
        input_content="10.1038/nature14539",
        input_type="txt",
        template_name="journal_article_full",
        output_format="bibtex",
        interactive_callback=lambda candidates: 0
    )

    print('\n\n'.join(result['results']))

**Note:** Most users should use ``process_references()`` instead, which is simpler and provides the same functionality.
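When several inputs are processed in one run, it can be convenient to keep a single controller around rather than rebuilding the pipeline for every item. The following is a minimal sketch under the assumption that a ``PipelineController`` instance can be reused across calls; verify this against your installed version of OneCite.

::

    from onecite import PipelineController

    controller = PipelineController(use_google_scholar=False)

    inputs = [
        "10.1038/nature14539",
        "Vaswani et al., 2017, Attention is all you need",
    ]

    all_citations = []
    for item in inputs:
        # Each call returns the same result dictionary as process_references()
        result = controller.process(
            input_content=item,
            input_type="txt",
            template_name="journal_article_full",
            output_format="bibtex",
            interactive_callback=lambda candidates: 0
        )
        all_citations.extend(result['results'])

    print('\n\n'.join(all_citations))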
Error Handling
--------------

Handling Exceptions
~~~~~~~~~~~~~~~~~~~

::

    from onecite import process_references, ValidationError, ParseError

    try:
        result = process_references(
            input_content="invalid_reference",
            input_type="txt",
            template_name="journal_article_full",
            output_format="bibtex",
            interactive_callback=lambda candidates: 0
        )
    except ValidationError as e:
        print(f"Validation error: {e}")
    except ParseError as e:
        print(f"Parse error: {e}")
    except Exception as e:
        print(f"Processing error: {e}")

Processing Files
----------------

Reading from File
~~~~~~~~~~~~~~~~~

::

    from onecite import process_references

    # Read from file
    with open("references.txt", "r", encoding="utf-8") as f:
        content = f.read()

    result = process_references(
        input_content=content,
        input_type="txt",
        template_name="journal_article_full",
        output_format="bibtex",
        interactive_callback=lambda candidates: 0
    )

    # Write to file
    output_content = '\n\n'.join(result['results'])
    with open("output.bib", "w", encoding="utf-8") as f:
        f.write(output_content)

Complete Example
----------------

::

    from onecite import process_references

    # Read references
    with open("my_references.txt", "r", encoding="utf-8") as f:
        references = f.read()

    # Process with APA format
    result = process_references(
        input_content=references,
        input_type="txt",
        template_name="journal_article_full",
        output_format="apa",
        interactive_callback=lambda candidates: 0  # Auto-select first match
    )

    # Check results
    report = result['report']
    print(f"Total entries: {report['total']}")
    print(f"Successfully processed: {report['succeeded']}")
    print(f"Failed: {len(report['failed_entries'])}")

    if report['failed_entries']:
        print("\nFailed entries:")
        for failed in report['failed_entries']:
            print(f"  - Entry {failed['id']}: {failed.get('error', 'Unknown error')}")

    # Save output
    output_content = '\n\n'.join(result['results'])
    with open("formatted_refs.txt", "w", encoding="utf-8") as f:
        f.write(output_content)

    print("\nDone!")

API Reference
-------------

See :doc:`api/core` for the complete API documentation.

Next Steps
----------

- Learn :doc:`mcp_integration` for AI assistant integration
- Explore :doc:`templates` for custom formatting
- Check :doc:`faq` for common questions