Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Unreleased
[0.1.1] - 2026-04-17
Maintenance release focused on aligning the abstract-retrieval semantics across code, templates, docs, tests, and metadata. No breaking public-API changes; the one renamed kwarg keeps its old name as a deprecated alias for this release cycle.
Added
Abstract retrieval now falls back through a DOI-only cascade when CrossRef does not return an abstract: Semantic Scholar (/paper/DOI:{doi}?fields=abstract) → PubMed (ESearch DOI→PMID, then EFetch PMID→abstract). The cascade is only invoked when the user’s original raw input carried a DOI; DOIs inferred by fuzzy search do not trigger it, so a possibly-wrong candidate does not cost extra roundtrips. In particular, a local BibTeX entry with no doi field, regardless of whether other stages would later resolve one, does not trigger the abstract cascade.
Semantic Scholar search results now carry the abstract field, which propagates through _convert_search_metadata into the final BibTeX output whenever the identification stage already resolved the entry through SS.
EnricherModule._get_semantic_scholar_abstract(doi) helper for DOI-based Semantic Scholar abstract retrieval. Handles 404/429 gracefully by returning None.
_complete_fields gained an allow_abstract_fallback kwarg (default False) that gates the new cascade. _enrich_single_entry passes True only when the raw entry contributed a DOI.
Default journal_article_full template now lists abstract as an optional field so the declaration matches what the enricher emits. The older journal_article_with_abstract template is retained as a compatibility alias and will stay available for at least one release cycle.
Regression test test_enrich_single_entry_no_doi_in_raw_skips_abstract_fallback pinning the “no-DOI-in-raw ⇒ no Semantic-Scholar/PubMed network call” guarantee at the _enrich_single_entry layer.
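The gating rule described in this list can be sketched roughly as follows. This is an illustration only, not the OneCite implementation: resolve_abstract, the fetcher type, and the two stand-in fetchers are hypothetical names, and no network calls are made.

```python
from typing import Callable, Optional

# Hypothetical fetcher type: takes a DOI, returns an abstract or None.
AbstractFetcher = Callable[[str], Optional[str]]


def resolve_abstract(
    crossref_abstract: Optional[str],
    raw_doi: Optional[str],
    fetchers: list[AbstractFetcher],
) -> Optional[str]:
    """Return the CrossRef abstract, else try DOI-keyed fallbacks in order."""
    if crossref_abstract:
        return crossref_abstract
    # Gate: only a DOI present in the user's raw input enables the cascade.
    if not raw_doi:
        return None
    for fetch in fetchers:
        abstract = fetch(raw_doi)
        if abstract:
            return abstract
    return None


# Stand-in fetchers (no network): Semantic Scholar misses, PubMed hits.
calls: list[str] = []


def fake_semantic_scholar(doi: str) -> Optional[str]:
    calls.append("ss")
    return None  # e.g. a 404/429 response mapped to None


def fake_pubmed(doi: str) -> Optional[str]:
    calls.append("pubmed")
    return f"abstract for {doi}"


cascade = [fake_semantic_scholar, fake_pubmed]
with_doi = resolve_abstract(None, "10.1000/example", cascade)
without_doi = resolve_abstract(None, None, cascade)  # no fetcher is called
```

Passing the fetchers in as a list keeps the ordering explicit and lets a test assert that an entry without a raw DOI triggers no lookups at all.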
Changed
_get_pubmed_abstract now requires a DOI and no longer falls back to PubMed title search. The removed title-based path empirically returned the abstract of an unrelated paper (e.g. the Zhang 2020 AI Review DOI 10.1007/s10462-019-09792-7 pulled the abstract of a different RSI segmentation paper), which is strictly worse than returning None for downstream semantic cross-checks such as the sci skill.
Abstract coverage on an internal 10-DOI cross-publisher spot-check rose from 4/9 to 8/9. This number is a local indicator, not a release gate: reproducing it requires a live network and the probe scripts are no longer in the repository.
Deprecated
_complete_fields(..., allow_pubmed_fallback=...) is deprecated in favour of allow_abstract_fallback. The old name still works for one release cycle and emits DeprecationWarning. It was renamed because the flag actually gates the entire Semantic-Scholar + PubMed cascade, not PubMed alone.
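A deprecated keyword alias of this kind is commonly handled with the standard warnings module. The sketch below shows one way to do it; complete_fields_sketch is a hypothetical stand-in rather than the real _complete_fields, and letting the old name win when both are passed is an assumption.

```python
import warnings
from typing import Optional


def complete_fields_sketch(
    entry: dict,
    allow_abstract_fallback: bool = False,
    allow_pubmed_fallback: Optional[bool] = None,  # deprecated alias
) -> bool:
    """Return the effective fallback flag after resolving the deprecated alias."""
    if allow_pubmed_fallback is not None:
        warnings.warn(
            "allow_pubmed_fallback is deprecated; use allow_abstract_fallback",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this function
        )
        # Assumption: an explicitly passed old name overrides the new one.
        allow_abstract_fallback = allow_pubmed_fallback
    return allow_abstract_fallback


# Capture the warning so the old spelling can be checked without noise.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = complete_fields_sketch({}, allow_pubmed_fallback=True)

new_style = complete_fields_sketch({}, allow_abstract_fallback=True)
```

Using None as the alias default distinguishes "not passed" from an explicit False, so callers of the new name never see the warning.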
Removed
IdentifierModule._check_doi_content_consistency and the consistency_score/low_consistency warning path. A fuzzy string-similarity score on bibliographic fields is not a reliable signal for detecting fabricated references, and it was only emitted as a logger.warning that downstream tools could not act on. Citation-authenticity verification belongs at the abstract-vs-claim semantic layer in the consuming tool, not at the bibliographic-string layer here.
[0.1.0] - 2026-04-17
First formal PyPI release since 0.0.12.
Added
RST documentation using Sphinx
Full API reference documentation
FAQ section with common questions
Contributing guidelines
Pre-commit hooks configuration
Google-style docstrings with Args/Returns for all public API functions
Auto-deploy documentation to GitHub Pages via CI
Changed
Split monolithic pipeline.py (~3000 lines) into a proper onecite/pipeline/ package with one module per stage
Unify CrossRef request and parsing methods, with User-Agent and mailto set per CrossRef etiquette
Rewrite fuzzy-search scoring as a weighted title/author/year/venue model with three confidence tiers
Simplify identifier routing; CrossRef and Semantic Scholar are the always-on sources, with signal-based PubMed / Google Books / OpenAIRE / BASE queries
Use bibtexparser.dumps() for BibTeX rendering
Expose use_google_scholar as a real CLI flag and API parameter
Clarify that templates define metadata-field requirements and a fallback BibTeX entry type, not output formatting
Refactored exception hierarchy
Added type hints to Python API
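For the CrossRef etiquette item above, a request that identifies its caller might be built along these lines. This is a sketch under assumptions: polite_crossref_request, the application name, and the contact address are placeholders, and the changelog does not say how OneCite itself sets these values.

```python
from urllib.parse import quote, urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"


def polite_crossref_request(
    doi: str, app: str, version: str, mailto: str
) -> tuple[str, dict[str, str]]:
    """Build the URL and headers for a CrossRef works lookup that names its caller."""
    # The contact address goes in the User-Agent and in a mailto query parameter.
    headers = {"User-Agent": f"{app}/{version} (mailto:{mailto})"}
    query = urlencode({"mailto": mailto})
    url = f"{CROSSREF_WORKS}/{quote(doi, safe='/')}?{query}"
    return url, headers


# Placeholder values; nothing is sent over the network here.
url, headers = polite_crossref_request(
    "10.1000/example", "onecite-sketch", "0.0", "dev@example.org"
)
```

The returned pair can be handed to any HTTP client; keeping construction separate from sending makes the etiquette easy to unit-test.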
Removed
APA and MLA output renderers; the CLI now rejects anything other than --output-format bibtex. Use pandoc or citeproc-py to convert the generated BibTeX to APA / MLA
Hard-coded “well-known paper” shortcut that masked failures on the main example input
MCP integration page and all related references
.readthedocs.yml (docs now hosted on GitHub Pages)
docs/_build/ build artifacts from repository
Fixed
OpenAlex and dblp no longer listed as data sources; they were never wired into the code
docs/api/pipeline.rst rewritten to match the real modules; removed references to nonexistent classes / methods
README and docs @inproceedings example now uses booktitle instead of journal = "arXiv preprint"
Crossref author names parsed as given family
Semantic Scholar HTTP 429 handled cleanly
Previously-unused exception classes now raised in the right places
CONTRIBUTING.md documents pip install -e .[dev] instead of the non-existent requirements.txt
URL-bearing entries no longer queried twice
Fallback paths mark entries as identification_failed rather than fabricating invented metadata
CrossRef and Semantic Scholar response parsing edge cases
API documentation using incorrect return value fields
Version number inconsistencies across metadata files
Python version requirement inconsistencies in docs (3.7 -> 3.10)
[0.0.11] - 2024-10-19
Added
Custom YAML-based template system
Support for multiple output formats (BibTeX, APA, MLA)
Interactive mode for ambiguous reference selection
Support for DOI, arXiv, PMID, ISBN, and GitHub identifiers
Integration with 9 major academic data sources
Test suite
Changed
Refactored core processing pipeline
Reordered data source priority (CrossRef first for DOI queries)
Clearer error messages on failed lookups
Fixed
Encoding issues with non-ASCII characters in author names
DOI parsing for URLs with trailing query strings
Python 3.10 compatibility issues
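The DOI-parsing fix in this list concerns URLs with trailing query strings. One way such parsing can be made robust is to look only at the URL path, as in the sketch below; doi_from_url and the pattern are illustrative assumptions, not the code that shipped in 0.0.11.

```python
import re
from typing import Optional
from urllib.parse import unquote, urlsplit

# Loose DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def doi_from_url(url: str) -> Optional[str]:
    """Extract a DOI from a URL path, ignoring the query string and fragment."""
    # urlsplit separates ?query and #fragment, so they never reach the regex.
    path = unquote(urlsplit(url).path)
    match = DOI_PATTERN.search(path)
    if match is None:
        return None
    # Drop trailing punctuation that often clings to pasted links.
    return match.group(0).rstrip(".,;")


doi = doi_from_url(
    "https://doi.org/10.1007/s10462-019-09792-7?utm_source=feed&ref=x#top"
)
```

A limitation of this approach is that the rare DOI whose suffix itself contains an unencoded ? or # would be truncated at that character.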
[0.0.10] - 2024-10-01
Added
Initial Python API
Basic citation processing
Support for journal articles and conference papers
Changed
Better title matching for fuzzy searches
Fixed
PubMed API response handling
Semantic Scholar rate limit handling
[0.0.9] and Earlier
See GitHub Releases for details on older versions.
Upgrade Guide
From 0.0.10 to 0.0.11
Breaking Changes: None
New Features:
Custom template support - create YAML templates for custom formats
APA and MLA formats - use --output-format apa or --output-format mla
Interactive mode - use --interactive flag for ambiguous references
Migration:
No migration needed. All existing functionality is backward compatible. New features are opt-in.
Version History
Latest Stable: 0.1.1
Python Support:
3.10+
3.11+
Requirements:
See pyproject.toml for current dependencies.
Getting Help
Check Frequently Asked Questions (FAQ) for common issues
Search GitHub Issues
Ask in GitHub Discussions
See Contributing to OneCite to report bugs or suggest features
Release Strategy
Versioning:
OneCite follows Semantic Versioning:
MAJOR.MINOR.PATCH
MAJOR: Breaking API changes
MINOR: New backward-compatible features
PATCH: Bug fixes
Release Cadence:
Major releases: Annually or for major features
Minor releases: Quarterly
Patch releases: For critical bugs
Support:
Latest version: Full support
Previous major version: Limited support
Older versions: Community support only
Deprecation Policy
Features marked as deprecated will:
Be announced in release notes
Work for at least one minor version
Be removed in the next major version
Breaking Changes Policy
Breaking changes are:
Announced in advance
Clearly documented
Provided with migration guide
Only released in major versions
Credits
Contributors and acknowledgments:
OneCite Team
Open source community
Data source providers (CrossRef, PubMed, arXiv, etc.)
All contributors on GitHub
See the GitHub Contributors page for a full list.
Next Steps
Check Quick Start Guide to get started
Read Contributing to OneCite to contribute
See Frequently Asked Questions (FAQ) for common questions