A Neural Corpus Indexer for Document Retrieval
Current state-of-the-art document retrieval solutions mainly follow an index-retrieve paradigm, in which the index is hard to optimize directly for the final retrieval target.
This paper shows that an end-to-end deep neural network that unifies the training and indexing stages can significantly improve recall over traditional methods.
It proposes the Neural Corpus Indexer (NCI), a sequence-to-sequence network that generates relevant document identifiers directly for a given query.
The authors optimize it with:
- A prefix-aware weight-adaptive decoder architecture
- Tailored techniques, including:
  - Query generation
  - Semantic document identifiers
  - Consistency-based regularization
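To make the idea of generating document identifiers more concrete, here is a minimal sketch of decoding constrained to valid identifiers. It rests on several assumptions that are not in the summary above: identifiers are fixed-length tuples of integer tokens, a prefix tree restricts each step to tokens that continue a real identifier, and `toy_score` is a hypothetical stand-in for the network's per-token log-probability. It does not implement the prefix-aware weight-adaptive decoder itself.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

DocId = Tuple[int, ...]


def build_prefix_tree(doc_ids: Sequence[DocId]) -> Dict[DocId, List[int]]:
    """Map every identifier prefix to the sorted tokens that may follow it."""
    tree: Dict[DocId, Set[int]] = {}
    for doc_id in doc_ids:
        for i in range(len(doc_id)):
            tree.setdefault(doc_id[:i], set()).add(doc_id[i])
    return {prefix: sorted(tokens) for prefix, tokens in tree.items()}


def constrained_beam_search(
    score: Callable[[DocId, int], float],
    tree: Dict[DocId, List[int]],
    id_length: int,
    beam_size: int,
) -> List[Tuple[DocId, float]]:
    """Keep the best `beam_size` prefixes per step, expanding only valid tokens."""
    beams: List[Tuple[DocId, float]] = [((), 0.0)]
    for _ in range(id_length):
        candidates = [
            (prefix + (token,), total + score(prefix, token))
            for prefix, total in beams
            for token in tree.get(prefix, [])
        ]
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_size]
    return beams


def toy_score(prefix: DocId, token: int) -> float:
    """Hypothetical scorer: favors the identifier (1, 0, 2), penalizes other tokens."""
    preferred = (1, 0, 2)
    return 0.0 if preferred[len(prefix)] == token else -1.0


# Hypothetical corpus of five documents with three-token identifiers.
doc_ids = [(0, 0, 1), (0, 1, 0), (1, 0, 2), (1, 0, 3), (1, 2, 0)]
tree = build_prefix_tree(doc_ids)
results = constrained_beam_search(toy_score, tree, id_length=3, beam_size=2)
for doc_id, total in results:
    print(doc_id, total)
```

Because every expansion is looked up in the prefix tree, the search can only return identifiers that exist in the corpus. In a real system the scorer would be the trained sequence-to-sequence model conditioned on the query, and identifiers that share a prefix would ideally belong to semantically similar documents.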