A Neural Corpus Indexer for Document Retrieval


Current state-of-the-art document retrieval solutions mainly follow an index-retrieve paradigm, in which the index is hard to optimize directly for the final retrieval target.

This paper shows that an end-to-end deep neural network unifying the training and indexing stages can significantly improve recall over traditional methods.

The proposed model, the Neural Corpus Indexer (NCI), is a sequence-to-sequence network that directly generates the identifiers of relevant documents for a given query.
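Because NCI emits an identifier token by token, decoding has to stay on identifiers that actually exist. The sketch below assumes constrained beam search over a prefix tree (trie) of known docids; `toy_log_prob` is a hypothetical stand-in for the trained model's next-token scores, and every name here is illustrative rather than taken from the paper's code.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

DocId = Tuple[int, ...]
END = None  # trie key marking the end of a complete docid


def build_trie(docids: Sequence[DocId]) -> Dict:
    """Build a nested-dict prefix tree over all valid document identifiers."""
    root: Dict = {}
    for docid in docids:
        node = root
        for tok in docid:
            node = node.setdefault(tok, {})
        node[END] = {}
    return root


def constrained_beam_search(
    trie: Dict,
    log_prob: Callable[[DocId, int], float],
    beam_size: int = 2,
) -> List[Tuple[DocId, float]]:
    """Beam search that only expands tokens allowed by the trie.

    Every returned sequence is therefore an existing docid, best score first.
    """
    beams: List[Tuple[DocId, float, Dict]] = [((), 0.0, trie)]
    finished: List[Tuple[DocId, float]] = []
    while beams:
        candidates = []
        for prefix, score, node in beams:
            for tok, child in node.items():
                if tok is END:
                    finished.append((prefix, score))
                else:
                    candidates.append((prefix + (tok,), score + log_prob(prefix, tok), child))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    finished.sort(key=lambda f: f[1], reverse=True)
    return finished[:beam_size]


# Hypothetical toy "model" for one query: it favors the digits of (1, 0, 2).
TARGET = (1, 0, 2)


def toy_log_prob(prefix: DocId, tok: int) -> float:
    return math.log(0.7) if tok == TARGET[len(prefix)] else math.log(0.1)


docids = [(0, 1, 2), (1, 0, 2), (1, 0, 0), (2, 2, 1)]
results = constrained_beam_search(build_trie(docids), toy_log_prob, beam_size=2)
print(results)
```

Without the trie, beam search could emit a sequence such as (1, 1, 1) that names no document; the prefix tree rules this out by construction.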

To optimize NCI, the authors propose:

  • A prefix-aware weight-adaptive decoder architecture
  • Tailored techniques, including:
    • Query generation
    • Semantic document identifiers
    • Consistency-based regularization
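Semantic document identifiers give similar documents shared identifier prefixes. One way to build them, assumed here, is hierarchical k-means over document embeddings: each docid is the path of cluster indices from the root down to the document's leaf group. The embeddings, `k`, and `leaf_size` below are made-up illustrations; a real system would cluster learned text embeddings into far larger trees.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, ...]


def sq_dist(a: Vec, b: Vec) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Vec], k: int, iters: int = 20, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(list(points), k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for each point.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def semantic_ids(embs: Dict[str, Vec], k: int, leaf_size: int) -> Dict[str, Tuple[int, ...]]:
    """Assign each document the path of cluster indices leading to its leaf."""
    docs = list(embs)
    if len(docs) <= max(leaf_size, k):
        # Small group: end the identifier with the document's position in it.
        return {d: (i,) for i, d in enumerate(docs)}
    labels = kmeans([embs[d] for d in docs], k)
    if len(set(labels)) == 1:
        # Clustering could not split the group (e.g. duplicate vectors).
        return {d: (i,) for i, d in enumerate(docs)}
    ids: Dict[str, Tuple[int, ...]] = {}
    for c in range(k):
        sub = {d: embs[d] for d, lab in zip(docs, labels) if lab == c}
        for d, suffix in semantic_ids(sub, k, leaf_size).items():
            ids[d] = (c,) + suffix
    return ids


embs = {
    "a1": (0.0, 0.0), "a2": (0.0, 1.0), "a3": (1.0, 0.0),
    "b1": (10.0, 10.0), "b2": (10.0, 11.0), "b3": (11.0, 10.0),
}
ids = semantic_ids(embs, k=2, leaf_size=3)
print(ids)
```

The two well-separated groups end up sharing a first identifier token within each group, which is exactly the property a decoder can exploit when it predicts the prefix first.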
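The motivation for a prefix-aware weight-adaptive decoder is that, in a hierarchical docid, the same token value means different things at different positions (cluster 3 at the first level is unrelated to cluster 3 at the second). The toy below illustrates only that position-specific idea: each decoding step scores tokens with its own output weight matrix. The prefix-dependent adaptation of those weights is omitted, and all shapes and numbers are invented for illustration.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def step_logits(hidden: Sequence[float], step: int, weights: List[Matrix]) -> List[float]:
    """Score every token at decoding position `step` using that position's matrix.

    weights[step][tok] is the output vector for token `tok` at position `step`,
    so one token id gets a different projection at each position.
    """
    return [sum(h * w for h, w in zip(hidden, row)) for row in weights[step]]


# Hypothetical weights: 2 positions, 3 token ids, hidden size 2.
weights = [
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],  # position 0
    [[0.0, 1.0], [1.0, 0.0], [-1.0, -1.0]],  # position 1
]
hidden = [2.0, 0.5]  # same decoder state fed to both positions, for comparison
for step in range(2):
    probs = softmax(step_logits(hidden, step, weights))
    print(step, max(range(3), key=probs.__getitem__))
```

With a single shared output matrix, the same hidden state would always favor the same token id; separate per-position matrices let the preferred token change with depth.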
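Consistency-based regularization encourages the model to make similar predictions when the same input is perturbed, for example by two different dropout masks. The paper's exact loss may differ; as a generic stand-in, this sketch uses the symmetric KL divergence between the two predicted distributions (the R-Drop formulation). The tiny linear "model", its weights, and its dropout rate are hypothetical.

```python
import math
import random
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for strictly positive distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def consistency_loss(logits_a: Sequence[float], logits_b: Sequence[float]) -> float:
    """Symmetric KL between two forward passes over the same input."""
    p, q = softmax(logits_a), softmax(logits_b)
    return 0.5 * (kl(p, q) + kl(q, p))


def forward_with_dropout(
    x: Sequence[float], w: List[List[float]], rate: float, rng: random.Random
) -> List[float]:
    """Toy linear layer with inverted dropout applied to its inputs."""
    kept = [xi / (1 - rate) if rng.random() >= rate else 0.0 for xi in x]
    return [sum(k * wi for k, wi in zip(kept, row)) for row in w]


rng = random.Random(0)
x = [1.0, 2.0, 0.5, -1.0]
w = [[0.2, -0.1, 0.4, 0.0], [0.0, 0.3, -0.2, 0.1], [-0.3, 0.1, 0.0, 0.2]]
# Two passes over the same input, each with its own random dropout mask.
loss = consistency_loss(forward_with_dropout(x, w, 0.3, rng), forward_with_dropout(x, w, 0.3, rng))
# In training, this term would be added (with some weight) to the cross-entropy loss.
print(loss)
```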

PowerPoint slides for this talk

Reference Paper
