Keyword spotting (or more simply, word spotting) is a problem that was historically first defined in the context of speech processing.
A special case of keyword spotting is wake word (also called hot word) detection, used by personal digital assistants such as Alexa or Siri to activate the dormant speaker; in other words, the assistant "wakes up" when its name is spoken.
In the United States, the National Security Agency has made use of keyword spotting since at least 2006. This technology allows analysts to search through large volumes of recorded conversations and isolate mentions of suspicious keywords. Recordings can be indexed and analysts can run queries over the database to find conversations of interest. IARPA funded research into keyword spotting in the Babel program.
Some algorithms used for this task are:
- Sliding window and garbage model
- K-best hypothesis
- Iterative Viterbi decoding
- Convolutional neural network on Mel-frequency cepstral coefficients
- Transformer-based small-footprint keyword spotting
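As an illustration of the first approach in the list, the following is a minimal sketch of sliding-window detection against a garbage (filler) model. It assumes that per-frame log-likelihoods under a keyword model and under a garbage model have already been computed by some acoustic model; the function name `spot_keyword`, the fixed window length, and the thresholded mean log-likelihood ratio are illustrative assumptions rather than a description of any particular system.

```python
from typing import List, Tuple


def spot_keyword(
    keyword_loglik: List[float],
    garbage_loglik: List[float],
    window: int,
    threshold: float,
) -> List[Tuple[int, float]]:
    """Return (start_frame, score) for each window that looks like the keyword.

    The score is the mean per-frame log-likelihood ratio between the
    keyword model and the garbage model (hypothetical inputs).
    """
    if len(keyword_loglik) != len(garbage_loglik):
        raise ValueError("score sequences must have equal length")
    if window <= 0:
        raise ValueError("window must be positive")

    hits = []
    # Slide a fixed-length window over the frames, one frame at a time.
    for start in range(len(keyword_loglik) - window + 1):
        end = start + window
        ratio = sum(
            k - g
            for k, g in zip(keyword_loglik[start:end], garbage_loglik[start:end])
        ) / window
        # A window is reported when the keyword model explains it
        # better than the garbage model by more than the threshold.
        if ratio > threshold:
            hits.append((start, ratio))
    return hits


# Toy scores: frames 2 to 4 fit the keyword model better than the garbage model.
keyword_scores = [-5.0, -5.0, -1.0, -1.0, -1.0, -5.0]
garbage_scores = [-2.0, -2.0, -2.0, -2.0, -2.0, -2.0]
detections = spot_keyword(keyword_scores, garbage_scores, window=3, threshold=0.5)
```

In practice the window length would depend on the expected keyword duration, and overlapping detections would usually be merged; both are left out of this sketch.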
In document image processing
Keyword spotting in document image processing can be seen as an instance of the more generic problem of content-based image retrieval (CBIR).
Given a query, the goal is to retrieve the most relevant instances of words in a collection of scanned documents.
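The retrieval step can be sketched as a ranking problem. The example below assumes that each word image in the collection has already been converted into a fixed-length feature vector by some descriptor, and that the query is represented in the same way; the use of cosine similarity, the function names, and the word identifiers are illustrative assumptions, not a specific published method.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_words(
    query: Sequence[float],
    collection: Dict[str, Sequence[float]],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Return the top_k (word_id, similarity) pairs, most similar first."""
    scored = [(word_id, cosine_similarity(query, vec))
              for word_id, vec in collection.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical two-dimensional descriptors for three word images.
collection = {
    "page1_word3": [1.0, 0.0],
    "page2_word7": [0.0, 1.0],
    "page4_word1": [1.0, 1.0],
}
results = rank_words([1.0, 0.0], collection, top_k=2)
```

Here `results` lists the identifiers of the two word images whose descriptors are closest to the query, which mirrors the goal of returning the most relevant word instances first.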
