STDTOOLS is a package for indexing and searching a collection of weighted finite-state automata. Although it is designed with the Spoken Term Detection (STD) task in mind, it can be used for a variety of information retrieval tasks with little to no modification. It is built upon OpenFst, a library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs).
These utilities construct and search timed factor transducers. A factor transducer is an efficient inverted index of a collection of finite-state automata. This version of the STDTOOLS package embeds the timing of the states of the input automata into the factor transducer arc weights. Timing information is supplied in the form of state potentials (auxiliary cost information about states). For a detailed description of the timed factor transducer structure and the algorithm used for its construction, see http://busim.ee.boun.edu.tr/~dogan/files/dogancan-TASLP-revision.pdf
stdtools-0.2: STDTOOLS version 0.2, 64-bit, ...
stdindex, stdsearch, stdprocess - spoken term detection utilities
Running the application
This is only a brief overview of how to work with these tools. For details, see the appropriate man pages.
- PREPROCESS - reads a list of FST-IN POTENTIAL-IN FST-OUT POTENTIAL-OUT quadruplets from file in.list or standard input. It preprocesses the input files (pruning, arc clustering, etc.) and writes the results to the specified output files.
$ stdprocess in.list out.list
Input format of the in.list file (one quadruplet per line):
in1.fst in1.pot out1.fst out1.pot
in2.fst in2.pot out2.fst out2.pot
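Writing the quadruplet list by hand gets tedious for large collections. The following is a minimal sketch of a shell helper that generates it, assuming each input FST NAME.fst has its potentials in NAME.pot in the same directory (a naming convention borrowed from the example above, not something the tools are known to require). The function name make_process_list and the directory names are hypothetical.

```sh
# make_process_list IN_DIR OUT_DIR
# Prints one "FST-IN POTENTIAL-IN FST-OUT POTENTIAL-OUT" line for every
# *.fst file in IN_DIR that has a matching *.pot file next to it.
# ASSUMPTION: the NAME.fst / NAME.pot pairing is only a convention.
make_process_list() {
    in_dir=$1
    out_dir=$2
    for fst in "$in_dir"/*.fst; do
        [ -e "$fst" ] || continue              # the glob matched nothing
        base=$(basename "$fst" .fst)
        pot="$in_dir/$base.pot"
        if [ ! -e "$pot" ]; then
            echo "skipping $fst: missing $pot" >&2
            continue
        fi
        printf '%s %s %s %s\n' \
            "$fst" "$pot" "$out_dir/$base.fst" "$out_dir/$base.pot"
    done
}

# Usage (hypothetical directory names):
#   make_process_list lattices processed > in.list
#   stdprocess in.list out.list
```

File names containing whitespace would break the whitespace-separated list format, so this sketch assumes there are none.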
- INDEX - reads a list of FST POTENTIAL tuples from file in.list or standard input. It writes a timed factor transducer to file index.fst or standard output.
$ stdindex in.list index.fst
Input Format of the in.list file:
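Assuming each line holds one FST POTENTIAL pair separated by whitespace (inferred from the description above, so check the stdindex man page), and assuming the index is built from the files written by stdprocess, the list can be derived from the quadruplet list by keeping its last two columns. A minimal sketch, with the hypothetical helper name process_list_to_index_list and hypothetical file names:

```sh
# process_list_to_index_list < QUADRUPLET_LIST > INDEX_LIST
# Keeps columns 3 and 4 (FST-OUT, POTENTIAL-OUT) of every line that has
# exactly four whitespace-separated fields; other lines are skipped.
# ASSUMPTION: stdindex accepts one "FST POTENTIAL" pair per line.
process_list_to_index_list() {
    awk 'NF == 4 { print $3, $4 }'
}

# Usage (hypothetical file names):
#   process_list_to_index_list < process.list > in.list
#   stdindex in.list index.fst
```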
- SEARCH - reads the index from file index.fst and the queries from file query.far or standard input. It writes the search results to file out.list or standard output.
$ stdsearch index.fst query.far out.list
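The three steps can be chained so that a failure in one stops the run. The sketch below uses only the invocations shown above. The function name run_std_pipeline and its argument order are hypothetical, and the second stdprocess argument is passed through unchanged because its role is not described here.

```sh
# run_std_pipeline PROC_IN PROC_OUT INDEX_LIST INDEX_FST QUERY_FAR RESULTS
# Runs stdprocess, stdindex and stdsearch in order, stopping at the
# first step that fails and returning that step's exit status.
run_std_pipeline() {
    if [ "$#" -ne 6 ]; then
        echo "usage: run_std_pipeline PROC_IN PROC_OUT INDEX_LIST INDEX_FST QUERY_FAR RESULTS" >&2
        return 2
    fi
    # Fail early if any of the tools cannot be found.
    for tool in stdprocess stdindex stdsearch; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "run_std_pipeline: $tool not found" >&2
            return 127
        fi
    done
    # '&&' runs each step only if the previous one succeeded.
    stdprocess "$1" "$2" &&
        stdindex "$3" "$4" &&
        stdsearch "$4" "$5" "$6"
}

# Usage (hypothetical file names):
#   run_std_pipeline process.list out.list in.list index.fst query.far results.list
```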
Documentation is available locally using the following commands:
$ man stdtools
$ man fstar
STDTOOLS is open-source software, distributed under the terms of the Apache license.
Jan Vavruška (vavruska (at) kky.zcu.cz)