STDTOOLS

From MetaCentrum


Description

STDTOOLS is a package for indexing and searching a collection of weighted finite-state automata. Although it was designed with the Spoken Term Detection (STD) task in mind, it can be used to implement a variety of other information retrieval tasks with little or no modification. It is built on OpenFst, a library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs).

These utilities construct and search timed factor transducers. A factor transducer is an efficient inverted index of a collection of finite-state automata: it maps every factor (contiguous substring) accepted by the input automata to the automata in which that factor occurs. This version of STDTOOLS also embeds the timing of the input automata states into the arc weights of the factor transducer. Timing information is supplied in the form of state potentials (auxiliary cost information attached to states). For a detailed description of the timed factor transducer structure and the algorithm used to construct it, see http://busim.ee.boun.edu.tr/~dogan/files/dogancan-TASLP-revision.pdf
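To illustrate the idea of a factor index, here is a minimal Python sketch. It is a deliberate simplification, not the timed factor transducer built by STDTOOLS: it indexes single 1-best word sequences rather than weighted automata, stores timing as plain (start, end) numbers rather than in arc weights, and uses a Python dictionary rather than an FST. The names build_factor_index and search and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy input: one 1-best word sequence per utterance,
# each word paired with its (start, end) time in seconds.
Hyp = List[Tuple[str, float, float]]
Hit = Tuple[str, float, float]  # (utterance id, start time, end time)
Index = Dict[Tuple[str, ...], List[Hit]]


def build_factor_index(utterances: Dict[str, Hyp]) -> Index:
    """Map every factor (contiguous word subsequence) to the utterances
    and time spans in which it occurs."""
    index: Dict[Tuple[str, ...], List[Hit]] = defaultdict(list)
    for utt_id, words in utterances.items():
        n = len(words)
        for i in range(n):
            for j in range(i + 1, n + 1):
                factor = tuple(w for w, _, _ in words[i:j])
                # Span = start of the first word to end of the last word.
                index[factor].append((utt_id, words[i][1], words[j - 1][2]))
    return dict(index)


def search(index: Index, query: str) -> List[Hit]:
    """Exact lookup of a whitespace-separated query term."""
    return index.get(tuple(query.split()), [])


if __name__ == "__main__":
    utts = {
        "utt1": [("spoken", 0.0, 0.4), ("term", 0.4, 0.7), ("detection", 0.7, 1.3)],
        "utt2": [("term", 2.0, 2.3), ("detection", 2.3, 2.9), ("tools", 2.9, 3.4)],
    }
    idx = build_factor_index(utts)
    print(search(idx, "term detection"))
    # [('utt1', 0.4, 1.3), ('utt2', 2.0, 2.9)]
```

Enumerating every factor explicitly costs time and space quadratic in the utterance length; one motivation for representing the index as an optimized transducer instead is that common factors can share states and arcs.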

Availability

Module stdtools-0.2: STDTOOLS version 0.2, 64-bit, ...

Use

stdindex, stdsearch, stdprocess - spoken term detection utilities

Running the application

This is only a brief overview of working with these tools. For details, see the corresponding man pages.

  • PREPROCESS - reads a list of FST-IN POTENTIAL-IN FST-OUT POTENTIAL-OUT quadruplets from the file in.list or standard input, processes the input files (pruning, arc clustering, etc.), and writes the results to the specified output files.
$ stdprocess in.list out.list

Input format of the in.list file:

in1.fst in1.pot out1.fst out1.pot

in2.fst in2.pot out2.fst out2.pot
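For larger collections, the quadruplet list is easier to generate than to write by hand. The Python sketch below assumes a hypothetical naming scheme in which <dir>/<name>.fst holds an automaton and <dir>/<name>.pot holds its state potentials, with processed files written to a separate directory; the helper names quadruplet_lines and write_list are also hypothetical. It only follows the one-quadruplet-per-line layout shown above.

```python
from pathlib import Path
from typing import Iterable, List


def quadruplet_lines(names: Iterable[str], in_dir: str, out_dir: str) -> List[str]:
    """Build one 'FST-IN POTENTIAL-IN FST-OUT POTENTIAL-OUT' line per name.

    Hypothetical naming scheme: <dir>/<name>.fst is the automaton and
    <dir>/<name>.pot holds its state potentials.
    """
    lines = []
    for name in names:
        fields = [
            Path(in_dir) / f"{name}.fst",
            Path(in_dir) / f"{name}.pot",
            Path(out_dir) / f"{name}.fst",
            Path(out_dir) / f"{name}.pot",
        ]
        lines.append(" ".join(str(p) for p in fields))
    return lines


def write_list(lines: Iterable[str], list_path: str) -> None:
    """Write the list file with one quadruplet per line."""
    Path(list_path).write_text("".join(line + "\n" for line in lines))


if __name__ == "__main__":
    for line in quadruplet_lines(["in1", "in2"], "raw", "processed"):
        print(line)
    # On a POSIX system:
    # raw/in1.fst raw/in1.pot processed/in1.fst processed/in1.pot
    # raw/in2.fst raw/in2.pot processed/in2.fst processed/in2.pot
```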


  • INDEX - reads a list of FST POTENTIAL pairs from the file in.list or standard input and writes a timed factor transducer to the file index.fst or standard output.
$ stdindex in.list index.fst

Input format of the in.list file:

in1.fst in1.pot

in2.fst in2.pot
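If the files produced by the PREPROCESS step are the ones you want to index (an assumption about your workflow; this page does not prescribe it), the INDEX list can be derived from the last two columns of the PREPROCESS quadruplet list. A minimal Python sketch; the function name index_pairs_from_process_list is hypothetical, and skipping blank lines is a choice of this sketch, not documented tool behavior.

```python
from typing import List


def index_pairs_from_process_list(text: str) -> List[str]:
    """Turn 'FST-IN POTENTIAL-IN FST-OUT POTENTIAL-OUT' lines into
    'FST POTENTIAL' lines that point at the processed (output) files."""
    pairs = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # this sketch ignores blank lines
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 fields, got {len(fields)}")
        # Keep FST-OUT and POTENTIAL-OUT.
        pairs.append(f"{fields[2]} {fields[3]}")
    return pairs


if __name__ == "__main__":
    process_list = (
        "in1.fst in1.pot out1.fst out1.pot\n"
        "in2.fst in2.pot out2.fst out2.pot\n"
    )
    print("\n".join(index_pairs_from_process_list(process_list)))
    # out1.fst out1.pot
    # out2.fst out2.pot
```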

  • SEARCH - reads the index from the file index.fst and the queries from the file query.far (an FST archive) or standard input, and writes the search results to the file out.list or standard output.
$ stdsearch index.fst query.far out.list
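The INDEX and SEARCH steps can also be scripted. The Python sketch below only assembles the command lines in the argument order shown on this page and runs them with subprocess; the helper names are hypothetical, it assumes the stdindex and stdsearch binaries from the stdtools-0.2 module are on PATH, and it does not cover the standard-input or standard-output variants.

```python
import shutil
import subprocess
from typing import List


def index_command(list_path: str, index_path: str) -> List[str]:
    # Documented form: stdindex in.list index.fst
    return ["stdindex", list_path, index_path]


def search_command(index_path: str, query_far: str, results_path: str) -> List[str]:
    # Documented form: stdsearch index.fst query.far out.list
    return ["stdsearch", index_path, query_far, results_path]


def run_index_and_search(list_path: str, index_path: str,
                         query_far: str, results_path: str) -> None:
    """Build the index, then search it; raises if a tool fails."""
    for tool in ("stdindex", "stdsearch"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(
                f"{tool} not found on PATH; make the stdtools-0.2 module available first")
    subprocess.run(index_command(list_path, index_path), check=True)
    subprocess.run(search_command(index_path, query_far, results_path), check=True)


if __name__ == "__main__":
    print(" ".join(index_command("in.list", "index.fst")))
    print(" ".join(search_command("index.fst", "query.far", "out.list")))
```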

Documentation

  • locally using the following commands:
    • $ man stdtools
    • $ man fstar

License

Open source, distributed under the terms of the Apache License.

Supported platforms

amd64

Program administrator

Jan Vavruška (vavruska (at) kky.zcu.cz)

Homepage

http://www.busim.ee.boun.edu.tr/~dogan/research.html