tacl search

usage: tacl search [-h] [-v] [-m] [-r RAM] [-t {cbeta,latin,pagel}]
                   DATABASE CORPUS CATALOGUE [NGRAMS ...]

Output results of searching the database for the supplied n-grams that occur
within labelled witnesses.

positional arguments:
  DATABASE              Path to database file.
  CORPUS                Path to corpus.
  CATALOGUE             Path to catalogue file.
  NGRAMS                Path to file containing list of n-grams to search for,
                        with one n-gram per line. (default: None)

options:
  -h, --help            show this help message and exit
  -v, --verbose         Display debug information; multiple -v options
                        increase the verbosity. (default: None)
  -m, --memory          Use RAM for temporary database storage.
                        
                        This may cause an out of memory error, in which case
                        run the command without this switch. (default: False)
  -r RAM, --ram RAM     Number of gigabytes of RAM to use. (default: 3)
  -t {cbeta,latin,pagel}, --tokenizer {cbeta,latin,pagel}
                        Type of tokenizer to use. The "cbeta" tokenizer is
                        suitable for the Chinese CBETA corpus (tokens are
                        single characters or workaround clusters within square
                        brackets). The "pagel" tokenizer is for use with the
                        transliterated Tibetan corpus (tokens are sets of word
                        characters plus some punctuation used to transliterate
                        characters). (default: cbeta)

If multiple paths to files containing n-grams are given, the combined set of
n-grams from all files will be searched for.

If no path is given, the results will include all n-grams found for all of the
labelled witnesses in the catalogue.
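
As a rough sketch of the two invocation forms described above, the following writes two small n-gram files (one n-gram per line) and passes both to tacl search, then runs the command again with no n-gram file. All paths (tacl.db, corpus, catalogue.txt, ngrams-a.txt, ngrams-b.txt) are placeholders and the n-grams are arbitrary examples, not values taken from TACL itself. The count of distinct n-grams is only an illustration of what "combined set" means. The guard skips the calls when tacl is not installed.

```shell
# Placeholder paths (assumptions); substitute your own database, corpus and catalogue.
DB=tacl.db
CORPUS=corpus
CATALOGUE=catalogue.txt

# Each n-gram file lists one n-gram per line (arbitrary example n-grams).
printf '%s\n' '如是' '我聞' > ngrams-a.txt
printf '%s\n' '一時' '如是' > ngrams-b.txt

# Illustration only: the combined set counts an n-gram shared by both files once.
COMBINED=$(LC_ALL=C sort -u ngrams-a.txt ngrams-b.txt | wc -l | tr -d ' ')
echo "distinct n-grams: $COMBINED"

if command -v tacl >/dev/null 2>&1; then
    # Search for the combined set of n-grams from both files.
    tacl search "$DB" "$CORPUS" "$CATALOGUE" ngrams-a.txt ngrams-b.txt
    # With no n-gram file, results cover all n-grams of the labelled witnesses.
    tacl search "$DB" "$CORPUS" "$CATALOGUE"
fi
```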

Due to encoding issues, you may need to set the environment variable
PYTHONIOENCODING to "utf-8".
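
One way to apply this setting in a POSIX shell is sketched below. The paths are placeholders, and the guard simply skips the call when tacl is not on the PATH.

```shell
# Ask Python to encode standard input/output as UTF-8 for this shell session.
export PYTHONIOENCODING=utf-8

if command -v tacl >/dev/null 2>&1; then
    # Placeholder paths (assumptions); substitute your own.
    tacl search tacl.db corpus catalogue.txt
fi
```

Exporting the variable affects every later Python command in the same session. To limit it to a single run, prefix the command instead, as in `PYTHONIOENCODING=utf-8 tacl search ...`.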