
Manual page for IXBUILD(1)

ixbuild - build inverted indexes on file system subtrees

SYNOPSIS

/usr/bin/ixbuild [ -aAcCdfgloprsuv ] [ -Dfile ] [ -Ffile ] [ -Llanguage ] [ -M# ] [ -Nfile ] [ -P# ] [ -Sfile ] [ -Tfile ] [ -ystring ] [ path ... ]

DESCRIPTION

ixbuild creates or updates indexes for the files or directories named on the command line. For each directory named on the command line, or for the current directory by default, ixbuild creates or updates the associated index. Each index is located in a file named .index.store at the root of its subtree.

An index is a special kind of file used by the Indexing Kit, called a store file, which has an IXStoreDirectory containing an IXFileFinder (named ``FileFinder''). The IXFileFinder is responsible for actual manipulation of the indexes, and is accessible through the IXStoreDirectory by applications that use the Indexing Kit, whose documentation is available online in Digital Librarian.

ixbuild makes use of several special files when first creating an index. The contents of these files are incorporated into the index itself, so they aren't referenced when an index is updated. However, if the index is deleted and rebuilt from scratch, these files are used again, so you may want to keep them. Here are brief descriptions of the files, their uses, and formats:

.index.ftype contains information about the types of files that will be included in the index. A file's type is used to determine how tokens (words) should be extracted from it, or how to convert it to a form that the Indexing Kit can index. Each line in this file should be of the form:


typename pattern format offset filename

Each field must be separated from the next by exactly one tab. Any field may be ``-'', in which case the field isn't used. typename is the name to use for the type; for example, ``man'' or ``ps''. pattern is a sequence of characters within a file that may be used to identify it (for example, ``%!PS''); if pattern begins with a `/', or if format is regex (see below), it's interpreted as a regular expression. format is the data type of pattern; it may be one of byte, short, long, regex, or string. string is the default format. offset is the offset into the file at which pattern is expected to occur, measured in units of format; that is, if format is long, offset is counted in units of 4 bytes. filename is a filename that should be matched to the type; it may contain wildcards (for example, ``*.rtf''). This might be the ftype entry for PostScript files, for example:


ps	%!PS	string	0	-
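
To make the field layout concrete, here is a minimal Python sketch that parses one .index.ftype line as described above. It is not part of ixbuild or the Indexing Kit; the names FtypeEntry, parse_ftype_line, and UNIT_SIZE are hypothetical. Only the 4-byte size of long comes from the text; the other unit sizes are assumptions.

```python
from typing import NamedTuple, Optional

# Unit sizes in bytes. Only "long" = 4 is stated in the manual page;
# the remaining sizes are assumptions made for this sketch.
UNIT_SIZE = {"byte": 1, "short": 2, "long": 4, "regex": 1, "string": 1}


class FtypeEntry(NamedTuple):
    typename: Optional[str]
    pattern: Optional[str]
    fmt: str
    offset: Optional[int]
    filename: Optional[str]

    def is_regex(self) -> bool:
        """True if the pattern starts with '/' or the format is regex."""
        starts_with_slash = bool(self.pattern) and self.pattern.startswith("/")
        return self.fmt == "regex" or starts_with_slash

    def byte_offset(self) -> Optional[int]:
        """Offset converted from format units to bytes, or None if unused."""
        if self.offset is None:
            return None
        return self.offset * UNIT_SIZE[self.fmt]


def parse_ftype_line(line: str) -> FtypeEntry:
    """Split a line on single tabs; a field of "-" is treated as unused."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 tab-separated fields, got {len(fields)}")
    typename, pattern, fmt, offset, filename = (
        None if field == "-" else field for field in fields
    )
    return FtypeEntry(
        typename,
        pattern,
        fmt or "string",  # string is the default format
        None if offset is None else int(offset),
        filename,
    )


entry = parse_ftype_line("ps\t%!PS\tstring\t0\t-")
print(entry.typename, entry.pattern, entry.byte_offset(), entry.filename)
```

With a format of long and an offset of 3, byte_offset() would report 12, which matches the rule that offset is counted in units of the format.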

.index.itype contains the names of types of files (as defined in .index.ftype) that will not be included in the index. Each type name should be on a separate line.

.index.iname contains the base names (without paths) of files that will not be included in the index. The filename must be exact; shell wildcards are not allowed. Each file name should be on a separate line.

.index.swords contains stop words, which will not be included in the index. Each word should be on a separate line, and should be in post-processed form (that is, if you use case folding, all stop words should be lowercase, and if you use stem reduction, all words should be stems only).
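
As an illustration of the post-processed form, the following Python sketch prepares a stop word list for an index that uses case folding. The function name normalize_stop_words is hypothetical, and the sketch deliberately does not attempt stem reduction, since the stemming rules used by the Indexing Kit are not described here.

```python
def normalize_stop_words(words, fold_case=True):
    """Return stop words one per entry, lowercased if case folding is on.

    Blank entries and duplicates are dropped; original order is kept.
    """
    seen = set()
    result = []
    for word in words:
        word = word.strip()
        if not word:
            continue
        if fold_case:
            word = word.lower()
        if word not in seen:
            seen.add(word)
            result.append(word)
    return result


# Each resulting word would go on its own line of the stop words file.
print("\n".join(normalize_stop_words(["The", "the ", "", "AND"])))
```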

.index.domain contains a weighting domain used for peculiarity weighting (see the IXWeightingDomain and IXAttributeParser class specifications in the Indexing Kit documentation). You can use the ixparse(1) command to convert histogram or NEXTSTEP Release 2 WFTable files to domain format.

OPTIONS

The following options control how an index is built or updated. Using them with an existing index will alter its configuration (for example, changing its weighting type); if you want the configuration of an index to be retained when updating it, specify the -o option.

--
Lists these options.
-a
Use absolute weighting. The weight of a token (word) is its number of occurrences in the files of the directory.
-A
Don't fold plural word forms. The default is to do plural folding.
-c
Clean indexes after updating, removing out-of-date information.
-C
Don't fold case to lower case. The default is to fold case.
-d
Cross device boundaries (mounted disks, for example).
-Dfile
Use the supplied weighting domain file (default .index.domain). This is used for generating peculiarity weights.
-f
Use frequency weighting (number of occurrences / total tokens).
-Ffile
Use the supplied file type table file (default .index.ftype).
-g
Generate descriptions automatically from file contents.
-l
Traverse symbolic links.
-Llanguage
Parse files as though they contain text in the language language. If no language is specified, the system default language is used.
-M#
Use the supplied minimum weight; words below this weight are dropped from the index.
-Nfile
Use the supplied ignored name list file (default .index.iname).
-o
Don't reset options when updating an existing index.
-p
Use peculiarity weighting in conjunction with a weighting domain (see -D).
-P#
Use the supplied percentage passed; words below this percentage are dropped from the index.
-r
Reduce words to stems; writer -> write. The default is not to do this.
-s
Build indexes for a static collection (that is, for directories whose files won't change).
-Sfile
Use the supplied stop words file (default .index.swords).
-Tfile
Use the supplied ignored type list file (default .index.itype).
-u
Disable automatic updating for the index.
-v
Generate verbose output.
-ystring
Use the supplied punctuation string to delimit words; for example, -y".,; ".
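
The weighting options can be illustrated with a short Python sketch. It follows the definitions given for -a (number of occurrences), -f (occurrences divided by total tokens), -M (drop words below a minimum weight), and -y (punctuation used to delimit words), but it is only an approximation: the function names are hypothetical, and how ixbuild actually tokenizes text and applies weights internally is not documented here.

```python
import re
from collections import Counter


def tokenize(text, delimiters=".,; ", fold_case=True):
    """Split text on any run of the delimiter characters (compare -y)."""
    pattern = "[" + re.escape(delimiters) + "]+"
    tokens = [t for t in re.split(pattern, text) if t]
    return [t.lower() for t in tokens] if fold_case else tokens


def absolute_weights(tokens):
    """Absolute weighting (compare -a): the number of occurrences."""
    return Counter(tokens)


def frequency_weights(tokens):
    """Frequency weighting (compare -f): occurrences / total tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {token: count / total for token, count in counts.items()}


def drop_below(weights, minimum):
    """Keep only words whose weight is not below the minimum (compare -M)."""
    return {token: w for token, w in weights.items() if w >= minimum}


tokens = tokenize("Index the file; index the store.")
print(absolute_weights(tokens))
print(drop_below(frequency_weights(tokens), 0.3))
```

In this sample, six tokens are extracted, so a word that occurs twice has an absolute weight of 2 and a frequency weight of one third.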

FILES

.index.store	an index file created by ixbuild
.index.ftype	file type table
.index.iname	ignored file names
.index.itype	ignored file types
.index.swords	stop words (dropped from index)
.index.domain	weighting domain

SEE ALSO

ixsearch(1), ixparse(1), Indexing Kit Documentation in NEXTSTEP General Reference



Created by unroff & hp-tools. © somebody (See intro for details). All Rights Reserved. Last modified 11/5/97