ARROW(1)                         User Commands                        ARROW(1)



NAME
       arrow - document retrieval front-end to libbow

SYNOPSIS
       arrow [OPTION...] [ARG...]

DESCRIPTION
       Arrow is a document retrieval front-end to libbow. It uses TFIDF (term
       frequency-inverse document frequency) weighting to retrieve relevant
       documents.
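
       To illustrate the general idea behind TFIDF retrieval, the following is
       a minimal Python sketch, not arrow's or libbow's actual code. It assumes
       whitespace tokenization and the plain weight tf * log(N / df); the
       function names tfidf_vectors, cosine and rank are hypothetical, and
       libbow's own weighting variants may differ.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {word: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    idf = {word: math.log(n_docs / count) for word, count in df.items()}
    vectors = [
        {word: tf * idf[word] for word, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return document indices sorted from most to least similar to query."""
    vectors, idf = tfidf_vectors(docs)
    # Query words unseen in the collection get weight 0 and are ignored.
    counts = Counter(query.lower().split())
    q = {word: tf * idf.get(word, 0.0) for word, tf in counts.items()}
    scores = [cosine(q, v) for v in vectors]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


docs = [
    "the cat sat on the mat",
    "dogs chase the cat",
    "stock prices fell sharply today",
]
best = rank("cat on a mat", docs)[0]
print(docs[best])  # the cat sat on the mat
```

       The document sharing the most heavily weighted words with the query
       ranks first; words that appear in every document contribute nothing
       because their IDF is zero.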

EXAMPLES
       If you have a collection of documents under foo, type arrow --index foo
       to build the index. You can then make a query by typing arrow --query,
       entering your query text, and pressing Control-D.

       If you want to make many queries, it is more efficient to run arrow as
       a server and query it repeatedly, without restarts, by communicating
       through a socket. For example, type arrow --query-server=9876 and then
       connect to port 9876:

              telnet localhost 9876

       In this mode there is no need to press Control-D to end a query. Simply
       type your query on one line and press Return.
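
       A script can talk to the query server in the same way telnet does. The
       sketch below is a hypothetical Python client: the function name
       query_arrow is an assumption, and because the reply format is not
       described here, it simply collects whatever text arrives until the
       server closes the connection or stays quiet for the timeout period.

```python
import socket


def query_arrow(query, host="localhost", port=9876, timeout=5.0):
    """Send a one-line query and return whatever text the server sends back."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # In server mode a query is a single line, so fold embedded newlines.
        sock.sendall(query.replace("\n", " ").encode("utf-8") + b"\n")
        chunks = []
        try:
            while True:
                data = sock.recv(4096)
                if not data:  # the server closed the connection
                    break
                chunks.append(data)
        except socket.timeout:
            # Assumption: the reply is complete once the server goes quiet.
            pass
    return b"".join(chunks).decode("utf-8", errors="replace")
```

       With a server started by arrow --query-server=9876, a call such as
       query_arrow("text classification") would return the server's reply as
       a string. Opening a new connection per query is the simplest approach;
       a long-lived connection would need to know where each reply ends.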

OPTIONS
       General options

              For building data structures from text files:

       -i, --index
              tokenize training documents found  under  ARG...,  build  weight
              vectors, and save them to disk

              For  doing  document  retrieval  using the data structures built
              with -i:

       -c, --compare=FILE
              Print the TFIDF cosine similarity metric of the query with  this
              FILE.

       -n, --num-hits-to-show=N
              Show  the  N  documents  that are most similar to the query text
              (default N=1)

       -q, --query[=FILE]
              tokenize input from stdin [or FILE], then  print  document  most
              like it

       --query-forking-server=PORTNUM
              Run  arrow  in  socket  server  mode, forking a new process with
              every connection.  Allows multiple simultaneous connections.

       --query-server=PORTNUM
              Run arrow in socket server mode.

              Diagnostics

       --print-coo
              Print word co-occurrence statistics.

       --print-idf
              Print, in unsorted order, the IDF of all words in the model's
              vocabulary

       --annotations=FILE
              The  sarray  file  containing  annotations  for the files in the
              index

       -b, --no-backspaces
              Don't use backspace when verbosifying progress (good for use  in
              emacs)

       -d, --data-dir=DIR
              Set  the  directory  in  which  to  read/write  word-vector data
              (default=~/.<program_name>).

       --random-seed=NUM
              The non-negative integer to use for seeding  the  random  number
              generator

       --score-precision=NUM
              The  number  of decimal digits to print when displaying document
              scores

       -v, --verbosity=LEVEL
              Set amount of info printed while running (0=silent, 1=quiet,
              2=show-progress, ... 5=max)

              Lexing options

       --append-stoplist-file=FILE
              Add words in FILE to the stoplist.

       --exclude-filename=FILENAME
              When  scanning  directories for text files, skip files with name
              matching FILENAME.

       -g, --gram-size=N
              Create tokens for all 1-grams,... N-grams.

       -h, --skip-header
              Avoid lexing news/mail headers by  scanning  forward  until  two
              newlines.

       --istext-avoid-uuencode
              Check  for uuencoded blocks before saying that the file is text,
              and say no if there are many lines of the same length.

       --lex-pipe-command=SHELLCMD
              Pipe files through this shell command before lexing them.

       --max-num-words-per-document=N
              Only tokenize the first N words in each document.

       --no-stemming
              Do not modify lexed words with a stemming function. (usually the
              default, depending on lexer)

       --replace-stoplist-file=FILE
              Empty  the  default stoplist, and add space-delimited words from
              FILE.

       -s, --no-stoplist
              Do not toss lexed words that appear in the stoplist.

       --shortest-word=LENGTH
              Toss lexed words that are shorter than LENGTH. Default is
              usually 2.

       -S, --use-stemming
              Modify lexed words with the `Porter' stemming function.

       --use-stoplist
              Toss lexed words that appear  in  the  stoplist.   (usually  the
              default SMART stoplist, depending on lexer)

       --use-unknown-word
              When  used in conjunction with -O or -D, captures all words with
              occurrence counts below threshold as the `<unknown>' token

       --xxx-words-only
              Only tokenize words with `xxx' in them

              Mutually exclusive choice of lexers

       --flex-mail
              Use a mail-specific flex lexer

       --flex-tagged
              Use a tagged flex lexer

       -H, --skip-html
              Skip HTML tokens when lexing.

       --lex-alphanum
              Use a special lexer that includes digits in  tokens,  delimiting
              tokens only by non-alphanumeric characters.

       --lex-infix-string=ARG
              Use only the characters after ARG in each word for stoplisting
              and stemming. If a word does not contain ARG, the entire word
              is used.

       --lex-suffixing
              Use a special lexer that adds suffixes depending on  Email-style
              headers.

       --lex-white
              Use a special lexer that delimits tokens by whitespace only, and
              does not change the contents of the token at all: no downcasing,
              no stemming, no stoplist, nothing. Ideal for use with an
              externally-written lexer interfaced to rainbow with
              --lex-pipe-command.

              Feature-selection options

       -D, --prune-vocab-by-doc-count=N
              Remove words that occur in N or fewer documents.

       -O, --prune-vocab-by-occur-count=N
              Remove words that occur fewer than N times.

       -T, --prune-vocab-by-infogain=N
              Remove  all  but the top N words by selecting words with highest
              information gain.
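
       The two count-based pruning rules differ in their boundary: -D removes
       words found in N or fewer documents, while -O removes words occurring
       fewer than N times. The Python sketch below illustrates that
       difference under stated assumptions; prune_vocab and its parameter
       names are hypothetical, and information-gain pruning (-T) is not
       covered.

```python
from collections import Counter


def prune_vocab(docs, min_occurrences=None, max_doc_count=None):
    """Return the set of words kept after count-based pruning.

    docs is a list of token lists. Names and defaults are hypothetical.
    """
    occurrences = Counter(word for tokens in docs for word in tokens)
    doc_counts = Counter(word for tokens in docs for word in set(tokens))
    kept = set(occurrences)
    if min_occurrences is not None:
        # Like -O N: drop words that occur fewer than N times overall.
        kept = {w for w in kept if occurrences[w] >= min_occurrences}
    if max_doc_count is not None:
        # Like -D N: drop words that occur in N or fewer documents.
        kept = {w for w in kept if doc_counts[w] > max_doc_count}
    return kept


docs = [
    ["apple", "apple", "banana"],
    ["apple", "cherry"],
    ["banana", "cherry", "cherry", "date"],
]
print(sorted(prune_vocab(docs, min_occurrences=3)))  # ['apple', 'cherry']
print(sorted(prune_vocab(docs, max_doc_count=1)))    # ['apple', 'banana', 'cherry']
```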

              Weight-vector setting/scoring method options

       --binary-word-counts
              Instead of using integer  occurrence  counts  of  words  to  set
              weights, use binary absence/presence.

       --event-document-then-word-document-length=NUM
              Set the normalized length of documents when
              --event-model=document-then-word is used.

       --event-model=EVENTNAME
              Set what objects will be considered the `events' of the
              probabilistic model. EVENTNAME can be one of: word, document,
              document-then-word. Default is `word'.

       --infogain-event-model=EVENTNAME
              Set what objects will be considered the `events' when
              information gain is calculated. EVENTNAME can be one of: word,
              document, document-then-word. Default is `document'.

       -m, --method=METHOD
              Set the word  weight-setting  method;  METHOD  may  be  one  of:
              tfidf_words,     tfidf_log_words,     tfidf_log_occur,    tfidf,
              default=naivebayes.

       --print-word-scores
              During scoring, print the contribution  of  each  word  to  each
              class.

       --smoothing-dirichlet-filename=FILE
              The file containing the alphas for the dirichlet smoothing.

       --smoothing-dirichlet-weight=NUM
              The weighting factor by which to multiply the alphas for
              dirichlet smoothing.

       --smoothing-goodturing-k=NUM
              Smooth word probabilities for words that occur NUM or fewer
              times. The default is 7.

       --smoothing-method=METHOD
              Set  the method for smoothing word probabilities to avoid zeros;
              METHOD may be one of: goodturing, laplace, mestimate, wittenbell

       --uniform-class-priors
              When setting weights, calculating infogain and scoring, use
              equal prior probabilities on classes.

       -?, --help
              Give this help list

       --usage
              Give a short usage message

       -V, --version
              Print program version

       Mandatory  or  optional arguments to long options are also mandatory or
       optional for any corresponding short options.

REPORTING BUGS
       Please report bugs related to this program to Andrew McCallum
       <mccallum AT cs.edu>. If the bugs are related to the Debian package,
       send them to submit AT bugs.org.

SEE ALSO
       archer(1), crossbow(1), rainbow(1).

       The full documentation for arrow will be provided as a Texinfo  manual.
       If the info and arrow programs are properly installed at your site, the
       command

              info arrow

       should give you access to the complete manual.

       You  can  also  find  documentation   and   updates   for   libbow   at
       http://www.cs.cmu.edu/~mccallum/bow




arrow 0.2                        November 2002                        ARROW(1)
 
