DICTFMT(1)                                                          DICTFMT(1)



NAME
       dictfmt - formats a DICT protocol dictionary database

SYNOPSIS
       dictfmt  -c5|-t|-e|-f|-h|-j|-p [options]  basename

DESCRIPTION
       dictfmt takes a file, FILE, on stdin, and creates a dictionary database
       named basename.dict, that conforms to the DICT protocol.  It also  cre-
       ates  an  index  file  named  basename.index.  By default, the index is
       sorted according to the C locale, and only alphanumeric characters and
       spaces are used in sorting; this may be changed with the --locale and
       --allchars options.  (basename is commonly chosen to correspond to the
       basename of FILE, but this is not mandatory.)
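
       The default ordering rule above can be sketched in Python as follows.
       This only illustrates the rule as stated here and is not dictfmt's
       actual code: index_key and sort_headwords are hypothetical names, and
       any case folding dictfmt may perform is not modelled.

```python
def index_key(headword, allchars=False):
    """Return the string a headword is compared by (hypothetical helper)."""
    if allchars:
        # --allchars: every character takes part in the comparison.
        return headword
    # Default: keep only ASCII letters, digits and spaces.
    return "".join(ch for ch in headword
                   if ch == " " or (ch.isascii() and ch.isalnum()))


def sort_headwords(headwords, allchars=False):
    """Sort by the byte values of the key, as the C locale would."""
    return sorted(headwords,
                  key=lambda w: index_key(w, allchars).encode("utf-8"))


words = ["ab", "a-c"]
print(sort_headwords(words))                 # ['ab', 'a-c']
print(sort_headwords(words, allchars=True))  # ['a-c', 'ab']
```

       With the default rule the hyphen in "a-c" is ignored, so it is compared
       as "ac"; with --allchars the hyphen is compared too, and it sorts
       before the letter "b".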

       Unless  the  database is extremely small, it is highly recommended that
       basename.dict be  compressed  with  /usr/bin/dictzip  to  create  base-
       name.dict.dz.  (dictzip is included in the dictd source package.)

       FILE  may  be  in  any  of  the several formats described by the format
       options -c5, -t, -e, -f, -h, -j, or -p.  Exactly one of  these  options
       must be given.

       dictfmt prepends several headers to the .dict file.  The
       00-database-url header gives the value of the -u option as the  URL  of
       the   site   from  which  the  original  database  was  obtained.   The
       00-database-short header gives the value of the -s option as the  short
       name  of  the  dictionary.   (This "short name" is the identifying name
       given by the "dict -D" option.)  If the -u and/or -s options are
       omitted, these values will be shown as "unknown", which is undesirable
       for a publicly distributed database.

       The date of conversion (formatting) is given  in  the  00-database-info
       header.   All  text  in  the input file prior to the first headword (as
       defined by the appropriate  formatting  option)  is  appended  to  this
       header.   All  text  in  the input file following a headword, up to the
       next headword, is copied unchanged to the .dict file.
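
       As a rough illustration of the invocation described above, the
       following Python sketch assembles a dictfmt command line from one
       format option plus the -u and -s options.  build_dictfmt_argv is a
       hypothetical helper, and the basename "mydict", the URL and the short
       name are made-up values.

```python
FORMAT_OPTIONS = ("-c5", "-t", "-e", "-f", "-h", "-j", "-p")


def build_dictfmt_argv(format_option, basename, url=None, short_name=None):
    """Assemble a dictfmt command line (hypothetical helper)."""
    if format_option not in FORMAT_OPTIONS:
        raise ValueError("exactly one format option is required")
    argv = ["dictfmt", format_option]
    if url is not None:
        argv += ["-u", url]          # recorded in 00-database-url
    if short_name is not None:
        argv += ["-s", short_name]   # recorded in 00-database-short
    argv.append(basename)
    return argv


argv = build_dictfmt_argv("-f", "mydict",
                          url="http://www.example.org/mydict.txt",
                          short_name="My Dictionary 0.1")
print(argv)
# To run it, FILE must be supplied on stdin, for example:
#   import subprocess
#   with open("mydict.txt", "rb") as src:
#       subprocess.run(argv, stdin=src, check=True)
#   subprocess.run(["dictzip", "mydict.dict"], check=True)
```

       Passing the arguments as a list avoids shell quoting problems when the
       short name contains spaces.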


FORMATTING OPTIONS
       -c5    FILE is formatted with headwords preceded by 5  or  more  under-
              score  characters (_) and a blank line.  All text until the next
              headword is considered the definition.  Any leading `@'  charac-
              ters are stripped out, but the file is otherwise unchanged. This
              option was written to format the CIA WORLD FACTBOOK 1995.

       -t     Implies the -c5, --without-info and --without-headword options.
              Use this option if the input database comes from the
              dictunformat utility.

       -e     FILE is in html  format,  with  the  headword  tagged  as  bold.
              (<B>headword - </B>)
              This  option  was  written to format EASTON'S 1897 BIBLE DICTIO-
              NARY.  A typical entry from Easton is:

              <A NAME="T0000005">
              <B>Abagtha - </B>
              one of the seven eunuchs  in  Ahasuerus's  court  (Esther  1:10;
              2:21).

              This is converted to:
              Abagtha
                 one  of  the seven eunuchs in Ahasuerus's court (Esther 1:10;
              2:21).

              The anchor tag <A NAME="T0000005"> is omitted, and the headword
              `Abagtha' is indexed.

              NOTE:  This option should be used with caution.  It removes sev-
              eral html tags (enough to format Easton properly), but not  all.
              The  Makefile  that was originally written to format dict-easton
              uses sed scripts to modify certain cross reference tags.  It may
              be  necessary  to  pipe  the input file through a sed script, or
              hack the source of dictfmt in order  to  properly  format  other
              html databases.

       -f     FILE  is formatted with the headwords starting in column 0, with
              the definition indented at least one space (or tab character) on
              subsequent  lines.  The third line starting in column 0 is taken
              as the first headword, and the first two lines starting in
              column 0 are treated as part of the 00-database-info header.  This
              option was written to format the F.O.L.D.O.C.

       -h     FILE is formatted with the headwords starting in column 0,  fol-
              lowed  by  a  comma,  with the definition continuing on the same
              line.  All text  before  the  first  single  character  line  is
              included  in  00-database-info  header,  and lines with only one
              character are omitted from the .dict file.  The  first  headword
              is  on  the line following the first single character line.  The
              headword is indexed; the text of the file is not changed.   This
              option was written to format HITCHCOCK'S BIBLE NAMES DICTIONARY.

       -j     FILE is formatted with headwords starting in column 0, enclosed in
              colons,  followed by the definition.  The colons surrounding the
              headword are removed, and the headword is indexed.  Lines begin-
              ning  with  '*',  '=', or '-' are also removed.  All text before
              the first headword is included in the headers.  This option  was
              written to format the JARGON FILE.
              NOTE:  Some  recent versions of the JARGON FILE had three blanks
              inserted before the first colon at each headword.  These must be
              removed  before processing with dictfmt.  (sed scripts have been
              used for this purpose. ed, awk, or perl scripts are also  possi-
              ble.)

       -p     FILE  is  formatted  with `%h' in column 0, followed by a blank,
              followed by the headword, optionally followed by a line contain-
              ing  `%d'  in  column 0.  The definition starts on the following
              line.  The first line beginning '%h' and any lines beginning
              '%d'  are  stripped  from  the .dict file, and '%h ' is stripped
              from in front of the headword.  All text before the first  head-
              word is included in the headers.  The second line beginning '%h'
              is taken as the first headword.  This option was written to for-
              mat Jay Kominek's elements database.
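
       The line-oriented rules given for two of the formats above, -f and -p,
       can be sketched in Python as follows.  This is a reading of the
       descriptions on this page, not dictfmt's actual parser; parse_f_format,
       parse_p_format and the sample inputs are hypothetical.

```python
def parse_f_format(text):
    """-f: headwords in column 0, definitions indented (sketch)."""
    header, entries = [], []
    col0_seen = 0
    for line in text.splitlines():
        if line and not line[0].isspace():
            col0_seen += 1
            if col0_seen <= 2:
                header.append(line)         # first two column-0 lines
            else:
                entries.append((line, []))  # third and later: headwords
        elif entries:
            entries[-1][1].append(line)     # indented or blank: definition
        else:
            header.append(line)
    return header, entries


def parse_p_format(text):
    """-p: '%h headword' lines, optional '%d' lines (sketch)."""
    header, entries = [], []
    h_seen = 0
    for line in text.splitlines():
        if line.startswith("%h"):
            h_seen += 1
            if h_seen > 1:                  # the first '%h' line is dropped
                entries.append((line[2:].strip(), []))
        elif line.startswith("%d"):
            continue                        # '%d' lines are dropped
        elif entries:
            entries[-1][1].append(line)
        else:
            header.append(line)             # text before the first headword
    return header, entries


f_sample = "Sample Dictionary\nVersion 0\nfoo\n\tfirst thing\nbar\n  second thing\n"
p_sample = "%h intro\nSample database\n%h hydrogen\n%d\nSymbol: H\n%h helium\nSymbol: He\n"
print(parse_f_format(f_sample))
print(parse_p_format(p_sample))
```

       Each function returns the header lines and a list of (headword,
       definition lines) pairs, which is enough to check a candidate input
       file against the expected layout before running dictfmt on it.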


OPTIONS
       -u url Specifies the URL of the site from which the raw database was
              obtained.  If this option is specified, any
              00-database-url/00databaseurl headword and its definition in
              the input will be ignored.

       -s name
              Specifies the name and, optionally, the version and date, of the
              database.  (If this contains spaces, it must be quoted.)  If
              this option is specified, any 00-database-short/00databaseshort
              headword and its definition in the input will be ignored.

       -L     display license and copyright information

       -V     display version information

       -D     output debugging information

       --help display a help message

       --locale locale
              specifies the locale used for sorting.  If no locale is
              specified, the "C" locale is used.

       --allchars
              use all characters (not only alphanumeric and space) in  sorting
              the index

       --headword-separator sep
              sets  the headword separator, which allows several words to have
              the same definition.  For example, if '--headword-separator %%%'
              is  given,  and  the  input  file contains 'autumn%%%fall', both
              'autumn' and 'fall' will be indexed as  headwords, with the same
              definition.

       --break-headwords
              multiple headwords will be written on separate lines in the
              .dict file.  For use with '--headword-separator'.

       --without-headword
              headwords will not be included in .dict file

       --without-header
              header will not be copied to DB info entry

       --without-url
              URL will not be copied to DB info entry

       --without-time
              time of creation will not be copied to DB info entry

       --without-info
              DB info entry will not be created.  This may be useful if a
              00-database-info headword is expected from stdin (dictunformat
              outputs it).

       --columns columns
              By default dictfmt wraps strings read from stdin to 72 columns.
              This option changes this default.  If it is set to zero or a
              negative value, wrapping is turned off.

       --default-strategy strategy
              Sets the default search strategy for the database.  It  will  be
              used     instead     of    strategy    '.'.     Special    entry
              00-database-default-strategy is created for this purpose.   This
              option may be useful, for example, for dictionaries containing
              mainly phrases rather than single words.  In any case, use this
              option only if you are absolutely sure what you are doing.
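
       Two of the options above, --headword-separator and --columns, can be
       illustrated with a short Python sketch.  It mirrors the behaviour
       described on this page only; split_headwords and wrap_line are
       hypothetical helpers, and dictfmt's exact wrapping algorithm is assumed
       to resemble plain word wrapping.

```python
import textwrap


def split_headwords(raw_headword, definition, separator):
    """Map each headword in a separated list to the same definition."""
    words = raw_headword.split(separator) if separator else [raw_headword]
    return {word: definition for word in words}


def wrap_line(line, columns=72):
    """Wrap to the given width; zero or negative disables wrapping."""
    if columns <= 0:
        return line
    return textwrap.fill(line, width=columns)


index = split_headwords("autumn%%%fall", "the season after summer", "%%%")
print(sorted(index))   # ['autumn', 'fall']

long_line = "word " * 40
print(wrap_line(long_line))
print(wrap_line(long_line, columns=0) == long_line)
```

       Both headwords end up pointing at the same definition text, and a
       columns value of zero leaves the input line untouched.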

CREDITS
       dictfmt  was  written  by  Rik  Faith (faith AT cs.edu) as part of the
       dict-misc package.  dictfmt is distributed under the terms of  the  GNU
       General  Public  License.  If you need to distribute under other terms,
       write to the author.

AUTHOR
       This manual page was written by Robert D. Hilliard
       <hilliard AT debian.org>.


SEE ALSO
       dict(1),  dictd(8),  dictzip(1),  dictunformat(1), http://www.dict.org,
       RFC 2229



                               25 December 2000                     DICTFMT(1)
 
