Lingua::Stem::En(3pm)  User Contributed Perl Documentation  Lingua::Stem::En(3pm)

NAME
       Lingua::Stem::En - Porter's stemming algorithm for 'generic' English

SYNOPSIS
           use Lingua::Stem::En;
           my $stems = Lingua::Stem::En::stem({ -words => $word_list_reference,
                                                -locale => 'en',
                                                -exceptions => $exceptions_hash,
                                             });

DESCRIPTION
       This routine applies the Porter Stemming Algorithm to its parameters,
       returning the stemmed words.

       It is derived from the C program "stemmer.c" as found in freewais and
       elsewhere, which contains these notes:

          Purpose:    Implementation of the Porter stemming algorithm
                      documented in: Porter, M.F., "An Algorithm For Suffix
                      Stripping," Program 14 (3), July 1980, pp. 130-137.
          Provenance: Written by B. Frakes and C. Cox, 1986.

       I have re-interpreted areas that use Frakes and Cox's "WordSize"
       function. My version may misbehave on short words starting with "y",
       but I can't think of any examples.

       The step numbers correspond to Frakes and Cox, and are probably in
       Porter's article (which I've not seen). Porter's algorithm still has
       rough spots (e.g. current/currency, -ings words), which I've not
       attempted to cure, although I have added support for the British -ise
       suffix.

CHANGES
        1999.06.15 - Changed to '.pm' module, moved into Lingua::Stem namespace,
                     optionalized the export of the 'stem' routine
                     into the caller's namespace, added named parameters

        1999.06.24 - Switch core implementation of the Porter stemmer to
                     the one written by Jim Richardson <jimr AT maths.au>

        2000.08.25 - 2.11 Added stemming cache

        2000.09.14 - 2.12 Fixed *major* :( implementation error of Porter's
                          algorithm. Error was entirely my fault - I completely
                          forgot to include rule sets 2,3, and 4 starting with
                          Lingua::Stem 0.30. -- Benjamin Franz

        2003.09.28 - 2.13 Corrected documentation error pointed out by
                          Simon Cozens.

METHODS
       stem({ -words => \@words, -locale => 'en', -exceptions => \%exceptions });
           Stems a list of passed words using the rules of US English.
           Returns an anonymous array reference to the stemmed words.

           Example:

             my $stemmed_words = Lingua::Stem::En::stem({ -words => \@words,
                                                          -locale => 'en',
                                                          -exceptions => \%exceptions,
                                                       });

       stem_caching({ -level => 0|1|2 });
           Sets the level of stem caching.

           '0' means 'no caching'. This is the default level.

           '1' means 'cache per run'. This caches stemming results during a
           single call to 'stem'.

           '2' means 'cache indefinitely'. This caches stemming results until
           either the process exits or the 'clear_stem_cache' method is
           called.

       clear_stem_cache;
           Clears the cache of stemmed words

NOTES
       This code is almost entirely derived from the Porter 2.1 module
       written by Jim Richardson.

SEE ALSO
        Lingua::Stem

AUTHOR
         Jim Richardson, University of Sydney
         jimr AT maths.au or http://www.maths.usyd.edu.au:8000/jimr.html

         Integration in Lingua::Stem by
         Benjamin Franz, FreeRun Technologies,
         snowhare AT nihongo.org or http://www.nihongo.org/snowhare/

COPYRIGHT
       Jim Richardson, University of Sydney
       Benjamin Franz, FreeRun Technologies

       This code is freely available under the same terms as Perl.

BUGS
TODO
perl v5.8.7                       2004-07-25             Lingua::Stem::En(3pm)