Comrite Unix Man page/Perldoc/Info page, English-Chinese Dictionary, Chinese-English Dictionary

Geo::Coder::US--3pm

Command: man perldoc info search(apropos)  


 
US(3pm)               User Contributed Perl Documentation              US(3pm)



NAME
       Geo::Coder::US - Geocode (estimate latitude and longitude for) any US
       address

SYNOPSIS
         use Geo::Coder::US;

         Geo::Coder::US->set_db( "geocoder.db" );

         my @matches = Geo::Coder::US->geocode(
                           "1600 Pennsylvania Ave., Washington, DC" );

         my @matches = Geo::Coder::US->geocode(
                           "42nd & Broadway New York NY" )

         my ($ora) = Geo::Coder::US->geocode(
                           "1005 Gravenstein Hwy N, 95472" );

         print "O'Reilly is located at $ora->{lat} degrees north, "
                                      "$ora->{long} degrees east.\n";

DESCRIPTION
       Geo::Coder::US provides a complete facility for geocoding US addresses,
       that is, estimating the latitude and longitude of any street address or
       intersection in the United States, using the TIGER/Line data set from
       the US Census Bureau.  Geo::Coder::US uses Geo::TigerLine to parse this
       data, and DB_File to store a highly compressed distillation of it, and
       Geo::StreetAddress::US to parse addresses into normalized components
       suitable for looking up in its database.

       You can find a live demo of this code at <http://geocoder.us/>;. The
       demo.cgi script is included in eg/ directory distributed with this mod-
       ule, along with a whole bunch of other goodies. See
       Geo::Coder::US::Import for how to build your own Geo::Coder::US
       database.

       Consider using a web service to access this geocoder over the Internet,
       rather than going to all the trouble of building a database yourself.
       See eg/soap-client.pl, eg/xmlrpc-client.pl, and eg/rest-client.pl for
       different examples of working clients for the rpc.geocoder.us geocoder
       web service.

METHODS
       In general, the only methods you are likely to need to call on
       Geo::Coder::US are set_db() and geocode(). The following documentation
       is included for completeness's sake, and for the benefit of developers
       interested in using bits of the module's internals.

       Note: Calling conventions for address and intersection specifiers are
       discussed in the following section on CALLING CONVENTIONS.

       Geo::Coder::US->geocode( $string )
           Given a string containing a street address or intersection, return
           a list of specifiers including latitude and longitude for all
           matching entities in the database. To keep from churning over the
           entire database, the given address string must contain either a
           city and state, or a ZIP code (or both), or geocode() will return
           undef.

           geocode() will attempt to normalize directional prefixes and suf-
           fixes, street types, and state abbreviations, as well as substitute
           TIGER/Line's idea of the "primary street name", if an alternate
           street name was provided instead.

           If geocode() can parse the address, but not find a match in the
           database, it will return a hashref containing the parsed and nor-
           malized address or intersection, but without the "lat" and "long"
           keys specifying the location. If geocode() cannot even parse the
           address, it will return undef. Be sure to check for the existence
           of "lat" and "long" keys in the hashes returned from geocode()
           before attempting to use the values! This serves to distinguish
           between addresses that cannot be found versus addresses that are
           completely unparseable.

           geocode() attempts to be as forgiving as possible when geocoding an
           address.  If you say "Mission Ave" and all it knows about is "Mis-
           sion St", then "Mission St" is what you'll get back. If you leave
           off directional identifiers, geocode() will return address geocoded
           in all the variants it can find, i.e. both "N Main St" and "S Main
           St".

           Don't be surprised if geocoding an intersection returns more than
           one lat/long pair for a single intersection. If one of the streets
           curves greatly or doglegs even slightly, this will be the likely
           outcome.

           geocode() is probably the method you want to use. See more in the
           following section on the structure of the returned address and
           intersection specifiers.

       Geo::Coder::US->geocode_address( $string )
           Works exactly like geocode(), but only parses addresses.

       Geo::Coder::US->geocode_intersection( $string )
           Works exactly like geocode(), but only parses intersections.

       Geo::Coder::US->filter_ranges( $spec, @candidates )
           Filters a list of address specifiers (presumably from the database)
           against a query specifier, filtering by prefix, type, suffix, or
           primary name if possible. Returns a list of matching specifiers.
           filter_ranges() will ignore a filtering step if it would result in
           no specifiers being returned. You probably won't need to use this.

       Geo::Coder::US->find_ranges( $address_spec )
           Given a normalized address specifier, return all the address ranges
           in the database that appear to cover that address. find_ranges()
           ignores prefix, suffix, and type fields in the specifier for search
           purposes, and then filters against them ex post facto. The inten-
           tion for find_ranges() to find the closest match possible in pref-
           erence to returning nothing. You probably want to use
           lookup_ranges() instead, which will call find_ranges() for you.

       Geo::Coder::US->lookup_ranges( $address_spec, @ranges )
           Given an address specifier and (optionally) some address ranges
           from the database, interpolate the street address into the street
           segment referred to by the address range, and return a latitude and
           longitude for the given address within each of the given ranges. If
           @ranges is not given, lookup_ranges() calls find_ranges() with the
           given address specifier, and uses those returned. You probably want
           to just use geocode() instead, which also parses an address string
           and determines whether it's a proper address or an intersection
           automatically.

       Geo::Coder::US->find_segments( $intersection_spec )
           Given a normalized intersection specifier, find all of the street
           segments in the database matching the two given streets in the
           given locale or ZIP code.  find_segments() ignores prefix, suffix,
           and type fields in the specifier for search purposes, and then
           filters against them ex post facto. The intention for find_seg-
           ments() to find the closest match possible in preference to return-
           ing nothing. You probably want to use lookup_intersection()
           instead, which will call find_segments() for you.

       Geo::Coder::US->lookup_intersection( $intersection_spec )
           Given an intersection specifier, return all of the intersections in
           the database between the two streets specified, plus a latitude and
           longitude for each intersection. You probably want to just use
           geocode() instead, which also parses an address string and deter-
           mines whether it's a proper address or an intersection automati-
           cally.

CALLING CONVENTIONS
       Most Geo::Coder::US methods take a reference to a hash containing
       address or intersection information as one of their arguments. This
       "address specifier" hash may contain any of the following fields for a
       given address:

       ADDRESS SPECIFIER


       number
           House or street number.

       prefix
           Directional prefix for the street, such as N, NE, E, etc.  A given
           prefix should be one to two characters long.

       street
           Name of the street, without directional or type qualifiers.

       type
           Abbreviated street type, e.g. Rd, St, Ave, etc. See the USPS offi-
           cial type abbreviations at
           <http://www.usps.com/ncsc/lookups/abbr_suffix.txt>; for a list of
           abbreviations used.

       suffix
           Directional suffix for the street, as above.

       city
           Name of the city, town, or other locale that the address is situ-
           ated in.

       state
           The state which the address is situated in, given as its two-letter
           postal abbreviation. See
           <http://www.usps.com/ncsc/lookups/abbr_state.txt>; for a list of
           abbreviations used.

       zip Five digit ZIP postal code for the address, including leading zero,
           if needed.

       lat The latitude of the address, as returned by geocode() et al. If you
           provide this to as part of an argument to a Geo::Coder::US method,
           it will be ignored.

       long
           The longitude of the address, as returned by geocode() et al. If
           you provide this to as part of an argument to a Geo::Coder::US
           method, it will be ignored.

       INTERSECTION SPECIFIER


       prefix1, prefix2
           Directional prefixes for the streets in question.

       street1, street2
           Names of the streets in question.

       type1, type2
           Street types for the streets in question.

       suffix1, suffix2
           Directional suffixes for the streets in question.

       city
           City or locale containing the intersection, as above.

       state
           State abbreviation, as above.

       zip Five digit ZIP code, as above.

       lat, long
           A single latitude and longitude for the intersection, as specified
           above. If you provide these values as part of an argument to a
           Geo::Coder::US method, they will be ignored.

BUGS, CAVEATS, MISCELLANY
       The TIGER/Line data is notoriously buggy and inaccurate, but it seems
       to work reasonably well for urban areas. Geo::Coder::US uses interpola-
       tion to estimate the position of a particular address within a block,
       which means that it will necessarily be slightly inaccurate. Hey, it's
       only 14 meters off for my house, which is better than the 300 meter
       error given by another prominent geocoder, and definitely close enough
       for navigation.

       In rural areas, TIGER/Line doesn't give names for lots of putative
       roads, even if the roads have names. Maybe the sign blew down the day
       before the Census agents got there, assuming there was ever a sign.
       What can you do? Similarly, lots of rural areas have official county
       subdivision names that an ordinary user would never think to give.
       Probably the right thing to do is map in names from a ZIP code
       database, but that data's not in TIGER/Line. What can you do? In gen-
       eral, you should expect the geocoder to be a lot more accurate in urban
       versus rural areas.

       There may be many kinds of US street addresses which Geo::Coder::US
       can't parse. In particular, Geo::Coder::US strips out letters and
       dashes from house numbers, which may cause ambiguous results in certain
       parts of the country (particularly rural Michigan and Illinois, I
       think). Mea culpa. Send patches.

       The full TIGER/Line data set is one heck of a lot of data -- about four
       gigabytes compressed, and over 24 gigs uncompressed. The BerkeleyDB
       database covering the whole US runs to 750+ megabytes uncompressed, or
       about 305 megs compressed. Unfortunately, I am not at present able to
       offer copies for download.

       It would be nice to see a version of this for other countries, e.g.
       Geo::Coder::CA, Geo::Coder::DK, Geo::Coder::UK, with the same methods.
       Contact your local legislator about why the public geographic data for
       your country isn't freely available like it is in the US. If street
       address data is freely available for your country of choice, what are
       you waiting for?

TODO
       Reverse geocoding methods, to retrieve the nearest street address from
       a given lat/long. This would probably necessitate using a R-tree or
       some other spatial indexing algorithm.

       A metaphone index, for doing fuzzy matching on misspelled street and
       place names.

SEE ALSO
       DB_File(3pm), Geo::TigerLine(3pm), Geo::Coder::US::Import(3pm)

       TIGER/Line is a registered trademark of the US Census Bureau. Find out
       more, and get the latest TIGER/Line files from <http://www.cen-
       sus.gov/geo/www/tiger/>. Actually, the best place to download the data
       from is <http://www2.census.gov/geo/tiger/tiger2004fe/>;.

       You can find a live demo of this code at <http://geocoder.us/>;. Our
       consultancy, Locative Technologies, offers service and support for this
       software on a contractual basis.

AUTHORS
       Schuyler Erle <schuyler AT nocat.net>

       Jo Walsh <jo AT frot.org>

       Geo::Coder::US incorporates a patch submitted by John P. Linderman.
       Submit a useful patch and get your name added here, too!

COPYRIGHT AND LICENSE
       Copyright (C) 2004 by Schuyler Erle and Jo Walsh

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself, either Perl version 5.8.3 or, at
       your option, any later version of Perl 5 you may have available.



perl v5.8.7                       2005-05-15                           US(3pm)
 

©2005 Comrite