Geocoding 40 millions US addresses totally free with PostGIS 2.1 on debian 8

Postgis 2.1 ( on debian 8) make geocoding US addresses (  about 40 Million ) quite easy, and it is totally free! Thanks for the PostGIS and PostGRESQL open source communities.

To import all the US tiger data from ftp://ftp2.census.gov/geo/tiger/, you need at least 110 G disk partition for postgresql, I would recommend 200G to be safe. And  another  40G disk space to download tiger data ( postgis2.1 will generate a script to use /gisdata to store all the downloaded data ).

Now step by step:

(1)  postgis and postgresql

apt-get install postgresql-9.4-postgis-2.1 postgresql-9.4-postgis-2.1-scripts postgresql-9.4-postgis-scripts

(2) createdb comrite_geocoder
psql -d comrite_geocoder -c “CREATE EXTENSION postgis;”

psql -d comrite_geocoder -c “CREATE EXTENSION postgis_topology;”
psql -d comrite_geocoder -c “CREATE EXTENSION fuzzystrmatch;”

psql -d comrite_geocoder -c “CREATE EXTENSION postgis_tiger_geocoder;”

now you should able to test if the PostGIS installation is correct or not .

SELECT ‘foss4g2013′ As event, n.*
FROM normalize_address(’30 South 7th Street, Minneapolis, MN 55402′) As n;

(3)  now import nation data:

psql comrite_geocoder -c “SELECT loader_generate_nation_script(‘sh’); ” -A -o nation.sh

you may need to adjust nation.sh something like and change as following if you run as postgres user:

HOST=””

PGDATABASE=comrite_geocoder and

PGBIN=/usr/bin,

PGPASSWORD=””

(4) the next step is to load all US states data:

psql comrite_geocoder -c “SELECT loader_generate_script(ARRAY[‘AL’, ‘AK’,’AZ’,’AR’], ‘sh’);” -A -o A.sh

the above A.sh will load AL, AK, AZ, AR data, again you have to adjust those settings.

you can write a script to install all US states. I have written a script to adjust to my needs. it take about 24 hours to import all those data.

(5) testing

Now you should be able to test geocode as:

SELECT g.rating, ST_X(g.geomout) As lon, ST_Y(g.geomout) As lat,
(addy).address As stno, (addy).streetname As street,
(addy).streettypeabbrev As styp, (addy).location As city, (addy).stateabbrev As st,(addy).zip
FROM geocode(‘300 WATER STREET , MONTGOMERY, ALABAMA 36104′) As g;

It should give the result like this:

rating | lon | lat | stno | street | styp | city | st | zip
——–+——————+——————+——+——–+——+————+—-+——-
0 | -86.313876308318 | 32.3804685457777 | 300 | Water | St | Montgomery | AL | 36104
(1 row)

Notes: to make the import painless, I would recommend you download tiger data first on your local network.

 

Please rate this


Leave a Reply

Your email address will not be published. Required fields are marked *


*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>