On 12/2/07, Steven Fodstad <flarelocke@hotpop.com> wrote:
> Sorry for not responding earlier. The haskell-cafe list is hard to keep
> up with.
>
> The process of finding geographic (lat/long) coordinates from a text
> address is called geocoding. Obviously extracting the parts of an
> address is part of that, so you might find better results looking for
> geocoding, rather than the more general and more difficult topic of
> extracting structure from unstructured data. Unfortunately, I don't
> have any references at hand on that part of geocoding.

Hi Steven,
The idea of using the geocoding approach seems appealing. I already
thought of using geocoding for address validation (after the parsing)
but not of looking at how geocoding tools parse addresses. But I'm not
sure geocoding tools would be suitable to handle my addresses. I have
used a few geocoding tools, and usually you have to provide the address
in a very specific format if you want it to be recognized. Also, most of
the time they work quite well for US addresses but not for addresses in
other countries.
I need to recognize very specific parts of an address, more than what a
geocoding tool requires: dock #, doors, suite #, contact person, etc.
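To illustrate the kind of field extraction I mean, here is a minimal
Haskell sketch. The Field type, the keyword list, and the extractFields
function are all made up for this example (they don't come from any
library or from my actual code), and it only handles the easy case where
a keyword is directly followed by a token containing a digit:

```haskell
import Data.Char (isDigit, toLower)

-- Hypothetical field type; constructor names are illustrative only.
data Field = Suite String | Dock String | Door String
  deriving (Eq, Show)

-- Scan whitespace-separated tokens for a known keyword that is
-- immediately followed by a token containing at least one digit.
extractFields :: String -> [Field]
extractFields = go . words . map normalize
  where
    -- Treat common separators as spaces and lowercase everything else.
    normalize c = if c `elem` ",;#" then ' ' else toLower c

    go (k : v : rest)
      | Just mk <- lookup k keywords, any isDigit v = mk v : go rest
    go (_ : rest) = go rest
    go []         = []

    -- Assumed keyword spellings; real data would need many more.
    keywords =
      [ ("suite", Suite), ("ste", Suite)
      , ("dock", Dock)
      , ("door", Door), ("doors", Door)
      ]
```

For example, extractFields "100 Main St, Suite 200, Dock #4" yields
[Suite "200",Dock "4"]. Contact names and scrambled field order are
exactly what a simple scan like this cannot handle.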
I'm currently using the ZipFourCE web service from BCCSoftware for
validating my addresses against the USPS address database. This tool is
built for parsing and correcting addresses, but I just use it for
validation: either it's not "smart enough" to parse them, or they are
just too scrambled for the parsing to be automated using an
out-of-the-box tool. ;-)
Thanks for your input,
Olivier.