
On 12/2/07, Steven Fodstad wrote:
Sorry for not responding earlier. The haskell-cafe list is hard to keep up with.
The process of finding geographic (lat/long) coordinates from a text address is called geocoding. Obviously extracting the parts of an address is part of that, so you might find better results looking for geocoding, rather than the more general and more difficult topic of extracting structure from unstructured data. Unfortunately, I don't have any references at hand on that part of geocoding.
Hi Steven,
The idea of using the geocoding approach seems appealing. I had already thought of using geocoding for address validation (after the parsing), but not of looking at how geocoding tools parse addresses.
But I'm not sure geocoding tools would be suitable for handling my addresses. I have used a few geocoding tools, and usually you have to provide the address in a very specific format if you want it to be recognized. Also, most of the time they work quite well for US addresses but not for addresses from other countries. I need to recognize very specific parts of an address, more than a geocoding tool would require: dock #, doors, suite #, contact person, etc.
I'm currently using the ZipFourCE web service from BCCSoftware to validate my addresses against the USPS address database. This tool is built for parsing and correcting addresses, but I just use it for validation, as it's not "smart enough" to parse them, or maybe they are just too scrambled for the parsing to be automated using an out-of-the-box tool. ;-)
Thanks for your input,
Olivier.
Olivier Boudry