Google-like "do you mean?" feature

I'm thinking of writing a parser to load files that my customers have created. I'm a software requirements engineer; the data consists of the customers' thoughts in response to the latest release of the requirements doc. In fact, the files will probably be copies of the requirements doc itself, into which customers have entered their notes and made changes.

The original requirements doc will have a format that can be parsed; probably something simple like lines marked with codes like

//customer={customer name goes here}
//requirement={requirement text goes here}

When I parse the documents that come back from the customers, they are likely to contain some errors. Field names may be mangled or misspelled. Customer names may be entered in unrecognizable variants (e.g. someone named "Michael" is indicated as "Mike") and so forth.

I was thinking that it might be useful to have a Google-like "do you mean this?" feature. If the field name is //customer=, then the parser might recognize a huge list of variants like //ustomer=, //customor=, etc... that is, recognize them well enough to continue parsing and give a decent error message in context.

Any ideas how to go about this? I don't think I would create a parser language that includes every variant, but instead the field names would be tokens that could be passed to another routine. The variants could be generated ahead of time. I would limit the number of variants to something manageable, like 10,000 for each field name.

Thanks,
Mike
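As a starting point, a line-oriented parser for the format described above might look something like this minimal sketch; the Field type and all names are illustrative, not part of any actual requirements-doc format:

    import Data.List (stripPrefix)

    data Field = Customer String | Requirement String | Unknown String
      deriving Show

    -- Parse one marked line; anything with an unrecognised marker is
    -- kept as a candidate for the "do you mean?" check discussed below.
    parseLine :: String -> Field
    parseLine l
      | Just v <- stripPrefix "//customer="    l = Customer v
      | Just v <- stripPrefix "//requirement=" l = Requirement v
      | otherwise                                = Unknown l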

2009/4/16 Michael Mossey
I was thinking that it might be useful to have a Google-like "do you mean this?" feature. If the field name is //customer=, then the parser might recognize a huge list of variants like //ustomer=, //customor=, etc... that is, recognize them well enough to continue parsing and give a decent error message in context.
Any ideas how to go about this?
To measure how similar two strings are, you can use a metric like Levenshtein distance, Damerau-Levenshtein distance, or Jaro-Winkler distance:

http://en.wikipedia.org/wiki/Levenshtein_distance
http://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance
http://en.wikipedia.org/wiki/Jaro-Winkler_distance

The first two basically count the number of mistakes that a user would have to make to get from the correct string to the one you read from the file. There's an 'edit-distance' package in Hackage that implements the first two:

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/edit-distance

When you find an unrecognised field name in the file, you could calculate the edit distance to each correct field name, and if there's one within a certain threshold, assume that's what the user meant (if there's more than one close match, maybe it's better to report an error than risk choosing the wrong one).

I imagine this brute-force approach would be fast enough, but if not you could look at the techniques used by spell checkers to suggest corrections. Maybe even use a spell checking library, if such a thing exists (either pure Haskell or a binding to a library like aspell, although I couldn't see either from a quick search in Hackage).

Andy
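A minimal sketch of the threshold-plus-ambiguity check described above, using the edit-distance package; the field names, threshold, and function name are illustrative, not from the thread:

    import Text.EditDistance (defaultEditCosts, levenshteinDistance)

    fieldNames :: [String]
    fieldNames = ["//customer=", "//requirement="]

    -- Suggest the intended field name for an unrecognised token:
    -- Just the unique candidate within the threshold, or Nothing if
    -- no candidate is close enough or the match is ambiguous.
    suggest :: Int -> String -> Maybe String
    suggest threshold token =
      case close of
        [name] -> Just name
        _      -> Nothing  -- none, or several: better to report an error
      where
        close = [ name
                | name <- fieldNames
                , levenshteinDistance defaultEditCosts token name <= threshold ]

For example, suggest 2 "//ustomer=" gives Just "//customer=", while a token within the threshold of two different field names gives Nothing.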

To measure how similar two strings are, you can use a metric like Levenshtein distance, Damerau-Levenshtein distance, or Jaro-Winkler distance
I tried some of those just the other day, and in my app this worked much better: http://www.catalysoft.com/articles/StrikeAMatch.html
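That article's metric compares strings by the adjacent letter pairs they share. A rough sketch of the idea, simplified to whole strings (the article works word by word) and with illustrative names:

    import Data.Char (toUpper)
    import Data.List (delete)

    -- Upper-cased adjacent letter pairs of a string.
    letterPairs :: String -> [String]
    letterPairs s = zipWith (\a b -> [a, b]) u (drop 1 u)
      where u = map toUpper s

    -- Size of the multiset intersection: each pair matches at most once.
    shared :: [String] -> [String] -> Int
    shared []     _  = 0
    shared (p:ps) qs
      | p `elem` qs = 1 + shared ps (delete p qs)
      | otherwise   = shared ps qs

    -- Score in [0,1]: the article's 2 * |shared pairs| / |total pairs|.
    similarity :: String -> String -> Double
    similarity s1 s2
      | null p1 && null p2 = 1.0
      | otherwise          = 2 * fromIntegral (shared p1 p2)
                               / fromIntegral (length p1 + length p2)
      where
        p1 = letterPairs s1
        p2 = letterPairs s2

Unlike raw edit distance, the score is normalised by length, which can behave better when the strings being compared differ a lot in size.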

On Wed, 15 Apr 2009 23:31:50 -0700, Michael Mossey wrote:
I was thinking that it might be useful to have a Google-like "do you mean this?" feature. If the field name is //customer=, then the parser might recognize a huge list of variants like //ustomer=, //customor=, etc...
You could reduce the probability of such errors by providing a standard template that could be copy-pasted in wherever necessary.

-- Robin

Robin Green wrote:
On Wed, 15 Apr 2009 23:31:50 -0700, Michael Mossey wrote:
I was thinking that it might be useful to have a Google-like "do you mean this?" feature. If the field name is //customer=, then the parser might recognize a huge list of variants like //ustomer=, //customor=, etc...
You could reduce the probability of such errors by providing a standard template that could be copy-pasted in wherever necessary.
Yes, that will be there. My example is not so good because it focuses on the keywords only. I'm more concerned about errors in the data they enter... for example, names of people and references to document names.

Thanks,
Mike

2009/4/16 Michael Mossey
I don't think I would create a parser language that includes every variant, but instead the field names would be tokens that could be passed to another routine.
Right.
The variants could be generated ahead of time. I would limit the number of variants to something manageable, like 10,000 for each field name.
Generating variants ahead of time isn't necessary. Instead, you could just look at the edit distance between the token you get and each possible valid field name, using something like:

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/edit-distance

The field name with the least edit distance to the token is the one you should suggest.

Cheers,
Max
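Concretely, that nearest-match idea might look like this (a sketch; the function name is illustrative, and the candidate list is assumed non-empty):

    import Text.EditDistance (defaultEditCosts, levenshteinDistance)
    import Data.List (minimumBy)
    import Data.Ord (comparing)

    -- The candidate with the smallest edit distance to the token read
    -- from the file. Assumes a non-empty candidate list.
    nearest :: [String] -> String -> String
    nearest candidates token =
      minimumBy (comparing (levenshteinDistance defaultEditCosts token))
                candidates

So nearest ["//customer=", "//requirement="] "//customor=" picks "//customer="; combined with a distance threshold, as Andy suggests above, it only offers the suggestion when the match is plausible.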
participants (5)
- Andy Smith
- Max Bolingbroke
- Michael Mossey
- Robin Green
- Simon Michael