
I'm thinking of writing a parser to load files that my customers have created. I'm a software requirements engineer, and the data consists of the customers' thoughts in response to the latest release of the requirements doc. In fact, the files will probably be copies of the requirements doc itself, into which customers have entered their notes and made changes.

The original requirements doc will have a format that can be parsed, probably something simple like lines marked with codes such as:

    //customer={customer name goes here}
    //requirement= {requirement text goes here}

When I parse the documents that come back from the customers, they are likely to contain some errors. Field names may be mangled or misspelled. Customer names may be entered in unrecognizable variants (e.g. someone named "Michael" is entered as "Mike"), and so forth.

I was thinking that it might be useful to have a Google-like "do you mean this?" feature. If the field name is //customer=, then the parser might recognize a huge list of variants like //ustomer=, //customor=, etc. That is, it would recognize them well enough to continue parsing and give a decent error message in context.

Any ideas how to go about this? I don't think I would create a parser language that includes every variant; instead, the field names would be tokens that could be passed to another routine. The variants could be generated ahead of time. I would limit the number of variants to something manageable, like 10,000 for each field name.

Thanks,
Mike
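
To make the "tokens passed to another routine" idea concrete, here is a rough sketch in Python. Note that it swaps the pre-generated variant lists for similarity matching with the standard library's difflib module, which is only one possible approach. The field list, the line pattern, the 0.75 cutoff, and the names resolve_field and parse_lines are all assumptions made up for illustration. Nicknames such as "Mike" for "Michael" would probably need an explicit alias table instead, since they are not simple misspellings.

```python
import difflib
import re

# Hypothetical canonical field names; the real document format may differ.
KNOWN_FIELDS = ["customer", "requirement"]

# Assumed line shape: "//", a field token, "=", then the value.
FIELD_LINE = re.compile(r"^\s*//\s*(\w+)\s*=\s*(.*)$")


def resolve_field(token, known=KNOWN_FIELDS, cutoff=0.75):
    """Return (canonical_name, exact_match) or (None, False) if nothing is close."""
    lowered = token.lower()
    if lowered in known:
        return lowered, True
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio.
    matches = difflib.get_close_matches(lowered, known, n=1, cutoff=cutoff)
    if matches:
        return matches[0], False
    return None, False


def parse_lines(lines):
    """Yield (line_no, field, value, warning) for every field-like line."""
    for line_no, line in enumerate(lines, start=1):
        m = FIELD_LINE.match(line)
        if not m:
            continue  # ordinary text, not a field line
        token, value = m.group(1), m.group(2).strip()
        field, exact = resolve_field(token)
        if field is None:
            warning = f"line {line_no}: unknown field '{token}'"
        elif not exact:
            warning = (
                f"line {line_no}: unknown field '{token}', "
                f"did you mean '{field}'?"
            )
        else:
            warning = None
        yield line_no, field, value, warning


if __name__ == "__main__":
    sample = ["//customor= Mike", "some free text", "//requirement= Must do X"]
    for record in parse_lines(sample):
        print(record)
```

With this shape the parser keeps going after a misspelled field and still reports the problem with its line number. The cutoff would need tuning against real customer files: too low and unrelated tokens get "corrected", too high and genuine typos are rejected.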