
An important question here is whether you want to notice when a Roman numeral is invalid, e.g., iix, or not. From a parsing point of view context-free grammars are not ideal. We have the patterns i{1,3} | iv | vi{1,3} | ix units x{1,3} | xl | lx{1,3} | xc tens c{1,3} | cd | dc{1,3} | cm hundreds m{1,3} | mↁ | ↁm{1,3} | mↂ thousands so we want to write a grammar rule like roman(N) --> group('m', 'ↁ', 'ↂ', M), group('c', 'd', 'm', C), group('x', 'l', 'c', X), group('i', 'v', 'x', I), {N is M*1000 + C*100 + X*10 + I}. group(U, F, T, N) --> ( [U,T] -> {N = 9} ; [U,F] -> {N = 4} ; [U,U,U] -> {N = 3} ; [U,U] -> {N = 2} ; [U] -> {N = 1} ; [F,U,U,U] -> {N = 8} ; [F,U,U] -> {N = 7} ; [F,U] -> {N = 6} ; [F] -> {N = 5} ). using DCG notation. This is not context-free. It is an attribute grammar with attributes passed down the parse tree (U, F, T) and passed up the parse tree (N). Parser combinators are the easy way to do this in Haskell; see 'parsec' or any number of parser combinator tutorials. For that matter, it's pretty trivial to do this particular one with plain code that pretty much mirrors the structure shown above. Just write something that takes (as many arguments as you want) then a list of characters coming in and returns a pair with a computed answer and the remaining characters. But wait! That _is_ a parser combinator! I guess parser combinators can't be that hard after all (:-). Good luck finding the five thousand character (ↁ) or the ten thousand character (ↂ) or any of the other Unicode "I'm not a letter, no, honest, I'm really not, I'm a Roman numeral" characters on your keyboard.