
Devin Mullins wrote:
apfelmus wrote:
Yes. "Just" an injection problem is an understatement. And its the implementation of the abstract data type that determines how fast things are. Who said that it may not simply be a newtyped String ?
I think the attraction to the SafeString example is that it's simple and effective for the task at hand -- in other words, pragmatic. Suggesting that in order to avoid HTML injection people re-read the HTML spec and invent a complete ADT to represent all the little corner cases they probably won't ever use is exactly the kind of answer that would scare Joe Six-Pack Hockey Programmer away.
/me shrugs. Can't help it that HTML is crazily complex. But what I wanted to say is not that you should create an algebraic data type that exactly represents valid HTML but rather that the basic idea underlying SafeString, namely using an abstract (≠ algebraic) data type for HTML should be a no-brainer and the first thing to do. To demonstrate, here's a 27 line HTML module. It's really simple, even simple-minded and still bug-prone, yet it already eliminates escaping and nesting bugs. module HTML (Html, Attr(..), toString, tag, text) where newtype Html = Html { toString :: String } data Attr = String := String tag :: String -> [Attr] -> [Html] -> Html text :: String -> Html tag name attrs body = Html $ "<" ++ name ++ attrs' ++ ">" ++ body' ++ "" ++ name ++ ">" where attrs' = concatMap attr attrs attr (name := value) = " " ++ name ++ "=\"" ++ escapeHtml value ++ "\"" body' = concatMap toString body text = Html . escapeHtml escapeHtml s = concatMap esc s where esc '<' = "<" esc '>' = ">" esc '&' = "&" esc '"' = """ esc '\'' = "'" esc x = [x] Example usage: *HTML> toString $ tag "b" [] [tag "i" [] [text "<>"], text "test"] "<b><i><></i>test</b>" These are just the very basics, extend them ad libitum. For instance by using Data.DList (i.e. ShowS) instead of String for O(1) concatenation. Or reading the HTML spec and doing something about newlines in attributes. Or checking the spec for the format of tag and attribute names. Or making Html a forest of nodes instead of a tree. I don't understand why there are still escaping bugs in this world. Regards, apfelmus