Re: [web-devel] xml-types IsString instance for Name causes crashes

I wrote:
I noticed when looking at the IsString instance for Name: it can introduce crashes into a program if someone accidentally puts a '{' at the beginning of a Name string.
Or accidentally omits the '}' in Clark notation.
The way xml-types is now, it cannot be used in an environment where code is not allowed to introduce any additional risk of crashes. That is quite common in commercial development.
John Millikin wrote:
If you are concerned that a developer may cause exceptions by writing incorrect code, it would be better to use the Name constructor directly.
The only way to ensure that the IsString will not be used would be to remove the instance entirely from xml-types. In GHC there is no way to prevent an instance from being imported when you import any part of a module.
But that would be a shame - I have been using xml-types extensively, and I don't think I have used the Name constructor directly even once. It's just so much neater to use the IsString instance.
If you don't like my suggestion of what fromString should do when Clark notation fails to parse, that's OK, do something else. As long as it doesn't crash.
Let me put it another way. The IsString class was introduced mainly for ByteString and Text; it's a way for other string implementations to hook into Haskell's built-in special syntax for string literals. I think your Name type does qualify though - it really is just a kind of string, with special support for XML namespaces. But people would be really, really surprised if their program crashed at runtime (possibly for a customer) just because of certain characters they included in a string literal. It is very important for any implementation of the fromString method to be total.
Thanks, Yitz

Well, first, there's no additional risk of crashes. The code won't
ever *crash*, it will merely throw an exception, which can be caught
by the standard exception handling computations. If your application
requires high availability, you should be catching exceptions anyway,
since they can arise from circumstances outside of the author's
control.
Second, GHC's IsString feature is not enabled by default. It requires
the OverloadedStrings language extension to be enabled, which requires
either an explicit pragma or a command-line flag. There is no risk in
merely defining the instance, because users must opt in to use it.
Third, the IsString instance is largely for convenience. It allows
something like this code:
---------------------------------------------------------------
name = case parseClark "{urn:foo}bar" of
    Just (ns, ln) -> Name ln ns Nothing
    Nothing -> error "string literal is incorrect"
---------------------------------------------------------------
to be shortened to simply:
---------------------------------------------------------------
name = "{urn:foo}bar"
---------------------------------------------------------------
String literals are inherently untyped. There is no way to verify at
compile-time that they are correct. IMO, it's better for the code to
fail obviously (by throwing an exception) than quietly (by accepting
the bad Name and doing something with it). Thus, I disagree that
IsString instances should be total -- IMO, they should always throw an
exception on invalid input, else it becomes difficult to notice typos
without extensive testing.
If you are truly concerned about your developers introducing errors
via incorrect string literals, I have three suggestions:
1) Use the Name constructor explicitly. That's what it's for. You can
guarantee that the contents of the various fields are exactly as you
want them.
2) If you don't care about the prefix, you can write a simple
constructor function:
---------------------------------------------------------------
name :: String -> String -> Name
name ns ln = Name (Data.Text.pack ln) (Just (Data.Text.pack ns)) Nothing
---------------------------------------------------------------
3) Since the exception is *always* thrown when the literal is invalid,
even simple tests of the new functionality will discover the problem.
You could even require the names themselves be tested, though that
might be overkill.
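
The behaviour described in this message can be sketched as follows. This is only an illustration under stated assumptions: `Name` is a simplified stand-in for the xml-types type (String fields instead of Text, so the sketch needs nothing beyond base), and `parseClark` and `validLocal` are hypothetical helpers, not the library's actual code.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.String (IsString (..))

-- Simplified stand-in for the xml-types Name type (assumption):
-- local name, namespace, prefix.
data Name = Name String (Maybe String) (Maybe String)
  deriving (Eq, Show)

-- Hypothetical check: a local name is non-empty and contains no braces.
validLocal :: String -> Bool
validLocal ln = not (null ln) && all (\c -> c /= '{' && c /= '}') ln

-- Hypothetical Clark-notation parser: "{namespace}local" or plain "local".
-- Returns (namespace, local name), or Nothing when the braces do not match.
parseClark :: String -> Maybe (Maybe String, String)
parseClark ('{' : rest) =
  case break (== '}') rest of
    (ns, '}' : ln) | validLocal ln -> Just (Just ns, ln)
    _ -> Nothing
parseClark s
  | validLocal s = Just (Nothing, s)
  | otherwise = Nothing

-- Partial instance: an invalid literal throws when the value is evaluated.
instance IsString Name where
  fromString s = case parseClark s of
    Just (ns, ln) -> Name ln ns Nothing
    Nothing -> error ("invalid Clark notation in Name literal: " ++ show s)

-- A well-formed literal.
good :: Name
good = "{urn:foo}bar"

-- Typechecks, but throws as soon as it is evaluated (missing '}').
bad :: Name
bad = "{urn:foo"
```

Note that `bad` compiles without complaint; the failure only appears at the point where the value is first forced, which is the crux of the disagreement in this thread.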
On Wed, Jun 1, 2011 at 00:31, Yitzchak Gale

On Wed, Jun 1, 2011 at 7:04 PM, John Millikin
Option 4*, if it's *really* necessary: introduce a quasiquoter that checks for correctness at compile time. It would also improve performance (marginally), since the parsing would be taken care of at compile time.
Michael
* Just because I said it doesn't mean I endorse it. I'm quite happy with IsString.
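
A rough sketch of what such a quasiquoter might look like, assuming Template Haskell is available. Everything here is hypothetical: `Name` is a simplified stand-in for the xml-types type (String fields), and `parseClark`, `clarkExp` and `clark` are illustrative names that do not come from any library.

```haskell
{-# LANGUAGE TemplateHaskell #-}

import Language.Haskell.TH hiding (Name)
import Language.Haskell.TH.Quote (QuasiQuoter (..))

-- Simplified stand-in for the xml-types Name type (assumption).
data Name = Name String (Maybe String) (Maybe String)
  deriving (Eq, Show)

-- Hypothetical Clark-notation parser that reports why a literal is rejected.
parseClark :: String -> Either String (Maybe String, String)
parseClark ('{' : rest) =
  case break (== '}') rest of
    (_, "") -> Left "missing closing '}'"
    (_, "}") -> Left "empty local name"
    (ns, _ : ln) -> Right (Just ns, ln)
parseClark s
  | '}' `elem` s = Left "unexpected '}'"
  | null s = Left "empty name"
  | otherwise = Right (Nothing, s)

-- Build the expression  Name "local" (Just "ns") Nothing  at compile time,
-- or abort compilation with a message when the literal does not parse.
clarkExp :: String -> Q Exp
clarkExp s = case parseClark s of
  Left err -> fail ("invalid Clark notation " ++ show s ++ ": " ++ err)
  Right (ns, ln) ->
    conE 'Name `appE` litE (stringL ln) `appE` nsExp ns `appE` conE 'Nothing
  where
    nsExp Nothing = conE 'Nothing
    nsExp (Just n) = conE 'Just `appE` litE (stringL n)

-- The quasiquoter itself; only expression context is supported.
clark :: QuasiQuoter
clark =
  QuasiQuoter
    { quoteExp = clarkExp
    , quotePat = unsupported "pattern"
    , quoteType = unsupported "type"
    , quoteDec = unsupported "declaration"
    }
  where
    unsupported ctx _ =
      fail ("the clark quasiquoter cannot be used in a " ++ ctx ++ " context")
```

Because of GHC's stage restriction, the quoter has to be used from a different module than the one defining it, with the QuasiQuotes extension enabled, for example `bar = [clark|{urn:foo}bar|]`. A literal such as `[clark|{urn:foo|]` would then be rejected by the compiler instead of failing at runtime.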

I wrote:
I noticed when looking at the IsString instance for Name: it can introduce crashes into a program if someone accidentally puts a '{' at the beginning of a Name string. Or accidentally omits the '}' in Clark notation. The way xml-types is now, it cannot be used in an environment where code is not allowed to introduce any additional risk of crashes. That is quite common in commercial development.
John Millikin wrote:
The code won't ever *crash*, it will merely throw an exception, which can be caught... Second, GHC's IsString feature is not enabled by default... Third, the IsString instance is largely for convenience...
Sorry, I guess I really wasn't making myself clear.
I never raised any doubts about it being *possible*, or even easy, to write safe code using the xml-types library as it is now.
In a large-scale software development environment, one way that risk is evaluated is by counting the number of ways that it is *possible* for a library to cause a crash. And yes, in this context raising an asynchronous exception that knocks your program all the way out to some last-chance exception handler in the outer IO layer counts as a crash.
Since the whole idea of xml-types is for it to be a unifying standard, I'd like to see it usable in that kind of environment, too.
In addition, I have already pointed out that semantically it doesn't make sense for a fromString implementation to return _|_. And it is easy to make a small change to the current implementation to avoid that.
So let me turn the question around. Is there a compelling reason why, in some use case, the fromString must return _|_, rather than returning some text that will allow the application to handle the situation itself?
Thanks, Yitz

I think GHC does allow catching pure exceptions.
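
For illustration, a minimal sketch of catching an exception raised by pure code, using `try` and `evaluate` from Control.Exception in base. `parseName` is a hypothetical partial function standing in for a fromString that rejects bad input; it is not the xml-types code.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- Hypothetical partial function: rejects a '}' that has no opening '{'.
parseName :: String -> String
parseName s
  | '}' `elem` s && take 1 s /= "{" = error ("invalid name: " ++ show s)
  | otherwise = s

-- Force the value inside IO so that an exception raised by the pure code
-- surfaces here, where it can be caught and returned as a Left.
safeParseName :: String -> IO (Either ErrorCall String)
safeParseName = try . evaluate . parseName
```

`evaluate` only forces the result to weak head normal form. That is enough here because the guard is decided before any part of the result is produced, but a value with lazy fields could still hide an exception deeper inside.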
On 07.06.2011 17:53, "Yitzchak Gale" wrote:
_______________________________________________ web-devel mailing list web-devel@haskell.org http://www.haskell.org/mailman/listinfo/web-devel

On Tue, Jun 7, 2011 at 08:51, Yitzchak Gale
Since the whole idea of xml-types is for it to be a unifying standard, I'd like to see it usable in that kind of environment, too.
It is usable in such an environment -- simply do not use the IsString instance.
In addition, I have already pointed out that semantically it doesn't make sense for a fromString implementation to return _|_. And it is easy to make a small change to the current implementation to avoid that.
So let me turn the question around. Is there a compelling reason why, in some use case, the fromString must return _|_, rather than returning some text that will allow the application to handle the situation itself?
The string "foo}bar" is invalid; it must *never* be converted to a Name. Doing so would cause silent, unpredictable failures which cannot be tested for.
IsString does not support reporting parse errors; therefore, any IsString instance for Name must be partial.
If you're very concerned about it, I could add a flag like "NoIsString" which disables that particular instance. You could enable it in your build scripts.
However, I will not implement a change which can cause arbitrary silent failure with no obvious cause.

Since the whole idea of xml-types is for it to be a unifying standard, I'd like to see it usable in that kind of environment, too.
It is usable in such an environment -- simply do not use the IsString instance.
Once you have the Name type rather than just Text - which is useful for people needing namespaces - the IsString instance is important to keep code from becoming really awkward.
Perhaps the real problem here is including Clark notation in the IsString instance. Clark notation is very nice, but it doesn't really belong in the IsString instance. Clark notation could be a function, or a quasi-quoter. Perhaps the client library should be allowed to decide.
But if it is included in the IsString instance, it absolutely cannot raise an asynchronous exception. That is a serious bug.
In addition, I have already pointed out that semantically it doesn't make sense for a fromString implementation to return _|_. And it is easy to make a small change to the current implementation to avoid that.
So let me turn the question around. Is there a compelling reason why, in some use case, the fromString must return _|_, rather than returning some text that will allow the application to handle the situation itself?
The string "foo}bar" is invalid; it must *never* be converted to a Name. Doing so would cause silent, unpredictable failures which cannot be tested for.
'}' `elem` name would do the trick nicely.
The Name type already produces invalid XML. A client library that wants to avoid invalid names must already check for them. If a typo in Clark notation were another way to create invalid XML that wouldn't change anything. But if that really bothers you, let's discuss the many other options for introducing Clark notation other than in the IsString instance.
IsString does not support reporting parse errors;
Right, and it shouldn't. IsString is just a way of giving a different string type, like ByteString or Text, to Haskell's string literal syntax. There is no parsing to do beyond what the compiler already does. Any IsString instance should just take the contents of the string literal and incorporate it directly into the string type.
There is no language I've ever heard of where you can cause a program to crash at *runtime* because of the particular characters you include in a string literal in your source code. Please don't make Haskell the first. That would be very ironic.
if you're very concerned about it, I could add a flag like "NoIsString" which disables that particular instance. You could enable it in your build scripts.
I wouldn't use that flag. I need and use the IsString instance. The instance must be defined in the module that defines the type.
Thanks, Yitz
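
One way to read the position argued in this message is sketched below: a total IsString instance that takes the literal verbatim, with Clark notation moved into a separate function that reports failure through Maybe. This is an illustration only. `Name` is a simplified stand-in for the xml-types type, and `clarkName` is a hypothetical function name.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.String (IsString (..))

-- Simplified stand-in for the xml-types Name type (assumption).
data Name = Name String (Maybe String) (Maybe String)
  deriving (Eq, Show)

-- Total instance: the literal becomes the local name as-is.
-- No parsing happens, so there is nothing that can fail.
instance IsString Name where
  fromString s = Name s Nothing Nothing

-- Clark notation as an explicit function (hypothetical name).
-- A malformed input yields Nothing instead of an exception.
clarkName :: String -> Maybe Name
clarkName ('{' : rest) =
  case break (== '}') rest of
    (ns, '}' : ln) | not (null ln) -> Just (Name ln (Just ns) Nothing)
    _ -> Nothing
clarkName _ = Nothing
```

The cost of this design is the one raised elsewhere in the thread: a literal such as `"foo}bar"` is accepted silently, so any validation has to happen in the client code.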

On Thu, Jun 9, 2011 at 01:27, Yitzchak Gale
Since the whole idea of xml-types is for it to be a unifying standard, I'd like to see it usable in that kind of environment, too.
It is usable in such an environment -- simply do not use the IsString instance.
Once you have the Name type rather than just Text - which is useful for people needing namespaces - the IsString instance is important to keep code from becoming really awkward.
I don't think the code is awkward without IsString. Compare:
-------------------------------------------------
-- with the IsString instance
bar :: Name
bar = "{urn:foo}bar"

-- with the Name constructor
bar :: Name
bar = Name "bar" (Just "urn:foo") Nothing

-- optional: utility function
name :: Text -> Text -> Name
name ns local = Name local (Just ns) Nothing

bar :: Name
bar = name "urn:foo" "bar"
-------------------------------------------------
Using the Name constructor directly is more verbose, yes, but not very much. If you know that you'll always have names of a certain form (such as "has namespace, no prefix") then you can define a simple, type-safe utility function. The IsString instance is a purely optional syntactic sugar.
Perhaps the real problem here is including Clark notation in the IsString instance. Clark notation is very nice, but it doesn't really belong in the IsString instance. Clark notation could be a function, or a quasi-quoter. Perhaps the client library should be allowed to decide.
Is there an alternative syntax you'd prefer? IMO, IsString is only useful if it allows including both a namespace and local name. To do so, it has to parse the input, which can fail.
But if it is included in the IsString instance, it absolutely cannot raise an asynchronous exception. That is a serious bug.
To me, the choice is between raising an exception or removing IsString. IsString without namespaces is pointless. IsString without input checking is dangerous. If fromString cannot fail on invalid input, then it shouldn't be defined.
The Name type already produces invalid XML. A client library that wants to avoid invalid names must already check for them. If a typo in Clark notation were another way to create invalid XML that wouldn't change anything.
You're right -- it is already possible for Names to be invalid. There should probably be stricter input checking on names, to ensure they match the XML spec. Something like this:
----------------------------------------------------------------------
-- Constructor isn't exported
data Name = Name Text (Maybe Text) (Maybe Text)

-- exported
name :: Text -> Maybe Text -> Maybe Text -> Maybe Name
name = -- validates input

-- exported, raises exception on failure
name_ :: Text -> Maybe Text -> Maybe Text -> Name
name_ local ns prefix = case name local ns prefix of
    Just n -> n
    Nothing -> error ("invalid name: error msg here")

instance IsString Name where
    fromString = name_ . Data.Text.pack
----------------------------------------------------------------------
Right, and it shouldn't. IsString is just a way of giving a different string type, like ByteString or Text, to Haskell's string literal syntax. There is no parsing to do beyond what the compiler already does. Any IsString instance should just take the contents of the string literal and incorporate it directly into the string type.
IMO, the ByteString and Text instances for IsString are broken.
ByteString should raise an exception if any (\c -> ord c > 255)
Text should raise an exception if any of the characters are invalid Unicode
I have had real failures in some of my programs as a result of these overly-liberal instances, which could have been caught *much* sooner if they'd simply raised an exception instead of silently returning a corrupt value.
There is no language I've ever heard of where you can cause a program to crash at *runtime* because of the particular characters you include in a string literal in your source code. Please don't make Haskell the first. That would be very ironic.
Again, this cannot cause crashes. Name's fromString throws exceptions, which can be caught. If your software needs to be reliable, it should *already* be catching and handling/logging unexpected exceptions, since they can be raised from literally any point in the code.
Incorrect string literals can cause runtime crashes or exceptions in many languages; consider [[ char c = "abc"[3]; ]]
if you're very concerned about it, I could add a flag like "NoIsString" which disables that particular instance. You could enable it in your build scripts.
I wouldn't use that flag. I need and use the IsString instance. The instance must be defined in the module that defines the type.
Nobody *needs* the IsString instance. It's a trivial utility for use in small scripts and other simple programs. It saves you from typing a dozen characters in a utility module somewhere, nothing more.
If you're concerned that someone might accidentally use it, and thus introduce errors into your program, I can offer the choice of disabling it. The normal way to construct names will still exist.
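
The validating-constructor idea from earlier in this message could be filled in roughly as follows. This is a sketch under loud assumptions: `Name` uses String fields rather than Text, and `validLocal` is a placeholder rule invented for the example. It is not the XML specification's Name production, which is considerably more involved.

```haskell
import Data.Char (isSpace)

-- In a real module the constructor would be left out of the export list,
-- so that callers can only build a Name through name or name_.
data Name = Name String (Maybe String) (Maybe String)
  deriving (Eq, Show)

-- Placeholder validity rule (assumption, NOT the XML spec): the local name
-- is non-empty and contains no whitespace and none of < > & { }.
validLocal :: String -> Bool
validLocal s = not (null s) && not (any bad s)
  where
    bad c = isSpace c || c `elem` "<>&{}"

-- Total constructor: failure is visible in the type.
name :: String -> Maybe String -> Maybe String -> Maybe Name
name local ns prefix
  | validLocal local = Just (Name local ns prefix)
  | otherwise = Nothing

-- Partial convenience wrapper: throws on invalid input.
name_ :: String -> Maybe String -> Maybe String -> Name
name_ local ns prefix = case name local ns prefix of
  Just n -> n
  Nothing -> error ("invalid name: " ++ show local)
```

With both functions exported, callers choose per call site between handling the Maybe and accepting an exception.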

2011/6/9 John Millikin
Note that the OverloadedStrings extension together with the 'IsString' instances gives you more than just a nice syntactic abbreviation: the defined string literals are also lifted out of their context and defined as CAFs. Therefore, you can share the conversion from String to <your-type> during the whole run of a program. We exploit that in blaze-html to do escaping and UTF-8 encoding only once. Hence, in some contexts the IsString instances are necessary.

On Fri, Jun 10, 2011 at 07:20, Simon Meier
Note that the OverloadedStrings extension together with the 'IsString' instances gives you more than just a nice syntactic abbreviation: the defined string literals are also lifted out of their context and defined as CAFs. Therefore, you can share the conversion from String to <your-type> during the whole run of a program. We exploit that in blaze-html to do escaping and UTF-8 encoding only once. Hence, in some contexts the IsString instances are necessary.
I'm afraid I don't understand this; could you explain the difference between these two definitions? Specifically, why would the OverloadedStrings version be more efficient?
-------------------------------------------------------
foo :: Name
foo = Name "foo" (Just "urn:bar") Nothing
-------------------------------------------------------
-------------------------------------------------------
{-# LANGUAGE OverloadedStrings #-}
foo :: Name
foo = "{urn:bar}foo"
-------------------------------------------------------

On Fri, Jun 10, 2011 at 9:40 PM, John Millikin
Warning: IANAS[1]. I think these two *would* be identical; the difference would arise if you don't declare foo at the top level like this, but instead use it inline twice. In such a case, the OverloadedStrings version would be CAFed, while the direct Name constructor would let the "foo" and "urn:bar" Text values be CAFed but not the Name value itself.
Michael
[1] I am not a Simon.

John Millikin wrote:
To me, the choice is between raising an exception or removing IsString.
That would be a shame, but removing it may be the only way out of this conundrum.
IsString without namespaces is pointless.
I am making good use of it in a project that doesn't involve namespaces at all. It would actually be a lot of work to back out at this point.
IsString without input checking is dangerous. If fromString cannot fail on invalid input, then it shouldn't be defined.
I appreciate your concerns, but Haskell has other means of providing such guarantees. Raising an asynchronous exception is just not an option in an IsString instance.
The Name type already produces invalid XML.
You're right -- it is already possible for Names to be invalid. There should probably be stricter input checking on names, to ensure they match the XML spec. Something like this...
Yes, as I mentioned earlier, newtype wrappers with hidden constructors are the way we would do that if we wanted to guarantee those kinds of things at the type level. You could then provide several constructor functions that either do or do not raise exceptions. See, for example, Data.Text.Encoding, Neil Mitchell's Safe library, Michael's xml-enumerator.
But you certainly could not use the version that raises an exception for an IsString instance.
In fact, I don't think an IsString instance makes sense at all for a validating type. So maybe just removing it really is the right thing to do after all.
Thanks, Yitz

I don't think there is any kind of consensus for removing it.
On 12.06.2011 13:21, "Yitzchak Gale" wrote:

On Sun, Jun 12, 2011 at 2:27 PM, Aristid Breitkreuz wrote:
I don't think there is any kind of consensus for removing it.
I agree. If I can suddenly join the fray, let's take a step back for a second and reanalyze the issue here. We have this incredibly useful IsString instance for Name, which in all honesty is a lie: the type of fromString is "String -> Name", when not all Strings can be properly converted into XML names. There are two *separate* reasons for this:
1) Not all characters can be used in a name. Simple example: a less-than sign (<) is not allowed. For full information, see [1] and [2].
2) xml-types implements Clark notation, which allows a very convenient way to define namespaces. (This is a feature I use a lot in my own code.) But missing a right brace invalidates the Clark notation.
Ideally, the compiler would catch invalid Names and error out. Unfortunately, due to the way OverloadedStrings works, this isn't possible currently. (Though I think such a solution would be ideal, and is something we should consider separately.) I think we have three possible responses to the dilemma:
a) Ignore invalid Names, and simply allow invalid XML to be generated.
b) Throw an asynchronous exception.
c) Realize that the IsString instance is not correct, and therefore remove it.
Currently, xml-types follows option (a) for (1) above, and (b) for (2) above. I personally don't think either option is obviously better or worse, but I do in general prefer consistency. And I think that writing the validation rules for an XML name is outside the scope of xml-types, so I prefer option (a)... but not by any great margin.
The one thing I'd hate to see happen is option (c). It's true that the instance of IsString is not really "correct," but the same argument could be made for ByteStrings as well, where characters over 255 get truncated (I believe). The fact is that invalid input here should be *incredibly* rare.
I suppose a fourth option would be to force the String into a valid name, either through some escape mechanism or removing characters. But again, I personally think it's outside the scope of xml-types.
Michael
PS: Quasi-quoting is actually a great fit here as well, but it's just not nearly as convenient as OverloadedStrings.
[1] http://www.w3.org/TR/xml/#NT-NameStartChar
[2] http://www.w3.org/TR/xml/#NT-NameChar
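To make options (a) and (b) concrete for the Clark notation case, here is a rough sketch of a total fromString. It uses a hypothetical QName type as a stand-in for the xml-types Name (the real type and its instance may differ), and the fallback of keeping the whole string as a local name is only one possible choice, assumed here for illustration.

```haskell
import Data.String (IsString (..))

-- | Hypothetical stand-in for the xml-types Name type.
data QName = QName
  { qLocal     :: String
  , qNamespace :: Maybe String
  } deriving (Eq, Show)

-- | Parse Clark notation, "{namespace}local". A string that does not start
-- with '{' is a plain local name; a '{' without a matching '}' is a failure.
parseClark :: String -> Maybe QName
parseClark ('{' : rest) =
  case break (== '}') rest of
    (ns, '}' : local) -> Just (QName local (Just ns))
    _                 -> Nothing
parseClark s = Just (QName s Nothing)

-- | Total instance in the spirit of option (a): when the Clark notation does
-- not parse, keep the whole string as the local name instead of throwing.
instance IsString QName where
  fromString s =
    case parseClark s of
      Just n  -> n
      Nothing -> QName s Nothing
```

Here a literal such as "{urn:x" becomes a (probably invalid) local name rather than an exception, which trades a runtime error for possibly invalid XML output. Code that needs to detect the mistake can call parseClark directly.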

I agree with Michael here. My 2 cents:
I think the OverloadedStrings extension is very useful, for readability
as well as for improved performance (as Simon Meier said). Hence, I am
certainly opposed to removing it.
As for the ByteString and Text instances, I think their benefits
outweigh the fact that they break on some inputs. This can be compared
to the `head` function, which is certainly useful as well -- even
though it is a partial function. The case in which a developer puts
invalid unicode in his source file seems very unlikely. The ByteString
case is more dangerous, but then again, not less dangerous than using
Data.ByteString.Char8.pack!
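As a small illustration of the ByteString point: Data.ByteString.Char8.pack keeps only the low 8 bits of each Char, so characters above 255 are silently changed rather than rejected. The helper name packedBytes below is made up for this sketch.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8

-- | The byte values that Char8.pack actually stores for a given String.
packedBytes :: String -> [Int]
packedBytes = map fromIntegral . B.unpack . B8.pack

-- ASCII survives unchanged:
--   packedBytes "abc"  == [97, 98, 99]
-- U+03BB (Greek small lambda, code point 955) is truncated to 0xBB:
--   packedBytes "\955" == [187]
```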
The OverloadedStrings extension seems to be a very good fit for types
that can be converted from "almost all" strings. Text is obvious, but
also, for example, the Html type in blaze-html, and the YamlScalar
type in data-object-yaml.
In cases where the conversion from strings is less straightforward, or
a more complicated "syntax" is involved, I would argue to switch the
relevant code to QuasiQuoting -- but only when OverloadedStrings
causes too many runtime errors/invalid markup for developers.
Cheers,
Jasper
On Sun, Jun 12, 2011 at 2:23 PM, Michael Snoyman wrote:
On Sun, Jun 12, 2011 at 2:27 PM, Aristid Breitkreuz wrote:
I don't think there is any kind of consensus for removing it.
I agree. If I can suddenly join the fray, let's take a step back for a second and reanalyze the issue here. We have this incredibly useful IsString instance for Name, which in all honesty is a lie: the type of fromString is "String -> Name", when not all Strings can be properly converted into XML names. There are two *separate* reasons for this:
1) Not all characters can be used in a name. Simple example: a less-than sign (<) is not allowed. For full information, see [1] and [2].
2) xml-types implements Clark notation, which allows a very convenient way to define namespaces. (This is a feature I use a lot in my own code.) But missing a right brace invalidates the Clark notation.
Ideally, the compiler would catch invalid Names and error out. Unfortunately, due to the way OverloadedStrings works, this isn't possible currently. (Though I think such a solution would be ideal, and is something we should consider separately.) I think we have three possible responses to the dilemma:
a) Ignore invalid Names, and simply allow invalid XML to be generated.
b) Throw an asynchronous exception.
c) Realize that the IsString instance is not correct, and therefore remove it.
Currently, xml-types follows option (a) for (1) above, and (b) for (2) above. I personally don't think either option is obviously better or worse, but I do in general prefer consistency. And I think that writing the validation rules for an XML name is outside the scope of xml-types, so I prefer option (a)... but not by any great margin.
The one thing I'd hate to see happen is option (c). It's true that the instance of IsString is not really "correct," but the same argument could be made for ByteStrings as well, where characters over 255 get truncated (I believe). The fact is that invalid input here should be *incredibly* rare.
I suppose a fourth option would be to force the String into a valid name, either through some escape mechanism or removing characters. But again, I personally think it's outside the scope of xml-types.
Michael
PS: Quasi-quoting is actually a great fit here as well, but it's just not nearly as convenient as OverloadedStrings.
[1] http://www.w3.org/TR/xml/#NT-NameStartChar
[2] http://www.w3.org/TR/xml/#NT-NameChar
participants (6)
- Aristid Breitkreuz
- Jasper Van der Jeugt
- John Millikin
- Michael Snoyman
- Simon Meier
- Yitzchak Gale