
Hi, This might be a silly question for you guys, but is there a reason why yhc still uses PackedString in places instead of ByteString? Cheers, Creighton Hogg

Hi Creighton,
This might be a silly question for you guys, but is there a reason why yhc still uses PackedString in places instead of ByteString?
Two reasons: 1) ByteStrings are totally unreliable - they can't be made to work in GHC 6.4 and GHC 6.6 without features of Cabal that aren't even written yet. We did move to Data.Binary, which requires ByteStrings, but had to change back because it was a nightmare. If the benefits were compelling we could move, but it wasn't worth it just for binary serialisation. 2) When the code was written, there weren't ByteStrings, and no one has had the time to port to them. If someone ported Yhc to ByteStrings, we'd definitely accept a patch! Thanks Neil

On 6/14/07, Neil Mitchell
Hi Creighton,
This might be a silly question for you guys, but is there a reason why yhc still uses PackedString in places instead of ByteString?
Two reasons:
1) ByteStrings are totally unreliable - they can't be made to work in GHC 6.4 and GHC 6.6 without features of Cabal that aren't even written yet. We did move to Data.Binary, which requires ByteStrings, but had to change back because it was a nightmare. If the benefits were compelling we could move, but it wasn't worth it just for binary serialisation.
2) When the code was written, there weren't ByteStrings, and no one has had the time to port to them.
If someone ported Yhc to ByteStrings, we'd definitely accept a patch!
Well, if you don't mind I'd like to do it. I'm trying to understand your sourcecode anyway, so I might as well do something useful while I'm fussing with it. I guess I didn't really know about these reliability issues with ByteString though, do you have some anecdotes or references about it? Cheers

Hi
Well, if you don't mind I'd like to do it. I'm trying to understand your sourcecode anyway, so I might as well do something useful while I'm fussing with it. I guess I didn't really know about these reliability issues with ByteString though, do you have some anecdotes or references about it?
The problem isn't that the ByteString functions don't work - they do, and very reliably. The reliability thing is that on GHC 6.4 you have to include the fps library, in GHC 6.6 you have to include base (including fps breaks things), and in GHC 6.8 it may change again. The necessary features in Cabal that allow this are still being written. These are purely build system changes, so if you do the work of porting to ByteString, we'll figure out the build issues after. The one question you are bound to ask shortly is why we use ReversedPackedStrings, rather than PackedStrings in some cases - and the answer to that is there is no good reason. In the work if you fix all the cases where we have Reversed strings to normal strings that would be great! Thanks Neil

Neil Mitchell wrote:
The one question you are bound to ask shortly is why we use ReversedPackedStrings, rather than PackedStrings in some cases - and the answer to that is there is no good reason. In the work if you fix all the cases where we have Reversed strings to normal strings that would be great!
Though I can't recommend trying! Packed strings appear all over the compiler, sometimes reversed, sometimes not, intermixed with code that either assumes 'reversed-ness' or doesn't. It would be seriously hard work to change :-) Cheers Tom

On 6/19/07, Thomas Shackell
Neil Mitchell wrote:
The one question you are bound to ask shortly is why we use ReversedPackedStrings, rather than PackedStrings in some cases - and the answer to that is there is no good reason. In the work if you fix all the cases where we have Reversed strings to normal strings that would be great!
Though I can't recommend trying! Packed strings appear all over the compiler, sometimes reversed, sometimes not, intermixed with code that either assumes 'reversed-ness' or doesn't. It would be seriously hard work to change :-)
Well I think I've run into this problem...maybe. All but one of the tests pass, the one that doesn't is the timeLargeArray test, and it fails with this error yhc: superclassesI InfoUsed (Id 68) [(Type class,Ord,"Data.Ix ",510:8-510:10)] Now my main question is how some PackedStrings can be reversed and others not when it looks like everything that uses PackedString imports it from SysDeps? The only exception I could find was compiler98/DotNet/Show.hs.

Creighton Hogg wrote:
Well I think I've run into this problem...maybe. All but one of the tests pass, the one that doesn't is the timeLargeArray test, and it fails with this error yhc: superclassesI InfoUsed (Id 68) [(Type class,Ord,"Data.Ix",510:8-510:10)] Now my main question is how some PackedStrings can be reversed and others not when it looks like everything that uses PackedString imports it from SysDeps? The only exception I could find was compiler98/DotNet/Show.hs.
Hmmm that's interesting, did you change the compiler source code at all? You usually get those kind of errors because the compiler has ended up with a data structure that it wasn't expecting. Indeed all PackedStrings are imported from SysDeps, the problem is what happens after that. Sometimes they are used as string' = packString string and other times string' = packString (reverse string) There doesn't seem to be a lot of logic to the choices ... Cheers Tom

On 6/19/07, Thomas Shackell
Creighton Hogg wrote:
Well I think I've run into this problem...maybe. All but one of the tests pass, the one that doesn't is the timeLargeArray test, and it fails with this error yhc: superclassesI InfoUsed (Id 68) [(Type class,Ord,"Data.Ix",510:8-510:10)] Now my main question is how some PackedStrings can be reversed and others not when it looks like everything that uses PackedString imports it from SysDeps? The only exception I could find was compiler98/DotNet/Show.hs.
Hmmm that's interesting, did you change the compiler source code at all? You usually get those kind of errors because the compiler has ended up with a data structure that it wasn't expecting.
Indeed all PackedStrings are imported from SysDeps, the problem is what happens after that. Sometimes they are used as
string' = packString string
and other times
string' = packString (reverse string)
There doesn't seem to be a lot of logic to the choices ...
Yeah in line with Neal's suggestion, I had added a FakeString module that was imported by SysDeps as PackedString and just provided unpackPS = id and packString = id I had done something silly where that first one didn't build properly, but now that that's fixed I'm getting errors I don't really understand on 5 of the tests. I'll probably look at them tonight & ask questions about how chunks of the code work. I'm sorry that I had misunderstood the reversing thing. I had thought you meant that there were two different versions of PackedString, one of which required that trick and the other didn't, analogous to the four versions of ByteString. I guess I'm really confused then why you'd be reversing the strings in the first place. I'm appreciative of the help, Creighton

Creighton Hogg wrote:
Yeah in line with Neal's suggestion, I had added a FakeString module that was imported by SysDeps as PackedString and just provided unpackPS = id and packString = id I had done something silly where that first one didn't build properly, but now that that's fixed I'm getting errors I don't really understand on 5 of the tests. I'll probably look at them tonight & ask questions about how chunks of the code work.
hmm I would have thought that String and PackedString should behave identical, apart from the 'packedness'. Still there could be some subtle semantic difference ...
I guess I'm really confused then why you'd be reversing the strings in the first place.
Yeah. Yhc was developed from nhc98, which for some strange reason liked to reverse strings sometimes. We've never really discovered why and have avoided trying to fix it, so it's just sort of stayed that way. But be assured, it's just as confusing to us as it is to you ;-) Cheers Tom

On 6/19/07, Thomas Shackell
Creighton Hogg wrote:
Yeah in line with Neal's suggestion, I had added a FakeString module that was imported by SysDeps as PackedString and just provided unpackPS = id and packString = id I had done something silly where that first one didn't build properly, but now that that's fixed I'm getting errors I don't really understand on 5 of the tests. I'll probably look at them tonight & ask questions about how chunks of the code work.
hmm I would have thought that String and PackedString should behave identical, apart from the 'packedness'. Still there could be some subtle semantic difference ...
Yeah, a several of the five failures were caused by running out of memory. Now, I thought String vs. PackedString shouldn't be that much of a difference anymore? Are there strictness differences between the two that could be causing the String version to build up huge thunks? Maybe I'll be a wench and change it to ByteString.Char8 and see what happens.

Hi
Yeah in line with Neal's suggestion, I had added a FakeString module that was imported by SysDeps as PackedString and just provided unpackPS = id and packString = id I had done something silly where that first one didn't build properly, but now that that's fixed I'm getting errors I don't really understand on 5 of the tests. I'll probably look at them tonight & ask questions about how chunks of the code work.
hmm I would have thought that String and PackedString should behave identical, apart from the 'packedness'. Still there could be some subtle semantic difference ...
Yeah, a several of the five failures were caused by running out of memory. Now, I thought String vs. PackedString shouldn't be that much of a difference anymore? Are there strictness differences between the two that could be causing the String version to build up huge thunks? Maybe I'll be a wench and change it to ByteString.Char8 and see what happens.
Perhaps: packString x = sum (map ord x) `seq` x Will give you what you need. That function should have the same strictness as the original packString. Thanks Neil

On 6/19/07, Thomas Shackell
Neil Mitchell wrote:
Perhaps:
packString x = sum (map ord x) `seq` x
Will give you what you need. That function should have the same strictness as the original packString.
Or alternatively the slightly less evil
packString x = foldr (\ y ys -> y `seq` y : ys) [] x
I tried this latter one & I tried ByteString.Char8 and both of them failed the same ways that the basic String did. Apparently PackedString is a strange duck.

ndmitchell:
Hi Creighton,
This might be a silly question for you guys, but is there a reason why yhc still uses PackedString in places instead of ByteString?
Two reasons:
1) ByteStrings are totally unreliable - they can't be made to work in GHC 6.4 and GHC 6.6 without features of Cabal that aren't even written
I think Neil means here that you can't write a .cabal file, for code that uses Data.ByteString, that will work unchanged with ghc 6.4 and ghc 6.6, since the module is in a different package in each case. So, I'd probably rephrase that as "Due to cabal limitations ..." -- Don

Hi Creighton,
I think Neil means here that you can't write a .cabal file, for code that uses Data.ByteString, that will work unchanged with ghc 6.4 and ghc 6.6, since the module is in a different package in each case.
Neil means he's been bitten by ByteString and Cabal many many times... :-) After discussing with Malcolm, we think a cleaner (and potentially faster) way to go would be move to plain old String. Most of the strings in Yhc are short - something ByteString isn't particularly optimised for. String is also faster than PackedString, the only reason for PackedString was memory requirements, but that stopped being an issue years ago. So in essence, a port from PackedString to String is probably the best way to go! The way I'd go about this is: 1) Write a String implementation of the interface we use for PackedString 2) Start inlining the implementation manually, function by function That way you should have relatively little work to move to String, and should be able to then patch by patch move towards a proper use of String, without doing one big change. Thanks Neil

On 6/15/07, Neil Mitchell
Hi Creighton,
I think Neil means here that you can't write a .cabal file, for code that uses Data.ByteString, that will work unchanged with ghc 6.4 and ghc 6.6, since the module is in a different package in each case.
Neil means he's been bitten by ByteString and Cabal many many times... :-)
After discussing with Malcolm, we think a cleaner (and potentially faster) way to go would be move to plain old String. Most of the strings in Yhc are short - something ByteString isn't particularly optimised for. String is also faster than PackedString, the only reason for PackedString was memory requirements, but that stopped being an issue years ago.
So in essence, a port from PackedString to String is probably the best way to go!
The way I'd go about this is:
1) Write a String implementation of the interface we use for PackedString
2) Start inlining the implementation manually, function by function
That way you should have relatively little work to move to String, and should be able to then patch by patch move towards a proper use of String, without doing one big change.
Okay, that sounds good.
participants (4)
-
Creighton Hogg
-
dons@cse.unsw.edu.au
-
Neil Mitchell
-
Thomas Shackell