Announce: bytestring 0.9.1.0

Hey all,
I'm pleased to announce a new major release of bytestring, the efficient string library for Haskell, suitable for high-performance scenarios.
This release is primarily an incremental performance release, though with some significant improvements, along with long-term test coverage and quality control changes.
Highlights:
* a long-standing performance bug with Ord instances, involving very small strings and Data.Map, has been squashed.
* everything's a little faster -- shootout problems showed a 1-5% speedup just by switching to the new library.
Thanks go to the Hac4 Haskell Hackathon organisers in Gothenburg, Sweden, where the majority of the work on this release took place.
Key changes:
* Data.Map short key performance greatly improved:
  - 'words Map' running time:
      6.310s bytestring 0.9.0.1
      1.071s bytestring 0.9.1.0
* Uses cheaper unsafeDupablePerformIO for allocation.
  - tail recursive tight loops (fixes obscure stack overflow)
* Generally faster:
  - Shootout sum-file: 1.218s to 1.190s
  - Shootout fasta: 9.210s to 8.811s
* 4-5x faster small substring search (breakSubstring/findSubstring/isInfixOf).
* Extensive QuickCheck coverage reporting and improvements:
  - http://code.haskell.org/~dons/tests/bytestring/hpc_index.html
Get the code:
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bytestring
Note that if you upgrade to the new bytestring release, older packages built against previous releases will still require the old bytestring package. For best results, rebuild any bytestring-depending packages against the new library only.
Cheers,
Don
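For readers unfamiliar with the operations named in the announcement, here is a minimal sketch of the two workloads it mentions: a Data.Map keyed by short ByteStrings (roughly the shape of a 'words Map' style word count) and small substring search. The names wordFreq and splitAtPattern are illustrative only, and this is not the benchmark code behind the timings above.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Map as M

-- Count how often each whitespace-separated word occurs.
-- Every insert compares short ByteString keys via their Ord instance,
-- which is the code path the release notes say was sped up.
wordFreq :: B.ByteString -> M.Map B.ByteString Int
wordFreq = M.fromListWith (+) . map (\w -> (w, 1)) . BC.words

-- Split the input at the first occurrence of a pattern.
-- If the pattern is absent, the second component is empty.
splitAtPattern :: B.ByteString -> B.ByteString -> (B.ByteString, B.ByteString)
splitAtPattern pat = B.breakSubstring pat

main :: IO ()
main = do
  let text = BC.pack "the quick fox jumps over the lazy dog"
  print (M.toList (wordFreq text))
  print (BC.pack "lazy" `B.isInfixOf` text)
  print (splitAtPattern (BC.pack "jumps") text)
```

Whether a given program sees the reported speedups depends on key length and workload; the sketch only shows which API calls are involved.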

On 2008.04.20 15:09:33 -0700, Don Stewart
Hey all,
I'm pleased to announce a new major release of bytestring, the efficient string library for Haskell, suitable for high-performance scenarios.
This release is primarily an incremental performance release, though with some significant improvements, along with long-term test coverage and quality control changes.
Highlights:
* a long-standing performance bug with Ord instances, involving very small strings and Data.Map, has been squashed.
* everything's a little faster -- shootout problems showed a 1-5% speedup just by switching to the new library.
Thanks go to the Hac4 Haskell Hackathon organisers in Gothenburg, Sweden, where the majority of the work on this release took place.
Key changes:
* Data.Map short key performance greatly improved:
  - 'words Map' running time:
      6.310s bytestring 0.9.0.1
      1.071s bytestring 0.9.1.0
* Uses cheaper unsafeDupablePerformIO for allocation.
  - tail recursive tight loops (fixes obscure stack overflow)
* Generally faster:
  - Shootout sum-file: 1.218s to 1.190s
  - Shootout fasta: 9.210s to 8.811s
* 4-5x faster small substring search (breakSubstring/findSubstring/isInfixOf).
* Extensive QuickCheck coverage reporting and improvements:
  - http://code.haskell.org/~dons/tests/bytestring/hpc_index.html
Get the code:
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bytestring
Note that if you upgrade to the new bytestring release, older packages built against previous releases will still require the old bytestring package. For best results, rebuild any bytestring-depending packages against the new library only.
Cheers, Don
That's all good news; will this release of ByteString be used for GHC 6.8.3? I'm a little tired of linking everything against 0.9.0.1 just so I can use Yi (since GHC/the-GHC-API links against it). :)
-- gwern NSDM USP Edens SAS kibo quarter NSES Gamma MP5k threat

That's all good news; will this release of ByteString be used for GHC 6.8.3? I'm a little tired of linking everything against 0.9.0.1 just so I can use Yi (since GHC/the-GHC-API links against it). :)
Indeed; this is the biggest issue I have with bytestring right now, as it has interfered with my work with hs-plugins and the GHC API, especially since I think the new Cabal and GHC 6.8.3 should fix, or at least warn about, the library-version-mismatch issues (from what I've heard).
The only probable fix I can think of (other than compile-time hackery on both the C and Haskell sides to make symbol names differ across releases, which I'd think is fairly infeasible and would kill portability anyway, since Template Haskell is the only viable option on the Haskell side) is to factor the use of bytestring out of the GHC API, or to put the bytestring source code directly into the GHC source tree so it's not built as a package (so you wouldn't see it in, e.g., 'ghc-pkg describe ghc', although I would think the names would still show up in the symbol table regardless).
But whichever route is chosen, I'm not sure of the ramifications in general, and it seems a tough problem to solve properly without breaking things as we try to fix them. Don probably has better ideas.
-- "It was in the days of the rains that their prayers went up, not from the fingering of knotted prayer cords or the spinning of prayer wheels, but from the great pray-machine in the monastery of Ratri, goddess of the Night." Roger Zelazny

mad.one:
That's all good news; will this release of ByteString be used for GHC 6.8.3? I'm a little tired of linking everything against 0.9.0.1 just so I can use Yi (since GHC/the-GHC-API links against it). :)
Indeed; this is the biggest issue I have with bytestring right now, as it has interfered with my work with hs-plugins and the GHC API, especially since I think the new Cabal and GHC 6.8.3 should fix, or at least warn about, the library-version-mismatch issues (from what I've heard).
The only probable fix I can think of (other than compile-time hackery on both the C and Haskell sides to make symbol names differ across releases, which I'd think is fairly infeasible and would kill portability anyway, since Template Haskell is the only viable option on the Haskell side) is to factor the use of bytestring out of the GHC API, or to put the bytestring source code directly into the GHC source tree so it's not built as a package (so you wouldn't see it in, e.g., 'ghc-pkg describe ghc', although I would think the names would still show up in the symbol table regardless).
But whichever route is chosen, I'm not sure of the ramifications in general, and it seems a tough problem to solve properly without breaking things as we try to fix them. Don probably has better ideas.
The use of bytestring inside GHC is limited to a little bit in the GHCi modules -- and could easily be replaced, I suspect. Doing so would remove one dependency from GHC's core, as well as making it easier to upgrade bytestring versions. Ian, have you looked at this?
-- Don

On Sun, Apr 20, 2008 at 05:07:48PM -0700, Donald Bruce Stewart wrote:
The use of bytestring inside GHC is limited to a little bit in the GHCi modules -- and could easily be replaced, I suspect. Doing so would remove one dependency from GHC's core, as well as making it easier to upgrade bytestring versions. Ian, have you looked at this?
I'd certainly be happy for bytestring to be removed from the bootlibs. When I last looked this was pretty easy to do. The only reason it's still in bootlibs at the moment is that you and/or Duncan were talking about rewriting some of the IO library on top of it, but if that happens in the future then we can always put it back into bootlibs again.
Thanks
Ian

On Sun, 2008-04-20 at 19:05 -0500, Austin Seipp wrote:
That's all good news; will this release of ByteString be used for GHC 6.8.3? I'm a little tired of linking everything against 0.9.0.1 just so I can use Yi (since GHC/the-GHC-API links against it). :)
Indeed; this is the biggest issue I have with bytestring right now, as it has interfered with my work with hs-plugins and the GHC API, especially since I think the new Cabal and GHC 6.8.3 should fix, or at least warn about, the library-version-mismatch issues (from what I've heard).
Cabal-1.4 does warn at configure time if a build is going to use inconsistent versions of dependent libraries. Trying to come up with an installation plan that avoids the problem is a good deal harder and in general isn't possible without having to rebuild lots of unrelated packages.
The only probable fix I can think of (other than doing compile-time hackery on both the C and haskell side to make symbol names differ across releases,
All Haskell code does work this way: GHC puts the package name and version into the symbol name, so it is possible to link several versions of a package into one program. The particular problem for bytestring is the C code that it uses. So while Data.ByteString.foo gets mapped to a symbol name something like "bytestring_0_9_1_0_DatazByteStringzfoo", the embedded C code gets no such mangling, so fps_reverse from bytestring-0.9.0.1 clashes with fps_reverse from bytestring-0.9.0.4.
Actually the GNU ld linker doesn't mind the duplicate symbols and just picks one of them. The GHCi linker is a bit more paranoid and rejects the duplicates.
I guess we could try and adjust the names of the C symbols to include the package name and version.
Duncan
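To make the mangling point concrete, here is a rough sketch of the idea. It applies a small subset of GHC's Z-encoding rules ('.' becomes "zi", '-' becomes "zm", '_' becomes "zu", with 'z' and 'Z' doubled) to the package, module and function names, and leaves the C name untouched. The helpers haskellSymbol and cSymbol, and the underscore-joined layout they produce, are assumptions for illustration; the symbols GHC really emits carry extra suffixes and may be laid out differently.

```haskell
import Data.List (intercalate)

-- Encode one character using a subset of GHC's Z-encoding.
-- Characters outside this subset are passed through unchanged (a simplification).
zEncodeChar :: Char -> String
zEncodeChar c = case c of
  'z' -> "zz"
  'Z' -> "ZZ"
  '.' -> "zi"
  '-' -> "zm"
  '_' -> "zu"
  _   -> [c]

zEncode :: String -> String
zEncode = concatMap zEncodeChar

-- Hypothetical layout: encoded package id, module and name joined by '_'.
haskellSymbol :: String -> String -> String -> String
haskellSymbol pkg modu name = intercalate "_" (map zEncode [pkg, modu, name])

-- C symbols get no package prefix: the package id is simply ignored.
cSymbol :: String -> String -> String
cSymbol _pkg name = name

main :: IO ()
main = do
  let old = "bytestring-0.9.0.1"
      new = "bytestring-0.9.1.0"
  putStrLn (haskellSymbol old "Data.ByteString" "foo")
  putStrLn (haskellSymbol new "Data.ByteString" "foo")
  -- The two Haskell symbols differ, but the C symbols are identical.
  print (cSymbol old "fps_reverse" == cSymbol new "fps_reverse")
```

Including the package id in the C names, as suggested above, would amount to making cSymbol behave like haskellSymbol.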

On Mon, Apr 21, 2008 at 01:27:20AM +0100, Duncan Coutts wrote:
I guess we could try and adjust the names of the C symbols to include the package name and version.
Or just rewrite the C bits in Haskell? There isn't much of it, and there doesn't seem to be a good reason why C should do it more efficiently than a Haskell implementation.
Thanks
Ian
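As a rough idea of what such a rewrite could look like, here is a sketch of a byte reversal written directly in Haskell with a tail-recursive copy loop. The name reverseHs is hypothetical, this is not code from the bytestring package, and no claim is made about how it performs relative to the C fps_reverse.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.ByteString.Internal (unsafeCreate)
import Data.ByteString.Unsafe (unsafeIndex)
import Data.Word (Word8)
import Foreign.Ptr (Ptr)
import Foreign.Storable (pokeByteOff)

-- Reverse a ByteString without calling into C.
-- unsafeCreate allocates a buffer of n bytes and runs the fill action on it.
reverseHs :: B.ByteString -> B.ByteString
reverseHs bs = unsafeCreate n fill
  where
    n = B.length bs

    fill :: Ptr Word8 -> IO ()
    fill dst = go 0
      where
        -- Tail-recursive loop: byte i of the source goes to position n-1-i.
        go :: Int -> IO ()
        go i
          | i >= n    = return ()
          | otherwise = do
              pokeByteOff dst (n - 1 - i) (unsafeIndex bs i)
              go (i + 1)

main :: IO ()
main = do
  let s = BC.pack "bytestring"
  BC.putStrLn (reverseHs s)
  -- Cross-check against the library's own reverse.
  print (reverseHs s == B.reverse s)
```

Because nothing here is a foreign C symbol, a function like this would be mangled with the package name and version like any other Haskell code, which sidesteps the clash described earlier in the thread.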
participants (5)
- Austin Seipp
- Don Stewart
- Duncan Coutts
- Gwern Branwen
- Ian Lynagh