Re: [Haskell-cafe] List x ByteString x Lazy Bytestring

5 Dec 2011


      On Monday 05 December 2011, 14:14:56, John Sneer wrote:
...
I've used Haskell and GHC to solve particular real life application. 4
  tools were developed and their function is almost the same - they
  modify textual input according to patterns found in the text. Thus, it
Hmm, modification can be a problem for ByteStrings, since it entails 
copying. That could be worse for strict BytStrings than lazy, if in the 
lazy ByteString you can reuse many chunks.
...
is something like a compiler, the result is also a text and it is not
  parsed to tokens as patterns appear on a different level.
The tools differ in tasks and number of modifications performed,
  otherwise, in principal, they are very much similar.
I used lists (Prelude, Data.List) to develop the tools. After
  successfully completing the development, I've started to optimize the
  code to make the tools faster. After modification of some algorithms
  (which dropped the processing time notably), I started to change data
  structures. I swapped lists with lazy bytestrings. Nevertheless, what
  an unpleasant surprise, the processing speed dropped down,
  significantly / more then 30% time needed).
Two main possibilities:
1. your algorithm isn't suited for ByteStrings
2. you're doing it wrong

The above indicates 1., but without a more detailed description and/or 
code, it's impossible to tell.
...
So my questions follow:
- What kind of application is lazy bytestring suitable for?
Anything that involves examining large sequences of bytes (or ASCII 
[latin1/other single-byte encoding] text) basically sequentially (it's not 
good if you have to jump forwards and backwards a lot and far).
Also some types of modification of such data.
...
- Would it be worth using strict bytestring even if input files may be
large? (They would fit in memory, but may consume whole)
Probably not, see above. But see above.
...
- If bytestring is not suitable for text manipulation, is there
something faster than lists?
text has already been mentioned, but again, there are types of manipulation 
it's not well-suited for and where a linked list may be superior.
...
- It would be nice to have native sort for lazy bytestring - would it be
slower than  pack $ Data.List.sort $ unpack ?
The natural sort for ByteStrings would be a counting sort,
O(alphabet size + length), so for long ByteStrings, it should be 
significantly faster than pack . sort . unpack, but for short ones, it 
would be significantly slower.
...
- If bytestring is suitable for text manipulation could we have some
hGetTextualContents which translates Windows EOL (CR+LF) to LF?
Doing such a transformation would be kind of against the purpose of 
ByteStrings, I think.  Isn't the point of ByteStrings to get the raw bytes 
as efficiently as possible?

Re: [Haskell-cafe] List x ByteString x Lazy Bytestring

Daniel Fischer