
Hello!
I've used Haskell and GHC to solve particular real life application. 4 tools were developed and their function is almost the same - they modify textual input according to patterns found in the text. Thus, it
Hmm, modification can be a problem for ByteStrings, since it entails copying. That could be worse for strict BytStrings than lazy, if in the lazy ByteString you can reuse many chunks.
I understand now, that is probably the point.
Two main possibilities: 1. your algorithm isn't suited for ByteStrings 2. you're doing it wrong
The above indicates 1., but without a more detailed description and/or code, it's impossible to tell.
Yes, it seems that the (1) is the point, because I split and re-build the bytestream many times during processing.
So my questions follow: - What kind of application is lazy bytestring suitable for?
Anything that involves examining large sequences of bytes (or ASCII [latin1/other single-byte encoding] text) basically sequentially (it's not good if you have to jump forwards and backwards a lot and far). Also some types of modification of such data.
- Would it be worth using strict bytestring even if input files may be large? (They would fit in memory, but may consume whole)
Probably not, see above. But see above.
- If bytestring is not suitable for text manipulation, is there something faster than lists?
text has already been mentioned, but again, there are types of manipulation it's not well-suited for and where a linked list may be superior.
- It would be nice to have native sort for lazy bytestring - would it be slower than pack $ Data.List.sort $ unpack ?
The natural sort for ByteStrings would be a counting sort, O(alphabet size + length), so for long ByteStrings, it should be significantly faster than pack . sort . unpack, but for short ones, it would be significantly slower.
- If bytestring is suitable for text manipulation could we have some hGetTextualContents which translates Windows EOL (CR+LF) to LF?
Doing such a transformation would be kind of against the purpose of ByteStrings, I think. Isn't the point of ByteStrings to get the raw bytes as efficiently as possible?
OK, thank you very much for explanation. Best regards, John -- http://www.fastmail.fm - Email service worth paying for. Try it for free