
Alright, I've been hacking away at what I posted the other day, and I have something that works for files that will fit entirely into memory. And then I figured out why I've been restricted to files that fit entirely into memory... One of my functions is causing the entire thing to be read in, when, in the way I analyze the program, only a very small portion of the file should be read in. Here are the functions I've used to test this problem... import Data.Bits import qualified Data.ByteString.Lazy as BS import Foreign.C.Types ... {-# INLINE decodeLengthBits #-} decodeLengthBits :: BS.ByteString -> CInt decodeLengthBits doc = (shift (pieces !! 3) 24) .|. (shift (pieces !! 2) 16) .|. (shift (pieces !! 1) 8) .|. (pieces !! 0) where pieces::[CInt] = map fromIntegral . BS.unpack . BS.take 4 $ doc breakIntoDocuments :: RawDocument -> [RawDocument] breakIntoDocuments f | BS.length f > 0 = if len > 0 then (BS.take bytes f) : (breakIntoDocuments (BS.drop bytes f)) else (breakIntoDocuments (BS.drop bytes f)) | otherwise = [] where len = decodeLengthBits f bytes = fromIntegral (len * 2 + len * 4 + 4) and a main function of: main = do f <- B.readFile "Documents.bin" print (take 1 (breakIntoDocuments f)) Shouldn't the program only read in enough of the lazy byte-string to create the first return value of breakIntoDocuments? The return value of decodeLengthBits is only 277. I watched it, and it's reading in my whole 2gb file... -- Jeff

The (BS.length f) can only be computed by reading until the end of the file!
breakIntoDocuments :: RawDocument -> [RawDocument] breakIntoDocuments f | BS.length f > 0 = if len > 0 then (BS.take bytes f) : (breakIntoDocuments (BS.drop bytes f)) else (breakIntoDocuments (BS.drop bytes f)) | otherwise = [] where len = decodeLengthBits f bytes = fromIntegral (len * 2 + len * 4 + 4)
and a main function of:
main = do f <- B.readFile "Documents.bin" print (take 1 (breakIntoDocuments f))
Shouldn't the program only read in enough of the lazy byte-string to create the first return value of breakIntoDocuments? The return value of decodeLengthBits is only 277. I watched it, and it's reading in my whole 2gb file...
-- Jeff
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Chaddaï Fouché wrote:
You should be using BS.null f rather than BS.length f > 0.
While we're on the subject... anybody know a neat way to check, say, whether a list contains exactly 1 element? (Obviously pattern matching can do it, but that requires big case-expressions...)

On Fri, Jun 22, 2007 at 07:14:39PM +0100, Andrew Coppin wrote:
Chaddaï Fouché wrote:
You should be using BS.null f rather than BS.length f > 0.
While we're on the subject... anybody know a neat way to check, say, whether a list contains exactly 1 element? (Obviously pattern matching can do it, but that requires big case-expressions...)
data LazyNat = Zero | Succ LazyNat deriving(Eq,Ord) instance Enum LazyNat where succ = Succ pred (Succ x) = x toEnum 0 = Zero toEnum (x+1) = succ (toEnum x) fromEnum Zero = 0 fromEnum (Succ x) = fromEnum x + 1 instance Num LazyNat where -- this is a lie, the lifted naturals only -- form a *semi*ring. Sigh. fromIntegral = toEnum Zero + y = y Succ x + y = Succ (x + y) Zero * y = 0 Succ x * y = y + x * y abs = id signum 0 = 0 signum _ = 1 x - Zero = x Succ x - Succ y = x - y length' [] = Zero length' (x:xs) = Succ (length xs) null x = length' x == 0 one x = length' x == 1 atLeastFive x = length' x >= 5 Stefan

On Fri, 22 Jun 2007, Stefan O'Rear wrote:
length' [] = Zero length' (x:xs) = Succ (length xs)
also known as Data.List.genericLength. :-) See also http://darcs.haskell.org/htam/src/Number/PeanoNumber.hs

Hi
I have most of a Data.Nat library done, I should finish it off and release it...
Thanks
Neil
On 6/22/07, Stefan O'Rear
On Fri, Jun 22, 2007 at 07:14:39PM +0100, Andrew Coppin wrote:
Chaddaï Fouché wrote:
You should be using BS.null f rather than BS.length f > 0.
While we're on the subject... anybody know a neat way to check, say, whether a list contains exactly 1 element? (Obviously pattern matching can do it, but that requires big case-expressions...)
data LazyNat = Zero | Succ LazyNat deriving(Eq,Ord)
instance Enum LazyNat where succ = Succ pred (Succ x) = x
toEnum 0 = Zero toEnum (x+1) = succ (toEnum x)
fromEnum Zero = 0 fromEnum (Succ x) = fromEnum x + 1
instance Num LazyNat where -- this is a lie, the lifted naturals only -- form a *semi*ring. Sigh. fromIntegral = toEnum
Zero + y = y Succ x + y = Succ (x + y)
Zero * y = 0 Succ x * y = y + x * y
abs = id signum 0 = 0 signum _ = 1
x - Zero = x Succ x - Succ y = x - y
length' [] = Zero length' (x:xs) = Succ (length xs)
null x = length' x == 0
one x = length' x == 1
atLeastFive x = length' x >= 5
Stefan _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

2007/6/22, Andrew Coppin
Chaddaï Fouché wrote:
You should be using BS.null f rather than BS.length f > 0.
While we're on the subject... anybody know a neat way to check, say, whether a list contains exactly 1 element? (Obviously pattern matching can do it, but that requires big case-expressions...)
Big case-expression ? isLength1 [x] = "Ok" isLength _ = "Nok" How is [x] big in any way ? If you need to test for more than one element you can just put put a guard with length -- Jedaï

On 6/23/07, Chaddaï Fouché
isLength1 [x] = "Ok" isLength _ = "Nok"
excellent.
How is [x] big in any way ? If you need to test for more than one element you can just put put a guard with length
Invoking length is more strict than is necessary, though this may not be a problem. If you want a lazier solution, you could try: hasLength 0 [] = True hasLength 0 (_:_) = False hasLength n [] = False hasLength n (_:rest) = hasLength (n - 1) rest This only evaluates at most enough of the list skeleton to verify whether or not it has the right length, where invoking length would evaluate the whole list skeleton. cheers, T. -- Dr Thomas Conway drtomc@gmail.com Silence is the perfectest herald of joy: I were but little happy, if I could say how much.

jeff:
Alright, I've been hacking away at what I posted the other day, and I have something that works for files that will fit entirely into memory. And then I figured out why I've been restricted to files that fit entirely into memory... One of my functions is causing the entire thing to be read in, when, in the way I analyze the program, only a very small portion of the file should be read in. Here are the functions I've used to test this problem...
import Data.Bits import qualified Data.ByteString.Lazy as BS import Foreign.C.Types ...
{-# INLINE decodeLengthBits #-} decodeLengthBits :: BS.ByteString -> CInt decodeLengthBits doc = (shift (pieces !! 3) 24) .|. (shift (pieces !! 2) 16) .|. (shift (pieces !! 1) 8) .|. (pieces !! 0) where pieces::[CInt] = map fromIntegral . BS.unpack . BS.take 4 $ doc
breakIntoDocuments :: RawDocument -> [RawDocument] breakIntoDocuments f | BS.length f > 0 = if len > 0
Argh! -- | /O(n\/c)/ 'length' returns the length of a ByteString as an 'Int64' length :: ByteString -> Int64 length (LPS ss) = L.foldl' (\n ps -> n + fromIntegral (P.length ps)) 0 ss that'll force the whole file (it is the sum of the length of each chunk). Try comparing against the null bytestring, -- | /O(1)/ Test whether a ByteString is empty. null :: ByteString -> Bool null (LPS []) = True null (_) = False :-)
then (BS.take bytes f) : (breakIntoDocuments (BS.drop bytes f)) else (breakIntoDocuments (BS.drop bytes f)) | otherwise = [] where len = decodeLengthBits f bytes = fromIntegral (len * 2 + len * 4 + 4)
and a main function of:
main = do f <- B.readFile "Documents.bin" print (take 1 (breakIntoDocuments f))
Shouldn't the program only read in enough of the lazy byte-string to create the first return value of breakIntoDocuments? The return value of decodeLengthBits is only 277. I watched it, and it's reading in my whole 2gb file...
-- Jeff
Got to be more lazy :-) -- Don
participants (9)
-
Andrew Coppin
-
Chaddaï Fouché
-
Chris Kuklewicz
-
dons@cse.unsw.edu.au
-
Henning Thielemann
-
Jefferson Heard
-
Neil Mitchell
-
Stefan O'Rear
-
Thomas Conway