Hi Cafe,
I'm working on inspecting some data that I'm trying to represent as records in Haskell, and I'm seeing about twice the memory footprint I was expecting.  I've got roughly 1.4 million records in a CSV file (400M on disk) that I parse in using bytestring-csv.  bytestring-csv returns a [[ByteString]] (wrapped in `type`s), which I then convert into a list of records with the following structure:

> 3  Int
> 1  Text (length 3)
> 1  Text (length 11)
> 12 Float
> 1  UTCTime
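
For concreteness, the record is shaped roughly like this (the field names below are placeholders, not the real ones):

> import Data.Text (Text)
> import Data.Time.Clock (UTCTime)
>
> -- Sketch of the record type (made-up names); the real one has the
> -- same field counts and types.
> data Row = Row
>   { rI1, rI2, rI3 :: {-# UNPACK #-} !Int
>   , rTag          :: {-# UNPACK #-} !Text       -- 3 characters
>   , rName         :: {-# UNPACK #-} !Text       -- 11 characters
>   , rF1, rF2, rF3, rF4,  rF5,  rF6,
>     rF7, rF8, rF9, rF10, rF11, rF12 :: {-# UNPACK #-} !Float
>   , rTime         :: {-# UNPACK #-} !UTCTime
>   }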

All fields are marked strict and have {-# UNPACK #-} pragmas (I'm guessing that doesn't do anything for non-primitive types).  (Side note: is there a way to check whether things are actually being unpacked?)

My back-of-the-napkin memory estimate, based on the assumption that nothing is being unpacked (and on my very spotty understanding of Haskell data structures):

Platform: 64-bit Linux
count * Type (sizeof type, occasionally a guess)

3  * Int (8)
14 * Char (4)       -- Text is some kind of bytestring but I'm guessing it can't be worse than the same number of Chars?
12 * Float (4)
18 * sizeOf (ptr) (8)
UTCTime:            -- from what I can gather through :info in ghci
  4 * (ptr) (8)
  2 * Integer (16)  -- Shouldn't be overly large, times are within 2012

List (one pointer to the element and one to the next cons cell per entry):
1408113 * 8 * 2

=
451M + 21.5M  (1408113 records * 336 bytes per record, plus 1408113 * 16 bytes of cons cells)
So even if the original ByteString contents of the file are somehow being kept entirely in memory as well, that's still not more than 3G.
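
For what it's worth, here is that arithmetic spelled out, in case someone can spot where the model goes wrong:

> -- Same figures as above: payload bytes only, nothing unpacked,
> -- and no allowance for per-object headers.
> perRecord, listCells, total :: Int
> perRecord = 3*8 + 14*4 + 12*4 + 18*8 + 4*8 + 2*16   -- 336 bytes per record
> listCells = 1408113 * 8 * 2                         -- cons-cell pointers, ~21.5M
> total     = 1408113 * perRecord + listCells         -- 495,655,776 bytes, about 473M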

I've written a small driver test program that just parses the CSV, finds the minimum value for a couple of the Float fields, and exits.  According to the process monitor, memory usage reaches 6.9G before the program exits.  I've tried profiling with +RTS -hc, but it ran for more than 3 hours without finishing, whereas the unprofiled run normally finishes within 4 minutes.  Anyone have any ideas for me?  Things to try?
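
For reference, the driver is essentially this (stripped down; "data.csv", toRow, and the Row/rF* names are placeholders matching the sketch above, and the real per-field conversion from ByteStrings into the record is elided):

> import qualified Data.ByteString.Char8 as B
> import           Text.CSV.ByteString   (parseCSV)
>
> main :: IO ()
> main = do
>   raw <- B.readFile "data.csv"
>   case parseCSV raw of
>     Nothing   -> putStrLn "parse error"
>     Just rows -> do
>       let recs = map toRow rows            -- rows :: [[ByteString]]
>       print (minimum (map rF1 recs))       -- minimum of a couple of the Float fields
>       print (minimum (map rF5 recs))
>
> -- builds one record from one CSV row; field-by-field parsing omitted
> toRow :: [B.ByteString] -> Row
> toRow = error "elided"
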
Thanks,
Andrew