Purely Functional Data Structures

Hi,

is the above-mentioned book still *the* authority on the subject? I bought the book, read about 10 pages, and then put it back on the shelf.

In my app I have to deal with 4 CSV files, each between 5 and 10 MB, plus some static data. I had put all that data into an SQLite3 database and used SQL on it, but as the requirements keep changing, the SQL becomes a bit messy. I guess we've all had that experience.

So I'm wondering whether this book holds clues on how to do my querying and handling of moderately large data in a more Haskellish way, so I can drop the SQL.

Your suggestions appreciated.

Günther

gue.schmidt:
[...] I'm wondering whether this book holds clues on how to do my querying and handling of moderately large data in a more Haskellish way, so I can drop the SQL.
Use the fine libraries on http://hackage.haskell.org: e.g. parse the files with bytestring-csv, then load the result into a finite map. These days it is rare to have to roll your own data structures...

-- don
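As a rough illustration of that route (this assumes bytestring-csv's documented parseCSV :: ByteString -> Maybe CSV, with a CSV being a list of rows of ByteString fields; keying the map on the first column is an arbitrary choice for the example):

    import qualified Data.ByteString.Char8 as B
    import qualified Data.Map as M
    import Text.CSV.ByteString (parseCSV)  -- bytestring-csv package

    -- Load a CSV file into a Map keyed on its first column.
    -- Later rows repeating a key overwrite earlier ones.
    loadTable :: FilePath -> IO (M.Map B.ByteString [B.ByteString])
    loadTable path = do
        contents <- B.readFile path
        case parseCSV contents of
            Nothing   -> error ("parse error in " ++ path)
            Just rows -> return (M.fromList [ (key, rest) | (key : rest) <- rows ])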

Hi Don,

damn, that was quick! And thanks, I'll look into that.

Reading the data in wasn't much of a problem; I had been able to use MS-ODBC for that (there's an ODBC driver for such files). The problem is more the type of data structure I'd be reading it into. In SQL I would have the data indexed by several different columns; with maps I'd only have one key, so lookups by a value that is not the key would become quite expensive.

Any suggestions? What do you do in these cases?

Günther

At Sun, 08 Mar 2009 00:13:14 +0100, Günther Schmidt wrote:
In SQL I would have the data indexed by several different columns; with maps I'd only have one key, so lookups by a value that is not the key would become quite expensive.
happstack-ixset offers a data type similar to Map, except that you can have multiple keys. You can even have keys that are calculated from the data but don't actually appear in the data itself. For example, if your ixset just contains Strings, one of the keys could be the length of the String.

happstack-ixset (and its dependencies) also offers compact serialization/deserialization of the ixset to disk, data migration options, and a smattering of other features that may or may not be useful to you.

While happstack-ixset is built to work with happstack, it does not depend on the happstack HTTP server or persistent store layer, so it should be useful even if you are not building an application server.

- jeremy
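For concreteness, a rough sketch against the Data.IxSet interface (the Entry record and the Name/NameLen key types are invented for this example, and details may differ between happstack-ixset versions):

    {-# LANGUAGE DeriveDataTypeable #-}
    import Data.IxSet (IxSet, Indexable (..), ixSet, ixFun, insert, toList, (@=))
    import Data.Typeable (Typeable)

    data Entry = Entry { name :: String, amount :: Int }
        deriving (Eq, Ord, Show, Typeable)

    newtype Name    = Name String deriving (Eq, Ord, Typeable)
    newtype NameLen = NameLen Int deriving (Eq, Ord, Typeable)

    -- Two indexes: one on a stored field, one on a computed key
    -- (the length of the name), which never appears in the record.
    instance Indexable Entry where
        empty = ixSet [ ixFun (\e -> [Name (name e)])
                      , ixFun (\e -> [NameLen (length (name e))]) ]

    entries :: IxSet Entry
    entries = foldr insert empty [Entry "milk" 2, Entry "bread" 1]

    -- All entries whose name is five characters long ("bread" here):
    fiveLetter :: [Entry]
    fiveLetter = toList (entries @= NameLen 5)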

Hi Jeremy,

I had used HAppS-IxSet before and was very happy with it; it offered pretty much everything I needed. I switched (back) to SQL once I hit a bump in the road that I wasn't able to fix: a stack overflow that occurred when I ran the code against the largest sample data I had. It occurred because I could not make the inserts into the set strict.

However, since then my Haskell skills have improved somewhat, and right now I'm giving it another go. At the time it seemed that not many people were using IxSet, so no one was really able to help me with it. I'm glad that even though the original project is discontinued, someone else took up the torch.

Günther

At Sun, 08 Mar 2009 02:28:43 +0100, Günther Schmidt wrote:
[...] a stack overflow that occurred when I ran the code against the largest sample data I had. It occurred because I could not make the inserts into the set strict.
If you still run into the issue, and it can best be solved by adding a strict version of insert (or something similar), definitely submit a patch. Now that the project is alive again, the patch should be accepted and applied in a matter of hours.

-- jeremy
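Until such a function exists, one workaround sketch is to bulk-load with a strict left fold, forcing each element and the accumulated set as you go. This is untested, and seq only evaluates to weak head normal form, so depending on IxSet's internals a deeper force may still be needed:

    import Data.List (foldl')
    import Data.Typeable (Typeable)
    import qualified Data.IxSet as Ix

    -- Force each record and the accumulated set before the next
    -- insert, so thunks don't pile up across a large bulk load.
    insertAll :: (Ix.Indexable a, Ord a, Typeable a)
              => [a] -> Ix.IxSet a -> Ix.IxSet a
    insertAll xs s0 = foldl' step s0 xs
      where
        step s x = x `seq` Ix.insert x s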

So... is there some reason this is in the HAppS package?
On Sat, Mar 7, 2009 at 9:04 PM, Jeremy Shaw wrote:
[...] While happstack-ixset is built to work with happstack, it does not depend on the happstack HTTP server or persistent store layer, so it should be useful even if you are not building an application server.

On 8 Mar 2009, at 12:13 pm, Günther Schmidt wrote:
[...] In SQL I would have the data indexed by several different columns; with maps I'd only have one key, so lookups by a value that is not the key would become quite expensive.
Any suggestions? What do you do in these cases?
Who said you could only have one map? You can have as many as you want:

    primary   :: Map Key1 (Maybe Whole_Record)
    secondary :: Map Key2 [Whole_Record]
    ...

Adding the same record to both maps won't copy the record.
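Filled out as a small sketch (the Rec type and its fields are invented for illustration):

    import qualified Data.Map as M

    data Rec = Rec { recId :: Int, city :: String, total :: Double }

    -- Two indexes over the same records; the Rec values are shared
    -- between the maps, only the keys and map spines are extra.
    byId :: [Rec] -> M.Map Int Rec
    byId rs = M.fromList [ (recId r, r) | r <- rs ]

    byCity :: [Rec] -> M.Map String [Rec]
    byCity rs = M.fromListWith (++) [ (city r, [r]) | r <- rs ]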

On Sat, 7 Mar 2009, Günther Schmidt wrote:
[...] In my app I have to deal with 4 CSV files, each between 5 and 10 MB, plus some static data. [...] So I'm wondering whether this book holds clues on how to do my querying and handling of moderately large data in a more Haskellish way, so I can drop the SQL.
If this CSV processing is pipe-style, you may want to try a lazy CSV parser and formatter: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/spreadsheet-0.1
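To illustrate the pipe style without committing to that package's exact API, here is a naive hand-rolled stand-in (no quoting or escaping support; the file name and column index are made up) that streams rows in constant space thanks to lazy I/O:

    -- Naive lazy field splitter; a real CSV parser must also handle
    -- quoted fields and embedded separators.
    splitOn :: Char -> String -> [String]
    splitOn c s = case break (== c) s of
        (field, [])       -> [field]
        (field, _ : rest) -> field : splitOn c rest

    main :: IO ()
    main = do
        contents <- readFile "data.csv"   -- lazy I/O: read on demand
        let rows = map (splitOn ',') (lines contents)
        -- Pipe style: each row is consumed and discarded in turn,
        -- e.g. summing a (headerless) numeric second column.
        print (sum [ read (r !! 1) :: Double | r <- rows, length r > 1 ])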
participants (6)
- Andrew Wagner
- Don Stewart
- Günther Schmidt
- Henning Thielemann
- Jeremy Shaw
- Richard O'Keefe