Sparse records/ADTs

Is there a convenient way of handling a data structure with lots of fields of different types that may or may not be filled in? Something equivalent to data D = D {a::Maybe A, b::Maybe B, c::Maybe C, …} but with better space efficiency and a more convenient empty object. An easy alternative is data E = Ea A | Eb B | Ec C | … type R = [E] which has a straightforward empty object, but one then must define getA e = listToMaybe [a | Ea a <- e] for each field, which is tedious (and O(n)). Obviously Templates would help, but is there an alternative I’ve missed? -- Jón Fairbairn Jon.Fairbairn@cl.cam.ac.uk

* Jon Fairbairn
Is there a convenient way of handling a data structure with lots of fields of different types that may or may not be filled in?
Something equivalent to
data D = D {a::Maybe A, b::Maybe B, c::Maybe C, …}
but with better space efficiency and a more convenient empty object.
An easy alternative is
data E = Ea A | Eb B | Ec C | … type R = [E]
which has a straightforward empty object, but one then must define
getA e = listToMaybe [a | Ea a <- e]
for each field, which is tedious (and O(n)). Obviously Templates would help, but is there an alternative I’ve missed?
For runtime efficiency it's better to use Data.Map. For keys of this map you have two alternatives: either define data Key = Ka | Kb | Kc | ... or, to prevent this duplication at the cost of less convenient notation, you can do something like -- Identity combinator; or use one from Control.Monad.Identity newtype I a = I a -- Constant combinator newtype K a b = K a deriving (Eq, Ord) data E c = Ea (c A) | Eb (c B) deriving instance Eq (E (K ())) deriving instance Ord (E (K ())) type R = Map.Map (E (K ())) (E I) When you set a value, you infer the key from the value. (This inference needs to be written manually or generated.) When you get a value by the key, you check that the returned constructor is what you expect and throw an error otherwise (but the latter should never happen if you maintain the invariant). Roman

* Roman Cheplyaka
* Jon Fairbairn
[2012-10-24 11:08:29+0100] Is there a convenient way of handling a data structure with lots of fields of different types that may or may not be filled in?
Something equivalent to
data D = D {a::Maybe A, b::Maybe B, c::Maybe C, …}
but with better space efficiency and a more convenient empty object.
An easy alternative is
data E = Ea A | Eb B | Ec C | … type R = [E]
which has a straightforward empty object, but one then must define
getA e = listToMaybe [a | Ea a <- e]
for each field, which is tedious (and O(n)). Obviously Templates would help, but is there an alternative I’ve missed?
For runtime efficiency it's better to use Data.Map.
Actually, you can use Data.IntMap for even better performance, if you define an Enum instance for your keys. Roman

Hi Jon On Wed, Oct 24, 2012 at 11:08:29AM +0100, Jon Fairbairn wrote:
for each field, which is tedious (and O(n)). Obviously Templates would help, but is there an alternative I’ve missed?
perhaps something like: data Type = Ta | Tb | Tc ... data E = Ea A | Eb B | Ec C | ... type D = HashMap Type E get :: Type -> D -> Maybe E get = lookup Greetings, Daniel

On 24/10/12 12:08, Jon Fairbairn wrote:
Is there a convenient way of handling a data structure with lots of fields of different types that may or may not be filled in?
Not sure about convenience, but here is a type safe solution with O(log n) lookups and updates. The idea is to define a GADT tree type with a fixed layout: -- define the structure type MyT = TBranch (TLeaf A) (TBranch (TLeaf B) (TLeaf C)) -- a value level tree that uses that structure type My = GTree MyT You still have to define the paths to the members pa = GL GH pb = GR (GL GH) pc = GR (GR GH) But once you have that you can perform lookups and updates: *Main> glookup pc (gupdate pa (Just A) (gupdate pc (Just C) gempty)) Just C It shouldn't be too hard to make a template haskell function that generates these paths. Or perhaps the corresponding lenses. Twan

Twan van Laarhoven
On 24/10/12 12:08, Jon Fairbairn wrote:
Is there a convenient way of handling a data structure with lots of fields of different types that may or may not be filled in?
Not sure about convenience, but here is a type safe solution with O(log n) lookups and updates. The idea is to define a GADT tree type with a fixed layout:
Thanks for your reply (and for all the others). Since type safe is something that (for me) goes without saying, this is the best solution, but it doesn’t really satisfy the convenience aspect. (I had already looked at solutions using Map and contemplated a tree structure, but didn’t like anything I had come up with). In short, it looks like the answer to my question is “No.” :-/ -- Jón Fairbairn Jon.Fairbairn@cl.cam.ac.uk

Would this be relevant?
https://github.com/jonsterling/Data.Records
On Fri, Oct 26, 2012 at 11:17 AM, Jon Fairbairn
Twan van Laarhoven
writes: On 24/10/12 12:08, Jon Fairbairn wrote:
Is there a convenient way of handling a data structure with lots of fields of different types that may or may not be filled in?
Not sure about convenience, but here is a type safe solution with O(log n) lookups and updates. The idea is to define a GADT tree type with a fixed layout:
Thanks for your reply (and for all the others). Since type safe is something that (for me) goes without saying, this is the best solution, but it doesn’t really satisfy the convenience aspect. (I had already looked at solutions using Map and contemplated a tree structure, but didn’t like anything I had come up with). In short, it looks like the answer to my question is “No.” :-/
-- Jón Fairbairn Jon.Fairbairn@cl.cam.ac.uk
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Yuri de Wit
Would this be relevant?
That looks promising, thanks. It does use rather a lot of extensions, so it’ll take me a while to understand it. -- Jón Fairbairn Jon.Fairbairn@cl.cam.ac.uk

Maybe the vault package works for you?
http://hackage.haskell.org/package/vault
Sjoerd Visscher
On Oct 26, 2012, at 5:17 PM, Jon Fairbairn
Twan van Laarhoven
writes: On 24/10/12 12:08, Jon Fairbairn wrote:
Is there a convenient way of handling a data structure with lots of fields of different types that may or may not be filled in?
Not sure about convenience, but here is a type safe solution with O(log n) lookups and updates. The idea is to define a GADT tree type with a fixed layout:
Thanks for your reply (and for all the others). Since type safe is something that (for me) goes without saying, this is the best solution, but it doesn’t really satisfy the convenience aspect. (I had already looked at solutions using Map and contemplated a tree structure, but didn’t like anything I had come up with). In short, it looks like the answer to my question is “No.” :-/
-- Jón Fairbairn Jon.Fairbairn@cl.cam.ac.uk
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Jon Fairbairn
Is there a convenient way of handling a data structure with lots of fields of different types that may or may not be filled in?
Hi Jon, if your question had appeared in a database forum, the answer would be ... Sounds like you need to do normal-form analysis: - what are the functional dependencies between the fields? - is your data structure something like a 'universal relation'? - is the structure better expressed as several records? (vertically partitioned) Relevant treatments could be "How to Handle missing information without using NULL" http://www.dcs.warwick.ac.uk/~hugh/TTM/Missing-info-without-nulls.pdf I appreciate that SQL's NULL is not the same as Haskell's Maybe/Nothing (and SQL's semantics for NULL are utterly horrible). Nevertheless, if you're concerned to avoid the wasted space of sparse records, it could be a helpful discipline to design data structures without needing Maybe's. AntC

You've got a bunch of great answers, if there's no rhyme or reason to
which fields are missing.
If, on the other hand, they will tend to be present or absent in
groups, you could decompose your data-structure a bit, for fast
lookups, good space efficiency, and maybe even slightly more
interesting checks from the type system.
On Wed, Oct 24, 2012 at 3:08 AM, Jon Fairbairn
Is there a convenient way of handling a data structure with lots of fields of different types that may or may not be filled in?
Something equivalent to
data D = D {a::Maybe A, b::Maybe B, c::Maybe C, …}
but with better space efficiency and a more convenient empty object.
An easy alternative is
data E = Ea A | Eb B | Ec C | … type R = [E]
which has a straightforward empty object, but one then must define
getA e = listToMaybe [a | Ea a <- e]
for each field, which is tedious (and O(n)). Obviously Templates would help, but is there an alternative I’ve missed?
-- Jón Fairbairn Jon.Fairbairn@cl.cam.ac.uk
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (8)
-
AntC
-
Daniel Trstenjak
-
David Thomas
-
Jon Fairbairn
-
Roman Cheplyaka
-
Sjoerd Visscher
-
Twan van Laarhoven
-
Yuri de Wit