Binary Data Access via PIC…??

On NL FP day, it struck me again when I saw an almost 1 MB *.hs file whose apparent sole purpose was to get a quantity of raw data incorporated into the binary, using some funny text-encoding constructs. To the best of my knowledge this is still the best available solution, with the major downside that it happens at compile time. Another approach I have noticed several times is the use of very fast parsing to read in binary data at run time. Did I miss something?

More specifically, I am speaking about the kind of binary data which is (1) huge, the 1 MB mentioned above being rather at the lower limit; (2) completely independent of the version of the Haskell compiler; (3) guaranteed (externally!) to match the structural requirements of the application that uses it; and (4) well managed in some way, also regarding ABI issues (e.g. versioning, metadata headers, etc.).

The question is to what extent we can exploit PIC (position independent code), as I believe other languages do, to read in really large quantities of binary data at run time, or immediately before run time, without any need for parsing at all. A Haskell file holding a textual data representation already compiles to an object file, and linking that object file should only rely on a limited set of assumptions about its inner structure. Imagine I have a huge but simple DB table, plus a converter that, by some simplification of a Haskell compiler, generates an object file matching these (limited, I believe) assumptions, so that in the end the linker accepts this 'fake' in place of a dummy skeleton file. Couldn't that be a way towards getting vast amounts of binary data in directly, in one piece? Where there are stronger integrity needs, extra metadata should make it possible to verify that the object originated from a valid code generator.

Of course, while not strictly necessary, true run time loading would be even better, although direct interfacing to foreign (albeit simple) memory spaces seems much more intricate to me. I have stumbled over such cases regularly, so I do believe this would be useful. I would be happy to learn more about this. Any thoughts?

Cheers, and all the best,
Nick
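As a rough illustration of the linker-based idea above, here is a minimal Haskell sketch. It assumes the raw data sits in a hypothetical file table.bin that has been turned into an object file with GNU binutils, e.g. `ld -r -b binary -o table.o table.bin`; that step exposes start and end symbols whose exact names depend on the toolchain and the input file name. The Haskell side then views the embedded bytes through the FFI, with no parsing and no copying:

    {-# LANGUAGE ForeignFunctionInterface #-}
    module EmbeddedTable (tableBytes) where

    import qualified Data.ByteString as BS
    import qualified Data.ByteString.Unsafe as BSU
    import Foreign.C.Types (CChar)
    import Foreign.Ptr (Ptr, minusPtr)
    import System.IO.Unsafe (unsafePerformIO)

    -- Symbols produced by the binary-to-object conversion; the exact
    -- names depend on the toolchain and on the input file name.
    foreign import ccall unsafe "&_binary_table_bin_start"
      tableStart :: Ptr CChar

    foreign import ccall unsafe "&_binary_table_bin_end"
      tableEnd :: Ptr CChar

    -- | The embedded blob as a ByteString that refers directly to the
    -- data section of the executable: no parsing, no copying.
    -- The bytes must never be mutated.
    tableBytes :: BS.ByteString
    tableBytes = unsafePerformIO $
      BSU.unsafePackCStringLen (tableStart, tableEnd `minusPtr` tableStart)
    {-# NOINLINE tableBytes #-}

Any versioning or integrity metadata could live in a small header at the front of the blob and be checked once at start-up, before handing out slices of tableBytes to the rest of the application.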

Shameless plug for one of my own libraries, which seems at least relevant to the problem space: https://hackage.haskell.org/package/capnp

As a disclaimer, I haven't done any benchmarking myself; my personal interest is more in RPC than in super-fast serialization. There will be a release with RPC support sometime later this month. That said, I have heard from one user who is using it to communicate with a part of their application written in C++, and who switched over from protobufs for performance and because they needed to handle very large (> 2 GiB) data.

-Ian

Quoting Nick Rudnick (2019-01-13 07:43:40)
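For the run-time-loading side of the question, here is a sketch of how large binary data could be brought in without a parser, assuming the mmap package from Hackage; the file name and the fixed-size row layout are hypothetical, standing in for whatever an external generator guarantees:

    module MappedTable (openTable, lookupRow) where

    import qualified Data.ByteString as BS
    import System.IO.MMap (mmapFileByteString)

    -- Hypothetical layout: a flat array of fixed-size 16-byte rows,
    -- written by an external, trusted generator, so no parsing is needed.
    rowSize :: Int
    rowSize = 16

    -- | Map the whole table into memory.  The OS pages it in lazily,
    -- so even multi-gigabyte files are cheap to "load".
    openTable :: FilePath -> IO BS.ByteString
    openTable path = mmapFileByteString path Nothing

    -- | O(1) access to row i as a slice of the mapped region (no copy).
    lookupRow :: BS.ByteString -> Int -> BS.ByteString
    lookupRow table i = BS.take rowSize (BS.drop (i * rowSize) table)

A schema-based, zero-copy format such as Cap'n Proto layers a versioning and evolution story on top of the same idea, which is broadly the niche the capnp package mentioned above is aimed at.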
participants (2)
- Ian Denhardt
- Nick Rudnick