Haskell SQL backends: How is data handed over?

Dear Café, I was discussing with a friend the fastest way to shovel data into a SQL database after parsing the raw (textual) data with Haskell. I am concerned about numeric conversion: In my experience parsing the file structure is fast. The majority of the time is spent converting strings of digits to numbers. I hoped that once Haskell has constructed a Double, the backend driver can hand that binary value over to the database engine. But Database.HDBC.Statement has a field orginalQuery :: String and HDBC has functions prepare :: Connection -> String -> IO Statement run :: Connection -> String -> [SqlValue] -> IO Integer which makes me wonder whether the backend constructs a complete SQL query statement in form of a string (including the SqlValues), and hands that to the database driver to parse. Is that indeed so? If yes, the numbers should never be parsed in the first place: The DB will do that anyways. I might as well construct a new CSV file and let the database do a bulk insert on that. The above is specific to the HDBC backend. If there are other backends/frameworks that do the marshalling more efficiently, please let me know. Thanks Olaf

Hi Olaf, I can't speak for SQL in general, but if you're targeting PostgreSQL note that the hasql library uses a binary wire format. You'll need to use parameterized queries to benefit from this, but it would let you avoid encoding digits to ASCII. Ollie On Tue, 9 Aug 2022, at 8:34 PM, Olaf Klinke wrote:
Dear Café,
I was discussing with a friend the fastest way to shovel data into a SQL database after parsing the raw (textual) data with Haskell. I am concerned about numeric conversion: In my experience parsing the file structure is fast. The majority of the time is spent converting strings of digits to numbers. I hoped that once Haskell has constructed a Double, the backend driver can hand that binary value over to the database engine. But Database.HDBC.Statement has a field
orginalQuery :: String
and HDBC has functions
prepare :: Connection -> String -> IO Statement run :: Connection -> String -> [SqlValue] -> IO Integer
which makes me wonder whether the backend constructs a complete SQL query statement in form of a string (including the SqlValues), and hands that to the database driver to parse. Is that indeed so? If yes, the numbers should never be parsed in the first place: The DB will do that anyways. I might as well construct a new CSV file and let the database do a bulk insert on that. The above is specific to the HDBC backend. If there are other backends/frameworks that do the marshalling more efficiently, please let me know.
Thanks Olaf
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

That does make me wonder how many database interfaces use the
scientific package for numbers, though.
On Wed, Aug 10, 2022 at 3:11 AM Oliver Charles
Hi Olaf,
I can't speak for SQL in general, but if you're targeting PostgreSQL note that the hasql library uses a binary wire format. You'll need to use parameterized queries to benefit from this, but it would let you avoid encoding digits to ASCII.
Ollie
On Tue, 9 Aug 2022, at 8:34 PM, Olaf Klinke wrote:
Dear Café,
I was discussing with a friend the fastest way to shovel data into a SQL database after parsing the raw (textual) data with Haskell. I am concerned about numeric conversion: In my experience parsing the file structure is fast. The majority of the time is spent converting strings of digits to numbers. I hoped that once Haskell has constructed a Double, the backend driver can hand that binary value over to the database engine. But Database.HDBC.Statement has a field
orginalQuery :: String
and HDBC has functions
prepare :: Connection -> String -> IO Statement run :: Connection -> String -> [SqlValue] -> IO Integer
which makes me wonder whether the backend constructs a complete SQL query statement in form of a string (including the SqlValues), and hands that to the database driver to parse. Is that indeed so? If yes, the numbers should never be parsed in the first place: The DB will do that anyways. I might as well construct a new CSV file and let the database do a bulk insert on that. The above is specific to the HDBC backend. If there are other backends/frameworks that do the marshalling more efficiently, please let me know.
Thanks Olaf
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.
_______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.
-- brandon s allbery kf8nh allbery.b@gmail.com

Hi Olaf, It's unlikely that a HDBC database driver will reconstruct a String query and execute that against the database. The originalQuery is simply the one you originally provided to prepare the Statement. You probably want Statement's execute :: [SqlValue] -> IO Integer or executeMany :: [[SqlValue]] -> IO () The executeMany is intended to send a batch of prepared arguments to the database driver. However, I did note that the PostgreSQL driver for HDBC doesn't handle this efficiently. So, check the implementation for your specific backend. https://github.com/hdbc/hdbc-postgresql/blob/master/Database/HDBC/PostgreSQL... All that said, if you are doing a large bulk import, it could certainly be more efficient to use your database's facilities for doing bulk import. e.g. https://www.postgresql.org/docs/current/sql-copy.html You might also find that you are better off using an alternative to HDBC for your specific dbms, depending on the performance you require and whether executeMany is implemented efficiently in the HDBC driver for your dbms.
participants (4)
-
Brandon Allbery
-
Olaf Klinke
-
Oliver Charles
-
Steven Shaw