Arrays and postgresql-simple

Ok, Bas van Dijk has done some preliminary work in adding support for PostgreSQL's array types to postgresql-simple, and I have some questions about arrays and types that I'm hoping Jeff Davis can answer. Here's a link, and a simple experiment in psql: https://github.com/basvandijk/postgresql-simple/commit/eb04ca39c5c22e3f4d083... => create table strings (x text); => insert into strings values ('foo','bar','baz','hello world','goodbye cruel world'); => select array_agg(x) from strings group by x like ('% %'); array_agg --------------------------------------- {foo,bar,baz} {"hello world","goodbye cruel world"} (2 rows) So far so good, but what if we try creating an array of arrays? => select array_agg(*) from (select array_agg(x) from strings group by x like ('% %')) q; ERROR: function array_agg() does not exist LINE 1: select array_agg(*) from (select array_agg(x) from strings g... ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. Now, presumably, postgresql does actually support arrays of arrays, but I'm guessing that you need to create the array of array of string type before you can actually run this query... The reason I ask is that Bas van Dijk has done some work on adding support for arrays to postgresql-simple. And he's modified the type cache from TypeOID -> IO TypeName to become a TypeOID -> IO TypeInfo, where the TypeInfo type is defined as follows: data NamedOid = NamedOid { typoid :: !PQ.Oid , typname :: !ByteString } deriving Show data TypeInfo = TypeInfo { typ :: !NamedOid , typelem :: !(Maybe NamedOid) } deriving Show I think this is a perfectly reasonable first attempt, but I don't think it's correct, because I'm pretty sure that PostgreSQL does actually support arrays of arrays if you know what you are doing. So I think that TypeInfo needs to look something more like data TypeInfo = Plain { typ :: !NamedOid } | Array { typ :: !NamedOid , typelem :: !TypeInfo } deriving Show The real issue here is one dealing with purity and effects: In some sense, Bas's first attempt carries the same information, but you might need to do some IO in order to retrieve it. (Though this assumes that one can query the type cache directly, which is something I should probably add anyway...) But the fromField method, which is is the most likely consumer of this information, isn't allowed to do IO. So I think we really want to change this type. Also, this type makes it more natural to add support for the new range types available in PostgreSQL 9.2. (Which Jeff is largely responsible for, by the way...) The extended TypeInfo might look like this: data TypeInfo = Plain { typ :: !NamedOid } | Array { typ :: !NamedOid , typelem :: !TypeInfo } | Range { typ :: !NamedOid , typelem :: !TypeInfo } deriving Show So, I think that's enough issues to talk about for the time being. Best, Leon

On Mon, Jul 30, 2012 at 5:00 PM, Leon Smith
Now, presumably, postgresql does actually support arrays of arrays, but I'm guessing that you need to create the array of array of string type before you can actually run this query...
Actually, PostgreSQL does not truly support the notion of "arrays of arrays". PostgreSQL arrays are actually N-dimensional matrices, where N is a number from 1 to 6. For example, '{{1,2},{3,4}}' is a valid array, but '{{1,2},{3}}' is not (inconsistent dimensions). Some other weird things about PostgreSQL arrays: * None of an array's dimensions may be zero. For example, '{{},{}}' is invalid. However, an empty, dimensionless array is allowed: '{}'. * Many aggregate functions, such as array_agg, return null instead of an empty array when given an empty set. Care must be taken when using array_agg to construct an array. * Not every type uses ',' as the delimiter character (but almost all of them do). For example, the box type uses ';' instead, as it uses ',' to delimit coordinates. The typdelim column in pg_type [1] indicates what delimiter a given type uses. * PostgreSQL arrays can have explicit dimensions: > SELECT '[5]={1,2,3,4,5}'::int[]; int4 ------------- {1,2,3,4,5} > SELECT '[3:5]={3,4,5}'::int[]; int4 --------------- [3:5]={3,4,5} > SELECT '[2][3]={{1,2,3},{4,5,6}}'::int[]; int4 ------------------- {{1,2,3},{4,5,6}} As you can see, PostgreSQL arrays are perilous. They're useful nonetheless. I would benefit from postgresql-simple having basic array support. It would be one less obstacle to switching my application to use postgresql-simple instead of the crummy DB code I wrote. In that crummy code, I have a function for reading a PostgreSQL array. It supports explicit dimensions, NULL, and unescaping (array items can be quoted with double quotes, and may contain backslash escapes). However, it only supports single-dimensional arrays, and I have not written a corresponding function to generate a PostgreSQL array. Want me to clean up my array parsing code and post it? Thanks, -Joey [1]: http://www.postgresql.org/docs/current/static/catalog-pg-type.html

Well, this is the kind of misinformation that needs to be cleared up.
Thanks Joey. =)
Some of this is understandable, some of it is unfortunate. I'd go so far
as to say that the array_agg issue seems to be a design bug. I mean,
PostgreSQL is relatively nice for SQL, but it is still SQL... (sigh)
So can the types that can be array elements be extended in any way, or are
these types set in stone? (without modifying the PostgreSQL source itself,
of course...) And if the array elements are set, does 9.2 support
arrays of range types?
The TypeInfo declaration I suggested might still be better, even if the set
of types that PostgreSQL supports is rather more limited than what it can
express. But I may well be wrong.
And regarding the array parsing issue, to be honest I wasn't going to
review that code too closely and trust whomever to get it approximately
right, and then worry about fixing it if somebody found some issues. But
if you've dealt with this issue already, then I think you should post
your code and then hopefully we can figure out how to improve one using
ideas from the other.
On Mon, Jul 30, 2012 at 8:06 PM, Joey Adams
On Mon, Jul 30, 2012 at 5:00 PM, Leon Smith
wrote: Now, presumably, postgresql does actually support arrays of arrays, but I'm guessing that you need to create the array of array of string type before you can actually run this query...
Actually, PostgreSQL does not truly support the notion of "arrays of arrays". PostgreSQL arrays are actually N-dimensional matrices, where N is a number from 1 to 6. For example, '{{1,2},{3,4}}' is a valid array, but '{{1,2},{3}}' is not (inconsistent dimensions).
Some other weird things about PostgreSQL arrays:
* None of an array's dimensions may be zero. For example, '{{},{}}' is invalid. However, an empty, dimensionless array is allowed: '{}'.
* Many aggregate functions, such as array_agg, return null instead of an empty array when given an empty set. Care must be taken when using array_agg to construct an array.
* Not every type uses ',' as the delimiter character (but almost all of them do). For example, the box type uses ';' instead, as it uses ',' to delimit coordinates.
The typdelim column in pg_type [1] indicates what delimiter a given type uses.
* PostgreSQL arrays can have explicit dimensions:
> SELECT '[5]={1,2,3,4,5}'::int[]; int4 ------------- {1,2,3,4,5}
> SELECT '[3:5]={3,4,5}'::int[]; int4 --------------- [3:5]={3,4,5} > SELECT '[2][3]={{1,2,3},{4,5,6}}'::int[]; int4 ------------------- {{1,2,3},{4,5,6}}
As you can see, PostgreSQL arrays are perilous. They're useful nonetheless. I would benefit from postgresql-simple having basic array support. It would be one less obstacle to switching my application to use postgresql-simple instead of the crummy DB code I wrote.
In that crummy code, I have a function for reading a PostgreSQL array. It supports explicit dimensions, NULL, and unescaping (array items can be quoted with double quotes, and may contain backslash escapes). However, it only supports single-dimensional arrays, and I have not written a corresponding function to generate a PostgreSQL array.
Want me to clean up my array parsing code and post it?
Thanks, -Joey
[1]: http://www.postgresql.org/docs/current/static/catalog-pg-type.html

On Mon, Jul 30, 2012 at 9:34 PM, Leon Smith
So can the types that can be array elements be extended in any way, or are these types set in stone? (without modifying the PostgreSQL source itself, of course...) And if the array elements are set, does 9.2 support arrays of range types?
(The following is tested with PostgreSQL 8.4.11) Take a look in the system catalog:
SELECT oid, typname, typcategory, typarray FROM pg_type; oid | typname | typcategory | typarray --------+-----------------------------------+-------------+---------- 16 | bool | B | 1000 17 | bytea | U | 1001 18 | char | S | 1002 19 | name | S | 1003 20 | int8 | N | 1016 21 | int2 | N | 1005 22 | int2vector | A | 1006 23 | int4 | N | 1007 --- snip --- 1000 | _bool | A | 0 1001 | _bytea | A | 0 1002 | _char | A | 0 1003 | _name | A | 0 1005 | _int2 | A | 0 1006 | _int2vector | A | 0 1007 | _int4 | A | 0 --- snip ---
Every type has a corresponding array type. When a new type is created, a corresponding array type is also created in pg_type. Tables are types, too, and have corresponding array types. It appears int2vector and oidvector are regular value types, but overload the ARRAY[] notation and try to act like arrays. Read on for more details. Apparently, int2vector, which is an array-like type, has a corresponding type _int2vector. This means you can have an array of int2vectors: > SELECT '{1 2,3 4 5,6 7 8 9}'::int2vector[]; int2vector --------------------------- {"1 2","3 4 5","6 7 8 9"} Notice that each int2vector is independent; no dimension constraining is happening. However, the ARRAY syntax appears to be overloaded for int2vector: > SELECT ARRAY[1,2,3] :: int2vector; array ------- 1 2 3 (1 row) But notice the following: > SELECT ARRAY['1 2', '3 4 5', '6 7 8']::int2vector[]; array ------------------------- {"1 2","3 4 5","6 7 8"} (1 row) > SELECT ARRAY['23 45' :: int2vector, '67 89' :: int2vector]; array ------- 1 0 (1 row) The first example works as expected (under our assumption that int2vector is a regular type and not an array type), but the last example yields the wrong type (int2vector instead of int2vector[]) and produces garbage. This appears to be the case even with recent Postgres. If you want to support an array of int2vectors, leave off the signatures to work around the bug described above. -Joey

Right, I have seen the array types in the catalog, so this much I
understand. (Though again, thanks for pointing out some of the subtleties
here.) But the question was whether or not it was possible to add array
types to the catalog, to store other array types for example.
The answer appears to be yes, assuming I read this page correctly:
http://www.postgresql.org/docs/9.1/interactive/sql-createtype.html
Though doing so is sounds like it's a little involved. (i.e. more
involved than something simple like "CREATE ARRAY TYPE FOR int2vector[];"
which is kind of what I was hoping might exist... but I'm not too upset
that it doesn't either.)
Also, regarding the subtleties, as far as postgresql-simple's code is
concerned, we really only care about subtleties of the format that
PostgreSQL returns to us. Though the syntactic subtleties may be of
interest to users of postgresql-simple, so it may be worthwhile putting
some of them in the documentation.
Best,
Leon
On Tue, Jul 31, 2012 at 12:19 AM, Joey Adams
On Mon, Jul 30, 2012 at 9:34 PM, Leon Smith
wrote: So can the types that can be array elements be extended in any way, or are these types set in stone? (without modifying the PostgreSQL source itself, of course...) And if the array elements are set, does 9.2 support arrays of range types?
(The following is tested with PostgreSQL 8.4.11)
Take a look in the system catalog:
SELECT oid, typname, typcategory, typarray FROM pg_type; oid | typname | typcategory | typarray --------+-----------------------------------+-------------+---------- 16 | bool | B | 1000 17 | bytea | U | 1001 18 | char | S | 1002 19 | name | S | 1003 20 | int8 | N | 1016 21 | int2 | N | 1005 22 | int2vector | A | 1006 23 | int4 | N | 1007 --- snip --- 1000 | _bool | A | 0 1001 | _bytea | A | 0 1002 | _char | A | 0 1003 | _name | A | 0 1005 | _int2 | A | 0 1006 | _int2vector | A | 0 1007 | _int4 | A | 0 --- snip ---
Every type has a corresponding array type. When a new type is created, a corresponding array type is also created in pg_type. Tables are types, too, and have corresponding array types.
It appears int2vector and oidvector are regular value types, but overload the ARRAY[] notation and try to act like arrays. Read on for more details.
Apparently, int2vector, which is an array-like type, has a corresponding type _int2vector. This means you can have an array of int2vectors:
> SELECT '{1 2,3 4 5,6 7 8 9}'::int2vector[]; int2vector --------------------------- {"1 2","3 4 5","6 7 8 9"}
Notice that each int2vector is independent; no dimension constraining is happening. However, the ARRAY syntax appears to be overloaded for int2vector:
> SELECT ARRAY[1,2,3] :: int2vector; array ------- 1 2 3 (1 row)
But notice the following:
> SELECT ARRAY['1 2', '3 4 5', '6 7 8']::int2vector[]; array ------------------------- {"1 2","3 4 5","6 7 8"} (1 row)
> SELECT ARRAY['23 45' :: int2vector, '67 89' :: int2vector]; array ------- 1 0 (1 row)
The first example works as expected (under our assumption that int2vector is a regular type and not an array type), but the last example yields the wrong type (int2vector instead of int2vector[]) and produces garbage. This appears to be the case even with recent Postgres.
If you want to support an array of int2vectors, leave off the signatures to work around the bug described above.
-Joey

On Mon, 2012-07-30 at 17:00 -0400, Leon Smith wrote:
Now, presumably, postgresql does actually support arrays of arrays, but I'm guessing that you need to create the array of array of string type before you can actually run this query...
Actually, array support in postgresql is a little strange, unfortunately. There is no way to create an array type (technically there is in some special cases, which I just learned now, but that doesn't work for arrays of arrays anyway); array types are created implicitly for every non-array type. The following won't work: select array_agg(xs) from (select array_agg(x) as xs from foo group by g) s; But arrays are multi-dimensional, so the following does work: select ARRAY[ARRAY[1,2],ARRAY[3,4]]; I'm not sure what parts of this craziness are part of the SQL spec, and what parts are unique to postgresql.
I think this is a perfectly reasonable first attempt, but I don't think it's correct, because I'm pretty sure that PostgreSQL does actually support arrays of arrays if you know what you are doing. So I think that TypeInfo needs to look something more like
data TypeInfo = Plain { typ :: !NamedOid } | Array { typ :: !NamedOid , typelem :: !TypeInfo } deriving Show
I think this version is a little closer to how the types are represented in the PG catalog, so I think this is the right direction to go, but there are some alternatives along these same lines. For instance:
data TypeInfo = Plain { typ :: !NamedOid } | Array { typ :: !NamedOid , typelem :: !TypeInfo } | Range { typ :: !NamedOid , typelem :: !TypeInfo } deriving Show
That allows a range of arrays or an array of ranges, which is good, because postgres supports those things too. But it supports arrays of arrays, which postgresql does not (at least not in the type system). A more complex proposal might look something like: data TypeInfo = Plain { typ :: !NamedOid , typarray :: !NamedOid } | Enum | Composite { typattrs :: ![TypeInfo] } | Domain { typbase :: !NamedOid } | Array { typ :: !NamedOid , typelem :: !TypeInfo } | Range { typ :: !NamedOid , typarray :: !NamedOid , rngsubtype :: !TypeInfo } deriving Show That still allows arrays of arrays, which is slightly awkward, but I don't see a better way. Am I overcomplicating things? Regards, Jeff Davis
participants (3)
-
Jeff Davis
-
Joey Adams
-
Leon Smith