activerecord/lib/active_record/connection_adapters/postgresql/oid/type_map_initializer.rb · fb25f3ef2be5181c98a713a25503d9ed2e0454dc · Rails relative code / rails-deleted-2

Dec 17, 2020

postgres: Make the initial type map query less expensive · 42ce7057

Felipe Sateler authored Dec 17, 2020

When the pg_type table is large, the initial type map query is very expensive.
This is due to two reasons:

1. The complete query does not allow use of any indexes.
2. The complete query returns much more rows than we need

Point 1 is fixed by splitting up each OR into its own query. For reasons
unknown to me, the query plan is slower with OR instead of separate
queries.

Point 2 is a problem when databases have lots of tables. This is because
for each table, postgres creates both a type for each record, and a type
for an array of records. The latter types are matched by the condition
`t.typelem != 0`. There is no easy way to filter out these types, but
since we already make a new query due to point 1, we can change our query
to bring us exactly the data we need: all the types that reference (via
`typelem`) a type we already know about. This way we greatly reduce the
number of rows returned.

Additionally, we remove the condition
`t.typinput = 'array_in(cstring,oid,integer)'::regprocedure`, since it
brings no value. Array types have a nonzero `typelem`.

This change brings a type map initialization performance improvement.
A production system with aprox 1.6MM rows in `pg_type`, improves from
2.5-4.1 seconds to 1.4-1.6 seconds.

42ce7057

postgres: Make the initial type map query less expensive

Felipe Sateler authored Dec 17, 2020

When the pg_type table is large, the initial type map query is very expensive.
This is due to two reasons:

1. The complete query does not allow use of any indexes.
2. The complete query returns much more rows than we need

Point 1 is fixed by splitting up each OR into its own query. For reasons
unknown to me, the query plan is slower with OR instead of separate
queries.

Point 2 is a problem when databases have lots of tables. This is because
for each table, postgres creates both a type for each record, and a type
for an array of records. The latter types are matched by the condition
`t.typelem != 0`. There is no easy way to filter out these types, but
since we already make a new query due to point 1, we can change our query
to bring us exactly the data we need: all the types that reference (via
`typelem`) a type we already know about. This way we greatly reduce the
number of rows returned.

Additionally, we remove the condition
`t.typinput = 'array_in(cstring,oid,integer)'::regprocedure`, since it
brings no value. Array types have a nonzero `typelem`.

This change brings a type map initialization performance improvement.
A production system with aprox 1.6MM rows in `pg_type`, improves from
2.5-4.1 seconds to 1.4-1.6 seconds.