Tuning Django ORM (part 1): Text Queries

Concerning text queries, Django ORM offers several helper functions. However, some of those are not backed by a database index and will be slow. Fortunately, on PostgreSQL these searches can be sped up with GIN or GIST indexes quite easily.

This article will focus on Django 1.6 with a backing PostgreSQL. It probably applies to Django 1.7, as well.

The Django ORM layer offers some useful functionality when it comes to text search:

  1. contains and icontains (https://docs.djangoproject.com/en/1.6/ref/models/querysets/#std:fieldlookup-contains)
  2. startswith and istartswith (https://docs.djangoproject.com/en/1.6/ref/models/querysets/#startswith)
  3. endswith and iendswith (https://docs.djangoproject.com/en/1.6/ref/models/querysets/#endswith)

When querying the database, these will result in wildcard SQL queries. ‘endswith’ will use a trailing wildcard, ‘startswith’ a leading wildcard, and ‘contains’ leading and trailing wildcards.

In the following, I will focus on ‘contains’ as it covers both the leading and trailing wildcards.

The Django ORM layer will create the following WHERE clause when a ‘contains’ lookup is requested:

# python

WHERE UPPER("blog_article"."title"::text) LIKE UPPER('%fancy%')

Asking PostgreSQL to explain a query making use of ‘icontains’ on shows:

blog_db=# explain analyze SELECT COUNT(*) FROM "blog_article" WHERE UPPER("blog_article"."title"::text) LIKE UPPER('%fancy%');
                                                       QUERY PLAN
 Aggregate  (cost=557.64..557.65 rows=1 width=0) (actual time=29.156..29.156 rows=1 loops=1)
   ->  Seq Scan on blog_article  (cost=0.00..557.64 rows=2 width=0) (actual time=0.236..29.126 rows=3 loops=1)
         Filter: (upper(title) ~~ '%FANCY%'::text)
         Rows Removed by Filter: 19373
 Total runtime: 29.280 ms
(5 rows)

So, a table with around 20 thousand entries will already require around 30 milliseconds for a simple ‘count’ when the result is filtered with ‘icontains’. Can this be reduced?

It can, but it requires a GIN or GIST index, that Django does not create by default as either requires an extension.

I decided to go with GIN, which seemed more appropriate in our case. Installing the extension is fairly simply under Ubuntu:

$ sudo apt-get install postgresql-contrib-9.3
$ psql blog_db
blog_db=# CREATE EXTENSION pg_trgm;

To add the index to a specific column run:

blog_db=# CREATE INDEX blog_article_title_trigram ON blog_article USING gin (UPPER(title) gin_trgm_ops);

Now, check that the index is really being used in the query:

_db=# explain analyze SELECT COUNT(*) FROM "blog_article" WHERE UPPER("blog_article"."title"::text) LIKE UPPER('%fancy%');
                                                                    QUERY PLAN
 Aggregate  (cost=51.53..51.54 rows=1 width=0) (actual time=0.323..0.323 rows=1 loops=1)
   ->  Bitmap Heap Scan on blog_article  (cost=44.02..51.53 rows=2 width=0) (actual time=0.307..0.319 rows=3 loops=1)
         Recheck Cond: (upper(title) ~~ '%FANCY%'::text)
         ->  Bitmap Index Scan on blog_article_title_trigram  (cost=0.00..44.02 rows=2 width=0) (actual time=0.287..0.287 rows=3 loops=1)
               Index Cond: (upper(title) ~~ '% FANCY%'::text)
 Total runtime: 0.411 ms
(6 rows)

The query now lasts around half a millisecond instead of nearly 30 milliseconds.

To find any other queries, that might profit from a special index, simply turn on PostgreSQL’s logging. You can tell Postgres to only log queries that last longer than a configured amount of time.

Further reading: Tuning Django ORM (part 2): Many-to-Many Queries

[fblike style=”button_count” showfaces=”false” width=”90″ verb=”like” font=”arial” float=”left”] [fbshare type=”button” float=”left”] [google_plusone size=”medium” float=”left”] [twitter style=”horizontal” float=”left” lang=”de”] [linkedin_share style=”right” float=”left”]

Verwandte Beiträge