How to search in a huge table on Django admin
Hello everyone!
We all know that the Django admin is a super cool tool for Django. You can check your models, and add/edit/delete records from the tables. If you are familiar with Django, I’m sure you already know about it.
I was given a task: Our client wanted to search in a table by one field. It seems easy enough, right? Well, the tricky part is that the table has 523.803.417 records.
Wow. 523.803.417 records.
At least the model was not that complex:
On models.py
:
class HugeTable(models.Model):
"""Huge table information"""
search_field = models.CharField(max_length=10, db_index=True, unique=True)
is_valid = models.BooleanField(default=True)
def __str__(self):
return self.search_field
So for Django admin, it should be a breeze, right? WRONG.
The process
First, I just added the search field on the admin.py:
On admin.py
:
class HugeTableAdmin(admin.ModelAdmin):
search_fields = ('search_field', )
admin.site.register(HugeTable, HugeTableAdmin)
And it worked! I had a functioning search field on my admin.
Only one problem: It took 3mins+ to load the page and 5mins+ to search. But at least it was working, right?
WTF?
First, let’s split the issues:
- Why was it taking +3mins just to load the page?
- Why was it taking +5mins to search if the search field was indexed?
I started tackling the first one, and found it quite easily: Django was getting only 100 records at a time, but it had to calculate the length for the paginator and the “see more” button on the search bar
So near, yet so far
Improving the page load
A quick look at the Django docs told me how to deactivate the “see more” query:
ModelAdmin.show_full_result_count
Set show_full_result_count to control whether the full count of objects should be displayed on a filtered admin page (e.g. 99 results (103 total)). If this option is set to False, a text like 99 results (Show all) is displayed instead.
On admin.py
:
class HugeTableAdmin(admin.ModelAdmin):
search_fields = ('search_field', )
show_full_result_count = False
admin.site.register(HugeTable, HugeTableAdmin)
That fixed one problem, but how about the other? It seemed I needed to do my paginator.
Thankfully, I found an awesome post by Haki Benita called “Optimizing the Django Admin Paginator” that explained exactly that. Since I didn’t need to know the records count, I went with the “Dumb” approach:
On admin.py
:
from django.core.paginator import Paginator
from Django.utils.functional import cached_property
class DumbPaginator(Paginator):
"""
Paginator that does not count the rows in the table.
"""
@cached_property
def count(self):
return 9999999999
class HugeTableAdmin(admin.ModelAdmin):
search_fields = ('search_field', )
show_full_result_count = False
paginator = DumbPaginator
admin.site.register(HugeTable, HugeTableAdmin)
And it worked! The page was loading blazingly fast :) But the search was still ultra slow. So let’s fix that.
Improving the search
I checked A LOT of options. I almost went with Haystack, but it seemed a bit overkill for what I needed. I finally found this super cool tool: djangoql. It allowed me to search the table by using sql like operations, so I could search by search_field
and make use of the indexation. So I installed it:
On settings.py
:
INSTALLED_APPS = [
...
'djangoql',
...
]
On admin.py
:
from django.core.paginator import Paginator
from django.utils.functional import cached_property
from djangoql.admin import DjangoQLSearchMixin
class DumbPaginator(Paginator):
"""
Paginator that does not count the rows in the table.
"""
@cached_property
def count(self):
return 9999999999
class HugeTableAdmin(DjangoQLSearchMixin, admin.ModelAdmin):
show_full_result_count = False
paginator = DumbPaginator
admin.site.register(HugeTable, HugeTableAdmin)
And it worked! By performing the query:
search_field = "my search query"
I get my results in around 1 second.
Is it done?
Yes! Now my client can search by search_field
on a table of 523.803.417 records, very easily and very quickly.
I’m planning to post more Python/Django things I’m learning by working with this client, so you might want to stay tuned :)