
We at the Open Library of Humanities and Birkbeck CTP have been working on a Repository system initially built to provide a space for Preprints. We came across an issue where, once live data had been imported into the system, the main dashboard began to slow down. This wasn’t unexpected (when you add a tonne of data, things sometimes get a bit slower), but this was excruciating.

A little investigation with the Django Debug Toolbar showed that we were repeating a bunch of queries over and over again. Most of this was quite easy to tidy up: we added some select_related() and prefetch_related() calls to limit the number of additional queries being made in the templates. However, due to the nature of one of our models, the page remained relatively slow to load. The model looks like this:

When looping through a bunch of preprints on the template we wanted to display the name of the first preprint author. This can be done using the following in the template:

We could add a property to get the first author for us, but this wouldn’t stop the additional query from being executed, and we already use prefetch_related() on preprintauthor_set (it is a reverse relation, so select_related() does not apply to it). To stop the template from executing additional queries to fetch the Author FK, we can create a custom manager that ensures that whenever a PreprintAuthor is fetched, its Author object is fetched along with it.

With this we pass the heavy lifting over to the SQL backend, which is much faster at this than Python issuing a separate query for each author. Now all we need to do is add this to our PreprintAuthor model.

We reduced the number of queries running on the dashboard to 17 (15 on a second load due to some cache hits!).

Pub Tech developer for Birkbeck CTP
