Django database couldn’t be flushed

12 Jan 2022

This is the first post in a series I’m calling “WTH”, detailing unusual errors I’ve encountered during my day-to-day work that forced me to dig deeper than existing documentation and StackOverflow posts, exceptions that have left me scratching my head and saying “What the heck…”! Most people will probably never encounter these errors, or at least I hope so.

Jump to heading Confusion

I was removing a model in one of our Django apps – for the sake of this post I’ll refer to it as ModelBeingDeleted – and to be safe I was deploying it in two separate stages:

This code was running on multiple machines, so combining both steps might be dangerous because the database migration might finish before the code was fully deployed to all the machines, resulting in a flurry of errors.

I finished the first step locally and re-ran my test suite, which is when I came across this error I’d never seen before:

django.core.management.base.CommandError: Database MY_DATABASE couldn't be flushed. Possible reasons:
  * The database isn't running or isn't configured correctly.
  * At least one of the expected database tables doesn't exist.
  * The SQL was invalid.
Hint: Look at the output of 'django-admin sqlflush'. That's the SQL this command wasn't able to run.

This didn’t make any sense to me, especially because the test that was failing was in a completely different app in my service. I knew my database was working, because otherwise none of the other tests in the suite would have passed; I hadn’t applied any database migrations so all the tables should be present; and I had faith in Django’s ability to write correct SQL for me.

Going a little deeper into the stack trace I saw another error:

django.db.utils.NotSupportedError: cannot truncate a table referenced in a foreign key constraint
DETAIL:  Table "model_being_deleted" references "another_model".
HINT:  Truncate table "model_being_deleted" at the same time, or use TRUNCATE CASCADE.

This confirmed that the error was a result of my changes, which was actually a relief! This second error also made more sense to me than the first one: you shouldn’t be allowed to truncate a table that references another table that isn’t also being truncated as that might break foreign key relationships.

Next up I ran the django-admin sqlflush command as the first error suggested, and sure enough the table for ModelBeingDeleted wasn’t in the SQL, which was confusing because I was sure I hadn’t run my migration to drop the table yet. I then shelled into my local database and confirmed that the table was still there – it was, so I definitely hadn’t run the migration by mistake.

Looking at the docs and in StackOverflow wasn’t helping at this point, so I started trying to make some uneducated guesses. First off I added ModelBeingDeleted back into my app’s models.py file. I re-ran the test and it passed! It also passed even if the body of ModelBeingDeleted was just a single pass statement. We like to have 100% coverage in our services, though, so this would have meant I’d need a test to exercise this model somehow. Fine in the short term, of course, since I was planning to drop the table in the very next deployment, but it irked me. Why was that suddenly working? Why was this unrelated test failing in the first place?

Jump to heading Understanding

One oddity of the failing test was that it was running in a TransactionTestCase, which is a rarity in this particular test suite, so that gave me the beginning of a thread to pull. As with most things in Django and Python you’re free to peruse the internals on Github, so as long as you keep pulling that thread (and can keep track of all the tabs you have open) you should be able to get closer to the answer with enough time.

Looking at TransactionTestCase._fixture_teardown I saw that it calls the flush management command, which calls sql_flush to build the SQL for the teardown. That builds a list of table names with connection.introspection.django_table_names, which iterates over the models returned by connection.introspection.get_migratable_models, and that only looks at migrateable models in installed apps. Whew, progress! This is why adding ModelBeingDeleted back into my models.py file caused the test to pass. Doing so re-registered the model, so its table was added to the SQL command to flush the test database. That still felt like an awkward thing to have to do, though, so I continued to read.

I finally came across TransactionTestCase.available_apps, which is listed as an advanced testing feature in the Django docs. It allows you to limit which apps are available to an individual test case, but it has some pretty heavy caveats so do read the docs and use it at your own peril. Digging into the code again it became clear that setting available_apps to anything other than None controlled the allow_cascade param that is ultimately passed to my (PostgreSQL) database’s sql_flush method and

determines whether truncation may cascade to tables with foreign keys pointing [to] the tables being truncated. PostgreSQL requires a cascade even if these tables are empty.

Suddenly the hint from the second error message started to make sense!

HINT:  Truncate table "model_being_deleted" at the same time, or use TRUNCATE CASCADE.

Setting available_apps allowed me to tell the database to cascade its truncate operations, avoiding the original problem without me needing to add ModelBeingDeleted back into my models.py file. This seemed like the best solution for my use case because:

All the same, in the next deployment I removed available_apps from this test case, just so a future developer wouldn’t accidentally trip over it, find this blog post and write me an angry email.