Monkeypatching Django
I haven't yet posted here about my project nanodjango. If you haven't heard of it yet, it is a package which lets you write Django in a single file. I gave a lightning talk at Djangocon US and have written an introductory blog post over at Lincoln Loop, if you want to find out more from a user's perspective - but here I'm going to talk about how it works.
A couple of years ago I took over yet another project where the previous developers had heard "Flask is easier than Django", and I saw they had spent a long time and a lot of money accidentally building their own poor version of Django, burying the project under years of technical debt. It became clear why they had handed the project off saying they couldn't support it any more - even simple changes required hours of wading through spaghetti code, and upgrading the very outdated packages was daunting, if not impractical. We got frustrated working on it, the client got frustrated paying so much for small changes, and it became clear the only practical option was a rewrite - which the client couldn't afford.
Choosing Flask had been a mistake. Don't get me wrong, Flask has its place, but you need experience to make good decisions that will help your project grow - experience which the original developers didn't have. Django's `startproject` may be off-putting to beginners, but it does force you to follow a consistent structure which scales and makes it easier to hand off to another team.
And when your project grows to need a database, forms or admin, Flask devs will be rolling their own solutions or pulling in third-party libraries with varying levels of support and compatibility, while Django developers will be using core batteries which will always work for the latest version of Django, and we won't look to pull in third-party libraries until we get to much higher-level functionality. This leads to a more stable platform to develop against, and for those reasons I'm firmly of the opinion that Django is the better option for complex projects.
But I thought to myself, why can't Django be a good fit for smaller projects and prototypes too? Why can't Django work like Flask, in a single file?
Django in a single file in itself isn't particularly novel or special - boiled down, Django just routes requests to functions which return responses, so getting it to handle views in a single file had been done, and done well - first in 2009 with djing (Simon Willison) and Django Inside Tornado (Yann Malet), and more recently Using Django as a Micro-Framework (Carlton Gibson), django-microframework (Will Vincent and Peter Baumgartner), μDjango (Paolo Melchiorre), django-singlefile (Andrew Godwin) and Django from first principles (Eric Matthes) - and many more in between.
Most of these do what Flask does - they implement views and routing, and sensibly stop there. I enjoy doing silly things with Python though, so wanted to see if I could give you access to all of Django's batteries from a single file - the bits which expect you to structure things properly in separate modules, so they can control what order things are loaded in to let them sprinkle the syntactic sugar that makes Django so great. The things that use metaclasses and implicit registration. I wanted models.
This was fun. Models need to register with your app in the apps registry, so you need to get the current module into `INSTALLED_APPS` - but you can't just add the current module to `INSTALLED_APPS` to get it in there, because your module hasn't finished importing. If it hasn't finished importing, the apps registry can't resolve a reference to it, and it will try to load it again - putting you into an infinite import loop.
I tried various silly things, like monkeypatching Django's models and apps to delay model registration until import was complete, disabling them completely and parsing the AST to rewrite the single file into a proper app structure in-memory on demand, but then I realised I could just trick apps into not importing the model if it already looked like it was imported.
The first problem is that Django expects to load apps itself during setup, but we're going to run setup from within our app, so we need to make Django think we've already loaded it. To do this we need an `AppConfig` with a hard-coded path:
```python
class NanodjangoAppConfig(AppConfig):
    path = str(get_script_path())
```
and then we need to manually add that to the app registry before we call `django.setup()`, which will in turn call `apps.populate()` to pick it up:
```python
app_config = NanodjangoAppConfig(app_name=get_script_name(), app_module=get_script_module())
apps_registry.app_configs[app_config.label] = app_config
app_config.apps = apps_registry
app_config.models = {}
```
That's solved app load order - we've now force-registered an app using our script name, pointing at our script module, and it knows where to look for anything it might want to look at. In nanodjango, that code is run as part of the `app = Django()` initialisation step, just before it calls `django.setup()` - which is why models can't be defined in nanodjango before the app object exists.
But now we've got another problem - Django's model metaclass magic has expectations which we don't want to meet. In particular, it looks for the app name in `MyModel.__module__` - but because of how we're running our script, it will get the string `__main__`, rather than the name of the app we're trying to dynamically create. The metaclass then uses this module name to do a lookup in the apps registry, but of course doesn't find anything - we registered our `AppConfig` under a different name - so we'll get a fatal error saying our app `__main__` isn't in `INSTALLED_APPS`.
To solve this, we need to change the behaviour of Django's `ModelBase` metaclass. If you need a refresher, we talked about metaclasses last week.
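As a quick illustration of the hook we'll be using - a toy sketch, not Django's actual internals - a metaclass's `__new__` receives the class name, bases and `attrs` dict *before* the class object exists, so it can inspect and rewrite `attrs` on the way through, and register the finished class somewhere:

```python
class Registry(type):
    registered = {}

    def __new__(cls, name, bases, attrs, **kwargs):
        # Rewrite attrs before the class object is created
        attrs.setdefault("label", name.lower())
        new_cls = super().__new__(cls, name, bases, attrs, **kwargs)
        # Implicit registration, much as ModelBase does with the app registry
        cls.registered[new_cls.label] = new_cls
        return new_cls

class Widget(metaclass=Registry):
    pass

print(Widget.label)                 # widget
print("widget" in Registry.registered)  # True
```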
For those who haven't come across it before, monkeypatching is the practice of changing other people's code at runtime. Python makes this possible because everything is an object - we can put a reference to a function in a variable, and overwrite the original just as easily.
The most common time to do this is during testing, where you want to make a temporary change to stub something out - for example, this would be a way to ensure that `os.path.exists` says that every path starting with `test://` exists:
```python
import os

# Keep a reference to the old exists() function
old_exists = os.path.exists

def fake_exists(path):
    # Our override logic
    if path.startswith('test://'):
        return True
    # Otherwise fall back to the old behaviour
    return old_exists(path)

# Overwrite the exists() function with our new one
os.path.exists = fake_exists
```
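In a real test suite you'd usually scope a patch like this so it's undone automatically. Here's the same stub with different plumbing, using the standard library's `unittest.mock.patch` as a context manager:

```python
import os
from unittest import mock

def fake_exists(path, _real=os.path.exists):
    # Pretend test:// paths exist; defer everything else to the real function
    if path.startswith("test://"):
        return True
    return _real(path)

# The patch only applies inside the context, and is undone on exit
with mock.patch("os.path.exists", fake_exists):
    print(os.path.exists("test://anything"))  # True

print(os.path.exists("test://anything"))  # False - original restored
```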
The other reason is to make your code work with someone else's - their code doesn't do what you want, but for whatever reason you don't want to fork it or submit a PR. That's what we have with nanodjango.
Essentially all we're doing is overwriting a variable so that when other code tries to use it, it finds our new value or function which does what we want, but this introduces potential problems.
You do have to be careful not to break other people's code. In a test we can get away with functions not doing what they were originally meant to (either intentionally or accidentally), but if you do it in production code things are going to go very wrong very quickly.
It's important that you write unit tests to check that the old and new functions behave the same under normal circumstances - if nothing else, it will tell you when the upstream package changes. You also have to be extra careful that your code doesn't introduce bugs or exceptions - monkeypatched code can be a pain to work past in a stack trace.
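A minimal version of that parity check, continuing the `os.path.exists` example from above (a sketch - in practice this would live in your test module):

```python
import os

old_exists = os.path.exists

def fake_exists(path):
    if path.startswith("test://"):
        return True
    return old_exists(path)

# For any normal path, the patched function must agree with the original -
# if this ever fails, either our override leaked or upstream changed
for path in (os.getcwd(), "/definitely/not/a/real/path"):
    assert fake_exists(path) == old_exists(path), path
print("parity holds")
```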
But as long as you're aware of the risks, and you take care around them, monkeypatching can be another powerful ally when trying to make your code easier to use.
Which brings me back to nanodjango's problem with the `ModelBase` metaclass. If we look at the source, we'll see the error comes from `ModelBase.__new__`:
```python
class ModelBase(type):
    def __new__(cls, name, bases, attrs, **kwargs):
        ...
        if getattr(meta, "app_label", None) is None:
            if app_config is None:
                if not abstract:
                    raise RuntimeError(
                        "Model class %s.%s doesn't declare an explicit "
                        "app_label and isn't in an application in "
                        "INSTALLED_APPS." % (module, name)
                    )
            else:
                app_label = app_config.label
        ...
```
The `app_config` is `None`, it's not `abstract`, and we don't have an `app_label` in our `class Meta` definition, so we get the error. That last condition looks promising though - could we just tell nanodjango users that every model they define needs an `app_label`?

```python
class MyModel(models.Model):
    ...

    class Meta:
        app_label = "myscript"
```
If you try it you'll see that would work, but it's messy - I don't want to write that every time I define a model, and neither will my users. Monkeypatching to the rescue:
```python
# Keep a reference to the old __new__
old_new = ModelBase.__new__

def new_new(cls, name, bases, attrs, **kwargs):
    # See if this is a nanodjango model - that is, if it is defined in __main__
    module = attrs["__module__"]
    if module == "__main__":
        # It's a nanodjango model, so let's say it's from our new module...
        attrs["__module__"] = app_name

        # ... update a Meta class if it exists ...
        attr_meta = attrs.get("Meta")
        if attr_meta:
            if not getattr(attr_meta, "app_label", None):
                attr_meta.app_label = app_name

        # ... or create one if it doesn't
        else:
            class attr_meta:
                app_label = app_name
            attrs["Meta"] = attr_meta

    # Call the original ModelBase.__new__
    return old_new(cls, name, bases, attrs, **kwargs)

# Swap our function in so any new model will be created using our code
ModelBase.__new__ = new_new
```
This is the actual code from the current nanodjango `patch_modelbase()` function. If you look in that file you'll notice that I like to put my monkeypatches in functions which need to be called explicitly, rather than applying themselves during import - it makes it slightly clearer when and where things happen.
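That "explicit patch function" pattern looks something like this - a standalone sketch with hypothetical names, where `Greeter` stands in for the upstream code (nothing here is from nanodjango):

```python
class Greeter:
    """Stand-in for the upstream code we want to patch."""
    def greet(self, name):
        return f"Hello, {name}"

_old_greet = None

def patch_greeter():
    # Nothing is patched at import time - the caller opts in explicitly
    global _old_greet
    if _old_greet is not None:
        return  # idempotent: don't stack patches
    _old_greet = Greeter.greet

    def new_greet(self, name):
        # Defensive: only touch the case we're in charge of
        if name == "__main__":
            name = "myscript"
        # Always defer to the original for the real work
        return _old_greet(self, name)

    Greeter.greet = new_greet

patch_greeter()
print(Greeter().greet("__main__"))  # Hello, myscript
print(Greeter().greet("world"))     # Hello, world
```

Keeping `_old_greet` around also gives you an obvious route to an `unpatch_greeter()` if you ever need one.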
The approach I've taken here is very defensive - I only make changes if I know it's something I'm in charge of (a model defined in the `__main__` module can only ever come from nanodjango), and even then I'm careful not to disturb any `Meta.app_label` value which may already exist.
I disturb the original code as little as possible - I was lucky here that I found a route to fix it by setting `Meta.app_label` before it's checked. That's not always going to be the case, but you can usually find a way to accomplish your goals while still calling the original function. What you don't want to do is rewrite and replace the original function entirely - you could if you have no other option, but then you'll own it forever, and have to keep a much closer eye on it every time a new upstream version is released.
If you dig in nanodjango's monkeypatch file you'll spot there's also a patch to get migrations working - and you'll see it uses a very similar approach.
So that's metaclasses and monkeypatching. But that's still pretty tame, and I promised you dark secrets. Next week it's time to look at my tagging library, django-tagulous.