Syntactic Sugar vs Maintainability

Richard Terry

Hi I'm Richard Terry, CTO at Wildfish, a London-based Python and Django consultancy.

I'll be talking about using Python's syntactic sugar to make your code simpler to use, and the tradeoff that has for you when it comes to maintaining your code.

These are not techniques which I would normally use in my day job, because I don't want to inflict them on my colleagues.

But I do think they have their place, and that's when I'm writing library code, and trying to build clean APIs for others to use.

0:40

Zen of Python

The Zen of Python - for those of you who haven't read it - is a list of guiding principles; it basically says: write clean code.

The techniques I'm going to talk about help you do that by shifting the complexity away from what you're trying to write, making the code that's left behind simple and beautiful.

But that comes at a cost

1:10

Abuse of Python

The techniques themselves introduce complexity.

To let you write the simple code, you need to write something complex and ugly.

It will introduce implicit actions - magic which causes things to happen behind the scenes, which can lead to unexpected side effects and make it harder to debug when things go wrong.

Wise developers will say "oh, you should never do that", and there are very good reasons why you shouldn't.

But I think there are times when you should, because these techniques let you hide the complex ugly code away from the people who you want to make things easy for.

But it is a trade-off about where you want the complexity to be in your code.

Your challenge as author is to decide whether the advantages outweigh the disadvantages.

So, on to some of the techniques.

2:10

Decorators

@my_decorator
def my_function():
    ...

Decorators. Not controversial - I don't think anyone would say don't use a decorator - but it's a good place to start.

For those who haven't used them, decorators are where you put an @something before your function.

They're the easiest way to move boilerplate functionality away from the code you work on regularly - they let you process and alter the function's arguments and return values, and let you do things with the functions themselves, like manage registration or perform introspection.

2:50

def write_to_screen(var):
    if not isinstance(var, str):
        raise ValueError("Expected a string")
    print(var)

Briefly, a contrived example.

Say we have a function and want to ensure the argument is a string.

That's clear, but if we want to do that with 10 functions, we'll need to write those 2 lines 10 times. It starts to get messy, and introduces the risk of variations between functions.

3:15

@enforce_string
def write_to_screen(var):
    print(var)

One way of avoiding that would be to move the validation into a decorator

That's nice and clean - nothing cluttering the function, it only contains the actual logic for writing to the screen - it's really clear and easy to follow.

However, that already comes at a cost:

3:40

def enforce_string(fn):
    def wrap(var):
        if not isinstance(var, str):
            raise ValueError("Expected a string")
        return fn(var)
    return wrap

This is your decorator. For those who have written decorators before there's nothing shocking about it, but you can see already this isn't as clear to follow as it was before.

We've turned our 2 lines into 6.

But this is a very contrived example - things get worse in the real world.
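A quick aside while we're on decorators: a bare wrapper like this also throws away the original function's name and docstring, which makes debugging even harder. A minimal sketch of the same decorator, using functools.wraps to copy the metadata across:

import functools

def enforce_string(fn):
    @functools.wraps(fn)  # preserve fn's __name__ and __doc__ on the wrapper
    def wrap(var):
        if not isinstance(var, str):
            raise ValueError("Expected a string")
        return fn(var)
    return wrap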

4:00

Mara echo server

from mara import Service, events
service = Service()
@service.listen(events.Receive)
def receive(event):
    event.client.write(event.data)
if __name__ == '__main__':
    service.run()

http://radiac.net/projects/mara/

For a real world example, this is an echo server using Mara, a networking library I wrote

Mara uses decorators to register event handlers.

So here we have a receive function which receives an event and sends the same data back to the same client, and we bind that to the service using its listen decorator, which says "listen for a Receive event and pass it to this function".

About as simple and clear as networking can get.

But it comes at a cost

4:45

class Service(ClientContainer):
    def listen(self, event_class, handler=None):
        """
        Bind a handler to the specified event class, and to its subclasses
        """
        # Called directly
        if handler:
            if isinstance(handler, events.handler.HandlerType):
                handler = handler()
            self._listen(event_class, handler)
            return

        # Called as a decorator
        def decorator(fn):
            if isinstance(fn, events.handler.HandlerType):
                fn = fn()
            self._listen(event_class, fn)
            return fn
        return decorator

    def _listen(self, event_class, handler):
        """
        Internal method to recursively bind a handler to the specified event
        class and its subclasses. Call listen() instead.
        """
        # Recurse subclasses. Do it before registering for this event in case
        # they're not known yet, then they'll copy handlers for this event
        for subclass in event_class.__subclasses__():
            self._listen(subclass, handler)

        # Register class
        self._ensure_known_event(event_class)
        self.events[event_class].append(handler)

    def _ensure_known_event(self, event_class):
        """
        Ensure the event class is known to the service.

        If it is not, inherit handlers from its first base class
        """
        # If known, nothing to do
        if event_class in self._known_events:
            return

        # If base class isn't an event, nothing to do
        base_cls = event_class.__bases__[0]
        if not issubclass(base_cls, events.Event):
            return

        # Ensure base class is known, and copy its handlers
        self._ensure_known_event(base_cls)
        self.events[event_class] = self.events[base_cls][:]


class HandlerType(type):
    def __init__(self, name, bases, dct):
        super(HandlerType, self).__init__(name, bases, dct)

        # Collect handler functions, sorted by name
        self._handlers = [
            getattr(self, handler_name) for handler_name in sorted(dir(self))
            if handler_name.startswith('handler_')
        ]

        # Inherit missing docstrings
        if not self.__doc__:
            docbases = bases[:]
            for base in docbases:
                if issubclass(Handler, base):
                    # Either Handler or one of its bases - gone too far
                    continue
                if base.__doc__:
                    self.__doc__ = base.__doc__
                    break


@six.add_metaclass(HandlerType)
class Handler(object):
    """
    Class-based event handler
    """
    # Permanent list of all ordered handlers
    _handlers = None

    # Temporary handler queue, created for each event
    handlers = None

    # Reference to current event
    event = None

    # Reference to current container
    container = None

    def get_handlers(self):
        return self._handlers[:]

    def get_container(self, event):
        """
        Given the event, find the container so it can be made available
        """
        return event.service

    def __call__(self, event, *args, **kwargs):
        """
        Run all handlers
        """
        # Prepare handler context
        self.event = event
        self.container = self.get_container(event)

        # Load up clean queue of handlers and loop until they're all run
        self.handlers = self.get_handlers()
        while self.handlers:
            # Get next handler
            handler = self.handlers.pop(0)

            # Process
            if inspect.isgeneratorfunction(handler):
                # ++ python 3.3 has yield from
                generator = handler(self, event, *args, **kwargs)
                try:
                    next(generator)
                except StopIteration:
                    pass
                else:
                    while True:
                        try:
                            try:
                                raw = yield
                            except Exception as e:
                                generator.throw(e)
                            else:
                                generator.send(raw)
                        except StopIteration:
                            break
                # ++ end python 2.7 support
            else:
                handler(self, event, *args, **kwargs)

            # Option to terminate event
            if event.stopped:
                self.handlers = []

        # Clean up
        self.event = None
        self.container = None
        self.handlers = []

Nearly 150 lines of code - and there's no networking in here; it's just what the decorator does to make sure an event reaches the correct function.

So that one line decorator is hiding a lot of complexity, and if there's a problem in here, it's going to be pretty difficult for someone using Mara to figure out what's gone wrong.

By making the choice to simplify Mara's API, I've significantly raised the barrier to entry for any potential contributors.

But I'm happy with that - my goal is to write a library which makes networking easier, and this achieves that.

But decorators are generally considered OK. Let's get a little darker.

5:30

Metaclasses

class MyMetaclass(type):
    def __init__(self, name, bases, dct):
        super().__init__(name, bases, dct)
        print("Defined")

class MyClass(metaclass=MyMetaclass):
    def __init__(self):
        print("Initialised")

Metaclasses.

They are essentially invisible decorators for classes. The way it'll normally work is you define a base class with a custom metaclass, then every class which inherits from the base will have the same metaclass.

Python classes have __init__ to let you declare what happens when an instance of your class is initialised, as you can see at the bottom.

A metaclass is similar, but its __init__ lets you control things when the class is defined, as you can see at the top. There's also a __new__ method which lets you manipulate the class before it is created, but we don't need to worry about that now.

This means you can manipulate class attributes and methods - set and update defaults, or enforce restrictions like checking for required class attributes or performing type checks.

And again, like decorators, you can use it for introspection, or...

6:25

Class Registration

registry = {}

class RegistryType(type):
    def __init__(cls, name, bases, attrs):
        super().__init__(name, bases, attrs)
        if attrs.get("abstract", False):
            return
        registry[name] = cls

class RegistryBase(metaclass=RegistryType):
    abstract = True

class First(RegistryBase):
    ...

Registration.

What we want to do is create a base class for all our registered objects - but it's not going to do anything itself, so we don't want it to end up in our registry; let's set abstract = True.

That line means nothing to Python until you tie it to our metaclass, defined at the top. That says: when one of its classes is defined, check the class attributes, and if someone has set abstract = True, don't add it to the registry.

This is a pretty common pattern - I've used it myself in several different projects, and you'll frequently come across it in the wild without realising it.
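To make the magic visible, here's roughly what you'd see if you inspected the registry after the slide's code runs - a quick sketch, assuming everything lives in one module:

class Second(RegistryBase):
    ...

print(registry)
# {'First': <class '__main__.First'>, 'Second': <class '__main__.Second'>}
# RegistryBase itself is absent - it opted out with abstract = True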

A great example is in Django models.

7:25

Django models

from django.apps import apps

class ModelBase(type):
    def __init__(cls, name, bases, attrs):
        super().__init__(name, bases, attrs)
        if not attrs.get("abstract", False):
            apps.register(name, cls)

class Model(metaclass=ModelBase):
    abstract = True

class Cat(Model):
    name = models.CharField(...)

Obviously this is very simplified, I've stripped everything else out, but this is essentially what you'll find in django.db.models.base.

This works really well for Django - it needs to register the models for migrations and foreign key lookups, and this is a great way to make it easy for the user to work with model objects.

That said, it's not the only way.

7:55

# Decorator
@apps.register
class Cat(Model):
    name = models.CharField(...)

# Call
class Dog(Model):
    name = models.CharField(...)

apps.register(Dog)

You could just as easily register it using a decorator or an explicit function call, both of which you'll find used elsewhere in Django - the admin site or template tags, for example.

So why use a metaclass over the other options? There's no reason other than it saves you a line of boilerplate - it's one less thing that users need to remember to do, and one less thing to clutter up their source.

So if they're so good, why not use them everywhere?

8:40

class Cat(Model):
    name = models.CharField(...)

They're too magical. Without knowing what we've just talked about, you'd look at that code and have no idea how Django is going to find out about your model.

More importantly, it's not clear how you'd control that, which is a problem for template tags when you want multiple template processors, or admin models when you want multiple admin sites. You don't necessarily want them all to magically register with the same thing.

Metaclasses exist to hide explicit logic - in this case, registration is now implicit, behind the scenes - you define a model class and suddenly Django just magically knows about it.

Which is why the Zen of Python discourages this sort of thing in the first place.

But, let's push on.

9:40

Monkeypatching

import os

def fake_exists(path):
    return True

os.path.exists = fake_exists

For those who haven't come across it before, monkeypatching is changing code at runtime.

The most common reason to do this is probably in testing, where you want to make a temporary change to stub something out, as in the example, where we're going to trick our code into thinking any path exists.
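In tests you'd usually reach for unittest.mock.patch rather than assigning by hand, because it restores the original for you when the block exits - a minimal sketch:

import os.path
from unittest import mock

# The patch only applies inside the with block
with mock.patch("os.path.exists", return_value=True):
    assert os.path.exists("/no/such/path")

# Outside the block, the real function is back
assert not os.path.exists("/no/such/path")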

You might also want to do it so that your code works with someone else's; their code doesn't do what you want, but for whatever reason you don't want to fork it or submit a PR.

All we're doing is basically overwriting a variable so that when other code tries to use it, it finds our new value or function which does what we want.

But, this introduces potential problems.

10:50

The risks

Firstly you have to be careful not to break other people's code.

When somebody calls os.path.exists they're expecting a boolean back, which will be True if the path exists and False if it doesn't. As long as your changes don't break that, you're OK - but that's something you need to be very careful about as your patches get more complicated.

Tests can really help here, especially using something like hypothesis to help cover edge cases. Run the same tests against the old and new functions and check everything matches.
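As a sketch of what that might look like - hypothesis generates the inputs, and we check that our replacement (my_exists here is a hypothetical stand-in for your patch) agrees with the original:

from hypothesis import given, strategies as st
import os.path

def my_exists(path):
    # hypothetical replacement - whatever clever thing your patch does
    return os.path.exists(path)

@given(st.text())
def test_patched_matches_original(path):
    assert my_exists(path) == os.path.exists(path)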

The second issue is that if something goes wrong it makes it harder to debug. If the original os.path.exists raises an exception, then our new function will appear in the stack trace, which can be a bit confusing.

But as long as you're aware of the risks, and you take care around them, monkeypatching can be another powerful ally when trying to make your code easier to use.

So lets see how these techniques come together in a real project.

12:15

Django Tagulous

from django.db import models
import tagulous

class Person(models.Model):
    name = models.CharField(max_length=255)
    skills = tagulous.models.TagField()

http://radiac.net/projects/django-tagulous/

Tagulous is a tagging library I wrote for Django database models. It basically lets you categorise objects in a database, by associating them with one or more tags.

There were several tagging libraries out there, but they used generic foreign keys - a Django-specific way to loosely relate objects in the database. The database itself doesn't understand those relationships, which makes them slow and difficult to work with.

One day I got fed up with them and decided that I wanted to use native many to many relationships, and thought... how hard can it be?

And so I wrote Tagulous, which is both an easy and powerful tagging library to use, and also a perfect case study in how to build a package which is difficult to maintain.

13:25

Where to start

Person.objects.create(
    name="Anne",
    skills="run, jump, kung fu",
)

Person.objects.create(
    name="Bob",
    skills=["bake", "run"],
)

I wanted to be able to set categories with a list or a comma-separated string. For that I'm going to need a custom field, so that's the obvious place to start.

I also want it to be an M2M relationship, so it makes sense to...

13:50

class TagField(models.ManyToManyField):
    ...

class Tag(models.Model):
    ...

class Person(models.Model):
    name = models.CharField(max_length=255)
    skills = TagField(Tag)

subclass the ManyToManyField. So far so good.

But what are we relating it to? An M2M field requires an explicit model, which means my users will need to write boilerplate for creating the tag model.

The other tagging libraries get around this by using a GFK - they store the tags in a single shared model and create the loose relationships out from that.

But that's not going to work for me, because I want a proper M2M, and I don't want more boilerplate than the other libraries.

So I figured, let's create a unique model automatically, at runtime.

14:50

Creating classes at runtime

Defined when imported:

class Tag(models.Model):
    name = models.CharField(max_length=255)

Defined at runtime:

Tag = type(
    "Tag",
    (models.Model,),
    {
        "name": models.CharField(max_length=255),
    },
)

It turns out that Python is really good at this because the standard class syntax is basically just syntactic sugar for a call to type.

We've got the class name, the list of base classes, and a dictionary of its attributes. And these will get passed straight into Django's model metaclass which we looked at earlier, ensuring that the model is automatically registered.

So we can call type directly, and pass in any arguments we need. But where?

15:40

class TagField(models.ManyToManyField):
    def __init__(self, *args, **kwargs):
        self.model = type(
            unique_name,  # ????
            (models.Model,),
            {
                "name": models.CharField(max_length=255),
            },
        )

My first thought was to put it on the field's __init__, but

we need each field to have its own tag model, and each tag model needs a unique but consistent name - something based on the tagged model and the name of the tag field.

But __init__ won't have that information.

16:10

Metaclass magic

class ModelBase(type):
    def __init__(cls, name, bases, attrs):
        super().__init__(name, bases, attrs)
        for field_name, field in attrs.items():
            field.contribute_to_class(cls, field_name)

class TagField(models.ManyToManyField):
    def contribute_to_class(self, cls, field_name):
        ...

        self.remote_field.model = type(
            f"Tags_{cls.__name__}_{field_name}",
            (models.Model,),
            {
                "name": models.CharField(max_length=255),
            },
        )

But metaclasses come to the rescue again.

It turns out that because certain Django model fields need to know something about the model they're attached to, the Django metaclass is already set up to call contribute_to_class on each field.

So the metaclass at the top calls contribute_to_class at the bottom with the data we need to generate a unique model name, dynamically create the model, and then monkeypatch the model attribute on the field so that it thinks it has always been pointing at our model.

But

This is what I mean when I say that making your code nice for your users comes at a cost to you.

This was all just to save our users two lines of code when initialising a tag field, and the actual code is a lot more complex than this.

And the deeper I got into Tagulous, the worse things got. If you want to see more monkeypatching insanity, have a look through the source code - there are gems such as...

17:15

Monkeypatching class inheritance

class TagField(models.ManyToManyField):
    def contribute_to_class(self, cls, field_name):
        ...

        cls.__bases__ = (TaggedModel,) + cls.__bases__

... injecting a new base class underneath our user's model, so that we can override how the normal model base class works to make it seem like tag field support was built into Django itself.
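If that sounds implausible, here's a standalone sketch of the same trick with plain classes - note that CPython won't let you reassign __bases__ on a class that inherits directly from object, hence the shared Base:

class Base:
    pass

class UserModel(Base):
    pass

class Mixin(Base):
    def tags(self):
        return ["injected"]

# Slide a new base class in underneath the already-defined class
UserModel.__bases__ = (Mixin,) + UserModel.__bases__

print(UserModel().tags())  # ['injected']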

There's also....

17:45

Switching a class

class TaggedModelManager(models.Manager):
    def filter(self, *args, **kwargs):
        ...

class TagField(models.ManyToManyField):
    def contribute_to_class(self, cls, field_name):
        ...

        orig_manager = cls.objects.__class__
        cls.objects.__class__ = type(
            str('CastTagged%s' % orig_manager.__name__),
            (TaggedModelManager, orig_manager),
            {},
        )

swapping the class of an instantiated object. Obviously this is going to end badly if you're not careful, but it has the advantage over changing base classes that our new functionality now appears ahead of anything in the original class - again, to make it look like the tag field is natively supported by Django.
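Again, a tiny standalone sketch of the idea - build a new class with the mixin first in the MRO, then swap it onto the live instance:

class TagMixin:
    def filter(self):
        return "tag-aware filter"

class Manager:
    def filter(self):
        return "original filter"

m = Manager()
# Reassign __class__ so the mixin's methods win lookups from now on
m.__class__ = type("CastTaggedManager", (TagMixin, Manager), {})
print(m.filter())  # tag-aware filter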

And there's a lot more in there too - code to work with Django's serializers and admin site. What started out as a simple idea took me down a very deep rabbit hole.

I've tried to mitigate these risks with test coverage, code comments, and documentation for potential contributors, but there's no getting away from the fact there's some pretty hairy stuff in there.

That said, I find it useful in my projects, and other people seem to like it, so I think it was worth the effort.

But it is possible to take these things too far, and I can't finish this talk without giving you an example.

Introducing...

19:15

Perl in Python

$ pip install perl
$ python
>>> import perl
>>> value = "Hello there"
>>> if value =~ /^hello (.+?)$/i:
...     print("Found greeting:", $1)
...
Found greeting: there
>>> value =~ s/there/world/
>>> print(value)
Hello world

http://radiac.net/projects/python-perl/

... my new perl module for Python, bringing the elegance of Perl's native regular expression syntax into your daily Python code.

This works and is up on PyPI now - it essentially monkeypatches the Python interpreter to add Perl regular expression syntax to the language.

It is, by any measure, an abomination - an exercise in what you definitely shouldn't do, even though python lets you.

It is full of horrors such as...

20:00

import builtins
import re

builtins.__dict__["re"] = re

injecting Python's regular expression module into every imported file

...

20:10

$foo = "snakes are great"
$foo =~ s/snake/camel/i

becomes

__perl__var__foo = "snakes are great"
__perl__var__foo = __perl__reset_vars() or re.sub(
    r'snake', 'camel', __perl__var__foo, count=1, flags=re.I,
)

using import hooks to pre-parse the source as it's imported, so that it can be rewritten into plain Python before the real interpreter sees it.

That lets us replace variables and rewrite entire lines of code on the fly, which is about as sensible as it sounds.
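For the curious, the general shape of an import hook like that looks something like this - a minimal sketch, not python-perl's actual code; rewrite_perl_syntax is a hypothetical translator:

import importlib.abc
import importlib.machinery
import sys

class RewritingLoader(importlib.abc.SourceLoader):
    def __init__(self, path):
        self.path = path

    def get_filename(self, fullname):
        return self.path

    def get_data(self, path):
        # Rewrite the raw source before Python ever compiles it
        with open(path, "rb") as f:
            source = f.read()
        return rewrite_perl_syntax(source)  # hypothetical translator

class RewritingFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        spec = importlib.machinery.PathFinder.find_spec(fullname, path)
        if spec and spec.origin and spec.origin.endswith(".py"):
            spec.loader = RewritingLoader(spec.origin)
        return spec

# Install our finder ahead of the defaults
sys.meta_path.insert(0, RewritingFinder())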

And the cost of introducing this syntactic sugar is obvious. What happens when we hit a problem, and an error gets raised in the code we've generated?

20:35

>>> print($1)
Traceback (most recent call last):
File "<console>", line 1, in <module>
NameError: name '__perl__var__1' is not defined

Suddenly your error messages don't make any sense without an in-depth knowledge of how the module works.

Now, what I need to do there is write a global exception handler to parse and rewrite the message - but that's probably something for another talk.
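If you're wondering what that might look like, the rough shape could be something like this - a hedged sketch using sys.excepthook, not what the module actually does today:

import re
import sys

def perl_excepthook(exc_type, exc, tb):
    # Translate mangled internal names back into perl syntax
    if exc.args and isinstance(exc.args[0], str):
        exc.args = (re.sub(r"__perl__var__(\w+)", r"$\1", exc.args[0]),)
    sys.__excepthook__(exc_type, exc, tb)

sys.excepthook = perl_excepthook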

So, obviously this is taking these ideas too far, but where is the line?

When should you consider these techniques?

21:10

When to do these things

These techniques come with a cost, and you shouldn't use them unless you are willing to pay the price.

But in my opinion, there are times when it is worth it.

Do it for your users - library code you're releasing should be easy to use.

This isn't necessarily purely altruistic, or even just for projects you're going to release - because often the person using the library in the future will be you. By centralising the boilerplate and hacks into a reusable library or a core module in your project, you'll make the rest of your code easier to follow.

But always bear in mind the tradeoffs. It makes it harder for people new to the project to understand how it works and how to contribute, and it makes it harder to follow and debug.

22:00

Managing the cost

What you don't want is to create a terrifying monster sat in the corner of your code that nobody will ever go near. Keep the hairy bits small and well-defined - ideally in self-contained, clearly-labelled files like monkeypatch.py, or at least in self-contained functions - and leave plenty of comments to explain what you're doing and why you're doing it.

And most of all, cover it with tests - test your code, and test the things your code might affect. Otherwise you'll give other people nightmares, and if you ever need to change it, without tests all is lost.

These techniques are not for every project - don't do them just because you can.

But they show that with some effort up front, you can make your code, or your users' code, much cleaner and easier to write and maintain. The question is: how far are you willing to go?

23:15

Thank you

radiac.net/pycon2019

If you'd like to find out more, later today I'll put up some notes and links at radiac.net/pycon2019

I don't think we have time for questions, but I'll be around for the rest of the conference so do come and find me if you'd like to chat.

Thank you for listening.
