· 7 years ago · Sep 23, 2018, 11:28 AM
# Upgrading auth.User - the profile approach

This proposal presents a "middle ground" approach to improving and refactoring `auth.User`, based around a new concept of "profiles". These profiles provide the main customization hook for the user model, but the user model itself stays concrete and cannot be replaced.

I call it a middle ground because it doesn't go as far as refactoring the whole auth app -- a laudable goal, but one that I believe will ultimately take far too long -- but goes a bit further than just fixing the most egregious errors (username length, for example).

This proposal includes a fair number of design decisions -- you're reading the fifth or sixth draft. To keep things clear, the options have been pruned out and only the one I think is the "winner" is still there. But see the FAQ at the end for some discussion and justification of various choices.

## The User model

This proposal vastly pares down the User model to the absolute bare minimum and defers all "user field additions" to a new profile system.

The new User model:

    class User(models.Model):
        identifier = models.CharField(unique=True, db_index=True)
        password = models.CharField(default="!")

Points of interest:

* The `identifier` field is an arbitrary identifier for the user. For the common case it's a username. However, I've avoided the term `username` since this field can be used for anything -- including, most notably, an email address for the common case of login-by-email.

* The `identifier` field is unique and indexed. This is to optimize for the common case: looking up a user by name/email/whatever. This does mean that different auth backends that don't have something like a username will still need to figure out and generate something to put in this field.

* If possible, `identifier` will be an unbounded varchar (something supported by most (all?) dbs, but not by Django). If not, we'll make it `varchar(512)` or something. The idea is to support pretty much anything as a user identifier, leaving it up to each user to decide what's valid.

* Password's the same -- if possible, make it unbounded to be as future-proof as possible. If not, we'll make it `varchar(512)` or something.

* There's no validation on identifier, but the profile system allows individual profiles to contribute site-specific constraints. See below.

* Why have an "identifier" at all? Why not just leave it up to the profiles? Most uses will have a primary "login identifier" -- username, email, URL, etc. -- and making that something that 3rd-party apps can depend on is probably good. Making it indexed means the common case -- look up user by identifier -- is as fast as possible.

* Why have a password at all? Because if we don't, users will invent their own password management and storage, and that's a loaded gun pointed at their feet. However, `password` newly defaults to "!", which is the unusable password. Thus, if an auth backend doesn't use passwords, it can ignore the password field; the user object will automatically be marked as one that can't be auth'd by password.
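
To make the "unusable password" default concrete, here's a minimal sketch in plain Python (no Django). `SketchUser`, its method names, and the hash format are all hypothetical stand-ins for illustration; the only thing taken from the proposal is that a password of `"!"` means "can't be auth'd by password".

```python
import hashlib
import hmac
import os

UNUSABLE_PASSWORD = "!"  # the proposal's default for User.password


class SketchUser:
    """Hypothetical stand-in for the pared-down User; not Django's model."""

    def __init__(self, identifier, password=UNUSABLE_PASSWORD):
        self.identifier = identifier
        self.password = password

    def has_usable_password(self):
        # Anything starting with "!" can never match a real hash.
        return not self.password.startswith(UNUSABLE_PASSWORD)

    def set_password(self, raw):
        # Illustrative hash format only: algorithm$salt$digest.
        salt = os.urandom(16).hex()
        digest = hashlib.pbkdf2_hmac(
            "sha256", raw.encode(), salt.encode(), 100_000
        ).hex()
        self.password = f"pbkdf2_sha256${salt}${digest}"

    def check_password(self, raw):
        if not self.has_usable_password():
            return False  # backend never set a password
        _, salt, digest = self.password.split("$")
        candidate = hashlib.pbkdf2_hmac(
            "sha256", raw.encode(), salt.encode(), 100_000
        ).hex()
        return hmac.compare_digest(candidate, digest)
```

A backend that doesn't use passwords just creates `SketchUser("http://joe.example.com/")` and never touches `password`; password login then fails for that user without any extra bookkeeping.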

## Profiles

OK, so if User gets neutered, all the user data needs to go somewhere... that's where profiles come in. Don't think about `AUTH_USER_PROFILE`, which is weaksauce; this proposed new profile system is a lot more powerful.

Here's what a profile looks like:

    from django.contrib.auth.models import Profile
    from django.db import models

    class MyProfile(Profile):
        first_name = models.CharField()
        last_name = models.CharField()
        homepage = models.URLField()

Looks pretty simple, and it is. It's just syntactic sugar for the following:

    class MyProfile(models.Model):
        user = models.OneToOneField(User)
        ...

That is, a `Profile` subclass is just a model with a one-to-one back to user.

*HOWEVER*, we can do a few other interesting things here:

### Multiple profiles

First, `User.get_profile()` and `AUTH_USER_PROFILE` go die in a fire. See below for backwards-compatibility concerns.

Thus, it should be obvious that supporting multiple profiles is trivial. In fact, it's basically a requirement since the auth app is going to need to ship with a profile that includes all the legacy fields (permissions, groups, etc), and that clearly can't be the only profile. So multiple profile objects: fully supported.

### Auto-creation of profiles

Right now, one problem with the profile pattern is that when users are created you've got to create the associated profile somehow or risk `ProfileDoesNotExist` errors. People work around this with `post_save` signals, `User` monkeypatches, etc.

The new auth system will auto-create each profile when a user is created. If new profiles are added later, those profile objects will be created lazily (when they're accessed for the first time).

This behavior can be disabled:

    class MyProfile(Profile):
        ...

        class Meta(object):
            auto_create = False
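
One way the lazy half of auto-creation could work is a descriptor that builds the profile on first access. This is a rough sketch in plain Python, not the proposed implementation: `LazyProfile`, `SketchUser`, and both profile classes are hypothetical, and `auto_create` is modeled as a plain class attribute rather than a `Meta` option.

```python
class LazyProfile:
    """Descriptor that creates a profile the first time it's accessed."""

    def __init__(self, profile_cls):
        self.profile_cls = profile_cls

    def __set_name__(self, owner, name):
        self.cache_name = "_" + name

    def __get__(self, user, owner=None):
        if user is None:
            return self
        profile = user.__dict__.get(self.cache_name)
        if profile is None:
            if not getattr(self.profile_cls, "auto_create", True):
                # Opted out: behave like today's "profile does not exist".
                raise LookupError(
                    f"{self.profile_cls.__name__} does not exist for this user"
                )
            profile = self.profile_cls(user)  # stands in for an INSERT
            user.__dict__[self.cache_name] = profile
        return profile


class NewsletterProfile:
    auto_create = True

    def __init__(self, user):
        self.user = user
        self.subscribed = False


class BillingProfile:
    auto_create = False  # mirrors Meta.auto_create = False

    def __init__(self, user):
        self.user = user


class SketchUser:
    newsletter = LazyProfile(NewsletterProfile)
    billing = LazyProfile(BillingProfile)

    def __init__(self, identifier):
        self.identifier = identifier
```

With this, `SketchUser("joe").newsletter` quietly creates the profile, while `.billing` raises until somebody creates one explicitly.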

### Extra user validation

Profiles may contribute extra validation to the User object. For example, let's say that for my site I want to enforce the rule that `User.identifier` is a valid email address (thus making the built-in login forms require emails to log in):

    from django.core import validators

    class MyProfile(Profile):
        ...

        def validate_identifier(self):
            # validate_email raises ValidationError for a malformed address.
            validators.validate_email(self.user.identifier)

That is, we get a special callback, `validate_identifier`, that lets us contribute validation to identifier. This looks a bit like a model validator function, and that's the point. `User` will pick up this validation function in its own validation, and thus that'll get passed down to forms and errors will be displayed as appropriate.
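
Here's a small sketch of how `User` might pick up those callbacks, in plain Python. Everything here is hypothetical scaffolding (`ValidationError`, `SketchUser`, `EmailIdentifierProfile`, and the deliberately crude email regex); the only idea borrowed from the proposal is that each profile may define `validate_identifier` and the user's own validation runs them all.

```python
import re


class ValidationError(Exception):
    """Stand-in for Django's ValidationError."""


class EmailIdentifierProfile:
    def __init__(self, user):
        self.user = user

    def validate_identifier(self):
        # Deliberately crude check; real email validation is stricter.
        if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", self.user.identifier):
            raise ValidationError("identifier must be an email address")


class PlainProfile:
    """A profile that contributes no identifier validation."""

    def __init__(self, user):
        self.user = user


class SketchUser:
    def __init__(self, identifier, profile_classes):
        self.identifier = identifier
        self.profiles = [cls(self) for cls in profile_classes]

    def full_clean(self):
        # Collect errors from every profile that defines the hook.
        errors = []
        for profile in self.profiles:
            hook = getattr(profile, "validate_identifier", None)
            if hook is None:
                continue
            try:
                hook()
            except ValidationError as exc:
                errors.append(str(exc))
        if errors:
            raise ValidationError("; ".join(errors))
```

Calling `SketchUser("joe", [EmailIdentifierProfile, PlainProfile]).full_clean()` raises, while the same call with `"joe@example.com"` passes.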

### Profile data access from User

There are two ways of accessing profile data given a user: directly through the one-to-one accessor, and indirectly through a user data bag.

Direct access is simple: since `Profile` is just syntactic sugar for a one-to-one field, given a profile...

    class MyProfile(Profile):
        name = models.CharField()

... you can access it as `user.myprofile.name`.

The accessor name can be overridden via a Meta option:

    class MyProfile(Profile):
        ...

        class Meta(object):
            related_name = 'myprof'

[Alternatively, if this is deemed too magical, we could require users to manually specify the `OneToOneField` and provide `related_name` there.]

This method is explicit and obvious to anyone who understands that a profile is just an object with a one-to-one relation to user.

However, it requires the accessing code to know the name of the profile class providing a piece of data. This starts to fall apart when it comes to reusable apps: I should be able to write an app that has a requirement like "some profile must define a `name` field for this app to function." Thus, users expose a second interface for profile data: `user.data`. This is an object that exposes an amalgamated view onto all profile data and allows access to profile data without knowing exactly where it comes from.

For example, let's imagine two profiles:

    class One(Profile):
        name = models.CharField()
        age = models.IntegerField()

    class Two(Profile):
        name = models.CharField()
        phone = models.CharField()

And some data:

    user.one.name = "Joe"
    user.one.age = 17
    user.two.name = "Joe Smith"
    user.two.phone = "555-1212"

Let's play:

    >>> user.data["age"]
    17

    >>> user.data["phone"]
    "555-1212"

    >>> user.data["spam"]
    Traceback (most recent call last):
    ...
    KeyError: 'spam'

    >>> user.data["name"]
    "Joe"

Notice that both profiles are collapsed. This means that if there's an overlapping name, I only get one profile's data back. Which? By default it's undefined and arbitrary, but users can set an `AUTH_PROFILES` setting to control order; see below. If `AUTH_PROFILES` is set, the **first** profile defining a given key will be returned.

If you need to get *all* values for an overlapping key, you can use `user.data.dict`:

    >>> user.data.dict("name")
    {"one": "Joe", "two": "Joe Smith"}

Setting data works; however, "in the face of ambiguity, refuse the temptation to guess":

    >>> user.data["age"] = 24
    >>> user.one.age
    24

    >>> user.data["name"] = "Joe"
    Traceback (most recent call last):
    ...
    KeyError: "name" overlaps on multiple profiles; use `user.one.name = ...` or `user.two.name = ...`

Like all models, just setting `user.data` keys doesn't actually save the associated profile back to the db. For that, use `user.data.save()`. This saves all associated profiles (or perhaps just modified ones if we're feeling fancy).
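
The read/write rules for `user.data` are easy to prototype without a database. This is a sketch under assumptions, not the real thing: `DataBag` is a hypothetical name, profiles are faked with `SimpleNamespace`, the dict's insertion order stands in for `AUTH_PROFILES` order, and `save()` is left out entirely.

```python
from types import SimpleNamespace


class DataBag:
    """Hypothetical amalgamated view over an ordered set of profiles."""

    def __init__(self, profiles):
        # profiles: accessor name -> profile object, in priority order.
        self._profiles = profiles

    def _owners(self, key):
        return [name for name, prof in self._profiles.items() if key in vars(prof)]

    def __getitem__(self, key):
        owners = self._owners(key)
        if not owners:
            raise KeyError(key)
        # Reads never fail on overlap: the first profile wins.
        return getattr(self._profiles[owners[0]], key)

    def __setitem__(self, key, value):
        owners = self._owners(key)
        if not owners:
            raise KeyError(key)
        if len(owners) > 1:
            # Writes refuse to guess which profile was meant.
            raise KeyError(
                f"{key!r} overlaps on multiple profiles: {', '.join(owners)}"
            )
        setattr(self._profiles[owners[0]], key, value)

    def dict(self, key):
        return {name: getattr(self._profiles[name], key) for name in self._owners(key)}


one = SimpleNamespace(name="Joe", age=17)
two = SimpleNamespace(name="Joe Smith", phone="555-1212")
data = DataBag({"one": one, "two": two})
```

Note the asymmetry, which matches the examples above: reading an overlapping key quietly returns the first profile's value, but writing one is an error.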

### Querying against profile data

Making queries against profiles works much like accessing profile data. Since profiles are sugar for one-to-ones, you can always simply do:

    User.objects.filter(prof1__field1="foo", prof2__field2="bar")

However, just like with data access, reusable apps may need the ability to make queries against profile data without knowing which profile holds a field. That looks like this:

    User.objects.filter(data__field1="foo", data__field2="bar")

This `data__` syntax also works for `order_by()`, etc.

Once again, "in the face of ambiguity, refuse the temptation to guess": if a data field is duplicated, you'll get an exception if you try to query against it.
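
One plausible way to implement `data__` lookups is to rewrite them into ordinary one-to-one lookups before the query is built. The function below is a hypothetical sketch of that rewriting step only; `resolve_data_lookups` and the `profile_fields` mapping are names made up for illustration, and nothing here touches Django's ORM.

```python
def resolve_data_lookups(filters, profile_fields):
    """Rewrite data__<field> keys to <profile>__<field>.

    filters: the keyword arguments passed to filter().
    profile_fields: accessor name -> set of field names on that profile.
    """
    resolved = {}
    for lookup, value in filters.items():
        if not lookup.startswith("data__"):
            resolved[lookup] = value  # ordinary lookup, leave untouched
            continue
        field, _, rest = lookup[len("data__"):].partition("__")
        owners = [name for name, fields in profile_fields.items() if field in fields]
        if not owners:
            raise LookupError(f"no profile defines {field!r}")
        if len(owners) > 1:
            # Refuse the temptation to guess.
            raise LookupError(
                f"{field!r} is defined on multiple profiles: {', '.join(owners)}"
            )
        suffix = f"__{rest}" if rest else ""
        resolved[f"{owners[0]}__{field}{suffix}"] = value
    return resolved


profile_fields = {"one": {"name", "age"}, "two": {"name", "phone"}}
rewritten = resolve_data_lookups(
    {"data__age__gte": 18, "identifier": "joe"}, profile_fields
)
```

Here `rewritten` ends up with `one__age__gte` in place of `data__age__gte`, while a lookup on `data__name` raises because both profiles define `name`.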

### Performance optimization

One of the main criticisms I anticipate is that this approach introduces a potentially large performance hit. Code like this:

    user = User.objects.get(...)
    user.prof1.field
    user.prof2.field
    user.prof3.field

could end up doing 4 queries. This could be even worse if we went with the magic attributes discussed in the FAQ: those DB queries would be effectively hidden.

Luckily this is fairly easy to optimize: allow user queries to pre-join onto all profile fields. That is, instead of `SELECT * FROM user` do `SELECT user.*, prof1.* FROM user JOIN prof1`. Since profiles all subclass `Profile` it's trivial to know which models to do this to.

In other words, `User.objects.all()` works the same as `User.objects.select_related(*all_profile_fields)`. On many databases, this JOIN across indexed columns is nearly as fast as local column access. However, since there are situations where these JOINs aren't wanted, it's easy to turn off: `User.objects.select_related(None)`.
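
To see the difference in query counts, here's a toy demonstration using the standard-library `sqlite3` module. The table layout (`auth_user`, `prof1`, `prof2` with a unique `user_id`) is an assumption for illustration, as is the use of `LEFT JOIN` so that users whose profiles haven't been created yet still come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY, identifier TEXT UNIQUE);
    CREATE TABLE prof1 (user_id INTEGER UNIQUE REFERENCES auth_user(id), field TEXT);
    CREATE TABLE prof2 (user_id INTEGER UNIQUE REFERENCES auth_user(id), field TEXT);
    INSERT INTO auth_user VALUES (1, 'joe');
    INSERT INTO prof1 VALUES (1, 'a');
    INSERT INTO prof2 VALUES (1, 'b');
    """
)

executed = []  # every statement we send, so we can count round trips


def run(sql, params=()):
    executed.append(sql)
    return conn.execute(sql, params).fetchone()


# Lazy access: one query for the user, then one per profile touched.
user_id, identifier = run(
    "SELECT id, identifier FROM auth_user WHERE identifier = ?", ("joe",)
)
field1 = run("SELECT field FROM prof1 WHERE user_id = ?", (user_id,))[0]
field2 = run("SELECT field FROM prof2 WHERE user_id = ?", (user_id,))[0]
lazy_queries = len(executed)

# Pre-joined access: everything in a single round trip.
executed.clear()
joined_row = run(
    """
    SELECT u.identifier, p1.field, p2.field
    FROM auth_user AS u
    LEFT JOIN prof1 AS p1 ON p1.user_id = u.id
    LEFT JOIN prof2 AS p2 ON p2.user_id = u.id
    WHERE u.identifier = ?
    """,
    ("joe",),
)
joined_queries = len(executed)
```

With two profiles the lazy path issues three statements and the joined path issues one; each extra profile adds a query to the former and a JOIN to the latter.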

### Controlling which profiles are available: `AUTH_PROFILES`

`AUTH_PROFILES` is an optional setting that controls profile discovery. It's unset by default, and if left unset Django will simply assume that any `Profile` subclass in an app that's in `INSTALLED_APPS` is an installed profile. This is probably good enough for the common case. However, it falls down in two situations:

* If multiple profiles define the same fields, then the `user.data` accessor will find those fields in an arbitrary order.

* If users want to install an app with a profile they don't want, or if an app ships multiple profiles, etc.

In both of these cases, you can use the `AUTH_PROFILES` setting to control which profiles are considered installed, and in which "order". It's just a list pointing to profile classes:

    AUTH_PROFILES = ["path.to.OneProfile", "path.to.TwoProfile"]

If a profile isn't listed in the list but is a model in `INSTALLED_APPS`, the model will still get installed (the table will be there), but it won't be considered a profile. That means none of the special behavior -- `user.data`, performance optimization, etc. It's an error to have a model in `AUTH_PROFILES` that's not a `Profile` or not installed.
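
The discovery rules above boil down to a small function. This sketch assumes discovery has already produced a mapping of dotted paths to `Profile` subclasses; `resolve_profiles`, the `discovered` mapping, and the example classes are all hypothetical.

```python
def resolve_profiles(discovered, auth_profiles=None):
    """Return the active profile classes, in lookup order.

    discovered: dotted path -> class, for every Profile subclass found in
        INSTALLED_APPS (assumed to be computed elsewhere).
    auth_profiles: the AUTH_PROFILES setting, or None if unset.
    """
    if auth_profiles is None:
        # Unset: every discovered profile is active, order unspecified.
        return list(discovered.values())
    unknown = [path for path in auth_profiles if path not in discovered]
    if unknown:
        # Listing something that isn't an installed Profile is an error.
        raise ValueError(f"not installed profiles: {', '.join(unknown)}")
    # Listed order is the lookup order; unlisted profiles are ignored.
    return [discovered[path] for path in auth_profiles]


class OneProfile: ...
class TwoProfile: ...
class UnwantedProfile: ...


discovered = {
    "path.to.OneProfile": OneProfile,
    "path.to.TwoProfile": TwoProfile,
    "other.app.UnwantedProfile": UnwantedProfile,
}
active = resolve_profiles(discovered, ["path.to.TwoProfile", "path.to.OneProfile"])
```

In this example `active` is `[TwoProfile, OneProfile]`: `UnwantedProfile` still has a table, but it's invisible to `user.data` and the pre-join optimization.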

## Auth backends

Auth backends continue to work almost exactly as they did before. Most notably, they'll still need to return an instance of `django.contrib.auth.models.User`, and that user will require some sort of unique identifier.

However, auth backends now *can* take profiles into account, which means that things like OpenID backends can have an `OpenIDProfile` and store the URL field there (or use the URL as the `identifier`, perhaps).

## Forms

Under the new system, if you simply create a model form for user:

    class UserForm(ModelForm):
        class Meta:
            model = User

... you'll get a form that only has `identifier` and `password` fields.

Thus, Django will ship with a convenience form, `django.contrib.auth.forms.UserWithProfilesForm`, that automatically brings in all profile fields onto a single form and properly saves users and their profiles. This'll be useful for registration. We may also need to give this form a hook to only include particular profiles; that's TBD.

There's also a set of existing user forms that're used for login, password changing, etc. These'll stay the same, although they'll switch what data they talk to a bit.

## Backward compatibility

The big one, of course.

First, there's deprecation to consider. `AUTH_USER_PROFILE` and `user.get_profile()` will simply be removed. Access to attributes directly on `User` objects (`user.is_staff`, etc.) will be deprecated and replaced by `user.data` and/or `user.defaultprofile` attributes. Deprecation will be according to the normal schedule: polite warnings in 1.5, more urgent ones in 1.6, and outright removal in 1.7.

[If it turns out that this schedule causes pain for some users we might consider a longer deprecation cycle for these things.]

After that, there are two facets here: an easy one and a hard one. Let's do the easy one first:

### The "default profile"

Many, many apps rely on existing user fields (`user.is_staff`, `user.permissions`, etc.) -- the admin for one! The fields need to stick around at least for the normal deprecation period, and possibly for longer. Thus, we'll ship with a `DefaultProfile` that includes all the old removed fields, and we'll include sugar such that `user.username`, `user.is_staff`, and all that stuff continues to work.

Django will ship with backwards-compatible shims for this default profile. Data access (`user.is_staff`, etc.) will continue to work, as will support in queries (`User.objects.filter(is_staff=True)`). This'll get deprecated according to the normal schedule.

[We might want to come up with a better name than `DefaultProfile`. If we plan on deprecating the object, maybe `LegacyProfile` is more appropriate.]

At some point, people may want to remove the default profile; they can do so by using `AUTH_PROFILES`. Obviously some stuff won't work -- the admin, again -- but if people turn off the default profile they should be prepared to deal with those changes.

### Model migration

This one's the big one: there has to be a model migration. I'm not tied to the solution below, but there *are* a couple of rules this process needs to follow:

1. This migration cannot block on getting schema migration into core. It'd be great if we could leverage the migration tools, but we can't block on that work.

2. Until the new auth behavior is switched on, Django 1.5 has to be 100% backwards compatible with 1.4. That is, we need something similar to the `USE_TZ` setting behavior: until you ask for the new features, you get the old behavior. This decouples upgrading Django from upgrading auth, and makes the whole upgrade process much lower-risk. If we don't do this, we're effectively requiring downtime for a schema migration from all our users, and that's not OK.

Given those rules, here's my plan:

Django 1.5 ships with the ability to run in two "modes": legacy user mode, and new user mode. There's no setting to switch modes; the mode is determined by looking at the database: if `auth_user` has an `identifier` field, then we're in new mode; otherwise we're in old.

In old mode, `django.contrib.auth.User` behaves much as it did before:

* The `auth_user` table looks as it did before -- i.e. `user.username` and friends are real, concrete fields.

* None of the special `Profile` handling runs (no auto-joins, etc). Profile objects still work 'cause they're just special cases of models, but no magic identifiers, no validation contribution, etc.

* `user.identifier` exists as a proxy to `username` to ease forward motion, but it's just a property proxy.

The new mode gets all the new behavior, natch.
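
The mode check itself is tiny. As a sketch, here's what "look at the database" could mean on SQLite, using the standard-library `sqlite3` module and `PRAGMA table_info`; `auth_mode` is a hypothetical helper, and other backends would need their own introspection query.

```python
import sqlite3


def auth_mode(conn):
    """Return "new" if auth_user has an identifier column, else "legacy"."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(auth_user)")}
    return "new" if "identifier" in columns else "legacy"


# A pre-upgrade schema: username and friends are still real columns.
legacy_db = sqlite3.connect(":memory:")
legacy_db.execute(
    "CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT, first_name TEXT)"
)

# A post-upgrade (or freshly created) schema.
new_db = sqlite3.connect(":memory:")
new_db.execute(
    "CREATE TABLE auth_user (id INTEGER PRIMARY KEY, identifier TEXT, password TEXT)"
)
```

Because the check reads the schema rather than a setting, a deployed 1.5 project stays in legacy mode until its `auth_user` table actually changes.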

### How to upgrade

A single command:

    ./manage.py upgrade_auth

(or whatever). This means we have to ship with a bunch of *REALLY WELL TESTED*, hand-rolled SQL for all the supported Django backends and versions. That'll be a pain to write, but see rule #1 above. This'll do something along the lines of:

    CREATE TABLE auth_defaultprofile (first_name, last_name, ...);
    INSERT INTO auth_defaultprofile (first_name, ...)
        SELECT first_name, ... FROM auth_user;
    ALTER TABLE auth_user DROP COLUMN first_name;
    ...
    ALTER TABLE auth_user RENAME COLUMN username TO identifier;

This means that the upgrade process will look like this:

1. Upgrade your app to Django 1.5. Deploy. Note that everything behaves as it has in the past.
2. Run `manage.py upgrade_auth`.
3. Restart the server (ew, sorry).
4. Now start using all the new profile stuff.

Note that an initial `syncdb` will create the *new* models, so new projects get the new stuff without upgrading.

### Warnings, etc.

Fairly standard, but with a twist:

* In Django 1.5, if you haven't yet issued an `upgrade_auth`, you'll get a deprecation warning when Django starts.

* In Django 1.6, this'll be a louder warning.

* In Django 1.7, `upgrade_auth` will still be there, but Django will now refuse to start if the upgrade hasn't run yet.

* In Django 1.8, `upgrade_auth` is gone.

## FAQ

### Where does this idea come from?

It's basically what I do already, and from looking at other people's code it appears to be on its way towards being something of a best practice pattern. That is, I tend to see code like:

    class MyProfile(models.Model):
        user = models.OneToOneField(User, related_name='profile')
        ...

... and then access to the profile as `user.profile`.

This proposal essentially formalizes this pattern, provides for some improved syntactic sugar, and allows for multiple profiles in a fairly pluggable way.

### Why not a swappable user model?

I'm convinced that such an idea is ultimately a bad idea: it allows apps to exert action at a distance over other apps. It would allow the idea of a user to *completely change* without any warning simply by modifying a setting. Django did this in the past -- `replaces_module` -- and it was a nightmare. I'm strongly against re-introducing it.

However, please do note that this proposal doesn't actually preclude introducing a swappable user in the future. It's possible that the right implementation could change my mind, and so this proposal leaves the option available.

### Will user.save() call user.validate() by default?

(This idea was in an earlier draft of this proposal.)

No. Doing this *would* make the extra contributed validation a bit stronger, but it would ultimately make `User` behave differently from a "normal" model, and that's probably a bad idea.

### Why user.data?

It's not perfect, but it's the best of a bunch of flawed options. Other things we considered:

* Nothing: simply make users access profile data as `user.someprofile.somefield`. There's no magic here, but it ultimately falls down since it doesn't allow "duck typing" of profiles. That is, if I'm the author of a reusable app I want to be able to grab an email address from "some profile" without having to know *which* profile provided that field. If we had no combined data bag, apps would do things like hardcoding `user.defaultprofile.email`, and that'd fail if projects remove the default profile.

* Do the above, but provide some mechanism for apps to determine which accessor they'd need to use for some field. That is, there'd be a way to pass into the app the name of the profile, and then apps would use `user.<thatprofile>.somefield`. This mechanism could be the `app` object introduced by app-refactor, for example. This is workable, but it feels like a lot of configuration and bookkeeping for what's really a basic thing: getting information from a profile without caring where that information came from.

* Magic attributes: let `user.somefield` magically proxy to `user.someprofile.somefield`. This I deem to simply be too much magic: it blurs the difference between profile data and local data, and leads to expectations that things like `User.objects.filter(somefield=...)` would work (which it wouldn't without even more magic). This would also seriously muddle what `user.save()` does.

Ultimately, `user.data` seems to be the best option. It's clear that user data isn't the same as an attribute, it provides the ability to do other things like call `dict()` and `save()`, and it preserves reusability.

### Why filter(data__field=foo)?

Mostly for symmetry with `user.data`. We also considered `User.objects.data(foo=bar)`, but ultimately `data__` is the most extensible as it allows for the same syntax for `order_by()`, etc.

### Is this special sugar for OneToOne available for other models?

That's out of scope for the purposes of this proposal. It may very well be the case that this sort of "privileged OneToOne" could be useful for other projects, and it may turn out to be just as much work to create a general API as a specific one. But that's not something that's required for this to work, and can always be a future refactor/improvement.