Wednesday, March 2, 2011

Disqus experience - cache of model's related fields

Very interesting tricks from Disqus: Scaling The World’s Largest Django Application

The main idea is to retrieve related objects with two SQL queries instead of the 1 + 25 queries issued when post.user is accessed for each of 25 posts.

This situation has come up in several of my projects at once, so I am working on this kind of optimization right now.

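# cache
# Query 1: fetch the page of posts; list() evaluates the queryset once
posts = list(Post.objects.all()[0:25])

# Query 2: fetch every referenced user in a single IN (...) query
users = dict(
    (u.pk, u)
    for u in User.objects.filter(pk__in=set(p.user_id for p in posts))
)

# Fill the cache attribute that post.user reads (Django 1.x naming),
# so accessing post.user triggers no further queries
for p in posts:
    p._user_cache = users.get(p.user_id)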
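
To make the query counts concrete, here is a minimal sketch of the same pattern without Django, using the standard-library sqlite3 module. The table layout and the helper names (build_db, naive, batched) are assumptions made up for this illustration; they are not taken from the Disqus talk.

```python
import sqlite3


def build_db():
    """Create an in-memory database with 5 users and 25 posts (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE post (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    """)
    conn.executemany(
        "INSERT INTO user VALUES (?, ?)",
        [(i, "user%d" % i) for i in range(1, 6)],
    )
    conn.executemany(
        "INSERT INTO post VALUES (?, ?, ?)",
        [(i, (i % 5) + 1, "post%d" % i) for i in range(1, 26)],
    )
    conn.commit()
    return conn


def naive(conn):
    """One query for the posts, then one extra query per post (1 + N)."""
    posts = conn.execute("SELECT id, user_id FROM post LIMIT 25").fetchall()
    queries = 1
    authors = {}
    for post_id, user_id in posts:
        row = conn.execute(
            "SELECT name FROM user WHERE id = ?", (user_id,)
        ).fetchone()
        queries += 1
        authors[post_id] = row[0]
    return authors, queries


def batched(conn):
    """One query for the posts, one IN (...) query for all their users."""
    posts = conn.execute("SELECT id, user_id FROM post LIMIT 25").fetchall()
    user_ids = sorted(set(user_id for _, user_id in posts))
    placeholders = ",".join("?" * len(user_ids))
    users = dict(
        conn.execute(
            "SELECT id, name FROM user WHERE id IN (%s)" % placeholders,
            user_ids,
        ).fetchall()
    )
    # Join in Python: look each post's author up in the dict
    authors = dict((post_id, users.get(user_id)) for post_id, user_id in posts)
    return authors, 2
```

Both functions return the same post-to-author mapping; only the number of round trips to the database differs.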


And don't forget about SQL indexes: when a field is used in ORDER BY or WHERE clauses, an index on it can improve performance considerably.
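
One way to check whether an index is actually used is to look at the query plan. The sketch below does this with sqlite3 and EXPLAIN QUERY PLAN; the table, the index name post_user_id_idx, and the query_plan helper are hypothetical names chosen for the example. In Django, a field can be indexed with db_index=True, and ForeignKey fields are indexed by default.

```python
import sqlite3


def query_plan(conn, sql):
    """Return SQLite's plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post (id INTEGER PRIMARY KEY, user_id INTEGER, created TEXT)"
)

sql = "SELECT id FROM post WHERE user_id = 3"

# Without an index the WHERE clause forces a scan of the whole table
before = query_plan(conn, sql)

conn.execute("CREATE INDEX post_user_id_idx ON post (user_id)")

# With the index in place the planner can search it instead
after = query_plan(conn, sql)

print(before)
print(after)
```

The exact wording of the plan varies between SQLite versions, so the output is printed rather than assumed here.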

7 comments:

  1. Doesn't select_related solve this problem in a single request?
    http://docs.djangoproject.com/en/1.2/ref/models/querysets/#django.db.models.QuerySet.select_related

  2. Dixon@ No. You should expect that the data will be needed several times in subtemplates, or that select_related with all the necessary fields produces an overly complicated, inefficient query, or that you need several aggregated fields. This last step can be repeated several times. I searched for something like this for a long time.

  3. Dixon@, I'm sure it's possible to "emulate" an efficient query with a combination of select_related('field_with_fk') and some kind of qs.extra, but the code will look like crazy rocket science and the request will not be as efficient as the solution above.

  4. This comment has been removed by the author.

  5. Well, nice to hear from you. Very well information from this blog. Thanks for sharing it with us.

  6. Yeah, I'm really having a problem seeing how select_related is not the solution. select_related automatically has the Post query the related user fields and assign them to the user attribute of each Post record in the queryset. Unless I'm missing something?

    posts = Post.objects.all().select_related("user_pk")

    will take care of all of the requirements satisfied by the code listed above *and* require only one (still very simple) query, instead of 2.

  7. Sorry, code above should be


    posts = Post.objects.all().select_related("user__pk")

    (missing one underscore)
