v0.13.0
v0.13.0 master v0.12.2 v0.11.3 v0.10.2 v0.9.3 v0.8.3

Dataloaders

Optimizing N+1 database queries using Dataloaders
[edit]

Have you noticed some GraphQL queries end can make hundreds of database queries, often with mostly repeated data? Lets take a look why and how to fix it.

Query Resolution

Imagine if you had a simple query like this:

query { todos { users { name } } }

and our todo.user resolver looks like this:

func (r *todoResolver) UserRaw(ctx context.Context, obj *model.Todo) (*model.User, error) {
	res := db.LogAndQuery(r.Conn, "SELECT id, name FROM dataloader_example.user WHERE id = ?", obj.UserID)
	defer res.Close()

	if !res.Next() {
		return nil, nil
	}
	var user model.User
	if err := res.Scan(&user.ID, &user.Name); err != nil {
		panic(err)
	}
	return &user, nil
}

Note: I’m going to use go’s low level sql.DB here. All of this will work with whatever your favourite ORM is.

The query executor will call the Query.Todos resolver which does a select * from todo and return N todos. Then for each of the todos, concurrently, call the Todo_user resolver, SELECT from USER where id = todo.user_id.

eg:

SELECT id, todo, user_id FROM todo
SELECT id, name FROM user WHERE id = ?
SELECT id, name FROM user WHERE id = ?
SELECT id, name FROM user WHERE id = ?
SELECT id, name FROM user WHERE id = ?
SELECT id, name FROM user WHERE id = ?
SELECT id, name FROM user WHERE id = ?
SELECT id, name FROM user WHERE id = ?
SELECT id, name FROM user WHERE id = ?
SELECT id, name FROM user WHERE id = ?
SELECT id, name FROM user WHERE id = ?
SELECT id, name FROM user WHERE id = ?
SELECT id, name FROM user WHERE id = ?
SELECT id, name FROM user WHERE id = ?
SELECT id, name FROM user WHERE id = ?
SELECT id, name FROM user WHERE id = ?
SELECT id, name FROM user WHERE id = ?
SELECT id, name FROM user WHERE id = ?
SELECT id, name FROM user WHERE id = ?
SELECT id, name FROM user WHERE id = ?
SELECT id, name FROM user WHERE id = ?

Whats even worse? most of those todos are all owned by the same user! We can do better than this.

Dataloader

What we need is a way to group up all of those concurrent requests, take out any duplicates, and store them in case they are needed later on in request. The dataloader is just that, a request-scoped batching and caching solution popularised by facebook.

We’re going to use dataloaden to build our dataloaders. In languages with generics, we could probably just create a DataLoader, but golang doesnt have generics. Instead we generate the code manually for our instance.

go get github.com/vektah/dataloaden
mkdir dataloader
cd dataloader
go run github.com/vektah/dataloaden UserLoader int *gqlgen-tutorials/dataloader/graph/model.User

Next we need to create an instance of our new dataloader and tell how to fetch data. Because dataloaders are request scoped, they are a good fit for context.


const loadersKey = "dataloaders"

type Loaders struct {
	UserById UserLoader
}

func Middleware(conn *sql.DB, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx := context.WithValue(r.Context(), loadersKey, &Loaders{
			UserById: UserLoader{
				maxBatch: 100,
				wait:     1 * time.Millisecond,
				fetch: func(ids []int) ([]*model.User, []error) {
					placeholders := make([]string, len(ids))
					args := make([]interface{}, len(ids))
					for i := 0; i < len(ids); i++ {
						placeholders[i] = "?"
						args[i] = i
					}

					res := db.LogAndQuery(conn,
						"SELECT id, name from dataloader_example.user WHERE id IN ("+strings.Join(placeholders, ",")+")",
						args...,
					)
					defer res.Close()

					userById := map[int]*model.User{}
					for res.Next() {
						user := model.User{}
						err := res.Scan(&user.ID, &user.Name)
						if err != nil {
							panic(err)
						}
						userById[user.ID] = &user
					}

					users := make([]*model.User, len(ids))
					for i, id := range ids {
						users[i] = userById[id]
					}

					return users, nil
				},
			},
		})
		r = r.WithContext(ctx)
		next.ServeHTTP(w, r)
	})
}

func For(ctx context.Context) *Loaders {
	return ctx.Value(loadersKey).(*Loaders)
}

This dataloader will wait for up to 1 millisecond to get 100 unique requests and then call the fetch function. This function is a little ugly, but half of it is just building the SQL!

Now lets update our resolver to call the dataloader:

func (r *todoResolver) UserLoader(ctx context.Context, obj *model.Todo) (*model.User, error) {
	return dataloader.For(ctx).UserById.Load(obj.UserID)
}

The end result? just 2 queries!

SELECT id, todo, user_id FROM todo
SELECT id, name from user WHERE id IN (?,?,?,?,?)

The generated UserLoader has a few other useful methods on it:

You can see the full working example here.