GraphQL is great for straightforward CRUD operations, but it can be tricky when implementing non-standard queries. For example, handling operations that aren’t querying for a field on a GraphQL object that corresponds to a column on a database table. Aggregation is one of those cases where it’s not immediately obvious as to how to translate a relatively easy task in an ORM in a REST API to GraphQL. You can add a field to a model for the aggregation, but that has the risk of N+1 queries if you’re querying for multiple objects.
The first time we had a need for aggregation in GraphQL was to add a field containing a count to one of our GraphQL objects. On Handshake, students can leave reviews for jobs they’ve worked at, and we want to aggregate all of those reviews to show a count of how many there are for each employer. This will be used in a view where multiple employers are returned at once, so we also had to avoid making N+1 queries for the reviews.
Sounds simple enough, right?
Fortunately, there actually is a simple approach using Lazy Execution in GraphQL Ruby combined with the BatchLoader gem (Or one of the other gems that does the same thing.) The BatchLoader prevents N+1 queries from being made for the reviews for each employer, and returns an object back to GraphQL Ruby that can be lazily resolved.
This approach is applicable when you’re specifically trying to avoid making N+1 queries. If that’s not a concern, you can skip all of this and make the query directly with ActiveRecord as with any other field on a GraphQL object.
How it’s done
The first step is to create a new BatchLoader subclass:
The file can be anywhere in your GraphQL project. Then, add this line into your GraphQL Schema class to tell GraphQL Ruby that when a SizeBatchLoader object is returned, it should be lazily resolved by calling its sync method:
The last step is to use the new BatchLoader subclass in the field on a GraphQL object where it’s required. If you’re already familiar with how the BatchLoader is used, this won’t be very different. Our reviews are stored in a JobReview model with the only relevant column for this case being employer_id. This is what we'll be using the BatchLoader to query for in this example:
This approach isn’t without its downsides.
You may have noticed that we’re calling size on the result in our BatchLoader subclass and not count; this is because we can’t use a COUNT query in the database if we want to also avoid N+1 queries. We have to load all of the results into memory from a single query to count them using the BatchLoader.
This is a classic tradeoff between time complexity versus space complexity.
In our case, the maximum number of reviews for any one employer is going to be in the thousands, and we’re only selecting a single Integer column from the database. It’s a reasonable amount of memory usage and outweighs the cost of having to perform a separate query to count the reviews for each employer.
Photo by Christopher Gower on Unsplash