Thoughts on the repository pattern in Golang

TL;DR

The Repository pattern is commonly used in DDD / Clean Architecture / Hexagonal Architecture projects. However, porting a Java/C# reference implementation of this pattern to Golang is not as straightforward as it seems. By forcing ourselves to avoid the ‘questionable choices’ common in many examples of this pattern in Golang, we can arrive at an interesting variant of the pattern.

Intro

I stumbled across a dodgy hybrid technique for implementing the repository pattern in Golang that I think is quite useful and practical, especially for smaller projects.

It:

Doesn’t require any magical ORMs
Borrows the event driven modelling focus of most “Domain Driven Design” thinking, without committing to the full CQRS/ES journey that usually accompanies DDD discussion online
Avoids Javaisms, and is (hopefully) a bit closer to Golang (i.e. no magic Hibernate or Spring Transactions)
Works for in-memory examples just as well as database backed ones (I have found that many examples use an in-memory DB, i.e. a map, to attempt to remain simple, but they could never work in real-life)

In essence, I found that if you:

Model all command-like mutations as functions on the repository interface (i.e. HandleChangeName() and HandleDelete()); and
Return structs from functions on the repository, as opposed to returning pointers or interfaces (see Accept interfaces, return structs).

you get to a design which:

has some of the design pressure of Command Query Responsibility Segregation, by forcing commands to be functions on the repository, but retains the common synchronous usage pattern;
forces you to deal with mutation in a safer way (especially in the pointers & non-OO Golang world), by encouraging you to return copies of objects (thus encouraging small, short-lived objects that work best with GC, a la pg 5, Section 4.1 of the LMAX Disruptor technical paper and Functional core, imperative shell); and
it makes the responsibilities of the repository much clearer and domain-oriented than the standard generic CRUD operations.

Overview of what is to come

This is a bit of a rambling post, as I think it is important to capture the background reading and thinking that gave rise to this implementation pattern. I will address:

A summary of what I think the implementation looks like, in Go, with comments about the choices I’ve made
A discussion of the canonical texts and discussions on the pattern and the various design philosophies
A quick review of some of the other examples and blogs of this pattern in Go

I would like to continue working on it if I get some time, as I am wondering if there isn’t something more closely aligned with the hard CQRS design espoused by Greg Young, Udi Dahan, et al and demonstrated (at least in principle) in the Simple CQRS Example (C#), but with goroutines and channels. There could also be something to be gained from adopting some of the ideas from the LMAX Disruptor presentations, around avoiding race conditions by having a single low-latency thread at the core of the architecture.

In a nutshell

My interpretation of ‘what’ the repository pattern should be responsible for is important. I will discuss this more later in this post, but for now it is worth highlighting that I put a lot of stock in the Eric Evans / Domain-Driven-Design / Greg Young (CQRS/ES) view that repositories are the guardians of access to the aggregate roots - that they are responsible for accessing and controlling modification to objects with ‘identity’.

I am not that interested or concerned with the FindX() methods that tend to be found on a repository. Perhaps a repository with lots of FindX() functions is a smell, perhaps it is a pragmatic place to put the code. What I think is more important is to worry about how your objects mutate (that’s what many of these FP/DDD/programming discussions are about… figuring out how to manage the mutations). Additionally, I think there are plenty of resources discussing the ‘FindX()’ behaviours and techniques of repositories, especially in Patterns of Enterprise Application Architecture.

I am going to focus on designing repository functions to help clarify:

Creating an object
Modifying an object
Deleting an object (often omitted, but usually very instructive!)

How I implemented it

I am going to use a boring example of a User object that can have some values like its name, email or loyalty changed. Additionally, our repository is going to be an in-memory only example (but could easily use a database instead, or a hybrid of both). Our collection of User objects must be unique on both name and email, just for added complexity.

This is not a fully committed argument….but I’m proposing that you define your repository interfaces like so:

package main

import (
    "errors"
    "fmt"
    "sync"
)

type UserId int

type User struct {
    Id      UserId
    Name    string
    Email   string
    IsLoyal bool
}

type UserRepo interface {
    All() []User
    //GetById(id UserId) (User, error)
    //GetAllLoyal() []User
    //... etc, as much as you like

    RequestCreateNew(name, email string) (User, error)
    RequestChangeDetails(id UserId, name, email string) (User, error)

    RequestUpgradeToLoyal(id UserId) (User, error)
    // if you wanted also: RequestDowngradeFromLoyal(id UserId) (User, error)

    RequestDelete(id UserId) error
}

I think the intent behind this interface is relatively straightforward. The commented out functions are examples of other extensions that might appear in the future.

Some points to note:

We could elect to define a more DDD-esque NextUserId() function on the interface, and force the client code to incorporate this into its calls to RequestCreateNew(), by passing the id we want to use to create the new user with. For the purposes of this post, it is not bad to have the Repository implementation handle this internally.
The Request prefix is not totally necessary, but it does make it clear which methods perform mutations
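The NextUserId() alternative mentioned above could be sketched as follows. This is only a minimal illustration: IdAllocatingRepo, inMemoryRepo and the "id must have been allocated" rule are my own assumptions for the sketch, not part of the interface proposed in this post.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type UserId int

type User struct {
	Id    UserId
	Name  string
	Email string
}

// IdAllocatingRepo is a hypothetical variant of UserRepo in which the client
// asks for an id first and passes it back when creating the user.
type IdAllocatingRepo interface {
	NextUserId() UserId
	RequestCreateNew(id UserId, name, email string) (User, error)
}

type inMemoryRepo struct {
	mu     sync.Mutex
	nextId UserId
	byId   map[UserId]User
}

func newInMemoryRepo() *inMemoryRepo {
	return &inMemoryRepo{nextId: 1, byId: make(map[UserId]User)}
}

// NextUserId hands out each id at most once.
func (r *inMemoryRepo) NextUserId() UserId {
	r.mu.Lock()
	defer r.mu.Unlock()
	id := r.nextId
	r.nextId++
	return id
}

// RequestCreateNew rejects ids that were never allocated or are already in use.
func (r *inMemoryRepo) RequestCreateNew(id UserId, name, email string) (User, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if id < 1 || id >= r.nextId {
		return User{}, errors.New("id was not allocated by this repository")
	}
	if _, taken := r.byId[id]; taken {
		return User{}, fmt.Errorf("id %d is already in use", id)
	}
	user := User{Id: id, Name: name, Email: email}
	r.byId[id] = user
	return user, nil
}

func main() {
	var repo IdAllocatingRepo = newInMemoryRepo()
	id := repo.NextUserId()
	user, err := repo.RequestCreateNew(id, "Ada", "ada@example.com")
	fmt.Println(user, err)
	_, err = repo.RequestCreateNew(id, "Bob", "bob@example.com")
	fmt.Println(err) // the same id cannot be used twice
}
```

The cost of this variant is that the client now has two calls to get right, which is part of why I let the repository handle ids internally in this post.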

The first part of the implementation might then look like:

type InMemoryUserRepo struct {
    nextId UserId
    byId   map[UserId]User
    s      sync.RWMutex
}

func NewUserRepo() *InMemoryUserRepo {
    return &InMemoryUserRepo{
        nextId: 1,
        byId: make(map[UserId]User),
    }
}

func (repo *InMemoryUserRepo) All() []User {
    repo.s.RLock()
    defer repo.s.RUnlock()

    userList := make([]User, 0, len(repo.byId))

    for _, value := range repo.byId {
        userList = append(userList, value)
    }

    return userList
}

These define the basic elements of data storage and the simple All() method. The following choices are relevant:

I originally used a slice for this article, but changed to use a map because I think it makes many of the other functions more clear. This choice should be motivated by the amount of data, where it is stored and how it is typically accessed.
The All() function creates a slice and copies items into it. This might not be the best choice, but it is probably not terrible either. How inappropriate it is probably depends on the size of the User struct and the number of items in the map. But let’s just say that computers are really fast and have lots of memory, so it is likely to be fine. If you have memory or CPU restrictions, or a lot of data, then you probably need to worry about using an in-memory repo at all, and/or whether you’ll have to implement some sort of pointer sharing or iterator object and deal with the concurrency issues (and so you are probably in a good position to judge the suitability of this idea anyway).
The order of users returned is unstable, as the map is not guaranteed to return objects in any given order. If this is a problem, we could change the underlying structure to a list, or sort the map keys before constructing the userList or sort the resulting userList before returning it, or…
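If a stable order matters, the "sort the resulting userList" option could look something like the sketch below. The sortedById helper is a hypothetical name of mine; in the repository it would be the tail end of All(), called while the read lock is held.

```go
package main

import (
	"fmt"
	"sort"
)

type UserId int

type User struct {
	Id   UserId
	Name string
}

// sortedById copies the map values into a slice and orders them by Id,
// so callers see the same order on every call.
func sortedById(byId map[UserId]User) []User {
	users := make([]User, 0, len(byId))
	for _, u := range byId {
		users = append(users, u)
	}
	sort.Slice(users, func(i, j int) bool { return users[i].Id < users[j].Id })
	return users
}

func main() {
	byId := map[UserId]User{
		3: {Id: 3, Name: "Carol"},
		1: {Id: 1, Name: "Ada"},
		2: {Id: 2, Name: "Bob"},
	}
	for _, u := range sortedById(byId) {
		fmt.Println(u.Id, u.Name)
	}
}
```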

RequestCreateNew() and RequestDelete() are next to implement:

func (repo *InMemoryUserRepo) RequestCreateNew(name, email string) (User, error) {
    repo.s.Lock()
    defer repo.s.Unlock()

    for _, existing := range repo.byId {
        if name == existing.Name {
            return User{}, errors.New("user with same name already exists")
        }
        if email == existing.Email {
            return User{}, errors.New("user with same email already exists")
        }
    }

    user := User{
        Id: repo.nextId,
        Name: name,
        Email: email,
    }

    repo.byId[user.Id] = user
    repo.nextId += 1

    return user, nil
}

func (repo *InMemoryUserRepo) RequestDelete(id UserId) error {
    repo.s.Lock()
    defer repo.s.Unlock()

    _, ok := repo.byId[id]
    if !ok {
        return fmt.Errorf("user does not exist for id: %d", id)
    }

    delete(repo.byId, id)
    return nil
}

These simple implementations illustrate a few things:

The repository (not the User object) is the boundary for enforcing collection-oriented constraints, like the uniqueness of names and emails.
The repository is responsible for managing its own id sequence, and it is the only place that these get created.

These functions could easily be extended to call ‘domain’ behaviours on the underlying objects, allowing a place for the rich domain model functionality to live in OO-esque bliss. For example:

func (user *User) MagicDomainSpecificDelete() error {
    if user.Name == "Andrew Dodd" {
        return fmt.Errorf("Andrew Dodd[id: %d] cannot be deleted", user.Id)
    }
    // do any other processing, like redacting GDPR data
    user.Name = "REDACTED NAME"
    return nil
}

func (repo *InMemoryUserRepo) RequestDelete(id UserId) error {
    repo.s.Lock()
    defer repo.s.Unlock()
    
    found, ok := repo.byId[id]
    if !ok {
        return fmt.Errorf("user does not exist for id: %d", id)
    }

    if err := found.MagicDomainSpecificDelete(); err != nil {
        return err
    }

    delete(repo.byId, id)
    return nil
}

Finally some examples of mutations:

func (repo *InMemoryUserRepo) RequestChangeDetails(id UserId, name, email string) (User, error) {
    repo.s.Lock()
    defer repo.s.Unlock()

    user, ok := repo.byId[id]
    if !ok {
        return User{}, fmt.Errorf("user does not exist for id: %d", id)
    }

    user.Name = name
    user.Email = email
    repo.byId[id] = user

    return user, nil
}

func (repo *InMemoryUserRepo) RequestUpgradeToLoyal(id UserId) (User, error) {
    repo.s.Lock()
    defer repo.s.Unlock()

    user, ok := repo.byId[id]
    if !ok {
        return User{}, fmt.Errorf("user does not exist for id: %d", id)
    }

    user.IsLoyal = true
    repo.byId[id] = user

    return user, nil
}

These are similar to the RequestCreateNew() and RequestDelete() functions.
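One thing the RequestChangeDetails() listing skips is the uniqueness rule on name and email from the start of this section. A rough sketch of how that check might look is below; checkUnique is a hypothetical helper of mine, and it deliberately ignores the user's own record so that changing only the email does not trip over their existing name.

```go
package main

import (
	"errors"
	"fmt"
)

type UserId int

type User struct {
	Id    UserId
	Name  string
	Email string
}

// checkUnique reports an error if any user other than self already
// has the given name or email.
func checkUnique(byId map[UserId]User, self UserId, name, email string) error {
	for id, existing := range byId {
		if id == self {
			continue // a user never clashes with their own record
		}
		if existing.Name == name {
			return errors.New("user with same name already exists")
		}
		if existing.Email == email {
			return errors.New("user with same email already exists")
		}
	}
	return nil
}

func main() {
	byId := map[UserId]User{
		1: {Id: 1, Name: "Ada", Email: "ada@example.com"},
		2: {Id: 2, Name: "Bob", Email: "bob@example.com"},
	}
	// User 1 keeps their name and changes only their email: allowed.
	fmt.Println(checkUnique(byId, 1, "Ada", "ada@new.example.com"))
	// User 1 tries to take user 2's name: rejected.
	fmt.Println(checkUnique(byId, 1, "Bob", "ada@example.com"))
}
```

In the repository this would be called inside RequestChangeDetails() after the lookup and before the write, while the lock is still held.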

(For a working example that uses lists instead of maps, look at this Go Playground).

What do you get

You get an implementation that:

captures a descriptive interface to the inner models
is an in-memory example that includes a ‘delete’ example
is easily changed or extended to use a different persistence option and/or a hybrid of multiple persistence options (e.g. a DB for long-term data, but in-memory for data that doesn’t need to be up to date all the time or can be missing for a little while after boot-up…like a “latest status message”).

The main issues that I see with this style of implementation are:

it does not really attempt to get much code reuse at the repo/persistence layer (i.e. often in the Java/C# world the repository implementations are part-abstract, part-concrete, ORM-like in an attempt to write the complex database management code only once); and
it does not really adhere to any of the DDD and/or CQRS-land emphasis on making a rich domain model that implements the core behaviour.

First, it is my opinion that code reuse is probably something that should not be a core goal for application developers, and I think that this original tenet of the strength of the OO paradigm has probably done more harm than good. This is especially true of class based inheritance (something that Go does not have). I’ve come to believe that a bit of copy-pasting is probably a good thing, and that developers should try to find reusable ‘utility-style’ functions rather than inheritance hierarchies. (NB: Almost all engineers seem to love taxonomy crusades, believing they will discover the file-naming structure to rule them all…developing type hierarchies is just the same race into useless abstraction…).

Second, I found that it is often difficult to motivate these ‘rich domain models’, especially in basic examples. However, changing the repository implementation from being the thing that ‘does’ the work to being the place that finds the domain object and arbitrates the commands to it aligns this design a bit more closely with the one presented by Greg Young in m-r.

For example, instead of having an implementation like:

func (repo *InMemoryUserRepo) RequestUpgradeToLoyal(id UserId) (User, error) {
    repo.s.Lock()
    defer repo.s.Unlock()

    user, ok := repo.byId[id]
    if !ok {
        return User{}, fmt.Errorf("user does not exist for id: %d", id)
    }

    user.IsLoyal = true
    repo.byId[id] = user

    return user, nil
}

We could have a more complex behaviour like so:

type UpdateLoyaltyStatus struct {
    data interface{} // whatever is needed for this command
}

func (user *User) HandleUpdateLoyaltyStatus(cmd UpdateLoyaltyStatus) (User, error) {
    // Work on a copy so that the receiver is never mutated
    updated := *user
    // Do whatever is needed, e.g. validate cmd and apply it to the copy
    // ...
    return updated, nil
}

func (repo *InMemoryUserRepo) RequestUpgradeToLoyal(id UserId, cmd UpdateLoyaltyStatus) (User, error) {
    repo.s.Lock()
    defer repo.s.Unlock()

    existing, ok := repo.byId[id]
    if !ok {
        return User{}, fmt.Errorf("user does not exist for id: %d", id)
    }
    userWithUpdatedLoyalty, err := existing.HandleUpdateLoyaltyStatus(cmd)
    if err != nil {
        return User{}, err // Error from the domain object
    }

    repo.byId[id] = userWithUpdatedLoyalty

    return userWithUpdatedLoyalty, nil
}

Here the func (u *User) HandleUpdateLoyaltyStatus() function sits somewhere between the event-generating ‘command’ functions and the ‘event’ handling “RequestXXX()” functions of the m-r example. This would allow the User domain model to perform complex behaviour on the in-memory object graph (so long as it returns a ‘new’ object, and does not mutate itself) and return the updated representation for the repository to store. If there were errors (like the request was invalid), then the repository would not update itself.
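To make the ‘returns a new object and does not mutate itself’ rule concrete, here is a minimal sketch that uses a value receiver, so the method only ever sees a copy. The Loyal field on the command and the ‘unchanged status is an error’ rule are assumptions I made up for the illustration.

```go
package main

import (
	"errors"
	"fmt"
)

type User struct {
	Id      int
	Name    string
	IsLoyal bool
}

// UpdateLoyaltyStatus is a hypothetical command payload.
type UpdateLoyaltyStatus struct {
	Loyal bool
}

// HandleUpdateLoyaltyStatus has a value receiver, so u is already a copy:
// changing it cannot affect the struct the caller holds.
func (u User) HandleUpdateLoyaltyStatus(cmd UpdateLoyaltyStatus) (User, error) {
	if u.IsLoyal == cmd.Loyal {
		return User{}, errors.New("loyalty status is unchanged")
	}
	u.IsLoyal = cmd.Loyal
	return u, nil
}

func main() {
	stored := User{Id: 1, Name: "Ada"}

	updated, err := stored.HandleUpdateLoyaltyStatus(UpdateLoyaltyStatus{Loyal: true})
	fmt.Println(stored.IsLoyal, updated.IsLoyal, err) // stored is untouched

	// A rejected command returns an error, so a repository would store nothing.
	_, err = updated.HandleUpdateLoyaltyStatus(UpdateLoyaltyStatus{Loyal: true})
	fmt.Println(err)
}
```

With a value receiver the compiler enforces the no-mutation rule for the struct's own fields, though any slices, maps or pointers inside the struct would still be shared.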

And now for the background

Introduction & Context

The Repository pattern is commonly used and discussed in many of the design commentaries I have read, including Eric Evans’ Domain Driven Design, Alistair Cockburn’s Hexagonal Architecture (and here); Martin Fowler’s Patterns of Enterprise Application Architecture; and Robert Martin’s Clean Architecture.

Exactly what form your implementation takes depends a lot on what language you are using, what architectural patterns you are choosing to use and how much you want your repository to hide the details from your client code.

I found that most of the exhaustive discussions and good implementations are from the C# & Java, ‘hard OO’ world. I also found that most of the Golang implementations are naive ports of Java/C# implementations that seem to forget that often there was an ORM or some other framework providing functionality critical to the functioning of the implementation.

Additionally, I have a lot of time and respect for the ‘design process’ detailed by Eric Evans in DDD, and find most of the DDD discussions/texts are in agreement that the ‘repository’ acts as the gatekeeper for your ‘aggregates’….but that the focus in blogs and posts on CQRS and Event Sourcing tends to crowd out other design options and discussion.

Architectural approach background

Whether these are the right way or not to implement applications in Go, I found that the ‘depend on the details’ / Inversion of Control patterns espoused by the ‘Clean Architecture’ and ‘Hexagonal Architecture’ discussions are very helpful for framing and structuring an application. This is especially true at the start of an application, when you are not quite sure about how everything will fit together. Perhaps these techniques lead to a bit too much indirection, but they do force you to identify flows of control and to avoid dependency cycles (especially if you are working on separate parts of an application concurrently, that will eventually have to interact, but do not do so yet). I found that structuring a recent application in this way lent itself to using the Repository pattern well.

I found the following blogs, discussions and GitHub projects to be invaluable in thinking about these things:

Manuel Kiessling’s Applying the clean architecture to Go applications - A good run down of how to structure a Go application in line with the ‘Clean Architecture’ ideas. Yes, the blog has a few issues with consistency and is very long, but I think this is its key strength…it is long enough to start exploring real issues. In particular, this image is an excellent companion to the post, laying out the code in its general position within the concentric layers.
Iman Tumorang’s Trying Clean Architecture on Golang - Another good run down on how to structure a Go application in line with the ‘Clean Architecture’ idea. The example here uses a repository as a key data access component, implementing a Fetch() / Store() style interface. It is not really meant to be a fully-fledged application, so I feel a bit bad picking on it, but I feel that discussing it is warranted given the pervasiveness of the CRUD-style repository implementation.
Marcus Olsson’s Go DDD resource (GitHub Project, Blog posts, and Talk) - I found this during a “heavy DDD devotee” phase. I found the GitHub project and the talk video extremely helpful, but also a little troubling. Marcus has done a commendable job porting the Citerus DDD example Java app to Golang. However, it is the faithfulness to the Java implementation that I think illustrates best the issues with the typical CRUD-style repository implementations in Golang.

Repository Pattern Background

In reality, the style of repository I am presenting is not very consistent with the common examples discussed. I have focussed on the ‘store’ side of a repository, and not at all on the ‘query’ side. The major references I have found have the following:

PoEAA - In this, the “Repository” pattern is presented by Hieatt and Mee. It actually focusses on ‘isolat[ing] domain objects from [the] details of the database access code’ (p. 322), and is only concerned with querying. In general, much of what is discussed is generally thought of as belonging to an ORM (I may be mistaken but this book was published around the time libraries like Hibernate were gaining traction in the Java world, so perhaps this chapter is a bit out-dated now).
Domain Driven Design - In this, Evans mostly focusses on querying. He has an example about how the repository is responsible for ‘storing’ an object (p. 158), but it is not discussed much. (NB: The heading on p. 154, “client code ignores repository implementation; developers do not”, is instructive here - forcing me to consider if all the thinking I’ve done might be a waste! - but ultimately the question around ‘storing’ objects is not dealt with in detail.) However, on p. 155 of DDD, Evans suggests that you “leave transaction control to the client”. This is (IMO) at odds with the client code ignoring the implementation. This is especially true in the ‘in-memory’ model of a repository, where there is no standard mechanism for the client code to establish a transaction that can be rolled back. I.e. if you are required to make a change to a number of objects within the aggregate, but the last change is invalid, it is difficult to go back and ‘undo’ changes in memory.
The Citerus DDD example Java app - This is a project that supplies a sort-of reference implementation to many of the examples and ideas discussed in DDD. It is a great companion to the book in many ways. The repository definitions in this project typically have a “Store” method, thus suggesting to me that the typical DDD-interpretation of the repository also includes the ‘save / update / delete’ responsibilities.
The Greg Young M-R sample DDD app, and his many talks - This also includes repository definitions that are responsible for storing mutations to AggregateRoot objects. However, this is always through the CQRS/ES pattern, where repositories implement a method like void Save(AggregateRoot aggregate, int expectedVersion);. Maybe this is a bit of an unusual situation, as the CQRS/ES pattern tends to dictate how mutation has to occur - through the one ‘emit events’ hammer. The concept of creating a queue of events to apply to the aggregate roots in turn is similar to what I am trying to attempt here. Perhaps by extracting the sync locks and replacing the concurrency control with goroutines & channels I could achieve something similar.
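The goroutines & channels idea floated in the last item could be sketched roughly like this. It is an untested-in-anger illustration under my own assumptions: ChannelUserRepo, state, request and result are hypothetical names, and the point is only that one goroutine owns the map, so no mutex is needed.

```go
package main

import "fmt"

type UserId int

type User struct {
	Id      UserId
	Name    string
	IsLoyal bool
}

// state is owned by a single goroutine, so it needs no mutex.
type state struct {
	nextId UserId
	byId   map[UserId]User
}

type result struct {
	user User
	err  error
}

// request pairs a mutation with the channel its outcome is sent back on.
type request struct {
	apply func(s *state) (User, error)
	reply chan result
}

// ChannelUserRepo serialises all access through one goroutine
// instead of a sync.RWMutex.
type ChannelUserRepo struct {
	requests chan request
}

func NewChannelUserRepo() *ChannelUserRepo {
	repo := &ChannelUserRepo{requests: make(chan request)}
	go func() {
		s := &state{nextId: 1, byId: make(map[UserId]User)}
		for req := range repo.requests {
			user, err := req.apply(s)
			req.reply <- result{user: user, err: err}
		}
	}()
	return repo
}

// Close stops the owning goroutine; the repo must not be used afterwards.
func (r *ChannelUserRepo) Close() { close(r.requests) }

// do queues a mutation and waits for its outcome, keeping callers synchronous.
func (r *ChannelUserRepo) do(apply func(s *state) (User, error)) (User, error) {
	reply := make(chan result, 1)
	r.requests <- request{apply: apply, reply: reply}
	res := <-reply
	return res.user, res.err
}

func (r *ChannelUserRepo) RequestCreateNew(name string) (User, error) {
	return r.do(func(s *state) (User, error) {
		user := User{Id: s.nextId, Name: name}
		s.byId[user.Id] = user
		s.nextId++
		return user, nil
	})
}

func (r *ChannelUserRepo) RequestUpgradeToLoyal(id UserId) (User, error) {
	return r.do(func(s *state) (User, error) {
		user, ok := s.byId[id]
		if !ok {
			return User{}, fmt.Errorf("user does not exist for id: %d", id)
		}
		user.IsLoyal = true
		s.byId[id] = user
		return user, nil
	})
}

func main() {
	repo := NewChannelUserRepo()
	defer repo.Close()

	created, _ := repo.RequestCreateNew("Ada")
	upgraded, err := repo.RequestUpgradeToLoyal(created.Id)
	fmt.Println(upgraded, err)

	_, err = repo.RequestUpgradeToLoyal(42)
	fmt.Println(err)
}
```

Callers still get the synchronous request/response shape of the mutex version; only the concurrency control has moved into the queue of requests.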

In the wild, both in the Go world and elsewhere, repositories usually have very CRUD-oriented implementations (often also including List and Query). All mutation is forced through Create, Update, or Delete; or alternatively the more opaque Store(). These repository implementations do not really differ much from ORMs, and I think they are really too generic to be considered Repositories in the DDD sense, as they are more SQL-wrapper-tooling than business logic / application logic representations.

Anyway, enough proselytising, it’s time for some examples…

First, a canonical Java example

The CargoRepository interface definition in the Citerus DDD example:

public interface CargoRepository {
  Cargo find(TrackingId trackingId);
  List<Cargo> findAll();
  void store(Cargo cargo);
  TrackingId nextTrackingId();
}

The repository does not have a very descriptive interface and generally assumes that the underlying implementation will be able to handle any data consistency issues when store() is called. The implementation uses both the Spring Framework @Repository annotation (JavaDoc) and the Hibernate ORM. I am not sure there is much to say here, except that the application developer here is relying heavily on libraries to provide the necessary functionality, which is a great idea :-), but is not possible in Golang :-(.

A very generic example in Golang

The interface definition of this GitHub project looks like this:

type Repository interface {
    List(dest interface{}, query string) error
    Get(ID string, dest interface{}, query string) error
    Create(data interface{}, query string) error
    Update(ID string, data interface{}, query string) error
    Delete(ID string, query string) error
}

As you might expect from this interface, the implementation actually gets a bit cornered by its own definition. This is about as close to a generic definition as you could get in Golang. However, it forces the user to pass through the SQL (which is then internally handed off to SQLX anyway!), so it is certainly a bunch of busy work. I don’t think this was ever intended to be a real-life example, but it excellently demonstrates how difficult it is in Go to create generic interfaces like you can in Java/C#.

The Golang Java ports

The following two examples are similar enough to present and then discuss together. First, the Repository interface from Iman Tumorang’s Trying Clean Architecture on Golang :

// https://github.com/bxcodec/go-clean-arch/blob/master/article/repository.go
type Repository interface {
    Fetch(ctx context.Context, cursor string, num int64) (res []*models.Article, nextCursor string, err error)
    GetByID(ctx context.Context, id int64) (*models.Article, error)
    GetByTitle(ctx context.Context, title string) (*models.Article, error)
    Update(ctx context.Context, ar *models.Article) error
    Store(ctx context.Context, a *models.Article) error
    Delete(ctx context.Context, id int64) error
}

And second, the CargoRepository interface from Marcus Olsson’s Go DDD port:

// https://github.com/marcusolsson/goddd/blob/master/cargo.go
// CargoRepository provides access a cargo store.
type CargoRepository interface {
    Store(cargo *Cargo) error
    Find(id TrackingID) (*Cargo, error)
    FindAll() []*Cargo
}

I am unsure about the pedigree of the first example, but I believe it has the same problems as the second and so think it is also likely to be a Java inspired implementation.

The implementations provided for both of these interfaces are quite different, but have similar problems.

1. Using pointers instead of structs or interfaces

The Marcus Olsson implementation chooses to have the repository hand out its own internal pointers. This means that the Store() function is actually optional, as the pointer you get from Find() is actually the very same one the Repository is meant to be hiding from you. I wrote a cutdown example here which clearly illustrates this issue! Additionally, it means that concurrent code operating on these objects would likely interfere with each other! This is the thing that troubled me the most, as I was such a fan of this project until I noticed this issue. It is a classic example of why you always need to think hard about pointers.
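The aliasing problem is easy to reproduce in a few lines. The sketch below is not the goddd code: pointerRepo and valueRepo are stripped-down stand-ins I made up to contrast handing out internal pointers with handing out copies.

```go
package main

import "fmt"

type Cargo struct {
	ID          string
	Destination string
}

// pointerRepo hands out its own internal pointers.
type pointerRepo struct {
	byID map[string]*Cargo
}

func (r *pointerRepo) Find(id string) *Cargo { return r.byID[id] }

// valueRepo hands out copies of what it stores.
type valueRepo struct {
	byID map[string]Cargo
}

func (r *valueRepo) Find(id string) (Cargo, bool) {
	c, ok := r.byID[id]
	return c, ok
}

func main() {
	p := &pointerRepo{byID: map[string]*Cargo{
		"A1": {ID: "A1", Destination: "Oslo"},
	}}
	found := p.Find("A1")
	found.Destination = "Lima" // no Store() call, yet the repo has changed
	fmt.Println(p.Find("A1").Destination)

	v := &valueRepo{byID: map[string]Cargo{
		"A1": {ID: "A1", Destination: "Oslo"},
	}}
	copied, _ := v.Find("A1")
	copied.Destination = "Lima" // only the caller's copy changes
	stored, _ := v.Find("A1")
	fmt.Println(stored.Destination)
}
```

The first Println shows the edited destination and the second shows the original one, which is the whole argument for returning structs in one screen of code.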

The Iman Tumorang code provides a MySQL backed implementation that is overall pretty good. I am not really sure why they elect to accept and return pointers to the Article structs (perhaps it is a little more efficient?). This implementation does not face the same issue as the Olsson one, as every repository function returns the address of a struct that is created within its call stack. However, the use of pointers might accidentally result in an implementation like the Olsson one, e.g. for an ‘in-memory caching’ middleware that wraps the MySQL repo to help improve performance in the future?

2. Always writing the whole object

Neither of these interfaces provides a way for the repository to know what has changed within the object. This makes it almost impossible for the repository to reduce work by skipping parts that have not changed (ORMs often have considerable amounts of logic dealing with whether an object is dirty or not). Additionally, neither implementation deals with concurrent editing (i.e. has the DB been touched since I retrieved my copy of the data), instead they just overwrite the current record with the Store/Update() request (this is perhaps because the interface does not make it clear that concurrent edits are possible).

The Java examples typically solve this with the Spring Framework @Transactional and/or Hibernate protections (something not available in Go).

I guess my proposal does not really stop you from making this same mistake, but I found that the command-oriented interface encouraged me to design my database tables more along the lines of sequential records of edits, similar to an event store, which actually avoids this mistake.

Closing remarks

Well….what a journey. I am not even sure this tells a coherent story any more. Through the writing and editing of this post I have come to the conclusion that my proposed idea is really trying to have its cake (by holding onto the synchronous / fully-consistent read-write model) and eat it too (by trying to use the CQRS/ES ideas about asynchronicity and eventual consistency).

The core idea of the CQRS thinking is that requiring synchronous response is problematic for scale, and that the separation of read and write allows better scaling and “eventual consistency”. Unfortunately it is poorly suited to a) how most programmers think, b) how most users understand computers (even if it is excellent for real-world domains), and c) how most tooling/delivery/HTTP works.

What I have proposed will not get the scalability gains from splitting the read-write model, but it will illustrate the seams in your domain and allow you to do poor-man’s (or Doddgy, lol) DDD. If anyone reads this or has comments, I’d love to hear them.

Posted on Jan 17, 2019 at 00:00

#go #golang #design #repository #design patterns

Andrew Dodd