jecrooks.com


Go, Avro, Singletons, and Templates

July 4, 2016

I've written a lot of Scala over the past eight months. It's not my favorite language, but it's up there, and it's done a lot to shape my view of software architecture and patterns in that time. More recently, I've been test driving Go for a new internal tool at work. As languages go, Go really could not be more different from Scala. There are good and bad aspects to the choices each language has made, but what I'm interested in today is design patterns, not language comparisons. In fact, today's pattern is that old demon - the Singleton. Also, code generation with text/template.

When I first picked up a book on Scala a couple years ago, I recall reading that the Singleton pattern was built into Scala as a language. In my ignorance, I was horrified. Were Singletons not the design patterns of lesser software developers? Ah, hubris, my old friend.

Patterns have their place. Singletons are terrible because they permit 'spooky action at a distance' in software systems, and are often used as global repositories of state. Indeed, the road to Big Ball of Mud Hell is paved with such Spooky Singletons. Singletons in Scala can certainly be used that way, but so can implicit access to any top level variable, or really just uncontrolled access to shared state in any language. Even Rust will let you build a nightmare of mutable Rc<RefCell<T>>s or Arc<Mutex<T>>s that may be memory safe, but is still a design disaster.

Okay, so a Singleton is a simple design pattern generally associated with OO-style. It's straightforward: there exists a single instantiation of a class, and all code gets a handle to that instantiation in place of the ability to make new instantiations. Simple enough, and straightforward to implement in the single-threaded case. The danger is that the pattern generally wants to be expressed by using a function to return the handle to the singleton, which means access can be grabbed anywhere without the 'evidence-trail' or swappability of Dependency Injection. Let's consider the Scala object, which is what encodes Singletons, and see how different variations play out.

In Scala, object declares a Singleton class. Unless declared private, this object can be imported from the package and can be used like a normal object instance. Additionally, an object can have the same name as a class in the same package, in which case it becomes the 'companion object' for the class, which has a few special properties that are out of the scope of this article. The simplest (and probably most common) use for objects is as the Scala equivalent of static functions in Java, essentially acting as namespaces.


package com.jecrooks.examples

object Arithmetic {
  def add(x: Int, y: Int): Int = {
    x + y
  }
}
        
Now the add function can be accessed as Arithmetic.add(). This is pretty standard stuff, and this barely qualifies as a Singleton - semantically this is a Namespace, a feature I wish Go had beyond just package scoped names.

Since Scala objects are, well, objects, they can hold state. By default, objects are lazily instantiated. The major value of Scala objects over standard functions in a Namespace is the ability to run code and have state for the functions to work from at runtime. This is where immutability really shines. We can initialize immutable values in the object at runtime when the object is first accessed, and then re-use those immutable values. Consider the situation where you want to expose a Codec for transforming some data object between a binary serial format and an internal object representation. We want to expose functions to do the encoding and decoding (in practice this is a rather procedural style and one would probably take a more OO approach in Scala). Imagine that, at runtime, we need to read in a file that describes the schema of the binary format and initialize the Codec, and then re-use the Codec for the encoding/decoding process from then on.


package com.jecrooks.examples

object FooCodec {
  val schema = SchemaLoader.readFromFile("FooSchema.schema")
  def encode(f: Foo): Array[Byte] = {
    ??? // encoding elided
  }
  def decode(bytes: Array[Byte]): Foo = {
    ??? // decoding elided
  }
}
        

This might seem a bit contrived, but it is actually the exact problem that kicked off this post, although in Go rather than Scala. We use Avro to serialize data between systems at Litmus Health. I'm test driving Go for building a new study design tool, which needs to serialize data to Avro format for communication with the server. I'm using LinkedIn's goavro library for the serialization, which works well, but only exposes a few functions for generating codecs and record types from Avro schema files. What I want to do is expose a pair of functions for building records and getting access to a codec for each data type I want to encode. In Go, this looks something like:


package examples

import (
  "github.com/linkedin/goavro"
)

type Foo struct {
  // fields
}

func NewFooRecord() goavro.Record {
  // Parse schema file and build record
  return goavro.NewRecord(fooSchema)
}

func FooCodec() goavro.Codec {
  // Parse schema file and build codec
  return goavro.NewCodec(fooSchema)
}
        
Of course, we don't really want to be parsing files each time the function gets called - we really want to do that exactly once and then be able to generate new records and access to the codec for each data type. So, we're back to the Singleton pattern. For this particular variant, I'm going to use the Singleton as an internal implementation detail, and instead provide package functions. This keeps us safe from the Spooky Singleton - there's no global state being passed around or modified.

Singletons in Go

Let's start with implementing the Singleton pattern in Go. There's a good post on the subject here, so I won't go through the whole process. Instead, let's jump straight to the implementation using sync.Once.


package singleton

import (
  "sync"
)

type singleton struct {
  //fields
}

var instance *singleton
var initLock sync.Once

func GetInstance() *singleton {
  initLock.Do(func() {
   instance = &singleton{}
   // initialization code as needed
  })
  
  return instance
}
        
We're combining a couple of features here to implement the singleton. First, note that the singleton is in its own package - by keeping the type private we prevent instantiation of other copies. In practice, this is a bad idea, because it's difficult to work with unexported types, but we'll get to that in a moment. The rest is fairly straightforward if you've seen an implementation of the pattern in another language, like Java. We export a function that can return a handle to the instance, and declare the instance pointer at package global (but private) scope. The function can see that package-level variable, so it can return the *singleton pointer. The sync.Once type from the standard library handles initialization. Once.Do(fn) takes a callback function fn as an argument, and executes it the first time Once.Do is called. After that, Once.Do is semantically a no-op, so the GetInstance function will initialize the singleton on first access, and simply return the instance pointer thereafter. Once.Do is thread-safe, so now we have a thread-safe, lazily initialized Singleton. This is perfectly safe for system-wide use in a read-only context, which is what we'll use it for in the Avro encoding problem. However, immutability is not language-enforced, so discipline is still needed to prevent turning this pattern into the Spooky Singleton anti-pattern. Furthermore, you may still want to use Dependency Injection or another decoupling pattern at the call site to improve testing, as needed for your architecture.
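To make the "exactly once, even under concurrency" claim concrete, here is a minimal sketch. The config type, the getConfig and hammer functions, and the initCalls counter are all hypothetical names invented for this illustration; only sync.Once, sync.WaitGroup, and sync/atomic come from the standard library.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// config stands in for any expensive, read-only value (hypothetical type).
type config struct {
	name string
}

var (
	instance  *config
	initOnce  sync.Once
	initCalls int32 // counts how many times the initializer actually ran
)

// getConfig lazily builds the shared instance on first use and returns
// the same pointer on every later call.
func getConfig() *config {
	initOnce.Do(func() {
		atomic.AddInt32(&initCalls, 1)
		instance = &config{name: "default"}
	})
	return instance
}

// hammer calls getConfig from n goroutines at once and reports how many
// times the initializer has run in total.
func hammer(n int) int32 {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = getConfig()
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&initCalls)
}
```

Calling hammer(100) should report a single initializer run, and every call to getConfig should hand back the same pointer. Nothing stops a caller from mutating the value behind that pointer, which is the discipline problem mentioned above.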

Go-by-Template

Finally, solving the motivating problem. I'm going to make some adjustments to the above. In particular, rather than exporting functions that return the singleton struct itself, I'm going to return field values whose types are exported, which gets around the issue where the singleton return type is hard to work with. In this case, I'm not using the Singleton pattern to prevent the creation of new instances of types (which is irrelevant to the problem at hand); instead, I'm interested in the lazy initialization and in having global access to the encoding function specialized for a specific schema. Finally, I'm going to do this once per Avro schema. I've got 29 schema files, and that's just the subset of the schemas Litmus uses that I needed for my design tool. That's a lot of boilerplate, so we're going to use the closest thing Go has to metaprogramming and write Go in templates, using the text/template library.

First, let's see the Avro part. I have a trivial Avro schema for a record with a single string field. If you haven't seen Avro before, the schema files are written in JSON, as seen below.


{
  "type": "record",
  "namespace": "com.jecrooks.examples.avro",
  "name": "EchoRecord",
  "fields": [
    {
      "name": "value",
      "type": "string"
    }
  ]
}
        
On my system the file is saved as EchoRecord.avsc. This schema file defines a record with the fully qualified name "com.jecrooks.examples.avro.EchoRecord", holding a single field named value that contains a string. So, exactly what you would expect. I'm using LinkedIn's goavro library for handling Avro records. The library generates a generic goavro.Record struct with string-based setters and getters for the Avro fields derived from the schema, and builds a codec for encoding and decoding between goavro.Record and []byte. What I want to do is use the Singleton pattern to provide functions that generate new goavro.Records for each schema, and provide a handle to each of the codecs, without having to load the schema each time. For this particular application, a command-line tool that generates and uploads data to the server, this isn't especially important, but it will be once the command-line application becomes a web service, which is the plan. Plus, it generally bothers me to repeatedly parse a file if it's unnecessary.
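As a side note on the fully qualified name, here is a small sketch of how it could be derived from a schema file using only encoding/json. The schemaHeader type and fullName function are hypothetical helpers for this post, not part of goavro, and the logic is deliberately simplified: it assumes the name field holds a simple name rather than an already dotted full name.

```go
package main

import "encoding/json"

// schemaHeader picks out just the two naming fields of a record schema
// (hypothetical helper type).
type schemaHeader struct {
	Namespace string `json:"namespace"`
	Name      string `json:"name"`
}

// fullName returns namespace + "." + name for a record schema, or just the
// name when no namespace is given. Invalid JSON is reported as an error.
func fullName(schemaJSON []byte) (string, error) {
	var h schemaHeader
	if err := json.Unmarshal(schemaJSON, &h); err != nil {
		return "", err
	}
	if h.Namespace == "" {
		return h.Name, nil
	}
	return h.Namespace + "." + h.Name, nil
}
```

Because the schema is plain JSON, a malformed file (a missing comma between fields, say) surfaces here as an ordinary json.Unmarshal error rather than something Avro-specific.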

I implemented the Singleton pattern as follows for the above EchoRecord type.


package schema

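import (
  "fmt"
  "github.com/linkedin/goavro"
  "io/ioutil"
  "os"
  "sync"
)

type echoRecord struct {
  record  goavro.Record
  codec   *goavro.Codec
  initL   sync.Once
}

var echoRecordProto echoRecord

// Pointer receiver on the type: a value receiver would copy the struct
// (and its sync.Once), so the initialized fields would be lost.
func (p *echoRecord) init() {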
  p.initL.Do(func() {
    fileContents, err := ioutil.ReadFile("/path/to/schema/EchoRecord.avsc")
    if err != nil {
      fmt.Println("Error reading file '/path/to/schema/EchoRecord.avsc'.")
      os.Exit(1)
    }

    codec, err := goavro.NewCodec(string(fileContents))
    if err != nil {
      fmt.Println("Error building codec from file '/path/to/schema/EchoRecord.avsc'.")
      os.Exit(1)
    }

    p.codec = &codec

    schema := goavro.RecordSchema(string(fileContents))
    p.record, err = goavro.NewRecord(schema)
    if err != nil {
      fmt.Println("Error initializing record prototype from file '/path/to/schema/EchoRecord.avsc'.")
      os.Exit(1)
    }
  })
}

func NewEchoRecord() goavro.Record {
  echoRecordProto.init()
  return echoRecordProto.record
}

func EchoRecordCodec() *goavro.Codec {
  echoRecordProto.init()
  return echoRecordProto.codec
}
        
There are a couple of important things to note about this snippet. First, my response to errors at any point is simply to crash (log.Fatal should work here as well, but knowing the file that caused the error is nice at each step). This is a command-line application and an internal tool, so crashing is the most expedient and, frankly, correct, path of action in this context. This would obviously be unacceptable as a web service, however. Second, there are a lot of literals; in particular, the filename is written out in full each time as a literal string, not stored in a variable. This example was transcribed from the actual code in the app, which, as we'll see, is generated by a build tool, not hand-written, hence the literals. Finally, the return types for the functions are types exported from the goavro library, which fixes the unexported type problem seen earlier. The codec is actually a singleton, in that a pointer to the same codec is returned each time. However, the record is a little more interesting and explains the 'Proto' naming. Since Go is pass-by-value, returning the record field from the struct actually returns a copy of the value, in this case a new empty record with the correct schema set, which is exactly what I initially intended. The echoRecordProto struct stores a prototype of the record, and uses pass-by-value semantics to make new copies for use. This approach should work in C, C++, or Rust, but would require explicit copies in languages that default to pass-by-reference, such as Java.
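One caveat about the prototype trick is worth sketching. Returning a struct by value copies the struct itself, but any maps, slices, or pointers inside it still refer to the prototype's data. The record type below is a hypothetical stand-in, not goavro.Record; whether a copied goavro.Record is fully independent depends on that type's internals, which I'm not making claims about here.

```go
package main

// record is a hypothetical stand-in for a library record type.
type record struct {
	name   string
	fields map[string]string // reference type: shared between plain copies
}

// proto is the prototype every new record is copied from.
var proto = record{name: "EchoRecord", fields: map[string]string{}}

// newRecord returns a copy of the prototype. The struct is copied, but the
// map inside the copy is still the prototype's map.
func newRecord() record {
	return proto
}

// newRecordDeep also copies the map, so the result is fully independent
// of the prototype.
func newRecordDeep() record {
	r := proto
	r.fields = make(map[string]string, len(proto.fields))
	for k, v := range proto.fields {
		r.fields[k] = v
	}
	return r
}
```

With newRecord, changing name on the copy leaves the prototype alone, while writing into fields is visible through the prototype. If the prototype's type holds reference-typed fields, something like newRecordDeep is the safer shape.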

Now we can use the Avro records and codec easily in the rest of the program through the exported access functions. There's one final problem, however: writing the above code for each Avro record is extraordinarily tedious. The final piece is to implement the Avro access code above with code generation as a build step. Go has an excellent text templating engine, which makes it quite reasonable to write a Go code generation tool in Go. To do this, I added another directory called 'build' in my project directory. In my current project, builds are handled by a simple shell script, so the whole build process becomes a matter of building and running the code generation script, followed by building the main project. I didn't use go generate, because my understanding at the time was that it re-writes extant files using special comment tags, whereas here I want to generate multiple new files. (In fact, go generate simply runs the commands named in //go:generate comments, so it could drive a generator like this one too.)

The code generation script looks like


package main

import (
  "fmt"
  "log"
  "os"
  "path"
  "path/filepath"
  "strings"
  "text/template"
)

const goschemaTemplate = `package schema

import (
  "fmt"
  "github.com/linkedin/goavro"
  "io/ioutil"
  "os"
  "sync"
)

type {{ .TypeName }} struct {
  record  goavro.Record
  codec   *goavro.Codec
  initL   sync.Once
}

var {{ .TypeName }}Proto {{ .TypeName }}

func (p *{{ .TypeName }}) init() {
  p.initL.Do(func() {
    fileContents, err := ioutil.ReadFile("{{.FilePath}}")
    if err != nil {
      fmt.Println("Error reading file '{{.FilePath}}'")
      os.Exit(1)
    }

    codec, err := goavro.NewCodec(string(fileContents))
    if err != nil {
      fmt.Println("Error building codec from file '{{.FilePath}}'")
      os.Exit(1)
    }

    p.codec = &codec

    schema := goavro.RecordSchema(string(fileContents))
    p.record, err = goavro.NewRecord(schema)
    if err != nil {
      fmt.Println("Error initializing record prototype from file '{{.FilePath}}'")
      os.Exit(1)
    }
  })
}

func New{{.SchemaName}}() goavro.Record {
  {{.TypeName}}Proto.init()
  return {{.TypeName}}Proto.record
}

func {{.SchemaName}}Codec() *goavro.Codec {
  {{.TypeName}}Proto.init()
  return {{.TypeName}}Proto.codec
}
`

type SchemaTemplater struct {
  SchemaName  string
  TypeName    string
  FilePath    string
}

func NewSchemaTemplater(fp string) SchemaTemplater {
  st := SchemaTemplater{}
  st.FilePath = fp

  filename := filepath.Base(fp)
  st.SchemaName = strings.Replace(filename, ".avsc", "", 1)
  st.TypeName = strings.ToLower(st.SchemaName)

  return st
}

func WriteProtoFile(fp string, tmpl *template.Template) {
  st := NewSchemaTemplater(fp)
  targetPath := strings.Replace(fp, ".avsc", ".go", 1)
  target, err := os.Create(targetPath)
  if err != nil {
    log.Fatal(err)
  }

  defer target.Close()

  // Stop on a rendering error rather than leaving a half-written file unnoticed.
  if err := tmpl.Execute(target, st); err != nil {
    log.Fatal(err)
  }
}

func main() {
  buildDir, err := filepath.Abs(filepath.Dir(os.Args[0]))
  if err != nil {
    log.Fatal(err)
  }

  rootDir := path.Dir(buildDir)
  schemaDir := path.Join(rootDir, "avro/schema")
  schemaFiles, err := filepath.Glob(schemaDir + "/*.avsc")
  if err != nil {
    log.Fatal(err)
  }

  tmpl, err := template.New("SchemaTemplater").Parse(goschemaTemplate)
  if err != nil {
    log.Fatal(err)
  }

  for _, schemaFile := range schemaFiles {
    WriteProtoFile(schemaFile, tmpl)
  }

  fmt.Printf("Wrote %d goavro access files.\n", len(schemaFiles))

}
        
There's a bit here, but it's pretty straightforward. The template is a raw string that replicates the earlier code we saw for providing access functions, except with filepaths and type names replaced with simple template hooks. The program itself looks for a directory called avro/schema in the project root directory, which is assumed to be the parent of the directory containing the generator executable. The Avro schema files are all those in the avro/schema directory with ".avsc" extensions, and their paths are used as template arguments. Notably, the filenames are used to generate the schema access functions, not the name given to the Avro schema in the schema files themselves. This is partially for simplicity, but also because all of our .avsc files are themselves generated from .avdl files (Avro's interface description language). The Avro tool that does the generation names the .avsc files according to the schema name, so in practice these should always match.
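To isolate the templating step from the file handling, here is a cut-down sketch that renders into a buffer instead of a file. The accessorTemplate text, the templateData type, and the dataFor and render functions are hypothetical, trimmed-down versions made up for illustration; the standard library calls (template.New, Parse, Execute, filepath.Base, strings.TrimSuffix) are real.

```go
package main

import (
	"bytes"
	"path/filepath"
	"strings"
	"text/template"
)

// accessorTemplate is a tiny illustrative template, not the real one above.
const accessorTemplate = `// Source schema: {{.FilePath}}
package schema

func New{{.SchemaName}}() string { return "{{.TypeName}}" }
`

// templateData holds the values substituted into the template.
type templateData struct {
	SchemaName string
	TypeName   string
	FilePath   string
}

// dataFor derives the template values from a schema file path:
// the base filename minus ".avsc", plus a lower-cased variant.
func dataFor(fp string) templateData {
	name := strings.TrimSuffix(filepath.Base(fp), ".avsc")
	return templateData{
		SchemaName: name,
		TypeName:   strings.ToLower(name),
		FilePath:   fp,
	}
}

// render parses the template and executes it into an in-memory buffer,
// returning the generated source as a string.
func render(fp string) (string, error) {
	tmpl, err := template.New("accessor").Parse(accessorTemplate)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, dataFor(fp)); err != nil {
		return "", err
	}
	return buf.String(), nil
}
```

Rendering into a bytes.Buffer first makes the generator easy to test, and it means a failed Execute never leaves a partially written .go file on disk.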

And that's it, we can generate access functions for all of the Avro schemata in the system. Building is as simple as using a shell script that compiles and executes the code generator before building the main program, and Go's performance and compilation speed make this quite fast and pleasant to use.

Go is still not my favorite language. There were several points throughout this process (and when writing the main project this was done for) that I wished I had some tool or feature from Scala or Rust. Nonetheless, there is a certain pleasure in the simplicity of the language, and it definitely feels much easier to simply get something written and working than in many other languages with a larger design space. Code generation initially felt quite hacky, but the text/template library is nice to use, and the code generation portion turned out to be very satisfying to get working, especially after I wrote several schema access set-ups and functions by hand. I can certainly understand why the language has caught on so much.