Strong Types and their Impact on Testing

In this post (series?), I’d like to explore how to write a program in a type safe and testable way. I’m using Scala because it’s the language I’m most proficient in; this would be prettier and less boilerplatey in Haskell.

The basic point that I hope to get across in this post (and the potential follow-ups) is that by encoding our domain into strong types and avoiding side effects we can accomplish a few things:

  1. We can greatly limit the scope of our tests
  2. For the tests that still make sense to write, we can write very strong tests
  3. By using meaningful values and separating side-effecting functions from pure ones, we can more easily reason about our program

We’re going to write a simple service that takes a form filled out by a user, sends an email based on that form, and then records that event to a database.

Here’s a possible data model for our email and a form.

package com.notik

case class EmailAddress(address: String)
case class FromAddress(address: EmailAddress)
case class ToAddress(address: EmailAddress)
case class EmailBody(body: String)
case class Recipient(name: String)
case class Email(from: FromAddress, to: ToAddress, body: EmailBody, recipient: Recipient)
case class WebForm(from: FromAddress, to: ToAddress, body: EmailBody, recipient: Recipient)

Right away, we’ve gained something. Not a ton, but something.
If we had encoded the idea of an Email as Email(fromAddress: String, toAddress: String, body: String, recipient: String), we could easily shoot ourselves in the foot. We might mistakenly write Email("hello", "jason", "jason@gmail.com", "body of email"), for example. We’ve mixed up all of the parameters, and we only find out that things have gone wrong once we run our program and it blows up with an InvalidEmailAddressException or whatever. Worse, it might not blow up at all and instead our program is just wrong.

What we should do is encode our domain as strongly as possible and let the type system/compiler do as much work as possible.
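To make the difference concrete, here is a small sketch that reuses the case classes from above. The addresses and the `sample` name are made up for illustration. The commented-out call shows the kind of mix-up that the compiler now rejects.

```scala
case class EmailAddress(address: String)
case class FromAddress(address: EmailAddress)
case class ToAddress(address: EmailAddress)
case class EmailBody(body: String)
case class Recipient(name: String)
case class Email(from: FromAddress, to: ToAddress, body: EmailBody, recipient: Recipient)

object TypedEmailExample {
  // Every argument has its own type, so each value fits in exactly one slot.
  val sample: Email = Email(
    FromAddress(EmailAddress("levi@example.com")),
    ToAddress(EmailAddress("jason@example.com")),
    EmailBody("body of email"),
    Recipient("Jason")
  )

  // Swapping the first two arguments is a compile error, not a runtime surprise:
  // Email(
  //   ToAddress(EmailAddress("jason@example.com")),
  //   FromAddress(EmailAddress("levi@example.com")),
  //   EmailBody("body of email"),
  //   Recipient("Jason")
  // ) // does not compile: type mismatch
}
```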

Ok, so with our case classes we have separate types representing the different parts of an email. But, we’re probably creating this Email by populating it from the fields of some web form and there’s still nothing preventing us from mixing up the strings and getting incorrect data, right?

This is always a possibility and users might input bad data altogether. Ok, so let’s encode this into our types. Firstly, we’re going to create a function which takes in a WebForm and produces an Email. But instead of producing an Email directly, we’re going to produce “possibly” an email. We’re not going to assume we have an email and wait for runtime exceptions nor are we going to do validation checks and simply halt in the case of invalid input. Instead, we’re going to encode the notion that something could always go wrong when submitting the form. See EmailService.mkEmail. It takes a WebForm and returns a ValidationNel[String, Email]. If the form is incorrect (as defined by our domain logic) then we get back a list of all the errors. If nothing is wrong, we get an Email.

object EmailService {

  import scalaz._
  import Scalaz._
  import argonaut.Parse

  /*
    Validation represents the idea of success or failure. NEL stands for non-empty list. We can actually require
    that the list of errors we get back in the case of failure is non-empty.

    This is the first step towards a more strongly typed way of handling web form submissions. We encode the notion
    of failure in our type. We no longer have simply an Email as a result of a WebForm. Instead, we are forced, at
    the level of the type system, to deal with the possibility of failure.
   */

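  // Triple quotes keep the backslashes literal, so \. matches an actual dot rather than any character.
  val emailRegex = """^[_A-Za-z0-9-+]+(\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\.[A-Za-z0-9]+)*(\.[A-Za-z]{2,})$""".r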

  def mkEmail(form: WebForm): scalaz.ValidationNel[String, Email] =
    (validateFrom(form.from).toValidationNel |@|
      validateTo(form.to).toValidationNel |@|
      validateBody(form.body).toValidationNel |@|
      validateRecipient(form.recipient).toValidationNel)(Email.apply)

  def validateFrom(from: FromAddress):Validation[String, FromAddress] =
    emailRegex.findFirstIn(from.address.address)
     .map(_ => from.success)
     .getOrElse(s"$from is not a valid email address".fail[FromAddress])

  def validateTo(to: ToAddress):Validation[String, ToAddress] =
    emailRegex.findFirstIn(to.address.address)
     .map(_ => to.success)
     .getOrElse(s"$to is not a valid email address".fail[ToAddress])

  def validateBody(emailBody: EmailBody): Validation[String, EmailBody] =
   if(emailBody.body.nonEmpty) emailBody.success
   else "body of the email cannot be empty".fail[EmailBody]

  def validateRecipient(recipient: Recipient): Validation[String, Recipient] =
    if(recipient.name.nonEmpty) recipient.success
    else "recipient name cannot be empty".fail[Recipient]

  def emailFromJsonForm(json: String):Validation[NonEmptyList[String], Email] = for {
    form <- Parse.decodeValidation[WebForm](json).toValidationNel
    email <- mkEmail(form)
  } yield email
}

With this in place, if we tried creating an email out of a form with something like
from = "gmail.com", to = "@gmail.com", body = "", recipient = "Levi"
we’d get a list of errors like this:
NonEmptyList(FromAddress(EmailAddress(gmail.com)) is not a valid email address, ToAddress(EmailAddress(@gmail.com)) is not a valid email address, body of the email cannot be empty)
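If scalaz is unfamiliar, the accumulating behavior can be approximated with the standard library alone. This is only a sketch under that assumption, not how scalaz implements Validation: the validators below are simplified stand-ins that take plain strings, and `collectErrors` is a hypothetical helper that gathers every failure instead of stopping at the first one.

```scala
object AccumulateSketch {
  // Simplified stand-in for validateFrom / validateTo, using a much looser pattern.
  def validateAddress(label: String, address: String): Either[String, String] =
    if (address.matches("""[^@\s]+@[^@\s]+\.[A-Za-z]{2,}""")) Right(address)
    else Left(s"$label '$address' is not a valid email address")

  // Simplified stand-in for validateBody.
  def validateBody(body: String): Either[String, String] =
    if (body.nonEmpty) Right(body)
    else Left("body of the email cannot be empty")

  // Hypothetical helper: look at every check and keep all the failures.
  def collectErrors(checks: Either[String, String]*): List[String] =
    checks.toList.collect { case Left(error) => error }

  // The same bad input as in the example above.
  val errors: List[String] = collectErrors(
    validateAddress("from", "gmail.com"),
    validateAddress("to", "@gmail.com"),
    validateBody("")
  )
}
```

All three problems are reported together, which is the property that makes ValidationNel a good fit for form handling.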

Our ultimate goal is to send an email and log a record of that. But we’re still dealing with just our data/model at this point.

Ok, so how do we get this WebForm? For our example, we’ll assume a user is filling out some input fields which will then be posted as JSON. If we were doing it through query params or some other way, the same general principles would apply.

On to JSON. We’ll use the Argonaut library to deserialize some JSON data into our WebForm type.

import argonaut._, Argonaut._

/*
We need a codec that defines, in a *type safe* way, how to decode our JSON into our WebForm class. We'll put our codecs in the respective companion objects so the implicits can be found without additional imports.
*/

object WebForm {
  implicit def WebFormCodecJson: CodecJson[WebForm] =
    casecodec4(WebForm.apply, WebForm.unapply)("from", "to", "body", "recipient")
}

/*
case class EmailBody(body: String)
case class Recipient(name: String)
case class Email(from: FromAddress, to: ToAddress, body: EmailBody, recipient: Recipient)
 */

object FromAddress {
  implicit def FromAddressCodecJson: CodecJson[FromAddress] =
    casecodec1(FromAddress.apply, FromAddress.unapply)("address")
}

object ToAddress {
  implicit def ToAddressCodecJson: CodecJson[ToAddress] =
    casecodec1(ToAddress.apply, ToAddress.unapply)("address")
}

object EmailBody {
  implicit def EmailBodyCodecJson: CodecJson[EmailBody] =
    casecodec1(EmailBody.apply, EmailBody.unapply)("body")
}

object Recipient {
  implicit def RecipientCodecJson: CodecJson[Recipient] =
    casecodec1(Recipient.apply, Recipient.unapply)("recipient")
}

Parsing/decoding may fail for a number of reasons. As with the validation examples above, the possibility of failure is encoded into our types. Specifically, when we decode, the value produced is an Either[String, WebForm] where the left side contains any error message in the case of failure and the right side contains the WebForm in the case of success. Again, the basic idea is simple but powerful: instead of pretending that we have the values we ultimately want even though we know that things may very well blow up, we simply encode the possibility of failure into our type.

Using functional constructs we can deal with the “happy path” and deal with errors at the very end, instead of sprinkling error handling all over our code. We do this by decoding the form from JSON and then calling mkEmail. The function emailFromJsonForm returns a Validation[NonEmptyList[String], Email] with the list of any errors on the failure side and the Email on the success side.
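The shape of this “happy path first, errors at the end” style can be sketched without any libraries. In the block below, `parseForm` and `buildEmail` are hypothetical stand-ins for JSON decoding and mkEmail, and the input format ("to|body") is invented purely to keep the example short. It assumes Scala 2.12 or later, where Either is right-biased. Note that a plain Either stops at the first failure rather than accumulating errors.

```scala
object HappyPathSketch {
  final case class Form(to: String, body: String)
  final case class Message(to: String, body: String)

  // Hypothetical stand-in for JSON decoding: expects input of the form "to|body".
  def parseForm(raw: String): Either[List[String], Form] =
    raw.split("\\|", -1) match {
      case Array(to, body) => Right(Form(to, body))
      case _               => Left(List(s"could not parse form: $raw"))
    }

  // Hypothetical stand-in for mkEmail, with a deliberately crude check.
  def buildEmail(form: Form): Either[List[String], Message] =
    if (form.to.contains("@")) Right(Message(form.to, form.body))
    else Left(List(s"${form.to} is not a valid email address"))

  // The for-comprehension reads as the happy path only.
  def emailFromRaw(raw: String): Either[List[String], Message] =
    for {
      form  <- parseForm(raw)
      email <- buildEmail(form)
    } yield email

  // Errors are handled once, at the edge of the program.
  def describe(raw: String): String =
    emailFromRaw(raw).fold(
      errors => s"rejected: ${errors.mkString(", ")}",
      email  => s"ready to send to ${email.to}"
    )
}
```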

At this point, we have no side-effecting functions. This may seem like a lot of boilerplate, but we’ve gained a lot from this. And the benefits only increase as our program grows larger.

To see the benefits of this approach, consider an alternative program where functions don’t return meaningful values and are only called for their side effects.

def sendEmail(webForm: String): Unit = {
  val form = deserialize[WebForm](parse(webForm))
  val email = Email(form.from, form.to, form.body, form.recipient)
  EmailService.sendEmail(email) // sends the email
  logEvent(email) // writes a record to the DB
}

This monolithic function mixes the data transformations with the side effect of sending an email. The Unit return type is completely opaque. It’s not a meaningful value that we can reason about. By returning Unit, we’re saying “Nothing to look at here..move along”. By definition the only way to test our program is to test the entire thing at once. Since Unit has no meaning, we can only test that our program does what we want by inspecting that the side effects seem to meet our requirements.

This kind of program becomes impossible to maintain, especially as it grows larger and we add more functionality. Reasoning about this program requires an ever-increasing amount of mental energy. With no meaningful types anywhere, we can’t add functionality without keeping everything in mind at once. This function also ignores the real possibility of errors occurring when deserializing to our WebForm type and leaves out any validations on our Email.

Contrast this to the program we’ve written where we have clearly defined types and transformations that return clear values which indicate the possibility of failure. We can now write tests using something like ScalaCheck to confirm that malformed email addresses, for example, are rejected and return failures. Furthermore, the scope of our tests is greatly diminished. Instead of writing tests that exhaustively check that the side effects of our program occur as we expect (since we can’t inspect anything directly), we write small functions that return meaningful values for which we can write strong tests that confirm the correctness of our code directly. With the monolithic and side-effecting approach, if the tests don’t pass, we’re not necessarily sure why. We’re left to fish through our code, trying to find the bug that caused the problem.
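As a rough idea of what such a test could look like, here is a ScalaCheck sketch. It assumes ScalaCheck is on the classpath, and to stay self-contained it tests a simplified, hypothetical `validateAddress` rather than the EmailService functions above; the two properties and the `wellFormed` generator are likewise made up for illustration.

```scala
import org.scalacheck.{Gen, Properties}
import org.scalacheck.Prop.forAll

object AddressProps extends Properties("validateAddress") {
  // Hypothetical, simplified validator standing in for EmailService.validateFrom.
  def validateAddress(address: String): Either[String, String] =
    if (address.matches("""[^@\s]+@[^@\s]+\.[A-Za-z]{2,}""")) Right(address)
    else Left(s"$address is not a valid email address")

  // Alphanumeric strings never contain '@', so every one of them must be rejected.
  property("strings without @ are rejected") =
    forAll(Gen.alphaNumStr) { s => validateAddress(s).isLeft }

  // Addresses assembled from non-empty alphabetic parts must be accepted.
  val wellFormed: Gen[String] = for {
    local  <- Gen.nonEmptyListOf(Gen.alphaChar).map(_.mkString)
    domain <- Gen.nonEmptyListOf(Gen.alphaChar).map(_.mkString)
  } yield s"$local@$domain.com"

  property("well-formed addresses are accepted") =
    forAll(wellFormed) { s => validateAddress(s).isRight }
}
```

Because the validator is a pure function that returns a value, each property checks it directly against generated inputs, with no mail server or database involved.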

Even with all the problems in my monolithic sendEmail example, I’m still using more clearly defined types than a lot of programs that I often see. For example, a lot of people seem to use hashes as their data type for everything. Someone showed me an example where they deleted two entries in some hash where they had intended to only remove one. Their tests blew up and eventually they tracked down the unintended deletion that caused the issue. The problem here is that a hash is the wrong type for almost anything in your program’s domain. All we know about a hash is that it contains keys and values. That’s it. There’s no information there and no type (beyond simply the most general notion of a structure that contains keys/values). By designing a very specific model, encoding it into strong types, and using a compiler you can catch errors before things blow up at runtime.
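A tiny sketch of the difference, with made-up names (`emailAsMap`, `Envelope`) and a deliberately misspelled key: the map version compiles and quietly returns nothing, while the typed version cannot even express the mistake.

```scala
object HashVsTypeSketch {
  // With a generic map, a misspelled key still compiles and only shows up at runtime.
  val emailAsMap: Map[String, String] =
    Map("from" -> "levi@example.com", "to" -> "jason@example.com")

  // "form" is a typo for "from": this compiles and silently yields None.
  val missing: Option[String] = emailAsMap.get("form")

  // With a dedicated type, the same typo is rejected by the compiler.
  final case class Envelope(from: String, to: String)
  val envelope: Envelope = Envelope("levi@example.com", "jason@example.com")
  // envelope.form // does not compile: value form is not a member of Envelope
}
```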

I’ll either edit this post or add a part two that completes the picture, illustrating how to perform the effect of sending an email, how to do type safe database queries and insertions, etc.

Why College is a Poor Choice for a Lot of People

The idea of college education is deeply ingrained in our culture. People seem to go down the college route almost impulsively and the correctness of this path is, for most, unquestionable.

People have many different motivations for going to college. Some common motivations include:

  1. A (vague) notion of “getting an education”
  2. Gaining a specific, marketable skill (also known as “getting a job”)
  3. Networking and connecting with others
  4. Studying specific areas of interest
  5. Getting drunk and having fun

Some people are just “school lifers”. They feel comfortable inside the walls of the university, they excel at gaming the system, whatever. I have nothing to say about these kinds of people nor do I intend to address all of the motivations I’ve listed above.

Instead, I’m addressing this post to a specific type of person. But it’s a type that I’ve been encountering with greater frequency recently.

I’m talking about the kind of person who has various passions — theoretical or practical — and wants to study and/or gain skills in their area(s) of interest. For this type of person, college is often a very poor choice. At the very least, college can be a very circuitous path to those goals; there are paths that are way more direct.

It’s not that college doesn’t include some things that can further these goals. It might, but only incidentally. And along with a bunch of stuff that has absolutely nothing to do with these goals. College also includes things that are diametrically opposed to them.

If you’re passionate about some theoretical area, consider that in a classroom setting everyone must advance at the same pace. As a student, if you’re not keeping up with the curriculum, you’re simply left behind and there aren’t second chances to circle back. Learning on a schedule that isn’t directly based on your progression of understanding makes no sense if the knowledge per se is what you’re interested in. It’d be like sitting down to study something and setting an alarm to let you know that it’s time to move on.

In college you have to take a bunch of classes that have nothing to do with the areas you’re interested in, from either a theoretical or a practical standpoint. A lot of this derives from a notion of the “well-rounded” citizen. Maybe there’s some merit to that idea. I don’t know. But the point is, I’m not talking to the person that’s interested in such a notion. Again, I’m talking about people that have clear interests and want to advance themselves either in that field of knowledge or as a practical career path. From that perspective, every course you take that isn’t informed directly by what you’re looking to learn is a waste of time.

Related to this last point is that college has a pre-planned, specific length. It’s four years (in most places…as far as I know). If you’re interested in specific knowledge or skills, does that make sense? Hardly. For some things a person could be interested in, it might take 2 years to learn. Hell, maybe it’ll take two hours. Who knows? But college is a four-year commitment. If there are very concrete and specific things you’re hoping to learn, committing to four years from the outset is not a rational plan.

To my mind, all of the arguments in favor of college as a means of education per se are wrong. Some of the things you do in college will further the educational goal, but that doesn’t mean that college per se, as a wholesale endeavor, makes any sense. This is my real point. College is a very specific kind of thing. It’s a certain style with a paced curriculum, it’s four years, it’s a variety of courses (many of which have nothing to do with your interests), etc. Once again, much of the substance and form of college (qua college) is unrelated to the learning you might be interested in and is also in fundamental opposition to learning anything.

I saw this Hacker News post about a recent high school graduate who moved to the Bay Area and was looking to meet new people and make something of himself. I was perusing the comments and saw some folks debating whether the OP had made the right choice in deciding not to go to college. One commenter was making some argument in favor of college when he stated:

Really, there’s no reason not to go

I’m not trying to pick on this particular comment (it seems to be a common argument). But, from a purely logical perspective, the argument makes no sense. “There’s no reason not to…” is not a positive argument for anything. If I told someone “go do a bunch of cartwheels; there’s no reason not to”, that’s not an argument that they should, in fact, go do cartwheels.

The fact of the matter is that there are many good reasons not to go to college, especially for the kind of person I’m talking about. First, there’s all the stuff I said before about how college includes so much unrelated stuff.

I also mentioned that there’s a way more effective path to advancing in some area of study or skill. The more effective method is simply a la carte education. This is something that a lot of people are already doing. There’s Khan Academy, Codecademy, Lynda.com, etc. Looking to become a designer? You can pay $25/month for access to premium content that will teach you how to use Photoshop, Balsamiq or whatever other software you want to learn. Want to learn programming? Ditto. I can’t tell you how many programmers I’ve met who’ve done CS degrees who tell me that they didn’t learn anything about how to write code in college. Is category theory something that excites you? I was just talking to a guy the other day (mind you this is a guy who’s in school studying some highly theoretical stuff) who recommended a ton of freely available resources for learning category theory. Interestingly, even for people who are in some form of higher education, for the areas they’re really passionate about, they’re getting their learning material from outside of the university setting and engaging in a process of self-study. Again, a la carte education. At your own pace. At a fraction of the cost of attending university.

I realize that there’s no one-size-fits-all approach and that there may very well be special cases where, for practical or other reasons, going to college or doing a PhD or whatever is the only option. I get that. But that’s kind of my point. The question of whether to attend college should be approached with the same open-mindedness that we approach any other life decision. No bias, no presumptions. If it turns out that college is the only viable option for what you’re looking to do, then by all means. But I’d argue that for the group of people I’m talking about, which I think is a rather large group (and increasing), college is simply not the most effective way to go about achieving their goals.

The world has changed dramatically in the last few decades. In many ways, parents of college-aged kids have prepared them for a world that no longer exists. With the internet democratizing everything, most importantly access to knowledge, there are very real and very practical ways to gain advanced skills and knowledge more directly and more cheaply than going to college.

I believe that this old-style, formal education that we find in college and graduate programs will become an increasingly niche endeavor reserved for the few for whom it makes sense for one reason or another. For the majority of people looking to expand their knowledge and skills, college is simply no longer the best option.
