Introductory TyDD in Scala: Deconstructing complex Types

These types are ugly and cumbersome; they are not at all human readable. How do we get data out of such a type and format it in a way that is useful to the operator? Let's again start with a naive example.

def stringify1[A, B, C](
  fa: A => String, fb: B => String, fc: C => String,
  in: List[(A, (B, C))]): String = {
  in.map{case (a, (b, c)) =>
    fa(a) + ", " + fb(b) + ", " + fc(c)
  }.mkString("(", "; ", ")")
}

Here we take a List of nested pairs and return a string that is hopefully more human readable than the toString method would provide.

Abstract the F

Like before, we will abstract the F from our function that it may be used with any type constructor of arity 1 rather than hard-coded to List.

import cats.Functor
val functorList = new Functor[List]{
  override def map[A, B](fa: List[A])(f: A => B): F[B] =
    fa.map(f)
}
def stringify2[F[_]: Functor, A, B, C](
  fa: A => String, fb: B => String, fc: C => String,
  in: F[(A, (B, C))]): String = {
  val F = implicitly[Functor[F]]
  F.map(in){case (a, (b, c)) =>
    fa(a) + ", " + fb(b) + ", " + fc(c)
  }
  ???
}

The cats library provides a nifty Functor typeclass for us so, we can abstract the map call pretty easily. What of the mkString? As it turns out, cats provides a typeclass for this as well! It is called Show; let's see how it works.

import cats.Show
def stringify3[F[_]: Functor, A, B, C](
  fa: A => String, fb: B => String, fc: C => String,
  in: F[(A, (B, C))])(implicit
  FS: Show[F[String]]): String = {
  val F = implicitly[Functor[F]]
  val result = F.map(in){case (a, (b, c)) =>
    fa(a) + ", " + fb(b) + ", " + fc(c)
  }
  FS.show(result)
}

And extending this idea of Show to the Function1 instances we have

def stringify4[F[_]: Functor, A: Show, B: Show, C: Show](
  in: F[(A, (B, C))])(implicit
  FS: Show[F[String]]): String = {
  val F = implicitly[Functor[F]]
  val fa = implicitly[Show[A]].show _
  val fb = implicitly[Show[B]].show _
  val fc = implicitly[Show[C]].show _
  val result = F.map(in){case (a, (b, c)) =>
    fa(a) + ", " + fb(b) + ", " + fc(c)
  }
  FS.show(result)
}

Abstracting over Arity

We'll attempt to build a recursive version of this function using the same principled we've used in previous posts.

def stringify5[F[_]: Functor, A: Show, B: Show](
  in: F[(A, B)])(implicit
  FS: Show[F[String]]): String = {
  val F = implicitly[Functor[F]]
  val fa = implicitly[Show[A]].show _
  val fb = implicitly[Show[B]].show _
  val result = F.map(in){case (a, b) =>
    fa(a) + ", " + fb(b)
  }
  FS.show(result)
}

This is not what we want! We need to recurse on the Show instance inside the Functor. Let's make a recursive function for Show. This will follow the code we read from Shapeless very closely.

implicit def makeShow[A: Show, B: Show]: Show[(A, B)] = {
  val fa = implicitly[Show[A]].show _
  val fb = implicitly[Show[B]].show _
  new Show[(A, B)]{
    override def show(t: (A, B)): String = {
      val (a, b) = t
      "(" + fa(a) + ", " + fb(b) + ")“
    }
  }
}

This says, Given any two Show instances, a Show instances for their pair can be produced. Alternatively, it can be said that given a proof for Show[A] and a proof for Show[B] a proof for Show[(A, B)] follows.

So, now we have the following stringify function:

def stringify[F[_]: Functor, A: Show](
  in: F[A])(implicit
  FS: Show[F[String]]): String = {
  val F = implicitly[Functor[F]]
  val fa = implicitly[Show[A]].show _
  val result = F.map(in)(fa)
  FS.show(result)
}

Notice how simple our code has become. All of the specific type information has been absorbed into the recursive implicit functions. This is simply the Show instance for Functor itself. Furthermore, our makeShow function will produce Show instances for any nesting of Tuple2 instances; it generates Show instances for binary trees. The makeShow function is 10 lines of code (even including the Scala boilerplate) and gave us a giant boost in usefulness for our library code.

What's the Point?

Let's see an example of how this can be used. Given our individual Show instances:

implicit val ShowListString = new Show[List[String]]{
  def show(in: List[String]): String =
    in.mkString("(", "; ", ")")
}
implicit val showInt = new Show[Int]{
  override def show(in: Int): String = in.toString}
implicit val showLong = new Show[Long]{
  override def show(in: Long): String = in.toString}
implicit val showString = new Show[String]{
  override def show(in: String): String = in}
implicit val showDouble = new Show[Double]{
  override def show(in: Double): String = f"$in%.2f"}
implicit val showFloat = new Show[Float]{
  override def show(in: Float): String = f"$in%.2f"}
implicit val showArrayByte = new Show[Array[Byte]]{
  override def show(in: Array[Byte]): String = new String(in)}

Our writer and reader code becomes:

//Write the thing with our writer
val result = implicitly[Result1]
//Read the thing back with our reader
println(stringify(result))

For anyone to create an application which reads and writes data with our library they need only define the specific business logic and types for their application. If they miss writing an instance, the compiler will tell them. If they rearrange the order of the types in IO, the compiler will figure it out for them. Many common pain points of writing data readers and writers have been taken care of by the library implementor giving the rest of the team the ability to focus on business logic.

The goal here is not to have an entire company of developers who write this kind of code. Just like the goal of any business is to have employees who each bring something new to the team, the goal here is to have a few developers who write these libraries that other developers can use to develop business applications. The benefit of this is the libraries help limit the kinds of errors that can make it to production; the entire production cycle gets a confidence boost.

Introductory TyDD in Scala: Writing Type Level Functions

Type level functions consume types and produce types. This allows us to tell the compiler exactly what we want in our code that errors may be caught at compile time rather than at later stages in our product deployment cycle. The true power of this approach lies in the fact that with Scala no application can be produced without compilation. No matter how undisciplined a developer may be, she cannot skip compilation. Moving error checking into compilation gives us more confidence in the performance of our binary than confidence gained by any other means. In short, when there is a failure in production we always ask "were the tests run?" but never do we ask "was it compiled?"

A Basic Zipper

Here we have a zipper.

def zipper[A, B, C](
  a: List[A], b: List[B], c: List[C]
): List[(A, (B, C))] = a.zip(b.zip(c))

From previous posts in this series we know this can be generalized with a type class. Let's exercise this muscle immediately.

trait Zip[F[_]]{
  def apply[A, B](a: F[A], b: F[B]): F[(A, B)]
}
def zipper[F[_]: Zip, A, B, C](
  a: F[A], b: F[B], c: F[C]): F[(A, (B, C))] = {
  val F = implicitly[Zip[F]]
  F(a, F(b, c))
}

This is nicer than the first version but it is still super restrictive. It only works for zipping exactly 3 values. If we want to zip 2 or 4 or 70 values, we are out of luck! We saw how the shapeless HList allowed us to compose an arbitrary number of arbitrary types into a single type. Let's try to use the same kinds of methods here to produce a zipper of arbitrary arity.

Step 1 - Simplify as much as is possible

We will simplify a zipper to the purest form we can. The purest zipper takes two instances of a particular type constructor and produces a single instance of that type constructor on a pair. Let's write that.

def zipper[F[_]: Zip, H, T](
  h: F[H], t: F[T]): F[(H, T)] = {
  val F = implicitly[Zip[F]]
  F(h, t)
}

Now, we don't need separate functions for different arity versions of a zipper. We can simply call this function recursively to produce the desired result.

val (list1, list2, list3, list4, list5, list6) = ...
implicit val ZipList = new Zip[List]{
  override def apply[A, B](
    a: List[A], b: List[B]): List[(A, B)] = a.zip(b)
}
val with2 = zipper(list1, list2)
val with3 = zipper(list1,
            zipper(list2, list3))
val with6 = zipper(list1,
            zipper(list2,
            zipper(list3,
            zipper(list4,
            zipper(list5, list6)))))

If we recall the shapeless HList code learned to read, we see the same pattern here of a recursive type being produced from recursive calls. This can be done in the compiler using implicits.

Step 2 - Replace explicit recursion with implicit

implicit def zipper[F[_]: Zip, H, T](implicit
  h: F[H], t: F[T]): F[(H, T)] = {
  val F = implicitly[Zip[F]]
  F(h, t)
}

This is the same code just the keyword implicit is placed in two locations. In scala we can promote an explicit call to an implicit call simply by adding a keyword. We inform the inputs with implicit so the compiler knows to find the head and tail by itself. We inform the function with implicit so the compiler knows to call the function implicitly if a value of the necessary type is not found.

We communicate our intent to the compiler with implicits and types. Type aliases help simplify business logic.

type F[A] = List[A]
type Result =
  F[
    (Int, (Long, (String, (Double, (Float, Array[Byte])
  ))))]

To tell the compiler which values it can use during the application of functions we inform the values as implicit

implicit val list1: List[Int] = ???
implicit val list2: List[Long] = ???
implicit val list3: List[String] = ???
implicit val list4: List[Double] = ???
implicit val list5: List[Float] = ???
implicit val list6: List[Array[Byte]] = ???

The implicitly function tells the compiler what it needs to execute.

implicitly[Result]

And that's it! The compiler assembles the recursive calls for us. No more errors from placing things in the wrong order; refactoring is as simple as rearranging the order of the types in our alias Result. One change on one line propagates throughout the entire code base automatically.

Why stop at Tuple2?

There is nothing here that requires that a tuple be used. In fact, the only thing we need is a type constructor of arity 2. We can express as

trait ZipG[F[_], G[_, _]]{
  def apply[A, B](a: F[A], b: F[B]): F[G[A, B]]
}
implicit def zipper[F[_], G[_, _], H, T](implicit
  F: ZipG[F, G], h: F[H], t: F[T]): F[G[H, T]] = {
  F(h, t)
}

And we can zip Tuple2s or Eithers by creating type class instances

implicit val zipListTuple2 = new ZipG[List, Tuple2]{
  override def apply[A, B](
    a: List[A], b: List[B]): List[(A, B)] = a.zip(b)
}
implicit val zipListEither = new ZipG[List, Either]{
  override def apply[A, B](
    a: List[A], b: List[B]): List[Either[A, B]] =
    for{a <- a; b <- b}yield{
      if(a.toString.size < b.toString.size) Left(a)
      else Right(b)
    }
}

For Tuple2 we have business logic like

type F[A] = List[A]
type Result =
  F[
    (Int, (Long, (String, (Double, (Float, Array[Byte])
  ))))]

implicit val list1: List[Int] = ???
implicit val list2: List[Long] = ???
implicit val list3: List[String] = ???
implicit val list4: List[Double] = ???
implicit val list5: List[Float] = ???
implicit val list6: List[Array[Byte]] = ???

implicitly[Result]

This is the same as before. Commonly, further abstractions at the type level have little or no effect on code at the value level. Abstractions should allow the code to be more expressive than it was prior to the abstraction exercise never less expressive.

Changing our business logic to use Either as our Zipping class is simple

type F[A] = List[A]
type Result =
  F[
    Either[Int, Either[Long, Either[String,
    Either[Double, Either[Float, Array[Byte]]
  ]]]]]

implicit val list1: List[Int] = ???
implicit val list2: List[Long] = ???
implicit val list3: List[String] = ???
implicit val list4: List[Double] = ???
implicit val list5: List[Float] = ???
implicit val list6: List[Array[Byte]] = ???

implicitly[Result]

This is very powerful. By changing our type aliases, we were able to entirely change the meaning of our business logic without complex refactorings. As long as there are the correct type classes in implicit scope, the business logic need not be bothered by those implementation details.

What we have here is a sort of data pipeline and writer. Now that we have formatted data that we can work with in code, how do we present that data to an operator? Next, we'll write a reader for our types.

 

Introductory TyDD in Scala: Reading Type Level Functions

This is shapeless' HList

sealed trait HList extends Product with Serializable

final case class ::[+H, +T <: HList](head : H, tail : T) extends HList {
  override def toString = head match {
    case _: ::[_, _] => "("+head+") :: "+tail.toString
    case _ => head+" :: "+tail.toString
  }
}

sealed trait HNil extends HList {
  def ::[H](h : H) = shapeless.::(h, this)
  override def toString = "HNil"
}

case object HNil extends HNil

The important thing to note is it is very similar to a standard scala List just at the type level. It has a head and tail as type parameters and the tail is itself an HList. The implementation details are not important for our purposes here. All we need to take from this is an HList is a recursive list (head, tail) which is either empty (HNil) or whose elements can be of different types.

A Type Level Operation on HList

This is a function defined in shapeless which helps client code work with HLists

trait Mapped[L <: HList, F[_]] extends Serializable {
  type Out <: HList
}
object Mapped {
  …
  type Aux[L <: HList, F[_], Out0 <: HList] =
    Mapped[L, F] { type Out = Out0 }
 implicit def hnilMapped[F[_]]: Aux[HNil, F, HNil] =
    new Mapped[HNil, F] { type Out = HNil }
  …
  implicit def hlistMapped1[
  H, T <: HList, F[_], OutM <: HList](implicit
  mt: Mapped.Aux[T, F, OutM]): Aux[H :: T, F, F[H] :: OutM] =
    new Mapped[H :: T, F] { type Out = F[H] :: OutM }
}

We can see all the parts we mentioned in the first post of this series. Recall:

  1. Keywords
  2. Input types
  3. Output types
  4. Body

Note there are multiple implicit def declarations. The modifiers implicit def are similar to case statements in value function bodies. Let's take this piece by piece.

The First Case

implicit def hnilMapped[F[_]]: Aux[HNil, F, HNil] =
new Mapped[HNil, F] { type Out = HNil }

When focusing on the types, the intent becomes clear

implicit def hnilMapped[F[_]]: Aux[HNil, F, HNil] =
  new Mapped[HNil, F] { type Out = HNil }

This reads something like: Given a type constructor, yield the empty HList. Simple enough! This stuff is comprehensible afterall!

The Second Case

implicit def hlistMapped1[
H, T <: HList, F[_], OutM <: HList](implicit
mt: Mapped.Aux[T, F, OutM]): Aux[H :: T, F, F[H] :: OutM] =
  new Mapped[H :: T, F] { type Out = F[H] :: OutM }

This is admittedly more complex. Let's take it in stride again focusing on the types.

implicit def hlistMapped1[
H, T <: HList, F[_], OutM <: HList](implicit
mt: Mapped.Aux[T, F, OutM]): Aux[H :: T, F, F[H] :: OutM] =
  new Mapped[H :: T, F] { type Out = F[H] :: OutM }
  • Given a head, a tail which is an HList, a type constructor and another HList
  • Also, given proof that this type class holds good for our tail
  • We can generate an HList where our type constructor is applied to our head and the tail follows

A brief Word on Induction

Many of the type level functions encountered in the wild are of this type. Where there is at the very least

  1. a trivial case which produces some terminal element
  2. a more complex case where given a type or arity n can produce a type of arity n+1

In other words, recursive types can be generated inductively assuming you have an instance of the recursive type and all the necessary parts to wrap that instance within the target type.

In the next post we'll create some type level code using these methods.

Introductory TyDD in Scala: Basic Type Class Development

(A more complete treatment of type classes and higer kinds.)

A simple Type Class

Take the following

trait Mapping[A, B]{
  def map(a: A): B
}

and an instance for it

val mapping: Mapping[List[Int], List[String]] =
  new Mapping[List[Int], List[String]]{
    override def map(a: List[Int]): List[String] =
      a.map(_.toString)
  }

This instance is super restrictive. It only works for taking Int into String. We want to map a List of any type. Since we know what our type parameters are, we can achieve our goal by passing in a function

trait ListMapping[A, B]{
  def map(list: List[A])(f: A => B): List[B]
}

So, given a List[A] and a function, A => B, we can get a List[B]. And by taking the type parameters from the trait definition and placing them onto the function definition, we can squeeze out a bit more freedom.

trait ListMapping{
  def map[A, B](list: List[A])(f: A => B): List[B]
}
val mapping: ListMapping =
  new ListMapping{
    override def map[A, B](a: List[A])(f: A => B): List[B] =
      a.map(f)
  }

Now, why would anyone ever do this? The List type provides a map function which does exactly this. With this approach one may provide any number of methods for mapping a list. Take this alternative:

val reverseMapping: ListMapping =
  new ListMapping{
    override def map[A, B](a: List[A])(f: A => B): List[B] =
      a.reverse.map(f)
  }

Through type classes, we can define new functionality for old data structures on the fly. Similar code can be written for sort order, string formatting or just about anything else.

Making things even more general

While, the ability to map Lists of any type in any number of ways is fairly abstract it is not abstract enough for our purposes. What if we want to map a different data structure such as an Option or a Stream or a spark Dataset?

Luckily, Scala has a language feature which can help us out here.

trait WithMap[F[_]]{
  def map[A, B](m: F[A])(f: A => B): F[B]
}

The type parameter, F[_], has a type parameter of _, this tells the compiler that our type parameter itself requires a type parameter. Notice in our definition all mention of List has been replaced by our parameter, F. This just says that given a type, F, which itself takes a type parameter, we can change the inner type or F without changing F. We can do this with any parameterized type of arity 1.

implicit val listWithMap = new WithMap[List]{
  override def map[A, B](m: List[A])(f: A => B): List[B] =
    m.map(f)
}
implicit val optionWithMap = new WithMap[Option]{
  override def map[A, B](m: Option[A])(f: A => B): Option[B] =
    m.map(f)
}
implicit val streamWithMap = new WithMap[Stream]{
  override def map[A, B](m: Stream[A])(f: A => B): Stream[B] =
    m.map(f)
}
val reverseListWithMap = new WithMap[List]{
  override def map[A, B](m: List[A])(f: A => B): List[B] =
    m.reverse.map(f)
}


With these techniques we can define super polymorphic functions. Take this pretty strinfigy function

def prettyString[F[_]: WithMap, A](m: F[A])(f: A => String): String = {
  implicitly[WithMap[F]].map(m)(f).toString
}

This takes two type parameters, F[_]: WithMap and A. The `:` character in the first type parameter tells the compiler that it needs an implicit instance of WithMap defined for our type F.

And here is a data processor defined in the same way

def processData[F[_]: WithMap, A, B, C, D](
  m1: F[A])(f1: A => B)(f2: B => C)(f3: C => D): F[D] = {
  val F = implicitly[WithMap[F]] 
  val m2 = F.map(m1)(f1)
  val m3 = F.map(m2)(f2)
  F.map(m3)(f3)
}

We have taken an implementation detail (the map function on List, Option, etc...) and brought it outside the type. This has given us the ability to talk about data which has a sensible map function without knowing what that data necessarily looks like.

Next we'll learn how to read some of the Type Level functions that exist in the shapeless library.

Introductory TyDD in Scala: Anatomy of a Type Level Function

Value Level Functions

A value level function typically looks like

def f(a: Int, b: Int): String = {
  a.toString + b.toString
}

The key parts are

  1. A keyword: def
  2. Inputs: a: Int, b: Int
  3. Outputs: String
  4. Body: a.toString + b.toString

When you see these 4 parts, you know you are reading a value level function. Nothing surprising here. Now, let's see what a similar definition looks like at the type level.

Type Level Functions

A type level function looks like

trait MyTrait[A, B]{type Out}
object MyTrait{
  def apply[A, B, C](): MyTrait[A, B]{type Out = C} =
    new MyTrait[A, B]{override type Out = C}
}

In this definition there is a type refinement, MyTrait[A, B]{type Out = C}. These are undesirable artifacts of type level development. To simplify these definitions we use the Aux alias (a document about on this). Aux helps us remove type refinements from our logic.

A type level function looks like

trait MyTrait[A, B]{type Out}
object MyTrait{
  type Aux[A, B, C] = MyTrait[A, B]{type Out = C}
  def apply[A, B, C](): Aux[A, B, C] =
    new MyTrait[A, B]{override type Out = C}
}

The type refinement from the previous example is replaced by the nicer (more readable, fewer braces, less code) Aux type.

Type level functions have the same 4 key parts as value level functions

  1. Keywords: trait object
  2. Inputs: A, B
  3. Outputs: type Out
  4. Body: def apply[A, B, C](): Aux[A, B, C]

Here the inputs are type parameters and outputs are type members. This is so the output types are not erased and can be referenced later in business logic. This is similar to value level functions as the result of a value level function does not expose the inputs required by the function.

Bodies of type level functions are value level functions. They are typically only a few lines long. Their purpose is to present the compiler with a way to construct a new type from the types provided. This is what these blog posts will focus on.

Whenever you see a set of definitions which have these 4 qualities, you know you are looking at a type level function.

The type class is the fundamental element of this style of type driven development. The next post will give an overview of this concept.

Introductory TyDD in Scala

Here are the slides (pptx, pdf) and code from the PHASE talk on Type Driven Development in Scala. Long form blog post version is forthcoming.

  1. Anatomy of a Type Level Function
  2. Type Classes & Higher Kinds
  3. Reading Type Level Functions
  4. Writing Type Level Functions
  5. Reading Data that is this strongly typed

Abstraction in F[_]: Lift Implementation Details Outside the Class

We now know that we can lift implementation details outside the class to gain some freedom in the client code; we have gained abstraction through decoupling. Why not apply a similar abstraction on the rest of the code.

We had:

trait Pipeline[F[_], A, B]{
  final def apply(uri: URI)(implicit F: Functor[F]): F[Unit] = {
    val in = read(uri)
    val computed = F.map(in)(computation)
    F.map(computed)(write)
  }
  def read(uri: URI): F[A]
  def computation(in: A): B
  def write(in: B): Unit
}

And we can abstract it into:

trait Read[F[_], A] extends Function1[URI, F[A]]
trait Computation[A, B] extends Function1[A, B]
trait Write[B] extends Function1[B, Unit]

trait Pipeline[F[_], A, B]{
  final def apply(uri: URI)(implicit
    F: Functor[F],
    read: Read[F, A],
    computation: Computation[A, B],
    write: Write[B]): F[Unit] = {
    val in = read(uri)
    val computed = F.map(in)(computation)
    F.map(computed)(write)
  }
}

While we're at it, let's abstract out the apply method as well.

sealed trait Pipeline[F[_], A, B]{
  def apply(uri: URI): F[Unit]
}
object Pipeline{
  final def apply[F[_]: Functor, A, B](implicit
    read: Read[F, A],
    computation: Computation[A, B],
    write: Write[B]) = {
    val F: Functor[F] = implicitly
    new Pipeline[F, A, B]{
      override def apply(uri: URI): F[Unit] = {
        val in = read(uri)
        val computed = F.map(in)(computation)
        F.map(computed)(write)
      }
    }
  }
}

Now our trait is a simple expression of inputs to outputs. But why would we ever perform this kind of abstraction? It seems like we are complicating the problem, not making it simpler.

 

The benefits here are more subtle and only appear in certain use cases. Say you have two pipelines:

val p1: Pipeline[Stream, Int, String] = ???
val p2: Pipeline[Stream, String, Array[Char]] = ???

And you wanted to combine them into a single

val pipeline: Pipeline[Stream, Int, Array[Char]] = ???

Taking two Pipeline instances and composing them is not a readily understood idea. However, taking two Function1 instances and composing them is very well understood. Notice we brought the functions read, compute and write outside the class as simple Function1 instances. Abstracting the Pipeline functions outside the trait provides the developer who is writing the client code with a clear and well understood method for composing multiple Pipelines.

This is still an incomplete implementation. We can see a path forward for composing any number of Pipelines whose computations can be composed but, how do we compose Pipelines who accept different inputs?

A simple switching mechanism

Say we have three Pipeline instances which require separate inputs.

val p1: Pipeline[Stream, ...] = ...
val p2: Pipeline[Stream, ...] = ...
val p3: Pipeline[Stream, ...] = ...

Our application would need to accept a URI and choose which pipeline (if any) should run it.

def perform(uri: URI): Stream[Unit] = {
if(uri.toString.contains("p1")) p1(uri)
else if(uri.toString.contains("p2")) p2(uri)
else if(uri.toString.contains("p3")) p3(uri)
else Stream()
}

This is a lot of boilerplate. Especially when you consider the number of Pipelines (for any successful business) is expected to increase. Let's unpack what we have and see if we can't abstract it into our Pipeline definition.

  1. Uniform Input URI is the input to ALL Pipeline instances
  2. Guards checking a URI against some
  3. Constant value defining a Pipeline for use in a Guard
  4. Default case in case the input matches no Pipeline

Our uniform input means we don't have to worry about which Pipeline can take what Types of values. This is already abstract enough.

We'll build a typeclass to model Guards and Constants associated with each pipeline.

trait Guard[-T]{
def name: String
}
sealed trait Pipeline[-T, A, B]{
def apply(uri: URI): F[Unit]
}
object Pipeline{
final def apply[T: Guard, F[_]: Functor, A, B](implicit
read: Read[F, A],
computation: Computation[A, B],
write: Write[B]): Default[T, F, A, B] = {
val G: Guard[T] = implicitly
val F: Functor[F] = implicitly
new Pipeline[T, A, B]{
override def apply(uri: URI): F[Unit] = ???
  }
}
}

We have an issue here. The last else case of our function returns an empty Stream. In the Pipeline object we don't know what our effect type is. We cannot return an empty version thereof. This problem takes me back to a talk given by Runar Bjarnason last year wherein he describes how when we liberate our types, we constrain our implementation and when we constrain our types we liberate our implementation. We have liberated all of our types here (except URI) leaving ourselves no room to implement what we need. So, we need to constrain a type that we may regain our ability to implement our function. Let's constrain our output type.

 
trait Guard[-T]{
  def name: String
}
sealed trait Pipeline[-T, A, B]{
type Out
  def apply(uri: URI): Out
}
object Pipeline{
  final def apply[T: Guard, F[_]: Functor, A, B](implicit
    read: Read[F, A],
    computation: Computation[A, B],
    write: Write[B]): Default[T, F, A, B] = {
    val G: Guard[T] = implicitly
    val F: Functor[F] = implicitly
    new Pipeline[T, A, B]{
  type Out = Either[Unit, F[Unit]]
      override def apply(uri: URI): Out = {
        val from = uri.toString
        if(from.contains(G.name)) Right{
          val in = read(uri)
          val computed = F.map(in)(computation)
          F.map(computed)(write)
        } else Left(())
      }
    }
  }
}

So our client code becomes

trait P1
trait P2
trait P3
implicit def guardP1 = new Guard[P1]{
  final override def name: String = "p1"
}
implicit def guardP2 = new Guard[P2]{
  final override def name: String = "p2"
}
implicit def guardP3 = new Guard[P3]{
  final override def name: String = "p3"
}
implicit def p1: Pipeline[P1, Stream, ...]
implicit def p2: Pipeline[P2, Stream, ...]
implicit def p3: Pipeline[P3, Stream, ...]
def perform(uri: URI): Either[Either[Either[Unit, Stream[Unit]], Stream[Unit]], Stream[Unit]] = {
  p1(uri).fold(
    _ => Left(p2(uri).fold(
      _ => Left(p3(uri).fold(
        _ => Left(()),
        a => Right(a)
      )),
      a => Right(a)
    )),
    a => Right(a)
  )
}

This has made things much worse. There is even more boiler plate and the nesting will become unreasonable in short order. But we have something we can easily reason about at the type level:

  • Given a known set of Pipeline instances
  • Created a computation which is at most 1 Pipeline
  • Resulting in a nested Data Structure

These characteristics indicate we can take an inductive approach to building our Pipeline library. Enter Shapeless.

 

Abstraction in F[_]: Abstract Your Functions

We have a reasonably abstract pipeline in

trait Pipeline[F[_], A, B]{
  final def apply(uri: URI): F[Unit] =
    write(computation(read(uri)))
  def read(uri: URI): F[A]
  def computation(in: F[A]): F[B]
  def write(in: F[B]): F[Unit]
}

Recognizing Higher-Kinded Duplication

Taking a close look at the trait, we see the computation and write functions are the same aside from their type variables. In fact, if we rename them to have the same name, the compiler complains

scala> :paste
// Entering paste mode (ctrl-D to finish)
trait Pipeline[F[_], A, B]{
  def perform(in: F[A]): F[B]
  def perform(in: F[B]): F[Unit]
}
// Exiting paste mode, now interpreting.
<console>:9: error: double definition:
method perform:(in: F[B])F[Unit] and
method perform:(in: F[A])F[B] at line 8
have same type after erasure: (in: Object)Object
         def perform(in: F[B]): F[Unit]

Since these are the same, we can build an abstraction to simplify our API even further.

trait Pipeline[F[_], A, B]{
  final def apply(uri: URI): F[Unit] = {
    val in = read(uri)
    val computed = convert(in)(computation)
    convert(computed)(write)
  }
  def read(uri: URI): F[A]
  def computation(in: A): B
  def write(in: B): Unit
  def convert[U, V](in: F[U], f: U => V): F[V]
}

We've removed the need for the developer to understand the effect type in order to reason about a computation or write step. Now, let's focus on this new function

def convert[U, V](in: F[U], f: U => V): F[V]

This is super abstract. Like so abstract it is meaningless without context. I am reminded of this video in which Rob Norris explains how he continued to abstract his database code until some mathematical principles sort of arose from the work. In this talk, he points out that anytime he writes something sufficiently abstract he checks a library for it, as he probably has not himself discovered some new basic principle of mathematics. We do the same here.

Inside the cats library we find the following def within the Functor class

def map[A, B](fa: F[A])(f: A => B): F[B]

This is the same as if we wrote our convert function as curried rather than multiple argument. We replace our custom function with one from a library; the chance is greater that a developer is well-informed on cats than our internal library. (post on implicits and type classes)

trait Pipeline[F[_], A, B]{
  final def apply(uri: URI)(implicit F: Functor[F]): F[Unit] = {
    val in = read(uri)
    val computed = F.map(in)(computation)
    F.map(computed)(write)
  }
  def read(uri: URI): F[A]
  def computation(in: A): B
  def write(in: B): Unit
}

Here we were able to replace an internal (thus constrained) implementation detail with an external (thus liberated) argument. In other words, we have lifted an implementation detail outside our class giving the client code freedom to use the same instantiated Pipeline in a multitude of contexts.

Abstraction in F[_]: Abstract your Types

A Pipeline of Stream from BigInt to String

Say we have a  data pipeline:

trait Pipeline{
  final def apply(uri: URI): Stream[Unit] =
    write(computation(read(uri)))
  def read(uri: URI): Stream[BigInt]
  def computation(in: Stream[BigInt]): Stream[String]
  def write(in: Stream[String]): Stream[Unit]
}

The limitations here are many. The most important limitation is this only works for data pipelines your team can model as streams of an input to a BigInt to a String to an output. This is not very useful. The first step is abstracting over your computation types.

A Pipeline of Stream

Removing the constraint on BigInt and String requires type parameters on our Pipeline trait:

trait Pipeline[A, B]{
  final def apply(uri: URI): Stream[Unit] =
    write(computation(read(uri)))
  def read(uri: URI): Stream[A]
  def computation(in: Stream[A]): Stream[B]
  def write(in: Stream[B]): Stream[Unit]
}

We have gained a bit of freedom in implementation. We can now write Pipelines that can be modeled as streams of an input to a A to a B to an output given any A and B. Now instead of being constrained to BigInt and String, we have gained some liberty through our abstraction.

Still, we are constrained to the scala Stream type. This, too, is a nuisance what if we require Pipelines that effect through fs2 Stream or spark Dataset or any other suitable effect? Similar to how we abstracted away from BigInt and String by making them type parameters A and B, we can do the same with our Stream.

A Pipeline

Using a higher-kinded type parameter, we can abstract over any effect assuming the effect has a single type parameter.

trait Pipeline[F[_], A, B]{
  final def apply(uri: URI): F[Unit] =
    write(computation(read(uri)))
  def read(uri: URI): F[A]
  def computation(in: F[A]): F[B]
  def write(in: F[B]): F[Unit]
}

Now, we can make a data Pipeline using any such types! We can have our original Pipeline of Stream from BigInt to String

val pipeline: Pipeline[Stream, BigInt, String] = ???

We can have a Pipeline of fs2 Stream with some type construction:

type MyStream[A] = fs2.Stream[fs2.Task, A]
val pipeline: Pipeline[MyStream, BigInt, String] = ???

We can even do this with spark

val pipeline: Pipeline[Dataset, BigInt, String] = ???

Any Pipeline your team can model as a read effect a computation and a write effect can be defined with Pipeline defined this way.

Abstraction in F[_]

I gave a talk at typelevel today about abstraction data pipelines and some ways to ease the use of Spark in purely functional application. It seemed to have gone pretty well, here you will find the video, deck (pptx, pdf) and code

In the coming weeks I'll be posting the long-form version of the talk.

  1. Abstract Your Types
  2. Abstract Your Functions
  3. Lift Implementation Details Outside the Class
  4. Implicits and Induction

Asynchronicity

(video, slides, code)

I gave a talk at PHASE (PHiladelphia Area Scala Enthusiasts) about Asynchronous programming. Specifically, we covered how to transform a basic linear and sequential computation (factorial) into one which could be processed in parallel. Moreover, we added an extra layer of complexity onto the computation (n choose k) to show how Actors and Futures in Scala can work together to help a developer build an asynchronous system.

5. Typeclasses over Subclasses (Introducing Typelevel Scala into an OO Environment)

(Examples can be found on GitHub)

In the previous post, we introduced the Argonaut library to convert between values and JSON strings. The important part of this conversion is there was no superclass or interface to implement in order to get the benefit of JSON across classes. All we needed to do was define values of type CodecJson for each of the types we wanted to convert. We added the functionality to the class without changing the class itself. 

Argonaut allowed us to call toJson on classes with a codec and decodeOption on Strings to produce values of classes with a codec defined. This type of polymorphism, where a function's implementation depends on its inputs is called ad-hoc polymorphism. Furthermore, when we define a type, T, which defines functionality across classes to be used in ad-hoc polymorphic functions we call T a Type Class. Type Class polymorphism is a specific flavor of ad-hoc polymorphism.

Type Class polymorphism is a powerful tool for expressing context based functionality far more powerful than subclass polymorphism. As a well-known example take the Java interfaces Comparable and Comparator. If some data is defined in a class which implements Comparable, it can be sorted one way and needs an entire second class definition to be sorted with a different method. On the other hand, using Comparator the data is defined with a single class and each sort method gets its own Comparator. Comparator is a Type Class and allows the developer to determine in which contexts which sorting method should be used.

Subclass Method

Take the following traits:

trait Adder[Type]{
  def add(other: Type): Type
}
trait Chainer[Arg, Type[Arg]]{
  def chain[Res](f: Arg => Type[Res]): Type[Res]
}

Adder describes how to add two values of some Type together. Chainer describes how to chain operations over a parameterized type.

We'll use the idea of a team to illustrate. For the sake of simplicity we say a team consists of people of a certain profession. So we can have a team of engineers or a team of doctors or a team of cashiers or ...

Teams can (trivially) grow by hiring but, they can also grow by combining with other teams. Teams can be added.

Teams can have members who are themselves team leads. At times, the members of a lead's team must join the team the lead belongs to. This implies an operation which develops teams out of the members of teams. Teams can be chained.

Here is our implementation of Team given this functionality:

case class Team[Type](members: List[Type])
  extends Adder[Team[Type]]
  with Chainer[Type, Team]{
  override def add(other: Team[Type]): Team[Type] = {
    Team(members ++ other.members)
  }
  override def chain[Res](
      f: Type => Team[Res]): Team[Res] = {
    val list = members.flatMap(member => f(member).members)
    Team(list)
  }
}

Simple enough but this doesn't account for an organization of structured teams. For an organization who develops teams that each have one product lead and one technical lead, simple concatenation won't maintain a soft ranking of individuals within the new team. We need a new Team definition which accounts for this.

case class TeamStructured[Type](members: List[Type])
  extends Adder[TeamStructured[Type]]
  with Chainer[Type, TeamStructured]{
  override def add(
      other: TeamStructured[Type]): TeamStructured[Type] = {
    val (lead1, indi1) = members.splitAt(2)
    val (lead2, indi2) = other.members.splitAt(2)
    TeamStructured(lead1 ++ lead2 ++ indi1 ++ indi2)
  }
  override def chain[Res](
      f: Type => TeamStructured[Res]): TeamStructured[Res] = {
    val (leaders, individuals) = members.map{member =>
      val mems = f(member).members
      mems.splitAt(2)
    }.unzip
    TeamStructured(
        leaders.flatMap {x=>x} ++
        individuals.flatMap{x=>x})
  }
}

Now we have two definitions for the same data that differ only by functionality. We have a triple coupling here:

  1. Data Definition
  2. Addition Description
  3. Chaining Description

If the data needs to change (from List to Set is a good place to start) the change needs to be made in two places. Each function which accepts a Team for the purpose of team composition and combination needs to know which style of team it needs at development time. These problems gets worse for each possibility for combining and chaining teams (maybe a round robin or reverse algorithm would fit in certain situations). Type Classes solve these issues.

Type Class Method

Our traits become:

//The underscore here implies we need a parameterized type.
trait Adder[Type[_]]{
  def add[Item](
      left: Type[Item], right: Type[Item]): Type[Item]
}
trait Chainer[Type[_]]{
  def chain[Item, Res](
      arg: Type[Item], f: Item => Type[Res]): Type[Res]
}

These have the same uses as their counterparts above. However we have a single definition of the Team type:

case class Team[Type](members: List[Type])

The data is defined in a single place. Each piece of software which requires a Team has a consistent idea about what a Team is and means. The two versions of functionality are defined by:

object unstructured{
  implicit def adder: Adder[Team] = new Adder[Team]{
    override def add[Item](
        left: Team[Item], right: Team[Item]): Team[Item] = {
      Team(left.members ++ right.members)
    }
  }

  implicit def chainer: Chainer[Team] = new Chainer[Team]{
    override def chain[Item, Res](
        arg: Team[Item], f: Item => Team[Res]): Team[Res] = {
      val list = arg.members.flatMap(
          member => f(member).members)
      Team(list)
    }
  }
}

object structured{
  implicit def adder: Adder[Team] = new Adder[Team]{
    override def add[Item](
        left: Team[Item], right: Team[Item]): Team[Item] = {
      val (lead1, indi1) = left.members.splitAt(2)
      val (lead2, indi2) = right.members.splitAt(2)
      Team(lead1 ++ lead2 ++ indi1 ++ indi2)
    }
  }

  implicit def chainer: Chainer[Team] = new Chainer[Team]{
    override def chain[Item, Res](
        arg: Team[Item], f: Item => Team[Res]): Team[Res] = {
      val (leaders, individuals) = arg.members.map{member =>
        val mems = f(member).members
        mems.splitAt(2)
      }.unzip
      Team(
          leaders.flatMap {x=>x} ++
          individuals.flatMap{x=>x})
    }
  }
}

Now, each function which accepts a team, if needed, will also accept an adder or chainer or both (wholly decoupled). The down side here is each call to such a function requires at least one extra argument from the subclass versions. Scala has a fix for this limitation.

Implicits

The implicit keyword before a definition is an important part of making Type Class polymorphism beneficial to the developer. The word implicit, according to Oxford Dictionaries, means Implied though not plainly expressed. In Scala it means we can prepend the implicit keyword to an argument list and not explicitly produce the value in code assuming a valid value is in scope. For example:

def chainTeams[Type, Result](
  team: Team[Type])(
  func: Type => Team[Result])(
  implicit chain: Chainer[Team]): Team[Result] = {
  chain.chain(team, func)
}

This has three arguments, the team to operate on, the operation to perform, and the chainer for application. However, since the final argument is implicit, if we bring a valid implicit value into scope, there is no need to pass it in directly.

import structured._
val team: Team[Person] = Team(List(???))
val func: Person => Team[Person] = {(p: Person) => ???}
val newTeam: Team[Person] = chainTeams(team)(func)//valid

Since, we don't need to explicitly state the Chainer it keeps boilerplate clean. A nice effect of implicit resolution is if you have scoped two separate valid values for the implicit argument, the compiler will complain. The suggestion if you have multiple valid implicits in scope is to decouple your code functionally. No single scope should have use of more than one implicit of the same type; this is a code smell. A corollary to this is one should not explicitly provide implicit arguments; let the compiler do its work.

In the final post of this series, we will introduce another library, Cats.

Monocle & Argonaut (Introducing Typelevel Scala into an OO Environment)

(Examples can be found on GitHub)

In the last post, we put to rest our use of mutable objects for good. Here we learn how to make use of our new found Functional Programming powers in the real world.

Every application of sufficient user base requires a persistent settings store. The more users one has, the more styles one is responsible for accommodating. In my experience, JSON has been the most useful format for small-scale persistence. JSON is widely understood, works on the web and is plaintext. We'll use Monocle and Argonaut to implement a persistent settings store.

The first question here is "Why not circe?". I found circe to be a bit too ethereal for most Java developers to wrap their heads around. Implicit scope (especially when its as magical as circe's auto) is an alarming feature for people who come from a language with no developer-defined implicit semantics (C++ developers are quite comfortable with this notion).

Monocle

The Monocle library provides semantics for defining simple accessors and combinators on nested data. Monocle is especially well suited for handling nested Case Classes which is what we'll focus on. Take the following data definition

case class Color(r: Byte, g: Byte, b: Byte)
case class FishTank(liters: Int, color: Color, fish: List[Fish])

We have nested Case Classes as well as a nested collection. To cover the changes that can occur here we would need to define:

  • eighteen operations
  • six of which are a composition from a Fish Tank into a Color
  • one of which is nested within a List structure.

Monocle makes this simple:

val (tankLiters, tankColor, tankFish) = {
  val gen = GenLens[FishTank]
  (gen(_.liters), gen(_.color), gen(_.fish))
}
val (colorR, colorG, colorB) = {
  val gen = GenLens[Color]
  (gen(_.r), gen(_.g), gen(_.b))
}
val (tankColorR, tankColorG, tankColorB) = (
  tankColor.composeLens(colorR),
  tankColor.composeLens(colorG),
  tankColor.composeLens(colorB)
)

Defining your data using Case Classes provides Monocle with the information it needs in order to generate lenses (nested views) into your data structures. Lens composition in Monocle is a single straightforward call. We can get into and out of our data with very little boilerplate.

Now that we can define settings and alter them, we need a way to persist them and communicate them to other parts of the system. We'll use JSON as our data format and Argonaut as our transcoder.

Argonaut

Argonaut provides semantics for converting between classes and JSON strings. Like Monocle, its easiest to use with Case Classes. Taking the same classes as above we would have:

//ignore the implicit keyword.
//I promise we'll get to it in the next post!
implicit def codecTank: CodecJson[FishTank] =
  casecodec3(
      FishTank.apply, FishTank.unapply
  )("liters", "color", "fish")
implicit def codecColor: CodecJson[Color] =
  casecodec3(
      Color.apply, Color.unapply
  )("r", "g", "b")
implicit def codecFish: CodecJson[Fish] =
  CodecJson(
    (f: Fish) =>
      ("name" := f.name) ->:
      ("color" := f.color) ->:
      jEmptyObject,
    (c: HCursor) => for{
      name <- (c --\ "name").as[String]
      color <- (c --\ "color").as[String]
    }yield{(name, color) match{
      case ("One Fish", "Red Fish") => OneFish
      case ("Two Fish", "Blue Fish") => TwoFish
      case _ => NotFish
    }}
  )

There are quite a few operators here. This could make things tricky for Java developers at first but, there are few of them so no big deal. For Case Classes, Argonaut has next to no boilerplate; one passes in the apply and unapply functions and names everything. Also, composition in Argonaut is implicit. It gets a little tricky with non Case Classes but, its still not much. At most Argonaut requires two functions; one from the class to JSON, the other from JSON to the class.

One more thing to note is that codecs for simple standard collections are implicit. The codec for List[Fish] is implicitly defined by the codec for Fish.

Putting it all together

A settings object would look something like:

object settings{
  private val settings: mutable.Map[String, FishTank] =
    mutable.Map()
  
  def apply(key: String): Option[FishTank] = settings.get(key)
  def update(key: String, byte: Byte): Unit = {
    settings(key) = settings.get(key) match{
      case Some(tank) =>
        tankColor.modify { _ => Color(byte, byte, byte) }(tank)
      case None =>
        FishTank(0, Color(byte, byte, byte), Nil)
    }
  }
  def update(key: String, size: Int): Unit = {
    settings(key) = settings.get(key) match{
      case Some(tank) =>
        tankLiters.modify(_ => size)(tank)
      case _ =>
        FishTank(size, Color(0,0,0), Nil)
    }
  }
  def update(key: String, fish:List[Fish]): Unit = {
    settings(key) = settings.get(key) match{
      case Some(tank) =>
        tankFish.modify(_ => fish)(tank)
      case _ =>
        FishTank(1, Color(0,0,0), fish)
    }
  }
  
  def persist(): Unit = {
    val jsonRaw = settings.toList.asJson
    val json = jsonRaw.nospaces
    putOnWire(json)
    writeToDisk(json)
  }
  def recall(): Unit = {
    val str = getFromDisk()
    val opt = str.decodeOption[List[(String, FishTank)]]
    opt.foreach{list =>
      settings ++= list.toMap
    }
  }
}

But, of course this is not threadsafe and it uses a mutable collection to perform its work. A different threadable implementation could look something like:

val actorSystem: ActorSystem = ???
implicit val timeout: akka.util.Timeout = ???
implicit val ec: ExecutionContext = ???
object asyncSettings{
  private sealed trait Message
  private case class Get(key: String)
      extends Message
  private case class SetGrey(key: String, hue: Byte)
      extends Message
      
  private class Perform extends Actor{
    override val receive: Receive = step(Map())
    def step(map: Map[String, FishTank]): Receive = {
        case Get(key) => sender ! map(key)
        case SetGrey(key, value) =>
          val newTank: FishTank = ???
          val newMap = map + (key -> newTank)
          context.become(step(newMap))
    }
    override def preStart(): Unit = ???//recall
    override def postStop(): Unit = ???//persist
  }
  
  val actor: ActorRef = actorSystem.actorOf{
    Props(new Perform())
  }
  def apply(key: String): Future[FishTank] =
    (actor ? Get(key)).collect{
      case Some(t: FishTank) => t
    }
  def update(key: String, hue: Byte) = 
    actor ! SetGrey(key, hue)
}

Here we use akka for asynchronous operations. One could also employ scalaz Task or simple Future composition or really any other asynchronous library.

Next we'll cover Type Classes to further decouple our data from functionality.

 

 

 

 

4. Objects are not Coroutines (Introducing Typelevel Scala into an OO Environment)

In the previous posts, we went over how to introduce immutability, combinators and case classes to move toward functional programming. These three points together are the basis for the point described in this point that Objects are not Coroutines.

If you are unfamiliar with coroutines, wikipedia has a basic description of them.

In Java, the usual application runs a little like this:

  1. Initialize an object
  2. Perform an operation
  3. Mutate the object
  4. Perform an Operation
  5. ...

This habit breaks all of the FP ideas we have developed so far.

When introducing Typelevel Scala, it is important to note we are not simply adding a library to an already existing system (the JVM). We are trying to change how people do their day to day work. Some of them have been writing OO Imperative software products for decades making a change of paradigm difficult. Keeping to simple language is key to our goal of shifting an organization's workflow.

The workflow shift we are suggesting here is:

  1. Define
  2. Apply Combinator
  3. ...

We will be moving from a paradigm based in mutability and state to one built on immutability and functions.

OO Imperative Style

We will be defining a school of fish and how that school of fish grows.

class BadSchool(){
  private var name: String = null
  private var depth: Depth = null
  private var location: Location = null
  private var fish: mutable.Buffer[Fish] = null
  def setName(newName: String): Unit = {
    name = newName
  }
  def getName(): String = name
  def setDepth(newDepth: Depth): Unit = {
    depth = newDepth
  }
  def getDepth(): Depth = depth
  def setLocation(newLocation: Location): Unit = {
    location = newLocation
  }
  def getLocation(): Location = location
  def setFish(newFish: mutable.Buffer[Fish]): Unit = {
    fish = newFish
  }
  def removeFish(aFish: Fish): Unit = {
    fish -= aFish
  }
  def addFish(aFish: Fish): Unit = {
    fish += aFish
  }
  def getFish(): mutable.Buffer[Fish] = fish
  override def toString(): String= {
    s"School(\n\t$name,\n\t$depth,\n\t$location,\n\t$fish)"
  }
}

In my experience, this is the type of code commonly written by people who have just made the jump from an OO language into Scala. Here we initialize the object's members to null and use mutable containers to maintain and augment state. A typical use case would look like:

def asCoroutine(): Unit = {
  val coroutine = new BadSchool()
  val (name, depth, location, fish) = someInit()
  coroutine.setName(name)
  coroutine.setDepth(depth)
  coroutine.setLocation(location)
  coroutine.setFish(fish)
  convertToJsonAndPutOnTheWire(coroutine)
  var newFish: Fish = null
  for(i <- (0 to 10)){
    newFish = nextFish(coroutine)
    coroutine.addFish(newFish)
    convertToJsonAndPutOnTheWire(coroutine)
  }
}

This application is difficult to follow and very cluttered. In order to create a new valid school of fish, one must initialize the object and call four set methods. When growing a school of fish, the developer needs to destroy the previous school of fish forcing any interaction with the object to be synchronous.

This is like a very messy coroutine. The addFish method is like a yield and there is no way to get back to the previously returned yield state.

Typelevel Style

The above impure code can be fairly simply converted to a more FP style. First we define our school of fish.

case class School(
    name: String,
    depth: Depth,
    location: Location,
    fish: immutable.Queue[Fish])

This is short and to the point. The intent of the code is clear and there are no messy methods defined for maintaining, getting and mutating state. The same use case would be implemented functionally like:

def aBetterWay(): Unit = {
  @annotation.tailrec
  def perform(qty: Int, acc: List[School]): List[School] = {
    if(qty > 0 && acc.nonEmpty){
      val head :: tail = acc
      val currentFish = head.fish.last
      val next = nextFish(currentFish)
      val result = head.copy(fish = head.fish.enqueue(next))
      perform(qty - 1, result :: acc)
    }else acc
  }
  val school = School(
      "Bikini Bottom",
      Deep,
      South,
      immutable.Queue(OneFish))
  val result = perform(10, List(school))
  result.foreach(convertToJsonAndPutOnTheWire)
}

There are two main ideas:

  1. In lieu of initializing state we define data. 
  2. Once data is defined, it cannot be redefined.

The function buildSchool builds a new school of fish from a provided school of fish. State is created, never destroyed, and there is no initialization step. Moreover, the webservice call can be made asynchronous without worry for synchronization or heap issues.

Next, we'll introduce our first libraries: Monocle and Argonaut.

 

3. Case Classes and Auto-Encapsulation (Introducing Typelevel Scala into an OO Environment)

(Examples can be found on GitHub)

We have decoupled data from its usage by replacing more complicated control flow ideas (like loops null and throw) with combinators by passing functions as parameters.  This allows us to think differently about how we write programs.

Instead of thinking about software as objects acting on themselves; we can start to think of it as two parts:

  1. Data
  2. Functions which define Data flow

To illustrate, we'll redefine our color function as data.

sealed trait Fish{
  val name: String
  val color: String
}
case object OneFish extends Fish{
  override final val name: String = "One Fish"
  override final val color: String = "Red Fish"
}
case object TwoFish extends Fish{
  override final val name: String = "Two Fish"
  override final val color: String = "Blue Fish"
}
case object NotFish extends Fish{
  override final val name: String = "Ahab"
  override final val color: String = "White Whale"
}

Here, we model our fish data in three points; two good states and a bad state. Instead of modeling a two point set of data with Strings, we model it as three discrete states using case classes.

Case Classes are automatically encapsulated and immutable by default and, as the states are encoded into the types, there is no need to use null or throw.

At this point, we have built up a nice base for functional programming. We can define robust data structures which model our problem space very closely and functional constructs which allow us to transform that data. This brings us to the salient feature of this process: Objects are not Coroutines.

2. Combinators over loops, null & throw (Introducing Typelevel Scala into an OO Environment)

(examples can be found on GitHub)

In the previous section we found ways to transform mutable thinking into the immutable. Here we will take this one step further and introduce combinators into the organization.

A combinator is a method on a data structure. This method takes a function as argument and maps values of the data structure into other values of the data structure. This fits well with our immutable approach. There are four main ideas here:

  1. Functions produce new state; they do not destroy old state.
  2. Methods on structures are functions.
  3. Data and usage should be separate.
  4. Bad state should be handled as early as possible.

OO Imperative Style

When developers first get into Scala, it is pretty typical to see this kind of code:

class BadFish(
    private var m_name: String,
    private var m_color: String
){
  def this() = this(null, null)
  def getName(): String = m_name
  def getColor(): String = m_name
  def setName(name: String){
    m_name = name
  }
  def setColor(color: String){
    m_color = color
  }
  def isValid(): Boolean = try{
    check()
    true
  }catch{
    case _: IllegalArgumentException => false
  }
  def check(): Unit = {
    check(m_name, m_color)
  }
  def check(newName: String, newColor: String){
    if(!color(newName).equals(newColor))
      throw new IllegalArgumentException(
          "Fish color and name do not match"
      )
  }
}

The use case for this data is very tightly coupled to the data itself. In order to create a new value based on this value one must make at least 5 calls; 2 get calls, a new and 2 set calls:

val current: BadFish = ...
val (name, color) = (current.getName(), current.getColor())
val next = new BadFish()
next.setName(name)
next.setColor(color)

This puts a burden on any user of this class. One must know the entire structure of the class and the precise method calls for access in order to use it.

The set methods destroy old state to provide new state and the state of the object is not validated until a check method is envoked. When any one reference to this object has a set method called on it, the object could be made invalid for all references without any indication. It is very difficult to follow the logical flow of data through an application full of these kinds of classes.

This class also initializes values to null and provides semantics for its own data being overwritten. A null value is not a value of any type; its essentially the type system lying to the developer.

Typelevel Style

The same functionality can be provided by the following class:

class Fish(val fishName: String){
  val fishColor: String = color(fishName)
  def spawnFish(f: String => String): Fish = {
    new Fish(f(fishName))
  }
}

First thing's first. Its much shorter than the other snippet. The brevity affords a great deal of clarity about the class and intended usage.

It has a single input rather than two; the second variable is dependent upon the first for a valid state. Bad state is found at construction time. there is no longer any need to throw an exception in the class based on the class' own data. Everything is handled up front.

The single function doesn't destroy the state of the object, it only creates a new object based on the data in the current object. This allows us to make the data immutable as well as give the user of the class all the flexibility she needs for instantiation and processing of objects.

Next we'll use Case Classes to encapsulte our data and define it as immutable by default.

1. Immutability as Default (Introducing Typelevel Scala into an OO Environment)

(examples can be found on GitHub)

The first point is using Immutability as Default.

What does "as default" mean? We are not barring mutability from applications wholesale; there are practical reasons for using mutability. For instance,

  • performance
  • global settings.

However, the mutable code should be bounded by its defining scope. This idea can be captured with four rules of thumb:

  1. Function inputs are immutable.
  2. Function outputs are immutable.
  3. var and collection.mutable values are local and temporary.
  4. Function return values are placed into a val.

OO Imperative Style

First, we have a helper function for our examples.

def color(str: String): String = {
  str match{
    case "One Fish" => "Red Fish"
    case "Two Fish" => "Blue Fish"
  }
}

This snippet defines an imperative style One Fish Two Fish Red Fish Blue Fish. Note that we have two valid data points

  1. ("One Fish", "Red Fish")
  2. ("Two Fish", "Blue Fish")

Yet, we model it with a function from String to String which is a space much larger than 2 points. Also, it throws an exception on bad input; the function has a return type of String which is a lie since a thrown exception is a very different result from a String. I found code like this to be very common in the OO imperative world. In the coming posts we will seek to replace this with something better.

Some typical code which breaks our general rules looks a little like:

def bad(): mutable.Buffer[String] = {
  val fish = mutable.Buffer[String]()
  var one = "One Fish"
  var two = "Two Fish"

  fish.append(one)
  fish.append(two)

  one = color(one)
  two = color(two)

  fish.append(one)
  fish.append(two)
  
  fish
}

Here, mutable inputs or outputs can very easily poison threaded data in an application. This makes it difficult to follow the data through an application and reason about control flow.

Also, the vars one and two are used to first hold constant data then hold the result of a function call. The functional assignment to a var has similar negative cognitive affects as mutable arguments and return values. They needlessly complicate control flow.

Typelevel Style

The typelevel way to write the same functionality would be close to:

def better(): List[String] = {
  val one = "One Fish"
  val two = "Two Fish"

  List(
      one,
      two,
      color(one),
      color(two)
  )
}

The function is self contained. No mutable state inside the function escapes its defining scope and all function calls are nested readably within other function calls. The important part is the intent of the function is clear. Also, any threading performed around it is safe from accidental data poisoning; no messy synchronization calls are necessary.

Now that we have a foundation of immutability, we'll add combinators to our toolset.

Introducing Typelevel Scala into an OO Environment

These posts seek to describe a process by which a Scala dev can reasonably (responsibly) introduce Functional Programming (FP) in Scala into an Object Oriented (OO) Java development atmosphere. It can also serve as an entry point for an OO dev into the Scala community. They are based on a talk I gave at the Typelevel Summit in Philadelphia in 2016. Here are the slides and index cards from that talk. A video is available on YouTube.

We'll transform from OO paradigms to FP paradigms through five key ideas:

  1. Immutability as Default
  2. Combinators over loops, null & throw
  3. Case Classes for auto-encapsulation
  4. Objects are not Coroutines
  5. Type Classes over Subclasses

And provide practical examples using 3 libraries:

  1. Monocle
  2. Argonaut
  3. Cats

Each of these ideas builds upon the last and by the end we'll have a solid foundation for introducing FP and Category Theory without even saying any of the "M-words".

Before starting it is important to note that many of the people who develop software do not have Computer Science (CS) degrees. They will sometimes have a degree in another science or engineering discipline. Often, they will have no degree at all. An extension of this idea is you don't need a degree to define flatMap.

This fact is important to us because, the vernacular of CS is different from the vernacular of other disciplines. On the other hand, we are assuming an audience of OO developers so, some CS terminology is ok and indeed will be present throughout. Wherever OO words fail us, we will find "plain old English" ways of describing idioms and ideas. We will be coaching usage, strategies and tactics not terms and vocabulary.