A case to introduce Scala in a Java world

Stéphane Derosiaux
Aug 29, 2018
The Purple Tentacle is me: I want the blue water back. Yes, I’m never happy.

New job, big company, Java team, no Scala for miles around.

My goal: move some not-that-big Java projects to Scala and build the momentum to create new projects in Scala.

Why? While I like Java more as it gains capabilities, I simply love Scala. The existing projects were written in good ol’ verbose Java, and I know Scala quite well, along with the benefits we could expect from it.

I think Scala makes code easier to read (when we don’t use “esoteric” libraries), to share with others, and to refactor. Scala leads to more robust code and better productivity, and developers produce fewer bugs. This is mostly due to the powerful Scala type system (among other things) and the way of coding it encourages.

The experienced developers around me were willing to learn and change. Only the managers were chilly about it: “If we start using Scala, we need to validate it with the company hierarchy first. Scala developers are harder to find than Java ones.”

I need to be persuasive. I need to explain why we are going to be more productive using Scala, why it’s good for developers to use it, why we need to spread it: what the drawbacks of Java are, and what the strengths of Scala are.

Years ago, coming from .NET, I had to jump into both the modern Java and Scala lands at the same time (I’d never been past Java 1.5 and ant before that). The learning curve was quite steep, but who is not looking for some challenges nowadays? Scala is a challenge that opens our minds to the broad world of FP and type systems. I learned so much about programming thanks to Scala.

In this article, I’ll talk about what I went through and learned from my own experience. You may have a different point of view; feel free to share it.

We’ll look into Java drawbacks and improper practices I’ve seen in projects, and consider how Scala can help us avoid them.

What was wrong?

Logorrhea

We all know it: Java is terribly verbose. Some people like it, others just live with it. Lambdas helped tremendously to reduce verbosity, but it’s not enough.

Local-variable type inference with var (Java 10), data classes and pattern matching will help, but not every project can upgrade its JVM that easily. Most of us will be stuck on Java 8 for a while.
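Scala doesn’t have to wait for a JVM upgrade for this. As a minimal sketch (the `Car` model and its fields are hypothetical), a one-line case class gives us what a Java 8 POJO needs dozens of lines for: constructor, accessors, `equals`, `hashCode`, `toString`, plus an immutable `copy`.

```scala
// Hypothetical domain model: the case class generates equals, hashCode,
// toString, accessors, an immutable copy method and pattern-matching support.
final case class Car(brand: String, model: String, year: Int)

object CaseClassDemo {
  // Type inference: no need to repeat the type on the left-hand side.
  val original = Car("Renault", "Clio", 2018)

  // copy returns a new instance; the original is left untouched.
  val updated = original.copy(year = 2019)

  def main(args: Array[String]): Unit = {
    println(original)                                  // Car(Renault,Clio,2018)
    println(original == Car("Renault", "Clio", 2018))  // true: structural equality
    println(updated)                                   // Car(Renault,Clio,2019)
  }
}
```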

Accidental verbosity doesn’t help understanding the code; on the contrary. Java code conveys way more information than it should. It’s not only a question of types but also of API design. The more code you must read, the more you have to keep in mind to reason about it.

I prefer not to think a lot when I code or review: just the bare minimum, just the business flow, not the programming language itself. Scala makes the code more declarative, whereas Java tends to be imperative: you need to “run the program” in your head to understand what’s going on.

Even Maven, the ubiquitous Java build tool, is quite verbose and “hard” to understand. In Scala, we have sbt, aka “Simple Build Tool” (let’s not kid ourselves, it’s far from simple). Still, when a friend of yours helps a project and translates “263 lines of unreadable maven xml to 34 lines of sbt definition”, I prefer reading the 34 (unobfuscated) lines to understand it.

https://twitter.com/guizmaii/status/1032929860430323715

The not-so-happy path

Because of the checked exception constraints in Java, a lot of exceptions are rethrown as RuntimeExceptions. This happens because:

  • the interface did not plan to throw anything
  • the interface did not plan to throw the Exception an implementation could (it makes sense, why would the interface know that?)
  • “we don’t care” to declare them and prefer to keep downstream code free of try-catch. Raise your hand if you have never done that!

Let’s put aside the bad smell of catch-all Exception handlers used to avoid writing multiple catch blocks.

When reviewing how multiple functions work together, try-catch can blur the logic. It’s a “side-output” that short-circuits the functions, something you’d rather not have to think about. The code should always be read in one way, not several.

An analogy I like is Railway Oriented Programming: errors are handled explicitly in the types (as we do in Scala), which allows composing functions safely.

Most functions have a happy path and an unhappy path (errors). Here, both are explicit in the return type, for instance Either[Error, Int], which allows composing functions and propagating errors along the way.
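Here is a minimal sketch of that idea with the standard `Either` and `Try` (the error types and the age validation are hypothetical). The exception-throwing Java call is wrapped once, at the edge; after that, the for-comprehension follows the happy path and the first `Left` short-circuits the rest, without any try-catch.

```scala
import scala.util.Try

object RailwayDemo {
  // Hypothetical error type: every failure is a value, visible in the signature.
  sealed trait AppError
  final case class NotANumber(input: String) extends AppError
  final case class OutOfRange(value: Int) extends AppError

  // Wrap the throwing Java API once; the exception becomes a Left.
  def parse(s: String): Either[AppError, Int] =
    Try(Integer.parseInt(s.trim)).toEither.left.map(_ => NotANumber(s))

  def validateAge(n: Int): Either[AppError, Int] =
    if (n >= 0 && n <= 150) Right(n) else Left(OutOfRange(n))

  // Composition along the happy path; the first Left stops the chain.
  def parseAge(s: String): Either[AppError, Int] =
    for {
      n   <- parse(s)
      age <- validateAge(n)
    } yield age

  def main(args: Array[String]): Unit = {
    println(parseAge("42"))  // Right(42)
    println(parseAge("abc")) // Left(NotANumber(abc))
    println(parseAge("200")) // Left(OutOfRange(200))
  }
}
```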

https://www.slideshare.net/ScottWlaschin/railway-oriented-programming

Annotations break layers

In the projects I was working on, the models were intertwined with annotations from Jackson and ORMs. Guice was all over the place too. Fortunately, I didn’t find any Spring inside (which relies heavily on annotations).

Annotations can have good sides:

  • Bring “magical” behaviors at runtime without typing much (like with Guice or Spring).
  • Add metadata on top of classes or member variables (like Jackson).
  • Make it easy to create RESTful HTTP web services (Spring, JAX-RS).

They are all good use cases, but annotations are a double-edged sword.

Some annotations force us to break layer boundaries and merge code that should not be tied together. Annotations often force tight coupling, which goes against the basic principles of layering.

I may be too strict (Software Craftsmanship anyone?), but projects should be separated into reusable layers: model, persistence, service, API…

Read more about Clean Architecture https://android.jlelse.eu/thoughts-on-clean-architecture-b8449d9d02df

Call it what you want:

The point is that framework choices should be deferred as far out in the layers as we can. Frameworks must not be part of the core business (the domain), and we should find a proper separation of concerns in code.

If I want to use some repository interface, I don’t want to import Spring just because you added a Spring annotation on it. I may want to provide my own implementation.

Layering makes it easy to understand the scope and the impact of the code we are reading, as well as the concerns of its module.

Here is another example with serialization/deserialization: if I’m reading some code about a Car model to understand how we model it, I truly do not care how it is serialized. This is totally orthogonal to the core domain, so why add coupling to the code?

Moreover, a Car could be serialized into different forms (JSON, Avro, Protobuf…). Are you going to add more and more annotations from different frameworks on top of it? These are different models: they should not be tied together, and could even live in different packages and independent modules. What if I want to reuse your model but without the dependencies you carried with your annotations? I’m stuck.

Annotations make it hard to know what relies on them or uses them. How do you know it’s not dead code? You can’t just use your IDE to “find references”. It’s another world, another language, like the Upside Down world. This prevents codebase exploration and leads to uncertainty.

In Java, it’s too easy to use annotations all over the place without thinking about the layers and dependencies.

In Scala, frameworks generally don’t work with annotations but with proper code (through code generation, macros and implicits). Therefore, it’s natural to package this code in another module, without impacting the core code. Not all annotations are bad, as we’ll see later.
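To illustrate, here is a minimal hand-rolled typeclass sketch; it is not the API of circe or any other real JSON library, and the names `JsonEncoder` and `toJson` are hypothetical. The `Car` model stays free of annotations, and the encoding lives in a separate object that could be moved into its own module.

```scala
// Core domain: no serialization annotation, no framework import.
object Domain {
  final case class Car(brand: String, year: Int)
}

// Could live in a separate "json" module that depends on Domain, never the reverse.
object JsonCodecs {
  import Domain.Car

  // Hypothetical minimal typeclass: "A can be turned into a JSON string".
  trait JsonEncoder[A] {
    def encode(a: A): String
  }

  object JsonEncoder {
    // Instances in the companion are found by implicit search automatically.
    implicit val carEncoder: JsonEncoder[Car] = new JsonEncoder[Car] {
      def encode(c: Car): String = s"""{"brand":"${c.brand}","year":${c.year}}"""
    }
  }

  // Resolved at compile time: no instance for A means no compilation.
  def toJson[A](a: A)(implicit enc: JsonEncoder[A]): String = enc.encode(a)
}
```

If no `JsonEncoder[A]` is in implicit scope, the call to `toJson` does not compile. The sketch also skips string escaping, which a real encoder must handle.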

Runtime checks instead of compile-time checks

Relying on runtime is like playing with fire

Many Java features and frameworks rely on runtime reflection, introspection, and classpath scanning.

As I said, in the projects I wanted to convert to Scala, the code was covered with Guice annotations. It’s “nice”, but it makes the dependency graph difficult to understand. Because the graph is built at runtime using reflection, we don’t even try, and we let the application crash at startup or not… Surprise! What a waste of time. Who has never had this issue? Again, this leads to uncertainty.
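For comparison, here is a sketch of plain constructor injection in Scala (all components are hypothetical). The dependency graph is ordinary code, so a missing or mistyped dependency is a compile error rather than a crash at startup; libraries such as MacWire automate this kind of wiring at compile time with macros.

```scala
object WiringDemo {
  // Hypothetical components: each declares what it needs in its constructor.
  trait UserRepository {
    def findName(id: Long): Option[String]
  }

  final class InMemoryUserRepository(data: Map[Long, String]) extends UserRepository {
    def findName(id: Long): Option[String] = data.get(id)
  }

  final class GreetingService(repo: UserRepository) {
    def greet(id: Long): String =
      repo.findName(id).map(name => s"Hello, $name").getOrElse("Hello, stranger")
  }

  // The whole graph, checked by the compiler: forget `repo` and this won't build.
  val repo    = new InMemoryUserRepository(Map(1L -> "Ada"))
  val service = new GreetingService(repo)

  def main(args: Array[String]): Unit =
    println(service.greet(1L)) // Hello, Ada
}
```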

With Spring, you have so many annotations available, and most of them have an impact at runtime and take magic strings as parameters. Special tribute to @HystrixCommand(fallbackMethod = "newList"), whose fallback is triggered when the call fails or the circuit is open. If you rename the fallback method but forget this magic string, you’ll only notice in production, when it crashes (correct me if I’m wrong).
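In Scala, the fallback can simply be a function value. The sketch below uses a hypothetical `withFallback` helper built on `scala.util.Try`; it is not Hystrix’s API and has no circuit breaker. It only shows that renaming `newList` would be caught by the compiler instead of in production.

```scala
import scala.util.{Failure, Success, Try}

object FallbackDemo {
  // Hypothetical helper: the fallback is a typed function, not a method name in a string.
  def withFallback[A](primary: => A)(fallback: Throwable => A): A =
    Try(primary) match {
      case Success(a) => a
      case Failure(e) => fallback(e)
    }

  // Hypothetical remote call that fails.
  def fetchList(): List[String] = throw new RuntimeException("service down")

  // Renaming this method breaks compilation at every call site.
  def newList(e: Throwable): List[String] = Nil

  def main(args: Array[String]): Unit =
    println(withFallback(fetchList())(newList)) // List()
}
```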

Also, this sort of code is common in Java:

if (obj instanceof Integer) {
    int intValue = ((Integer) obj).intValue();
    // ...
} else if (obj instanceof String) {
    // ...
}

Because there is no smart pattern matching in Java (yet, but it’s gonna be awesome), we sometimes see this code. But the compiler can’t guarantee its completeness: what if a developer passes another type for obj and forgets to add a branch here? In Scala, pattern matching is powerful and ubiquitous, and on sealed hierarchies the compiler checks that every case is covered.
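A minimal sketch of the Scala equivalent (the `Value` hierarchy is hypothetical): because the trait is `sealed`, the compiler knows every subtype and reports a missing case.

```scala
object PatternMatchDemo {
  // A closed set of cases: all subtypes must be declared in this file.
  sealed trait Value
  final case class IntValue(i: Int)       extends Value
  final case class StringValue(s: String) extends Value

  def describe(v: Value): String = v match {
    case IntValue(i)    => s"int: $i"
    case StringValue(s) => s"string: $s"
    // Adding `final case class BoolValue(b: Boolean) extends Value` without a
    // case here makes the compiler warn that the match is not exhaustive
    // (turned into an error with a flag such as -Xfatal-warnings).
  }
}
```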

Scala programmers barely use annotations. They are mostly employed to generate boilerplate, as with @BeanProperty (for Java compatibility!) or scalameta. They are also used to check a function’s behavior at compile-time, like @tailrec, which ensures the function is tail-recursive. Scala is compile-time oriented, which is why programs are more robust.
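As an example of such a compile-time check, here is a small sketch with `@tailrec` (the `sum` function is only illustrative): if the recursive call were not in tail position, compilation would fail instead of risking a `StackOverflowError` at runtime.

```scala
import scala.annotation.tailrec

object TailrecDemo {
  def sum(xs: List[Long]): Long = {
    // The compiler verifies that `loop` calls itself only in tail position,
    // so it can be compiled into a plain loop.
    @tailrec
    def loop(rest: List[Long], acc: Long): Long = rest match {
      case Nil          => acc
      case head :: tail => loop(tail, acc + head)
    }
    loop(xs, 0L)
  }
}
```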

Note that Java annotations are not bad when they are only processed at compile-time and their goal is to generate boilerplate, as with google/auto, mapstruct, Immutables, or lombok. That’s the good way to use annotations.

A lack of semantics

Often, I find traditional Java code not semantic enough. The Collections API doesn’t offer many functions. I still see home-made for-loops where nothing conveys why we are looping or what we want to achieve.

The Java Stream API (more semantic, more fluent) is not used enough because it does not offer enough features and has some peculiarities.

In Scala, the Collections API is more complete: it contains many common functions (fold, reduce, exists, diff, filter, head, tail, sliding, sum, zip…) and doesn’t have the weird Collector thing. In Java, because of this lack of functions, you often have to rely on snippets found on StackOverflow or on a third-party collections library (like Guava, jOOλ, vavr).

With properly-named methods, you know what the effect will be without reading the code of the given function: they convey way more meaning than classic for-loops, where you need to dive into the code to understand its (side-)effects.
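To make the contrast concrete, here is a small sketch (the temperature data is made up): each collection method names its intent, while the hand-written loop at the end computes the same `rises` value but has to be read line by line.

```scala
object CollectionsDemo {
  // Hypothetical daily temperatures, in degrees Celsius.
  val temperatures: List[Int] = List(12, 15, 11, 18, 21, 19)

  // Each method name states the intent.
  val hasFrost: Boolean = temperatures.exists(_ < 0)
  val total: Int        = temperatures.sum
  val warmest: Int      = temperatures.max
  // sliding(2) walks over consecutive pairs: (12, 15), (15, 11), ...
  val rises: Int = temperatures.sliding(2).count(w => w(1) > w(0))
  // diff removes one occurrence of each element of the argument.
  val withoutExtremes: List[Int] = temperatures.diff(List(temperatures.min, temperatures.max))

  // The same `rises` computation as a classic loop: correct, but the reader
  // has to simulate it to discover what it does.
  def risesImperative(xs: IndexedSeq[Int]): Int = {
    var count = 0
    var i = 1
    while (i < xs.length) {
      if (xs(i) > xs(i - 1)) count += 1
      i += 1
    }
    count
  }
}
```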

A lack of abstraction

ClassNotFoundException

null is still a thing in Java. We always assume things can be null. I hate null checks so much. Some annotations like @Nullable or @NotNull try to help, but nullability should be part of the type itself, not of the outer world of annotations.

Who has never worked with Guava’s Preconditions? (example taken from Apache Druid):

Preconditions.checkNotNull(task, "task");
Preconditions.checkNotNull(status, "status");
Preconditions.checkArgument(
    task.getId().equals(status.getId()),
    "Task/Status ID mismatch[%s/%s]",
    task.getId(), status.getId()
);

The code is littered with Preconditions that throw RuntimeExceptions. Every function in the class needs to check the variables it uses again and again. You never know exactly the state of your instance, so you add defensive code. This is the worst code ever, because nobody will dare to remove it now. It means either the model is wrong or the type system is not strong enough.
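One way to let the type system carry that check is a smart constructor. This is a sketch with hypothetical `Task`, `Status` and `TaskWithStatus` types inspired by the Druid snippet: the ID invariant is checked once, when the value is built, and every function that receives a `TaskWithStatus` can rely on it.

```scala
object SmartConstructorDemo {
  // Hypothetical models inspired by the Druid snippet.
  final case class Task(id: String)
  final case class Status(id: String)

  // Private constructor: the only way to obtain a TaskWithStatus is `from`.
  final class TaskWithStatus private (val task: Task, val status: Status)

  object TaskWithStatus {
    def from(task: Task, status: Status): Either[String, TaskWithStatus] =
      if (task.id == status.id) Right(new TaskWithStatus(task, status))
      else Left(s"Task/Status ID mismatch[${task.id}/${status.id}]")
  }

  // Downstream code never re-checks the IDs: the type guarantees them.
  def describe(ts: TaskWithStatus): String = s"${ts.task.id} is tracked"
}
```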

These kinds of null checks and annotations are not needed in idiomatic Scala, so they don’t pollute the code. We use the Scala type system and abstractions to deal with absence, like Option, which we map over. In Java, Optional is unfortunately not ubiquitous because of its peculiarities and poor API.
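A short sketch of the Option style (the `User` model and its fields are hypothetical): a nullable Java result is wrapped once with `Option(...)`, and the rest of the code maps over it without a single null check.

```scala
object OptionDemo {
  // A Java API may return null; Option(...) turns it into None at the boundary.
  def envVar(name: String): Option[String] = Option(System.getenv(name))

  // Hypothetical model: absence is part of the type.
  final case class User(name: String, email: Option[String])

  // map/collect only run when a value is present; no null test anywhere.
  def emailDomain(user: User): Option[String] =
    user.email.map(_.split("@")).collect { case Array(_, domain) => domain }

  def displayDomain(user: User): String = emailDomain(user).getOrElse("unknown")
}
```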

The F word

All this leads us to Functional Programming.

Java is rarely used with FP in mind (vavr is trying). Java code is often mutable and impure, which makes it hard to reason about (i.e., it’s not referentially transparent; see my previous post to know why it matters: Why Referential Transparency matters).

Also, there are not many fluent interfaces in Java. Stream and CompletableFuture are fluent, but it’s not universal. In Scala, they are everywhere because the language and its libraries embrace FP.

If we think about a program, its purpose is to process data like this: input → transformation → output. FP is exactly about this:

Output = Program(input).map(f1).map(f2)

This is a functional style: you pass and compose functions, which is easy to read and understand. To understand the whole, you divide and conquer: you just need to understand each function separately.

Those functions (here f1 f2) don’t rely on an external context (this). They use the variables we give them and return a result.
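Here is that shape as a tiny sketch (the string functions are arbitrary examples): each step is a pure function, and the pipeline can be written either as successive `map` calls or by composing the functions themselves with `andThen`.

```scala
object PipelineDemo {
  // Pure functions: they only use their parameters, no `this`, no hidden state.
  val trim: String => String    = _.trim
  val toUpper: String => String = _.toUpperCase
  val exclaim: String => String = s => s + "!"

  // Output = Program(input).map(f1).map(f2)...
  def program(input: List[String]): List[String] =
    input.map(trim).map(toUpper).map(exclaim)

  // The same pipeline, built by composing the functions first.
  val composed: String => String = trim andThen toUpper andThen exclaim
}
```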

Functions should respect 3 rules:

  • Be Total: return a value of the declared return type for every input, with no partial functions escaping through another channel (like exceptions).
  • Be Deterministic: given a fixed input, the output should be the same (no randomization, no dependency on an external source outside parameters).
  • Be Free of Side-effects: no mutation outside of the function scope (like println which alters stdout). Those mutations must be declared in the return type of the function, for the caller to be aware of this behavior (often IO).
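A few small functions illustrating the three rules (the names and behaviors are illustrative only). Here the side effect is simply kept at the edge, in `main`, rather than encoded in the return type with an IO type, which would require a library such as cats-effect or ZIO.

```scala
import java.time.LocalTime

object RulesDemo {
  // Total: every input produces a value of the declared type; division by
  // zero is a Left instead of an ArithmeticException.
  def safeDivide(a: Int, b: Int): Either[String, Int] =
    if (b == 0) Left("division by zero") else Right(a / b)

  // Deterministic: the time is a parameter instead of being read from the
  // system clock inside, so the same input always gives the same output.
  def greeting(now: LocalTime): String =
    if (now.getHour < 12) "Good morning" else "Good afternoon"

  // Side effects (reading the clock, printing) stay at the edge of the program.
  def main(args: Array[String]): Unit =
    println(greeting(LocalTime.now()))
}
```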

I won’t explain more of this marvellous world here; you can read more about it on my blog. https://www.sderosiaux.com/articles/2018/08/15/types-never-commit-too-early-part1/

Contrast this with OOP: you have a long class hierarchy where each parent holds a bit of the state, and the whole aggregation forms the state of the instance. It’s crazy hard to reason about: you have a this with tons of fields that every method of the class can use, but each method often needs only one or two. And this is where you start adding special conditions, because some fields can be null or the whole set of instance fields can be incoherent.

In FP, there is no need to combine data and functions into objects. The OO approach concerns itself with encapsulation, protected and private data and members to protect itself against confusing mutable state. Once you no longer have mutable state, all of that justification evaporates.

This is why FP is powerful: the scope of each function is reduced to only what it needs. Nothing is stateful in FP: the state flows from function to function.

This makes FP programs easy to test. You don’t need to start your dependency injection framework or mock your ORM to test something. You just call any public function with some parameters, check the output, and you’re done.
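As a sketch of both points (the account events are hypothetical), the state below is just the result of folding a pure transition over a list of events, and checking it needs nothing more than a call and an assertion.

```scala
object AccountDemo {
  // Hypothetical events describing what happened to an account.
  sealed trait Event
  final case class Deposited(amount: BigDecimal) extends Event
  final case class Withdrawn(amount: BigDecimal) extends Event

  // Pure transition: old state + event => new state. No `this`, no mutation.
  def applyEvent(balance: BigDecimal, event: Event): BigDecimal = event match {
    case Deposited(a) => balance + a
    case Withdrawn(a) => balance - a
  }

  // The state flows through the functions, starting from zero.
  def balance(events: List[Event]): BigDecimal =
    events.foldLeft(BigDecimal(0))(applyEvent)

  // A "test" needs no container and no mocks: call, then check.
  def main(args: Array[String]): Unit = {
    val events: List[Event] =
      List(Deposited(BigDecimal(100)), Withdrawn(BigDecimal(30)), Deposited(BigDecimal(5)))
    assert(balance(events) == BigDecimal(75))
    println(balance(events))
  }
}
```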

Should we move to Scala?

It’s not going into extinction

Some people say that Scala is in danger of becoming irrelevant. They supported their belief with GitHub and Tiobe stats pointing out that Scala was in regression. Moreover, they implied that fewer and fewer people are interested in Scala (no source), and that it’s already hard to find Scala developers: therefore it’s a risky bet.

From my own perspective, I still see a ton of active people on gitter, writing and reading blogs, and Scala open-source projects are still growing at a good pace.

Scala comes from the “Big Data” world. Major software such as Spark and Kafka is written in Scala. In a world where data is a first-class citizen, these tools are ubiquitous. Huge companies are using Scala: LinkedIn, Twitter, Netflix, Criteo. It’s no accident: they know it makes them more robust and productive.

Developers are empowered

The more abstract, the more powerful and “short” syntax you get

We are still discovering better abstractions (thanks to the work on cats, monix, scalaz, zio, but also on smaller libraries) and better ways of doing things. It means we are not even in a “stable” phase yet and are still iterating on the “how-to”.

Scala 3 is coming. It will clearly improve Scala’s feature set while simplifying the language. We’ll gain new patterns to work with and new ways of doing things.

I’m always happy when I find out that a French company is now doing Scala. The more, the merrier. It’s sad that this is a noticeable event and that Scala is not as widespread as Java; I can only hope the time will come.

I think Scala raises the bar for code quality, equips developers with powerful principles (FP), and helps them open their minds to better code abstractions and better organization (separation of concerns, Category Theory).

What about the JVM

In Scala, we don’t need more than the JRE 8 to use awesome features Java lacks: richer type inference, higher-kinded types, pattern matching. Since Scala compiles to JVM bytecode, we don’t need any migration path to JRE 9+.
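To show what higher-kinded types buy us, here is a minimal `Functor` sketch written by hand (libraries like cats provide a complete version; this simplified trait is not their API). `F[_]` abstracts over the container itself, something Java generics cannot express.

```scala
object HigherKindedDemo {
  // F[_] is a type constructor parameter: List, Option, ...
  trait Functor[F[_]] {
    def map[A, B](fa: F[A])(f: A => B): F[B]
  }

  object Functor {
    implicit val listFunctor: Functor[List] = new Functor[List] {
      def map[A, B](fa: List[A])(f: A => B): List[B] = fa.map(f)
    }
    implicit val optionFunctor: Functor[Option] = new Functor[Option] {
      def map[A, B](fa: Option[A])(f: A => B): Option[B] = fa.map(f)
    }
  }

  // Written once, works for any container that has a Functor instance.
  def double[F[_]](fa: F[Int])(implicit F: Functor[F]): F[Int] = F.map(fa)(_ * 2)
}
```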

It’s like TypeScript and JavaScript: TypeScript provides tons of features and types on top of JavaScript, where they don’t exist. It’s not a problem, because we don’t work directly with JavaScript and all its quirks. The TypeScript compiler ensures the program is valid TypeScript first, then compiles it to JavaScript. Thanks to static typing, this drastically reduces the number of bugs compared to working with plain JavaScript.

Note that a TypeScript-like language can even target something other than JS: AssemblyScript compiles a strict variant of TypeScript to WebAssembly. It’s just a language on top of another. Scala is the same kind of abstraction on top of JVM bytecode.

Scala can also be compiled to targets other than JVM bytecode:

  • ScalaJS, which compiles to JavaScript. There are ReactJS bindings, VueJS bindings, and much more. This is very powerful because ScalaJS relies on the Scala type system: you have all its features at your disposal, and many libraries are compatible (i.e., can be compiled to JS).
  • Scala Native, which compiles to native code. It’s still experimental, but it keeps growing slowly.

Also, because Scala runs on the JVM, we use the same tools to monitor and debug Scala applications as we do with Java applications: jconsole, visualvm, Java Mission Control, etc. A Java developer can keep using their tools to debug a Scala program; it does not matter. Only the data structures used internally will be different and specific to Scala libraries.

“Scala is slower than Java”

I’m always skeptical when a Java developer declares that.

It may be true depending on your code. But often, you just don’t need the highest performance ever. We can run Scala servers with complex logic inside and still easily handle thousands of QPS (from experience).

Yes, there is often an overhead when using Scala, because we prefer immutability: more short-lived objects are allocated, and the GC has more to clean up than in traditional mutable Java programs. But GCs are extremely fast at this (it mostly concerns the young generation of objects).
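Immutability doesn’t always mean copying, though. In this small sketch, prepending to an immutable `List` allocates a single new cell and reuses the existing list as its tail, and `updated` on an immutable `Map` returns a new map while leaving the original untouched.

```scala
object SharingDemo {
  val base: List[Int] = List(2, 3, 4)

  // One new cell is allocated; its tail *is* `base`, nothing is copied.
  val extended: List[Int] = 1 :: base

  // `updated` returns a new Map; `prices` itself never changes.
  val prices: Map[String, Double]   = Map("apple" -> 1.0, "pear" -> 2.0)
  val repriced: Map[String, Double] = prices.updated("apple", 1.5)
}
```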

You have to make a tradeoff between a more robust, properly typed Scala program that pressures the GC a bit more, and a less maintainable Java program that may be faster and consume less RAM. What do you prefer?

I would even argue that in Scala it’s easier (faster and with fewer bugs) to refactor code to improve performance than it is in Java, thanks to the way we generally code in Scala. Code is generally more abstract and reusable. Just changing some typeclass implementations can boost performance without compromising the whole program (in a micro-benchmark, I got a 3x boost with ZIO, for instance).

Speaking of performance, in Scala we also use jmh, the de facto Java standard for writing micro-benchmarks. It’s often used to improve a critical piece of code and to ensure performance won’t degrade over time and across developers. Coupled with sbt-jmh, which provides sbt commands, it’s a wonderful tool.

Should we move further than Scala?

Beyond Scala, there is Haskell, or Eta (a Haskell implementation on the JVM). Most people think Haskell is the natural “evolution” of seasoned Scala developers. I’m not sure that’s the case, but knowing Scala well clearly helps.

The Scala ecosystem took a lot of ideas from Haskell (scalaz, cats) to deal with Category Theory: a bunch of abstract things that compose. Googling for answers, we often stumble upon the same concept but with Haskell code. Haskell is more concise than Scala and has an even more powerful type system (better inference, kind polymorphism (which we’ll have in Dotty!)).

But this looks like a niche (way smaller than Scala). I’ve not heard of any Haskell company in my region, only a few in Paris. But that’s just me.

Conclusion

I think moving to Scala is the best step Java developers can take. They won’t be lost, because the existing Java ecosystem is still available, and there’s more: the Scala ecosystem is available too (and preferred).

It takes time and experience to move from writing Scala as “a better Java” to the idiomatic Scala FP way of doing things. Sharing with other experienced developers definitely helps to understand what this “way” is.

If you are uncertain about transitioning directly to Scala, maybe using vavr is a good first step that won’t disrupt the developers too much. But why take a baby-step when you can take a man-step? In both cases there is a learning curve, so better to face only one and focus directly on Scala.

Using Scala naturally constrains developers (through its APIs) and guides them to the Functional Programming paradigm: this will improve the code quality and expand their mindset. Even if they go back to Java or any other language later, they won’t code the same way: they will have evolved.

So, do you think I’m going to sell it?

Thanks for reading!

Huge thanks to @philderome for the corrections and improvements!


Stéphane Derosiaux

Founder of conduktor.io | CTO, CPO, CMO, just name it. | Kafka and data streaming all the way down