sean cassidy : Strings are untyped


There has been a lot of discussion recently about whether or not strings are broken or if we even need them. This misses what I believe to be a more significant issue with strings.

Strings are essentially untyped, like a bare Object or Any. You wouldn't use an Object unless you had to, right? So why do we use strings in the same way?

Has this happened to you?

public String createPath(String domain, String fileExt, String customerId /* , etc. */) {
    // combine all of these in a complicated way; a simplified stand-in:
    String result = domain + "/" + customerId + "." + fileExt;
    return result;
}

And then you switch two of the arguments around?

final String result = createPath(domain, customerId, fileExt /* , etc. */); // Whoops!

No compiler error here, and maybe no runtime error either, depending on what you use the result for. This has bitten me harder than I'd like to admit.
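To make the "maybe no runtime error" case concrete, here is the same mix-up sketched in Scala. The createPath body is a hypothetical, simplified stand-in for the "complicated" combination, and the domain, customer ID, and extension values are made up for illustration:

```scala
object SwappedArguments {
  // Hypothetical stand-in for createPath: the real combination is "complicated";
  // this one just joins the pieces so the mix-up is visible.
  def createPath(domain: String, fileExt: String, customerId: String): String =
    s"$domain/$customerId.$fileExt"

  val domain = "example.com"
  val customerId = "c-1234"
  val fileExt = "pdf"

  // Arguments in the intended order.
  val correct: String = createPath(domain, fileExt, customerId)

  // customerId and fileExt swapped: every parameter is a String, so this compiles.
  val swapped: String = createPath(domain, customerId, fileExt)
}
```

The swapped call still yields a well-formed path ("example.com/pdf.c-1234"), just the wrong one, which is why nothing fails loudly.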

Java, C, C#, and most other popular languages have no way of representing that there is a difference between a "domain" and a "customerId" without making an entire object to distinguish it and boxing it up. Talk about overhead.

This affects even very strongly typed languages like Scala. If you try to fix it in Scala with type aliases like this (perhaps taking inspiration from Haskell's newtype), it won't work:

type CustId = String
type FileExt = String

def func(a: CustId) = ???

def otherFunction(b: FileExt) = func(b) // Compiles!

Using 'type' this way merely saves you typing (on a keyboard). A type alias is just another name for String, so it doesn't introduce any new restrictions.
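A runnable sketch of the same point, assuming Scala 2.13 or Scala 3 with no extra libraries. The func body is an illustrative replacement for ???, and the =:= evidence at the end is the compiler confirming that the two aliases are literally the same type:

```scala
object AliasesAreTransparent {
  // Type aliases: CustId and FileExt are just other names for String.
  type CustId = String
  type FileExt = String

  // Illustrative body in place of ??? so the example runs.
  def func(a: CustId): String = s"customer $a"

  // Compiles: a FileExt is accepted where a CustId is expected.
  def otherFunction(b: FileExt): String = func(b)

  // The compiler can even produce evidence that the two aliases are the same type.
  val sameType: CustId =:= FileExt = implicitly[CustId =:= FileExt]
}
```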

How can we make it fail without writing Haskell and using newtype?

Type tags

We can use Scalaz to get tagged types, which let us add ancillary types to other types:

import scalaz._

trait CustomerId

def func(a: String @@ CustomerId) = a + " is a CustomerId"

def hello() = func("Hello") // Compilation error!

Because "Hello" is not of type String @@ CustomerId, this is a compilation error. To use func, we need an easy way to construct CustomerId-tagged strings, like this:

trait CustomerId

def CustomerId(a: String): String @@ CustomerId = Tag[String, CustomerId](a)
def func(a: String @@ CustomerId) = a + " is a CustomerId"

def hello() = func(CustomerId("Hello")) // Works!

Now we have a function called CustomerId which constructs String @@ CustomerId from a String.
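If you want to try this without pulling in Scalaz, the sketch below hand-rolls a tag in the same spirit. The Tagged and @@ definitions here are an assumption for illustration (a common structural-type encoding), not the Scalaz source. In this version tagging is only a cast, so the tagged value is the very same String object at runtime and no wrapper is allocated:

```scala
object TagsAreFree {
  // Hand-rolled tag encoding (illustrative; not the Scalaz implementation).
  type Tagged[U] = { type Tag = U }
  type @@[T, U] = T with Tagged[U]

  trait CustomerId

  // Tagging is just a cast: only the static type changes.
  def CustomerId(a: String): String @@ CustomerId =
    a.asInstanceOf[String @@ CustomerId]

  def func(a: String @@ CustomerId): String = a + " is a CustomerId"

  val raw: String = "Hello"
  val tagged: String @@ CustomerId = CustomerId(raw)

  // The tag exists only at compile time: the tagged value is the same object.
  val sameObject: Boolean = tagged eq raw

  val result: String = func(tagged) // Works!
  // func(raw)                      // Does not compile: raw is an untagged String.
}
```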

This seems verbose, though. What was that keyword we used when we needed to type less? Oh yeah! 'type'!

trait CustomerIdTag
type CustomerId = String @@ CustomerIdTag
def CustomerId(a: String): CustomerId = Tag[String, CustomerIdTag](a)

That's three lines to make dealing with strings safer. Haskell does it in one line with newtype, but three isn't that bad. Go also solves this effectively with defined types such as type CustomerId string.
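Applying the three-line pattern to the createPath example from the start of the post gives something like the sketch below. It uses a hand-rolled @@ encoding as an assumed stand-in for Scalaz, and createPath is a hypothetical simplified version of the original:

```scala
object TaggedPaths {
  // Hand-rolled stand-in for scalaz's @@ (illustrative assumption).
  type Tagged[U] = { type Tag = U }
  type @@[T, U] = T with Tagged[U]

  // The three-line pattern, once per kind of string.
  trait DomainTag
  type Domain = String @@ DomainTag
  def Domain(a: String): Domain = a.asInstanceOf[Domain]

  trait FileExtTag
  type FileExt = String @@ FileExtTag
  def FileExt(a: String): FileExt = a.asInstanceOf[FileExt]

  trait CustomerIdTag
  type CustomerId = String @@ CustomerIdTag
  def CustomerId(a: String): CustomerId = a.asInstanceOf[CustomerId]

  // Hypothetical simplified createPath taking tagged strings.
  def createPath(domain: Domain, fileExt: FileExt, customerId: CustomerId): String =
    s"$domain/$customerId.$fileExt"

  val path: String =
    createPath(Domain("example.com"), FileExt("pdf"), CustomerId("c-1234"))

  // Swapping fileExt and customerId no longer compiles:
  // createPath(Domain("example.com"), CustomerId("c-1234"), FileExt("pdf"))
}
```

Because each tag is a distinct type, the argument swap that slipped through before is rejected at compile time, while call sites stay nearly as short.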

Java/C++/C# can't do this without boxing up the string into another object. Let me know if there are any languages that can do something similar to this; I'd love to know.

Why is this better?

Why should we preserve the underlying String type? Because it is very useful.

We can still use it to form paths and URLs, log it, format emails, and so on. Strings are useful, and this approach keeps that.
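A sketch of what preserving the underlying type buys you, again with a hand-rolled encoding in which String @@ T is a subtype of String. That subtyping is an assumption of this sketch: some later Scalaz releases changed their encoding, so a tagged value there may need to be unwrapped (for example with Tag.unwrap) before it is used as a String. logLine is a hypothetical logging helper:

```scala
object StillAString {
  // Hand-rolled encoding (assumption) where the tagged type is a subtype of String.
  type Tagged[U] = { type Tag = U }
  type @@[T, U] = T with Tagged[U]

  trait DomainTag
  type Domain = String @@ DomainTag
  def Domain(a: String): Domain = a.asInstanceOf[Domain]

  val domain: Domain = Domain("example.com")

  // String methods and interpolation work without unwrapping.
  val url: String = s"https://$domain/reports"
  val isCom: Boolean = domain.endsWith(".com")
  val upper: String = domain.toUpperCase

  // A Domain can be passed to any API that expects a plain String.
  def logLine(message: String): String = s"[INFO] $message"
  val logged: String = logLine(domain)
}
```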

Boxing a String up is less useful because you constantly have to unbox it via .get() or similar. And as soon as you unbox it, it loses its type protection.
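For contrast, here is a sketch of the boxing approach using plain case classes; the wrapper names and createPath are illustrative. The boxes keep CustomerId and FileExt apart, but every String operation goes through .value, and once you unwrap, the old swap compiles again:

```scala
object BoxedStrings {
  // One wrapper class per kind of string (illustrative names).
  final case class CustomerId(value: String)
  final case class FileExt(value: String)

  // Hypothetical simplified createPath that still takes plain Strings.
  def createPath(domain: String, fileExt: String, customerId: String): String =
    s"$domain/$customerId.$fileExt"

  def describe(id: CustomerId): String = s"customer ${id.value}"

  val id: CustomerId = CustomerId("c-1234")
  val ext: FileExt = FileExt("pdf")

  // describe(ext) // Does not compile: the boxes keep the types apart.
  val described: String = describe(id)

  // Any String operation needs an explicit unwrap first.
  val idLength: Int = id.value.length

  // Once unwrapped, both are plain Strings again, so swapping them compiles.
  val swapped: String = createPath("example.com", id.value, ext.value)
}
```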

Strings are untyped, so add typing information

Don't use multiple different types of plain strings near each other. It's like passing around Objects.

Instead, seek out your favorite programming language's solution to this problem. Haskell, Scala, and other very strongly typed languages offer solutions. If you need to, box them up in another object.

Sean is the Head of Security at Asana, a work management platform for teams.
