JSON Validation and Type Driven Development

Writing a Parser in TypeScript starting with types first.

In my personal projects I have fallen in love with solving my problems via Type Driven Development.

A language with static types, generics, and first-class functions hits the sweet spot for this kind of development. The only hard requirement is first-class functions, because the approach is an application of lambda calculus principles.

The Problem with any

Typed languages provide safety. If the developer uses an API incorrectly, the computer will yell at them.

type Product = { readonly name: string };

function createProduct(name: string): Product {
  return { name };
}

createProduct(5);

When calling createProduct with name of something other than a string the computer cries out:

Argument of type '5' is not assignable to parameter of type 'string'.

A problem I want to solve in one of my side-projects is JSON safety. Take Product as an example. When serializing it with JSON.stringify and then parsing it with JSON.parse, the type is lost:

type User = { readonly username: string };

function renameUser(name: string, user: User): void {
  // implementation left blank
}

const product = createProduct('some product');
renameUser('some user', product);
renameUser('some user', JSON.parse(JSON.stringify(product)));

The second call to renameUser shows no error. The first call to renameUser shows:

Argument of type 'Product' is not assignable to parameter of type 'User'.
  Property 'username' is missing in type 'Product' but required in type 'User'.

If we write the unit test I’m confident we can prove that product and JSON.parse(JSON.stringify(product)) are deeply equal.

The problem is that JSON.parse() returns any (in TypeScript and Flow).

A similar problem exists in all of the languages I have come across:

  • Java org.json.JSONObject and org.json.JSONArray
  • Swift/Objective-C have JSONSerialization/NSJSONSerialization
  • PHP’s json_decode

Going from binary data to native object is inherently unsafe. When the JSON data comes in from an external system – like a REST API – the risk is real.

A Band-Aid

In a language like TypeScript or Flow the straightforward way to safely deal with JSON values is through type refinement.

This results in an increasing number of type guards as different members within the any type are accessed. Assuming your chosen REST API layer does JSON marshaling for you:

const result = await api.get('http://example.dev/api/people');
if (result && result.people && Array.isArray(result.people)) {
  result.people.map(person => {
    // more runtime type refining 💩
  });
}

If both client and server are under your control, or you feel confident enough in the REST API maintainers, you might feel brazen enough to force the situation:

type PeopleResponse = { people: Array<Person> };

const result: PeopleResponse = await api.get('http://example.dev/api/people');
// go along your merry way until your runtime errors start popping up

This is madness. It assumes type safety when there isn’t any. Unfortunately, this is what I see most often in projects at work.

The prospect of writing lines and lines of type refinements for every possible JSON structure for every API response is a lot of work. In my “toy” project I already have 21 different REST API calls with varying shapes of responses and that’s only going to grow.

Can I write a JSON validation layer that’s as declarative as creating custom TypeScript types?

Let’s give it a shot.

Defining our Validation Types

Time to start practicing Type Driven Development.

What is Type Driven Development? Start with types, then write implementations to satisfy the type checker. It’s like Test Driven Development, but you don’t even have to write the tests.

Our current problem is pretty clear. We need a way to write functions that validate some JSON any type. That means we need a function that accepts a single any type as its input.

But which type does it return? That should be up to the implementation of the validation, and at this point, that implementation doesn’t exist. So we’ll use a generic type to stand in its place:

type Validator<T> = (value: any) => T;

This states that a Validator<T> is a Function that accepts a single any and returns a T.

This makes sense for success cases, but what about failure cases? What happens when validation fails?

At this point there are two options to deal with failure:

  • throw an Error
  • return a Union type to indicate success or failure modes.

Common usage of a Validator<T> expects failure. Using throw might feel simpler at the implementation level, but it forces the user of the Validator<T> to take on that complexity. TypeScript’s (or Flow’s) Union types allow for safe handling of success/failure modes.

Here’s what a Union type API looks like:

type Success<T> = {
  readonly type: 'success';
  readonly value: T;
};

type Failure = {
  readonly type: 'failure';
  readonly value: any;
  readonly reason: string;
};

type Result<T> = Success<T> | Failure;

type Validator<T> = (value: any) => Result<T>;

This looks like the complete set of types for a “validation” API. A function that accepts any thing and returns Success<T> or Failure. The Success<T> boxes the typed value with the refined type. The Failure contains the original value and the reason that validation failed.
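To make the trade-off concrete, here is a minimal sketch of how a caller might consume a Result<T>. The summarize helper and the two hand-built results are assumptions made for illustration; they are not part of the library. The point is only that checking the type field narrows the union, so the compiler will not let the caller reach for value as a string until the success case has been handled.

```typescript
type Success<T> = { readonly type: 'success'; readonly value: T };
type Failure = { readonly type: 'failure'; readonly value: any; readonly reason: string };
type Result<T> = Success<T> | Failure;

// Hypothetical consumer: it must inspect `type` before touching `value`.
function summarize(result: Result<string>): string {
  if (result.type === 'success') {
    // Narrowed to Success<string>, so string methods are available.
    return 'got ' + result.value.toUpperCase();
  }
  // Narrowed to Failure, so `reason` is available.
  return 'rejected: ' + result.reason;
}

// Hand-built results standing in for what a validator would return.
const ok: Result<string> = { type: 'success', value: 'yes' };
const bad: Result<string> = { type: 'failure', value: 5, reason: 'typeof value is number' };

const okSummary = summarize(ok);
const badSummary = summarize(bad);
```

With a throwing API the same caller would need a try/catch and would get no help from the type checker in remembering it.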

Let’s write our first validator:

const isString: Validator<string> = (value) => {
  if (typeof value === 'string') {
    return { type: 'success', value };
  } else {
    return { type: 'failure', value, reason: 'typeof value is ' + (typeof value) };
  }
};

With tsc and jest we can confirm that both type checking and runtime behavior match our expectations:

describe('isString', () => {
  it('succeeds', () => {
    const validator: Validator<string> = isString;
    const value: Result<string> = validator('yes');
    expect(value).toEqual(success('yes'));
  });
});

The remaining non-container types (everything except Array and Object) are equally trivial. To make things a little more convenient, we can write factories for Success<T> and Failure:

function success<T>(value: T): Success<T> {
  return {
    type: 'success',
    value,
  };
}

// The explicit return type keeps 'failure' from widening to string.
function failure(value: any, reason: string): Failure {
  return {
    type: 'failure',
    value,
    reason,
  };
}

Now isString, isNumber, isNull, isUndefined, isObject, isArray and isBoolean can all follow this pattern:

const isNull: Validator<null> = value =>
  value === null
    ? success(null)
    : failure(value, 'typeof value is ' + (typeof value));

With each basic case we can write the corresponding set of tests to confirm the runtime characteristics and the static type checker’s ability to infer types.
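As an illustration of the pattern, here is a sketch of two more base validators. The post names isNumber and isBoolean but does not show them, so the bodies below are an assumption modeled on isString and isNull; the types and factories are repeated so the block stands alone.

```typescript
type Success<T> = { readonly type: 'success'; readonly value: T };
type Failure = { readonly type: 'failure'; readonly value: any; readonly reason: string };
type Result<T> = Success<T> | Failure;
type Validator<T> = (value: any) => Result<T>;

function success<T>(value: T): Success<T> {
  return { type: 'success', value };
}

function failure(value: any, reason: string): Failure {
  return { type: 'failure', value, reason };
}

// Assumed implementation: succeeds only for values whose typeof is 'number'.
const isNumber: Validator<number> = value =>
  typeof value === 'number'
    ? success(value)
    : failure(value, 'typeof value is ' + (typeof value));

// Assumed implementation: succeeds only for true and false.
const isBoolean: Validator<boolean> = value =>
  typeof value === 'boolean'
    ? success(value)
    : failure(value, 'typeof value is ' + (typeof value));

const numberOk = isNumber(1);
const numberBad = isNumber('1');
const booleanOk = isBoolean(false);
const booleanBad = isBoolean(null);
```

Note that typeof null is 'object', so the failure reason for a null input reads "typeof value is object".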

But JSON is more complex than these base types, and our TypeScript types are even more complicated, with nullables and unions.

We need to be able to combine these base cases into something that can address our real world needs.

Combining Simple Types to Make Complicated Ones

Optional types in TypeScript and Flow are a Union type of null or some type T.

type Optional<T> = null | T;

If we wanted to validate an optional type our validator’s type would be Validator<null|T>.

An optional string validator would have the type Validator<null|string>. We have a Validator<string> already, so perhaps we can utilize that.

const isOptionalString: Validator<null|string> = value => {
  if (value === null) {
    return success(null);
  }
  return isString(value);
};

This works fine, but the idea of writing each isOptionalX sounds boring. And TypeScript types can be more complex than null|T. They can be string | number or any other set of unions.

Since we’re playing at leveraging lambda calculus concepts, we can lift ourselves out of the minutiae of Validator<T> implementations and start working with validators themselves.

Given two different validators Validator<A> and Validator<B>, can we use what we know about validators to create a Validator<A|B>?

Using Type Driven Development, let’s stub out the function signature:

function oneOf<A, B>(a: Validator<A>, b: Validator<B>): Validator<A|B> {
}

At this point tsc is upset:

A function whose declared type is neither 'void' nor 'any' must return a value.

What should we return? A Validator<A|B> is like any other validator in that it accepts a single any argument. In Type Driven Development style, let’s return a function since that’s what it wants:

function oneOf<A, B>(a: Validator<A>, b: Validator<B>): Validator<A|B> {
  return value => {
  };
}

Now tsc says:

Type '(value: any) => void' is not assignable to type 'Validator<A | B>'.
  Type 'void' is not assignable to type 'Result<A | B>'.

Our function isn’t correct yet. It has no return value (void) but a Validator<A | B> needs to return a Result<A | B>.

We now have all of the inputs we need to do that within the scope of this function. All we need to do is use them:

function oneOf<A, B>(a: Validator<A>, b: Validator<B>): Validator<A|B> {
  return value => {
    return a(value);
  };
}

Now tsc is happy, but does it have the runtime characteristics we want?

describe('oneOf', () => {
  it('succeeds', () => {
    const validator = oneOf(isString, isNumber);
    expect(validator('a')).toEqual(success('a'));
    expect(validator(1)).toEqual(success(1));
  });
});

What does jest think:

    expect(received).toEqual(expected) // deep equality

    - Expected  - 1
    + Received  + 2

      Object {
    -   "type": "success",
    +   "reason": "typeof value is number",
    +   "type": "failure",
        "value": 1,
      }

It failed with the number value, as it should have, because we didn’t use both validators.

function oneOf<A, B>(a: Validator<A>, b: Validator<B>): Validator<A|B> {
  return value => {
    const result_a = a(value);
    if (result_a.type === 'success') {
      return result_a;
    }
    return b(value);
  };
}

If Validator<A> succeeds, we return a Success<A>. Otherwise return the result of Validator<B> which is Success<B> | Failure.

We’ve written a function that accepts two validators, a Validator<A> and a Validator<B>, and returns a new Validator<A|B> by combining them. We wrote a combinator.

I have so far failed to create a variadic version of oneOf that can take “n” Validator<T>s and infer the union Validator<T1|T2|Tn> type. This means we need to use multiple calls to oneOf to build up inferred union types:

const validator: Validator<null|string|number> = oneOf(
  isNull,
  oneOf(isNumber, isString)
);
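For what it’s worth, newer TypeScript releases can infer a tuple type from rest parameters, which suggests one possible variadic form. The sketch below is an assumption about how it could be written, not part of the library described here: anyOf and Validated are hypothetical names, and the union is recovered with a conditional type that uses infer. It tries each validator in order and, if none succeeds, returns the last Failure.

```typescript
type Success<T> = { readonly type: 'success'; readonly value: T };
type Failure = { readonly type: 'failure'; readonly value: any; readonly reason: string };
type Result<T> = Success<T> | Failure;
type Validator<T> = (value: any) => Result<T>;

function success<T>(value: T): Success<T> {
  return { type: 'success', value };
}

function failure(value: any, reason: string): Failure {
  return { type: 'failure', value, reason };
}

const isNull: Validator<null> = value =>
  value === null ? success(null) : failure(value, 'typeof value is ' + (typeof value));
const isNumber: Validator<number> = value =>
  typeof value === 'number' ? success(value) : failure(value, 'typeof value is ' + (typeof value));
const isString: Validator<string> = value =>
  typeof value === 'string' ? success(value) : failure(value, 'typeof value is ' + (typeof value));

// Hypothetical helper: extracts T from a Validator<T>.
type Validated<V> = V extends Validator<infer T> ? T : never;

// Hypothetical variadic combinator: V is inferred as a tuple of validators,
// and V[number] is the union of its members.
function anyOf<V extends Validator<any>[]>(...validators: V): Validator<Validated<V[number]>> {
  return value => {
    let last: Failure = failure(value, 'no validators were given');
    for (const validate of validators) {
      const result = validate(value);
      if (result.type === 'success') {
        return result;
      }
      last = result;
    }
    return last;
  };
}

const flatValidator: Validator<null | string | number> = anyOf(isNull, isNumber, isString);
```

Nesting oneOf calls remains the simpler option if the conditional type feels like too much machinery.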

Since nullable types are so common – and because it’s so easy to do given our APIs – we can use oneOf to make a convenient combinator that takes a Validator<T> and turns it into a Validator<null | T>. I’ll name it optional.

Definition:

export const optional = <T>(validator: Validator<T>): Validator<null|T> =>
  oneOf(isNull, validator);

And in use:

import { optional, isNumber } from './validator';

const validate = optional(isNumber);

validate(1); // returns Success<null | number>
validate(null); // returns Success<null | number>
validate('hi'); // returns Failure

Again, we’re using a combinator to build up a complex Validator<T> without actually implementing any new Validator<T>s.

We can do the same thing to build Object and Array validators.

TypeScript’s Mapped types

The ideal API for validating should be as terse and declarative as a custom TypeScript type. Here’s a somewhat complex type:

type Record = {
  readonly name: string;
  readonly owner: {
    readonly id: number;
    readonly name: string;
    readonly role: 'admin' | 'member' | 'visitor';
  };
};

This is my ideal API:

const validateRecord = objectOf({
  name: isString,
  owner: objectOf({
    id: isNumber,
    name: isString,
    role: isValidRole,
  }),
});

The combinator we want to make here is objectOf. It takes a plain object whose keys point to Validator<T> values and returns a Validator<{...}> whose validated type matches the shape of that object.

In TypeScript we can infer this type using Mapped types. One of the examples looks similar to what we want:

Now that you know how to wrap the properties of a type, the next thing you’ll want to do is unwrap them. Fortunately, that’s pretty easy:

type Proxify<T> = {
  [P in keyof T]: Proxy<T[P]>;
};

function unproxify<T>(t: Proxify<T>): T {
  let result = {} as T;
  for (const k in t) {
    result[k] = t[k].get();
  }
  return result;
}

In terms of our domain we want to map the keys K of some generic object T into validators that validate the type at key K in T.

export function objectOf<T extends {}>(
  validators: {[K in keyof T]: Validator<T[K]>}
): Validator<T> {
}

So far what does tsc think:

A function whose declared type is neither 'void' nor 'any' must return a value.

Time to implement the combinator:

  1. Declare an instance of validated T
  2. Iterate through the keys of the mapped validators.
  3. Validate the value at value[key] with its corresponding validators[key].
    1. If Success<T[K]> set validated[key] = result.value
    2. If Failure return the Failure
  4. return success(validated)

export function objectOf<T extends {}>(
  validators: {[K in keyof T]: Validator<T[K]>}
): Validator<T> {
  // Return the validator itself; `value` is the untyped input it receives.
  return value => {
    let result = {} as T;
    for (const key in validators) {
      const validated = validators[key](value ? value[key] : undefined);
      if (validated.type === 'failure') {
        return validated;
      }
      result[key] = validated.value;
    }
    return success(result);
  };
}

Now for a test:

describe('objectOf', () => {
  it('validates', () => {
    const validate = objectOf({
      name: isString,
      child: objectOf({ id: isNumber }),
    });
    const valid = {
      name: 'Valid',
      child: { id: 1 },
    };
    const invalid = {
      name: 'Invalid',
      child: { id: 'not-number' },
    };
    expect(validate(valid)).toEqual(success(valid));
    expect(validate(invalid)).toEqual(failure(invalid, 'typeof value is string'));
  });
});

And both tsc and jest are happy. Not only does it validate as expected, but it also infers the shape of the value:

[Screen capture of Visual Studio Code showing the inferred shape of validate.]

It knows that this particular use of objectOf creates a:

Validator<{name: string, child: {id: number}}>

Which returns a Result<T> type of:

Result<{name: string, child: {id: number}}>

An example in action:

const validate = objectOf({
  id: isNumber,
  name: oneOf(isString, isNull),
  role: oneOf(isNull, objectOf({
    type: isString,
    groupId: isNumber,
  })),
});

let result = validate(JSON.parse('{"name": "sam", "id": 5}'));
if (result.type === 'success') {
  /**
   * result is Success<{
   *   id: number,
   *   name: string | null,
   *   role: null | {type: string, groupId: number }
   * }>
   */
  result.value.name // null | string
  result.value.role // null | {type: string, groupId: number}
} else {
  // Failure
  throw new Error(result.reason);
}

If you already have a type you know you need to validate for, you can use it as the generic argument to objectOf and tsc will enforce that all of the keys are present:

type Record = { id: number, name: string };

const validate = objectOf<Record>({});

The tsc error shows:

Argument of type '{}' is not assignable to parameter of type '{ id: Validator<number>; name: Validator<string>; }'.
  Type '{}' is missing the following properties from type '{ id: Validator<number>; name: Validator<string>; }': id, name

It knows a validator for the Record type needs an id validator and a name validator.

It even knows which type of Validator<T> it needs:

const validate = objectOf<Record>({
  id: isString,
  name: isString,
});

id in Record has a type of number, but isString cannot validate to number:

(property) id: Validator<number>
Type '(value: any) => Result<string>' is not assignable to type 'Validator<number>'.
  Type 'Result<string>' is not assignable to type 'Result<number>'.
    Type 'Readonly<{ type: "success"; value: string; }>' is not assignable to type 'Result<number>'.
      Type 'Readonly<{ type: "success"; value: string; }>' is not assignable to type 'Readonly<{ type: "success"; value: number; }>'.
        Types of property 'value' are incompatible.
          Type 'string' is not assignable to type 'number'

You can see how it worked out that the id validator of isString does not return a Result<T> that is compatible with number which is the type of Record['id'].

One last thing to make use of objectOf a little nicer. When it iterates through the keys of the validators and reaches a Failure type, it returns the Failure as is. This resulted in a somewhat opaque failure reason:

const invalid = {
  name: 'Invalid',
  child: { id: 'not-number' },
};
expect(validate(invalid)).toEqual(failure(invalid, 'typeof value is string'));

Validation failed with the message "typeof value is string" because invalid.child.id was a string, not a number. Since we know which key was being validated when the Failure was returned, we can improve the error message:

function keyedFailure(value: any, key: string | number, failure: Failure): Failure {
  return {
    ...failure,
    value,
    reason: `Failed at '${key}': ${failure.reason}`,
  };
}

Now the failure in objectOf can be passed through keyedFailure before returning:

for (const key in validators) {
  const validated = validators[key](value ? value[key] : undefined);
  if (validated.type === 'failure') {
    return keyedFailure(value, key, validated);
  }
  result[key] = validated.value;
}

The improved error message is now:

"Failed at 'child': Failed at 'id': typeof value is string"

The value at .child.id was a string, and that’s why there’s a failure. Much clearer.
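Putting the pieces together, here is a self-contained sketch of objectOf with keyedFailure wired in. It restates the types, factories, and two base validators so it can run on its own; the isString and isNumber bodies are assumed to follow the same pattern as isNull.

```typescript
type Success<T> = { readonly type: 'success'; readonly value: T };
type Failure = { readonly type: 'failure'; readonly value: any; readonly reason: string };
type Result<T> = Success<T> | Failure;
type Validator<T> = (value: any) => Result<T>;

function success<T>(value: T): Success<T> {
  return { type: 'success', value };
}

function failure(value: any, reason: string): Failure {
  return { type: 'failure', value, reason };
}

const isString: Validator<string> = value =>
  typeof value === 'string' ? success(value) : failure(value, 'typeof value is ' + (typeof value));
const isNumber: Validator<number> = value =>
  typeof value === 'number' ? success(value) : failure(value, 'typeof value is ' + (typeof value));

// Prefixes the reason with the key and reports the enclosing value.
function keyedFailure(value: any, key: string | number, inner: Failure): Failure {
  return { ...inner, value, reason: `Failed at '${key}': ${inner.reason}` };
}

function objectOf<T extends {}>(
  validators: { [K in keyof T]: Validator<T[K]> }
): Validator<T> {
  return value => {
    const result = {} as T;
    for (const key in validators) {
      // A missing or non-object input yields undefined for every key.
      const validated = validators[key](value ? value[key] : undefined);
      if (validated.type === 'failure') {
        return keyedFailure(value, key, validated);
      }
      result[key] = validated.value;
    }
    return success(result);
  };
}

const validateNested = objectOf({
  name: isString,
  child: objectOf({ id: isNumber }),
});

const nestedOk = validateNested({ name: 'Valid', child: { id: 1 } });
const nestedBad = validateNested({ name: 'Invalid', child: { id: 'not-number' } });
```

Because each level wraps the failure it receives, the reason reads from the outermost key inward.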

We’re an arrayOf implementation away from a fully capable JSON validation library. But before we go there, we’re going to detour into more combinators.

Combinators

In lambda calculus a combinator is an abstraction (function) whose identifiers are all bound within that abstraction. In short, no “global” variables.

If we consider the behavior of Validator<T> and how it returns one of two values Success<T> or Failure a natural branching control flow reveals itself.

In our example uses of Validator<T>, the first step after calling the validator is always to refine the result by checking whether result.type is success or failure.

Given how common this pattern is, we can write some combinators to make them slightly easier to work with.

In most uses of Validator<T> we want to do something with the boxed value of the Success<T> case of Result<T>.

This looks like:

const result: Result<Thing> = validate(thing);
if (result.type === 'success') {
  const value: Thing = result.value;
  // do something interesting with value
}

The pattern here is refining to the success case, then using the success value in a new domain. So if the user of validate had a function of type:

(thing: Thing) => OtherThing

It would be nice if they could forego the extra refinement work. We can define that pattern in a combinator.

We want to map the success case into a new domain.

function mapSuccess<A, B>(result: Result<A>, map: (value: A) => B): B|Failure {
  if (result.type === 'success') {
    return map(result.value);
  }
  return result;
}

And in use:

function isAdmin(user: User): boolean {
  // something interesting
  return true;
}

const validate = objectOf<User>({ ... });
const isAdminResult: boolean | Failure = mapSuccess(validate(JSON.parse("{...}")), isAdmin);

And for the sake of completeness, the comparable mapFailure:

function mapFailure<A, B>(result: Result<A>, map: (value: Failure) => B): Success<A>|B {
  if (result.type === 'failure') {
    return map(result);
  }
  return result;
}

Why would you want this? It allows you to write pure functions in your business domain, like isAdmin above, and then combine them with the Validator<T> domain, without using any glue code.
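Here is a runnable sketch of that idea. The role field on User and the two hand-built results are assumptions for illustration; in real use the Result<User> would come from a validator. Notice that isAdmin never mentions Result or Failure.

```typescript
type Success<T> = { readonly type: 'success'; readonly value: T };
type Failure = { readonly type: 'failure'; readonly value: any; readonly reason: string };
type Result<T> = Success<T> | Failure;

function success<T>(value: T): Success<T> {
  return { type: 'success', value };
}

function failure(value: any, reason: string): Failure {
  return { type: 'failure', value, reason };
}

function mapSuccess<A, B>(result: Result<A>, map: (value: A) => B): B | Failure {
  if (result.type === 'success') {
    return map(result.value);
  }
  return result;
}

// Hypothetical shape: the `role` field is an assumption for this example.
type User = { readonly username: string; readonly role: 'admin' | 'member' };

// Pure business-domain function with no knowledge of validation.
function isAdmin(user: User): boolean {
  return user.role === 'admin';
}

const sam: User = { username: 'sam', role: 'admin' };
const validated: Result<User> = success(sam);
const rejected: Result<User> = failure({}, "Failed at 'username': typeof value is undefined");

// Both calls have the type boolean | Failure.
const fromValidated = mapSuccess(validated, isAdmin);
const fromRejected = mapSuccess(rejected, isAdmin);
```

In the failure case isAdmin is never called; the original Failure passes through untouched.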

The fewer lines of code, the fewer variables to type. And we have tsc there to let us know when the function signatures don’t match.

For instance, trying to use a function that takes something other than a User is going to fail type analysis when used with mapSuccess(Result<User>, ...).

The less often you need to cross domains within your APIs, the more decoupled they are.

Validating Array

A Validator<T> returns a Result<T>. What if we wanted to continue validating T and turn it into another type? Let’s consider Array.

The first step in turning an any type into an Array<T> is checking whether it is in fact an Array.

This is similar to our other base validators:

const isArray: Validator<any[]> = value =>
  Array.isArray(value)
    ? success(value)
    : failure(value, 'value is not an array');

The next step is iterating through each member in the Array<any> and validating the member. Since we’re practicing Type Driven Development, we’ll start with the type signature.

function arrayOf<T>(validator: Validator<T>): Validator<Array<T>> {
}

And just like before, tsc isn’t happy:

A function whose declared type is neither 'void' nor 'any' must return a value.

We just defined isArray. It would be neat if we could use it here. Thinking about it, it would be nice to be able to take the success case of isArray and then do more validation to it and return a mapped Result<Array<T>>.

Let’s write one more combinator that maps a Validator<A> into a Validator<B> given a function of (value: A) => Result<B>.

function mapValidator<A, B>(
  validator: Validator<A>,
  map: (value: A) => Result<B>
): Validator<B> {
}

If the Result<A> case is a Failure, it should be returned right away, but if it’s a Success<A> we want to unbox it and give it to (value: A) => Result<B>.

Does that sound familiar? We want to map the success result of Validator<A>. That’s mapSuccess. We can define mapValidator in terms of mapSuccess:

function mapValidator<A, B>(
  validator: Validator<A>,
  map: (value: A) => Result<B>
): Validator<B> {
  return value => mapSuccess(validator(value), map);
}

Using mapValidator allows us to define a validation in terms of another Validator<T>.
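As a small illustration before tackling arrays, here is a sketch that uses mapValidator to refine isString into a validator that also rejects empty strings. The name isNonEmptyString and its failure message are assumptions made for this example; they do not appear elsewhere in the library.

```typescript
type Success<T> = { readonly type: 'success'; readonly value: T };
type Failure = { readonly type: 'failure'; readonly value: any; readonly reason: string };
type Result<T> = Success<T> | Failure;
type Validator<T> = (value: any) => Result<T>;

function success<T>(value: T): Success<T> {
  return { type: 'success', value };
}

function failure(value: any, reason: string): Failure {
  return { type: 'failure', value, reason };
}

function mapSuccess<A, B>(result: Result<A>, map: (value: A) => B): B | Failure {
  if (result.type === 'success') {
    return map(result.value);
  }
  return result;
}

function mapValidator<A, B>(
  validator: Validator<A>,
  map: (value: A) => Result<B>
): Validator<B> {
  return value => mapSuccess(validator(value), map);
}

const isString: Validator<string> = value =>
  typeof value === 'string' ? success(value) : failure(value, 'typeof value is ' + (typeof value));

// Hypothetical refinement: the second step only runs when isString succeeds,
// so `text` is already known to be a string.
const isNonEmptyString: Validator<string> = mapValidator(isString, text =>
  text.length > 0 ? success(text) : failure(text, 'string is empty')
);

const nonEmpty = isNonEmptyString('hi');
const empty = isNonEmptyString('');
const notAString = isNonEmptyString(5);
```

The second function never sees a non-string, which is exactly the property arrayOf needs from isArray.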

So now we can define Validator<Array<T>> in terms of Validator<any[]>:

function arrayOf<T>(validate: Validator<T>): Validator<Array<T>> {
  return mapValidator(isArray, (value) => {
  });
}

At this point tsc can determine that value is type any[]. But to satisfy Validator<Array<T>> we need to validate each member of any[] with Validator<T>.

If any item fails validation, the whole Array fails validation. So not only are we validating each member, but potentially returning a Failure case. We need to reduce any[] to Result<Array<T>>.

We can seed the reduce call with an empty success case:

return mapValidator(isArray, (value) =>
  value.reduce<Result<T[]>>(
    (result, member) => undefined,
    success([])
  )
);

But what to use for our reduce function? We’re declaring to Array.prototype.reduce that the first argument and return value is a Result<T[]>. That means the type of our reduce function needs to be of type:

(result: Result<T[]>, member: any, index: number) => Result<T[]>

If result is ever the Failure case, we don’t want to do anything, we only want to handle the Success<T[]> case. That’s another case for mapSuccess:

(result, member, index) => mapSuccess(
  result,
  (items) => // the next step goes here
)

Now that we are within an iteration of the array, we have enough context to use our Validator<T> on the member. If it’s successful, we want to concat it with the rest of items, if a failure, we’ll just return it (for now).

Another case for mapSuccess:

(result, member, index) => mapSuccess(
  result,
  (items) => mapSuccess(
    validate(member),
    valid => success(items.concat([valid]))
  )
)

And here’s the complete arrayOf:

function arrayOf<T>(validate: Validator<T>): Validator<Array<T>> {
  return mapValidator(isArray, (value) =>
    value.reduce<Result<T[]>>(
      (result, member, index) => mapSuccess(
        result,
        items => mapSuccess(
          validate(member),
          valid => success(items.concat([valid]))
        )
      ),
      success([])
    )
  );
}

In a test:

describe('arrayOf', () => {
  const validate = arrayOf(objectOf({ name: isString }));

  it('succeeds', () => {
    const values = [{name: 'Rumpleteazer'}];
    expect(validate(values)).toEqual(success(values));
  });

  it('fails', () => {
    const values = [{name: 1}];
    expect(validate(values)).toEqual(failure(values, 'Failed at \'name\': typeof value is number'));
  });
});

One last thing before we tie a ribbon on Validator<T>. The Failure case reason says:

"Failed at 'name': typeof value is number"

In the context of .reduce we know which index we are currently on while iterating. So when we validate the member, we can use mapFailure to enhance the Failure case. Here’s the new reducer:

(result, member, index) => mapSuccess(
  result,
  items => mapSuccess(
    mapFailure(
      validate(member),
      failure => keyedFailure(items, index, failure)
    ),
    valid => success(items.concat([valid]))
  )
),

And now the Failure reason is:

"Failed at '0': Failed at 'name': typeof value is number"
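For comparison, the same behavior can be sketched with a plain loop instead of reduce and the map combinators. This is an alternative formulation, not the implementation above: it stops at the first invalid member, and it reports the whole input array as the Failure value, which is a choice made for this sketch.

```typescript
type Success<T> = { readonly type: 'success'; readonly value: T };
type Failure = { readonly type: 'failure'; readonly value: any; readonly reason: string };
type Result<T> = Success<T> | Failure;
type Validator<T> = (value: any) => Result<T>;

function success<T>(value: T): Success<T> {
  return { type: 'success', value };
}

function failure(value: any, reason: string): Failure {
  return { type: 'failure', value, reason };
}

function keyedFailure(value: any, key: string | number, inner: Failure): Failure {
  return { ...inner, value, reason: `Failed at '${key}': ${inner.reason}` };
}

const isString: Validator<string> = value =>
  typeof value === 'string' ? success(value) : failure(value, 'typeof value is ' + (typeof value));

// Loop-based variant, named differently to avoid confusion with arrayOf.
function arrayOfLoop<T>(validate: Validator<T>): Validator<T[]> {
  return value => {
    if (!Array.isArray(value)) {
      return failure(value, 'value is not an array');
    }
    const items: T[] = [];
    for (let index = 0; index < value.length; index++) {
      const result = validate(value[index]);
      if (result.type === 'failure') {
        // Stop at the first invalid member and record its index.
        return keyedFailure(value, index, result);
      }
      items.push(result.value);
    }
    return success(items);
  };
}

const validateStrings = arrayOfLoop(isString);
const allStrings = validateStrings(['a', 'b']);
const mixed = validateStrings(['a', 2]);
const notArray = validateStrings('nope');
```

The reduce version keeps walking the array after a failure (doing nothing for each remaining member), while the loop returns immediately; the resulting Failure reason is the same.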

Wrapping It Up

I have now used this library to create type safety for all of my project’s JSON based REST APIs.

Functions that once used half of their lines for type refinements are now one mapSuccess away from type-safe response values.

Converting my API responses was a matter of mapping my JSON decoders to Validator<T> instances.

Before:

export const v3SubmitOrders = jsonEncodedRequest(
  fw(build.post('/v3/submit_orders')),
  ({options}: SubmitOrders) => ({
    orders: options.orders,
    validate_only: options.validate_only !== false,
  }),
  response.decodeJson
);

After:

export const v3SubmitOrders = jsonEncodedRequest(
  fw(build.post('/v3/submit_orders')),
  ({options}: SubmitOrders) => ({
    orders: options.orders,
    validate_only: options.validate_only !== false,
  }),
  response.mapHandler(response.decodeJson, objectOf({
    status: validateStatus,
    orders: arrayOf(objectOf({
      order_po: isString,
      order_id: isNumber,
      order_confirmation_id: isNumber,
      order_confirmation_datetime: isString,
    })),
    debug: isAnyValue,
    misc: isAnyValue,
  }))
);

One Promise resolver later, and I have type safe JSON responses:

const result = await v3SubmitOrders({orders: [123]}).then(requireValidResponse);
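The post does not show requireValidResponse, so the following is only a guess at what such a resolver could look like: a function that unwraps a Success<T> and throws on a Failure, which turns a failed validation into a rejected promise when passed to .then. The error message format is an assumption.

```typescript
type Success<T> = { readonly type: 'success'; readonly value: T };
type Failure = { readonly type: 'failure'; readonly value: any; readonly reason: string };
type Result<T> = Success<T> | Failure;

function success<T>(value: T): Success<T> {
  return { type: 'success', value };
}

function failure(value: any, reason: string): Failure {
  return { type: 'failure', value, reason };
}

// Hypothetical resolver: returns the boxed value or throws with the reason.
function requireValidResponse<T>(result: Result<T>): T {
  if (result.type === 'failure') {
    throw new Error('Invalid response: ' + result.reason);
  }
  return result.value;
}

// Synchronous demonstration with hand-built results.
const unwrapped: number = requireValidResponse(success(42));

let thrownMessage = '';
try {
  requireValidResponse(failure('oops', 'typeof value is string'));
} catch (error) {
  thrownMessage = error instanceof Error ? error.message : String(error);
}
```

This is the one place where throwing is a reasonable choice: at the edge of the promise chain, a thrown Error becomes a rejection that the caller already has to handle.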

Implementing a Validator<T> not only provides type safety, it also provides better documentation.

Without fail, every time I approach an API using Lambda calculus principles I end with an API that is declarative and easy to combine.