Zod
I am a big fan of TypeScript, but there is one thing I still find lacking compared to a truly strongly typed language: runtime validation of data.
While a static type checker ensures everything fits together at compile time, eventually we have to interact with the scary outside world.
This can include user input, API responses, streamed data, reading data from disk or even loading environment variables.
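As a small taste of where this is heading, here is a minimal sketch (the variable names are hypothetical) of validating environment variables at startup with Zod:

import { z } from "zod";

// Hypothetical example: describe the environment variables the app needs
const envSchema = z.object({
  API_URL: z.string().url(),
  STAGE: z.enum(["dev", "prod"]),
});

// Fails fast at startup instead of surfacing as `undefined` somewhere deep in the code
export const env = envSchema.parse(process.env);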
Lying interfaces
One thing that is more harmful than no interface is a lying interface. This will set you (and your team) up for a world of unnecessary debugging sessions.
Let’s say you have a neat generic function that fetches data from an API:
async function fetchData<T>(url: string): Promise<T> {
  const response = await fetch(url);
  // The cast tells the compiler, not the runtime, what the data looks like
  return (await response.json()) as T;
}
You have properly declared an interface for the data you expect:
type ApiData = {
  foo: string;
  bar: number;
};
And when you call the function you expect to get the data you want:
const response = await fetchData<ApiData>("example.api.org"); // ApiData
At first glance the interface looks fine: the function has explicit input and output types.
But there's a catch, or rather two lies:
- The function can return anything, not just ApiData.
- The function can throw an error, but you don't know that from the interface. You're left in the dark about what could go wrong.
You can easily spot those lies in the source code whenever you see an "as T" (which can be read as "as LIE").
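To see both lies in action, imagine the API actually responds with something that doesn't match ApiData (a hypothetical scenario):

// The compiler is happy, but nothing guarantees this shape at runtime
const data = await fetchData<ApiData>("example.api.org");

// If the API actually returned { foo: "hello" } without `bar`,
// the problem only surfaces here, far away from the real cause
console.log(data.bar.toFixed(2)); // TypeError: Cannot read properties of undefined

// And if the network request fails, fetchData simply throws,
// which the signature never told us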
Removing the lies
We can enhance the interface by adding a runtime validation library like Zod. This requires some extra typing up front, but it pays off in terms of new type-safe superpowers inside your IDE.
import { z, SafeParseReturnType, ZodSchema } from 'zod';

type Response<T extends ZodSchema> =
  | SafeParseReturnType<z.input<T>, z.output<T>>
  | { success: false; error: unknown };

async function fetchData<T extends ZodSchema>(url: string, schema: T): Promise<Response<T>> {
  try {
    const response = await fetch(url);
    return schema.safeParse(await response.json());
  } catch (error) {
    return { success: false, error };
  }
}
Instead of defining a type directly, we define a schema with Zod and infer the type from it.
import { z } from "zod";

const apiData = z.object({
  foo: z.string(),
  bar: z.number(),
});

// Direct replacement for the previous type
type ApiData = z.infer<typeof apiData>;
Now, when we call the function, we're sure that the data we get conforms to the schema. And if not, we get properly typed errors.
import { ZodError } from "zod";

export async function getData() {
  const response = await fetchData("example.api.org", apiData);
  if (!response.success) {
    console.log(response.data); // undefined
    if (response.error instanceof ZodError) {
      // The data we got from the API is not according to the schema
      console.log(response.error); // ZodError
      return null;
    }
    // The API request failed
    console.log(response.error); // unknown
    return null;
  }
  console.log(response.error); // undefined
  return response.data; // ApiData
}
Back to the real world
For my client, Growy, we’re dealing with unstructured data from various sources.
Growy is revolutionizing vertical farming by heavily relying on (hundreds of) IoT devices sending and receiving data to/from the cloud.
Our software is handling data from two primary sources: DynamoDB and MQTT messages.
In our software, we've developed classes that allow for creating data objects with static type safety.
export class GrowthCycle {
  id: string;
  type: string;
  amount: number;

  constructor(id: string, type: string, amount: number) {
    this.id = id;
    this.type = type;
    this.amount = amount;
  }

  static fromDynamoRecord(data: Record<string, string | number | undefined | null>) {
    return new GrowthCycle(
      data.id as string,
      data.type as string,
      data.amount as number
    );
  }

  // ... other methods
}
For DynamoDB queries, we have functions like:
async function getRecord(id: string) {
  const response = await dynamoClient.send(<QUERY>);
  const Item = response.Item; // Record<string, any>
  return GrowthCycle.fromDynamoRecord(Item); // GrowthCycle
}
And for MQTT messages, we've created handlers that parse data as follows:
mqttClient.onMessage(message => {
  const data = JSON.parse(message.payload.toString()); // any
  const example = new GrowthCycle(data.id, data.type, data.amount); // GrowthCycle
  // ... do something with the data
});
But you might have noticed some "as LIE"s in the code.
When parsing a record from DynamoDB, we’re assuming that the data is always there and of the correct type. Similarly, when parsing an MQTT message, we’re assuming that the payload is always a stringified JSON object containing the correct data.
While all interfaces and static types are correct, runtime data could still be (and sometimes was) wrong.
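For example (with a hypothetical record), a DynamoDB item missing a field still happily becomes a GrowthCycle:

// The record is missing `amount`, but the casts in fromDynamoRecord wave it through
const record = { id: "cycle-1", type: "lettuce" };
const cycle = GrowthCycle.fromDynamoRecord(record); // GrowthCycle, but cycle.amount is undefined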
IoT devices slowly being added to the platform
The problem
As we tested our platform with simulated (IoT) devices, everything seemed to work smoothly. But when the simulated devices were being replaced by their actual real-world counterparts, that’s when things got interesting.
When we started to get actual data from the real world, bugs started to surface. The data we were expecting and worked with in our simulated environment was not always the correct data. The interfaces from “the hardware team” and “the software team” were not always in sync.
Some bugs were found immediately, as the services started breaking down when the wrong data was received: TypeError: Cannot read property 'foo' of undefined or NaN values after calculations were not uncommon.
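A hypothetical malformed MQTT payload shows both symptoms:

// A device sends { "identifier": "cycle-1" } instead of the agreed-upon shape
const data = JSON.parse('{"identifier":"cycle-1"}'); // any

const cycle = new GrowthCycle(data.id, data.type, data.amount);
console.log(cycle.amount * 1.5);       // NaN
console.log(cycle.type.toUpperCase()); // TypeError: Cannot read properties of undefined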
But some bugs were more subtle. Wrongly received data was stored in the DynamoDB table and only processed in later stages, making it harder to identify the root cause.
Adding runtime validation
To prevent services from crashing and data from being stored incorrectly, we introduced runtime validation to our data objects.
When I joined the project, which was already quite large, it had adopted a heavy Object-Oriented Programming (OOP) style. The data objects were used extensively throughout the codebase. As a result, we decided not to go all-in on the Zod way.
Instead, we opted for a hybrid approach, adding Zod parsing capabilities to the existing classes. This involved some TypeScript magic, but allowed us to leverage the benefits of runtime validation without disrupting the overall codebase structure.
The following code took some time to make, had its fair share of issues during development, and gave me a few headaches along the way. The Parsable class below serves as the foundation for our runtime validation framework:
import { SafeParseReturnType, ZodSchema, ZodTypeDef } from 'zod'

export interface GeneratedParsable<Output = unknown, TypeDef extends ZodTypeDef = ZodTypeDef, Input = Output> {
  schema: ZodSchema<Output, TypeDef, Input>
  // passing `unknown` instead of `any` will break TS
  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  parse<TFinal extends new (data: any) => InstanceType<TFinal>>(this: TFinal, value: unknown): InstanceType<TFinal>
  // passing `unknown` instead of `any` will break TS
  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  safeParse(data: unknown): SafeParseReturnType<Input, Output>
  new (input: Input): Output
}

export function Parsable<Output = unknown, TypeDef extends ZodTypeDef = ZodTypeDef, Input = Output>(schema: ZodSchema<Output, TypeDef, Input>) {
  const Generated = class {
    static schema = schema

    static parse<T extends typeof Generated>(this: T, data: unknown) {
      return new this(data as Input)
    }

    static safeParse(data: unknown) {
      const result = schema.safeParse(data)
      if (!result.success) {
        console.warn('Failed to parse data', { data, error: result.error })
      }
      return result
    }

    constructor(input: Input) {
      const result = Generated.safeParse(input)
      if (!result.success) {
        throw result.error
      }
      Object.assign(this, result.data)
    }
  }

  return Generated as unknown as GeneratedParsable<Output, TypeDef, Input>
}
Okay, busted. In our code we also use some "as LIE"s, but let's break it down.
data as Input
Our constructor requires an Input type and does not allow the unknown data we pass to the parse method. By doing this, instead of allowing constructor(input: Input | unknown), we get enhanced type hints in the IDE when constructing data objects. More on this later.
as unknown as GeneratedParsable<Output, TypeDef, Input>
A double LIE! To make your IDE happy, we need to return some form of our GeneratedParsable interface. As the shape of the schema can be (almost) anything, TypeScript will infer everything from the generics Output, TypeDef and Input and use them in the GeneratedParsable interface. We first need to tell TypeScript that Generated is unknown before we can tell it that it is actually a GeneratedParsable.
Luckily, all of the above happens behind the scenes, and we can extend the Parsable in our data objects to give them runtime validation capabilities.
import { z } from 'zod'

const schema = z.object({
  id: z.string(),
  type: z.string(),
  amount: z.number()
})

export class GrowthCycle extends Parsable(schema) {
  // ... other methods
}
With this setup, we can now use the parse and safeParse methods to validate any unknown data.
const validInput = { id: '123', type: "foo", amount: 42 }
const invalidInput = { id: '123', type: false, amount: 42 } // type should be a string

GrowthCycle.parse(validInput)   // GrowthCycle
GrowthCycle.parse(invalidInput) // Throws a `ZodError`
// or
GrowthCycle.safeParse(validInput)   // { success: true, data: { ... } }
GrowthCycle.safeParse(invalidInput) // Gracefully returns { success: false, error: ZodError }
On top of this, even new instances of any Parsable class will be validated against the schema and give type hints in the IDE, adding some extra static type-checking magic to the mix.
const invalidInput = { id: '123', type: false, amount: 42 }
const example = new GrowthCycle(invalidInput)
// Argument of type '{ id: string; type: boolean; amount: number; }' is not assignable to parameter of type '{ type: string; id: string; amount: number; }'.
// Types of property 'type' are incompatible.
// Type 'boolean' is not assignable to type 'string'.
From this moment onwards we can be sure the data we are working with is in the expected format, with no more need for "as LIE"s.
When receiving faulty data, we’ll be better equipped to pinpoint and (hopefully) handle it gracefully.
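For instance, the MQTT handler from earlier can now reject bad payloads at the boundary instead of crashing somewhere down the line (a sketch, assuming the same mqttClient as before):

mqttClient.onMessage(message => {
  const payload: unknown = JSON.parse(message.payload.toString());

  // safeParse logs a warning and reports the failure instead of throwing
  const result = GrowthCycle.safeParse(payload);
  if (!result.success) {
    return; // drop the faulty message instead of working with broken data
  }

  // Safe to construct now; the data is known to match the schema
  const cycle = new GrowthCycle(result.data);
  // ... do something with the validated data
});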
Monitoring the faulty data being received
The aftermath
After implementing runtime validation, our services started reporting a lot more faulty data inputs. Initially, the team was not very pleased with the constant need to fix what felt like unnecessary errors: instead of adding new features, the focus was on data integrity for nearly a full month.
However, as time passed, received data became increasingly in line with the expected format. This led to better discussions between teams about data formats and interfaces, as faulty data would no longer be accepted.
As a result, our platform is now more stable, and the reliability of our data has improved significantly.
Room for improvement
While the hybrid approach worked well for this project, it's not perfect. For instance, it does not work with more complex types like unions and intersections.
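For example, a discriminated union (a hypothetical schema, sketched below) has no single object shape to assign onto a class instance, so the Parsable mixin can't model it:

import { z } from 'zod'

// Hypothetical sensor reading that can be one of two shapes
const reading = z.discriminatedUnion('kind', [
  z.object({ kind: z.literal('temperature'), celsius: z.number() }),
  z.object({ kind: z.literal('humidity'), percentage: z.number() }),
])

// A single class instance cannot be "either shape",
// so extending Parsable(reading) neither types nor behaves correctly:
// class Reading extends Parsable(reading) { }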
When starting a new project, I would personally go all-in on the Zod way. This would remove the need for the Parsable class entirely.
import { z } from "zod";

export const GrowthCycle = z.object({
  id: z.string(),
  type: z.string(),
  amount: z.number()
});

export type GrowthCycle = z.infer<typeof GrowthCycle>;
You could then use the GrowthCycle type when only type information is needed.
import type { GrowthCycle } from "./GrowthCycle";
function handler(growthCycle: GrowthCycle) { /* ... */ }
Or pass the GrowthCycle Zod object when data validation is required.
import { GrowthCycle } from "./GrowthCycle";

export class Handler {
  handle(input: unknown) {
    const { success, data, error } = GrowthCycle.safeParse(input);
    // ... handle the result
  }
}