Remote-Url: https://gopiandcode.uk/logs/log-ways-of-sql-in-ocaml.html Retrieved-at: 2023-05-22 07:45:31.980109+00:00 Recall that we came to our macro-based solution because we wanted to do custom compile-time checks on our queries, incurring a large engineering cost to write our own custom validation and type inference logic. In language with a less expressive type system this really would be the only option, but, as it turns out, the features provided by functional programming languages such as OCaml or Haskell make it easy to encode such custom validation logic directly within the programming language itself.The mechanism that we will use to enable this is Generalised Algebraic Datatypes or GADTs. Many guides to GADTs can be found online (see theOCaml manualorReal World OCamlas two popular examples) — a proper introduction to the concept is outside the scope of this post. The rest of this section will continue assuming that the reader has at least a basic understanding of the concept, and illustrate a typed SQL DSL encoding in OCaml.At a high level, GADTs extend the type system to allow constraining the type parameters of a datatype by its constructors:type'a ty= |INTEGER: int ty |REAL: float ty |TEXT: string ty |BOOLEAN: bool ty ...Here, thistytype encodes a number of common SQL types and uses the type parameter to encode their corresponding OCaml representation — for example, integers in SQL, are encoded in this type using the constructorINTEGER. The type of this constructor is explicitly constrained to beint tyreflecting the fact that SQL integers are represented in OCaml byints.For the simple type above, it can be hard to see the benefits of GADTs, which really only kick in when you have more complex data-types. One standard examples of such a GADTs is an encoding a typed expression language:type'a expr= |ADD: int expr * int expr -> int expr |SUB: int expr * int expr -> int expr |AND: bool expr * bool expr -> bool expr |OR: bool expr * bool expr -> bool expr |CONST: 'a * 'a ty -> 'a expr ...Here, we have a type'a exprwhich represents a well typed expression with type'a. The "well-typedness" of our expressions is actually enforced by types of each constructor — for example, an additionADD(x,y)of typeint expr, i.e. returning an integer, is itself declared to only take sub-expressions that are of typeint expr.While working with constructors directly may seem a little heavyweight, we can introduce some simple functions over them to produce a fairly ergonomic interface that looks almost exactly like vanilla OCaml code:letivl=CONST(vl,INTEGER)let(+)xy=ADD(x,y) i 1 + i 10(*: int expr ==> 1 + 10*)Hopefully this should be sufficient to give the intuition for the standard technique of encoding type-safe DSLs using GADTs, and hint at the plan of providing a type-safe API around SQL queries: we define a GADT-based encoding of the expressions and statements of SQL and then provide helper functions to construct terms in this language. By using the type parameters of the GADT to encode the well formedness of our terms, we can then use the host language's type system to directly check our DSL. Furthermore, as our SQL expressions are simply first-class values in OCaml, they can easily be passed around, manipulated and abstracted over to simplify writing queries and handling meta-operations such as migrations.Beyond just expressions, the full embedding has to also provide GADT types to encode table schemas and SQL statements (SELECT, INSERT etc.). We will skip over the encoding of tables as it is fairly mundane, however encoding queries requires some nuance:type(_, _) query= |SELECT: { exprs: 'a expr_list; where: bool expr option; ... } -> ('a, [>`SELECT]) query |DELETE: { ... } -> (unit, [>`DELETE]) query |UPDATE: { ... } -> (unit, [>`UPDATE]) query |INSERT: { ... } -> (unit, [>`INSERT]) queryHere, the trick we use is to include an additional parameter to the query type(_, _) querythat statically encodes additional information about the query: the first parameter of the query represents the return type of the query, and the second parameter is a polymorphic variant which captures which particular type of query is being executed in the statement (either`SELECT,`INSERT,`UPDATEor`DELETE).By then constraining the types of our functions over this second parameter, we can then enforce additional well-formedness constraints on our typed DSL:valwhere: boolExpr.t -> ('c, [<`DELETE|`SELECT|`UPDATE]as'a) t -> ('c, 'a) tvalorder_by: 'bExpr.t -> ('c, [<`SELECT]as'a) t -> ('c, 'a) tHere, a where query is allowed to be applied toSELECT,UPDATEorDELETEoperations, but notINSERTfollowing the standard structure of SQL queries. Similarly, theORDERBYcan only be applied toSELECToperations as expected.Putting this all together, we can use this technique to construct an typed embedded DSL with an idiomatic functional API:Query.selectExpr.[LocalUser.id;LocalUser.username;LocalUser.password;]~from:LocalUser.table|>Query.whereExpr.(likeLocalUser.username~pat:(s pattern)||likeLocalUser.display_name~pat:(s pattern))|>Query.order_byLocalUser.username(*^ type checker verifies well formedness of our syntax*)Not only do the GADT's type parameters allow us to automate the process of encoding and decoding values to and from the database, our additional well-formedness constraints statically enforce that our queries will correspond to valid SQL syntax, using type inference to operate in a way that is transparent to the user. Finally, because our encoding of queries is simply a composition of functions, our interface is no-longer explicitly tied to SQL's syntax as with our macro, and we can introduce small extensions to allow the user to specify any information that can not be automatically inferred (such as the multiplicity of queries).