Remote-Url: https://gopiandcode.uk/logs/log-ways-of-sql-in-ocaml.html
Retrieved-at: 2023-05-22 07:45:31.980109+00:00

Recall that we came to our macro-based solution because we wanted to
do custom compile-time checks on our queries, incurring a large
engineering cost to write our own custom validation and type inference logic.
In language with a less expressive type system this really would be
the only option, but, as it turns out, the features provided by
functional programming languages such as OCaml or Haskell make it easy
to encode such custom validation logic directly within the programming
language itself.The mechanism that we will use to enable this is Generalised Algebraic
Datatypes or GADTs.  Many guides to GADTs can be found online (see theOCaml manualorReal World OCamlas two popular examples) — a proper
introduction to the concept is outside the scope of this post. The
rest of this section will continue assuming that the reader has at
least a basic understanding of the concept, and illustrate a typed SQL
DSL encoding in OCaml.At a high level, GADTs extend the type system to allow constraining
the type parameters of a datatype by its constructors:type'a ty= 
  |INTEGER: int ty
  |REAL: float ty
  |TEXT: string ty
  |BOOLEAN: bool ty
  ...Here, thistytype encodes a number of
common SQL types and uses the type parameter to encode their
corresponding OCaml representation — for example, integers in SQL,
are encoded in this type using the constructorINTEGER.  The type of this constructor is explicitly
constrained to beint tyreflecting the fact
that SQL integers are represented in OCaml byints.For the simple type above, it can be hard to see the benefits of
GADTs, which really only kick in when you have more complex
data-types.  One standard examples of such a GADTs is an encoding a
typed expression language:type'a expr=
  |ADD: int expr * int expr -> int expr
  |SUB: int expr * int expr -> int expr
  |AND: bool expr * bool expr -> bool expr
  |OR: bool expr * bool expr -> bool expr
  |CONST: 'a * 'a ty -> 'a expr
  ...Here, we have a type'a exprwhich represents
a well typed expression with type'a. The
"well-typedness" of our expressions is actually enforced by types of
each constructor — for example, an additionADD(x,y)of typeint expr,
i.e. returning an integer, is itself declared to only take
sub-expressions that are of typeint expr.While working with constructors directly may seem a little
heavyweight, we can introduce some simple functions over them to
produce a fairly ergonomic interface that looks almost exactly like
vanilla OCaml code:letivl=CONST(vl,INTEGER)let(+)xy=ADD(x,y)

i 1 + i 10(*: int expr ==> 1 + 10*)Hopefully this should be sufficient to give the intuition for the
standard technique of encoding type-safe DSLs using GADTs, and hint at
the plan of providing a type-safe API around SQL queries: we define a
GADT-based encoding of the expressions and statements of SQL and then
provide helper functions to construct terms in this language. By using
the type parameters of the GADT to encode the well formedness of our
terms, we can then use the host language's type system to directly
check our DSL. Furthermore, as our SQL expressions are simply
first-class values in OCaml, they can easily be passed around,
manipulated and abstracted over to simplify writing queries and
handling meta-operations such as migrations.Beyond just expressions, the full embedding has to also provide GADT
types to encode table schemas and SQL statements (SELECT, INSERT
etc.). We will skip over the encoding of tables as it is fairly
mundane, however encoding queries requires some nuance:type(_, _) query=
    |SELECT: {
      exprs: 'a expr_list;
      where: bool expr option;
      ...
     } -> ('a, [>`SELECT]) query
   |DELETE: { ... } -> (unit, [>`DELETE]) query
   |UPDATE: { ... } -> (unit, [>`UPDATE]) query
   |INSERT: { ... } -> (unit, [>`INSERT]) queryHere, the trick we use is to include an additional parameter to the
query type(_, _) querythat statically
encodes additional information about the query: the first parameter of
the query represents the return type of the query, and the second
parameter is a polymorphic variant which captures which particular
type of query is being executed in the statement (either`SELECT,`INSERT,`UPDATEor`DELETE).By then constraining the types of our functions over this second
parameter, we can then enforce additional well-formedness constraints
on our typed DSL:valwhere: boolExpr.t -> 
   ('c, [<`DELETE|`SELECT|`UPDATE]as'a) t -> ('c, 'a) tvalorder_by: 'bExpr.t -> 
   ('c, [<`SELECT]as'a) t -> ('c, 'a) tHere, a where query is allowed to be applied toSELECT,UPDATEorDELETEoperations, but notINSERTfollowing the standard structure of SQL queries. Similarly, theORDERBYcan only be applied toSELECToperations as expected.Putting this all together, we can use this technique to construct an
typed embedded DSL with an idiomatic functional API:Query.selectExpr.[LocalUser.id;LocalUser.username;LocalUser.password;]~from:LocalUser.table|>Query.whereExpr.(likeLocalUser.username~pat:(s pattern)||likeLocalUser.display_name~pat:(s pattern))|>Query.order_byLocalUser.username(*^ type checker verifies well formedness of our syntax*)Not only do the GADT's type parameters allow us to automate the
process of encoding and decoding values to and from the database, our
additional well-formedness constraints statically enforce that our
queries will correspond to valid SQL syntax, using type inference to
operate in a way that is transparent to the user. Finally, because our
encoding of queries is simply a composition of functions, our
interface is no-longer explicitly tied to SQL's syntax as with our
macro, and we can introduce small extensions to allow the user to
specify any information that can not be automatically inferred (such
as the multiplicity of queries).