Tuesday 14 August 2012

Opa Language Tutorial: Part 3

Continuing on from part 2...

I said last time that one should program defensively and stay in spec. These are two things I wasn't doing, so let's look at some mechanisms for doing them. We'll concentrate first on our specification for POST:

Verb | /expressions | /expressions/"k"
POST | Add the requested object if the supplied key in the object doesn't exist. Return a 201 success code as well as the key "k", otherwise return a 400 error. | Not allowed, return a 400 error.

Note that we actually have two cases to cater for, one with just the path /expressions and one with /expressions/"k" where "k" is some key. Opa's pattern matching helps greatly here and makes clear the distinction between the two cases which require different kinds of processing. Let's modify our dispatch function start()'s pattern matching:

match (url)
   {
       case {path: [] ... }: hello();
       case {path: ["expressions" ] ... } : expressionsRESTendpoint();
       case {~path ...}: error();
   }

Now we just match a path that contains the single element "expressions" and call the function expressionsRESTendpoint(), this time without any parameters: we capture nothing and ignore everything else. As a test:

ian@U11-VirtualBox:~/opatutorial$ curl -X POST -T "regex1" http://127.0.0.1:8080/expressions
ian@U11-VirtualBox:~/opatutorial$ curl -X POST -T "regex1" http://127.0.0.1:8080/expressions/fred

The first command above matches, and if we examine the output on the terminal from our executable we'll see the successful "record added" output from the debugging statements. The second command, with the longer path, does not match and ends up returning the output of the error() function. Excellent, this is what we want.

Expanding this more, let's write some skeleton code for the case where we do want to catch a key. Modify the start() function's match-case statements to:

match (url) {
       case {path: [] ... }: hello();
       case {path: ["expressions" ] ... } : expressionsRESTendpoint();
       case {path: ["expressions" | [ key | _ ] ] ...} : expressionWithKeyRESTendpoint(key);
       case {~path ...}: error();
   }

and add the skeleton code for our new handler function:

function expressionWithKeyRESTendpoint(key) {
    Debug.warning("expression with key rest endpoint {key}");
    Resource.raw_status({bad_request});
}

It is worth now explaining a little about lists and how functional programming languages present them:
  • [] is an empty list
  • [ 1 ] is a list containing a single element
  • [ 1, 2, 3 ] is a list containing 3 elements
However, internally lists are recursive structures and are usually treated such that we have a head element and a tail (if you've programmed in Lisp, ML, Haskell etc. then this is already familiar): the list [ 1, 2, 3 ] is actually [ 1 | [2 | [3] ]]

Non-empty lists always have a head and a tail. Given the list [1,2,3] the head is "1" and the tail is the list [2,3]. What is the head of the tail of the list [1,2,3]? It is "2", because the tail of [1,2,3] is [2,3] and the head of [2,3] is "2". This is a fantastically powerful way of thinking about, constructing and working with lists. I recommend a good book about functional programming [2] (or even the one I contributed to [1] <- that's a citation, not a list :-) ).
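To make the head and tail idea concrete, here is a small sketch in Python rather than Opa. It models a list as nested (head, tail) pairs with None standing in for the empty list. The names cons, head, tail and to_python_list are made up for this illustration and are not Opa functions.

```python
# Illustrative model of a recursive list: either None (the empty list)
# or a pair (head, tail) where tail is itself such a list.
# All names here are hypothetical helpers, not part of Opa.

def cons(head_value, tail_list):
    """Build a list cell from a head element and a tail list."""
    return (head_value, tail_list)

def head(lst):
    """Return the first element of a non-empty list."""
    if lst is None:
        raise ValueError("the empty list has no head")
    return lst[0]

def tail(lst):
    """Return everything after the first element of a non-empty list."""
    if lst is None:
        raise ValueError("the empty list has no tail")
    return lst[1]

def to_python_list(lst):
    """Flatten the nested pairs into an ordinary Python list for display."""
    out = []
    while lst is not None:
        out.append(lst[0])
        lst = lst[1]
    return out

# [1, 2, 3] written out as nested cells.
numbers = cons(1, cons(2, cons(3, None)))

print(head(numbers))                  # 1
print(to_python_list(tail(numbers)))  # [2, 3]
print(head(tail(numbers)))            # 2
```

The last line is the "head of the tail" question from the paragraph above.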

So what does that pattern we wrote mean?
  • Match against a list that, firstly has the head "expressions" and a tail. Note how this is already different from the earlier case where we just matched if the list had a head "expressions". 
  • The tail of the list must have a head which we bind to the variable "key"...
  • ...and may have a tail, which we ignore with the underscore "_" operator.
This still isn't precisely to specification as we shall see, but for the moment it works quite well and if we test it (test often!) as before, ie:

ian@U11-VirtualBox:~/opatutorial$ curl -X POST -T "regex1" http://127.0.0.1:8080/expressions
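ian@U11-VirtualBox:~/opatutorial$ curl -X POST -T "regex1" http://127.0.0.1:8080/expressions/fred
ian@U11-VirtualBox:~/opatutorial$ curl -X POST -T "regex1" http://127.0.0.1:8080/expressions/fred/bloggs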

and refer to the debug output. The first command inserts a record into our database and the latter two call our new handler function. Note the debug output for these latter two:

[Opa] Server dispatch Decoded URL to /expressions/fred
[Opa] Debug expression with key rest endpoint fred
[Opa] Server dispatch Decoded URL to /expressions/fred/bloggs
[Opa] Debug expression with key rest endpoint fred

The dispatcher decodes the whole URL; the debug statement, however, prints only the second path element and nothing more, exactly as described in our pattern. Returning to why this isn't to spec: we should probably return an error if the path is too long, but we haven't specified what happens in this case. This is defensive programming again, which I'm going to ignore for the moment as the above works just fine. Personally, I'd not deploy this to production (or even beta) until it is fixed.
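The behaviour of the three patterns can be mimicked with a short Python sketch (not Opa). The function names and the returned strings are invented for illustration. The dispatch_strict variant shows one possible reading of "return an error if the path is too long"; the tutorial has not specified that behaviour, so treat it as an assumption.

```python
# Illustrative Python analogue of the Opa dispatch patterns.
# Function names and return values are hypothetical.

def dispatch(path):
    """Mirror the patterns used in start(): [], ["expressions"],
    and ["expressions" | [key | _]]."""
    if path == []:
        return "hello"
    if path == ["expressions"]:
        return "expressions"
    if len(path) >= 2 and path[0] == "expressions":
        key = path[1]  # head of the tail; anything after it is ignored, like "_"
        return "expressions-with-key:" + key
    return "error"

def dispatch_strict(path):
    """Assumed stricter variant: a key is accepted only when nothing follows it."""
    if path == []:
        return "hello"
    if path == ["expressions"]:
        return "expressions"
    if len(path) == 2 and path[0] == "expressions":
        return "expressions-with-key:" + path[1]
    return "error"

print(dispatch(["expressions", "fred"]))                   # expressions-with-key:fred
print(dispatch(["expressions", "fred", "bloggs"]))         # expressions-with-key:fred
print(dispatch_strict(["expressions", "fred", "bloggs"]))  # error
```

The second call reproduces the debug output above, where /expressions/fred/bloggs still reports the key fred.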

So that tidies up the handling of the /expressions cases and Opa's pattern matching handles the URL/URI quite naturally. So onto the next part which is some better error handling.

What we'll do here is add a few functions to report back better error messages using JSON (we made this architectural choice earlier) and look a little at records, strong typing and JSON serialisation at the same time.

The two functions for error reporting are quite simple:

function messageSuccess(m,c){
    Resource.raw_response(
      OpaSerialize.serialize({ success:m }),"application/json", c )
}

function messageError(m,c){
    Resource.raw_response(
      OpaSerialize.serialize({ error:m }),"application/json", c )
}


Both functions take a string and an HTTP response code as parameters. Opa infers the types of these from how they are used, in this case in the function Resource.raw_response. Working backwards, the last parameter "c" is the HTTP response code which, despite the naming of the functions, can be any valid HTTP response code. We could add some code to check that each function is used in a way consistent with the natural language meaning of "error" or "success", but that's probably overkill (at least here anyway). The second parameter is a string which describes the MIME type of the response. This could be anything, but being well behaved we'll write "application/json".

The first parameter is interesting in that we require a string for the body of the response. We write:

OpaSerialize.serialize({ error:m })

which firstly generates an Opa record with a single field "error" whose value is whatever was placed in the parameter "m". Actually the type of m could be anything that is valid as the type of a value of a record, with a constraint as we shall see.
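As a rough analogue in Python (not Opa), the same idea is to wrap the message in a one-field record and serialise it to a JSON string. The function names below and the returned (code, MIME type, body) tuple are assumptions made for this sketch. Note one difference: Opa checks at compile time what can be serialised, whereas Python's json module only complains at run time.

```python
import json

# Hypothetical Python counterparts of messageSuccess and messageError.
# Each returns (status code, MIME type, body) instead of an Opa resource.

def message_success(m, code):
    body = json.dumps({"success": m}, separators=(",", ":"))
    return (code, "application/json", body)

def message_error(m, code):
    body = json.dumps({"error": m}, separators=(",", ":"))
    return (code, "application/json", body)

code, mimetype, body = message_success("abc", 201)
print(body)  # {"success":"abc"}

# The message does not have to be a string, only something serialisable.
print(message_error(["field a", "field b"], 400)[2])  # {"error":["field a","field b"]}

# A value that cannot be serialised is rejected (here at run time).
try:
    message_error(object(), 400)
except TypeError:
    print("not serialisable")
```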

To call these new functions we'll update our expressionsPost() function to call these as necessary:

function expressionsPost(){
  match(HttpRequest.get_body()){
  case{some: body}:
    match(Json.deserialize(body)){
       case{some: jsonobject}:
          match(OpaSerialize.Json.unserialize_unsorted(jsonobject)){
             case{some: regexExpression e}:
                /regexDB/expressions[e.exprID] <- e;
                messageSuccess("{e.exprID}",{created});
             default:
                messageError("Missing or malformed fields",{bad_request});
          }
       default:
          messageError("Failed to deserialise the JSON",{bad_request});
    }
  default:
     messageError("Missing body",{bad_request});
  }
}

I'll admit this is a bit of a nightmare to read, but the basic structure is simply:
  1. Is there a body present in the request, if so...
  2. Attempt to deserialise the body into JSON, and if this works:
  3. Attempt to map this into an Opa record of type regexExpression
Aside: There is a much more elegant way of writing this, at least to some. I'll write about that in a later edition.

Aside: I corrected the above code slightly...silly error on my part which only showed itself at runtime in some obscure situations...that's what exhaustive testing is for.
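For readers who find the nesting hard to follow, the same three checks can be written as a flat sequence of early returns. This is a Python sketch, not Opa, and not necessarily the more elegant version promised above. It assumes the only required field is exprID (the one field this part of the tutorial uses) and, like the Opa code, it overwrites an existing entry rather than rejecting a duplicate key.

```python
import json

def expressions_post(body, database):
    """Hypothetical flat version of the three checks in expressionsPost().
    Returns (HTTP status code, record to serialise as the response)."""
    # 1. Is there a body present in the request?
    if not body:
        return 400, {"error": "Missing body"}
    # 2. Does the body deserialise into JSON?
    try:
        obj = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return 400, {"error": "Failed to deserialise the JSON"}
    # 3. Does the JSON map onto the expected record? (assumed: exprID is a string)
    if not isinstance(obj, dict) or not isinstance(obj.get("exprID"), str):
        return 400, {"error": "Missing or malformed fields"}
    database[obj["exprID"]] = obj
    return 201, {"success": obj["exprID"]}

db = {}
print(expressions_post(None, db))                 # (400, {'error': 'Missing body'})
print(expressions_post('{"exprID": "abc"}', db))  # (201, {'success': 'abc'})
```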

This record is passed into the function OpaSerialize.serialize, which takes a structure such as a populated record and serialises it as JSON. If we test the code now we see a response (certain fields removed for readability):

$ curl -i -X POST -T "jsonfiles/regex1" --noproxy 127.0.0.1 http://127.0.0.1:8080/expressions
HTTP/1.1 100 Continue

HTTP/1.1 201 Created
Content-Type: application/json; charset=utf-8

{"success":"abc"}


Note the perfectly formed JSON body, mimetype and response code.

Finally we'll write the code to process GET requests. We modify the code in expressionsRESTendpoint() as follows (and also call our new standardised error handling functions):

function expressionsRESTendpoint(){
   match(HttpRequest.get_method())   {
      case{some: method}:
         match(method)         {
             case{get}:
                expressionsGet();
             case{post}:
                expressionsPost();
             case{put}:
                messageError("PUT method not allowed without a key",{method_not_allowed});
             case{delete}:
                messageError("DELETE method not allowed without a key",{method_not_allowed});
             default:
                messageError("Given REST Method not allowed with expressions",{method_not_allowed});
         }
      default:
          Resource.raw_status({bad_request});
   }
}


The first thing our new expressionsGet() function has to do is query the database for its entries and then return these as a list inside a JSON object. We designed the database such that its keys are strings and, if we recall how we entered records into the database, the exprID field was also used as the key. So we need to return a list of exprID fields from the database as a JSON object:

function expressionsGet() {
   collection = List.map(
                     function(i) { i.exprID },
                     StringMap.To.val_list(/regexDB/expressions)
                     );
   Resource.raw_response(
      OpaSerialize.serialize({expressions:collection}),
      "application/json",
     {success}
     )
}


/regexDB/expressions returns the whole database (there are optimisations for this kind of operation...you don't want to return multi gigabytes of data if you can help it) and we use the higher-order map function over the database to extract the exprID field of each record.

To make life simpler we map our hashtable structure to a list of values. The function StringMap.To.val_list performs this for us.

For each entry in that list map applies an anonymous function which takes a parameter "i" of type regexExpression and returns the exprID field. How do we know the typing of this?

We stated earlier that /regexDB/expressions is a hashtable of records of type regexExpression. We extract from this just the values, ignoring the keys, which gives a list of regexExpression records. map then takes each record from this list in turn and applies a function which takes a record of type regexExpression and extracts its exprID field.

The call to Resource.raw_response then returns the record, serialised with OpaSerialize.serialize in the same manner as in the error and success functions described earlier.
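The same shape can be sketched in Python (not Opa): take the values of a dictionary, map a function over them to pull out exprID, and serialise the result. The sample database contents are made up for illustration, and only the exprID field is shown because the other fields of regexExpression are not needed here.

```python
import json

# Hypothetical stand-in for /regexDB/expressions: keys map to records.
database = {
    "abc": {"exprID": "abc"},
    "zyx": {"exprID": "zyx"},
}

def expressions_get(db):
    # db.values() plays the role of StringMap.To.val_list,
    # and map(...) the role of List.map.
    collection = list(map(lambda record: record["exprID"], db.values()))
    return json.dumps({"expressions": collection}, separators=(",", ":"))

print(expressions_get(database))  # {"expressions":["abc","zyx"]}
```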

Aside: There's actually a nice consistency check or invariant there to make sure that all keys actually match the record being addressed by that key. I'll leave it as an exercise to the reader to code such an invariant or check.
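Without spoiling the Opa exercise, the shape of such a check can be shown in Python. This is only a sketch under the assumption that the database behaves like a dictionary from keys to records with an exprID field; the function name is invented.

```python
def mismatched_keys(db):
    """Return the keys whose record does not carry that key as its exprID.
    An empty result means the invariant holds."""
    return [key for key, record in db.items() if record.get("exprID") != key]

good = {"abc": {"exprID": "abc"}, "zyx": {"exprID": "zyx"}}
bad = {"abc": {"exprID": "abc"}, "zyx": {"exprID": "oops"}}

print(mismatched_keys(good))  # []
print(mismatched_keys(bad))   # ['zyx']
```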

We can write this whole function a little more in the functional style as well by removing the local variable collection - actually a good compiler should optimise this out under suitable circumstances:

function expressionsGet()
{
   Resource.raw_response(
      OpaSerialize.serialize({expressions:
                              List.map(
                                function(i) { i.exprID },
                                StringMap.To.val_list(/regexDB/expressions) 
                                      )
                             }),
      "application/json",
     {success}  
   )
}

and if we test this (omitting some details from the response) we get a JSON object with a list of expression identifiers from our database:

$ curl -i -X GET  http://127.0.0.1:8080/expressions
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-
{"expressions":["abc","zyx"]}

So that concludes the first part of the application - we've demonstrated JSON serialisation, the first set of simple POST and GET cases, handling errors and simple working with the database. In the next parts we'll develop the cases where we work with the keys and accessing specific entries in the database.

For now, there is one function we haven't mentioned that makes the above complete, and that's the one which handles the expressions with keys. It simply returns an error for the moment:

function expressionWithKeyRESTendpoint(key) {
    messageError("Not implemented yet",{bad_request});
}


Now while writing this I discovered a bug or two in Opa and also had some ideas about how something should work, or at least how some of this gets used in a production environment. My plea here is that with any project of this nature, and Open Source projects in general, ALWAYS report bugs, and if you have some good ideas then contribute them back. That way we make the community stronger, and the developers of these various open source projects get a better idea of how people are using their products as well as the reassurance that these are actually being used. This in turn leads to better software, which makes us more productive.

See you in part 4...


References

[1] Richard Bosworth (1995) A Practical Course in Functional Programming Using Standard ML. McGraw-Hill
[2] Bruce MacLennan (1990) Functional Programming: Practice and Theory. Addison-Wesley
