CouchDB: Design documents
This article is part of our Academy Course titled CouchDB – Database for the Web.
This is a hands-on course on CouchDB. You will learn how to install and configure CouchDB and how to perform common operations with it. Additionally, you will build an example application from scratch and then finish the course with more advanced topics like scaling, replication and load balancing. Check it out here!
1. Introduction
Design documents are a special type of CouchDB document that contains application code. As it runs inside a database, the application API is highly structured. In this article, we’ll take a look at the function APIs, and talk about how functions in a design document are related within applications.
2. Show
CouchDB is a document database that can also be thought of as a key/value store. It stores JSON documents that are uniquely identified by keys.
CouchDB is built on the web and for the web. Besides the JSON storage structure and its innate ability to scale horizontally, the CouchDB creators have built some pretty awesome features that make it very appealing for a particular type of application. The task is to decide whether the application you’re building is that type of application.
CouchDB exposes a RESTful API, so it is rather easy to use it from any language which supports HTTP. Most popular languages have abstraction libraries on top of that, to abstract away the HTTP layer.
Here is a list of available clients: http://wiki.apache.org/couchdb/Basics.
For our purposes we’re going to use curl, a command line utility which allows us to make HTTP requests. So let’s see how we can easily accomplish this with CouchDB.
Now that we have installed CouchDB and it is successfully running, let’s create a database and insert some sample data.
curl -X PUT http://localhost:5984/sample_db
The line above creates a database called sample_db. If the command is successful, we will see the following output:
{"ok":true}
Now let’s add three documents to this database.
curl -X PUT -d @rec1.json http://localhost:5984/sample_db/record1
curl -X PUT -d @rec2.json http://localhost:5984/sample_db/record2
curl -X PUT -d @rec3.json http://localhost:5984/sample_db/record3
Again, each command should yield a JSON response with “ok” set to true if the addition succeeded. Here is what one would expect from the first command:
{"ok":true,"id":"record1","rev":"1-7c15e9df17499c994439b5e3ab1951d2"}
Again, ok is set to true, making this a success response. The id field is set to the name of the record we created. Names are set through the URL, as documents are just resources in the world of REST. The rev field displays the revision of this document. CouchDB’s concurrency model is based on MVCC, so it versions documents as it updates them, and each document modification gets its own unique revision id.
Below are the JSON files:
rec1.json
{
  "name": "John Doe",
  "date": "2001-01-03T15:14:00-06:00",
  "children": [
    {"name": "Brian Doe", "age": 8, "gender": "Male"},
    {"name": "Katie Doe", "age": 15, "gender": "Female"}
  ]
}
rec2.json
{
  "name": "Ilya Sterin",
  "date": "2001-01-03T15:14:00-06:00",
  "children": [
    {"name": "Elijah Sterin", "age": 10, "gender": "Male"}
  ]
}
rec3.json
{
  "name": "Emily Smith",
  "date": "2001-01-03T15:14:00-06:00",
  "children": [
    {"name": "Mason Smith", "age": 3, "gender": "Male"},
    {"name": "Donald Smith", "age": 2, "gender": "Male"}
  ]
}
CouchDB supports views. They are used to query and report on the data stored in the database. Views can be permanent, meaning they are stored in CouchDB as named queries and are accessed through their name. Views can also be temporary, meaning they are executed and discarded.
CouchDB computes and stores view indexes, so view operations are very efficient and can span across remote nodes. Views are written as map/reduce operations, so they lend themselves well to distribution.
3. Shows & lists
There are two really cool features that allow for more effective data filtering and transformation: shows and lists. Their purpose is to render a JSON document in a different format. A show transforms a single document into another format. A show function is similar to a view function, but it takes two parameters, function(doc, req): doc is the document named in the request URL, and req is an abstraction over the CouchDB request object.
Here is a simple show function:
function(doc, req) {
  var person = <person />;
  person.@name = doc.name;
  person.@joined = doc.date;
  person.children = <children />;
  if (doc.children) {
    for each (var chldInst in doc.children) {
      var child = <child />;
      child.text()[0] = chldInst.name;
      child.@age = chldInst.age;
      child.@gender = chldInst.gender;
      person.children.appendChild(child);
    }
  }
  return {
    'body': person.toXMLString(),
    'headers': { 'Content-Type': 'application/xml' }
  };
}
This show function takes a particular JSON record and turns it into XML. Creating a show is pretty simple: you just encapsulate the function above into a design document and create the record through PUT.
Here is the design document for the show above, xml_show.json:
{ "shows": { "toxml": "Here you inline the show function above. Make sure all double quotes are escaped..." } }
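Escaping every double quote by hand is error-prone. As a minimal sketch, assuming Node.js is available on the workstation, the function can be written as ordinary JavaScript and turned into an escaped string by JSON.stringify. The show function here is a trivial placeholder rather than the XML show above, and the names toxml and designDoc are only illustrative.

```javascript
// Sketch: build the body of xml_show.json without escaping quotes by hand.
// The show function is a placeholder, not the XML show from the article.
var toxml = function (doc, req) {
  return {
    body: '<person name="' + doc.name + '"/>',
    headers: { "Content-Type": "application/xml" }
  };
};

// JSON has no function type, so the function is stored as its source text.
var designDoc = {
  shows: {
    toxml: toxml.toString()
  }
};

// JSON.stringify escapes every double quote inside the function source.
var json = JSON.stringify(designDoc, null, 2);
console.log(json);
```

Redirecting the printed output into xml_show.json gives a file that can be sent with the same PUT request.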
Once we have the design document, we can create it:
curl -X PUT -H "Content-Type: application/json" -d '@xml_show.json' http://localhost:5984/sample_db/_design/shows
Note: In .../_design/shows, shows is just the name of the design document; we can call it whatever we like.
Now let’s invoke the show as follows:
curl -X GET http://localhost:5984/sample_db/_design/shows/_show/toxml/record1
Here is the output:
<person name="John Doe" joined="2001-01-03T15:14:00-06:00">
  <children>
    <child age="8" gender="Male">Brian Doe</child>
    <child age="15" gender="Female">Katie Doe</child>
  </children>
</person>
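The show function above relies on E4X (the inline XML literals and the for each loop), a non-standard extension that many JavaScript engines do not support. The following is a rough sketch of the same transformation using plain string building. The helper name escapeXml is hypothetical, and the final call is only a local check with the contents of rec2.json; inside CouchDB the doc and req arguments are supplied by the server.

```javascript
// Escape the characters that are special in XML text and attribute values.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Same (doc, req) signature as the show function above.
function toxml(doc, req) {
  var xml = '<person name="' + escapeXml(doc.name) +
            '" joined="' + escapeXml(doc.date) + '">';
  xml += "<children>";
  var children = doc.children || [];
  for (var i = 0; i < children.length; i++) {
    var c = children[i];
    xml += '<child age="' + escapeXml(c.age) +
           '" gender="' + escapeXml(c.gender) + '">' +
           escapeXml(c.name) + "</child>";
  }
  xml += "</children></person>";
  return { body: xml, headers: { "Content-Type": "application/xml" } };
}

// Local check with the contents of rec2.json.
var result = toxml({
  name: "Ilya Sterin",
  date: "2001-01-03T15:14:00-06:00",
  children: [{ name: "Elijah Sterin", age: 10, gender: "Male" }]
}, {});
console.log(result.body);
```

When a function like this is inlined into a design document, the helper would need to be nested inside the show function, since the design document stores a single function expression.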
So, how would I transform a record collection or view results into a different format? Well, this is where lists come in. Lists are similar to shows, but they are applied to the results of an already present view. Here is a sample list function.
function(head, req) {
  start({'headers': {'Content-Type': 'application/xml'}});
  var people = <people/>;
  var row;
  while (row = getRow()) {
    var doc = row.value;
    var person = <person />;
    person.@name = doc.name;
    person.@joined = doc.date;
    person.children = <children />;
    if (doc.children) {
      for each (var chldInst in doc.children) {
        var child = <child />;
        child.text()[0] = chldInst.name;
        child.@age = chldInst.age;
        child.@gender = chldInst.gender;
        person.children.appendChild(child);
      }
    }
    people.appendChild(person);
  }
  send(people.toXMLString());
}
Again, we can encapsulate this list function into a design document, along with a simple view function:
xml_list.json
{
  "views": {
    "all": {
      "map": "function(doc) { emit(null, doc); }"
    }
  },
  "lists": {
    "toxml": "Here you inline the list function above. Make sure all double quotes are escaped, as it must be stringified because JSON can't store a function type."
  }
}
Now, we create the design document:
curl -X PUT -H "Content-Type: application/json" -d @xml_list.json http://localhost:5984/sample_db/_design/lists
Once the design document is created, we can request our xml document listing all person records:
curl -X GET http://localhost:5984/sample_db/_design/lists/_list/toxml/all
And the output is:
<people>
  <person name="John Doe" joined="2001-01-03T15:14:00-06:00">
    <children>
      <child age="8" gender="Male">Brian Doe</child>
      <child age="15" gender="Female">Katie Doe</child>
    </children>
  </person>
  <person name="Ilya Sterin" joined="2001-01-03T15:14:00-06:00">
    <children>
      <child age="10" gender="Male">Elijah Sterin</child>
    </children>
  </person>
  <person name="Emily Smith" joined="2001-01-03T15:14:00-06:00">
    <children>
      <child age="3" gender="Male">Mason Smith</child>
      <child age="2" gender="Male">Donald Smith</child>
    </children>
  </person>
</people>
With this example we can see how shows and lists are really useful and provide a convenient way to transform views into different formats.
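A list function is not limited to XML. The sketch below renders the same kind of view rows as CSV and sends the output one chunk per row. So that it can run outside CouchDB, start, getRow and send are replaced by simple stand-ins, and the sample rows are trimmed-down versions of the records above; the name tocsv is hypothetical.

```javascript
// Stand-ins for the functions CouchDB provides to list functions.
// They exist only so this sketch can run outside CouchDB.
var rows = [
  { key: null, value: { name: "John Doe", children: [{ name: "Brian Doe" }, { name: "Katie Doe" }] } },
  { key: null, value: { name: "Emily Smith", children: [{ name: "Mason Smith" }] } }
];
var sentHeaders = null;
var chunks = [];
function start(response) { sentHeaders = response.headers; }
function getRow() { return rows.shift(); }  // falsy once the rows run out
function send(chunk) { chunks.push(chunk); }

// A list function that renders each view row as one CSV line.
function tocsv(head, req) {
  start({ headers: { "Content-Type": "text/csv" } });
  send("name,children\n");
  var row;
  while ((row = getRow())) {
    var doc = row.value;
    var count = doc.children ? doc.children.length : 0;
    send(doc.name + "," + count + "\n");
  }
}

tocsv({}, {});
console.log(chunks.join(""));
```

Sending each row as it is read means the function never has to hold the whole result in memory, which matters when a view returns many rows.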
4. Map Reduce
For experienced relational database programmers, MapReduce can take some time getting used to. Rather than declaring which rows from which tables to include in a result set and depending on the database to determine the most efficient way to run the query, reduce queries are based on simple range requests against the indexes generated by your map functions.
Map functions are called once for each document, with the document as the argument. The function can choose to skip the document altogether or emit one or more view rows as key/value pairs. Map functions may not depend on any information outside of the document. This independence is what allows CouchDB views to be generated incrementally and in parallel.
CouchDB views are stored as rows that are kept sorted by key. This makes retrieving data from a range of keys efficient even when there are thousands or millions of rows. When writing CouchDB map functions, our primary goal is to build an index that stores related data under nearby keys.
Before we can run an example MapReduce view, we’ll need some data to run it on. We will create documents carrying the price of various supermarket items as found at different stores. Let’s create documents for apples, oranges, and bananas. (Allow CouchDB to generate the _id and _rev fields.) Use Futon to create documents that have a final JSON structure that looks like this:
{
  "_id" : "bc2a41170621c326ec68382f846d5764",
  "_rev" : "2612672603",
  "item" : "apple",
  "prices" : {
    "Fresh Mart" : 1.59,
    "Price Max" : 5.99,
    "Apples Express" : 0.79
  }
}
Let’s create the document for oranges (CouchDB assigns each document its own _id and _rev, so the values shown here are only illustrative):
{
  "_id" : "bc2a41170621c326ec68382f846d5764",
  "_rev" : "2612672603",
  "item" : "orange",
  "prices" : {
    "Fresh Mart" : 1.99,
    "Price Max" : 3.19,
    "Citrus Circus" : 1.09
  }
}
And finally, the document for bananas:
{
  "_id" : "bc2a41170621c326ec68382f846d5764",
  "_rev" : "2612672603",
  "item" : "banana",
  "prices" : {
    "Fresh Mart" : 1.99,
    "Price Max" : 0.79,
    "Banana Montana" : 4.22
  }
}
Imagine we’re catering a big luncheon, but the client is very price-sensitive. To find the lowest prices, we’re going to create our first view, which shows each fruit sorted by price.
Edit the map function, on the left, so that it looks like the following:
function(doc) {
  var store, price, value;
  if (doc.item && doc.prices) {
    for (store in doc.prices) {
      price = doc.prices[store];
      value = [doc.item, store];
      emit(price, value);
    }
  }
}
This is a JavaScript function that CouchDB runs for each of our documents as it computes the view. We’ll leave the reduce function blank for the time being.
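To get a feel for what the view will contain, the map step can be imitated outside CouchDB. This is only a rough simulation: emit is replaced by a stand-in that collects rows in an array, and the sort at the end mimics the fact that CouchDB keeps view rows ordered by key. The map function itself is the one shown above, given a name so it can be called.

```javascript
// The three documents from above, without the generated _id and _rev fields.
var docs = [
  { item: "apple",  prices: { "Fresh Mart": 1.59, "Price Max": 5.99, "Apples Express": 0.79 } },
  { item: "orange", prices: { "Fresh Mart": 1.99, "Price Max": 3.19, "Citrus Circus": 1.09 } },
  { item: "banana", prices: { "Fresh Mart": 1.99, "Price Max": 0.79, "Banana Montana": 4.22 } }
];

// Stand-in for CouchDB's emit(): collect each key/value pair as a view row.
var rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }

// The map function from the article, unchanged apart from its name.
function map(doc) {
  var store, price, value;
  if (doc.item && doc.prices) {
    for (store in doc.prices) {
      price = doc.prices[store];
      value = [doc.item, store];
      emit(price, value);
    }
  }
}

docs.forEach(function (doc) { map(doc); });  // called once per document
map({ note: "no prices here" });             // skipped: emits nothing

// Mimic the sorted index: numeric keys are ordered numerically.
rows.sort(function (a, b) { return a.key - b.key; });
rows.forEach(function (r) { console.log(r.key, r.value.join(" @ ")); });
```

Each document contributes three rows, one per store, so the simulated view holds nine rows with the cheapest offers first.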
Click “Run” and we should see result rows with the various items sorted by price. This map function could be even more useful if it grouped the items by type so that all the prices for bananas were next to each other in the result set. CouchDB’s key sorting system allows any valid JSON object as a key. In this case, we’ll emit an array of [item, price] so that CouchDB groups by item type and price.
Let’s modify the view function so that it looks like this:
function(doc) {
  var store, price, key;
  if (doc.item && doc.prices) {
    for (store in doc.prices) {
      price = doc.prices[store];
      key = [doc.item, price];
      emit(key, store);
    }
  }
}
Here, we first check that the document has the fields we want to use. CouchDB recovers gracefully from a few isolated map function failures, but when a map function fails regularly (due to a missing required field or other JavaScript exception), CouchDB shuts off its indexing to prevent any further resource usage. For this reason, it’s important to check for the existence of any fields before you use them.
Once we know we’ve got a document with an item type and some prices, we iterate over the item’s prices and emit key/value pairs. The key is an array of the item and the price, and forms the basis for CouchDB’s sorted index. In this case, the value is the name of the store where the item can be found for the listed price.
View rows are sorted by their keys: in this example, first by item, then by price. This method of complex sorting is at the heart of creating useful indexes with CouchDB.
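The effect of the array key can be imitated in the same way. In this sketch, emit is again a stand-in and compareKeys is a deliberately simplified comparison that is only valid for [string, number] keys made of lowercase ASCII item names; CouchDB's real collation rules cover every JSON type. Because rows for one item end up adjacent and ordered by price, the first row seen for each item is its cheapest offer.

```javascript
var docs = [
  { item: "apple",  prices: { "Fresh Mart": 1.59, "Price Max": 5.99, "Apples Express": 0.79 } },
  { item: "orange", prices: { "Fresh Mart": 1.99, "Price Max": 3.19, "Citrus Circus": 1.09 } },
  { item: "banana", prices: { "Fresh Mart": 1.99, "Price Max": 0.79, "Banana Montana": 4.22 } }
];

// Stand-in for CouchDB's emit().
var rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }

// The revised map function from the article: the key is [item, price].
function map(doc) {
  var store, price, key;
  if (doc.item && doc.prices) {
    for (store in doc.prices) {
      price = doc.prices[store];
      key = [doc.item, price];
      emit(key, store);
    }
  }
}
docs.forEach(function (doc) { map(doc); });

// Simplified key ordering (an assumption of this sketch):
// compare the item name first, then the price.
function compareKeys(a, b) {
  if (a[0] !== b[0]) return a[0] < b[0] ? -1 : 1;
  return a[1] - b[1];
}
rows.sort(function (a, b) { return compareKeys(a.key, b.key); });

// The first row for each item is the cheapest place to buy it.
var cheapest = {};
rows.forEach(function (r) {
  var item = r.key[0];
  if (!(item in cheapest)) cheapest[item] = { price: r.key[1], store: r.value };
});
console.log(cheapest);
```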
5. Validation
CouchDB uses the validate_doc_update function to prevent invalid or unauthorized document updates from proceeding. We use it in the example application to ensure that blog posts can be authored only by logged-in users. CouchDB’s validation functions—like map and reduce functions—can’t have any side effects; they run in isolation of a request. They have the opportunity to block not only end-user document saves, but also replicated documents from other CouchDBs.
5.1 Document Validation Functions
To ensure that users may save only documents that provide the fields our application needs, we can validate their input by adding another member to the _design/ document: the validate_doc_update function. CouchDB sends functions and documents to a JavaScript interpreter. This mechanism is what allows us to write our document validation functions in JavaScript. The validate_doc_update function gets executed for each document you want to create or update. If the validation function raises an exception, the update is denied; when it doesn’t, the update is accepted.
Document validation is optional. If we don’t create a validation function, no checking is done and documents with any content or structure can be written into your CouchDB database. If we have multiple design documents, each with a validate_doc_update function, all of those functions are called upon each incoming write request. Only if all of them pass the validation, does the write succeed. The order of the validation execution is not defined. Each validation function must act on its own.
Validation functions can cancel document updates by throwing errors. To throw an error in such a way that the user will be asked to authenticate, before retrying the request, we will use JavaScript code like the following:
throw({unauthorized : message});
When we are trying to prevent an authorized user from saving invalid data, we use:
throw({forbidden : message});
In the example application, the validation function throws forbidden errors when a post does not contain the necessary fields. In places it uses a validate() helper to clean up the JavaScript. It also uses simple JavaScript conditionals to ensure that the doc._id is set to be the same as doc.slug for the sake of pretty URLs.
If no exceptions are thrown, CouchDB expects the incoming document to be valid and will write it to the database. By using JavaScript to validate JSON documents, we can deal with any structure a document might have. Validation can also be a valuable form of documentation.
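To make the difference between the two error kinds concrete, here is a small sketch. The rule it enforces (every document needs a title) is made up for illustration, and tryUpdate is a hypothetical helper that imitates how the outcome of a write is decided; in CouchDB the server calls the validation function itself. The sketch relies on userCtx.name being null for anonymous requests.

```javascript
// Sketch of a validation function that uses both error kinds.
function validate(newDoc, oldDoc, userCtx) {
  // No user name: the request was not authenticated.
  if (!userCtx.name) {
    throw({ unauthorized: "Please log in before saving." });
  }
  // Authenticated, but the data is not acceptable.
  if (!newDoc.title) {
    throw({ forbidden: "Document must have a title." });
  }
}

// Hypothetical helper: run the validation and describe the outcome.
function tryUpdate(newDoc, userCtx) {
  try {
    validate(newDoc, null, userCtx);
    return "accepted";
  } catch (e) {
    return e.unauthorized ? "unauthorized: " + e.unauthorized
                          : "forbidden: " + e.forbidden;
  }
}

console.log(tryUpdate({ title: "Hi" }, { name: null, roles: [] }));
console.log(tryUpdate({}, { name: "jan", roles: [] }));
console.log(tryUpdate({ title: "Hi" }, { name: "jan", roles: [] }));
```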
5.2 Validation’s Context
Before we delve into the details of validation functions, let’s talk about the context in which they run and the effects they can have.
Validation functions are stored in design documents under the validate_doc_update field. There is only one per design document, but there can be many design documents in a database. In order for a document to be saved, it must pass validations on all design documents in the database (the order in which multiple validations are executed is left undefined).
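The "every design document must agree" rule can be pictured with a short simulation. The two design documents and the canSave helper below are hypothetical, and the functions are kept as real JavaScript functions for readability, whereas a stored design document holds them as strings.

```javascript
// Hypothetical in-memory picture of two design documents in one database.
var designDocs = {
  "_design/types": {
    validate_doc_update: function (newDoc, oldDoc, userCtx) {
      if (!newDoc.type) throw({ forbidden: "Document must have a type" });
    }
  },
  "_design/authors": {
    validate_doc_update: function (newDoc, oldDoc, userCtx) {
      if (newDoc.author !== userCtx.name) throw({ forbidden: "Author must match the user" });
    }
  }
};

// A write succeeds only if no validation function throws.
function canSave(newDoc, oldDoc, userCtx) {
  return Object.keys(designDocs).every(function (id) {
    try {
      designDocs[id].validate_doc_update(newDoc, oldDoc, userCtx);
      return true;
    } catch (e) {
      return false;
    }
  });
}

console.log(canSave({ type: "post", author: "jan" }, null, { name: "jan" }));
console.log(canSave({ type: "post", author: "bob" }, null, { name: "jan" }));
```

The simulation happens to run the functions in a fixed order, but since the real order is undefined, no validation function should rely on another one having run first.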
5.3 Writing One
The function declaration is simple. It takes three arguments: the proposed document update, the current version of the document on disk, and an object corresponding to the user initiating the request.
function(newDoc, oldDoc, userCtx) {}
Above is the simplest possible validation function, which would allow all updates regardless of content or user roles. The converse, which never lets anyone do anything, looks like this:
function(newDoc, oldDoc, userCtx) { throw({forbidden : 'no way'}); }
Note that if we install this function in a database, we won’t be able to perform any other document operations until we remove it from the design document or delete the design document. Admins can create and delete design documents despite the existence of this extreme validation function.
We can see from these examples that the return value of the function is ignored. Validation functions prevent document updates by raising errors. When the validation function passes without raising errors, the update is allowed to proceed.
5.4 Type
The most basic use of validation functions is to ensure that documents are properly formed to fit the application’s expectations. Without validation, we need to check for the existence of all fields on a document that MapReduce or user-interface code needs to function. With validation, we know that any saved documents meet whatever criteria we require.
A common pattern in most languages, frameworks, and databases is using types to distinguish between subsets of our data.
CouchDB itself has no notion of types, but they are a convenient shorthand for use in application code, including MapReduce views, display logic, and user interface code. The convention is to use a field called type to store document types, but many frameworks use other fields, as CouchDB itself doesn’t care which field we use.
Here’s an example validation function that runs only on posts:
function(newDoc, oldDoc, userCtx) {
  if (newDoc.type == "post") {
    // validation logic goes here
  }
}
Since CouchDB stores only one validation function per design document, you’ll end up validating multiple types in one function, so the overall structure becomes something like:
function(newDoc, oldDoc, userCtx) {
  if (newDoc.type == "post") {
    // validation logic for posts
  }
  if (newDoc.type == "comment") {
    // validation logic for comments
  }
  if (newDoc.type == "unicorn") {
    // validation logic for unicorns
  }
}
It bears repeating that type is a completely optional field. We present it here as a helpful technique for managing validations in CouchDB, but there are other ways to write validation functions.
Here’s an example:
function(newDoc, oldDoc, userCtx) {
  if (newDoc.title && newDoc.body) {
    // validate that the document has an author
  }
}
This validation function ignores the type attribute altogether and instead makes the somewhat simpler requirement that any document with both a title and a body must have an author. For some applications, typeless validations are simpler. For others, it can be a pain to keep track of which sets of fields are dependent on one another.
In practice, many applications end up using a mix of typed and untyped validations. For some fields, we don’t care what sort of document we’re validating: if the document has a created_at field, we ensure that the field is a properly formed timestamp. Similarly, when we validate the author of a document, we don’t care what type of document it is; we just ensure that the author matches the user who saved the document.
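A mixed validation function might look roughly like the sketch below. The timestamp pattern is an assumption: it accepts only the ISO 8601 form used by the date fields earlier in this article (for example 2001-01-03T15:14:00-06:00), and an application may well need something more lenient or more strict. The title rule for posts is likewise only illustrative.

```javascript
// Assumed timestamp format: YYYY-MM-DDTHH:MM:SS followed by Z or an offset.
var TIMESTAMP = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(Z|[+-]\d{2}:\d{2})$/;

function validate(newDoc, oldDoc, userCtx) {
  // Untyped: any document that has created_at needs a well-formed timestamp.
  if (newDoc.created_at !== undefined) {
    if (typeof newDoc.created_at !== "string" || !TIMESTAMP.test(newDoc.created_at)) {
      throw({ forbidden: "created_at must be a well-formed timestamp" });
    }
  }
  // Untyped: any document that has an author must be saved by that user.
  if (newDoc.author !== undefined && newDoc.author !== userCtx.name) {
    throw({ forbidden: "author must match the user saving the document" });
  }
  // Typed: only posts need a title.
  if (newDoc.type == "post" && !newDoc.title) {
    throw({ forbidden: "Posts must have a title" });
  }
}
```

The first two checks apply to whatever document carries the field, while the last one is tied to a specific type, which is the mix described above.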
5.5 Required Fields
The most fundamental validation is ensuring that particular fields are available on a document. The proper use of required fields can make writing MapReduce views much simpler, as we don’t have to test for all the properties before using them—we know all documents will be well-formed.
Required fields also make display logic much simpler. If we know for certain that all documents will have a field, we can avoid lengthy conditional statements to render the display differently depending on document structure.
Suppose a design document requires a different set of fields on posts and comments. Here’s a subset of its validation function:
function(newDoc, oldDoc, userCtx) {
  function require(field, message) {
    message = message || "Document must have a " + field;
    if (!newDoc[field]) throw({forbidden : message});
  }
  if (newDoc.type == "post") {
    require("title");
    require("created_at");
    require("body");
    require("author");
  }
  if (newDoc.type == "comment") {
    require("name");
    require("created_at");
    require("comment", "You may not leave an empty comment");
  }
}
This is our first look at actual validation logic. We can see that the actual error throwing code has been wrapped in a helper function. Helpers like the require function just shown go a long way toward making your code clean and readable. The require function is simple. It takes a field name and an optional message, and it ensures that the field is not empty or blank.
Once we’ve declared our helper function, we can simply use it in a type-specific way. Posts require a title, a timestamp, a body, and an author. Comments require a name, a timestamp, and the comment itself. If we wanted to require that every single document contained a created_at field, we could move that declaration outside of any type conditional logic.
5.6 Timestamps
Timestamps are an interesting problem in validation functions. Because validation functions run at replication time as well as during normal client access, we cannot require that timestamps be set close to the server’s system time. Instead, we require two things: that timestamps do not change after they are initially set, and that they are well formed.
First, let’s look at a validation helper that does not allow fields, once set, to be changed on subsequent updates:
function(newDoc, oldDoc, userCtx) {
  function unchanged(field) {
    if (oldDoc && toJSON(oldDoc[field]) != toJSON(newDoc[field]))
      throw({forbidden : "Field can't be changed: " + field});
  }
  unchanged("created_at");
}
The unchanged helper is more complex than the require helper. The first line of the function prevents it from running on initial updates. The unchanged helper doesn’t care at all what goes into a field the first time it is saved. However, if there exists an already-saved version of the document, the unchanged helper requires that fields it is used on are the same between the new and the old version of the document.
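The behavior of the helper can be tried outside CouchDB with a small harness. In this sketch, toJSON is replaced by JSON.stringify as a stand-in for the function CouchDB's runtime provides, and attempt is a hypothetical helper that reports the outcome. A first save is modeled by passing null as oldDoc.

```javascript
// Stand-in for the toJSON function that CouchDB's JavaScript runtime provides.
var toJSON = JSON.stringify;

function validate(newDoc, oldDoc, userCtx) {
  function unchanged(field) {
    if (oldDoc && toJSON(oldDoc[field]) != toJSON(newDoc[field]))
      throw({ forbidden: "Field can't be changed: " + field });
  }
  unchanged("created_at");
}

// Hypothetical helper: run the validation and describe the outcome.
function attempt(newDoc, oldDoc) {
  try {
    validate(newDoc, oldDoc, {});
    return "accepted";
  } catch (e) {
    return e.forbidden;
  }
}

var saved = { created_at: "2001-01-03T15:14:00-06:00", body: "first draft" };

// First save: there is no old document, so any created_at is accepted.
console.log(attempt(saved, null));
// Update that keeps the timestamp and changes another field.
console.log(attempt({ created_at: saved.created_at, body: "edited" }, saved));
// Update that alters the timestamp.
console.log(attempt({ created_at: "2002-01-01T00:00:00Z", body: "edited" }, saved));
```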
JavaScript’s equality test is not well suited to working with deeply nested objects. We use CouchDB’s JavaScript runtime’s built-in toJSON function in our equality test, which is better than testing for raw equality.
js> [] == []
false
JavaScript considers these arrays to be different because it doesn’t look at the contents of the array when making the decision. Since they are distinct objects, JavaScript must consider them not equal. We use the toJSON function to convert objects to a string representation, which makes comparisons more likely to succeed in the case where two objects have the same contents. This is not guaranteed to work for deeply nested objects, as toJSON may serialize objects in an undefined order.
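One possible way around the key-order caveat is to serialize with object keys in sorted order before comparing. The canonical function below is a sketch of that idea, not something CouchDB provides; it assumes an ES5-capable engine and values that contain only JSON data (no functions or undefined inside arrays).

```javascript
// Serialize a JSON value with object keys in sorted order, so that two
// objects with the same contents always produce the same string.
function canonical(value) {
  if (Array.isArray(value)) {
    return "[" + value.map(function (v) { return canonical(v); }).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    var keys = Object.keys(value).sort();
    return "{" + keys.map(function (k) {
      return JSON.stringify(k) + ":" + canonical(value[k]);
    }).join(",") + "}";
  }
  return JSON.stringify(value);
}

var a = { tags: ["x", "y"], meta: { by: "jan", at: 1 } };
var b = { meta: { at: 1, by: "jan" }, tags: ["x", "y"] };

console.log(JSON.stringify(a) === JSON.stringify(b)); // key order differs
console.log(canonical(a) === canonical(b));           // same contents
```

Array order is still significant, which is usually what we want: ["x", "y"] and ["y", "x"] are treated as different values.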