Query Languages for Data

Dated May 29, 2023; last modified on Mon, 29 May 2023

Query Languages for Data

If you have a list of animal species and you want to return only the sharks in the list, a relational algebra expression would be \( \text{sharks} = \sigma_{\text{family = “Sharks”}}(\text{animals}) \). SQL queries follow the structure of relational algebra closely:

SELECT * FROM animals WHERE family = 'Sharks';

… while an imperative query would be:

function getSharks() {
  let sharks = [];
  for (let i = 0; i < animals.length; i++) {
    if (animals[i].family === "Sharks") {
      sharks.push(animals[i]);
    }
  }
  return sharks;
}

To be fair though, JS’s Array.prototype.filter supports more succinct code:

let sharks = animals.filter(animal => animal.family === "Sharks");

An imperative language tells the computer to perform certain operations in a certain order. A declarative language specifies the pattern of the data you want, but not how to achieve that goal.

Ah, so that’s what animals.filter(animal => animal.family === "Sharks") is!

Advantages of declarative query languages:

  • Typically more concise and easier to work with.
  • Hides implementation details, and can therefore freely benefit from improvements in the query optimizer. An imperative query would need to be rewritten to take advantage of newer and better APIs and/or data layout.
  • Lend themselves to parallelization because they do not specify an algorithm.

MapReduce Querying

MapReduce is somewhere in between a declarative query language and a fully imperative query API. Suppose you want to generate a report on how many sharks sighted per month. In PostgreSQL, you’d have:

SELECT date_trunc('month', observation_timestamp) AS observation_month,
       sum(num_animals) AS total_animals
FROM observations
WHERE family = 'Sharks'
GROUP BY observation_month;

… but the same can be expressed with MongoDB’s MapReduce feature as:

db.observations.mapReduce(
  // Called once for every document that matches `query`, with `this` set to the
  // document object.
  function map() {
    let year = this.observationTimestamp.getFullYear();
    let month = this.observationTimestamp.getMonth() + 1;
    emit(`${year}-${month}`, this.numAnimals);
  },
  // The key-value pairs emitted by map are grouped by key, such that `reduce`
  // is called once for every `key`, e.g., `reduce("2023-2", [23, 1, 43])`.
  function reduce(key: string, values: numbers[]) {
    return Array.sum(values);
  },
  {
    // A MongoDB-specific extension to MapReduce that allows declarative filters
    query: { family: "Sharks" },
    // The final output is written to the collection `monthlySharkReport`.
    out: "monthlySharkReport"
  }
);

map and reduce must be pure functions (only depend on input params and have no side effects). As such, the DB can run them anywhere in any order, and rerun them on failure. That said, they are powerful in that they can parse strings, call library functions, perform calculations, etc., and therefore convenient for advanced queries.

Enforcing that map and reduce be pure functions seems hard to do in JS. For example, has a “here be dragons” warning for impure implementations of the filter function.

However, one has to write two coordinated functions, and the imperativeness limits the opportunities for a query optimizer to improve the performance of a query. MongoDB 2.2 thus added a declarative query language called the aggregation pipeline:

db.observations.aggregate([
  { $match: { family: "Sharks" } },
  { $group: {
      _id: {
        year:  { $year:  "$observationTimestamp" },
        month: { $month: "$observationTimestamp" }
      },
      totalAnimals: { $sum: "$numAnimals" }
  } }
]);

MongoDB 5.0 deprecated mapReduce in favor of the aggregation pipeline, citing better performance and usability.

This lends credence to ’s quip that a NoSQL system may find itself accidentally reinventing SQL, albeit in disguise.

References

  1. Designing Data-Intensive Applications: The big ideas behind reliable, scalable, and maintainable systems. Chapter 2: Data Models and Query Languages. Kleppmann, Martin. Mar 16, 2017.
  2. Array.prototype.filter() - JavaScript | MDN. developer.mozilla.org . Accessed May 29, 2023.
  3. Map-Reduce — MongoDB Manual. www.mongodb.com . Accessed May 29, 2023.