
Tom Hollevoet
When designing a Drupal website one often chooses a relational database to store the data of a website. MySQL is a good example of this. But from the moment a website has a lot of data, with a large amount of fields, a relational database is no longer performant enough. In such a situation it is interesting to choose a NoSQL database, such as MongoDB. But what exactly are the big differences between a relational and a NoSQL database? We would like to explain this for you.
When installing a Drupal website, a relational database management system (RDBMS) will be chosen by default. Usually a MySQL database is chosen, but other relational databases available for Drupal are MariaDB, SQLite and PostgreSQL.
The data at one's disposal is stored in different tables. In these different tables the data is then divided over different columns.
Relationships in an RDBMS
In an RDBMS, a table may have a relationship with the second table. For example, a table 'cities' can have a reference to the table 'countries' by means of a foreign key, a key that refers to another table.
When creating fields on a content type, different tables are created in the database. For example, for the content type 'Person', with fields 'field_person_name', 'field_person_email' and 'field_person_country', three tables will be found in the MySQL-database.
In Drupal, however, it is possible to create a relation to another content type, by means of an 'Entity Reference'-field. In the example above, 'Country' is a content type and 'field_person_country' is an entity reference field.
In the backend of Drupal, the data of a certain node will be retrieved by a SELECT query with different JOINs to the different fields and tables. The more JOINs need to be applied to a query, the more time it will take to execute this query. Since a RDBMS is vertically scaled, one will have to increase the resources (CPU/RAM) of the web server to increase the speed and feasibility of these queries.
If a certain node has 500 fields, then 500 JOINs must be executed in the back-end. This can lead to performance problems. The web server must have a larger capacity to execute this query, and the execution time of this query will take more time. In such a situation, a relational database is no longer a good option. A NoSQL database is then a better choice.
With a NoSQL database one no longer uses the typical JOINs that occur with an RDBMS. The data will no longer be stored in different tables, but with the NoSQL database the data will be stored in different structures such as documents and collections.
The big advantage of NoSQL is that it can be scaled horizontally. This means that the load can be distributed across the various resources of the web server.
MongoDB will store the data in the form of a document structure. A document can be compared to one large JSON-file, where the data is stored using field names and values. In this way, a field name can be changed very quickly and, as with an RDBMS, the table structure does not need to be changed. The data is in one document and is no longer divided over different tables, which is the case with an RDBMS.
For Drupal 6, 7 and 8 there is a contrib module that offers the possibility to store Drupal data in MongoDB. The contrib module has several sub-modules to implement different functionalities of Drupal in a MongoDB database.
Because of these different submodules the functionality cache, field_storage, session, watchdog, block and/or queue can be stored in a MongoDB-database, and no longer in a RDBMS like MySQL.
They are one or more of the above functionalities that are stored in the MongoDB database. The other functionalities and (standard) tables of Drupal remain stored in the RDBMS (MySQL). This means that the RDBMS will still be used.
In this article we will take a closer look at the field_storage submodule.
In Drupal 7, the RDBMS (MySQL) contains a 'field_config' table where all field configurations (machine name, field type, cardinality...) are stored. This table also contains a storage_type/storage_module column. By default in a Drupal installation this is set to field_sql_storage, so that the data of these fields is stored in the RDBMS.
As soon as you choose to install the mongo_field_storage contrib submodule, new fields will be created with as storage_type/storage_module value mongodb_field_storage. This setting is installed by default when activating the mongo_field_storage module by equating the Drupal variable field_storage_default with mongodb_field_storage.
In the backend of the mongodb_field_storage module there are several 'hook_field_storage_X' functions that are called up by the fields that have mongodb_field_storage as storage_module/storage_type.
The most important implementations are:
Writing a value to the MongoDB database (mongodb_field_storage_write)
Retrieving a field from the MongoDB database (mongodb_field_storage_load)
Performing an Entity Field Query (mongodb_field_storage_query).
The most important change compared to MySQL is that a node will be stored as one large document and no longer spread over different tables.
The node is still stored with its Node ID and other parameters in the node table in the RDBMS. All data of the fields of the node will be stored in MongoDB. Per node, one document will be created.
The big advantage is that when retrieving a node, only one document has to be retrieved from the MongoDB database. With the standard RDBMS, five hundred different joins would effectively have to be executed for five hundred fields in order to retrieve the data from the different tables.
Both the revisions and the current data (at mongodb_field_storage) are stored in the MongoDB database. The different elements that could be found in the 'field_sql_storage' in the different columns of the table (such as language, delta...) will be stored as objects in the document.
In practice, the MongoDB Field Storage module will be used in a situation with a lot of fields/data.
There are some remarks to use this sub module.
The contrib module Views will create standard queries suitable for relational databases. There are some solutions for this:
Solution 1: Search with facets (Search API/Solr)
MongoDB is chosen for a large number of nodes/fields. In such a situation the Solr technology (Search API) is often chosen. This technology will index the nodes with the chosen fields, in order to search the data faster. Views can make use of the data indexed by Solr.
Solution 2: Not all content types 'convert' to MongoDB
As we told before, a choice can always be made not to 'convert' certain fields/content types to MongoDB, but to still use the default storage_type/storage_module as storage (field_sql_storage).
Solution 3: Using EFQ Views
Contribution module EFQ Views will use EntityFieldQuery as query back-end.
Paragraphs/Field Collections
Since MongoDB is not yet widely used (about five hundred Drupal websites), there are still some bugs in this module.
Calibrate is maintainer of the MongoDB contrib module and we have made sure that MongoDB will work with Paragraphs and Field Collections.
Directly addressing MySQL database (db_query)
There is always a chance that certain (contrib) modules will directly use a MySQL query by means of db_query. A golden tip is to always test all new functionalities when using new contrib modules. This way, any problems can be solved faster.
If you choose to use MongoDB, it is best to install and configure it at the start of your project, so that people can notice the above problems faster.