Drupal 8 as a Static Site: Search Architecture and Drupal Configuration

Submitted by nigel on Thursday 15th November 2018

As we've already discussed in earlier blogs, any successful odyssey towards a static version of the Badzilla blog using the Drupal module Tome will depend upon having a good replacement search facility in place. The built-in Drupal search won't work since there will be no back-end to process the search request. There are many alternative ways of achieving a viable search solution on a static site, and I've listed a few in my introductory page Drupal 8 as a Static Site. Ideally I would like a solution that will work on both my dynamic, 'Drupal' version of the site (i.e. where I create my content) and the Tome generated static version of the site (i.e. the version viewable by the public). 

To my mind, the obvious answer is Elasticsearch combined with a JavaScript client. I would have opted for Google Site Search, but it was closed down in April 2018 and replaced with the Custom Search Engine (CSE). CSE offers similar capabilities to Site Search but carries ads, which rules it out of consideration immediately. 

Elasticsearch Architecture

My proposed architecture is simplicity itself. I have a local sandboxed virtual machine with my development environment installed on it. For reference, as well as blatant self-promotion, I use the BadzillaVM, which is an Ubuntu image with a raft of utilities and tools pre-installed, including Elasticsearch. 

The architectural solution is to create an Elasticsearch index on my sandbox. To that end I'll use the Drupal Elasticsearch Connector and Search API modules. Every time content is added to my local version of Drupal, the index will be updated automatically.

I will then trigger an automated build process to generate the static pages and to dump the Elasticsearch index using elasticdump. This will then be deployed to badzilla.co.uk - the static HTML into the traditional /var/www/html structure on the prod server, and the index imported into the Elasticsearch server I will have on prod. Clearly locking down the Elasticsearch instance on prod will be crucial - and we'll cover that later. 
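As a rough sketch of what that build step will look like (the index name, dump file names and prod host below are illustrative placeholders, not my actual pipeline):

```shell
# Sketch of the build-and-deploy step. Index name, file names and
# the prod host are placeholders.

# 1. Generate the static pages with Tome.
drush tome:static

# 2. Dump the index mapping and the documents with elasticdump.
elasticdump --input=http://localhost:9200/my_search_index \
            --output=my_search_index_mapping.json --type=mapping
elasticdump --input=http://localhost:9200/my_search_index \
            --output=my_search_index_data.json --type=data

# 3. Ship the static HTML and the dumps to prod, then import the
#    dumps into the prod Elasticsearch (run on the prod box):
# rsync -av html/ prod:/var/www/html/
# elasticdump --input=my_search_index_mapping.json \
#             --output=http://localhost:9200/my_search_index --type=mapping
# elasticdump --input=my_search_index_data.json \
#             --output=http://localhost:9200/my_search_index --type=data
```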

Ok - so that's the theory. Let's see how easy (or otherwise) it is in practice. 

Drupal configuration - Add and enable the modules
Let's make a start. Firstly we need to add and enable the two modules we'll be using - the Elasticsearch connector and the Search API.
$ composer require drupal/search_api
<-snipped->
$ composer require drupal/elasticsearch_connector:^6.0-alpha1 
<-snipped->
$ drush en search_api elasticsearch_connector -y
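A quick way to confirm both modules really are enabled before moving on (Drush 9 syntax - the flags may differ on older Drush versions):

```shell
# List enabled modules and pick out the two we just installed.
drush pm:list --type=module --status=enabled | grep -E 'search_api|elasticsearch'
```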
Drupal configuration - Add an Elasticsearch cluster
Cluster

Navigate to admin/config/search/elasticsearch-connector/cluster/add to create a cluster. I have given mine the arbitrary name BadzillaStatic, and note that I have pointed it at my local Elasticsearch instance on my sandbox with a URL of http://localhost:9200.
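Before saving the cluster form, it's worth confirming that Elasticsearch is actually listening on that URL. From the sandbox:

```shell
# A healthy node replies with a JSON blob containing cluster_name
# and version.number; a connection error means the service is down.
curl -s http://localhost:9200
```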

Drupal configuration - Add a Search API server
Search API Server
Notification

Now we have to add a Search API server - this is achieved by navigating to admin/config/search/search-api/add-server. I have added the name DrupalStatic, and I also checked that the cluster and the backend were selected correctly. Once I submitted this form, I got an encouraging notification page suggesting everything went well. 

Drupal configuration - Add index
Index

Navigate to admin/config/search/search-api/add-index to create a search index. Whilst the screenshot will be small, it is quite easy and intuitive to populate. Add an index name - I went for StaticContent. Then I ticked the 'Content' checkbox since that is all I want for my search. Further down I selected my content types. Many in my list are artefacts from my original D6 blog and not required any more. I checked English language, made sure the radio button for the Search API server DrupalStatic was selected, and also made sure the Enabled checkbox was ticked. Finally I checked 'Index items immediately'. This is imperative for me since I don't want to wait for a cron run once I've created or updated content in my sandbox. 

Fields

Next, fields have to be added to the index. This is where we need to be careful. For instance, what I want for the 'authored by' field is not the default value - the uid - but the user name instead. So when picking the fields, it is necessary to expand the entries to ensure no ids are picked instead of the actual values. Don't forget we won't have Views or any other backend preprocessor at our disposal, and therefore we need real values and not primary keys. 

Field Management

Next is the field management screen - this gives a good overview of what you have selected, and provided the fields are fulltext they can be boosted to increase their search precedence. I have done this for title, terms and body text, although I reserve the right to loop back and change these values! Also note that in most instances I have changed the machine names to something more applicable. This will pay dividends later when we are dealing with data fetched by the client in the app's frontend. 

Processors

By clicking on the Processors tab there is the opportunity to fine-tune the index. I won't go into these in any great detail since they are purely personal preference and the supporting text on the Processors page is self-explanatory. I elected for Entity Status, HTML Filter and Transliteration. 

Indexing

Finally we are ready for indexing. This can be achieved through cron, or by using the UI and indexing the lot in one go by setting the parameters at the bottom of the screen. Note that a completed run in the UI doesn't necessarily mean the state of the index is reflected on the server - double check everything has completed by referring to the ringed message in the screenshot above. 
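The Search API module also ships Drush commands, so the same bulk indexing can be kicked off from the command line (the machine name below is my index - substitute your own):

```shell
# Index all tracked items for the given index in one go.
drush search-api:index staticcontent

# Then confirm the tracker shows everything as indexed.
drush search-api:status
```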

Checking the search works
Kibana
If you are using Kibana, or even better Kibana in my BadzillaVM, then you can use this excellent tool to check the search is working. Point a browser at {your_ip}:5601, then navigate to Dev Tools and run the following query:
GET /elasticsearch_index_badzilla_meedjum_staticcontent/_search
{
  "query": {
    "match_all": {}
  },
  "size": 5
}
and as per the screenshot above, you should see the results output. Note that your index name in the query will be different.
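If you'd rather stay on the command line than use Kibana, the same match_all query can be sent with curl. The index name below is the one my site generated; substitute your own.

```shell
# The index name is specific to my site - substitute the one Drupal
# generated for yours.
INDEX=elasticsearch_index_badzilla_meedjum_staticcontent
QUERY='{"query":{"match_all":{}},"size":5}'

# Sanity-check the query body is valid JSON before sending it.
echo "$QUERY" | python3 -m json.tool > /dev/null && echo "query JSON ok"

# Run the search (needs Elasticsearch listening on localhost:9200):
# curl -s -H 'Content-Type: application/json' \
#      "http://localhost:9200/$INDEX/_search" -d "$QUERY"
```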