3 Ways To Use ElasticSearch in a Symfony Project With ApiPlatform

When your SQL Database does not cut it anymore.

Marco Pfeiffer
Better Programming

Photo by Marten Newhall on Unsplash

The situation: you have a working project and need to add full-text search. But most databases (SQL, document) do not offer good fuzzy search, so you need to bring a specialized database (like Elasticsearch) into your project.

But how?

The Architecture choices

There are multiple ways to integrate a search database into your project.

1. The attached elastic (for frequent changes or legacy systems)

Architecture overview (red = critical components)

Here, you build your own API Platform filter that runs the full-text search on Elasticsearch and then turns the matching ids into an "id IN (:results)" condition on your existing SQL query.
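The hand-off between the two stores can be sketched in plain PHP. This is a minimal illustration, not the filter itself: the function names and the sample data are made up for this example, and it assumes integer ids and a decoded search response with the usual hits.hits[]._id layout.

```php
<?php
declare(strict_types=1);

/**
 * Pull the document ids out of a decoded Elasticsearch search response,
 * keeping the relevance order in which Elasticsearch returned them.
 */
function extractIds(array $response): array
{
    return array_map(
        static fn (array $hit): int => (int) $hit['_id'],
        $response['hits']['hits'] ?? []
    );
}

/**
 * "id IN (...)" gives no ordering guarantee, so re-sort the SQL rows
 * to match the Elasticsearch ranking. Unknown ids are moved to the end.
 */
function sortByRelevance(array $rows, array $ids): array
{
    $rank = array_flip($ids);
    usort(
        $rows,
        static fn (array $a, array $b): int =>
            ($rank[$a['id']] ?? PHP_INT_MAX) <=> ($rank[$b['id']] ?? PHP_INT_MAX)
    );

    return $rows;
}

// Hypothetical sample data standing in for a real response and a real SQL result.
$response = ['hits' => ['hits' => [
    ['_id' => '7', '_score' => 3.2],
    ['_id' => '3', '_score' => 1.1],
]]];
$ids = extractIds($response);
$rows = [['id' => 3, 'title' => 'B'], ['id' => 7, 'title' => 'A']];
$sorted = sortByRelevance($rows, $ids);
```

The re-sorting step is easy to forget: without it, the rows come back in whatever order SQL chooses and the relevance ranking is lost.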

This has some advantages:

  • Legacy friendly: You don’t need to change existing filters and privilege checks.
  • Computed properties: You can search computed properties since Elasticsearch holds the denormalized representation.
  • Consistency: You always fetch the real data from SQL, so you’ll never have stale data in your results.

But this method has a lot of downsides:

  • Performance: You always need to query two databases.
  • Scaling: You need to fetch all matching ids from Elasticsearch and forward them to SQL.
  • Features: Using advanced Elasticsearch features becomes difficult.
  • Stale search: You might not notice stale data in Elasticsearch right away.

When to use it?

  • If you already have an API that must not change
  • If your data changes frequently and search index updates can’t keep up

2. The elastic replacement (for read-heavy APIs)

Architecture overview (red = critical components)

Here, you put everything you need directly in Elasticsearch. This means that you don’t need your primary database during requests anymore.

  • Performance: You store everything in Elasticsearch, so there is no overhead for joining complex relations anymore. Elasticsearch can also parallelize queries over multiple nodes, making it very fast.
  • Computed properties: You can search computed properties since Elasticsearch holds the denormalized representation.
  • Library support: This is the model some libraries expect you to use.

But again, there are downsides:

  • No write endpoint (API Platform): You lose the capability to write to that endpoint. You might be able to trick API Platform into writing to SQL and reading from Elasticsearch.
  • Legacy support: If you already have filters and access rights, you have to rewrite them for Elasticsearch.
  • Stale Data and Search: If your update task isn’t perfect, you’ll have outdated data in your search results.

When to use it?

  • When your database model can be cleanly serialized the way you need it
  • When search is a primary feature and you want to use every trick Elasticsearch has to offer

3. The elastic addition (flexible, but hard to implement)

Architecture overview (red = critical components)

Again, you put your data directly into Elasticsearch.

However, you build a separate model for your API. That way, it can be something completely different from what you store in your database. For example, it can contain aggregations, values from files, or data from external APIs.

  • Maximum flexibility: You can build a completely different representation than what you have in your database.
  • Performance: You store everything in Elasticsearch, so there is no overhead for joining complex relations anymore.
  • Denormalization included: If your data model is complex or slow, then you can denormalize straight into Elasticsearch (instead of into another representation first).
  • Can mix models: If you have news, user posts, and products that you want to search at the same time, then this architecture supports it.

But there are downsides:

  • Effort: You build a completely different search model and endpoint.
  • Library support: For some reason, there is barely any library out there that expects a separate search model, so you are on your own here.
  • Stale data and search: If your update task isn’t perfect, you’ll have to deal with outdated data in your search results.

When to use it?

  • When your database model has a lot of relations and/or is slow
  • If you need a model that is completely different from your domain model
  • If you want to put multiple different models in one search index
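A separate search model that mixes several domain models could be sketched as follows. Everything here is hypothetical and only illustrates the idea: the SearchDocument class, its fields, the mapping functions, and the array shapes of the news and product records are assumptions, and the constructor syntax needs PHP 8.0 or newer.

```php
<?php
declare(strict_types=1);

/** Hypothetical shared shape for every document in the search index. */
final class SearchDocument
{
    public function __construct(
        public string $id,
        public string $type,
        public string $title,
        public string $text,
    ) {
    }

    /** Array form that would be sent to the index. */
    public function toArray(): array
    {
        return get_object_vars($this);
    }
}

/** Map a news record onto the shared search model. */
function fromNews(array $news): SearchDocument
{
    // Prefix the id so news 1 and product 1 do not collide in one index.
    return new SearchDocument('news-' . $news['id'], 'news', $news['headline'], $news['body']);
}

/** Map a product record onto the shared search model. */
function fromProduct(array $product): SearchDocument
{
    return new SearchDocument(
        'product-' . $product['id'],
        'product',
        $product['name'],
        // Denormalize: fold the tags into the searchable text.
        $product['description'] . ' ' . implode(' ', $product['tags'])
    );
}

$documents = [
    fromNews(['id' => 1, 'headline' => 'Launch', 'body' => 'We launched.']),
    fromProduct(['id' => 1, 'name' => 'Mug', 'description' => 'A mug.', 'tags' => ['kitchen', 'gift']]),
];
```

Because both mappers produce the same shape, one query can search news and products together, and the type field tells the API which kind of result it got back.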

Let’s discuss issues you will encounter

Elasticsearch Versions

This is a decision you’ll have to make quite early:

  • Elasticsearch 6.* is still the best supported version in the PHP world, but it is EOL. It talks plain, unencrypted HTTP out of the box, which makes local development a lot simpler.
  • Elasticsearch 7.* removed types and is therefore somewhat incompatible (the URL of the search endpoint no longer includes the type).
  • Elasticsearch 8.* has basically no support in the PHP world, and its license is an issue if you use AWS, which forked OpenSearch instead.
  • OpenSearch is an Elasticsearch fork that is compatible with version 7.

Elasticsearch 7 is supposed to get support until 2023-08-01, so I’d stick with that. If you plan to use AWS, then you can switch to OpenSearch.

ApiPlatform Support

ApiPlatform technically has support for Elasticsearch. But all of its APIs are marked as experimental in the source code, and for good reason: the concepts aren’t fleshed out. They don’t even support scoring, so you basically can’t build a full-text search with it.

You are better off building your own providers and filters, which is actually quite simple if you have a guide (I’ll show you at the end).

FOSElasticaBundle

This is a great bundle to start your Elasticsearch integration with, especially to populate your search index.

But actually searching with it will cause you some headaches since it needlessly has its own query language, which you’ll want to avoid.

With that out of the way…

Implementation example

Since I discussed 3 architectures, I’m only showing some general details. I have built multiple projects with Elasticsearch, and somehow every project is slightly different.

Configure Indexing

You can pretty much follow the installation instructions of the FOSElasticaBundle. Define a serialization group on the properties you want to index, and then configure the bundle to use that model and serialization group. This is similar in all 3 architectures.
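As a rough orientation, such a configuration might look like the sketch below. It assumes the type-less layout of recent bundle versions (older versions nest a types level under each index). The index name, the entity class, the group name, and the environment variable are placeholders, so check the bundle documentation for your version.

```yaml
# config/packages/fos_elastica.yaml
# Sketch only: index, entity, and group names are placeholders.
fos_elastica:
    clients:
        default: { url: '%env(ELASTICSEARCH_URL)%' }
    # Let the serializer build the indexed document.
    serializer: ~
    indexes:
        product:
            serializer:
                groups: [search]
            persistence:
                driver: orm
                model: App\Entity\Product
```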

If your model is managed by the Doctrine ORM, then it is automatically indexed when it is updated. You can trigger the initial population using:

bin/console fos:elastica:populate

If you want to implement architecture 3, then you’ll need to create your own provider. The documentation for manual providers is sparse, but you basically have to create a pager that provides the objects to be serialized. This provider can then be configured under persistence.provider. In it, you can do whatever it takes to build your search model.

Configure API Platform for Variant 2 and 3

API Platform can natively talk to Elasticsearch… but only with version 6 at the time of writing. And since there are many other limitations as well, I wouldn’t recommend that implementation.

Instead, you should write your own API Platform DataProvider.

This provider covers variants 2 and 3, as we can reuse existing classes from API Platform’s implementation. If you indexed the correct representation in the first place, you can leave out the serializer and save some performance there.
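The last point, leaving out the serializer, can be illustrated in plain PHP. This is only a sketch of that one idea and not a complete provider: the function name and the sample response are made up, and it assumes that the indexed _source already has exactly the shape your API should return.

```php
<?php
declare(strict_types=1);

/**
 * Turn a decoded Elasticsearch response straight into API items.
 * No denormalization step: the stored _source is returned as-is,
 * with the document id added under "id".
 */
function hitsToItems(array $response): array
{
    return array_map(
        static fn (array $hit): array => ['id' => $hit['_id']] + $hit['_source'],
        $response['hits']['hits'] ?? []
    );
}

// Hypothetical sample response.
$response = ['hits' => ['hits' => [
    ['_id' => '7', '_source' => ['title' => 'A', 'price' => 12]],
]]];
$items = hitsToItems($response);
```

The trade-off is that the index mapping becomes part of your public API contract, so a change to the indexed representation changes the response as well.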

Configure API Platform for variant 1

Here, you need to build a filter into your existing model. There are guides on how to build custom filters, for example in the API Platform documentation.

Your filter will then look something like this:
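A minimal, framework-free sketch of the central step of such a filter, which turns the ids returned by Elasticsearch into an SQL condition, could look like this. It is not a complete API Platform filter: the function name and the p.id column are assumptions, and a real filter would add the condition to the Doctrine query builder it receives.

```php
<?php
declare(strict_types=1);

/**
 * Build a parameterized "column IN (...)" condition from search result ids.
 * Returns the SQL fragment and the values to bind, in that order.
 */
function buildIdCondition(string $column, array $ids): array
{
    if ($ids === []) {
        // "IN ()" is a syntax error in MySQL and PostgreSQL,
        // so force an empty result when the search found nothing.
        return ['1 = 0', []];
    }

    $placeholders = implode(', ', array_fill(0, count($ids), '?'));

    return [sprintf('%s IN (%s)', $column, $placeholders), array_values($ids)];
}

// Hypothetical ids as they might come back from the search.
[$sql, $params] = buildIdCondition('p.id', [7, 3]);
```

Handling the empty case explicitly matters: a search without hits must produce an empty list, not an unfiltered one.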

Actually Searching

You now have everything you need to integrate your models into Elasticsearch.
Next, you’ll need to build your query, but there are plenty of guides out there for that.
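As a starting point, a fuzzy full-text query body can be built as a plain PHP array. This is a sketch under assumptions: the function name, the field names, and the paging defaults are made up, while multi_match, fuzziness, from, and size are standard parts of the Elasticsearch query DSL.

```php
<?php
declare(strict_types=1);

/**
 * Build the request body for a fuzzy full-text search over several fields.
 * A field can be boosted with the "name^factor" syntax.
 */
function buildSearchBody(string $term, array $fields, int $page = 1, int $perPage = 20): array
{
    return [
        'from' => ($page - 1) * $perPage,
        'size' => $perPage,
        'query' => [
            'multi_match' => [
                'query' => $term,
                'fields' => $fields,
                // Tolerate typos; AUTO scales the edit distance with term length.
                'fuzziness' => 'AUTO',
            ],
        ],
    ];
}

// Hypothetical usage: third page, title matches count twice as much.
$body = buildSearchBody('cofee mug', ['title^2', 'text'], 3, 10);
$json = json_encode($body, JSON_THROW_ON_ERROR);
```

The resulting JSON can be sent to the search endpoint of your index with whichever client you use. Results come back sorted by score unless you request another order.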

I hope this helps you get started on your journey of good search.

Full-Stack Web Developer for hauptsache.net. I document my findings on Symfony, TYPO3, React. See more at: www.marco.zone