Filtering Nested Array Objects in Elasticsearch Document with Painless Scripting
Painless is a simple, secure scripting language designed specifically for use with Elasticsearch. It is the default scripting language for Elasticsearch and can safely be used for inline and stored scripts.
https://bit.ly/2TRuUNj
Overview
I have been using Elasticsearch for a while. Starting as a log aggregations for microservices (ELK stack) and progressing towards as a document searching functionalities. Because the usages are advancing, therefore the demand for searching criteria is increasing as well. Of course, as a consequence, the searching requirement is getting complex. Luckily, Elasticsearch has one of the most powerful features, which is painless scripting.
Also, the demands are growing to Kibana as reporting and monitoring tools as well.
Use Case
I want to search for a product in which stocks are less than or equal to 50. This is the product document looks like:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
{ "productId": 5, "productName": "Shoes", "warehouses": [ { "location": "Location A", "quantity": 100 }, { "location": "Location B", "quantity": 10 }, { "location": "Location C", "quantity": 50 } ] } |
Index Creation
First, I will create a new index and give it a name: product.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
PUT /products { "mappings": { "properties": { "productId": { "type": "integer" }, "productName": { "type": "text", "fields": { "raw": { "type": "keyword", "ignore_above": 256 } } }, "warehouses": { "properties": { "location": { "type": "text" }, "quantity": { "type": "integer" } } } } } } |
Prepare Document
Insert documents with BULK API
1 2 3 4 5 6 7 |
POST products/_bulk {"index" : {"_id":1}} {"productId":1,"productName":"Bags","warehouses":[{"location":"Location A","quantity":20},{"location":"Location B","quantity":30},{"location":"Location C","quantity":50}]} {"index" : {"_id":2}} {"productId":2,"productName":"Shirts","warehouses":[{"location":"Location A","quantity":100},{"location":"Location B","quantity":150},{"location":"Location C","quantity":150}]} {"index" : {"_id":3}} {"productId":3,"productName":"Shoes","warehouses":[{"location":"Location A","quantity":100},{"location":"Location B","quantity":10},{"location":"Location C","quantity":50}]} |
Searching with Painless
It is time to searching with painless scripting. Syntax over words.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
GET /products/_search { "query": { "bool": { "must": { "match_all": {} }, "filter": { "bool": { "must": { "script": { "script": { "lang": "painless", "source": """ def x = doc['warehouses.quantity']; def flag = false; for(int i = 0; i < x.length; i++) { if(x[i] <= params.limit) { flag = true; } } return flag; """, "params": { "limit": 50 } } } } } } } } } |
You will get results such as:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 |
{ "took" : 10, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 2, "relation" : "eq" }, "max_score" : 1.0, "hits" : [ { "_index" : "products", "_type" : "_doc", "_id" : "1", "_score" : 1.0, "_source" : { "productId" : 1, "productName" : "Bags", "warehouses" : [ { "location" : "Location A", "quantity" : 20 }, { "location" : "Location B", "quantity" : 30 }, { "location" : "Location C", "quantity" : 50 } ] } }, { "_index" : "products", "_type" : "_doc", "_id" : "3", "_score" : 1.0, "_source" : { "productId" : 3, "productName" : "Shoes", "warehouses" : [ { "location" : "Location A", "quantity" : 100 }, { "location" : "Location B", "quantity" : 10 }, { "location" : "Location C", "quantity" : 50 } ] } } ] } } |
Of course, you can play around by changing the limit, or combining with other filters. It is up to you. You have total control over your search queries.
Conclusion
I find this painless scripting approach in this article is similar to the EXISTS clause in traditional SQL language. To be honest, the SQL query is still a more elegant abstraction language compare to the JSON approach query language. However, in the end, it is about functionalities. As long as it works perfectly well, we can put aside the elegance aspect.
And lots more functionalities come with this feature. One of the lovely ones is debugging features. If you are curious to know more, please visit this link.