TL;DR
Introduction
Querying and filtering are two fundamental operations when working with Elasticsearch. Querying involves searching for documents that match certain criteria, often using full-text search capabilities. This is useful when you need to find documents based on relevance or specific content. Filtering, on the other hand, is used to narrow down the search results by applying specific conditions, such as range filters or term filters. Filters are generally faster and more efficient because they do not score documents like queries do. Understanding the distinction between querying and filtering can help you optimize your search operations and improve the performance of your Elasticsearch queries.
Hands-on
I’ve prepared a Jupyter notebook that demonstrates how to query and filter Elasticsearch using Elastic.Clients.Elasticsearch. You can find the source code here.
⚠️ Note: to run the queries that follow you need the “book_index” dataset from my previous post Semantic Search with Elasticsearch in .NET.
Querying
In the query context, a query clause answers the question “How well does this document match this query clause?”. In addition to deciding whether or not the document matches, the query clause also calculates a relevance score in the _score metadata field.
Full text queries
Full text queries enable you to search analyzed text fields such as the body of an email. The query string is processed using the same analyzer that was applied to the field during indexing.
- match. The standard query for performing full text queries, including fuzzy matching and phrase or proximity queries.
- multi_match. The multi-field version of the match query.
Match query
Returns documents that match a provided text, number, date or boolean value. The provided text is analyzed before matching.
The match query is the standard query for performing a full-text search, including options for fuzzy matching.
var searchResponse = await client.SearchAsync<Book>(s => s
    .Query(q => q
        .Match(m => m
            .Field(f => f.Summary)
            .Query("guide"))
    )
    .Size(5)
);
DumpRequest(searchResponse);
PrettyPrint(searchResponse);
Multi-match query
The multi_match query builds on the match query to allow multi-field queries.
var searchResponse = await client.SearchAsync<Book>(s => s
    .Query(q => q
        .MultiMatch(m => m
            .Fields(Fields.FromStrings(["summary", "title"]))
            .Query("javascript"))
    )
);
DumpRequest(searchResponse);
PrettyPrint(searchResponse);
➕ Individual fields can be boosted with the caret (^) notation. In the following query, title^3 multiplies the score of the title field by 3, so results that have “JavaScript” in their title rank higher than results that only mention it in the summary.
var searchResponse = await client.SearchAsync<Book>(s => s
    .Query(q => q
        .MultiMatch(m => m
            .Fields(Fields.FromStrings(["summary", "title^3"]))
            .Query("javascript"))
    )
);
DumpRequest(searchResponse);
PrettyPrint(searchResponse);
Term-level Queries
You can use term-level queries to find documents based on precise values in structured data. Examples of structured data include date ranges, IP addresses, prices, or product IDs.
Term search
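Returns documents that contain the exact search term in the provided field. Unlike the match query, the search term is not analyzed, so it has to match the indexed value exactly.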
var searchResponse = await client.SearchAsync<Book>(s => s
    .Query(q => q.Term(t => t
        .Field(f => f.Publisher)
        .Value("addison-wesley"))
    )
);
DumpRequest(searchResponse);
PrettyPrint(searchResponse);
Range search
Returns documents that contain terms within a provided range.
The following example returns books that have at least 45 reviews.
var searchResponse = await client.SearchAsync<Book>(s => s
    .Query(q => q.Range(r => r
        .NumberRange(nr => nr.Field(f => f.num_reviews).Gte(45)))
    )
);
DumpRequest(searchResponse);
PrettyPrint(searchResponse);
Prefix search
Returns documents that contain a specific prefix in a provided field.
var searchResponse = await client.SearchAsync<Book>(s => s
    .Query(q => q.Prefix(p => p
        .Field(f => f.Title)
        .Value("java"))
    )
);
DumpRequest(searchResponse);
PrettyPrint(searchResponse);
Fuzzy search
Returns documents that contain terms similar to the search term, as measured by a Levenshtein edit distance.
An edit distance is the number of one-character changes needed to turn one term into another. These changes can include:
- Changing a character (box → fox)
- Removing a character (black → lack)
- Inserting a character (sic → sick)
- Transposing two adjacent characters (act → cat)
var searchResponse = await client.SearchAsync<Book>(s => s
    .Query(q => q
        .Fuzzy(f => f
            .Field(ff => ff.Title)
            .Value("pyvascript")
        )
    )
);
DumpRequest(searchResponse);
PrettyPrint(searchResponse);
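To make the edit-distance definition above concrete, here is a minimal sketch that counts the four kinds of changes listed (substitution, removal, insertion, transposition of adjacent characters). It is the textbook "optimal string alignment" algorithm, not the implementation Elasticsearch uses internally, and the EditDistance helper name is my own. It shows why “pyvascript” can still find “javascript”: the two terms are two edits apart, which, as far as I know, is what the default fuzziness (AUTO) allows for terms longer than five characters.

```csharp
using System;

// Hypothetical helper, for illustration only.
// d[i, j] holds the edit distance between the first i characters of
// source and the first j characters of target.
int EditDistance(string source, string target)
{
    var d = new int[source.Length + 1, target.Length + 1];

    // Turning a prefix into the empty string takes one removal per character.
    for (var i = 0; i <= source.Length; i++) d[i, 0] = i;
    for (var j = 0; j <= target.Length; j++) d[0, j] = j;

    for (var i = 1; i <= source.Length; i++)
    {
        for (var j = 1; j <= target.Length; j++)
        {
            var cost = source[i - 1] == target[j - 1] ? 0 : 1;

            d[i, j] = Math.Min(
                Math.Min(d[i - 1, j] + 1,      // remove a character
                         d[i, j - 1] + 1),     // insert a character
                d[i - 1, j - 1] + cost);       // change a character (or keep it)

            // Transpose two adjacent characters (act -> cat).
            if (i > 1 && j > 1
                && source[i - 1] == target[j - 2]
                && source[i - 2] == target[j - 1])
            {
                d[i, j] = Math.Min(d[i, j], d[i - 2, j - 2] + 1);
            }
        }
    }

    return d[source.Length, target.Length];
}

Console.WriteLine(EditDistance("box", "fox"));               // 1
Console.WriteLine(EditDistance("act", "cat"));               // 1
Console.WriteLine(EditDistance("pyvascript", "javascript")); // 2
```

A search term that is more edits away than the allowed fuzziness would simply return no hits for that term.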
Combining Query Conditions
Compound queries wrap other compound or leaf queries, either to combine their results and scores, or to change their behaviour. They also allow you to switch from query to filter context, but that will be covered later in the Filtering section.
bool.must (AND)
The clauses must appear in matching documents and will contribute to the score. This effectively performs an “AND” logical operation on the given sub-queries.
var searchResponse = await client.SearchAsync<Book>(s => s
    .Query(q => q.Bool(b => b
        .Must(m => m
            .Term(t => t
                .Field(f => f.Publisher)
                .Value("addison-wesley")
            ),
            m => m
            .Term(t => t
                .Field(f => f.Authors)
                .Value("richard helm")
            )
        ))
    )
);
DumpRequest(searchResponse);
PrettyPrint(searchResponse);
bool.should (OR)
The clause should appear in the matching document. This performs an “OR” logical operation on the given sub-queries.
var searchResponse = await client.SearchAsync<Book>(s => s
    .Query(q => q.Bool(b => b
        .Should(m => m
            .Term(t => t
                .Field(f => f.Publisher)
                .Value("addison-wesley")
            ),
            m => m
            .Term(t => t
                .Field(f => f.Authors)
                .Value("richard helm")
            )
        ))
    )
);
DumpRequest(searchResponse);
PrettyPrint(searchResponse);
Filtering
In a filter context, a query clause answers the question “Does this document match this query clause?” The answer is a simple Yes or No — no scores are calculated. Filter context is mostly used for filtering structured data, for example:
- Does this timestamp fall into the range 2015 to 2016?
- Is the status field set to “published”?
Filter context is in effect whenever a query clause is passed to a filter parameter, such as the filter or must_not parameters in the bool query.
bool.filter
The clause (query) must appear for the document to be included in the results. Unlike query context searches such as term, bool.must or bool.should, a matching score isn’t calculated because filter clauses are executed in filter context.
var searchResponse = await client.SearchAsync<Book>(s => s
    .Query(q => q.Bool(b => b
        .Filter(m => m
            .Term(t => t
                .Field(f => f.Publisher)
                .Value("prentice hall")
            )
        ))
    )
);
DumpRequest(searchResponse);
PrettyPrint(searchResponse);
bool.must_not
The clause (query) must not appear in the matching documents. Because this query also runs in filter context, no scores are calculated; the filter just determines if a document is included in the results or not.
var searchResponse = await client.SearchAsync<Book>(s => s
    .Query(q => q.Bool(b => b
        .MustNot(m => m
            .Range(r => r.NumberRange(nr => nr.Field(f => f.num_reviews).Lte(45)))
        ))
    )
);
DumpRequest(searchResponse);
PrettyPrint(searchResponse);
Using Filters with Queries
Filters are often added to search queries with the intention of limiting the search to a subset of the documents. A filter can cleanly eliminate documents from a search, without altering the relevance scores of the results.
The next example returns books that have the word “javascript” in their title, only among the books that have more than 45 reviews.
var searchResponse = await client.SearchAsync<Book>(s => s
    .Query(q => q.Bool(b => b
        .Must(m => m
            .Match(t => t
                .Field(f => f.Title).Query("javascript")
            )
        )
        .MustNot(m => m
            .Range(r => r.NumberRange(nr => nr.Field(f => f.num_reviews).Lte(45)))
        ))
    )
);
DumpRequest(searchResponse);
PrettyPrint(searchResponse);
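The same restriction can also be written with bool.filter instead of bool.must_not. The sketch below is a variation on the query above, not a replacement for it: it assumes the same client, Book type, DumpRequest and PrettyPrint helpers from the notebook, and it assumes the range descriptor exposes Gt alongside the Gte and Lte calls used earlier. The filteredResponse name is only there to keep it apart from the previous cell.

```csharp
// Runs in the notebook context: `client`, `Book`, DumpRequest and PrettyPrint
// come from the earlier cells and are not defined here.
var filteredResponse = await client.SearchAsync<Book>(s => s
    .Query(q => q.Bool(b => b
        // Query context: the match clause decides the relevance score.
        .Must(m => m
            .Match(t => t
                .Field(f => f.Title).Query("javascript")
            )
        )
        // Filter context: keeps only books with more than 45 reviews, no scoring.
        .Filter(m => m
            .Range(r => r.NumberRange(nr => nr.Field(f => f.num_reviews).Gt(45)))
        ))
    )
);
DumpRequest(filteredResponse);
PrettyPrint(filteredResponse);
```

One difference worth keeping in mind: the must_not version also lets through documents that have no num_reviews value at all, while the filter version only keeps documents whose value is greater than 45.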
Conclusion
That is all for now! 🎉 We’ve covered the basics of querying and filtering in Elasticsearch using the Elastic.Clients.Elasticsearch NuGet package. Understanding how to construct queries and filters can help you build powerful search capabilities in your applications.
🙌 I hope you found it helpful. If you have any questions, please feel free to reach out. If you’d like to support my work, a star on GitHub would be greatly appreciated! 🙏