Implementing Efficient Search with PostgreSQL's Full-Text Search and GIN Indexing
6 min read
Efficient search is critical for large-scale applications because it directly impacts user experience, ensuring quick access to relevant data. As databases grow, unoptimized search can lead to slow response times, frustrating users and increasing server load. By implementing efficient indexing and query handling, we improve speed, relevance, and scalability, keeping the application responsive as it scales.
In a recent project, I tackled the challenge of implementing efficient search within a large database with over 110,000+ rows, where speed and relevance were crucial for enhancing user experience. Using Supabase as the backend infrastructure and PostgreSQL’s powerful full-text search capabilities, I set up a schema optimized for high-performance querying. Additionally, Prisma ORM served as the core interface for managing and querying the data, providing both flexibility and speed.
In this post, I’ll walk through the entire setup: defining the database schema, implementing GIN indexing for efficient data retrieval, creating triggers to maintain up-to-date search indexes, and leveraging raw SQL queries with Prisma for advanced search functionality. By the end, you’ll have a step-by-step guide on implementing fast, scalable search within large datasets using Supabase, PostgreSQL, and Prisma. Let’s go!
1. Setting Up the Schema
The main objective was to define a schema that would support full-text search and indexing for optimized query performance.
- Solution: I used PostgreSQL’s
tsvector
type for storing indexed search data, which supports full-text search by turning text into a searchable vector.
Here’s the initial schema setup:
CREATE TABLE "Recipe" (
id BIGSERIAL PRIMARY KEY,
title TEXT NOT NULL,
slug TEXT UNIQUE NOT NULL,
ingredients TEXT[],
instructions TEXT NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
search_vector TSVECTOR
);
Here, @ignore
tells Prisma to skip search_vector
management. This column will be handled in PostgreSQL as a tsvector
type to support full-text search.
2. Set up search_vector
in PostgreSQL and Add Trigger to Update search_vector
In the Supabase SQL editor, I set up the search_vector
column and configure it to store combined text for efficient search.
To ensure a clean setup, I dropped existing/incomplete configurations and started by re-creating the search_vector
column. This column holds the tsvector
data type, allowing PostgreSQL to create a search index with multiple fields.
DROP INDEX IF EXISTS idx_recipe_search_vector;
ALTER TABLE "Recipe" DROP COLUMN IF EXISTS search_vector;
ALTER TABLE "Recipe" ADD COLUMN search_vector tsvector;
Here, we delete any previous indexes (idx_recipe_search_vector
) and drop the search_vector
column if it exists. We then add the search_vector
column with the tsvector
data type.
SQL Commands:
CREATE OR REPLACE FUNCTION update_search_vector() RETURNS trigger AS $$
BEGIN
NEW.search_vector :=
to_tsvector('english',
coalesce(NEW.title, '') || ' ' ||
coalesce(array_to_string(NEW.ingredients, ' '), '') || ' ' ||
coalesce(NEW.instructions, '')
);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
DROP TRIGGER IF EXISTS trigger_update_search_vector ON "Recipe";
CREATE TRIGGER trigger_update_search_vector
BEFORE INSERT OR UPDATE ON "Recipe"
FOR EACH ROW EXECUTE FUNCTION update_search_vector();
The
update_search_vector
function combines data fromtitle
,ingredients
, andinstructions
into a singletsvector
entry, enabling full-text search on all fields.The trigger
trigger_update_search_vector
automatically updatessearch_vector
whenever a new entry is added or updated.
3. Populate search_vector
for Existing Data
To ensure the search_vector
column is populated for all existing records, we run an update across the entire Recipe
table. This initialization step will populate the search_vector
field with data from existing rows.
UPDATE "Recipe"
SET search_vector =
to_tsvector('english',
coalesce(title, '') || ' ' ||
coalesce(array_to_string(ingredients, ' '), '') || ' ' ||
coalesce(instructions, '')
);
This step guarantees that the search_vector
field is correctly initialized for all current records in the database, providing a foundation for full-text search.
4. Create a GIN Index on search_vector
Now that search_vector
is set up, we create a GIN (Generalized Inverted Index) on this column to speed up search queries. The GIN index is optimized for tsvector
types and significantly improves the efficiency of full-text search queries.
-- Create GIN index on search_vector for efficient full-text search
CREATE INDEX idx_recipe_search_vector ON "Recipe" USING GIN (search_vector);
The GIN
index is ideal for full-text search since it quickly indexes and retrieves words in the search_vector
. With this in place, our searches will be more efficient.
5. Querying with Prisma and Full-Text Search
Using Prisma’s $queryRaw
method, you can perform full-text search queries against the search_vector
field.
Example Code in Your API Route:
In my API code, I decided to use raw SQL queries for search functionality.
const recipes = await prisma.$queryRaw<{ id: bigint; title: string; slug: string }[]>`
SELECT id, title, slug
FROM "Recipe"
WHERE search_vector @@ plainto_tsquery('english', ${query})
ORDER BY ts_rank(search_vector, plainto_tsquery('english', ${query})) DESC
OFFSET ${skip}
LIMIT ${pageSize};
`;
In this example, plainto_tsquery
converts the search query into a format suitable for full-text search, and ts_rank
ranks the results by relevance.
Here is an example of the search in action:
SELECT id, title, slug
FROM "Recipe"
WHERE search_vector @@ plainto_tsquery('english', 'chicken pasta')
ORDER BY ts_rank(search_vector, plainto_tsquery('english', 'chicken pasta')) DESC
LIMIT 10;
This query performs a full-text search on the Recipe
table in a PostgreSQL database to find the most relevant results for recipes that contain the terms "chicken" and "pasta".
search_vector
is atsvector
column that holds the indexed, combined text data (e.g., title, ingredients, instructions).plainto_tsquery('english', 'chicken pasta')
creates a search query using the terms "chicken" and "pasta", interpreting them in English for a natural language search.The
@@
operator checks if thesearch_vector
contains these terms.
ORDER BY ts_rank(search_vector, plainto_tsquery('english', 'chicken pasta')) DESC
:Orders the results by relevance.
ts_rank
calculates a relevance score based on the frequency and position of the search terms withinsearch_vector
.DESC
sorts the results in descending order, so the most relevant results appear first.
The result is as follows:
This approach is efficient and provides highly relevant search results based on the GIN index in PostgreSQL.
Summary
Using Prisma ORM with raw SQL queries provides flexibility for implementing advanced database features, such as full-text search with GIN indexing. Here’s a recap of what we did:
Defined
tsvector
Schema: Set up asearch_vector
column to store preprocessed text data.Implemented Triggers: Used triggers to keep the
search_vector
updated onINSERT
andUPDATE
operations.Added GIN Indexing: Optimized search performance by indexing the
search_vector
column.Querying with Raw SQL: Used Prisma’s
$queryRaw
for complex queries, ensuring efficient and fast full-text search.
This setup allows for quick, accurate search results in large datasets, ensuring scalability and a smooth user experience.
Conclusion
This full-text search setup is essential for applications handling large datasets, allowing for efficient and relevant results with minimal delay. Using PostgreSQL's tsvector
and GIN indexing provides high-performance search capabilities that scale well, even with datasets over 100,000 records. This configuration ensures you get fast, relevant search results, significantly enhancing the application's usability and performance.