Implementing Efficient Search with PostgreSQL's Full-Text Search and GIN Indexing

6 min read

Efficient search is critical for large-scale applications because it directly impacts user experience, ensuring quick access to relevant data. As databases grow, unoptimized search can lead to slow response times, frustrating users and increasing server load. By implementing efficient indexing and query handling, we improve speed, relevance, and scalability, keeping the application responsive as it scales.

In a recent project, I tackled the challenge of implementing efficient search within a large database with over 110,000+ rows, where speed and relevance were crucial for enhancing user experience. Using Supabase as the backend infrastructure and PostgreSQL’s powerful full-text search capabilities, I set up a schema optimized for high-performance querying. Additionally, Prisma ORM served as the core interface for managing and querying the data, providing both flexibility and speed.

In this post, I’ll walk through the entire setup: defining the database schema, implementing GIN indexing for efficient data retrieval, creating triggers to maintain up-to-date search indexes, and leveraging raw SQL queries with Prisma for advanced search functionality. By the end, you’ll have a step-by-step guide on implementing fast, scalable search within large datasets using Supabase, PostgreSQL, and Prisma. Let’s go!

1. Setting Up the Schema

The main objective was to define a schema that would support full-text search and indexing for optimized query performance.

  • Solution: I used PostgreSQL’s tsvector type for storing indexed search data, which supports full-text search by turning text into a searchable vector.

Here’s the initial schema setup:

CREATE TABLE "Recipe" (
    id BIGSERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    slug TEXT UNIQUE NOT NULL,
    ingredients TEXT[],
    instructions TEXT NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),
    search_vector TSVECTOR
);

Here, @ignore tells Prisma to skip search_vector management. This column will be handled in PostgreSQL as a tsvector type to support full-text search.

2. Set up search_vector in PostgreSQL and Add Trigger to Update search_vector

In the Supabase SQL editor, I set up the search_vector column and configure it to store combined text for efficient search.

To ensure a clean setup, I dropped existing/incomplete configurations and started by re-creating the search_vector column. This column holds the tsvector data type, allowing PostgreSQL to create a search index with multiple fields.

DROP INDEX IF EXISTS idx_recipe_search_vector;
ALTER TABLE "Recipe" DROP COLUMN IF EXISTS search_vector;
ALTER TABLE "Recipe" ADD COLUMN search_vector tsvector;

Here, we delete any previous indexes (idx_recipe_search_vector) and drop the search_vector column if it exists. We then add the search_vector column with the tsvector data type.

SQL Commands:

CREATE OR REPLACE FUNCTION update_search_vector() RETURNS trigger AS $$
BEGIN
  NEW.search_vector := 
    to_tsvector('english', 
      coalesce(NEW.title, '') || ' ' || 
      coalesce(array_to_string(NEW.ingredients, ' '), '') || ' ' || 
      coalesce(NEW.instructions, '')
    );
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS trigger_update_search_vector ON "Recipe";

CREATE TRIGGER trigger_update_search_vector
BEFORE INSERT OR UPDATE ON "Recipe"
FOR EACH ROW EXECUTE FUNCTION update_search_vector();
  • The update_search_vector function combines data from title, ingredients, and instructions into a single tsvector entry, enabling full-text search on all fields.

  • The trigger trigger_update_search_vector automatically updates search_vector whenever a new entry is added or updated.

3. Populate search_vector for Existing Data

To ensure the search_vector column is populated for all existing records, we run an update across the entire Recipe table. This initialization step will populate the search_vector field with data from existing rows.

UPDATE "Recipe" 
SET search_vector = 
  to_tsvector('english', 
    coalesce(title, '') || ' ' || 
    coalesce(array_to_string(ingredients, ' '), '') || ' ' || 
    coalesce(instructions, '')
  );

This step guarantees that the search_vector field is correctly initialized for all current records in the database, providing a foundation for full-text search.

4. Create a GIN Index on search_vector

Now that search_vector is set up, we create a GIN (Generalized Inverted Index) on this column to speed up search queries. The GIN index is optimized for tsvector types and significantly improves the efficiency of full-text search queries.

-- Create GIN index on search_vector for efficient full-text search 
CREATE INDEX idx_recipe_search_vector ON "Recipe" USING GIN (search_vector);

The GIN index is ideal for full-text search since it quickly indexes and retrieves words in the search_vector. With this in place, our searches will be more efficient.

Using Prisma’s $queryRaw method, you can perform full-text search queries against the search_vector field.

Example Code in Your API Route:

In my API code, I decided to use raw SQL queries for search functionality.

const recipes = await prisma.$queryRaw<{ id: bigint; title: string; slug: string }[]>`
  SELECT id, title, slug
  FROM "Recipe"
  WHERE search_vector @@ plainto_tsquery('english', ${query})
  ORDER BY ts_rank(search_vector, plainto_tsquery('english', ${query})) DESC
  OFFSET ${skip}
  LIMIT ${pageSize};
`;

In this example, plainto_tsquery converts the search query into a format suitable for full-text search, and ts_rank ranks the results by relevance.

Here is an example of the search in action:

SELECT id, title, slug
FROM "Recipe"
WHERE search_vector @@ plainto_tsquery('english', 'chicken pasta')
ORDER BY ts_rank(search_vector, plainto_tsquery('english', 'chicken pasta')) DESC
LIMIT 10;

This query performs a full-text search on the Recipe table in a PostgreSQL database to find the most relevant results for recipes that contain the terms "chicken" and "pasta".

  • search_vector is a tsvector column that holds the indexed, combined text data (e.g., title, ingredients, instructions).

  • plainto_tsquery('english', 'chicken pasta') creates a search query using the terms "chicken" and "pasta", interpreting them in English for a natural language search.

  • The @@ operator checks if the search_vector contains these terms.

  • ORDER BY ts_rank(search_vector, plainto_tsquery('english', 'chicken pasta')) DESC:

    • Orders the results by relevance.

    • ts_rank calculates a relevance score based on the frequency and position of the search terms within search_vector.

    • DESC sorts the results in descending order, so the most relevant results appear first.

The result is as follows:

This approach is efficient and provides highly relevant search results based on the GIN index in PostgreSQL.

Summary

Using Prisma ORM with raw SQL queries provides flexibility for implementing advanced database features, such as full-text search with GIN indexing. Here’s a recap of what we did:

  1. Defined tsvector Schema: Set up a search_vector column to store preprocessed text data.

  2. Implemented Triggers: Used triggers to keep the search_vector updated on INSERT and UPDATE operations.

  3. Added GIN Indexing: Optimized search performance by indexing the search_vector column.

  4. Querying with Raw SQL: Used Prisma’s $queryRaw for complex queries, ensuring efficient and fast full-text search.

This setup allows for quick, accurate search results in large datasets, ensuring scalability and a smooth user experience.

Conclusion

This full-text search setup is essential for applications handling large datasets, allowing for efficient and relevant results with minimal delay. Using PostgreSQL's tsvector and GIN indexing provides high-performance search capabilities that scale well, even with datasets over 100,000 records. This configuration ensures you get fast, relevant search results, significantly enhancing the application's usability and performance.