Live fulltext search in Ruby on Rails

Some time ago I promised to write a small tutorial about live fulltext search: a fulltext search that gives you results as you type.

Ingredients:

  • Ruby on Rails
  • ferret gem (gem install ferret)
  • acts_as_ferret gem (gem install acts_as_ferret)
  • auto_complete plugin (from the application root: ruby script/plugin install auto_complete)

What we will do

  1. Create an empty application – simple book database
  2. Add fulltext search capabilities
  3. Create the live search
    1. Create search pane partial – the one that will display the search box
    2. Create the search results partial – that will render the hints (search results)
    3. Modify controller to respond to the search pane

Create a book database application

We will create a small application for book management. It will store, list and update books, and it will also provide the live search.
So, let's create the skeleton of the application:

# Create the rails application
rails books
# create database books
echo "create database books"  | mysql -u root -p
cd books

Configure the database login and password in config/database.yml.

development:
  adapter: mysql
  database: books
  username: root
  password: password
  host: localhost
  port: 3306

Create the skeleton of the application. From the root of the application run:

ruby script/generate scaffold Book title:string abstract:text

Create the books table

rake db:migrate

Start up the development server

ruby script/server

Now, browse to http://127.0.0.1:3000/books and type in some data.

Add fulltext search capabilities

Change the app/models/book.rb to support fulltext search

require "acts_as_ferret"
 
class Book < ActiveRecord::Base
    acts_as_ferret
end

You can check in the console that fulltext search is enabled. Start the console via
ruby script/console and enter

Book.find_by_contents("book")

It should return a result set, similar to this:

=> #<ActsAsFerret::SearchResults:0x2540f54 @results=[#<Book id: 2, title: "Book secondo", abstract: "Book about book", created_at: "2008-07-07 23:16:38", updated_at: "2008-07-07 23:16:38">, #<Book id: 1, title: "First book", abstract: "This is a first book", created_at: "2008-07-07 23:16:23", updated_at: "2008-07-07 23:16:23">], @total_hits=2>

Create the live search

Finally, create the live search.

Create search pane partial

The search_pane will be used to display the search box.

Create a partial _search_pane.html.erb in app/views/books and put a single tag in it. The tag creates an Ajax Autocompleter that calls the auto_complete_for_search_query method of the default controller (in our case it will be books).

<%= text_field_with_auto_complete :search, :query %>

Add javascript include and partial rendering to the books template app/views/layouts/books.html.erb.

Do not forget! The javascript include must be in the head of the template.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
       "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
 
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <meta http-equiv="content-type" content="text/html;charset=UTF-8" />
  <title>Books: <%= controller.action_name %></title>
  <!-- HERE -->
  <%= stylesheet_link_tag 'scaffold' %>
  <%= javascript_include_tag :defaults %>
</head>
<body>
 
<!-- AND HERE -->
<%= render :partial=>"books/search_pane" %>
 
<p style="color: green"><%= flash[:notice] %></p>
 
<%= yield  %>
 
</body>
</html>

Create the search results partial

The search_results partial will format the results of the full text search and will “offer” the resulting records. Create a partial app/views/books/_search_results.html.erb and add the formatting code:

<ul>
  <% for book in @books %>
  <li><%= link_to h(book.title), :controller=>"books", :action=>"show", :id=>book %></li>
  <% end %>
</ul>
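
To see what this partial produces without starting Rails, here is a minimal sketch using ERB from the Ruby standard library. The Book struct, the TEMPLATE constant and render_search_results are hypothetical stand-ins for illustration only. The hand-written anchor tag merely approximates what link_to generates, and ERB::Util.html_escape plays the role of h.

```ruby
require "erb"

# Hypothetical stand-in for the ActiveRecord model (illustration only).
Book = Struct.new(:id, :title)

# Same structure as the partial; the <a> tag approximates link_to.
TEMPLATE = <<~HTML
  <ul>
  <% books.each do |book| %>
    <li><a href="/books/<%= book.id %>"><%= ERB::Util.html_escape(book.title) %></a></li>
  <% end %>
  </ul>
HTML

# Renders the list for the given books; `books` is visible to the
# template through the method's binding.
def render_search_results(books)
  ERB.new(TEMPLATE).result(binding)
end

puts render_search_results([Book.new(1, "Tom & <Jerry>"), Book.new(2, "First book")])
```

The escaping matters because book titles are user input: without it, a title containing markup would be injected straight into the autocomplete drop-down.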

Modify controller

Add the following line at the beginning of the BooksController class (app/controllers/books_controller.rb).

protect_from_forgery :only => [:create, :update, :destroy]

Create a method in books_controller that will search for the books:

 def auto_complete_for_search_query
   @books = Book.find_by_contents(params["search"]["query"]+"*", {:limit => 5})
   render :partial => "search_results"
 end
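
The action above appends "*" to whatever arrives in params, so it raises NoMethodError when the search parameter is missing. A slightly more defensive way to build the prefix query could look like the sketch below. live_search_query is a hypothetical helper name, not part of acts_as_ferret, and the params hash is a plain Ruby Hash here.

```ruby
# Builds the prefix query used for the live search, or returns nil
# when there is nothing to search for.
# Hypothetical helper for illustration; not part of any plugin.
def live_search_query(params)
  term = (params["search"] || {})["query"].to_s.strip
  return nil if term.empty?
  "#{term}*" # trailing * asks for a prefix match, as in the action above
end

puts live_search_query("search" => { "query" => "boo" })
```

In the controller you would then skip the call to find_by_contents and render an empty result list whenever the helper returns nil.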

We do not want to generate the whole page layout, so it is necessary to specify it in the books controller:

layout 'books', :except => [:auto_complete_for_search_query]

And now, navigate to http://127.0.0.1:3000/books and start searching. As soon as you start typing into the search box, it shows results. Click on one of the proposed links to see what happens. Source code is here.

Full text search in Ruby on Rails 3 - ferret

There are several ways to use ferret in RoR. This post shows the easy way: using the acts_as_ferret plugin.

To show the syntax and code, I will use the same data objects as in Full text search in Ruby on Rails 2 - MySQL.

Installation

Ferret installation is easy

gem install ferret

will do the job.

In addition, it is necessary to install the acts_as_ferret plugin.

script/plugin install svn://projects.jkraemer.net/acts_as_ferret/tags/stable/acts_as_ferret

Setup

The simplest setup is

class Article < ActiveRecord::Base
  acts_as_ferret
end

This is enough to get the full text engine working. Now you can test it in the Rails console:

Article.find_by_contents("sybase")

If you have a lot of data to be indexed, be patient with the first run. It is slow, because the index needs to be built.

Without arguments, acts_as_ferret automatically indexes all fields of Article, including arrays of child objects. This behaviour can be overridden. You can narrow the field set:

# Index only id and body, not title
acts_as_ferret :fields => [ 'id', 'body' ]

Or you can widen the field set.

acts_as_ferret :fields => [ 'id', 'body', 'title', 'long_article' ]
 
# True for articles whose body is longer than 40 characters
def long_article
  self.body.length > 40
end

Note 1: see usage of long_article in Query syntax below

Note 2: once you change the structure of the index, you need to rebuild it. The easiest way is to stop your application and delete the index/<environment>/<indexed object> folder. It will be recreated automatically with the next search request.

Query syntax

Since ferret is a port of the Lucene engine, it uses the same query syntax. I will show only a few queries that you can use.

For details see the Lucene documentation.

  # Search for pages with "sybase" keyword
  Article.find_by_contents("sybase")
 
  # "sybase" and "replication" keywords
  Article.find_by_contents("sybase replication")
 
  # "sybase" or "replication"
  Article.find_by_contents("sybase OR replication")
 
  # short articles about sybase
  Article.find_by_contents("long_article:(false) *:sybase")
 
  # articles containing words similar to "increase"
  # will return e.g. "increasing"
  Article.find_by_contents("increase~")

Pagination

Ferret is fast, ferret is flexible, but… it is not an active record object, so you cannot use the pre-defined pagination. You have to implement it on your own. Here is how we did it in our project www.tamtami.com.

1. Create full text search function in the model

  def self.full_text_search(q, options = {})
    return nil if q.nil? or q==""
    default_options = {:limit => 10, :page => 1}
    options = default_options.merge options
    options[:offset] = options[:limit] * (options[:page].to_i-1)
    results_ids = []
 
    num = self.ferret_index.search_each("*:(#{q})", {:num_docs => options[:limit], :first_doc => options[:offset]}) { |doc, score|
      results_ids << self.ferret_index[doc]["id"]
    }
    results = Article.find(results_ids)
    return [num, results]
  end

or, more elegantly, as proposed by Jens Kraemer:

  def self.full_text_search(q, options = {})
    return nil if q.nil? or q==""
    default_options = {:limit => 10, :page => 1}
    options = default_options.merge options
    options[:offset] = options[:limit] * (options.delete(:page).to_i-1)  
    results = Article.find_by_contents(q, options)
    return [results.total_hits, results]
  end
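
Both variants turn a page number into an offset with limit * (page - 1). The arithmetic can be checked in isolation with the plain-Ruby sketch below. page_window and total_pages are hypothetical helper names; clamping the page to 1 is my own addition, since the original code would produce a negative offset for a page of 0 or nil.

```ruby
# Converts a 1-based page number into the :limit/:offset pair
# passed to the search. Hypothetical helper for illustration.
def page_window(page, limit = 10)
  page = [page.to_i, 1].max # assumption: treat nil, "" and 0 as page 1
  { :limit => limit, :offset => limit * (page - 1) }
end

# Number of pages needed to show total_hits results.
def total_pages(total_hits, limit = 10)
  (total_hits.to_f / limit).ceil
end

p page_window("3")
p total_pages(25)
```

Note that params[:page] arrives as a string, which is why the conversion with to_i happens before any arithmetic.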

2. Create a method that creates the paginator in application.rb

  def pages_for(size, options = {})
    default_options = {:per_page => 10}
    options = default_options.merge options
    pages = Paginator.new self, size, options[:per_page], (params[:page]||1)
    pages
  end

3. Perform the search in the controller

  def search
    @query=params[:query]
    @total, @articles = Article.full_text_search(@query, :page => (params[:page]||1))	  
    @pages = pages_for(@total)
  end

4. Use it in the article view

...
   <%= pagination_links(@pages, :params => {:query=>@query}) %>
...

Final word

The ferret fulltext engine is fast and flexible, but it needs more programming than the MySQL full text index.

Full text search in Ruby on Rails 2 - MySQL

My previous post compared MySQL and ferret full text search
engines. For our project, the ferret was the winner. Nevertheless, I
will try to show the beauty and simplicity of using MySQL indexes.


Create table and indices


First of all it is necessary to create the table and the corresponding
index.

CREATE TABLE articles(
  id integer NOT NULL PRIMARY KEY AUTO_INCREMENT,
  title varchar(20),
  body varchar(100),
  fulltext(title, body)
) engine = MyISAM;

or create index after the table exists.

CREATE fulltext INDEX x_f_articles_body ON articles(body);

Please note the MyISAM engine. You cannot create a full text index
on InnoDB tables (InnoDB gained FULLTEXT support only in MySQL 5.6).


And now, let’s insert some data

INSERT INTO articles(title, body)
  SELECT "Databases and IT", "Todays world ... database... MySQL, Sybase";
 
INSERT INTO articles(title, body) 
  SELECT "Sybase RS manual", "Sybase Replication server is a ...";
 
INSERT INTO articles(title, body) 
  SELECT "Sybase technology", "ASE, RS, IQ, PowerBuilder, all of them are...";
 
INSERT INTO articles(title, body) 
  SELECT "Databases and people", "People are using databases without knowing it...";
 
INSERT INTO articles(title, body) 
  SELECT "People everywhere", "Human population is increasing...";

Query syntax


Querying is simple. It is part of the MySQL dialect. A simple query that searches for articles with the “Databases”
keyword looks like this:

SELECT * 
FROM articles 
WHERE match(title,body) against ("Databases");

id  title                 body
4   Databases and people  People are using databases without knowing it…
1   Databases and IT      Todays world … database… MySQL, Sybase

or you can create a query looking for all database-related articles

SELECT * 
FROM articles 
WHERE match(title,body) against ("Databases" WITH query expansion);

id  title                 body
1   Databases and IT      Todays world … database… MySQL, Sybase
4   Databases and people  People are using databases without knowing it…
5   People everywhere     Human population is increasing…

Note that the “match” columns must be the same as they were in the
create index statement.


Look also at the record with ID=5. It says nothing about databases, but
it is in the result set anyway. Query expansion means that the MySQL
engine goes through the index twice. In the first run it finds all
records with the searched keyword and builds a set of keywords that
appear together with the search string. In the second run it
searches for the expanded set of keywords. Since “Databases”
appears together with “People” in record ID=4, record 5 was
returned as relevant.

This is useful. Unfortunately the result set is often too big.


Also note that records 2 and 3, which contain the Sybase keyword, are missing (remember, record 1 contains both the Databases and Sybase keywords). The reason is
simple. MySQL weights words according to their frequency. If a
word appears in too many articles (for MyISAM natural-language searches, in 50% or more of the rows), its weight becomes 0 and it is
no longer considered relevant. Always keep this in mind, because it’s a feature, not a
bug!
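
The two-pass behaviour and the frequency cut-off described above can be imitated with a toy model in plain Ruby. This is only a sketch under stated assumptions, not MySQL's actual algorithm: it ignores relevance ranking and the stopword list, assumes a minimum word length of 4 and a 50% frequency threshold (the MyISAM defaults as I understand them), and all method names are hypothetical.

```ruby
# Toy model of two-pass query expansion. NOT MySQL's real algorithm.
ARTICLES = {
  1 => "Databases and IT Todays world ... database... MySQL, Sybase",
  2 => "Sybase RS manual Sybase Replication server is a ...",
  3 => "Sybase technology ASE, RS, IQ, PowerBuilder, all of them are...",
  4 => "Databases and people People are using databases without knowing it...",
  5 => "People everywhere Human population is increasing..."
}

MIN_WORD_LENGTH = 4 # assumption: mimics the default minimum word length

# Lowercased, de-duplicated words that are long enough to be indexed.
def tokenize(text)
  text.downcase.scan(/[a-z]+/).select { |w| w.length >= MIN_WORD_LENGTH }.uniq
end

# Maps every indexed word to the ids of the articles containing it.
def build_index(articles)
  index = Hash.new { |hash, word| hash[word] = [] }
  articles.each { |id, text| tokenize(text).each { |word| index[word] << id } }
  index
end

# Assumption: a word found in 50% or more of the rows carries no weight.
def too_common?(index, word, total)
  index.fetch(word, []).size * 2 >= total
end

# Ids of articles containing at least one word that is not too common.
def search(articles, index, words)
  useful = words.reject { |w| too_common?(index, w, articles.size) }
  useful.flat_map { |w| index.fetch(w, []) }.uniq.sort
end

def plain_search(articles, query)
  search(articles, build_index(articles), tokenize(query))
end

# Pass 1 finds the matching articles; pass 2 searches again for every
# word that occurs in those articles.
def search_with_expansion(articles, query)
  index = build_index(articles)
  first_pass = search(articles, index, tokenize(query))
  expanded = first_pass.flat_map { |id| tokenize(articles[id]) }.uniq
  search(articles, index, (tokenize(query) + expanded).uniq)
end

p plain_search(ARTICLES, "Databases")
p search_with_expansion(ARTICLES, "Databases")
p plain_search(ARTICLES, "Sybase")
```

In this model record 5 enters the expanded result through the word "people", while "sybase" occurs in three of the five rows and therefore never contributes a match.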


Search


So, let’s see it in action. The fulltext search in rails is as simple as
any other search.

Article.find(:all, :conditions => ["match(title,body) against  (?)", "Databases"] )

Naturally, it is possible to combine it with other expressions:

Article.find(:all, :conditions => ["match(title,body) against  (?) and id > ?",
"Databases", 2] )

Pagination


Pagination is as simple as it could be. You can use the same
pagination methods as for any other ActiveRecord query.

def list
  @article_pages, @articles = paginate :articles,
    :per_page => 10, 
    :conditions => ["match(title,body) against  (?)", "Databases"]
end

Scoping


Scoping is something you cannot do with ferret. How does it work?


First of all, imagine that you have a really “complex”
function looking for new articles (ID > 2).

def new_articles
  Article.find( :all, :conditions => [ "id > ?", 2 ] )
end

Then your boss comes and says… OK, but I would also like to be able
to display only new articles about… for example
“Databases”.

Well, you can change the function, or you can scope it. Since you would not learn anything new by rewriting the function, we will
scope it.

def bosses_DB_articles
  Article.with_scope(:find => {:conditions => ["match(title,body) against  (?)", "Databases"]} ) do
    new_articles
  end
end

And that’s the whole trick!
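
Conceptually, the conditions of the scope are combined with the conditions of the inner find using AND. The sketch below imitates that merge in plain Ruby. and_conditions is a hypothetical helper, not part of ActiveRecord, and the exact SQL that Rails generates may differ.

```ruby
# Combines several [sql, *bind_values] condition arrays into one,
# joining the SQL fragments with AND. Hypothetical helper for illustration.
def and_conditions(*conditions)
  sql = conditions.map { |condition| "(#{condition.first})" }.join(" AND ")
  binds = conditions.flat_map { |condition| condition.drop(1) }
  [sql, *binds]
end

scope_conditions = ["match(title,body) against (?)", "Databases"]
inner_conditions = ["id > ?", 2]

p and_conditions(scope_conditions, inner_conditions)
```

The resulting array has the same shape as the hand-written combined :conditions from the Search section, which is why the scoped call returns only new articles about databases.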


Final word


As I said, searching with MySQL is really easy, but sometimes
it gives unexpected results.