Full text search in Ruby on Rails 3 - ferret

There are several ways to use ferret in RoR. This post shows the easy way: using the acts_as_ferret plugin.

To show the syntax and code, I will use the same data objects as in the post Full text search in Ruby on Rails 2 - MySQL.

Installation

Ferret installation is easy

gem install ferret

will do the job.

In addition, it is necessary to install the acts_as_ferret plugin.

script/plugin install svn://projects.jkraemer.net/acts_as_ferret/tags/stable/acts_as_ferret

Setup

The simplest setup is

class Article < ActiveRecord::Base
  acts_as_ferret
end

This is enough to get the full text engine working. Now you can test it in the Rails console

Article.find_by_contents("sybase")

If you have a lot of data to be indexed, be patient with the first run. It is slow, because the index needs to be built.

Called with no arguments, acts_as_ferret automatically indexes all fields of the Article, including arrays of child objects. This behaviour can be overridden. You can narrow the field set

# Index only id and body, not title
acts_as_ferret :fields => [ 'id', 'body' ]

Or you can widen the field set.

acts_as_ferret :fields => [ 'id', 'body', 'title', 'long_article' ]
 
# True when the article body is longer than 40 characters
def long_article
  self.body.length > 40
end

Note 1: see usage of long_article in Query syntax below

Note 2: once you change the structure of the index, you need to rebuild it. The easiest way is to stop your application and delete the index/~environment~/~Indexed object~ folder. It will be created automatically with the next search request.
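
As a rough illustration of what a computed field contributes, here is a plain Ruby sketch that needs neither Rails nor ferret. ArticleSketch and indexed_values are made-up names. The assumption that each field is read by calling the method of the same name and stored as text is mine; it is not taken from the acts_as_ferret source.

```ruby
# Plain Ruby stand-in for the model; ArticleSketch is a hypothetical name.
ArticleSketch = Struct.new(:id, :title, :body) do
  # True when the body is longer than 40 characters.
  def long_article
    body.length > 40
  end
end

# Assumption: every indexed field is read by calling the method of the
# same name, and the result is stored as text.
def indexed_values(record, fields)
  fields.each_with_object({}) do |field, values|
    values[field] = record.send(field).to_s
  end
end

FIELDS = ['id', 'body', 'title', 'long_article']
short = ArticleSketch.new(2, "Sybase RS manual", "Sybase Replication server is a ...")
p indexed_values(short, FIELDS)
```

Under that assumption a short article ends up with the text false in its long_article field, which is what a query such as long_article:(false) would match.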

Query syntax

Since ferret is a port of the Lucene engine, it uses the same query syntax. I will show only a few queries that you can use.

For details, see the Lucene documentation.

  # Search for pages with "sybase" keyword
  Article.find_by_contents("sybase")
 
  # "sybase" and "replication" keywords
  Article.find_by_contents("sybase replication")
 
  # "sybase" or "replication"
  Article.find_by_contents("sybase OR replication")
 
  # short articles about sybase
  Article.find_by_contents("long_article:(false) *:sybase")
 
  # articles containing words similar to "increase",
  # e.g. "increasing"
  Article.find_by_contents("increase~")
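
If the query strings are assembled from user input, a few small helpers can keep the syntax in one place. The sketch below only builds strings in the forms shown above (OR, field:(...), and the trailing ~). The helper names any_of, in_field and fuzzy are hypothetical, and nothing here escapes special characters, so treat it as a starting point only.

```ruby
# Hypothetical helpers; they only assemble strings in the syntax shown above.

# "sybase OR replication"
def any_of(*terms)
  terms.join(" OR ")
end

# Restrict an expression to one field; "*" means all fields.
def in_field(field, expression)
  "#{field}:(#{expression})"
end

# Match words similar to the term.
def fuzzy(term)
  "#{term}~"
end

# Short articles about sybase, built from the pieces.
short_sybase = [in_field("long_article", false), in_field("*", "sybase")].join(" ")
puts short_sybase
```

The resulting string can be passed to Article.find_by_contents like any of the hand-written queries.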

Pagination

Ferret is fast, ferret is flexible, but… it is not an ActiveRecord object, so you cannot use the pre-defined pagination. You have to implement it on your own. Here is how we did it in our project www.tamtami.com.

1. Create a full text search function in the model

  def self.full_text_search(q, options = {})
    return nil if q.nil? or q==""
    default_options = {:limit => 10, :page => 1}
    options = default_options.merge options
    options[:offset] = options[:limit] * (options[:page].to_i-1)
    results_ids = []
 
    num = self.ferret_index.search_each("*:(#{q})", {:num_docs => options[:limit], :first_doc => options[:offset]}) { |doc, score|
      results_ids << self.ferret_index[doc]["id"]
    }
    results = Article.find(results_ids)
    return [num, results]
  end

or, more elegantly, as proposed by Jens Kraemer

  def self.full_text_search(q, options = {})
    return nil if q.nil? or q==""
    default_options = {:limit => 10, :page => 1}
    options = default_options.merge options
    options[:offset] = options[:limit] * (options.delete(:page).to_i-1)  
    results = Article.find_by_contents(q, options)
    return [results.total_hits, results]
  end

2. Create a method in application.rb that builds the paginator

  def pages_for(size, options = {})
    default_options = {:per_page => 10}
    options = default_options.merge options
    pages = Paginator.new self, size, options[:per_page], (params[:page]||1)
    pages
  end

3. Perform the search in the controller

  def search
    @query=params[:query]
    @total, @articles = Article.full_text_search(@query, :page => (params[:page]||1))
    @pages = pages_for(@total)
  end

4. Use it in the article view

...
   <%= pagination_links(@pages, :params => {:query=>@query}) %>
...
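
The arithmetic behind the steps above can be checked without Rails. The sketch below is plain Ruby with hypothetical helper names (offset_for, page_count, in_ranked_order). The last helper rests on an assumption of mine: as far as I know, finding records by a list of ids does not promise to return them in the order of that list, so the relevance order from the index may need to be restored by hand.

```ruby
# Offset of the first hit on a 1-based page; mirrors limit * (page - 1).
def offset_for(page, limit)
  page = [page.to_i, 1].max   # treat nil, "" and 0 as the first page
  limit * (page - 1)
end

# Number of pages needed to show total_hits results.
def page_count(total_hits, per_page)
  (total_hits.to_f / per_page).ceil
end

# Put loaded records back into the order of the ranked ids.
# Assumption: the index may hand back ids as strings, hence to_i.
def in_ranked_order(records, ranked_ids)
  by_id = {}
  records.each { |record| by_id[record.id] = record }
  ranked_ids.map { |id| by_id[id.to_i] }.compact
end

Row = Struct.new(:id, :title)   # stand-in for an ActiveRecord object
rows = [Row.new(1, "Databases and IT"), Row.new(4, "Databases and people")]
puts in_ranked_order(rows, ["4", "1"]).map(&:title)
```

With 25 hits and 10 per page, page_count gives 3 pages, and the third page starts at offset 20.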

Final word

The ferret full text engine is fast and flexible, but it needs more programming than a MySQL full text index.

Full text search in Ruby on Rails 2 - MySQL

My previous post compared the MySQL and ferret full text search
engines. For our project, ferret was the winner. Nevertheless, I
will try to show the beauty and simplicity of using MySQL indexes.


Create table and indices


First of all, it is necessary to create the table and the corresponding
index.

CREATE TABLE articles(
  id integer NOT NULL PRIMARY KEY AUTO_INCREMENT,
  title varchar(20),
  body varchar(100),
  fulltext(title, body)
) engine = MyISAM;

or create an index after the table exists (this one covers only the body column).

CREATE fulltext INDEX x_f_articles_body ON articles(body);

Please note the MyISAM engine. You cannot create a full text index
on InnoDB tables (this holds for MySQL versions before 5.6).


And now, let’s insert some data

INSERT INTO articles(title, body)
  SELECT "Databases and IT", "Todays world ... database... MySQL, Sybase";
 
INSERT INTO articles(title, body) 
  SELECT "Sybase RS manual", "Sybase Replication server is a ...";
 
INSERT INTO articles(title, body) 
  SELECT "Sybase technology", "ASE, RS, IQ, PowerBuilder, all of them are...";
 
INSERT INTO articles(title, body) 
  SELECT "Databases and people", "People are using databases without knowing it...";
 
INSERT INTO articles(title, body) 
  SELECT "People everywhere", "Human population is increasing...";

Query syntax


Querying is simple. It is part of the MySQL dialect. A simple natural language query (the default mode) that searches for articles with the “Databases”
keyword looks similar to this:

SELECT * 
FROM articles 
WHERE match(title,body) against ("Databases");

id  title                 body
4   Databases and people  People are using databases without knowing it…
1   Databases and IT      Today’s world … database… MySQL, Sybase

Or you can create a query that looks for all database-related articles

SELECT * 
FROM articles 
WHERE match(title,body) against ("Databases" WITH query expansion);

id  title                 body
1   Databases and IT      Todays world … database… MySQL, Sybase
4   Databases and people  People are using databases without knowing it…
5   People everywhere     Human population is increasing…

Note that the “match” columns must be the same as those in the
create index statement.


Look at the record with ID=5. It says nothing about databases, but it
is in the result set anyway. Query expansion means that the MySQL
engine searches twice. In the first pass it finds the records
containing the searched keyword and builds a set of keywords that
appear together with it. In the second pass it searches for the
expanded set of keywords. Since “Databases” appears together with
“People” in record ID=4, record 5 was returned as relevant.

This is useful. Unfortunately the result set is often too big.


Also note that records 2 and 3, which contain the Sybase keyword, are missing (remember, record 1 contains both the Databases and Sybase keywords, so Sybase is part of the expanded set). The reason is
simple. MySQL weights words according to their frequency. If a word
appears in too many rows (for MyISAM natural language searches, in 50% or more of them), its weight becomes 0 and it is
no longer relevant. Here Sybase appears in three of the five rows. Always keep this in mind, because it’s a feature, not a
bug!
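
To see how the two passes and the frequency cut-off interact, here is a toy model in plain Ruby. It is not the real MyISAM algorithm: the tokenizer, the minimum word length of 4, the 50% cut-off applied as a hard filter, and all names are simplifying assumptions of mine. With the five sample rows it happens to reproduce the result sets shown above.

```ruby
# Toy model of query expansion; NOT the real MyISAM algorithm.
ARTICLES = {
  1 => "Databases and IT Todays world ... database... MySQL, Sybase",
  2 => "Sybase RS manual Sybase Replication server is a ...",
  3 => "Sybase technology ASE, RS, IQ, PowerBuilder, all of them are...",
  4 => "Databases and people People are using databases without knowing it...",
  5 => "People everywhere Human population is increasing..."
}

MIN_WORD_LENGTH = 4   # assumption: short words are not indexed
THRESHOLD = 0.5       # assumption: words in 50% or more of rows get weight 0

# Lowercased, de-duplicated words of sufficient length.
def tokenize(text)
  text.downcase.scan(/[a-z]+/).select { |w| w.length >= MIN_WORD_LENGTH }.uniq
end

# id => list of indexed words
def build_index(articles)
  index = {}
  articles.each { |id, text| index[id] = tokenize(text) }
  index
end

def too_frequent?(word, index)
  hits = index.values.count { |words| words.include?(word) }
  hits.to_f / index.size >= THRESHOLD
end

# Ids of rows containing at least one usable term.
def search(terms, index)
  usable = terms.reject { |term| too_frequent?(term, index) }
  index.select { |_id, words| (words & usable).any? }.keys.sort
end

# Pass 1 finds rows for the term; pass 2 searches for every word in them.
def search_with_expansion(term, index)
  first_pass = search([term], index)
  expanded = first_pass.flat_map { |id| index[id] }.uniq
  search(expanded, index)
end

INDEX = build_index(ARTICLES)
p search(["databases"], INDEX)
p search_with_expansion("databases", INDEX)
p search(["sybase"], INDEX)
```

In this model "people" enters the expanded set through record 4 and pulls in record 5, while "sybase" is discarded because it occurs in three of the five rows.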


Search


So, let’s see it in action. The fulltext search in rails is as simple as
any other search.

Article.find(:all, :conditions => ["match(title,body) against (?)", "Databases"] )

Naturally, it is possible to combine it with other expressions:

Article.find(:all, :conditions => ["match(title,body) against (?) and id > ?",
"Databases", 2] )

Pagination


Pagination is as simple as it could be. You can use the same
pagination methods as for any other ActiveRecord query.

def list
  @article_pages, @articles = paginate :articles,
    :per_page => 10, 
    :conditions => ["match(title,body) against  (?)", "Databases"]
end

Scoping


Scoping is something you cannot do with ferret. How does it work?


First of all, imagine that you have a really “complex”
function looking for new articles (ID > 2).

def new_articles
  Article.find( :all, :conditions => [ "id > ?", 2 ] )
end

Then your boss comes and says… OK, but I would like to have
the possibility to display only new articles about… for example
“Databases”.

Well, you can change the function, or you can scope it. Since rewriting functions will not teach you anything new, we will try to
scope it.

def bosses_DB_articles
  Article.with_scope(:find => {:conditions => ["match(title,body) against (?)", "Databases"]} ) do
    new_articles
  end
end

And that’s the whole trick!
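
Conceptually, the scope adds its condition to whatever condition the inner finder already uses. The plain Ruby sketch below shows that idea for two condition arrays in the [sql, bind values...] form used throughout this post. The helper name and_conditions is hypothetical, and this is only my illustration of the effect, not how Rails implements with_scope.

```ruby
# Combine two condition arrays of the form [sql, bind1, bind2, ...] with AND.
# Hypothetical helper; it illustrates the effect of scoping only.
def and_conditions(outer, inner)
  sql = "(#{outer.first}) AND (#{inner.first})"
  [sql] + outer[1..-1] + inner[1..-1]
end

scope  = ["match(title,body) against (?)", "Databases"]
finder = ["id > ?", 2]

combined = and_conditions(scope, finder)
p combined
```

The bind values stay in the same order as their placeholders, so the combined array could be used as a single :conditions value.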


Final word


As I said, searching with MySQL is really easy, but sometimes
it gives unexpected results.