Ad-hoc fulltext search in RoR ActiveRecord

I ran into a situation where I needed to search my ActiveRecord objects but did not know which field contained the information. The solution with Ferret was just three steps away…

Let’s say you want to search Stories for the keyword ‘Giant’. You create a Ferret index in memory (the ferret gem needs to be installed), index all the records, and gather the IDs of those that match the keyword.

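require 'rubygems'
require 'ferret'

# In-memory index (no :path given)
index = Ferret::I.new

# Index each record's inspect output so that every attribute is searchable
Story.find(:all).each { |s| index << {:id => s.id, :content => s.inspect} }

# Print the record ID and score of up to 100 matches
index.search_each('Giant', :limit => 100) do |id, score|
  puts "Active record ID: #{index[id][:id]} with score #{score}"
end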
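
To see why indexing the inspect output finds a keyword without knowing its field, here is a toy, Ferret-free sketch of the same idea. The Struct, the sample stories and the helper names build_index and matching_ids are made up for illustration; Ferret itself adds proper text analysis and scoring on top of this.

```ruby
# Stand-in for an ActiveRecord model (hypothetical sample data).
story_class = Struct.new(:id, :title, :body)

stories = [
  story_class.new(1, "The Giant Turnip", "A farmer grows a huge vegetable."),
  story_class.new(2, "Small Tales", "Nothing large happens here."),
  story_class.new(3, "Beanstalk", "Jack meets a giant in the clouds.")
]

# Naive inverted index: lowercased word => IDs of the records whose
# inspect output contains that word.
def build_index(records)
  index = Hash.new { |hash, word| hash[word] = [] }
  records.each do |record|
    record.inspect.downcase.scan(/[a-z0-9]+/).uniq.each do |word|
      index[word] << record.id
    end
  end
  index
end

# IDs of all records matching the keyword, case-insensitively.
def matching_ids(index, keyword)
  index.fetch(keyword.downcase, [])
end

index = build_index(stories)
giant_ids = matching_ids(index, "Giant")
puts "Matching IDs: #{giant_ids.inspect}"
```

Because inspect also prints the attribute names, a search for a word such as "title" matches every record. The same caveat applies to the Ferret version above.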

… now you have the full power of the Ferret engine in your hands.

acts_as_ferret tip: uninitialized constant Ferret::Index::FieldInfos

I upgraded the acts_as_ferret plugin to the latest version and my fulltext search stopped working. It threw this error:

NameError (uninitialized constant Ferret::Index::FieldInfos):
    /opt/local/lib/ruby/gems/1.8/gems/activesupport-1.4.2/lib/active_support/dependencies.rb:263:in `load_missing_constant'
    /opt/local/lib/ruby/gems/1.8/gems/activesupport-1.4.2/lib/active_support/dependencies.rb:452:in `const_missing'
    .//vendor/plugins/acts_as_ferret/lib/acts_as_ferret.rb:119:in `field_infos'
    .//vendor/plugins/acts_as_ferret/lib/local_index.rb:58:in `rebuild_index'
    .//vendor/plugins/acts_as_ferret/lib/local_index.rb:37:in `ensure_index_exists'
    .//vendor/plugins/acts_as_ferret/lib/local_index.rb:9:in `initialize'
    .//vendor/plugins/acts_as_ferret/lib/class_methods.rb:304:in `new'
    .//vendor/plugins/acts_as_ferret/lib/class_methods.rb:304:in `create_index_instance'
    .//vendor/plugins/acts_as_ferret/lib/class_methods.rb:55:in `aaf_index'
    .//vendor/plugins/acts_as_ferret/lib/class_methods.rb:120:in `find_id_by_contents'
    .//vendor/plugins/acts_as_ferret/lib/class_methods.rb:176:in `ar_find_by_contents'
    .//vendor/plugins/acts_as_ferret/lib/class_methods.rb:170:in `find_records_lazy_or_not'
    .//vendor/plugins/acts_as_ferret/lib/class_methods.rb:86:in `find_by_contents'

If you see the same error, try running ruby setup.rb in the directory …/ruby/gems/1.8/gems/ferret-0.11.4. It seems that gem install ferret needs a little help.

Full text search in Ruby on Rails 3 - ferret

There are several ways to use Ferret in RoR. This post shows the easy way: using the acts_as_ferret plugin.

To show the syntax and code, I will use the same data objects as in the post Full text search in ruby on rails 2 – MySQL.

Installation

Ferret installation is easy

gem install ferret

will do the job.

In addition, it is necessary to install the acts_as_ferret plugin.

script/plugin install svn://projects.jkraemer.net/acts_as_ferret/tags/stable/acts_as_ferret

Setup

The simplest setup is

class Article < ActiveRecord::Base
  acts_as_ferret
end

This is enough to get the full text engine working. Now you can test it in the Rails console

Article.find_by_contents("sybase")

If you have a lot of data to be indexed, be patient with the first run. It is slow, because the index needs to be built.

With no arguments, acts_as_ferret automatically indexes all fields of Article, including arrays of child objects. This behaviour can be overridden. You can narrow the field set

# Index only id and body, not title
acts_as_ferret :fields => [ 'id', 'body' ]

Or you can widen the field set.

acts_as_ferret :fields => [ 'id', 'body', 'title', 'long_article' ]
 
# True when the article body is longer than 40 characters
def long_article
  self.body.length > 40
end

Note 1: see usage of long_article in Query syntax below

Note 2: once you change the structure of the index, you need to rebuild it. The easiest way is to stop your application and delete the index/~environment~/~Indexed object~ folder. It will be created automatically with the next search request.
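
If you rebuild often, deleting the folder can be scripted. The following is only a sketch: remove_index_dir is a hypothetical helper, and the environment and folder names are passed in because the exact folder name depends on your setup. Stop the application before running it.

```ruby
require "fileutils"

# Hypothetical helper: delete <app_root>/index/<environment>/<index_name>.
# Returns true if a folder was removed, false if there was none.
def remove_index_dir(app_root, environment, index_name)
  dir = File.join(app_root, "index", environment, index_name)
  return false unless File.directory?(dir)
  FileUtils.rm_rf(dir)
  true
end

# Example call (the arguments are assumptions about your application):
# remove_index_dir("/path/to/app", "development", "article")
```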

Query syntax

Since Ferret is a port of the Lucene engine, it uses the same query syntax. I will show only a few queries that you can use.

For details, see the Lucene documentation.

  # Search for pages with "sybase" keyword
  Article.find_by_contents("sybase")
 
  # "sybase" and "replication" keywords
  Article.find_by_contents("sybase replication")
 
  # "sybase" or "replication"
  Article.find_by_contents("sybase OR replication")
 
  # short articles about sybase
  Article.find_by_contents("long_article:(false) *:sybase")
 
  # fuzzy search: articles containing words similar to "increase",
  # e.g. "increasing"
  Article.find_by_contents("increase~")

Pagination

Ferret is fast, Ferret is flexible, but… its index is not an ActiveRecord object, so you cannot use the predefined pagination. You have to implement it yourself. Here is how we did it in our project, www.tamtami.com.

1. Create a full text search function in the model

  def self.full_text_search(q, options = {})
    return nil if q.nil? or q==""
    default_options = {:limit => 10, :page => 1}
    options = default_options.merge options
    options[:offset] = options[:limit] * (options[:page].to_i-1)
    results_ids = []
 
    num = self.ferret_index.search_each("*:(#{q})", {:num_docs => options[:limit], :first_doc => options[:offset]}) { |doc, score|
      results_ids << self.ferret_index[doc]["id"]
    }
    results = Article.find(results_ids)
    return [num, results]
  end

or, more elegantly, as proposed by Jens Kraemer

  def self.full_text_search(q, options = {})
    return nil if q.nil? or q==""
    default_options = {:limit => 10, :page => 1}
    options = default_options.merge options
    options[:offset] = options[:limit] * (options.delete(:page).to_i-1)  
    results = Article.find_by_contents(q, options)
    return [results.total_hits, results]
  end

2. Create a method in application.rb that builds the paginator

  def pages_for(size, options = {})
    default_options = {:per_page => 10}
    options = default_options.merge options
    pages = Paginator.new self, size, options[:per_page], (params[:page]||1)
    pages
  end

3. Perform the search in the controller

  def search
    @query=params[:query]
    @total, @articles = Article.full_text_search(@query, :page => (params[:page]||1))	  
    @pages = pages_for(@total)
  end

4. Use it in the article view

...
   <%= pagination_links(@pages, :params => {:query=>@query}) %>
...
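
The arithmetic behind these four steps can be tried on its own. The sketch below uses hypothetical helpers (offset_for, page_count, unpack_search_result) that belong to neither acts_as_ferret nor Rails. It also covers two edge cases in the code above: a page parameter of 0 would give a negative offset, and full_text_search returns nil for a blank query.

```ruby
# Hypothetical helpers, not part of acts_as_ferret or Rails.

# Turn a 1-based page number (possibly a String or nil from params) into
# the offset of the first result on that page.
def offset_for(page, limit)
  page_number = [page.to_i, 1].max # nil, "0" and negatives fall back to page 1
  limit * (page_number - 1)
end

# Number of pages needed to show `total` hits, `per_page` at a time.
def page_count(total, per_page)
  (total + per_page - 1) / per_page # integer ceiling division
end

# full_text_search returns nil for a blank query; normalise that to
# "zero hits" so the caller can always unpack a total and a list.
def unpack_search_result(result)
  total, records = result || [0, []]
  [total, records]
end

offset_for("3", 10)        # third page of ten starts at offset 20
page_count(25, 10)         # 25 hits need 3 pages
unpack_search_result(nil)  # blank query becomes [0, []]
```

In the controller, the last helper could wrap the call to Article.full_text_search so that pages_for always receives a number.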

Final word

The Ferret fulltext engine is fast and flexible, but it needs more programming than the MySQL full text index.