Full text search in Ruby on Rails 3 – ferret

There are several ways to use Ferret in Rails. This post shows the easy way: using the acts_as_ferret plugin.

To show the syntax and code, I will use the same data objects as in the post Full text search in ruby on rails 2 – MySQL.

Installation

Ferret installation is easy

gem install ferret

will do the job.
In addition, it is necessary to install the acts_as_ferret plugin.

script/plugin install svn://projects.jkraemer.net/acts_as_ferret/tags/stable/acts_as_ferret

Setup

The simplest setup is

class Article < ActiveRecord::Base
  acts_as_ferret
end

This is enough to get the full text engine working. Now you can test it in the Rails console:

Article.find_by_contents("sybase")

If you have a lot of data to be indexed, be patient with the first run. It is slow, because the index needs to be built.

Called with no arguments, acts_as_ferret automatically indexes all fields of the Article, including arrays of child objects. This behaviour can be overridden. You can narrow the field set:

# Index only id and body, not title
acts_as_ferret :fields => [ 'id', 'body' ]

Or you can widen the field set.

acts_as_ferret :fields => [ 'id', 'body', 'title', 'long_article' ]

# True when the article body is longer than 40 characters
def long_article
  self.body.length > 40
end

Note 1: see usage of long_article in Query syntax below
Note 2: once you change the structure of the index, you need to rebuild it. The easiest way is to stop your application and delete the index/~environment~/~Indexed object~ folder. It will be created automatically with the next search request.
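
As a minimal sketch of the manual cleanup described in Note 2, the index folder can also be removed from a small Ruby script while the application is stopped. The helper name and its arguments are hypothetical; only the index/<environment>/<model folder> layout comes from the note above.

```ruby
require 'fileutils'

# Hypothetical helper: removes the on-disk index folder for one model.
# The layout index/<environment>/<model folder> follows Note 2;
# the method name and argument names are assumptions of this sketch.
def remove_ferret_index(app_root, environment, model_dir)
  path = File.join(app_root, 'index', environment, model_dir)
  # Nothing to do if the index was never built or is already gone
  return false unless File.directory?(path)
  FileUtils.rm_rf(path)
  true
end
```

For example, remove_ferret_index(Dir.pwd, 'development', 'article') would target index/development/article under the current directory. The folder name 'article' is an assumption, so check what actually exists on disk first.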

Query syntax

Since Ferret is a port of the Lucene engine, it uses the same query syntax. I will show only a few of the queries that you can use.
For details, see the Lucene documentation.

  # Search for pages with "sybase" keyword
  Article.find_by_contents("sybase")

  # "sybase" and "replication" keywords
  Article.find_by_contents("sybase replication")

  # "sybase" or "replication"
  Article.find_by_contents("sybase OR replication")

  # short articles about sybase
  Article.find_by_contents("long_article:(false) *:sybase")

  # articles containing words similar to "increase" (fuzzy search)
  # will return e.g. "increasing"
  Article.find_by_contents("increase~")
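
The trailing ~ in the last query asks for a fuzzy match. In Lucene-style engines this kind of match is based on the edit distance between terms. The sketch below is plain Ruby, not Ferret code, and only illustrates that distance; the method name is made up for this example, and the threshold Ferret applies on top of the distance is not shown.

```ruby
# Classic dynamic-programming edit distance: the minimum number of
# single-character insertions, deletions and substitutions needed
# to turn string a into string b.
def edit_distance(a, b)
  # prev[j] holds the distance between the processed prefix of a
  # and the first j characters of b
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

edit_distance("increase", "increasing")  # => 3
```

Three edits separate "increase" from "increasing" (one substitution and two insertions), which gives a feel for why a fuzzy query can still pick it up.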

Pagination

Ferret is fast and flexible, but it is not an ActiveRecord object, so you cannot use the predefined pagination. You have to implement it yourself. Here is how we did it in our project www.tamtami.com.

1. Create a full text search function in the model

  def self.full_text_search(q, options = {})
    # No query: report zero hits so the caller can still destructure the result
    return [0, []] if q.nil? || q == ""
    default_options = {:limit => 10, :page => 1}
    options = default_options.merge options
    options[:offset] = options[:limit] * (options[:page].to_i-1)
    results_ids = []

    num = self.ferret_index.search_each("*:(#{q})", {:num_docs => options[:limit], :first_doc => options[:offset]}) { |doc, score|
      results_ids << self.ferret_index[doc]["id"]
    }
    results = Article.find(results_ids)
    return [num, results]
  end

Or, more elegantly, as proposed by Jens Kraemer:

  def self.full_text_search(q, options = {})
    # No query: report zero hits so the caller can still destructure the result
    return [0, []] if q.nil? || q == ""
    default_options = {:limit => 10, :page => 1}
    options = default_options.merge options
    options[:offset] = options[:limit] * (options.delete(:page).to_i-1)  
    results = Article.find_by_contents(q, options)
    return [results.total_hits, results]
  end

2. Create a method in application.rb that builds the paginator

  def pages_for(size, options = {})
    default_options = {:per_page => 10}
    options = default_options.merge options
    pages = Paginator.new self, size, options[:per_page], (params[:page]||1)
    pages
  end

3. Perform the search in the controller

  def search
    @query=params[:query]
    @total, @articles = Article.full_text_search(@query, :page => (params[:page]||1))	  
    @pages = pages_for(@total)
  end

4. Use it in the article view

...
   <%= pagination_links(@pages, :params => {:query=>@query}) %>
...
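
The page-to-offset arithmetic from step 1 can be checked in isolation. This is a minimal sketch in plain Ruby; the helper name page_offset is hypothetical, and clamping pages below 1 is an addition of this sketch, not part of the original model code.

```ruby
# Hypothetical helper mirroring options[:limit] * (page - 1) from step 1.
# Pages below 1 (including nil or a non-numeric string, which to_i turns
# into 0) are clamped to page 1 so the offset never becomes negative.
def page_offset(page, limit = 10)
  page_number = [page.to_i, 1].max
  limit * (page_number - 1)
end

page_offset(1)        # => 0
page_offset(3)        # => 20
page_offset("2", 5)   # => 5
```

Without the clamp, a request with page=0 would produce an offset of -10 with the default limit, so it may be worth guarding against that wherever the page parameter comes straight from the URL.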

Final word

The Ferret full text engine is fast and flexible, but it needs more programming than the MySQL full text index.

21 comments

  1. Do searches with acts_as_ferret also search foreign keys fields? For example if Article had a field author_id which linked to a model Author would I be able to search by author name and get articles back?

  2. In order to conform to the usage of +options[:limit]+ in #find_by_contents, Model::full_text_search could be modified like so (in pseudo-diff format):

    - options[:offset] = options[:limit] * ( options.delete(:page).to_i - 1 )
    + if options[:limit].respond_to?('*')
    + options[:offset] = options[:limit] * ( options.delete(:page).to_i - 1 )
    + end

  3. Matthew, to do what you want, the easiest thing to do is define a method in your Article class that acts_as_ferret refers to. For example, in the article class

      acts_as_ferret :fields => [:title, :body, :author_name]
      has_one :author
    
      def author_name
        author.name
      end
    

    The :author_name symbol refers to the method which ferret calls when it does its magic.

    You can also do something similar with has_many relationships like this:

      acts_as_ferret :fields => [:title, :body, :author_names]
      has_many :authors
    
      def author_names
        authors.collect{|a| a.name}
      end
    

    If you found that useful, then you can thank me by visiting my website (squeat.com).

  4. Any chance of Ajaxing this? I'd like to figure out how to do instant feedback search with ferret … any help would be great.

  5. 2 Nico
    Well, I would change the search query to something like “YourString*”.
    But there is a danger. The ferret engine first tries to expand a wildcard query into non-wildcard queries. As a result, there might be too many “subqueries” and the search might fail with this error message:

    Exception (: Error occured at :54
    Error: exception 6 not handled: Too many clauses
    ):

  6. I use a common function, added to my application.rb, for pagination:

      def paginate_collection(collection, options = {})
        default_options = {:per_page => 5, :page => 1}
        options = default_options.merge options
        
        pages = Paginator.new self, collection.size, options[:per_page], options[:page]
        first = pages.current.offset
        last = [first + options[:per_page], collection.size].min
        slice = collection[first...last]
        return [pages, slice]
      end
    

    Then, I can use in Article controller:

      def search
        @article_pages, @articles = paginate_collection Article.find_by_contents(params[:tag]), :per_page => 10, :page => params[:page]
        render :action => 'list'
      end
    

    I hope this could be helpful for you.

  7. 2 Gregg Pollack
    Thanks Gregg, your tutorial is great!
    It is a pity I did not find it before. It could save me a lot of time :o)

  8. Related to Matthew's and Tom's posts…

    I am trying to search a model’s relationships. For example Books have many authors and I want a search on Shakespeare to return all books by him. So in my Books model I have:

    acts_as_ferret :fields => [:title, :abstract, :author_name]
    has_many :authors

    def author_name
    authors.collect{|a| a.name}
    end

    But no results are found when I search for Shakespeare.

    Any suggestions?

  9. 2 Kim:
    I would guess the problem is the author_name method. It does not return a string, but a collection. I would try changing it to:

    def author_name
    authors.collect{|a| a.name}.join(' ')
    end

  10. Roman, thanks that worked. But what is the proper syntax when you are searching authors and want to search by books?

    
    Authors
    belongs_to books
    

    I tried

    
    def author_name
      author.name
    end
    

    Didn't work. Any suggestions? Thanks.

    *My app doesn't really have author and book model. Just an example.*

  11. 2 Kim:

    If you want to search for authors by books, you need to create a new index for your authors model. The same way as you did for books model.

  12. Hi Roman,
    I guess my question was not clear. My question is to do with relationships and the proper syntax for using acts_as_ferret.

    So when a book has_many Authors and we want to search books by authors, we set the acts_as_ferret fields and then do something like this:

    
    def author_name
      authors.collect{|a| a.name}.join(' ')
    end
    

    But now let's say that Books belongs_to a publisher and we want to search books by the publisher's name. So we set the acts_as_ferret fields and when I try:

    
    def publisher_name
      publisher.name
    end
    

    This does not work. So my question is what is the proper syntax for a belongs_to relationship and for that matter a has_one relationship?

    Thanks for your help. This article has helped me out greatly.

  13. Hello Kim,
    this should work. What is not working? What error message do you get?
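
The wildcard expansion mentioned in comment 5 can be pictured with a toy model. The sketch below is plain Ruby and is not Ferret's implementation: the error class, the method name and the max_clauses limit are all made up for illustration. It only shows why a short prefix on a large index can blow past a clause limit.

```ruby
# Toy model of prefix expansion: every indexed term that starts with the
# prefix becomes one clause of the rewritten query.
# TooManyClauses, expand_prefix and max_clauses are hypothetical names.
class TooManyClauses < StandardError; end

def expand_prefix(prefix, indexed_terms, max_clauses)
  matches = indexed_terms.select { |term| term.start_with?(prefix) }
  if matches.size > max_clauses
    raise TooManyClauses, "#{matches.size} clauses exceed the limit of #{max_clauses}"
  end
  matches
end

terms = %w[sybase sybil sql server]
expand_prefix("syb", terms, 10)   # => ["sybase", "sybil"]
# expand_prefix("s", terms, 2) raises TooManyClauses (4 matching terms)
```

The shorter the prefix typed by the user, the more terms it matches, so an instant-feedback search would probably want to wait for a few characters before querying.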
