Archive for the ‘How-to’ Category

Ignore files in subversion

This is a simple procedure for telling Subversion to ignore files or directories.

cd parent_directory
 
# Check the current setup
svn proplist -v .
 
# set the editor to edit the properties
export EDITOR=vi
 
# open up editor with the properties
svn propedit svn:ignore .

A text editor opens (vi in my case). Here you have to specify the files and directories to be ignored, one pattern per line, e.g.

docs
*.log

And last, but not least, commit:

svn commit
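
The entries in svn:ignore are shell-style glob patterns that apply to unversioned items directly inside the directory carrying the property. As a rough preview of what the two patterns above would hide, here is a minimal Ruby sketch. It assumes that File.fnmatch is a close enough approximation of Subversion's own matcher for such simple patterns, and the directory listing is made up for illustration.

```ruby
# Patterns as entered in the svn:ignore property above.
ignore_patterns = ['docs', '*.log']

# Hypothetical directory listing, used only for illustration.
entries = ['docs', 'README', 'server.log', 'log', 'app.rb']

# An entry is ignored when at least one pattern matches its name.
ignored = entries.select do |name|
  ignore_patterns.any? { |pattern| File.fnmatch(pattern, name) }
end

puts ignored.inspect   # → ["docs", "server.log"]
```

Note that the plain name "log" is not matched by *.log, because the pattern requires the dot.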

Improve performance of MySQL driver for RoR

Last week I was working on the performance tuning of a rails application. I ran a profiler and found something very interesting.

I found that there is a procedure that is called very often and takes a lot of time.

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 14.99    18.13     18.13    43430     0.42     0.85  Mysql#get_length
  8.84    28.82     10.69   172751     0.06     0.09  Kernel.===
  8.40    38.98     10.16   306964     0.03     0.04  Fixnum#==
  7.59    48.16      9.18     5566     1.65    30.60  Integer#times
  6.42    55.92      7.76     7582     1.02     1.63  Mysql::Net#read
...
  0.00   120.92      0.00        1     0.00 120920.00  #toplevel

The procedure was Mysql#get_length:

# File src/rails-1.2.3/activerecord/lib/active_record/vendor/mysql.rb
  def get_length(data, longlong=nil)
    return if data.length == 0
    c = data.slice!(0)
    case c
    when 251
      return nil
    when 252
      a = data.slice!(0,2)
      return a[0]+a[1]*256
    when 253
      a = data.slice!(0,3)
      return a[0]+a[1]*256+a[2]*256**2
    when 254
      a = data.slice!(0,8)
      if longlong then
        return a[0]+a[1]*256+a[2]*256**2+a[3]*256**3+
          a[4]*256**4+a[5]*256**5+a[6]*256**6+a[7]*256**7
      else
        return a[0]+a[1]*256+a[2]*256**2+a[3]*256**3
      end
    else
      c
    end
  end

There is obviously room for improvement! Let's replace the multiplications with shifts:

class Mysql
  def get_length(data, longlong=nil)
    return if data.length == 0
    c = data.slice!(0)
 
    case c
    when 251
      return nil
    when 252
      a = data.slice!(0,2)
      return a[0]+(a[1]<<8)
    when 253
      a = data.slice!(0,3)
      return a[0]+(a[1]<<8)+(a[2]<<16)
    when 254
      a = data.slice!(0,8)
      if longlong then
        return a[0]+(a[1]<<8)+(a[2]<<16) +(a[3]<<24)+(a[4]<<32)+(a[5]<<40)+(a[6]<<48)+(a[7]<<56)
      else
        return a[0]+(a[1]<<8)+(a[2]<<16)+(a[3]<<24)
      end
    else
      c
    end
  end
end
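
Both versions decode the same little-endian integer, so the rewrite should not change any result. The following minimal sketch checks that equivalence on a plain array of byte values. The helper names are made up for this example and are not part of the driver.

```ruby
# Little-endian decoding of a byte array, written both ways.
# Both helpers are hypothetical and exist only for this comparison.
def decode_with_multiplication(bytes)
  bytes.each_with_index.sum { |byte, i| byte * 256**i }
end

def decode_with_shifts(bytes)
  bytes.each_with_index.sum { |byte, i| byte << (8 * i) }
end

sample = [0x10, 0x27, 0x00]   # 0x002710, stored least significant byte first

puts decode_with_multiplication(sample)   # → 10000
puts decode_with_shifts(sample)           # → 10000
```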

I ran the profiler again and, well… a piece of wisdom from my university days came to mind: “There is an elegant, simple, nice and obvious solution for each problem. Unfortunately, it is wrong!”

Performance remained the same, so a deeper investigation was needed. It was not difficult to find out that the “else” branch is executed most of the time. Ruby evaluates a case statement by calling === for each “when” clause in turn, so the most common input pays for four comparisons before it reaches “else”. I tried something like this:

  def get_length(data, longlong=nil)
    return if data.length == 0
    c = data.slice!(0)
 
    return c if c<251
 
    case c
    when 251
      return nil
    when 252
      a = data.slice!(0,2)
      return a[0]+(a[1]<<8)
    when 253
      a = data.slice!(0,3)
      return a[0]+(a[1]<<8)+(a[2]<<16)
    when 254
      a = data.slice!(0,8)
      if longlong then
        return a[0]+(a[1]<<8)+(a[2]<<16) +(a[3]<<24)+(a[4]<<32)+(a[5]<<40)+(a[6]<<48)+(a[7]<<56)
      else
        return a[0]+(a[1]<<8)+(a[2]<<16)+(a[3]<<24)
      end
    else
      c
    end
  end
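
The listing above relies on String#slice!(0) returning an integer, which is Ruby 1.8 behaviour. For readers who want to try the idea on a current Ruby, here is a minimal sketch of the same fast path working on an array of byte values. The names read_length and little_endian are my own, and the array input is an assumption made for the example, not how the driver passes data.

```ruby
# Decode a little-endian integer from an array of byte values.
def little_endian(bytes)
  bytes.each_with_index.sum { |byte, i| byte << (8 * i) }
end

# Sketch of get_length for an Array of bytes (hypothetical adaptation).
# Consumes the bytes it reads from the front of the array.
def read_length(bytes, longlong = false)
  return nil if bytes.empty?
  first = bytes.shift
  return first if first < 251          # fast path: a single comparison

  case first
  when 251 then nil                    # NULL marker
  when 252 then little_endian(bytes.shift(2))
  when 253 then little_endian(bytes.shift(3))
  when 254
    field = bytes.shift(8)             # always consumes 8 bytes, as in the original
    little_endian(longlong ? field : field.first(4))
  else first                           # 255 is passed through, as in the original
  end
end

buffer = [5, 252, 0x10, 0x27]
puts read_length(buffer)   # → 5
puts read_length(buffer)   # → 10000
```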

And the performance?

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 11.26    11.25     11.25    43430     0.26     0.38  Mysql#get_length
  8.73    19.97      8.72     5566     1.57    23.25  Integer#times
  7.77    27.73      7.76     7582     1.02     1.63  Mysql::Net#read
  7.27    34.99      7.26     5140     1.41     3.69  Array#each
  5.52    40.50      5.51   139526     0.04     0.05  Fixnum#==
...
  0.00    99.88      0.00        1     0.00 99880.00  #toplevel

Improved! The self time of Mysql#get_length dropped from 18.13 to 11.25 seconds, and the toplevel cumulative time is down by over 20 seconds (from 120.92 to 99.88).
Now, here’s how you can embed this hack into your application:

  • Create a file called e.g. mysql_fix.rb
  • Add the following code to it:
require 'active_record/vendor/mysql'
 
class Mysql
  def get_length(data, longlong=nil)
    return if data.length == 0
    c = data.slice!(0)
 
    return c if c < 251
 
    case c
    when 251
      return nil
    when 252
      a = data.slice!(0,2)
      return a[0]+(a[1]<<8)
    when 253
      a = data.slice!(0,3)
      return a[0]+(a[1]<<8)+(a[2]<<16)
    when 254
      a = data.slice!(0,8)
      if longlong then
        return a[0]+(a[1]<<8)+(a[2]<<16) +(a[3]<<24)+(a[4]<<32)+(a[5]<<40)+(a[6]<<48)+(a[7]<<56)
      else
        return a[0]+(a[1]<<8)+(a[2]<<16)+(a[3]<<24)
      end
    else
      c
    end
  end
end
  • Put it into a directory such as lib/zmok
  • Add the following line to your environment.rb:
require File.join(File.dirname(__FILE__), '../lib/zmok/mysql_fix')
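
The fix works because Ruby classes are open: a second class definition with the same name reopens the class, and a redefined method replaces the earlier one while everything else stays in place. That is also why mysql_fix.rb requires the original driver first. A minimal sketch with a made-up class (VendorDriver is hypothetical and only stands in for the vendored library):

```ruby
# Hypothetical stand-in for a class shipped by a library.
class VendorDriver
  def get_length(bytes)
    :original
  end

  def name
    'vendor driver'
  end
end

# Loaded later: reopening the class replaces only the redefined method.
class VendorDriver
  def get_length(bytes)
    :patched
  end
end

driver = VendorDriver.new
puts driver.get_length([])   # → patched
puts driver.name             # → vendor driver
```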

It is funny how my performance tuning session ended. Instead of changing my Rails application, I ended up improving the MySQL driver.

Bayes classification in Ruby made easy

Recently I was experimenting with Bayes classification in Ruby. At first sight it looks like a difficult topic, but with the right libraries it is interesting and fun.

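Before you start experimenting, you have to install three gems: classifier, madeleine, and the stemmer gem, which is pulled in as a required dependency.

gem install classifier
gem install madeleine

When prompted, confirm the installation of the required stemmer gem.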

To begin with, let's experiment with the plain Bayes classifier.

require 'classifier'
 
bayes = Classifier::Bayes.new 'funny', 'sad', 'neutral'
 
# Train it slightly...
bayes.train 'funny', 'Finally all of them were smiling'
bayes.train :sad, 'Little ill puppy'
bayes.train :neutral, 'Tax declaration'

The classifier is “trained”, so let's ask it something interesting…

bayes.classify 'Everybody have to pay taxes'
=> "Neutral"

Hmmm… this does not look like the expected answer :o). We have probably trained it incorrectly. So, let’s undo it:

# Remove the incorrect statement
bayes.untrain :neutral, 'Tax declaration'
 
# Train it right
bayes.train :sad, 'Tax declaration'
 
# And provide something neutral (if there is no statement for a category, the classifier does not work as expected)
bayes.train :neutral, 'Rainbow is full of colors'

So, how does the classifier see it now?

bayes.classify 'Everybody have to pay taxes'
=>'Sad'

Yes, this is how people feel about it :o). For those who do not agree (and also for debugging purposes), it is possible to see the score for each category. The category with the highest score wins.

bayes.classifications 'Everybody have to pay taxes'
=> {"Sad"=>-9.43348392329039, "Neutral"=>-10.2035921449865, "Funny"=>-10.2035921449865}
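
To get a feeling for where such negative scores come from, here is a minimal naive Bayes sketch in plain Ruby. It is not the classifier gem's implementation: TinyBayes is a made-up class, it does no stemming or stop-word removal, and it assumes add-one smoothing, so its numbers will differ from the output above. The idea is the same, though: each category's score is a sum of logarithms of word probabilities, and the highest score wins.

```ruby
# Hypothetical, simplified word-count classifier for illustration only.
class TinyBayes
  def initialize(*categories)
    @counts = {}
    categories.each { |c| @counts[c.to_s] = Hash.new(0) }
  end

  def train(category, text)
    words(text).each { |word| @counts.fetch(category.to_s)[word] += 1 }
  end

  def untrain(category, text)
    counts = @counts.fetch(category.to_s)
    words(text).each do |word|
      counts[word] -= 1 if counts[word] > 0
      counts.delete(word) if counts[word].zero?
    end
  end

  # Sum of log probabilities per category, with add-one smoothing
  # so that unseen words do not produce log(0).
  def classifications(text)
    vocabulary = @counts.values.flat_map(&:keys).uniq.size
    @counts.each_with_object({}) do |(category, counts), scores|
      total = counts.values.sum
      scores[category] = words(text).sum do |word|
        Math.log((counts[word] + 1.0) / (total + vocabulary))
      end
    end
  end

  def classify(text)
    classifications(text).max_by { |_, score| score }.first
  end

  private

  def words(text)
    text.downcase.scan(/[a-z]+/)
  end
end

bayes = TinyBayes.new('funny', 'sad', 'neutral')
bayes.train 'funny', 'Finally all of them were smiling'
bayes.train :sad, 'Little ill puppy'
bayes.train :neutral, 'Tax declaration'
puts bayes.classify('tax form')   # → neutral

bayes.untrain :neutral, 'Tax declaration'
bayes.train :sad, 'Tax declaration'
bayes.train :neutral, 'Rainbow is full of colors'
puts bayes.classify('tax form')   # → sad
```

Because there is no stemming here, the query uses the word "tax" exactly as it was trained.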

The classifier that was created and trained is nice, but it disappears as soon as you close your Ruby console. To make it persistent, you can use the Madeleine library.
“Madeleine is a Ruby implementation of Object Prevalence, that is, transparent persistence of business objects using command logging and complete system snapshots.”

require 'madeleine'

# Store the data in the bayes-dir directory
madeleine = SnapshotMadeleine.new("bayes-dir") { bayes }
madeleine.take_snapshot

Next time, load the classifier like this:

madeleine = SnapshotMadeleine.new("bayes-dir")
 
# Perform more training
madeleine.system.train "sad", "Many people were injured by the earthquake"
 
# And test it once more
madeleine.system.classify 'smiling face'
=>'funny'
madeleine.system.classify 'strong earthquake'
=>'sad'

The classifier is a nice piece of code. I did enjoy it, and hope you will enjoy it too.
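
For readers curious what a “complete system snapshot” amounts to, here is a minimal sketch of the idea using only Ruby's built-in Marshal. It is not Madeleine's API and it leaves out command logging entirely. TrainedModel, take_snapshot and load_snapshot are made-up names, and the file name snapshot.bin is an assumption for this example.

```ruby
require 'tmpdir'

# Hypothetical stand-in for a trained classifier: any Marshal-friendly object works.
TrainedModel = Struct.new(:word_counts)

# Serialize the whole object graph into one file inside dir.
def take_snapshot(dir, object)
  path = File.join(dir, 'snapshot.bin')
  File.binwrite(path, Marshal.dump(object))
  path
end

# Rebuild the object from the file written by take_snapshot.
def load_snapshot(dir)
  Marshal.load(File.binread(File.join(dir, 'snapshot.bin')))
end

Dir.mktmpdir('bayes-dir') do |dir|
  model = TrainedModel.new({ 'sad' => { 'tax' => 1 } })
  take_snapshot(dir, model)
  restored = load_snapshot(dir)
  puts restored == model   # → true
end
```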

Ad-hoc fulltext search in RoR ActiveRecord

I came to a situation where I needed to search my ActiveRecord models, but I did not know which field contained the information. The solution with Ferret was just three steps away…

Let’s say you want to search Stories for the keyword ‘Giant’. You have to create a Ferret index in memory (the ferret gem needs to be installed), index all ActiveRecord objects, and gather all IDs matching the keyword.

require 'ferret'
 
index = Ferret::I.new
 
Story.find(:all).each { |s| index << {:id=>s.id, :content=>s.inspect} }
 
index.search_each('Giant', :limit=>100) do |id, score| 
  puts "Active record ID: #{index[id][:id]} with score #{score}"
end

… now you have the full power of the Ferret engine in your hands.
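
The trick is that the whole record is indexed as a single text field, so the search does not need to know which column holds the keyword. The following minimal sketch shows the same idea without Ferret: a naive, case-insensitive substring scan with no scoring or tokenizing. StoryRow and its sample data are made up for illustration and stand in for ActiveRecord rows.

```ruby
# Hypothetical records standing in for ActiveRecord rows.
StoryRow = Struct.new(:id, :title, :body)

stories = [
  StoryRow.new(1, 'The Giant Turnip', 'A farmer grows a huge vegetable.'),
  StoryRow.new(2, 'Short tale', 'Nothing big happens here.'),
  StoryRow.new(3, 'Mountains', 'A giant lives behind the hills.')
]

# "Index" every record by the text of all its fields at once.
index = stories.map { |s| { id: s.id, content: s.to_a.join(' ').downcase } }

# Return the IDs of all documents whose content contains the keyword.
def search(index, keyword)
  needle = keyword.downcase
  index.select { |doc| doc[:content].include?(needle) }.map { |doc| doc[:id] }
end

puts search(index, 'Giant').inspect   # → [1, 3]
```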

Utilizing Caches.rb with Ferret

We needed to cache a Ruby class method calling the Ferret indexing engine.
Yurii Rashkovskii developed a great library called Caches.rb.

When I found it, it seemed very simple to use and promised to do EXACTLY what I needed (even the default timeout was JUST right). I especially liked the very Rails-like tutorial “Don’t tell, show me!”. However, it required quite some effort to make it work, mainly because of the rather sparse documentation. Still, in the end the usage is very elegant, the solution is simple and it does what it promises. Thank you, Yurii!

To help our esteemed readers get faster over that less agreeable middle phase, here are a few tips:

  • Downloading it: I tried gems but the gem list server seemed to be overloaded, and when it worked at last, I just got an older version (0.2.0). When checking out (or exporting) version 0.4.0 from SVN trunk directly, the trick was in finding out the latest working SVN URL:
    ruby script/plugin install http://svn.verbdev.com/rb/caches.rb/trunk

  • With all the typical Rails mixin stuff petrified in my mind, it took me a while to notice that caching should be declared AFTER the definition of the method to be cached, and not at the beginning of the class definition. The example in the documentation shows it, but it’s easily overlooked.

  • We do use Rails, so I included class_cache_storage Caches::Storage::Global as suggested here
  • For some reason (I suspect my shallow knowledge of Ruby ;-), I did not manage to require 'caches.rb' from the plugin installation directory #{RAILS_ROOT}/vendor/plugins/caches.rb/lib, so I copied it to #{RAILS_ROOT}/lib, which helped.
  • For similar reasons, I had to use a workaround to extend the class definition with caching, instead of the recommended way of adding ActiveRecord::Base.extend Caches::ClassMethods to config/environment.rb

So our class looks as follows:

require 'ferret'
include Ferret
require 'caches.rb'
 
class FIndex
  extend Caches
  ...
  def self.search(user=nil, results_per_page=10)
    ...
  end
  class_cache_storage Caches::Storage::Global
  class_caches :search
end

N.B.: The class to be cached works with Ferret and not the DB, so it did not inherit from ActiveRecord.
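
Why the declaration has to come after the method becomes clearer with a small sketch of how such a macro can work: it looks up the existing class method and wraps it, so the method must already exist. This is not the Caches.rb implementation. TinyCache, cache_class_method and SlowIndex are made-up names, and the in-memory hash with a timeout is an assumption for illustration.

```ruby
# Hypothetical caching macro, for illustration only.
module TinyCache
  # Wrap an already-defined class method so that repeated calls with the same
  # arguments reuse the stored result until it is older than timeout seconds.
  def cache_class_method(name, timeout: 60)
    original = method(name)   # raises NameError if the method is not defined yet
    store = {}
    define_singleton_method(name) do |*args|
      entry = store[args]
      if entry.nil? || Time.now - entry[:at] > timeout
        entry = store[args] = { value: original.call(*args), at: Time.now }
      end
      entry[:value]
    end
  end
end

class SlowIndex
  extend TinyCache

  @calls = 0
  class << self
    attr_reader :calls
  end

  def self.search(query)
    @calls += 1               # counts how often the real search runs
    "results for #{query}"
  end

  cache_class_method :search  # declared AFTER the method definition
end

SlowIndex.search('giant')
SlowIndex.search('giant')
puts SlowIndex.calls   # → 1
```

Moving the cache_class_method line above the definition of search would raise a NameError in this sketch, which mirrors the ordering requirement described in the tips.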

  • For a pure RoR developer, the terminology may be a little confusing (which indicates that Yurii can deal with more programming languages than just Ruby):
    • “class methods” in the library description are just shorthand for “the methods of a class”
    • in Yurii’s documentation, Ruby class methods are called static methods (as known in C++ or Java), and their caching is supported by class_caches (a feature not available in caches.rb 0.2.0 from gems or RubyForge, but included in more recent versions from SVN, such as the 0.4.0 we are using)