Jun 6, 2009

Quick Tip: Search Email in Thinking Sphinx

If you use Sphinx as your search backend, you may have noticed that it won't match an entire email address. It's perplexing at first, since everything else works so well. After banging my head on the issue for a few hours, I got some direct help from Pat Allan, the creator of Thinking Sphinx. The problem is the "@" symbol, which is a reserved character in Sphinx. To get around that, we need to add it to Sphinx's allowable character set.

Now, for better or for worse, you don't need to define a config file to use Thinking Sphinx; it's smart enough to pick sensible defaults. But in this case we'll have to create one.


Every Ruby on Rails project comes with a config folder that holds many of the files you may need to modify. In this directory, create a file called sphinx.yml. It's a YAML file, so be careful with spacing: like database.yml, it is sensitive to indentation, and things may not work if you accidentally alter it.

For our example, let's use the UTF-8 character set. So in that sphinx.yml file, paste this content:

development:
  port: 3312
  charset_table: "0..9, a..z, _, @, A..Z->a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F"
test:
  port: 3313
  charset_table: "0..9, a..z, _, @, A..Z->a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F"
production:
  port: 3312
  charset_table: "0..9, a..z, _, @, A..Z->a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F"
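
Since a spacing slip in sphinx.yml is easy to miss, here is a minimal Ruby sketch for sanity-checking the file before rebuilding the index. The helper name environments_missing_at_sign is hypothetical and not part of Thinking Sphinx; it only assumes the layout shown above, with one top-level key per environment and a charset_table string under each.

```ruby
require "yaml"

# Hypothetical helper (not part of Thinking Sphinx): returns the names of
# top-level keys whose settings are not a nested mapping, or whose
# charset_table is missing or does not list "@" as an entry.
def environments_missing_at_sign(yaml_text)
  config = YAML.load(yaml_text) || {}
  config.reject do |_env, settings|
    settings.is_a?(Hash) &&
      settings["charset_table"].to_s.split(",").map(&:strip).include?("@")
  end.keys
end

# Shortened sample config for illustration only.
sample = <<~CONFIG
  development:
    port: 3312
    charset_table: "0..9, a..z, _, @, A..Z->a..z"
  test:
    port: 3313
CONFIG

p environments_missing_at_sign(sample)
```

To check the real file, you could pass in File.read("config/sphinx.yml") from the root of the Rails project. An empty array means every environment lists "@" in its charset_table.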


There are a number of other options you can put in your configuration file, of course, but charset_table is the one that lets you search email addresses. To see the effect, you'll need to rebuild your index. If you've got a recent version of the Thinking Sphinx plugin:

rake thinking_sphinx:rebuild


Otherwise, you'll need to stop, index, and then start:
rake thinking_sphinx:stop
rake thinking_sphinx:index
rake thinking_sphinx:start


After that, searching for email addresses should work!

1 comment:

  1. This may break some extended search functionality, I believe. My solution was to just escape the @ symbol. Admittedly my application is PHP, but it shouldn't be too hard with RoR.

    My code was:
    $search = str_replace("@", "\\\@", $search);

    This ensures that "\@" is passed to Sphinx which solved the problem in my case

    Best of luck!

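
For a Rails application, the escaping idea from the comment above might be sketched in Ruby roughly as follows. The helper name escape_at_sign is hypothetical and not part of Thinking Sphinx. The sketch only puts a single backslash in front of each "@" in the query string. Whether the "@" is then matched as part of a word still depends on the charset_table setting.

```ruby
# Hypothetical helper (not part of Thinking Sphinx): prefixes every "@"
# in a search string with a single backslash before it is sent to Sphinx.
def escape_at_sign(query)
  # The block form keeps gsub from interpreting the backslash in the
  # replacement as the start of a backreference.
  query.gsub("@") { "\\@" }
end

puts escape_at_sign("pat@example.com")
```

The escaped string would then be passed to the search call in place of the raw user input.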