Sphinx search server

Sphinx is an open source full-text search engine designed to search text very quickly. It can be integrated into your programs to provide custom search functionality. This document explains how to set it up on a Linux machine and how PHP programs that use a MySQL database can call the Sphinx search engine to provide a custom search feature. Let us start with the installation.

Installation steps for Sphinx Search Server

Download and extract the Sphinx tarball:

[bash]

$ tar xzvf sphinx-0.9.8.tar.gz
$ cd sphinx-0.9.8

[/bash]

Run the configuration program:

[bash]

$ ./configure

[/bash]

We can specify the location where Sphinx should be installed by passing the --prefix option to configure.

Build the binaries.

[bash]

$ make

[/bash]

Install the binaries.

[bash]
$ make install

[/bash]
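As a rough sketch, the configure, build and install steps above might be chained into one script when a custom location is wanted. The prefix $HOME/sphinx is only an illustrative assumption; with a prefix set, the binaries end up in bin/ under it and the sample configuration in etc/ under it.

```shell
#!/bin/sh
# Assumed install location, for illustration only; override with PREFIX=... if needed.
PREFIX="${PREFIX:-$HOME/sphinx}"

if [ -x ./configure ]; then
    # Configure, build and install in one go; stop at the first failing step.
    ./configure --prefix="$PREFIX" && make && make install
else
    echo "No ./configure here; run this from the extracted Sphinx source directory." >&2
fi
```

The rest of this document assumes the default location, so adjust the paths if you install elsewhere.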

By default, Sphinx utilities are installed in /usr/local/bin/. Sphinx has three components: an index generator, a search engine, and a command-line search utility.

The index generator is called indexer. It queries your database, indexes each column in each row of the result, and ties each index entry to the row's primary key.

The search engine is a daemon called searchd. The daemon receives search terms and other parameters, scours one or more indexes, and returns a result. If a match is made, searchd returns an array of primary keys. Given those keys, an application can run a query against the associated database to fetch the complete records that make up the match. searchd communicates with applications through a socket connection on port 3312.

The search utility lets you run searches from the command line without writing code. If searchd returns a match, search queries the database and displays the rows in the match set. It is useful for debugging your Sphinx configuration.

To use Sphinx, you need to create a configuration file. The default configuration file name is sphinx.conf, and all Sphinx programs look for this file in the current working directory by default. A sample configuration file, sphinx.conf.dist, which documents all the options, is created by configure. Copy and edit that sample file to make your own configuration.

[bash]
$ cd /usr/local/etc
$ cp sphinx.conf.dist sphinx.conf
[/bash]

To get started with Sphinx, you must define one or more sources and one or more indexes. A source identifies the database to index, provides authentication information, and defines the query used to construct each row. An index requires a source (that is, a set of rows) and defines how the data extracted from the source should be cataloged. You define your sources and indexes in the sphinx.conf file. The sample configuration file is set up to index the documents table from the MySQL database test, and a sample data file, example.sql, is provided to populate that table with a few documents for testing purposes.

[bash]

$ mysql -u test < /usr/local/etc/example.sql

[/bash]

We need to specify the database connection details in the configuration file, as in the following source section.

[bash]

source src1
{
# data source type. mandatory, no default value
# known types are mysql, pgsql, mssql, xmlpipe, xmlpipe2, odbc
type                                    = mysql

#####################################################################
## SQL settings (for 'mysql' and 'pgsql' types)
#####################################################################

# some straightforward parameters for SQL source types
sql_host                                = localhost
sql_user                                = test
sql_pass                                = test
sql_db                                  = test
sql_port                                = 3306  # optional, default is 3306

[/bash]

Next, create a query that produces the rows to be indexed. The sql_query must include the primary key you want to use for subsequent lookups, as well as all the fields you want to index or use for grouping. It is specified in the configuration file as follows.

[bash]
sql_query                               = \
SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
FROM documents
[/bash]

The search utility uses sql_query_info to fetch the records that match. In this query, $id is replaced with each primary key that searchd returns.

[bash]
sql_query_info          = SELECT * FROM documents WHERE id=$id
[/bash]
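To make the $id substitution concrete, here is a small sketch that performs the same replacement by hand for one key. It only imitates what the search utility does internally; the variable names and the use of sed are illustrative assumptions, and the key 5 is just an example value.

```shell
#!/bin/sh
# The sql_query_info template, kept in single quotes so $id stays literal.
template='SELECT * FROM documents WHERE id=$id'

# Example primary key, standing in for one returned by searchd.
id=5

# Replace the literal "$id" placeholder with the key.
query=$(printf '%s\n' "$template" | sed "s/[\$]id/$id/")

echo "$query"
```

The resulting statement can be pasted into the mysql client to see the row that the search utility would display for that key.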

Next, we need to define an index.

[bash]
index test1
{
# document source(s) to index
source                  = src1
# index files path and file name, without extension
# mandatory, path must be writable, extensions will be auto-appended
path                    = /var/test1
}
[/bash]

Here, source is src1 and path defines where the index data is stored. Make sure that the directory in this path exists and is writable before generating the index. The searchd section at the bottom of the file configures the searchd daemon itself.

[bash]
searchd
{
port                = 3312
log                    = /var/log/searchd/searchd.log
query_log            = /var/log/searchd/query.log
pid_file            = /var/log/searchd/searchd.pid
}
[/bash]
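Since the fragments above are spread over several listings, it may help to see them assembled into one minimal file. The sketch below writes such a file to a scratch location for review. The two sql_attr_ lines are taken from the stock sphinx.conf.dist rather than from the fragments above, so treat them as an assumption, and the scratch file is only a convenience of this example.

```shell
#!/bin/sh
# Write the assembled configuration to a scratch file for review
# (assumption: you would copy it to /usr/local/etc/sphinx.conf afterwards).
conf="$(mktemp)"

# The quoted 'EOF' keeps $id and the trailing backslashes literal.
cat > "$conf" <<'EOF'
source src1
{
    type            = mysql
    sql_host        = localhost
    sql_user        = test
    sql_pass        = test
    sql_db          = test
    sql_port        = 3306

    sql_query       = \
        SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
        FROM documents

    # Attribute declarations as found in the stock sphinx.conf.dist (assumption)
    sql_attr_uint       = group_id
    sql_attr_timestamp  = date_added

    sql_query_info  = SELECT * FROM documents WHERE id=$id
}

index test1
{
    source          = src1
    path            = /var/test1
}

searchd
{
    port            = 3312
    log             = /var/log/searchd/searchd.log
    query_log       = /var/log/searchd/query.log
    pid_file        = /var/log/searchd/searchd.pid
}
EOF

echo "wrote $conf"
```

Note that the log directory /var/log/searchd named in the searchd section has to exist before the daemon is started.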

We are now ready to build the index for the database table. Before running the indexer program to generate the indexes, make sure that the MySQL server is running.

[bash]
$ /usr/local/bin/indexer --config /usr/local/etc/sphinx.conf --all
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff
using config file '/usr/local/etc/sphinx.conf'...
indexing index 'test1'...
collected 6 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 6 docs, 243 bytes
total 0.018 sec, 12926 bytes/sec, 319.16 docs/sec

[/bash]

The --all argument rebuilds all the indexes listed in sphinx.conf. If you don't need to rebuild every index, you can pass the names of the indexes to rebuild instead.
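As a sketch of rebuilding a single index, the following passes the index name test1 in place of --all. The paths assume the default install location used in this document, and the guard around the call is only there so the script fails gracefully where Sphinx is not installed.

```shell
#!/bin/sh
INDEXER=/usr/local/bin/indexer
CONF=/usr/local/etc/sphinx.conf

if [ -x "$INDEXER" ]; then
    # Rebuild only the index named test1 instead of passing --all.
    "$INDEXER" --config "$CONF" test1
    # If searchd is already serving this index, add --rotate so the daemon
    # switches to the freshly built files:
    #   "$INDEXER" --config "$CONF" --rotate test1
else
    echo "indexer not found at $INDEXER" >&2
fi
```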

You can now test the index with the search utility:

[bash]
$ /usr/local/bin/search --config /usr/local/etc/sphinx.conf spark
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff

using config file '/usr/local/etc/sphinx.conf'...
index 'test1': query 'spark ': returned 2 matches of 2 total in 0.006 sec

displaying matches:
1. document=5, weight=2, group_id=2, date_added=Sun Feb 13 11:50:30 2011
id=5
group_id=2
group_id2=9
date_added=2011-02-13 11:50:30
title=spark
content=spark support
2. document=6, weight=2, group_id=2, date_added=Sun Feb 13 11:53:30 2011
id=6
group_id=2
group_id2=10
date_added=2011-02-13 11:53:30
title=spark cochin
content=spark support cochin
[/bash]

To query the index from your PHP scripts, you need to run the search daemon, searchd, which your scripts will talk to, and include the PHP API (located in api/sphinxapi.php in the source distribution) in your own scripts.
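Starting the daemon might look like the sketch below. It assumes the default install paths and the pid_file location from the searchd section shown earlier; the one-second pause and the PID file check are just a convenience of this example, not something Sphinx requires.

```shell
#!/bin/sh
SEARCHD=/usr/local/bin/searchd
CONF=/usr/local/etc/sphinx.conf
PID_FILE=/var/log/searchd/searchd.pid   # must match pid_file in sphinx.conf

if [ -x "$SEARCHD" ]; then
    # searchd detaches into the background after reading the configuration.
    "$SEARCHD" --config "$CONF"
    sleep 1
    if [ -f "$PID_FILE" ]; then
        echo "searchd running with PID $(cat "$PID_FILE")"
    else
        echo "searchd did not write $PID_FILE; check the log" >&2
    fi
else
    echo "searchd not found at $SEARCHD" >&2
fi
```

Once the daemon is up, PHP scripts can connect to it on the port set in the searchd section (3312 in this configuration).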
