Sphinx (search engine) 使用笔记
Table of Contents

Sphinx 简介

Sphinx 是一个轻量级的全文检索引擎。

部分同类开源产品

另外,Redis 的不稳定版已经支持加载外部模块RediSearch 就是一个全文检索的模块。以 Redis 社区的活跃程度,此类基于 Redis 模块方式的检索引擎应该会越来越成熟。

查看更多同类产品

基本原理

Sphinx 包含几个可执行程序:wordbreaker,indexer,searchd 等。

部分功能

中文支持

API 协议

Sphinx 源码已经提供了 Python,Ruby,PHP,Java,C 这几种语言的 API 包了,如需实现其他语言的 API 包,就要分析 Sphinx API 协议了。具体可以通过 Python 的客户端 sphinx-x.x.x/api/sphinxapi.py 分析。下面分析一个 wireshark 抓取的 Sphinx 请求包:

请求

发出一个检索 "hello" 的请求,返回数量限制(limit)为 20。

DATA 内容:

00:00:00:00:00:00:00:14:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:
00:00:00:06:68:65:6c:6c:6f:20:00:00:00:02:00:00:00:64:00:00:00:01:00:00:
00:01:2a:00:00:00:01:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:
00:00:00:00:00:00:00:00:00:00:00:00:00:03:e8:00:00:00:0b:40:67:72:6f:75:
70:20:64:65:73:63:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:
00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:
00:01:2a

对应的结构:

offset:         00:00:00:00:
limit:          00:00:00:14:
mode:           00:00:00:00:
ranker:         00:00:00:00:
sortMode:   00:00:00:00:
sortByLen:  00:00:00:00:
queryLen:   00:00:00:06:
query:          68:65:6c:6c:6f:20: (hello )
weightsLen: 00:00:00:02:
weights:    00:00:00:64:
weights:    00:00:00:01:
indexLen:   00:00:00:01:
index:          2a: (*)
id64 range marker:  00:00:00:01:
minID:          00:00:00:00:00:00:00:00:
maxID:          00:00:00:00:00:00:00:00:
filters len:    00:00:00:00:
groupFunc:  00:00:00:00:
groupBy len:    00:00:00:00:
maxMatches: 00:00:03:e8: (1000)
groupSort len:  00:00:00:0b:
groupSort:  40:67:72:6f:75:70:20:64:65:73:63: ("@group desc")
cutoff:         00:00:00:00:
retryCount: 00:00:00:00:
retryDelay: 00:00:00:00:
groupDistinct len:  00:00:00:00:
anchor point flag:  00:00:00:00:
indexWeights len:   00:00:00:00:
maxQueryTime:   00:00:00:00:
fieldWeights:   00:00:00:00:
comment len:    00:00:00:00:
overrides len:  00:00:00:00:
select len: 00:00:00:01:
select:     2a: ("*")