柳州 网站推广国际网站怎么进
目标:
 一、搭建准确的千万级数据库的准实时搜索(见详情)
 二、实现词语高亮(客户端JS渲染,服务器端渲染,详见7.3)
 三、实现搜索联想(输入框onchange,ajax请求搜索,取10条在层上展示方可)
 四、实现词库管理(仅需管理scws下的自定义词库dd.txt即可)
 五、实现全文搜索(提供了两种方案,详见8)
案例:
 本文第五部分,针对实际应用场景,典型案例分析。
软件:
sphinx: sphinx-2.0.2-beta
 scws: scws-1.2.0
 ===========================================================================
一、Sphinx安装
 1、安装
# ./configure --prefix=/opt/server/sphinx --with-mysql=/opt/server/mysql # make # make install
|   1 2 3  |   # ./configure --prefix=/opt/server/sphinx --with-mysql=/opt/server/mysql # make # make install  | 
2、配置
 见sphinx.conf
 详见下文,多索引增量索引方案
3、php 扩展
 性能方面,扩展和直接使用API文件,差别不大;可以做选择;都在源码API中;
 个人建议使用API文件,系统更稳定
3.1 sphinx客户端libsphinxclient
# ./configure --prefix=/opt/server/libsphinxclient # make # make install
|   1 2 3  |   # ./configure --prefix=/opt/server/libsphinxclient # make # make install  | 
3.2 扩展
 下载 http://pecl.php.net/package/sphinx
# /opt/server/php/bin/phpize./configure --with-sphinx=/opt/server/libsphinxclient --with-php-config=/opt/server/php/bin/php-config # make # make install 查看 # /opt/server/php/bin/php -m |grep sphinx
|   1 2 3 4 5  |   # /opt/server/php/bin/phpize./configure --with-sphinx=/opt/server/libsphinxclient --with-php-config=/opt/server/php/bin/php-config # make # make install 查看 # /opt/server/php/bin/php -m |grep sphinx  | 
使用手册
 http://docs.php.net/manual/zh/book.sphinx.php
4、索引 启动服务
# /opt/server/sphinx/bin/indexer --all # /opt/server/sphinx/bin/searchd
|   1 2  |   # /opt/server/sphinx/bin/indexer --all # /opt/server/sphinx/bin/searchd  | 
二、php 分词 scws
 官网 http://www.ftphp.com/scws/
 1、 安装
# ./configure --prefix=/opt/server/scws # make # make install
|   1 2 3  |   # ./configure --prefix=/opt/server/scws # make # make install  | 
2、 词库
 scws-dict-chs-utf8.tar.bz2 解压放入 /opt/server/scws/etc
 词库 dict.utf-8.xdb
 规则 rules.utf-8.ini
3、 php 扩展
 源码在phpext下
# /opt/server/php/bin/phpize./configure --with-scws=/opt/server/scws --with-php-config=/opt/server/php/bin/php-config # make # make install
|   1 2 3  |   # /opt/server/php/bin/phpize./configure --with-scws=/opt/server/scws --with-php-config=/opt/server/php/bin/php-config # make # make install  | 
# vi php.ini [scws] extension = scws.so scws.default.charset = utf-8 scws.default.fpath = /opt/server/scws/etc 查看 # /opt/server/php/bin/php -m |grep scws
|   1 2 3 4 5 6 7  |   # vi php.ini [scws] extension = scws.so scws.default.charset = utf-8 scws.default.fpath = /opt/server/scws/etc 查看 # /opt/server/php/bin/php -m |grep scws  | 
4、 分词测试
 http://www.ftphp.com/scws/docs.php
 详见测试文件 test_all.php
三、 索引
//索引某个索引 # /opt/server/sphinx/bin/indexer test1 //searchd 索引某个索引 # /opt/server/sphinx/bin/indexer test1 --rotate //指定索引搜索 # /opt/server/sphinx/bin/indexer -i test1 '逗她男'
|   1 2 3 4 5 6  |   //索引某个索引 # /opt/server/sphinx/bin/indexer test1 //searchd 索引某个索引 # /opt/server/sphinx/bin/indexer test1 --rotate //指定索引搜索 # /opt/server/sphinx/bin/indexer -i test1 '逗她男'  | 
1、 增量索引方案
//创建表记录偏移 CREATE TABLE IF NOT EXISTS `search_counter` ( `counterid` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT '统计标示', `max_doc_id` int(11) unsigned NOT NULL COMMENT '已统计数', PRIMARY KEY (`counterid`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ; //增量索引 # /opt/server/sphinx/bin/indexer test1stemmed --rotate //合并索引 # /opt/server/sphinx/bin/indexer --merge test1 test1stemmed --rotate
|   1 2 3 4 5 6 7 8 9 10  |   //创建表记录偏移 CREATE TABLE IF NOT EXISTS `search_counter` ( `counterid` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT '统计标示', `max_doc_id` int(11) unsigned NOT NULL COMMENT '已统计数', PRIMARY KEY (`counterid`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ; //增量索引 # /opt/server/sphinx/bin/indexer test1stemmed --rotate //合并索引 # /opt/server/sphinx/bin/indexer --merge test1 test1stemmed --rotate  | 
索引策略
 1、搜索时,同时从主索引和增量索引取数据
 2、每5分钟,运行一次增量索引;满足新数据搜索需求
 3、每晚,运行一次主索引,同时会更新索引标示;再运行增量索引,实质为清空增量索引,避免与主索引重复索引
 4、好处:避免开合并索引,合并索引效率较差
 5、如数据量特别大,可考虑合并索引的方案
索引策略shell
//add.sh #!/bin/sh /opt/server/sphinx/bin/indexer test1stemmed --rotate >> /opt/server/sphinx/var/log/add.sh.log //all.sh #!/bin/sh /opt/server/sphinx/bin/indexer test1 --rotate >> /opt/server/sphinx/var/log/all.sh.log /opt/server/sphinx/bin/indexer test1stemmed --rotate >> /opt/server/sphinx/var/log/add.sh.log
|   1 2 3 4 5 6 7  |   //add.sh #!/bin/sh /opt/server/sphinx/bin/indexer test1stemmed --rotate >> /opt/server/sphinx/var/log/add.sh.log //all.sh #!/bin/sh /opt/server/sphinx/bin/indexer test1 --rotate >> /opt/server/sphinx/var/log/all.sh.log /opt/server/sphinx/bin/indexer test1stemmed --rotate >> /opt/server/sphinx/var/log/add.sh.log  | 
四、 多个表独立索引方案
 场景:如有用户搜索、商品搜索等多个索引需求
 策略:配置一个多索引方案,每个表单独建立索引
 前端根据不同类型选择不同的查询索引;全部,即选择所有索引
 ===========================================================================
