2010-08-08

Retrieval in Xapian

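The query code in Xapian is far more complicated than the indexing code, because it supports many query mechanisms, whereas indexing is essentially a loop that adds terms one after another. Xapian supports the following query mechanisms: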

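Ranked probabilistic search
Relevance feedback
Phrase and proximity search
A full range of boolean search operators
Stemming of search terms
Wildcard queries
Synonym queries
Spelling correction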

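Xapian offers two ways to express a query. The Query class represents a query, and a Query object can be produced in two ways: the first is to have the QueryParser class parse a query string; the other is to create several Query objects, each representing one sub-expression, and then combine them as needed. For details on Xapian's query mechanisms and query syntax, see the blog post 利用Xapian构建自己的搜索引擎：检索 ("Building Your Own Search Engine with Xapian: Retrieval"). That post covers search engine performance and evaluation criteria, as well as Xapian's retrieval and query mechanisms, in detail, and should be very helpful for anyone developing with Xapian.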
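
To make the two routes concrete, here is a minimal sketch using a toy query tree. ToyQuery, leaf, combine, describe and parse_simple are all hypothetical names invented for this illustration; none of them belong to Xapian, and the real Xapian::Query and Xapian::QueryParser classes are far richer. The only point is that parsing a string and combining sub-queries by hand can end up with the same tree.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a query node; this is NOT Xapian::Query.
struct ToyQuery {
    std::string op;                  // "TERM" for a leaf, otherwise "AND", "OR", ...
    std::string term;                // only meaningful when op == "TERM"
    std::vector<ToyQuery> children;  // sub-queries of an operator node

    // Build a leaf query for a single term.
    static ToyQuery leaf(const std::string& t) {
        return ToyQuery{"TERM", t, {}};
    }

    // Way 2: combine two existing queries under an operator.
    static ToyQuery combine(const std::string& op,
                            const ToyQuery& a, const ToyQuery& b) {
        return ToyQuery{op, "", {a, b}};
    }

    // Render the tree as a parenthesised expression.
    std::string describe() const {
        if (op == "TERM") return term;
        std::string out = "(";
        for (std::size_t i = 0; i < children.size(); ++i) {
            if (i) out += " " + op + " ";
            out += children[i].describe();
        }
        return out + ")";
    }
};

// Way 1 (toy version): parse a string of the exact form "term OP term".
// A real query parser handles far more than this three-token case.
ToyQuery parse_simple(const std::string& text) {
    std::istringstream in(text);
    std::string left, op, right;
    in >> left >> op >> right;
    return ToyQuery::combine(op, ToyQuery::leaf(left), ToyQuery::leaf(right));
}
```

With these definitions, parse_simple("xapian AND search") and ToyQuery::combine("AND", ToyQuery::leaf("xapian"), ToyQuery::leaf("search")) describe the same tree, and combined queries can themselves be combined again to build larger expressions.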

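The two approaches are essentially the same. The first one parses the string to determine which query mechanism is meant, then generates the corresponding Query object (the parts set in bold in the original listing) and combines the pieces as needed. Below is an excerpt from the method Query QueryParser::Internal::parse_query(const string &qs, unsigned flags, const string &default_prefix):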

    if (op.size() == 3) {
        if (op == "AND") {
            Parse(pParser, AND, NULL, &state);
            goto just_had_operator;
        }
        if (op == "NOT") {
            Parse(pParser, NOT, NULL, &state);
            goto just_had_operator;
        }
        if (op == "XOR") {
            Parse(pParser, XOR, NULL, &state);
            goto just_had_operator;
        }
        if (op == "ADJ") {
            if (it != end && *it == '/') {
                size_t width = 0;
                Utf8Iterator p = it;
                while (++p != end && U_isdigit(*p)) {
                    width = (width * 10) + (*p - '0');
                }
                if (width && (p == end || is_whitespace(*p))) {
                    it = p;
                    Parse(pParser, ADJ, new Term(width), &state);
                    goto just_had_operator;
                }
            }

            Parse(pParser, ADJ, NULL, &state);
            goto just_had_operator;
        }
    } else if (op.size() == 2) {
        if (op == "OR") {
            Parse(pParser, OR, NULL, &state);
            goto just_had_operator;
        }
    } else if (op.size() == 4) {
        if (op == "NEAR") {
            if (it != end && *it == '/') {
                size_t width = 0;
                Utf8Iterator p = it;
                while (++p != end && U_isdigit(*p)) {
                    width = (width * 10) + (*p - '0');
                }
                if (width && (p == end || is_whitespace(*p))) {
                    it = p;
                    Parse(pParser, NEAR, new Term(width), &state);
                    goto just_had_operator;
                }
            }
            Parse(pParser, NEAR, NULL, &state);
            goto just_had_operator;
        }
    }
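
The operator recognition in the excerpt can be sketched in isolation. The version below is a simplified, self-contained approximation and not Xapian's code: OperatorToken and classify_operator are hypothetical names, it works on a single whitespace-delimited word instead of a Utf8Iterator, and it returns a small struct rather than feeding tokens to the parser through Parse(). It keeps the same rule for NEAR/n and ADJ/n: the width is accepted only if it is non-zero and the digits run to the end of the word; otherwise the operator is used without a width.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch (not part of Xapian).
struct OperatorToken {
    std::string name;   // "AND", "OR", "NOT", "XOR", "NEAR", "ADJ", or "" if not an operator
    std::size_t width;  // window taken from "/N"; 0 means no explicit width
};

// Classify one whitespace-delimited word as a boolean or positional operator.
OperatorToken classify_operator(const std::string& word) {
    // Split "NEAR/5" into the name "NEAR" and the suffix "5".
    const std::size_t slash = word.find('/');
    const std::string name = word.substr(0, slash);  // whole word if there is no '/'

    const bool plain = (name == "AND" || name == "OR" ||
                        name == "NOT" || name == "XOR");
    const bool positional = (name == "NEAR" || name == "ADJ");
    if (!plain && !positional) return {"", 0};

    // No "/N" suffix: a bare operator.
    if (slash == std::string::npos) return {name, 0};

    // Assumption of this sketch: only NEAR and ADJ may carry a width,
    // so something like "AND/3" is treated as an ordinary word.
    if (plain) return {"", 0};

    // Accumulate the digits after '/', as the excerpt does for width.
    std::size_t width = 0;
    std::size_t i = slash + 1;
    while (i < word.size() &&
           std::isdigit(static_cast<unsigned char>(word[i]))) {
        width = width * 10 + static_cast<std::size_t>(word[i] - '0');
        ++i;
    }

    // Accept the width only if it is non-zero and nothing follows the digits.
    if (width != 0 && i == word.size()) return {name, width};

    // Malformed suffix: fall back to the operator without a width.
    // (This sketch simply ignores the rest of the word.)
    return {name, 0};
}
```

For example, classify_operator("NEAR/5") yields the name "NEAR" with width 5, classify_operator("ADJ") yields "ADJ" with width 0, and an ordinary word such as "hello" yields an empty name.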

I have not fully understood the query code in Xapian yet, and I would welcome pointers from anyone who knows it well.
