# Custom Analyzer

When the built-in analyzers do not meet your needs, you can create a `custom` analyzer that combines:

* zero or more [character filters](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-custom-analyzer.html)
* exactly one [tokenizer](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-tokenizers.html)
* zero or more [token filters](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-custom-analyzer.html)
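
The order in which these three kinds of component run can be sketched in plain Python. This is only an illustration of the pipeline shape, not Elasticsearch code: `build_analyzer` and the three toy stages (`dashes_to_spaces`, `whitespace_tokenizer`, `lowercase`) are hypothetical names made up for this sketch.

```python
def build_analyzer(tokenizer, char_filters=(), token_filters=()):
    """Compose one tokenizer with optional character and token filters."""
    def analyze(text):
        for char_filter in char_filters:    # 1. rewrite the raw string
            text = char_filter(text)
        tokens = tokenizer(text)            # 2. split the string into tokens
        for token_filter in token_filters:  # 3. transform the token stream
            tokens = token_filter(tokens)
        return list(tokens)
    return analyze


# Hypothetical toy stages, used only to show the ordering.
def dashes_to_spaces(text):
    return text.replace("-", " ")


def whitespace_tokenizer(text):
    return text.split()


def lowercase(tokens):
    return [token.lower() for token in tokens]


analyzer = build_analyzer(
    whitespace_tokenizer,
    char_filters=[dashes_to_spaces],
    token_filters=[lowercase],
)
print(analyzer("Quick-Brown FOX"))  # ['quick', 'brown', 'fox']
```

Only the tokenizer is mandatory in this sketch, which mirrors the "zero or more / exactly one / zero or more" rule in the list above.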

### Configuration

The `custom` analyzer accepts the following parameters:

| Parameter                | Description |
| ------------------------ | ----------- |
| `tokenizer`              | A built-in or customised tokenizer. (Required) |
| `char_filter`            | An optional array of built-in or customised [character filters](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-charfilters.html). |
| `filter`                 | An optional array of built-in or customised token filters. |
| `position_increment_gap` | When indexing an array of text values, Elasticsearch inserts a fake "gap" between the last term of one value and the first term of the next value, so that a phrase query cannot match two terms from different array elements. Defaults to `100`. See [position\_increment\_gap](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/position-increment-gap.html) for more. |
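
The effect of `position_increment_gap` can be illustrated with a simplified model of term positions. This is a sketch under stated assumptions: the whitespace tokenization, the exact position numbering, and the helper names `index_positions` and `phrase_matches` are all hypothetical and do not reproduce how Elasticsearch or Lucene stores positions internally.

```python
def index_positions(values, gap=100):
    """Assign a position to every term across an array of text values."""
    positions = {}
    next_position = 0
    for i, value in enumerate(values):
        if i > 0:
            next_position += gap  # the fake gap between array elements
        for term in value.lower().split():  # stand-in for a real analyzer
            positions.setdefault(term, []).append(next_position)
            next_position += 1
    return positions


def phrase_matches(positions, first, second):
    """True if `second` appears at the position right after `first`."""
    return any(
        b - a == 1
        for a in positions.get(first, [])
        for b in positions.get(second, [])
    )


names = ["John Abraham", "Lincoln Smith"]
with_gap = index_positions(names)        # default gap of 100
no_gap = index_positions(names, gap=0)

print(phrase_matches(with_gap, "abraham", "lincoln"))  # False
print(phrase_matches(no_gap, "abraham", "lincoln"))    # True
```

With the gap in place, "abraham" and "lincoln" are far apart in position space, so the phrase does not match across the two array elements. With a gap of `0` they become adjacent and the phrase matches.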

## Example configuration <a href="#id-zi-ding-yi-fen-xi-qi-pei-zhi-shi-li" id="id-zi-ding-yi-fen-xi-qi-pei-zhi-shi-li"></a>

Here is an example that combines the following:

Character Filter

* [HTML Strip Character Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-htmlstrip-charfilter.html)

Tokenizer

* [Standard Tokenizer](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-standard-tokenizer.html)

Token Filters

* [Lowercase Token Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lowercase-tokenfilter.html)
* [ASCII-Folding Token Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-asciifolding-tokenfilter.html)

```
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type":      "custom",
          "tokenizer": "standard",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "Is this <b>déjà vu</b>?"
}
```

The sentence above produces the following terms:

```
[ is, this, deja, vu ]
```
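
To make the stages of this analyzer concrete, here is a rough plain-Python approximation. It is a sketch, not the Elasticsearch implementation: the regular expressions and the Unicode-decomposition trick are simplifying assumptions that happen to reproduce the result for this input, and the real `html_strip`, `standard`, and `asciifolding` components handle many more cases.

```python
import re
import unicodedata


def html_strip(text):
    # Assumption: removing anything that looks like a tag is enough here.
    return re.sub(r"<[^>]+>", "", text)


def standard_tokenize(text):
    # Assumption: runs of word characters approximate the standard tokenizer.
    return re.findall(r"\w+", text)


def lowercase(tokens):
    return [token.lower() for token in tokens]


def ascii_fold(tokens):
    # Decompose accented letters, then drop the combining accent marks.
    folded = []
    for token in tokens:
        decomposed = unicodedata.normalize("NFKD", token)
        folded.append(
            "".join(ch for ch in decomposed if not unicodedata.combining(ch))
        )
    return folded


def my_custom_analyzer(text):
    # Character filter first, then the tokenizer, then the token filters.
    return ascii_fold(lowercase(standard_tokenize(html_strip(text))))


print(my_custom_analyzer("Is this <b>d\u00e9j\u00e0 vu</b>?"))
# ['is', 'this', 'deja', 'vu']
```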

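The previous example used the tokenizer, token filters, and character filters with their default configurations, but it is possible to create configured versions of each and use them in a custom analyzer.

Here is a more complicated example that combines the following: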

Character Filter

* [Mapping Character Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-mapping-charfilter.html), configured to replace `:)` with `_happy_` and `:(` with `_sad_`

Tokenizer

* [Pattern Tokenizer](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-pattern-tokenizer.html), configured to split on punctuation characters

Token Filters

* [Lowercase Token Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lowercase-tokenfilter.html)
* [Stop Token Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-stop-tokenfilter.html), configured to use the pre-defined list of English stop words

### Example

```
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": [
            "emoticons"
          ],
          "tokenizer": "punctuation",
          "filter": [
            "lowercase",
            "english_stop"
          ]
        }
      },
      "tokenizer": {
        "punctuation": {
          "type": "pattern",
          "pattern": "[ .,!?]"
        }
      },
      "char_filter": {
        "emoticons": {
          "type": "mapping",
          "mappings": [
            ":) => _happy_",
            ":( => _sad_"
          ]
        }
      },
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text":     "I'm a :) person, and you?"
}
```

The `emoticons` character filter, `punctuation` tokenizer, and `english_stop` token filter are custom implementations defined in the same index settings.

The example above produces the following terms:

```
[ i'm, _happy_, person, you ]
```
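
The same result can be traced step by step with a small Python approximation of the three configured components. This is a sketch under assumptions: the function names mirror the `emoticons`, `punctuation`, and `english_stop` definitions above, but the code is not Elasticsearch's implementation, and `ENGLISH_STOP` is a hand-copied stand-in for the `_english_` list that should be checked against the official documentation.

```python
import re

# Mappings copied from the "emoticons" character filter above.
EMOTICONS = {":)": "_happy_", ":(": "_sad_"}

# Assumption: a stand-in for the predefined `_english_` stop word list.
ENGLISH_STOP = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}


def emoticons(text):
    # Mapping character filter: replace each key with its value.
    for source, target in EMOTICONS.items():
        text = text.replace(source, target)
    return text


def punctuation(text):
    # Pattern tokenizer: split on the configured pattern, drop empty tokens.
    return [token for token in re.split(r"[ .,!?]", text) if token]


def my_custom_analyzer(text):
    tokens = punctuation(emoticons(text))
    tokens = [token.lower() for token in tokens]               # lowercase
    return [t for t in tokens if t not in ENGLISH_STOP]        # english_stop


print(my_custom_analyzer("I'm a :) person, and you?"))
# ["i'm", '_happy_', 'person', 'you']
```

Note that the character filter must run before the tokenizer: if the text were split on punctuation first, the `)` in `:)` would stay attached to the token, but the emoticon would no longer be rewritten to `_happy_`.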
