# Custom Analyzer

When the built-in analyzers do not meet your needs, you can create a `custom` analyzer that uses an appropriate combination of:

* zero or more [character filters](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-charfilters.html)
* one [tokenizer](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-tokenizers.html)
* zero or more [token filters](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-tokenfilters.html)

### Configuration

The `custom` analyzer accepts the following parameters:

| Parameter                | Description |
| ------------------------ | ----------- |
| `tokenizer`              | A built-in or customised tokenizer. (Required) |
| `char_filter`            | An optional array of built-in or customised [character filters](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-charfilters.html). |
| `filter`                 | An optional array of built-in or customised token filters. |
| `position_increment_gap` | When indexing an array of text values, Elasticsearch inserts a fake "gap" between the last term of one value and the first term of the next value, so that a phrase query does not match two terms from different array elements. Defaults to `100`. See [`position_increment_gap`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/position-increment-gap.html) for more. |
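
To illustrate why the gap matters, here is a minimal Python sketch of how term positions might be assigned across an array of values. It is a simplified model, not Elasticsearch's implementation: it assumes whitespace tokenization, and the helper names `assign_positions` and `phrase_matches` as well as the sample values are hypothetical.

```python
def assign_positions(values, position_increment_gap=100):
    """Assign a position to every term in an array of text values.

    Simplified model: positions grow by 1 inside a value, and an extra
    `position_increment_gap` is added between consecutive values.
    """
    positions = []
    next_position = 0
    for index, value in enumerate(values):
        if index > 0:
            next_position += position_increment_gap
        for term in value.lower().split():
            positions.append((term, next_position))
            next_position += 1
    return positions


def phrase_matches(positions, phrase):
    """Return True if the phrase terms occur at consecutive positions."""
    terms = phrase.lower().split()
    lookup = {}
    for term, position in positions:
        lookup.setdefault(term, set()).add(position)
    return any(
        all(start + offset in lookup.get(term, ()) for offset, term in enumerate(terms))
        for start in lookup.get(terms[0], ())
    )


names = ["John Abraham", "Lincoln Smith"]  # illustrative array field
with_gap = assign_positions(names)
without_gap = assign_positions(names, position_increment_gap=0)

print(with_gap)
print(phrase_matches(with_gap, "abraham lincoln"))     # False
print(phrase_matches(without_gap, "abraham lincoln"))  # True
```

With the default gap, `abraham` and `lincoln` end up far apart, so a phrase that spans the two array elements does not match. With a gap of `0` the two terms become adjacent and the phrase matches.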

## Example configuration <a href="#id-zi-ding-yi-fen-xi-qi-pei-zhi-shi-li" id="id-zi-ding-yi-fen-xi-qi-pei-zhi-shi-li"></a>

Here is an example that combines the following:

Character Filter

* [HTML Strip Character Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-htmlstrip-charfilter.html)

Tokenizer

* [Standard Tokenizer](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-standard-tokenizer.html)

Token Filters

* [Lowercase Token Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lowercase-tokenfilter.html)
* [ASCII-Folding Token Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-asciifolding-tokenfilter.html)

```
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type":      "custom",
          "tokenizer": "standard",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "Is this <b>déjà vu</b>?"
}
```

The request above produces the following terms:

```
[ is, this, deja, vu ]
```
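
The order of the stages (character filters, then the tokenizer, then token filters) can be sketched in plain Python. This is only a rough approximation under stated assumptions: the regular expressions and Unicode normalization below stand in for the real `html_strip`, `standard`, `lowercase` and `asciifolding` components, which handle many more cases, and all function names are hypothetical.

```python
import html
import re
import unicodedata


def html_strip(text):
    # Character filter: drop tags and decode entities (rough stand-in for html_strip).
    return html.unescape(re.sub(r"<[^>]+>", "", text))


def standard_tokenize(text):
    # Tokenizer: split on non-word characters (rough stand-in for the standard tokenizer).
    # NFC normalization keeps accented letters as single word characters.
    return re.findall(r"\w+", unicodedata.normalize("NFC", text))


def ascii_fold(token):
    # Token filter: decompose accented letters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    text = html_strip(text)                       # 1. character filters
    tokens = standard_tokenize(text)              # 2. tokenizer
    tokens = [token.lower() for token in tokens]  # 3. token filters, in order
    return [ascii_fold(token) for token in tokens]


print(analyze("Is this <b>d\u00e9j\u00e0 vu</b>?"))  # ['is', 'this', 'deja', 'vu']
```

The sketch makes the ordering visible: the HTML tags are gone before tokenization, and lowercasing and folding are applied to each token afterwards.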

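The previous example used a tokenizer, token filters, and a character filter with their default configurations, but it is also possible to create configured versions of each and use them in a custom analyzer.

Here is a more complicated example that combines the following: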

Character Filter

* [Mapping Character Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-mapping-charfilter.html), configured to replace `:)` with `_happy_` and `:(` with `_sad_`

Tokenizer

* [Pattern Tokenizer](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-pattern-tokenizer.html), configured to split on punctuation characters

Token Filters

* [Lowercase Token Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lowercase-tokenfilter.html)
* [Stop Token Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-stop-tokenfilter.html), configured to use the pre-defined list of English stop words

### Example

```
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": [
            "emoticons"
          ],
          "tokenizer": "punctuation",
          "filter": [
            "lowercase",
            "english_stop"
          ]
        }
      },
      "tokenizer": {
        "punctuation": {
          "type": "pattern",
          "pattern": "[ .,!?]"
        }
      },
      "char_filter": {
        "emoticons": {
          "type": "mapping",
          "mappings": [
            ":) => _happy_",
            ":( => _sad_"
          ]
        }
      },
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text":     "I'm a :) person, and you?"
}
```

The `emoticons` character filter, the `punctuation` tokenizer and the `english_stop` token filter are custom implementations defined in the same index settings.

The example above produces the following terms:

```
[ i'm, _happy_, person, you ]
```
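
As a cross-check of the result, the configured pipeline can be approximated in a few lines of Python. This is a sketch under assumptions, not Elasticsearch code: the function names are hypothetical, the mappings and the split pattern are copied from the settings in the request, and the stop word set is written from memory of Lucene's default English list, so treat it as illustrative.

```python
import re

# Copied from the "emoticons" char_filter definition in the index settings.
EMOTICON_MAPPINGS = {":)": "_happy_", ":(": "_sad_"}

# Copied from the "punctuation" tokenizer definition in the index settings.
PUNCTUATION_PATTERN = r"[ .,!?]"

# Assumption: approximates the `_english_` stop word list.
ENGLISH_STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}


def map_emoticons(text):
    # Character filter: replace each emoticon before tokenization.
    for source, target in EMOTICON_MAPPINGS.items():
        text = text.replace(source, target)
    return text


def split_on_punctuation(text):
    # Tokenizer: split on the pattern and discard empty tokens.
    return [token for token in re.split(PUNCTUATION_PATTERN, text) if token]


def analyze(text):
    text = map_emoticons(text)                    # 1. character filter
    tokens = split_on_punctuation(text)           # 2. tokenizer
    tokens = [token.lower() for token in tokens]  # 3. lowercase filter
    return [token for token in tokens if token not in ENGLISH_STOP_WORDS]  # 4. stop filter


print(analyze("I'm a :) person, and you?"))  # ["i'm", '_happy_', 'person', 'you']
```

Because the apostrophe is not part of the split pattern, `I'm` stays a single token, and the stop filter then removes `a` and `and`.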

