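# Pattern Analyzer

The `pattern` analyzer uses a regular expression to split text into terms. The regular expression should match the **token separators**, not the **tokens** themselves. The regular expression defaults to `\W+` (all non-word characters).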
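The difference between matching separators and matching tokens can be sketched in a few lines of Python. This is only an illustration: Python's `re` module stands in for Java regular expressions, `re.ASCII` is passed so that `\W` stays ASCII-only as in Java's default mode, and the helper name `split_on_separators` is hypothetical.

```python
import re

# Default separator pattern: one or more non-word characters.
# re.ASCII keeps \W ASCII-only, approximating Java's default behaviour (assumption).
SEPARATORS = re.compile(r"\W+", re.ASCII)


def split_on_separators(text, pattern=SEPARATORS):
    """Split text wherever the pattern matches and drop empty pieces."""
    return [piece for piece in pattern.split(text) if piece]


if __name__ == "__main__":
    # The pattern describes what lies BETWEEN terms, so the terms survive.
    print(split_on_separators("foo-bar, baz!"))
    # A pattern that matches the terms themselves leaves only the separators.
    print(split_on_separators("foo-bar, baz!", re.compile(r"\w+", re.ASCII)))
```

The second call shows the common mistake: a pattern written to match the tokens removes them and keeps the punctuation instead.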

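> ## **Beware of Pathological Regular Expressions**
>
> The pattern analyzer uses Java regular expressions.
>
> A badly written regular expression could run very slowly or even throw a StackOverflowError and cause the node it is running on to exit suddenly.
>
> Read more about pathological regular expressions and how to avoid them.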
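As a rough illustration of what "pathological" can mean, the sketch below contrasts a pattern with a nested quantifier against an equivalent pattern without one. It uses Python's `re` module as a stand-in for Java regular expressions (an assumption; the two engines differ in detail), and the names `RISKY`, `SAFER` and `agree` are hypothetical.

```python
import re

# Nested quantifier: on a non-matching input the engine may try exponentially
# many ways to divide a run of "a" characters between the inner and outer "+".
RISKY = re.compile(r"(a+)+b")

# Equivalent pattern without nesting: there is only one way to consume the run.
SAFER = re.compile(r"a+b")


def agree(text):
    """Return True if both patterns give the same full-match verdict for text."""
    return bool(RISKY.fullmatch(text)) == bool(SAFER.fullmatch(text))


if __name__ == "__main__":
    # Inputs are kept short on purpose: long non-matching runs of "a" would
    # make the risky pattern very slow.
    for sample in ["ab", "aaab", "aaaa", "b", ""]:
        print(repr(sample), agree(sample))
```

Both patterns accept the same strings, so rewriting the nested form into the flat one changes the cost of matching rather than its result.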

## **Definition**

It consists of:

Tokenizer

* [Pattern Tokenizer](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-pattern-tokenizer.html)

Token filters

* [Lower Case Token Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lowercase-tokenfilter.html)
* [Stop Token Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-stop-tokenfilter.html) (disabled by default)

## **Example Output**

```
POST _analyze
{
  "analyzer": "pattern",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
```

The above sentence would produce the following terms:

```
[ the, 2, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone ]
```
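The token list above can be reproduced with a small emulation of the tokenizer and token filter chain. This is a sketch under stated assumptions, not how Elasticsearch implements the analyzer: Python's `re` module replaces Java regular expressions, and the function names `pattern_tokenizer`, `lowercase_filter`, `stop_filter` and `analyze` are hypothetical.

```python
import re


def pattern_tokenizer(text, pattern=r"\W+"):
    """Split text on every match of the separator pattern, dropping empty pieces."""
    # re.ASCII keeps \W ASCII-only, approximating Java's default mode (assumption).
    return [t for t in re.split(pattern, text, flags=re.ASCII) if t]


def lowercase_filter(tokens):
    """Lowercase every token."""
    return [t.lower() for t in tokens]


def stop_filter(tokens, stopwords=()):
    """Remove stop words; the empty default mirrors the filter being disabled."""
    blocked = set(stopwords)
    return [t for t in tokens if t not in blocked]


def analyze(text):
    """Run the tokenizer, then the lowercase filter, then the (disabled) stop filter."""
    return stop_filter(lowercase_filter(pattern_tokenizer(text)))


if __name__ == "__main__":
    print(analyze("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."))
```

Because the apostrophe and the hyphen are non-word characters, `dog's` becomes `dog` and `s`, and `Brown-Foxes` becomes `brown` and `foxes`.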

## **Configuration**

The `pattern` analyzer accepts the following parameters:

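| Parameter        | Description                                                                                                          |
| ---------------- | -------------------------------------------------------------------------------------------------------------------- |
| `pattern`        | A Java regular expression. Defaults to `\W+`.                                                                        |
| `flags`          | Java regular expression flags. Flags should be pipe-separated, e.g. `"CASE_INSENSITIVE\|COMMENTS"`.                  |
| `lowercase`      | Whether terms should be lowercased. Defaults to `true`.                                                              |
| `stopwords`      | A pre-defined stop words list such as `_english_`, or an array containing a list of stop words. Defaults to `_none_`. |
| `stopwords_path` | The path to a file containing stop words.                                                                            |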

For more information about stop word configuration, see the [Stop Token Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.2/analysis-stop-tokenfilter.html).
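To show how these parameters fit together, the sketch below builds an index settings body that defines a configured `pattern` analyzer. The `settings.analysis.analyzer` nesting and the `"type": "pattern"` entry follow the usual Elasticsearch index settings layout; the helper name `pattern_analyzer_settings` and the analyzer name `my_email_analyzer` are hypothetical, chosen only for illustration.

```python
import json


def pattern_analyzer_settings(name, pattern=r"\W+", lowercase=True,
                              flags=None, stopwords=None):
    """Build an index settings body that defines a configured pattern analyzer."""
    analyzer = {"type": "pattern", "pattern": pattern, "lowercase": lowercase}
    if flags:
        # Flags are joined with a pipe, as described in the parameter table.
        analyzer["flags"] = "|".join(flags)
    if stopwords is not None:
        # Either a predefined list name such as "_english_" or a list of words.
        analyzer["stopwords"] = stopwords
    return {"settings": {"analysis": {"analyzer": {name: analyzer}}}}


if __name__ == "__main__":
    # Split on non-word characters or underscores (illustrative pattern).
    body = pattern_analyzer_settings(
        "my_email_analyzer",
        pattern=r"\W|_",
        flags=["CASE_INSENSITIVE"],
    )
    print(json.dumps(body, indent=2))
```

The printed JSON is meant to be sent as the body of an index creation request; optional parameters are left out of the body when they are not set, so the analyzer's own defaults apply.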

