# Pattern Replace Character Filter(正则替换字符)

**Pattern Replace Character Filter**使用正则表达式来匹配应替换为指定替换字符串的字符。替换字符串可以引用正则表达式中的捕获组。

> ### 小心"病态"正则表达式
>
> **Pattern Replace Character Filter**使用[Java正则表达式](http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html)。
>
> 一个“病态”的正则表达式可能会运行得非常慢，甚至会抛出一个**StackOverflowError**，还会导致其运行的节点突然退出。
>
> 更多信息请查看[pathological regular expressions and how to avoid them](http://www.regular-expressions.info/catastrophic.html)[.](http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html)

## 配置

**Pattern Replace Character Filter**会接收以下参数：

| 参数名称          | 参数说明                                                                                                                                                                     |
| ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `pattern`     | 一个[Java正则表达式](http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html). 必填                                                                                 |
| `replacement` | 替换字符串，可以使用$ 1 .. $ 9语法来引用捕获组，可以参考[这里](http://docs.oracle.com/javase/8/docs/api/java/util/regex/Matcher.html#appendReplacement-java.lang.StringBuffer-java.lang.String-)。 |
| `flags`       | Java正则表达式[标志](http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#field.summary)。标志应该用“\|”进行分割，例如“CASE\_INSENSITIVE \| COMMENTS”。                      |

在下面例子中，我们使用**Pattern Replace Character Filter**实现用下划线替代任何嵌入的破折号，即123-456-789→123\_456\_789：

```
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": "(\\d+)-(?=\\d)",
          "replacement": "$1_"
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "My credit card is 123-456-789"
}
```

上面案例将返回如下结果：

```
[ My, credit, card, is 123_456_789 ]
```

> 出于搜索目的使用替换字符串，会引起原始文本长度的更改，进而导致不正确的高亮显示。案例如下所示。

这个示例在遇到小写字母后跟大写字母（即fooBarBaz→foo Bar Baz）时插入空格，允许单独查询camelCase字词：

```
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ],
          "filter": [
            "lowercase"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": "(?<=\\p{Lower})(?=\\p{Upper})",
          "replacement": " "
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "The fooBarBaz method"
}
```

示例返回的结果如下：

```
[ the, foo, bar, baz, method ]
```

查询**bar**可以正确找到文档，但突出显示结果将产生不正确的高光，因为我们的字符过滤器更改了原始文本的长度：

```
PUT my_index/my_doc/1?refresh
{
  "text": "The fooBarBaz method"
}

GET my_index/_search
{
  "query": {
    "match": {
      "text": "bar"
    }
  },
  "highlight": {
    "fields": {
      "text": {}
    }
  }
}
```

以上的输出结果是：

```
{
  "timed_out": false,
  "took": $body.took,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2824934,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_doc",
        "_id": "1",
        "_score": 0.2824934,
        "_source": {
          "text": "The fooBarBaz method"
        },
        "highlight": {
          "text": [
            "The foo<em>Ba</em>rBaz method"
          ]
        }
      }
    ]
  }
}
```


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://xiaoxiami.gitbook.io/elasticsearch/ji-chu/33-analysisfen-679029/336character-filtersff08-zi-fu-guo-lv-qi-ff09/pattern-replace-character-filter.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
