# Fingerprint Analyzer

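The `fingerprint` analyzer implements the [fingerprinting algorithm](https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth#fingerprint) used by the OpenRefine project to assist in clustering.

Input text is lowercased, normalized to remove extended characters, sorted, deduplicated, and concatenated into a single token. If a stop word list is configured, stop words are removed as well.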

## **Definition**

It consists of:

Tokenizer

* [Standard Tokenizer](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-standard-tokenizer.html)

Token filters

* [Lower Case Token Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lowercase-tokenfilter.html)
* [ASCII Folding Token Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-asciifolding-tokenfilter.html)
* [Stop Token Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-stop-tokenfilter.html) (disabled by default)
* [Fingerprint Token Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-fingerprint-tokenfilter.html)
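
The pipeline above can be approximated in plain Python to see what each stage contributes. This is only a rough sketch, not the Lucene implementation: the regular expression is a stand-in for the Standard Tokenizer, `fold_to_ascii` is a hypothetical helper that imitates ASCII folding by stripping combining marks, and the stop filter is left out because it is disabled by default.

```python
import re
import unicodedata


def fold_to_ascii(token: str) -> str:
    """Imitate ASCII folding: decompose accented characters, drop the marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fingerprint(text: str) -> str:
    """Approximate the fingerprint analyzer with its default settings."""
    tokens = re.findall(r"\w+", text)             # stand-in for the Standard Tokenizer
    tokens = [t.lower() for t in tokens]          # Lower Case Token Filter
    tokens = [fold_to_ascii(t) for t in tokens]   # ASCII Folding Token Filter
    # Fingerprint Token Filter: sort, deduplicate, join into a single token
    return " ".join(sorted(set(tokens)))


print(fingerprint("Yes yes, Gödel said this sentence is consistent and."))
# and consistent godel is said sentence this yes
```

Because the result is a single sorted, deduplicated string, inputs that differ only in case, accents, punctuation, or word order map to the same value, which is what makes it useful as a clustering key.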

## **Example output**

```
POST _analyze
{
  "analyzer": "fingerprint",
  "text": "Yes yes, Gödel said this sentence is consistent and."
}
```

The above sentence produces the following single term:

```
[ and consistent godel is said sentence this yes ]
```

## **Configuration**

The `fingerprint` analyzer accepts the following parameters:

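| Parameter         | Description                                                                                                             |
| ----------------- | ----------------------------------------------------------------------------------------------------------------------- |
| `separator`       | The character used to concatenate the terms. Defaults to a space.                                                       |
| `max_output_size` | The maximum token size to emit. Defaults to `255`. Tokens larger than this size are discarded.                          |
| `stopwords`       | A predefined stop word list such as `_english_`, or an array containing a list of stop words. Defaults to `_none_`.      |
| `stopwords_path`  | The path to a file containing stop words.                                                                               |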

See the [Stop Token Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-stop-tokenfilter.html) for more information about stop word configuration.
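
To make the effect of these parameters concrete, here is a small Python sketch of the final sort, deduplicate, and join step with the options applied. It rests on several assumptions: the function name is hypothetical, the input tokens are taken to be already lowercased and ASCII-folded, `max_output_size` is measured in characters, and a discarded token is modelled by returning `None`. The stop words passed in the example are a hand-picked subset, not the full `_english_` list.

```python
from typing import Iterable, Optional


def fingerprint_with_options(
    tokens: Iterable[str],
    separator: str = " ",
    max_output_size: int = 255,
    stopwords: Iterable[str] = (),
) -> Optional[str]:
    """Sort, deduplicate, and join tokens, honouring the analyzer options."""
    stop = set(stopwords)
    # Drop stop words, then deduplicate and sort what remains.
    kept = sorted({t for t in tokens if t not in stop})
    joined = separator.join(kept)
    # Assumption: an oversized fingerprint is dropped entirely, not truncated.
    if len(joined) > max_output_size:
        return None
    return joined


tokens = ["yes", "yes", "godel", "said", "this", "sentence", "is", "consistent", "and"]

print(fingerprint_with_options(tokens, separator="+"))
# and+consistent+godel+is+said+sentence+this+yes

print(fingerprint_with_options(tokens, stopwords=["and", "is", "this"]))
# consistent godel said sentence yes

print(fingerprint_with_options(tokens, max_output_size=10))
# None
```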

## **Example configuration**

In this example, we configure the `fingerprint` analyzer to use the predefined list of English stop words:

```
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_fingerprint_analyzer": {
          "type": "fingerprint",
          "stopwords": "_english_"
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_fingerprint_analyzer",
  "text": "Yes yes, Gödel said this sentence is consistent and."
}
```

The above example produces the following term:

```
[ consistent godel said sentence yes ]
```

