# Language Analyzers

A set of analyzers aimed at analyzing text in a specific language. The following types are supported: [`arabic`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#arabic-analyzer), [`armenian`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#armenian-analyzer), [`basque`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#basque-analyzer), [`brazilian`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#brazilian-analyzer), [`bulgarian`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#bulgarian-analyzer), [`catalan`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#catalan-analyzer), [`cjk`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#cjk-analyzer), [`czech`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#czech-analyzer), [`danish`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#danish-analyzer), [`dutch`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#dutch-analyzer), [`english`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#english-analyzer), [`finnish`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#finnish-analyzer), [`french`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#french-analyzer), [`galician`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#galician-analyzer), [`german`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#german-analyzer), [`greek`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#greek-analyzer),
[`hindi`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#hindi-analyzer), [`hungarian`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#hungarian-analyzer), [`indonesian`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#indonesian-analyzer), [`irish`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#irish-analyzer), [`italian`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#italian-analyzer), [`latvian`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#latvian-analyzer), [`lithuanian`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#lithuanian-analyzer), [`norwegian`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#norwegian-analyzer), [`persian`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#persian-analyzer), [`portuguese`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#portuguese-analyzer), [`romanian`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#romanian-analyzer), [`russian`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#russian-analyzer), [`sorani`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#sorani-analyzer), [`spanish`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#spanish-analyzer), [`swedish`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#swedish-analyzer), [`turkish`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#turkish-analyzer),
[`thai`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lang-analyzer.html#thai-analyzer).

## **Configuring language analyzers** <a href="#id-yu-yan-fen-xi-qi-pei-zhi-yu-yan-fen-xi-yi" id="id-yu-yan-fen-xi-qi-pei-zhi-yu-yan-fen-xi-yi"></a>

### Stopwords <a href="#id-yu-yan-fen-xi-qi-stopwords-ting-zhi-ci" id="id-yu-yan-fen-xi-qi-stopwords-ting-zhi-ci"></a>

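All language analyzers support custom stopwords, set either inline in the analyzer configuration or loaded from an external stopwords file via `stopwords_path`. See the Stop Analyzer documentation for more details.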
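As a minimal sketch of the two options, the index settings below define two variants of the built-in `english` analyzer: one with an inline `stopwords` array and one that reads its list from a file. The analyzer names `english_inline_stop` and `english_file_stop` and the file path `stopwords/english_custom.txt` are assumptions made for illustration; the path is resolved relative to the Elasticsearch config directory.

```
{
  "settings": {
    "analysis": {
      "analyzer": {
        "english_inline_stop": {
          "type":           "english",
          "stopwords":      ["a", "an", "the"]
        },
        "english_file_stop": {
          "type":           "english",
          "stopwords_path": "stopwords/english_custom.txt"
        }
      }
    }
  }
}
```

The `stopwords` parameter also accepts a predefined list name such as `_english_`, or `_none_` to disable stopword removal.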

### Excluding words from stemming <a href="#id-yu-yan-fen-xi-qi-excludingwordsfromstemming-pai-chu-ci-gan" id="id-yu-yan-fen-xi-qi-excludingwordsfromstemming-pai-chu-ci-gan"></a>

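The `stem_exclusion` parameter lets you specify an array of lowercase words that should not be stemmed. Internally, this is implemented by adding a `keyword_marker` token filter whose `keywords` are set to the value of the `stem_exclusion` parameter.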
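For illustration, the settings below sketch an `english` analyzer that leaves two words unstemmed. The analyzer name `my_english` and the word list are assumptions chosen for the example.

```
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_english": {
          "type":           "english",
          "stem_exclusion": ["organization", "organizations"]
        }
      }
    }
  }
}
```

In a custom reimplementation of the same analyzer, the equivalent step would be a `keyword_marker` filter carrying the same words (the filter name `english_keywords` is likewise an assumption):

```
{
  "english_keywords": {
    "type":     "keyword_marker",
    "keywords": ["organization", "organizations"]
  }
}
```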

### Reimplementing language analyzers <a href="#id-yu-yan-fen-xi-qi-reimplementinglanguageanalyzers-zhong-xin-shi-xian-yu-yan-fen-xi-qi" id="id-yu-yan-fen-xi-qi-reimplementinglanguageanalyzers-zhong-xin-shi-xian-yu-yan-fen-xi-qi"></a>

The built-in language analyzers can be reimplemented as `custom` analyzers (as described below) in order to customize their behavior.

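**Note:** If you do not intend to exclude any words from stemming (the equivalent of the `stem_exclusion` parameter above), you should remove the `keyword_marker` token filter from the custom analyzer configuration.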

## `arabic` analyzer <a href="#id-yu-yan-fen-xi-qi-arabic-fen-xi-qi" id="id-yu-yan-fen-xi-qi-arabic-fen-xi-qi"></a>

The `arabic` analyzer could be reimplemented as a `custom` analyzer as follows:

```
{
  "settings": {
    "analysis": {
      "filter": {
        "arabic_stop": {
          "type":       "stop",
          "stopwords":  "_arabic_"
        },
        "arabic_keywords": {
          "type":       "keyword_marker",
          "keywords":   []
        },
        "arabic_stemmer": {
          "type":       "stemmer",
          "language":   "arabic"
        }
      },
      "analyzer": {
        "arabic": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "arabic_stop",
            "arabic_normalization",
            "arabic_keywords",
            "arabic_stemmer"
          ]
        }
      }
    }
  }
}
```

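The default stopwords used by the `arabic_stop` filter can be overridden with the `stopwords` or `stopwords_path` parameters.

The `arabic_keywords` filter should be removed unless there are words that should be excluded from stemming.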
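Putting both remarks together, a customized variant might look like the sketch below: the stop filter loads its list from a file, and the `keyword_marker` filter is dropped because no words need protecting from stemming. The analyzer name `arabic_custom` and the path `stopwords/arabic_custom.txt` are assumptions for illustration, not part of the built-in analyzer.

```
{
  "settings": {
    "analysis": {
      "filter": {
        "arabic_stop": {
          "type":           "stop",
          "stopwords_path": "stopwords/arabic_custom.txt"
        },
        "arabic_stemmer": {
          "type":       "stemmer",
          "language":   "arabic"
        }
      },
      "analyzer": {
        "arabic_custom": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "arabic_stop",
            "arabic_normalization",
            "arabic_stemmer"
          ]
        }
      }
    }
  }
}
```

The order of the remaining filters is kept from the reimplementation above, so stopwords are still removed before normalization and stemming.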

## `armenian` analyzer <a href="#id-yu-yan-fen-xi-qi-armenian-fen-xi-qi" id="id-yu-yan-fen-xi-qi-armenian-fen-xi-qi"></a>

The `armenian` analyzer could be reimplemented as a `custom` analyzer as follows:

```
{
  "settings": {
    "analysis": {
      "filter": {
        "armenian_stop": {
          "type":       "stop",
          "stopwords":  "_armenian_"
        },
        "armenian_keywords": {
          "type":       "keyword_marker",
          "keywords":   []
        },
        "armenian_stemmer": {
          "type":       "stemmer",
          "language":   "armenian"
        }
      },
      "analyzer": {
        "armenian": {
          "tokenizer":  "standard",
          "filter": [
            "lowercase",
            "armenian_stop",
            "armenian_keywords",
            "armenian_stemmer"
          ]
        }
      }
    }
  }
}
```

The remaining language analyzers follow much the same pattern. For the others, see the official documentation: [Language Analyzers](https://www.elastic.co/guide/en/elasticsearch/reference/5.2/analysis-lang-analyzer.html)
