3.2.1. Field data types

Data type categories

Core data types

String: text, keyword
Numeric: long, integer, short, byte, double, float, half_float, scaled_float
Date: date
Boolean: boolean
Binary: binary
Range: integer_range, float_range, long_range, double_range, date_range
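As a rough illustration of how the core types above appear in an explicit mapping, the sketch below builds a request body for `PUT my_index` as a Python dict. The field names (`title`, `status`, `views` and so on) are hypothetical; only the `type` values come from the list above, and the type name `my_type` follows the examples later in this section.

```python
import json

# Hypothetical field names; each one uses a core type from the list above.
mapping = {
    "mappings": {
        "my_type": {
            "properties": {
                "title": {"type": "text"},        # analyzed, for full-text search
                "status": {"type": "keyword"},    # not analyzed, for exact values
                "views": {"type": "long"},
                # scaled_float stores value * scaling_factor as a long
                "price": {"type": "scaled_float", "scaling_factor": 100},
                "created": {"type": "date"},
                "published": {"type": "boolean"},
                "thumbnail": {"type": "binary"},  # Base64-encoded string
                "age_range": {"type": "integer_range"},
            }
        }
    }
}

# Collect the distinct types used, then print the body that would be sent.
core_types = {f["type"] for f in mapping["mappings"]["my_type"]["properties"].values()}
print(json.dumps(mapping, indent=2))
```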

Complex data types

Array: arrays are supported natively and do not need a dedicated field data type
Object: the object data type, a single JSON object
Nested: the nested data type, an array of objects
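The practical difference between an array of plain objects and a nested field is easiest to see with a small simulation. The sketch below is not Elasticsearch code: `flatten_object_array` is a hypothetical helper that mimics how an array of objects is flattened into one multi-valued field per key, and the sample `users` data is made up. Analysis (lowercasing and so on) is ignored to keep it short.

```python
from collections import defaultdict

def flatten_object_array(field, objects):
    """Mimic indexing an array of plain objects: one multi-valued field per key."""
    flat = defaultdict(list)
    for obj in objects:
        for key, value in obj.items():
            flat[f"{field}.{key}"].append(value)
    return dict(flat)

# Made-up sample data: two users stored in one document.
users = [
    {"first": "John", "last": "Smith"},
    {"first": "Alice", "last": "White"},
]

flat = flatten_object_array("user", users)

# object: the pairing between first and last is lost, so
# "first is Alice AND last is Smith" matches although no such user exists.
object_match = "Alice" in flat["user.first"] and "Smith" in flat["user.last"]

# nested: every object is checked on its own, so the same condition fails.
nested_match = any(u["first"] == "Alice" and u["last"] == "Smith" for u in users)

print(flat)
print(object_match, nested_match)
```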

Geo data types

Geo-point: the geo_point data type, for latitude/longitude coordinates
Geo-shape: the geo_shape data type, for complex shapes such as polygons

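Specialised data types

IP: ip, for IPv4 and IPv6 addresses
Completion: completion, provides auto-complete suggestions
Token count: token_count, counts the number of tokens in a string
mapper-murmur3: murmur3, computes a hash of the values at index time and stores it in the index
Attachment: see the mapper-attachments plugin, which supports indexing attachments such as Microsoft Office formats, OpenDocument formats, EPUB and HTML
Percolator: percolator, accepts queries written in the query DSL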

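Multi-fields

It is often useful to index the same field in different ways for different purposes. For example, a string field can be mapped as a text field for full-text search and also as a keyword field for sorting and aggregations. Alternatively, a text field can be indexed with several analyzers, such as the standard analyzer, the english analyzer and the french analyzer.

This is the purpose of multi-fields. Most data types support multi-fields through the fields parameter.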
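A minimal sketch of what such a mapping could look like, written as a Python dict that would be sent as the body of `PUT my_index`. The `keyword` sub-field mirrors what dynamic mapping generates in the example that follows; the `english` sub-field name is a hypothetical addition used only to illustrate indexing the same text with a second analyzer.

```python
import json

name_mapping = {
    "type": "text",  # main field, analyzed with the standard analyzer by default
    "fields": {
        # not analyzed, for sorting, aggregations and exact matching
        "keyword": {"type": "keyword", "ignore_above": 256},
        # hypothetical sub-field: same text, analyzed with the english analyzer
        "english": {"type": "text", "analyzer": "english"},
    },
}

body = {"mappings": {"my_type": {"properties": {"name": name_mapping}}}}

# Sub-fields are addressed in queries as <field>.<sub-field>.
paths = ["name"] + [f"name.{sub}" for sub in name_mapping["fields"]]

print(json.dumps(body, indent=2))
print(paths)
```

Each path can then be used on its own in a query, sort or aggregation, while the document source still contains a single `name` value.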

Multi-fields in detail

Let's unpack what this means.

Insert some test data:

PUT my_index/my_type/1
{
  "name": "Some binary blob"
}

PUT my_index/my_type/2
{
  "name": "some apples"
}

PUT my_index/my_type/3
{
  "name": "Ha apples"
}

PUT my_index/my_type/4
{
  "name": "a man"
}

PUT my_index/my_type/5
{
  "name": "many apples"
}

View the automatically created mapping:

GET /my_index/my_type/_mapping

Response:

{
  "my_index": {
    "mappings": {
      "my_type": {
        "properties": {
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

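Notice that when Elasticsearch 5.x maps a string field such as name dynamically, it adds a fields object to the mapping, containing a sub-field named keyword of type keyword. This sub-field is not analyzed, so it can be used for sorting and for exact-match queries on the whole value.

1. The name field has type text and is analyzed, so "Some binary blob" is split into the three terms "some", "binary" and "blob" (the default standard analyzer also lowercases them), which are stored in the inverted index.

2. The name.keyword field has type keyword and is not analyzed: the whole value "Some binary blob" is indexed as a single term. Keyword fields also have doc values enabled by default, a column-oriented structure that is used for sorting and aggregations.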
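The difference between the two fields can be sketched with a toy analyzer. `standard_like_tokens` below is only a rough stand-in for the standard analyzer (split on non-alphanumeric characters, then lowercase); the real analyzer follows Unicode text segmentation rules and handles far more cases. Both function names are hypothetical.

```python
import re

def standard_like_tokens(text):
    """Rough approximation of the standard analyzer: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]

def keyword_tokens(text):
    """A keyword field is not analyzed: the whole value is one term."""
    return [text]

value = "Some binary blob"
text_terms = standard_like_tokens(value)   # terms indexed for the name field
keyword_terms = keyword_tokens(value)      # terms indexed for name.keyword

print(text_terms)
print(keyword_terms)
```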

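Examples:

term query (no analysis, exact match)

A term query looks up "Some binary blob" as a single term, without analyzing it. The name field only contains the individual terms some, binary and blob, so searching on name returns nothing.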

GET /my_index/my_type/_search
{
  "query": {
    "term": {
      "name": "Some binary blob"
    }
  }
}

No results are returned:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

Run the term query against the name.keyword field instead:

GET /my_index/my_type/_search
{
  "query": {
    "term": {
      "name.keyword": "Some binary blob"
    }
  }
}

This returns a hit:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "name": "Some binary blob"
        }
      }
    ]
  }
}
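The two term-query results can be reproduced with a toy inverted index over the five test documents. This is a simplified model, not how Elasticsearch is implemented: `analyze` only approximates the standard analyzer, and `build_index` and `term_query` are hypothetical helpers.

```python
import re
from collections import defaultdict

# The five test documents inserted earlier, keyed by _id.
DOCS = {
    1: "Some binary blob",
    2: "some apples",
    3: "Ha apples",
    4: "a man",
    5: "many apples",
}

def analyze(text):
    """Rough stand-in for the standard analyzer: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]

def build_index(docs, tokenizer):
    """Map every term produced by the tokenizer to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for term in tokenizer(value):
            index[term].add(doc_id)
    return index

name_index = build_index(DOCS, analyze)              # text field: analyzed
keyword_index = build_index(DOCS, lambda v: [v])     # keyword field: whole value

def term_query(index, term):
    """A term query looks the term up exactly as given, with no analysis."""
    return sorted(index.get(term, set()))

print(term_query(name_index, "Some binary blob"))     # no such term in the text index
print(term_query(keyword_index, "Some binary blob"))  # the whole value is a term here
print(term_query(name_index, "some"))                 # a single lowercased term does match
```

The last call shows that a term query on the text field still works when the query value happens to equal one of the indexed terms.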

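match query (analyzed, full-text matching)

A match query analyzes the query text first. "Some binary blob" is split into three terms, and a document matches if it contains any of them:

  1. some

  2. binary

  3. blob

There is no separate lookup for the whole phrase "Some binary blob"; a document that contains all three terms simply scores higher.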

GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "name": "Some binary blob"
    }
  }
}

Two documents are returned:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.7594807,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.7594807,
        "_source": {
          "name": "Some binary blob"
        }
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "2",
        "_score": 0.62191015,
        "_source": {
          "name": "some apples"
        }
      }
    ]
  }
}

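Run the match query against name.keyword. The keyword field is not analyzed, so the query text is treated as one term and only the exact value matches: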

GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "name.keyword": "Some binary blob"
    }
  }
}

Response:

{
  "took": 20,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "name": "Some binary blob"
        }
      }
    ]
  }
}
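Both match-query results can be modelled with the same kind of toy inverted index. The sketch assumes the default behaviour of combining the analyzed terms with OR and ignores scoring entirely; `analyze`, `build_index` and `match_query` are hypothetical helpers, and `analyze` only approximates the standard analyzer.

```python
import re
from collections import defaultdict

DOCS = {
    1: "Some binary blob",
    2: "some apples",
    3: "Ha apples",
    4: "a man",
    5: "many apples",
}

def analyze(text):
    """Rough stand-in for the standard analyzer: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]

def no_analysis(text):
    """Keyword fields keep the whole value as a single term."""
    return [text]

def build_index(docs, tokenizer):
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for term in tokenizer(value):
            index[term].add(doc_id)
    return index

def match_query(index, query, analyzer):
    """Analyze the query with the field's analyzer, then OR the per-term hits."""
    hits = set()
    for term in analyzer(query):
        hits |= index.get(term, set())
    return sorted(hits)

name_index = build_index(DOCS, analyze)
keyword_index = build_index(DOCS, no_analysis)

text_hits = match_query(name_index, "Some binary blob", analyze)
keyword_hits = match_query(keyword_index, "Some binary blob", no_analysis)

print(text_hits)     # documents containing some, binary or blob
print(keyword_hits)  # only the document whose whole value equals the query
```

Document 2 ("some apples") is found on the text field only because it shares the term some with the query.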

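Sorting

The name field has type text, and fielddata is disabled on text fields by default, so it cannot be used for sorting. The following request fails: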

GET /my_index/my_type/_search
{
  "query": { "match_all": {} },
  "sort": { "name": { "order": "desc" } }
}

Error response:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "my_index",
        "node": "7bJsCFK-QlalolMWGqOoxA",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
    }
  },
  "status": 400
}

Sorting on the name.keyword field works:

GET /my_index/my_type/_search
{
  "query": { "match_all": {} },
  "sort": { "name.keyword": { "order": "asc" } }
}

Response:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": null,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "3",
        "_score": null,
        "_source": {
          "name": "Ha apples"
        },
        "sort": [
          "Ha apples"
        ]
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": null,
        "_source": {
          "name": "Some binary blob"
        },
        "sort": [
          "Some binary blob"
        ]
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "4",
        "_score": null,
        "_source": {
          "name": "a man"
        },
        "sort": [
          "a man"
        ]
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "5",
        "_score": null,
        "_source": {
          "name": "many apples"
        },
        "sort": [
          "many apples"
        ]
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "2",
        "_score": null,
        "_source": {
          "name": "some apples"
        },
        "sort": [
          "some apples"
        ]
      }
    ]
  }
}
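The order in this response may look surprising: "Ha apples" and "Some binary blob" come before "a man". Keyword values are compared as raw terms, so uppercase ASCII letters sort before lowercase ones. The sketch below reproduces the order by sorting on the UTF-8 bytes of each value and, for comparison, shows what a case-insensitive sort would give. It is plain Python and only illustrates the comparison rule.

```python
# The five name values from the test documents.
names = ["Some binary blob", "some apples", "Ha apples", "a man", "many apples"]

# Sorting by raw bytes matches the ascending order in the response above.
by_bytes = sorted(names, key=lambda s: s.encode("utf-8"))

# A case-insensitive sort, shown only for contrast.
case_insensitive = sorted(names, key=str.lower)

print(by_bytes)
print(case_insensitive)
```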

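Summary:

  1. For full-text search, query the text field; its analyzed terms are looked up efficiently in the inverted index.

  2. For exact matching, sorting or aggregations, use the keyword field.

  3. Before 5.x there were no text and keyword types. To get the same effect you had to map the string yourself in two ways, one analyzed and one not analyzed.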
