3.2.1. Field data types
Data type categories
Core data types
Category: field data types
String: text and keyword
Numeric: long, integer, short, byte, double, float, half_float, scaled_float
Date: date
Boolean: boolean
Binary: binary
Range: integer_range, float_range, long_range, double_range, date_range
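Range fields store an interval rather than a single value, and range queries against them take a relation parameter: INTERSECTS (the default), CONTAINS, or WITHIN. The Python sketch below models those three relations for an integer_range value with inclusive bounds. It only illustrates the semantics in plain Python, it is not Elasticsearch code, and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntRange:
    """Hypothetical model of an integer_range value with inclusive bounds (gte/lte)."""
    gte: int
    lte: int

    def matches(self, query: "IntRange", relation: str = "intersects") -> bool:
        """Return True if this (document) range matches the query range under `relation`."""
        if relation == "intersects":  # the two ranges overlap at least partly
            return self.gte <= query.lte and query.gte <= self.lte
        if relation == "contains":    # document range fully contains the query range
            return self.gte <= query.gte and query.lte <= self.lte
        if relation == "within":      # document range lies fully inside the query range
            return query.gte <= self.gte and self.lte <= query.lte
        raise ValueError(f"unknown relation: {relation}")


doc = IntRange(gte=10, lte=20)    # e.g. a document value {"gte": 10, "lte": 20}
query = IntRange(gte=15, lte=30)

print(doc.matches(query))                      # True
print(doc.matches(query, "contains"))          # False
print(doc.matches(IntRange(5, 25), "within"))  # True
```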
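Complex data types
Category: field data types
Array: arrays are supported out of the box; any field can hold one or more values of the same type, so no dedicated array data type is needed
Object: object, for a single JSON object
Nested: nested, for an array of JSON objects, each indexed so that it can be queried independently of the others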
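The practical difference between object and nested shows up with arrays of objects: an array of plain objects is flattened into one list of values per sub-field, so the link between values that belonged to the same inner object is lost, while nested keeps each object separate. The Python sketch below simulates that behaviour. The user/first/last data and the helper names are hypothetical, and the matching logic is a simplified stand-in for a bool query with two term clauses.

```python
from collections import defaultdict


def flatten_object_array(field, objects):
    """Simulate how an array of `object`s is flattened into one value list per path."""
    flat = defaultdict(list)
    for obj in objects:
        for key, value in obj.items():
            flat[f"{field}.{key}"].append(value)
    return dict(flat)


def matches_object(flat, field, criteria):
    """With `object`, each condition is checked against the flattened lists independently."""
    return all(value in flat.get(f"{field}.{key}", []) for key, value in criteria.items())


def matches_nested(objects, criteria):
    """With `nested`, all conditions must hold within the same inner object."""
    return any(all(obj.get(k) == v for k, v in criteria.items()) for obj in objects)


users = [{"first": "John", "last": "Smith"}, {"first": "Alice", "last": "White"}]
flat = flatten_object_array("user", users)
print(flat)  # {'user.first': ['John', 'Alice'], 'user.last': ['Smith', 'White']}

query = {"first": "Alice", "last": "Smith"}     # no single user is "Alice Smith"
print(matches_object(flat, "user", query))  # True: a false positive
print(matches_nested(users, query))         # False
```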
Geo data types
Category: field data types
Geo-point: geo_point, for latitude/longitude coordinates
Geo-shape: geo_shape, for complex shapes such as polygons
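A geo_point value can be written in several forms: an object with lat and lon keys, a "lat,lon" string, a geohash, or an array. The array form uses GeoJSON order, [lon, lat], which is easy to get backwards. The Python sketch below normalizes the object, string, and array forms into one (lat, lon) tuple and checks the coordinate ranges; geohash decoding is deliberately left out, and the function name is hypothetical.

```python
def to_lat_lon(value):
    """Normalize a geo_point input into a (lat, lon) tuple of floats.

    Handles the object, "lat,lon" string, and [lon, lat] array forms;
    geohash strings are not handled in this sketch.
    """
    if isinstance(value, dict):
        lat, lon = value["lat"], value["lon"]
    elif isinstance(value, str):
        lat_str, lon_str = value.split(",")
        lat, lon = float(lat_str), float(lon_str)
    elif isinstance(value, (list, tuple)):
        lon, lat = value  # array form uses GeoJSON order: [lon, lat]
    else:
        raise TypeError(f"unsupported geo_point value: {value!r}")
    lat, lon = float(lat), float(lon)
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"coordinates out of range: lat={lat}, lon={lon}")
    return lat, lon


# The same point written in three different ways:
for v in ({"lat": 41.12, "lon": -71.34}, "41.12,-71.34", [-71.34, 41.12]):
    print(to_lat_lon(v))  # (41.12, -71.34) each time
```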
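Specialized data types
Category: field data types
IP: ip, for IPv4 and IPv6 addresses
Completion: completion, to provide auto-complete suggestions
Token count: token_count, to count the number of tokens in a string
mapper-murmur3: murmur3, to compute hashes of values at index time and store them in the index
Attachment: see the mapper-attachments plugin, which supports indexing attachments such as Microsoft Office formats, Open Document formats, ePub, HTML, and so on
Percolator: percolator, which accepts queries written in the query DSL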
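A token_count field stores the number of tokens an analyzer produces for a string, and it is usually added as a sub-field, for example {"type": "text", "fields": {"length": {"type": "token_count", "analyzer": "standard"}}}. The Python sketch below approximates that count. The regex tokenizer is only a rough stand-in for the standard analyzer (which follows Unicode text segmentation rules), so counts can differ for punctuation-heavy or non-Latin text, and the function names are hypothetical.

```python
import re


def approx_standard_tokens(text: str) -> list[str]:
    """Rough stand-in for the standard analyzer: lowercase, then split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def token_count(text: str) -> int:
    """Number of tokens a token_count field would store, under the approximation above."""
    return len(approx_standard_tokens(text))


print(token_count("Some binary blob"))  # 3
print(token_count("a man"))             # 2
```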
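Multi-fields
It is often useful to index the same field in different ways for different purposes. For example, a string value can be mapped as a text field for full-text search and also as a keyword field for sorting and aggregations. Likewise, a text field can be indexed with the standard analyzer, the english analyzer, and the french analyzer.
That is the purpose of multi-fields. Most data types support multi-fields through the fields parameter.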
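As an illustration of the fields parameter, the Python snippet below builds a 5.x-style mapping body in which name is a text field with two sub-fields: name.keyword for exact values and name.english analyzed with the built-in english analyzer. The type and sub-field names are assumptions chosen for this example; the printed JSON is what you would send as the body of a request that creates a new index.

```python
import json

# Hypothetical mapping: one source value, indexed three ways via `fields`.
mapping = {
    "mappings": {
        "my_type": {
            "properties": {
                "name": {
                    "type": "text",  # analyzed with the standard analyzer: full-text search
                    "fields": {
                        # exact, unanalyzed value: term queries, sorting, aggregations
                        "keyword": {"type": "keyword", "ignore_above": 256},
                        # same text analyzed with the english analyzer (stemming, stop words)
                        "english": {"type": "text", "analyzer": "english"},
                    },
                }
            }
        }
    }
}

print(json.dumps(mapping, indent=2))
```

Queries then address each sub-field by its path, such as name.keyword or name.english, while the document itself still contains a single name value.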
Multi-fields in detail
Let's unpack what this means.
Index a few test documents:
PUT my_index/my_type/1
{
"name": "Some binary blob"
}
PUT my_index/my_type/2
{
"name": "some apples"
}
PUT my_index/my_type/3
{
"name": "Ha apples"
}
PUT my_index/my_type/4
{
"name": "a man"
}
PUT my_index/my_type/5
{
"name": "many apples"
}
Check the mapping that was created automatically:
GET /my_index/my_type/_mapping
Response:
{
"my_index": {
"mappings": {
"my_type": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
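Notice that Elasticsearch 5.x adds a fields object to the dynamically created mapping of name. It contains a sub-field named keyword of type keyword. This sub-field is not analyzed, so it can be used for sorting and for exact, whole-value queries. ignore_above: 256 means that values longer than 256 characters are not indexed into the sub-field.
1. name is a text field and is analyzed: with the default standard analyzer, "Some binary blob" is lowercased and split into the terms some, binary, and blob, which go into the inverted index.
2. name.keyword is a keyword field and is not analyzed: the whole value "Some binary blob" is indexed as a single term, and the field also has doc values (a column-oriented store) by default, which is what sorting and aggregations use.
Examples:
term query (not analyzed, exact match)
A term query is not analyzed: it looks for "Some binary blob" as one exact term. The name field only contains the individual terms some, binary, and blob, so a term query against name returns nothing.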
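To make the two indexing paths concrete, the Python sketch below builds a toy inverted index for the five test documents: one keyed by the analyzed terms of name, and one keyed by the whole value, as name.keyword would be. The regex-based analyzer only approximates the standard analyzer, plain dictionaries stand in for Lucene's data structures, and the helper names are hypothetical.

```python
import re
from collections import defaultdict

DOCS = {"1": "Some binary blob", "2": "some apples", "3": "Ha apples",
        "4": "a man", "5": "many apples"}


def standard_like(text):
    """Approximation of the standard analyzer: lowercase + split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def build_indexes(docs):
    text_index = defaultdict(set)     # name:         analyzed term -> doc ids
    keyword_index = defaultdict(set)  # name.keyword: whole value   -> doc ids
    for doc_id, value in docs.items():
        for term in standard_like(value):
            text_index[term].add(doc_id)
        if len(value) <= 256:  # mimics ignore_above: 256
            keyword_index[value].add(doc_id)
    return dict(text_index), dict(keyword_index)


text_index, keyword_index = build_indexes(DOCS)
print(sorted(text_index))            # ['a', 'apples', 'binary', 'blob', 'ha', 'man', 'many', 'some']
print(sorted(text_index["apples"]))  # ['2', '3', '5']
print(keyword_index["Some binary blob"])  # {'1'}
```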
GET /my_index/my_type/_search
{
"query": {
"term": {
"name": "Some binary blob"
}
}
}
No hits:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
Now run the same term query against the name.keyword field:
GET /my_index/my_type/_search
{
"query": {
"term": {
"name.keyword": "Some binary blob"
}
}
}
Result:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.2876821,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "1",
"_score": 0.2876821,
"_source": {
"name": "Some binary blob"
}
}
]
}
}
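The behaviour of the two term queries above can be reproduced with a small Python simulation: the term is compared as-is, with no analysis, against whatever terms each field indexed. It also shows a common pitfall: even a single word such as "Some" misses the text field, because the indexed term is the lowercased some. The regex analyzer only approximates the standard analyzer, and the helper names are hypothetical.

```python
import re

DOCS = {"1": "Some binary blob", "2": "some apples", "3": "Ha apples",
        "4": "a man", "5": "many apples"}


def analyze(text):
    """Approximation of the standard analyzer: lowercase + split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def term_on_text(term):
    """term query on `name`: the query term is NOT analyzed, the indexed terms were."""
    return sorted(doc_id for doc_id, value in DOCS.items() if term in analyze(value))


def term_on_keyword(term):
    """term query on `name.keyword`: the whole value was indexed as one term."""
    return sorted(doc_id for doc_id, value in DOCS.items() if value == term)


print(term_on_text("Some binary blob"))     # []
print(term_on_keyword("Some binary blob"))  # ['1']
print(term_on_text("Some"))                 # []
print(term_on_text("some"))                 # ['1', '2']
```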
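match query (analyzed, full-text matching)
A match query analyzes the query string with the field's analyzer before searching. "Some binary blob" becomes the terms some, binary, and blob, and with the default operator (or) a document matches if it contains any of them; documents that match more of the terms generally score higher.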
GET /my_index/my_type/_search
{
"query": {
"match": {
"name": "Some binary blob"
}
}
}
Two documents are returned; "some apples" matches because it contains the term some:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.7594807,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "1",
"_score": 0.7594807,
"_source": {
"name": "Some binary blob"
}
},
{
"_index": "my_index",
"_type": "my_type",
"_id": "2",
"_score": 0.62191015,
"_source": {
"name": "some apples"
}
}
]
}
}
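A match query can be simulated the same way: analyze the query string, then look up each resulting term. The sketch below reproduces the two hits above with the default operator or, and shows that operator "and", a real option of the match query, would keep only document 1. It does not model relevance scoring, so it only returns which documents match, not their _score; the helper names and the regex analyzer are assumptions.

```python
import re

DOCS = {"1": "Some binary blob", "2": "some apples", "3": "Ha apples",
        "4": "a man", "5": "many apples"}


def analyze(text):
    """Approximation of the standard analyzer: lowercase + split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def match_query(query, operator="or"):
    """Return ids of docs matching any ("or") or all ("and") of the analyzed query terms."""
    query_terms = set(analyze(query))  # "Some binary blob" -> {some, binary, blob}
    hits = []
    for doc_id, value in DOCS.items():
        matched = query_terms & set(analyze(value))
        if (operator == "or" and matched) or (operator == "and" and matched == query_terms):
            hits.append(doc_id)
    return sorted(hits)


print(match_query("Some binary blob"))                  # ['1', '2']
print(match_query("Some binary blob", operator="and"))  # ['1']
```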
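match query on name.keyword. A keyword field is not analyzed, so the whole query string is treated as a single term and only the exact value matches: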
GET /my_index/my_type/_search
{
"query": {
"match": {
"name.keyword": "Some binary blob"
}
}
}
Response:
{
"took": 20,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.2876821,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "1",
"_score": 0.2876821,
"_source": {
"name": "Some binary blob"
}
}
]
}
}
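A match query on name.keyword behaves like an exact lookup because a keyword field turns its input, both at index time and at query time, into a single unchanged token. The Python sketch below makes the analyzer a parameter to show this. The function names are hypothetical, and it assumes no normalizer is configured on the keyword field, so matching is case-sensitive.

```python
import re

DOCS = {"1": "Some binary blob", "2": "some apples", "3": "Ha apples",
        "4": "a man", "5": "many apples"}


def standard_like(text):
    """Approximation of the standard analyzer used by `name`."""
    return re.findall(r"\w+", text.lower())


def keyword_like(text):
    """A keyword field keeps the whole input as one token, unchanged."""
    return [text]


def match_query(query, analyzer):
    """OR-match: the same analyzer is applied to the query and to each document value."""
    query_terms = set(analyzer(query))
    return sorted(d for d, v in DOCS.items() if query_terms & set(analyzer(v)))


print(match_query("Some binary blob", standard_like))  # ['1', '2']  (field: name)
print(match_query("Some binary blob", keyword_like))   # ['1']       (field: name.keyword)
print(match_query("some binary blob", keyword_like))   # []          (case differs)
```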
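Sorting
name is a text field: it is analyzed, its terms live only in the inverted index, and fielddata is disabled on text fields by default. Sorting on it therefore fails: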
GET /my_index/my_type/_search
{
"query": { "match_all": {} },
"sort": { "name": { "order": "desc" } }
}
Error response:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "my_index",
"node": "7bJsCFK-QlalolMWGqOoxA",
"reason": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
}
}
],
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
}
},
"status": 400
}
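Besides the memory cost, sorting on an analyzed field rarely gives the order you want. If fielddata were enabled, each document would contribute several terms, and for a multi-valued field Elasticsearch sorts by default on the minimum value when ascending and the maximum when descending. The Python sketch below simulates that descending sort on the uninverted terms: the documents end up ordered by a single token (some, many, ...), not by the full name. The analyzer approximation and variable names are assumptions.

```python
import re

DOCS = {"1": "Some binary blob", "2": "some apples", "3": "Ha apples",
        "4": "a man", "5": "many apples"}


def analyze(text):
    """Approximation of the standard analyzer: lowercase + split on non-word characters."""
    return re.findall(r"\w+", text.lower())


# "Uninverting" gives each document the list of terms it was indexed with.
terms_per_doc = {doc_id: sorted(set(analyze(value))) for doc_id, value in DOCS.items()}

# Descending sort on a multi-valued field uses each document's maximum term by default.
desc_order = sorted(DOCS, key=lambda doc_id: max(terms_per_doc[doc_id]), reverse=True)

for doc_id in desc_order:
    print(doc_id, max(terms_per_doc[doc_id]), "<-", DOCS[doc_id])
```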
Sorting on the name.keyword field works:
GET /my_index/my_type/_search
{
"query": { "match_all": {} },
"sort": { "name.keyword": { "order": "asc" } }
}
Response:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 5,
"max_score": null,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "3",
"_score": null,
"_source": {
"name": "Ha apples"
},
"sort": [
"Ha apples"
]
},
{
"_index": "my_index",
"_type": "my_type",
"_id": "1",
"_score": null,
"_source": {
"name": "Some binary blob"
},
"sort": [
"Some binary blob"
]
},
{
"_index": "my_index",
"_type": "my_type",
"_id": "4",
"_score": null,
"_source": {
"name": "a man"
},
"sort": [
"a man"
]
},
{
"_index": "my_index",
"_type": "my_type",
"_id": "5",
"_score": null,
"_source": {
"name": "many apples"
},
"sort": [
"many apples"
]
},
{
"_index": "my_index",
"_type": "my_type",
"_id": "2",
"_score": null,
"_source": {
"name": "some apples"
},
"sort": [
"some apples"
]
}
]
}
}
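The order above is not alphabetical in the everyday sense: keyword values are compared as raw bytes, so every uppercase letter sorts before every lowercase one, and "Ha apples" and "Some binary blob" come first. For ASCII strings, Python's default string ordering (by code point) gives the same result, which the sketch below uses to reproduce the response. The case-insensitive variant shows the order you would get if the values were lowercased before indexing, which in Elasticsearch would need a lowercase normalizer on the keyword field, an option added in a later 5.x release.

```python
names = ["Some binary blob", "some apples", "Ha apples", "a man", "many apples"]

# Code-point order, which matches keyword (UTF-8 byte) order for ASCII text.
byte_order = sorted(names)
print(byte_order)
# ['Ha apples', 'Some binary blob', 'a man', 'many apples', 'some apples']

# Case-insensitive order: what lowercasing before indexing would give.
case_insensitive = sorted(names, key=str.lower)
print(case_insensitive)
# ['a man', 'Ha apples', 'many apples', 'some apples', 'Some binary blob']
```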
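Summary:
For full-text (analyzed) search, query the text field; its inverted index of individual terms is built for this kind of search.
For exact matching, sorting, and aggregations, use the keyword field.
Before 5.x, dynamically mapped strings did not get a keyword sub-field, so to get the same effect you had to map the value twice yourself: one analyzed copy and one not analyzed copy.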