# 3.2.1. Field data types

### Data type categories

**Core data types**

| Category | Field data types                                                      |
| :------: | --------------------------------------------------------------------- |
|  String  | text and keyword                                                      |
|  Numeric | long, integer, short, byte, double, float, half\_float, scaled\_float |
|   Date   | date                                                                  |
|  Boolean | boolean                                                               |
|  Binary  | binary                                                                |
|   Range  | integer\_range, float\_range, long\_range, double\_range, date\_range |

**Complex data types**

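| Category | Data type                                                                |
| :------: | ------------------------------------------------------------------------ |
|   Array  | Arrays are supported directly; no dedicated field data type is required  |
|  Object  | object data type: represented as a single JSON object                    |
|  Nested  | nested data type: represented as an array of JSON objects                |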

**Geo data types**

|  Category | Data type                                                  |
| :-------: | ---------------------------------------------------------- |
| Geo-point | geo\_point data type: latitude/longitude coordinates       |
| Geo-shape | geo\_shape data type: complex shapes such as polygons      |

**Specialized data types**

|    Category    | Data type                                                                                                                                                                                                                   |
| :------------: | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|       IP       | ip: IPv4 and IPv6 addresses                                                                                                                                                                                                 |
|   Completion   | completion: provides auto-complete suggestions                                                                                                                                                                              |
|   Token count  | token\_count: counts the number of tokens in a string                                                                                                                                                                       |
| mapper-murmur3 | murmur3: computes hashes of values at index time and stores them in the index                                                                                                                                               |
|   Attachment   | See the [mapper-attachments](https://www.elastic.co/guide/en/elasticsearch/plugins/5.2/mapper-attachments.html) plugin, which supports indexing attachments such as Microsoft Office formats, Open Document formats, EPUB, and HTML. |
|   Percolator   | Accepts queries written in the query DSL                                                                                                                                                                                    |

## Multi-fields <a href="#id-zi-duan-lei-xing-duo-zi-duan" id="id-zi-duan-lei-xing-duo-zi-duan"></a>

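It is often useful to index the same field in different ways for different purposes. For example, a ***string*** field can be mapped as a ***text*** field for full-text search and also as a ***keyword*** field for sorting and aggregations. Alternatively, you can index a ***text*** field with several analyzers, such as the ***standard analyzer***, the ***english analyzer***, and the ***french analyzer***.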

This is the purpose of ***multi-fields***. Most data types support ***multi-fields*** through the `fields` parameter.
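As a rough illustration of the `fields` parameter, the sketch below builds a mapping body in Python for a `name` field with three sub-fields. The sub-field names (`keyword`, `english`, `french`), the type name `my_type`, and the helper `sub_field_paths` are assumptions made for this example; only the `fields` parameter itself and the built-in `english` and `french` analyzers come from Elasticsearch.

```python
import json

# Hypothetical mapping for a `name` field with several sub-fields.
# The sub-field names are illustrative; "english" and "french" refer to
# Elasticsearch's built-in language analyzers.
name_mapping = {
    "type": "text",  # main field, analyzed with the default (standard) analyzer
    "fields": {
        "keyword": {"type": "keyword", "ignore_above": 256},  # exact values
        "english": {"type": "text", "analyzer": "english"},
        "french": {"type": "text", "analyzer": "french"},
    },
}

# Wrapped the way a 5.x create-index body nests it (the type name is a placeholder).
request_body = {"mappings": {"my_type": {"properties": {"name": name_mapping}}}}


def sub_field_paths(field_name, mapping):
    """Return the dotted paths under which each sub-field can be queried."""
    return sorted(f"{field_name}.{sub}" for sub in mapping.get("fields", {}))


print(json.dumps(request_body, indent=2))
print(sub_field_paths("name", name_mapping))
```

Each sub-field is addressed by its dotted path, so a query can target `name`, `name.keyword`, `name.english`, or `name.french` depending on the behavior it needs.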

### Multi-fields in detail

> Let's unpack what the above means:

Insert some test documents:

```
PUT my_index/my_type/1
{
  "name": "Some binary blob"
}

PUT my_index/my_type/2
{
  "name": "some apples"
}

PUT my_index/my_type/3
{
  "name": "Ha apples"
}

PUT my_index/my_type/4
{
  "name": "a man"
}

PUT my_index/my_type/5
{
  "name": "many apples"
}
```

View the automatically created mapping:

```
GET /my_index/my_type/_mapping
```

Response:

```
{
  "my_index": {
    "mappings": {
      "my_type": {
        "properties": {
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}
```

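Notice that in Elasticsearch 5.x, dynamic mapping adds a `fields` object under the mapping of each string field (here, `name`). It defines a sub-field named `keyword` whose type is `keyword`. This sub-field is not analyzed, so it can be used for sorting and for exact, untokenized queries.

1. The `name` field has type `text` and is analyzed, so "Some binary blob" is split into the three lowercased terms "some", "binary", and "blob", which are stored in the inverted index.
2. The `name.keyword` field has type `keyword` and is not analyzed. The whole value is indexed as a single term, and doc values (a forward, column-oriented index) are enabled by default, which is what sorting and aggregations rely on.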
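The difference between the two fields can be sketched in a few lines of Python. The function `standard_like_analyze` is only a rough stand-in for the standard analyzer (it splits on non-alphanumeric characters and lowercases), not the real implementation, and both function names are made up for this example.

```python
import re


def standard_like_analyze(text):
    """Rough stand-in for the standard analyzer: word tokens, lowercased."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def keyword_analyze(text):
    """A keyword field keeps the whole value as one unchanged term."""
    return [text]


value = "Some binary blob"
print(standard_like_analyze(value))  # ['some', 'binary', 'blob']  -> terms for `name`
print(keyword_analyze(value))        # ['Some binary blob']        -> term for `name.keyword`
```

Note that the analyzed terms are lowercased, which is why an exact lookup for `Some` would not find anything in the `name` field.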

Examples:

#### **term query (not analyzed, exact match)**

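**A term query does not analyze its input, so "Some binary blob" is looked up as one whole term. Searching the `name` field this way returns nothing, because that field only contains the individual analyzed terms.**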

```
GET /my_index/my_type/_search
{
  "query": {
    "term": {
      "name": "Some binary blob"
    }
  }
}
```

No results are returned:

```
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
```

**term query on the `name.keyword` field**

```
GET /my_index/my_type/_search
{
  "query": {
    "term": {
      "name.keyword": "Some binary blob"
    }
  }
}
```

Result:

```
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "name": "Some binary blob"
        }
      }
    ]
  }
}
```
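The two term query results can be reproduced with a toy inverted index. This is a simplified model under stated assumptions: `analyze` only approximates the standard analyzer, and `build_index` and `term_query` are hypothetical helpers, not Elasticsearch APIs.

```python
import re
from collections import defaultdict

DOCS = {
    "1": "Some binary blob",
    "2": "some apples",
    "3": "Ha apples",
    "4": "a man",
    "5": "many apples",
}


def analyze(text):
    """Simplified stand-in for the standard analyzer (lowercased word tokens)."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def keyword_analyze(text):
    """Keyword fields index the whole value as a single term."""
    return [text]


def build_index(docs, analyzer):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for term in analyzer(value):
            index[term].add(doc_id)
    return index


def term_query(index, term):
    """Look the input up as-is, without analyzing it."""
    return sorted(index.get(term, set()))


text_index = build_index(DOCS, analyze)             # models `name`
keyword_index = build_index(DOCS, keyword_analyze)  # models `name.keyword`

print(term_query(text_index, "Some binary blob"))     # []
print(term_query(keyword_index, "Some binary blob"))  # ['1']
print(term_query(text_index, "some"))                 # ['1', '2']
```

The last call shows that a term query on the `text` field does work when the input is exactly one of the indexed, lowercased terms.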

#### **match query (analyzed, full-text match)**

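A match query analyzes the query text first. `Some binary blob` is split into three terms, and by default a document matches if it contains any one of them:

1. `some`
2. `binary`
3. `blob`

Documents that match more of the terms, such as one containing all of `some binary blob`, score higher.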

```
GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "name": "Some binary blob"
    }
  }
}
```

Two documents are returned:

```
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.7594807,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.7594807,
        "_source": {
          "name": "Some binary blob"
        }
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "2",
        "_score": 0.62191015,
        "_source": {
          "name": "some apples"
        }
      }
    ]
  }
}
```

A match query on `name.keyword` (a `keyword` field is not analyzed, so the query text is matched as a single term):

```
GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "name.keyword": "Some binary blob"
    }
  }
}
```

Response:

```
{
  "took": 20,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "name": "Some binary blob"
        }
      }
    ]
  }
}
```
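The match behavior on both fields can be modeled the same way. In this sketch the query text is analyzed with the field's own analyzer, the resulting terms are OR-ed together, and hits are ranked by how many terms they match. That ranking is a deliberate simplification: Elasticsearch computes relevance scores differently, and the helper names here are assumptions for illustration.

```python
import re
from collections import defaultdict

DOCS = {
    "1": "Some binary blob",
    "2": "some apples",
    "3": "Ha apples",
    "4": "a man",
    "5": "many apples",
}


def analyze(text):
    """Simplified stand-in for the standard analyzer (lowercased word tokens)."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def keyword_analyze(text):
    """Keyword fields treat the whole value as a single term."""
    return [text]


def build_index(docs, analyzer):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for term in analyzer(value):
            index[term].add(doc_id)
    return index


def match_query(index, analyzer, query_text):
    """Analyze the query, OR the terms, rank by number of matched terms."""
    matched = defaultdict(int)
    for term in set(analyzer(query_text)):
        for doc_id in index.get(term, ()):
            matched[doc_id] += 1
    return sorted(matched, key=lambda doc_id: (-matched[doc_id], doc_id))


text_index = build_index(DOCS, analyze)             # models `name`
keyword_index = build_index(DOCS, keyword_analyze)  # models `name.keyword`

print(match_query(text_index, analyze, "Some binary blob"))             # ['1', '2']
print(match_query(keyword_index, keyword_analyze, "Some binary blob"))  # ['1']
```

Document 2 appears in the first result only because it shares the term `some`; on the keyword index nothing but the full original value can match.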

#### Sorting

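**Because `name` is a `text` field, it is analyzed into an inverted index and fielddata is disabled by default, so it cannot be used for sorting. The following request therefore fails:**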

```
GET /my_index/my_type/_search
{
  "query": { "match_all": {} },
  "sort": { "name": { "order": "desc" } }
}
```

Error response:

```
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "my_index",
        "node": "7bJsCFK-QlalolMWGqOoxA",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
    }
  },
  "status": 400
}
```

Sorting on the `name.keyword` field works:

```
GET /my_index/my_type/_search
{
  "query": { "match_all": {} },
  "sort": { "name.keyword": { "order": "asc" } }
}
```

Response:

```
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": null,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "3",
        "_score": null,
        "_source": {
          "name": "Ha apples"
        },
        "sort": [
          "Ha apples"
        ]
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": null,
        "_source": {
          "name": "Some binary blob"
        },
        "sort": [
          "Some binary blob"
        ]
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "4",
        "_score": null,
        "_source": {
          "name": "a man"
        },
        "sort": [
          "a man"
        ]
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "5",
        "_score": null,
        "_source": {
          "name": "many apples"
        },
        "sort": [
          "many apples"
        ]
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "2",
        "_score": null,
        "_source": {
          "name": "some apples"
        },
        "sort": [
          "some apples"
        ]
      }
    ]
  }
}
```
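The order in this response may look surprising, since `Some binary blob` sorts before `a man`. Keyword values are compared as raw terms, so uppercase letters come before lowercase ones. The sketch below reproduces that order with Python's default string comparison, which agrees with it for these ASCII values; the case-insensitive variant is a hypothetical client-side alternative, not something the request above does.

```python
DOCS = {
    "1": "Some binary blob",
    "2": "some apples",
    "3": "Ha apples",
    "4": "a man",
    "5": "many apples",
}

# Ascending sort on the raw values, as with `name.keyword`:
# uppercase 'H' and 'S' sort before any lowercase letter.
ascending = sorted(DOCS, key=lambda doc_id: DOCS[doc_id])
print(ascending)  # ['3', '1', '4', '5', '2']

# Hypothetical alternative: ignore case when ordering (computed client-side here).
case_insensitive = sorted(DOCS, key=lambda doc_id: DOCS[doc_id].lower())
print(case_insensitive)  # ['4', '3', '5', '2', '1']
```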

#### Summary:

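1. For full-text (fuzzy-style) searches, query the `text` field; its analyzed inverted index makes these lookups efficient.
2. For exact matches, and for sorting or aggregations, use the `keyword` field.
3. In versions before 5.x, you had to set this up yourself by defining two fields with different analysis, one analyzed and one not analyzed, to get the same effect.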
