# 3.2.1. Field data types

### Data type categories

**Core data types**

| Category | Field data types                                                      |
| :------: | --------------------------------------------------------------------- |
|  String  | text and keyword                                                      |
|  Numeric | long, integer, short, byte, double, float, half\_float, scaled\_float |
|   Date   | date                                                                  |
|  Boolean | boolean                                                               |
|  Binary  | binary                                                                |
|   Range  | integer\_range, float\_range, long\_range, double\_range, date\_range |

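**Complex data types**

| Category | Data type                                                            |
| :------: | -------------------------------------------------------------------- |
|   Array  | Arrays are supported natively and need no dedicated field data type  |
|  Object  | object data type: represents a single JSON object                    |
|  Nested  | nested data type: represents an array of JSON objects                |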

**Geo data types**

|  Category | Data type                                                    |
| :-------: | ------------------------------------------------------------ |
| Geo-point | geo\_point data type: latitude/longitude points              |
| Geo-shape | geo\_shape data type: complex shapes such as polygons        |

**Specialized data types**

|    Category    | Data type                                                                                                                                                                                                  |
| :------------: | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|       IP       | ip: IPv4 and IPv6 addresses                                                                                                                                                                                |
|   Completion   | completion: provides auto-complete suggestions                                                                                                                                                             |
|   Token count  | token\_count: counts the number of tokens in a string                                                                                                                                                      |
| mapper-murmur3 | murmur3: computes hashes of values at index time and stores them in the index                                                                                                                              |
|   Attachment   | See the [mapper-attachments](https://www.elastic.co/guide/en/elasticsearch/plugins/5.2/mapper-attachments.html) plugin, which supports indexing attachments such as Microsoft Office formats, Open Document formats, EPUB, and HTML. |
|   Percolator   | percolator: accepts queries written in the query DSL                                                                                                                                                       |

## Multi-fields <a href="#id-zi-duan-lei-xing-duo-zi-duan" id="id-zi-duan-lei-xing-duo-zi-duan"></a>

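Multi-fields are typically used to index the same field in different ways for different purposes. For example, a ***string*** field can be mapped as a ***text*** field for full-text search and also as a ***keyword*** field for sorting and aggregations. Alternatively, you can index a ***text*** field with several analyzers, such as the ***standard analyzer***, the ***english analyzer***, and the ***french analyzer***.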

This is the purpose of ***multi-fields***. Most data types support ***multi-fields*** through the `fields` parameter.
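To make the `fields` parameter concrete, here is a minimal sketch that builds an explicit mapping body as a Python dictionary. The `english` sub-field and the helper `sub_field_paths` are illustrative assumptions, not part of the original page; the index and type names follow the examples used later on this page.

```python
import json

# Hypothetical explicit mapping for the `name` field, using the
# 5.x-style index/type layout from this page.
mapping = {
    "mappings": {
        "my_type": {
            "properties": {
                "name": {
                    "type": "text",  # analyzed, used for full-text search
                    "fields": {
                        # un-analyzed copy for sorting and aggregations
                        "keyword": {"type": "keyword", "ignore_above": 256},
                        # assumed extra copy analyzed with the english analyzer
                        "english": {"type": "text", "analyzer": "english"},
                    },
                }
            }
        }
    }
}


def sub_field_paths(mapping, doc_type, field):
    """Return every path under which `field` can be queried."""
    spec = mapping["mappings"][doc_type]["properties"][field]
    return [field] + [f"{field}.{sub}" for sub in spec.get("fields", {})]


print(json.dumps(mapping, indent=2))
print(sub_field_paths(mapping, "my_type", "name"))
```

The printed JSON could be sent as the body of an index-creation request; each sub-field is then addressed with dot notation, for example `name.keyword`.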

### Multi-fields in detail

> Let's unpack what the above means:

Insert a few test documents:

```
PUT my_index/my_type/1
{
  "name": "Some binary blob"

}
PUT my_index/my_type/2
{
  "name": "some apples"

}
PUT my_index/my_type/3
{
  "name": "Ha apples"

}

PUT my_index/my_type/4
{
  "name": "a man"

}
PUT my_index/my_type/5
{
  "name": "many apples"

}
```

View the automatically created mapping:

```
GET /my_index/my_type/_mapping
```

Response:

```
{
  "my_index": {
    "mappings": {
      "my_type": {
        "properties": {
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}
```

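Notice that in Elasticsearch 5.x, dynamic mapping adds a `fields` object to the mapping of the string field `name`. It contains a sub-field named `keyword` whose type is `keyword`. This sub-field is not analyzed, so it can be used for sorting and for exact-match queries on the whole value.

1. The `name` field has type `text` and is analyzed, so "Some binary blob" is split into the three terms "some", "binary", and "blob" (the default standard analyzer also lowercases them), and these terms are stored in the inverted index.

2. The `name.keyword` field has type `keyword` and is not analyzed: the whole string is indexed as a single term, exactly as written. Keyword fields also keep doc values by default, a column-oriented (forward) structure that sorting and aggregations read from.

Examples: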
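The difference between the two fields can be illustrated with a small, self-contained sketch. The function `simple_analyze` is only a rough stand-in for the standard analyzer (split on non-word characters, then lowercase), and `build_index` is a hypothetical helper; neither reflects how Elasticsearch is implemented internally.

```python
import re
from collections import defaultdict

# The five test documents from this page, keyed by _id.
docs = {
    1: "Some binary blob",
    2: "some apples",
    3: "Ha apples",
    4: "a man",
    5: "many apples",
}


def simple_analyze(text):
    """Rough approximation of the standard analyzer: tokenize and lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_index(docs, analyze):
    """Map each term produced by `analyze` to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for term in analyze(value):
            index[term].add(doc_id)
    return index


# `name` (text): one entry per lowercased token.
name_index = build_index(docs, simple_analyze)
# `name.keyword` (keyword): the whole value is the only term.
keyword_index = build_index(docs, lambda value: [value])

print(sorted(name_index))     # individual lowercased tokens
print(sorted(keyword_index))  # full original strings
```

In this sketch the term "some" points at documents 1 and 2 in `name_index`, while `keyword_index` only has entries for complete values such as "Some binary blob".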

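#### **term query (not analyzed, exact match)**

**A term query does not analyze its input: it looks up "Some binary blob" as a single term. The `name` field only contains the individual terms produced by analysis, so searching it this way returns nothing.**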

```
GET /my_index/my_type/_search
{
  "query": {
    "term": {
      "name": "Some binary blob"
    }
  }
}
```

No hits are returned:

```
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
```

**Running the term query against the `name.keyword` field**

```
GET /my_index/my_type/_search
{
  "query": {
    "term": {
      "name.keyword": "Some binary blob"
    }
  }
}
```

Result:

```
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "name": "Some binary blob"
        }
      }
    ]
  }
}
```
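The two term queries above can be mimicked with a simplified model. This is a sketch under stated assumptions: `analyze` approximates the standard analyzer with lowercase-and-split, and `term_query` is a hypothetical helper that ignores scoring entirely.

```python
docs = {
    1: "Some binary blob",
    2: "some apples",
    3: "Ha apples",
    4: "a man",
    5: "many apples",
}


def analyze(text):
    """Approximate the standard analyzer: lowercase, split on whitespace."""
    return text.lower().split()


# Terms stored per document for each field.
name_terms = {doc_id: set(analyze(value)) for doc_id, value in docs.items()}
keyword_terms = {doc_id: {value} for doc_id, value in docs.items()}


def term_query(field_terms, value):
    """Return ids of documents whose stored terms contain `value` verbatim.

    The query value is NOT analyzed, which is the defining trait of a term query.
    """
    return sorted(doc_id for doc_id, terms in field_terms.items() if value in terms)


print(term_query(name_terms, "Some binary blob"))     # no stored term equals the full string
print(term_query(keyword_terms, "Some binary blob"))  # the keyword field stores it whole
print(term_query(name_terms, "some"))                 # a single lowercased token does match
```

The last call shows why a term query on a `text` field only works with values that already look like analyzed tokens.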

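#### **match query (analyzed, full-text match)**

A match query analyzes the query string first. `Some binary blob` is split into three terms, and by default a document matches if it contains any one of them:

1. `some`
2. `binary`
3. `blob`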

```
GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "name": "Some binary blob"
    }
  }
}
```

Two documents are returned. Document 2 matches only on the term `some`, so it scores lower:

```
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.7594807,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.7594807,
        "_source": {
          "name": "Some binary blob"
        }
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "2",
        "_score": 0.62191015,
        "_source": {
          "name": "some apples"
        }
      }
    ]
  }
}
```

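Running the match query against `name.keyword`. A keyword field is not analyzed, so the query string is kept as a single term and only the exact value matches: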

```
GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "name.keyword": "Some binary blob"
    }
  }
}
```

Response:

```
{
  "took": 20,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "name": "Some binary blob"
        }
      }
    ]
  }
}
```
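The match behavior shown above can be sketched in the same simplified model. The assumptions are that `analyze` approximates the standard analyzer, that terms are combined with OR (the default operator), and that relevance scoring is left out; `match_query` is a hypothetical helper.

```python
docs = {
    1: "Some binary blob",
    2: "some apples",
    3: "Ha apples",
    4: "a man",
    5: "many apples",
}


def analyze(text):
    """Approximate the standard analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def keep_whole(text):
    """Keyword fields do not split the input: the whole string is one term."""
    return [text]


def match_query(docs, query, analyzer):
    """Analyze the query with the field's analyzer, then OR the resulting terms."""
    query_terms = set(analyzer(query))
    return sorted(
        doc_id
        for doc_id, value in docs.items()
        if query_terms & set(analyzer(value))
    )


print(match_query(docs, "Some binary blob", analyze))     # match on `name`
print(match_query(docs, "Some binary blob", keep_whole))  # match on `name.keyword`
```

On the analyzed field, documents 1 and 2 share at least one term with the query; on the keyword field only the document with the identical value matches.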

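#### Sorting

**Because `name` is a `text` field, its values are analyzed into an inverted index and fielddata is disabled by default, so it cannot be sorted on. The following request therefore fails:**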

```
GET /my_index/my_type/_search
{
  "query": { "match_all": {} },
  "sort": { "name": { "order": "desc" } }
}
```

Error response:

```
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "my_index",
        "node": "7bJsCFK-QlalolMWGqOoxA",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
    }
  },
  "status": 400
}
```

Sorting on the `name.keyword` field works:

```
GET /my_index/my_type/_search
{
  "query": { "match_all": {} },
  "sort": { "name.keyword": { "order": "asc" } }
}
```

Response:

```
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": null,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "3",
        "_score": null,
        "_source": {
          "name": "Ha apples"
        },
        "sort": [
          "Ha apples"
        ]
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": null,
        "_source": {
          "name": "Some binary blob"
        },
        "sort": [
          "Some binary blob"
        ]
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "4",
        "_score": null,
        "_source": {
          "name": "a man"
        },
        "sort": [
          "a man"
        ]
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "5",
        "_score": null,
        "_source": {
          "name": "many apples"
        },
        "sort": [
          "many apples"
        ]
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "2",
        "_score": null,
        "_source": {
          "name": "some apples"
        },
        "sort": [
          "some apples"
        ]
      }
    ]
  }
}
```
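The order in the response above, with "Ha apples" and "Some binary blob" ahead of "a man", follows from comparing the raw keyword values: uppercase ASCII letters sort before lowercase ones. The sketch below reproduces that with Python's default string ordering, which agrees with byte order for these ASCII values. The case-insensitive variant is only an illustrative assumption about what lowercasing the values before sorting would yield.

```python
names = [
    "Some binary blob",
    "some apples",
    "Ha apples",
    "a man",
    "many apples",
]

# Ascending sort on the raw values, as with `name.keyword`.
by_keyword = sorted(names)

# Hypothetical alternative: compare lowercased values for dictionary-like order.
case_insensitive = sorted(names, key=str.lower)

print(by_keyword)
print(case_insensitive)
```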

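#### Summary

1. For full-text (analyzed) search, query the `text` field; its analyzed inverted index makes term lookups efficient.
2. For exact matches, sorting, or aggregations, use the `keyword` field.
3. Before 5.x, the same effect had to be configured by hand with the `string` type: one analyzed variant of the field and one that was not analyzed.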

