ElasticSearch Learning Path - Day 14

Reposted from: https://blog.csdn.net/chengyuqiang/column/info/18392 (Elasticsearch version 6.3.0)

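Metadata overview
Meta-fields in a mapping describe the document itself rather than its content. They can be roughly grouped into document-attribute metadata, document metadata, index metadata, routing metadata, and custom metadata.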

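_index:
When searching across multiple indices, you sometimes want to restrict the search to particular indices. The _index field makes this easy: you can run term and terms queries against the index name, aggregate on it, and sort on it with scripts.
_index is a virtual field that is never actually added to the Lucene index. It works in term and terms queries (as well as match, query_string, and simple_query_string, which are rewritten into term queries), but it does not support prefix, wildcard, regexp, or fuzzy queries.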
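A small Python sketch of the request body such a search might use: a terms query on _index restricts a multi-index search to one index, and a terms aggregation on _index counts hits per index. The helper name and the index names index_1/index_2 are hypothetical; the body would be sent with something like GET index_1,index_2/_search.

```python
import json


def index_filtered_search(indices):
    """Build a search body that filters and aggregates on the _index meta-field."""
    return {
        "query": {
            # term/terms queries on _index are supported; prefix, wildcard,
            # regexp and fuzzy queries are not (per the text above, ES 6.3)
            "terms": {"_index": list(indices)}
        },
        "aggs": {
            # one bucket per index name among the matching hits
            "per_index": {"terms": {"field": "_index", "size": 10}}
        },
    }


body = index_filtered_search(["index_1"])
print(json.dumps(body, indent=2))
```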

_type:
The name of the document's mapping type. It is indexed automatically and can be used in queries, aggregations, sorting, and scripts.

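_id:
The document's ID, supplied at index time. It is not indexed as an ordinary field, but it can be queried via _id and accessed in scripts; it cannot be used in aggregations or sorting.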

PUT my_index
PUT my_index/my_type/1
{
  "text":"Document with ID 1"
}
PUT my_index/my_type/2
{
  "text":"Document with ID 2"
}

Query by _id:

GET my_index/_search
{
  "query": {
    "terms": {
      "_id": ["1","2"]
    }
  }
}

{
  "took": 54,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "2",
        "_score": 1,
        "_source": {
          "text": "Document with ID 2"
        }
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 1,
        "_source": {
          "text": "Document with ID 1"
        }
      }
    ]
  }
}

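_source:
Contains the original JSON document body that was passed at index time. The _source field itself is not indexed (so it is not searchable), but it is stored so that it can be returned by fetch requests such as get and search.
_source is enabled by default, which means the original values of every document are stored.
If documents contain very large fields (the full text of a novel, for example), or if queries only need to search the content and return document IDs while the original text is viewed through some other channel, you may not need to keep _source. Disabling the _source meta-field makes Elasticsearch keep only the inverted index, without the original field values. Be aware that features which read from _source, such as the update API and reindex, then stop working.
(1) Disabling _source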

DELETE my_index
PUT my_index
{
  "mappings": {
    "my_type": {
      "_source": {
        "enabled": false
      }
    }
  }
}
PUT my_index/my_type/1
{
  "text":"This is a document"
}

Retrieve the document:

GET my_index/my_type/1

The response contains no _source data:

{
  "_index": "my_index",
  "_type": "my_type",
  "_id": "1",
  "_version": 1,
  "found": true
}

[Example] Including or excluding specific fields

DELETE my_index
PUT my_index
{
  "mappings": {
    "blog":{
      "_source": {
        "includes":["title","url"],
        "excludes":["content"]
      },
      "properties": {
        "title":{
          "type": "text"
        },
        "content":{
          "type": "text"
        },
        "url":{
          "type": "text"
        }
      }
    }
  }
}
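PUT my_index/blog/1
{
  "title":"yum mirror",
  "content":"Switching CentOS to a domestic yum mirror",
  "url":"http://url.cn/53788351"
}
PUT my_index/blog/2
{
  "title":"Ambari",
  "content":"Compiling Ambari 2.4 from source on CentOS 7.x",
  "url":"http://url.cn/53844169"
}

Query it. The content field was excluded from the stored _source, so it is missing from the response, although it is still indexed and searchable:

GET my_index/blog/1

{
  "_index": "my_index",
  "_type": "blog",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "title": "yum mirror",
    "url": "http://url.cn/53788351"
  }
}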
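To make the include/exclude rule concrete, here is a simplified Python model of what ends up in the stored _source: a top-level field is kept if it matches includes (when includes is non-empty) and does not match excludes. This is a sketch under stated assumptions: it handles only flat, top-level field names with simple * wildcards via fnmatch, whereas Elasticsearch's own filtering also understands dotted paths into objects. The helper name filter_source is hypothetical.

```python
from fnmatch import fnmatchcase


def filter_source(source, includes=(), excludes=()):
    """Approximate _source includes/excludes for a flat document."""
    kept = {}
    for field, value in source.items():
        # an empty includes list means "keep every field"
        if includes and not any(fnmatchcase(field, p) for p in includes):
            continue
        # a field matching excludes is dropped even if it also matches includes
        if any(fnmatchcase(field, p) for p in excludes):
            continue
        kept[field] = value
    return kept


doc = {
    "title": "yum mirror",
    "content": "Switching CentOS to a domestic yum mirror",
    "url": "http://url.cn/53788351",
}
stored = filter_source(doc, includes=["title", "url"], excludes=["content"])
print(stored)
```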

_size:
The size of the whole _source field, in bytes.
It requires the mapper-size plugin. Install it with bin/elasticsearch-plugin install mapper-size, then restart Elasticsearch; the plugin only takes effect after the restart.

DELETE my_index
PUT my_index
{
  "mappings": {
    "my_type": {
      "_size": {
        "enabled": true
      }
    }
  }
}
PUT my_index/my_type/1
{
  "text": "This is a document"
}

PUT my_index/my_type/2
{
  "text": "This is another document"
}

When searching, you can filter documents on the _size meta-field:

GET my_index/_search
{
  "query": {
    "range": {
      "_size": {
        "gt": 10
      }
    }
  }
}

{
  "took": 34,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "2",
        "_score": 1,
        "_source": {
          "text": "This is another document"
        }
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 1,
        "_source": {
          "text": "This is a document"
        }
      }
    ]
  }
}
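The range query above returned both documents because each _source is well over 10 bytes. Below is a rough Python model of the number being compared; note that _size counts bytes, not characters, so non-ASCII text weighs more than it looks. The sketch assumes a compact UTF-8 JSON serialization, while the plugin measures the _source exactly as it was sent, so whitespace in the original request changes the real value. The helper name approx_source_size is hypothetical.

```python
import json


def approx_source_size(source):
    """Approximate _size as the UTF-8 byte length of compact JSON."""
    compact = json.dumps(source, separators=(",", ":"), ensure_ascii=False)
    return len(compact.encode("utf-8"))


docs = {
    "1": {"text": "This is a document"},
    "2": {"text": "This is another document"},
}
# same condition as {"range": {"_size": {"gt": 10}}}
matches = sorted(doc_id for doc_id, src in docs.items() if approx_source_size(src) > 10)
print(matches)
```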

Tip: the mapper-size plugin can be removed with bin/elasticsearch-plugin remove mapper-size.

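_all:
_all is a special catch-all field that concatenates the values of all other fields into a single string, separated by spaces. It is analyzed and indexed, but not stored. It is useful when you want to find documents containing a keyword without specifying which field to search.
According to the official documentation, _all is disabled by default and can be enabled with "_all": {"enabled": true}. Testing this: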

PUT myindex
{
  "mappings": {
    "mytype": {
      "_all": {"enabled": true},
      "properties": {
        "title": { 
          "type": "text"
        },
        "content": { 
          "type": "text"
        }
      }
    }
  }
}

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "Failed to parse mapping [mytype]: Enabling [_all] is disabled in 6.0. As a replacement, you can use [copy_to] on mapping fields to create your own catch all field."
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Failed to parse mapping [mytype]: Enabling [_all] is disabled in 6.0. As a replacement, you can use [copy_to] on mapping fields to create your own catch all field.",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Enabling [_all] is disabled in 6.0. As a replacement, you can use [copy_to] on mapping fields to create your own catch all field."
    }
  },
  "status": 400
}

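As the error message indicates, enabling _all is not allowed in 6.x; instead, use copy_to on mapping fields to build your own catch-all field.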
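A sketch of that replacement, written as a Python helper that builds the 6.x mapping body: each listed field gets copy_to pointing at one extra text field, which then indexes the values of all of them. The helper and the target name full_text are hypothetical; the resulting body would be sent with PUT myindex, and searches would then target full_text (for example with a match query) instead of _all.

```python
import json


def catch_all_mapping(type_name, fields, target="full_text"):
    """Build a 6.x mapping that copies each field into a custom catch-all field."""
    properties = {name: {"type": "text", "copy_to": target} for name in fields}
    # the target is an ordinary text field; copied values are indexed
    # but do not appear in the document's _source
    properties[target] = {"type": "text"}
    return {"mappings": {type_name: {"properties": properties}}}


mapping = catch_all_mapping("mytype", ["title", "content"])
print(json.dumps(mapping, indent=2))
```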

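_field_names:
The _field_names field indexes the name of every field in the document that contains any value other than null. The exists query uses this field to find documents that do or do not have a non-null value for a particular field.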

DELETE my_index
PUT my_index
PUT my_index/my_type/1
{
  "title": "This is a document"
}
PUT my_index/my_type/2?refresh=true
{
  "title": "This is another document",
  "body": "This document has a body"
}

Query:

GET my_index/_search
{
  "query": {
    "terms": {
      "_field_names": ["body"]
    }
  }
}

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "2",
        "_score": 1,
        "_source": {
          "title": "This is another document",
          "body": "This document has a body"
        }
      }
    ]
  }
}
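The match above can be mimicked in Python to show what _field_names records. This is a simplified model that assumes flat documents (no nested objects): a field is listed when it holds a non-null value, and an array counts only if it has at least one non-null element. The helpers field_names and exists are hypothetical, and document 3 is a hypothetical extra added only to show that an explicit null is treated like a missing field.

```python
def field_names(source):
    """Names of top-level fields holding a non-null value (flat documents only)."""
    names = set()
    for key, value in source.items():
        if isinstance(value, list):
            # an array counts only if it has at least one non-null element
            if any(v is not None for v in value):
                names.add(key)
        elif value is not None:
            names.add(key)
    return names


docs = {
    "1": {"title": "This is a document"},
    "2": {"title": "This is another document", "body": "This document has a body"},
    "3": {"title": "Hypothetical document", "body": None},  # explicit null
}


def exists(field):
    """Mimic an exists query: ids of documents whose _field_names contain field."""
    return sorted(doc_id for doc_id, src in docs.items() if field in field_names(src))


print(exists("body"))
```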

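_routing:
A document is routed to a particular shard in an index using this formula:
shard_num = hash(_routing) % num_primary_shards
By default, _routing is the document's _id.
Custom routing can be implemented by specifying a routing value for each document.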

DELETE my_index
PUT my_index/my_type/1?routing=user1&refresh=true
{
  "title": "This is a document"
}

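When getting the document, the same routing value must be supplied so the request reaches the right shard: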

GET my_index/my_type/1?routing=user1

{
  "_index": "my_index",
  "_type": "my_type",
  "_id": "1",
  "_version": 1,
  "_routing": "user1",
  "found": true,
  "_source": {
    "title": "This is a document"
  }
}

The value of the _routing field is also accessible in queries:

GET my_index/_search
{
  "query": {
    "terms": {
      "_routing":["user1"]
    }
  }
}

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 1,
        "_routing": "user1",
        "_source": {
          "title": "This is a document"
        }
      }
    ]
  }
}
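The routing formula from the start of this section, sketched in Python. Elasticsearch uses its own Murmur3-based hash; this sketch substitutes zlib.crc32 purely for illustration, so the shard numbers it produces will not match a real cluster. The helper names shard_for and route are hypothetical; 5 primary shards matches the "total": 5 in the responses above.

```python
import zlib


def shard_for(routing, num_primary_shards):
    """shard_num = hash(_routing) % num_primary_shards, with CRC32 as a stand-in hash."""
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def route(doc_id, num_primary_shards=5, routing=None):
    """Pick a shard; without an explicit routing value, the _id is used."""
    key = doc_id if routing is None else routing
    return shard_for(key, num_primary_shards)


# documents sharing a routing value always land on the same shard
print(route("1", routing="user1"), route("2", routing="user1"))
```

Because the result depends on num_primary_shards, the same routing value would map to a different shard if that count changed, which is one reason the number of primary shards is fixed when an index is created.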
