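Introduction to Indexes
- Indexes exist to speed up queries. In a real project, create an index on each field that needs fast lookups.
- A database index works like the index of a book: instead of paging through the whole book, the database looks the value up in the index, which can make lookups several orders of magnitude faster. Once the entry is found in the index, the database jumps straight to the location of the target document.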
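The book-index analogy can be made concrete with a toy model. The sketch below is only an illustration under simplifying assumptions: it keeps documents in a plain array and uses a sorted list with binary search as the "index", whereas MongoDB itself uses B-tree indexes. All function names (makeDocs, scanFind, buildIndex, indexFind) are made up for this example, and it is meant to run in Node.js rather than against a real database.

```javascript
// Toy model only: MongoDB uses B-tree indexes, not a sorted array.
function makeDocs(n) {
  var docs = [];
  for (var i = 0; i < n; i++) {
    docs.push({ name: "user" + i, age: i });
  }
  return docs;
}

// Full scan: look at every document, similar in spirit to COLLSCAN.
function scanFind(docs, field, value) {
  var examined = 0, matches = [];
  for (var i = 0; i < docs.length; i++) {
    examined++;
    if (docs[i][field] === value) matches.push(docs[i]);
  }
  return { matches: matches, docsExamined: examined };
}

// Build a sorted list of {key, pos} entries for one field.
function buildIndex(docs, field) {
  var entries = docs.map(function (d, pos) {
    return { key: d[field], pos: pos };
  });
  entries.sort(function (a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
  return entries;
}

// Binary-search the index, then fetch only the matching documents.
function indexFind(docs, index, value) {
  var lo = 0, hi = index.length, steps = 0;
  while (lo < hi) {               // find the first entry with key >= value
    var mid = (lo + hi) >> 1;
    steps++;
    if (index[mid].key < value) lo = mid + 1; else hi = mid;
  }
  var matches = [];
  while (lo < index.length && index[lo].key === value) {
    matches.push(docs[index[lo].pos]);
    lo++;
  }
  return { matches: matches, docsExamined: matches.length, searchSteps: steps };
}

var docs = makeDocs(100000);
var ageIndex = buildIndex(docs, "age");
var scan = scanFind(docs, "age", 500);
var viaIndex = indexFind(docs, ageIndex, 500);
console.log(scan.docsExamined, viaIndex.docsExamined, viaIndex.searchSteps);
```

The scan touches every document, while the indexed lookup needs only a handful of comparisons before fetching the single match. That gap is what grows into "orders of magnitude" on large collections.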
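Environment Setup
- Since the point of an index is faster retrieval, this article creates a new database mydb2 with a collection c1, inserts five million documents into it, and measures how long it takes to find one of them. It then builds an index and runs the lookup again to compare the timings.
- Start the MongoDB server first: mongod --dbpath=D:\MongoDB\Data
- Connect with the client: mongo. If these steps are unfamiliar, see the article "MongoDB 下载_安装_配置 及 启动与连接" (MongoDB download, installation, configuration, startup and connection).
- As shown below, create the database mydb2 and insert 5 million documents into collection c1. On a 64-bit Windows 10 machine with 8 GB of RAM, the insert loop ran for about 20 minutes.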
> show dbs
admin 0.000GB
config 0.000GB
local 0.000GB
mydb1 0.000GB
> use mydb2
switched to db mydb2
> for(var i=0;i<5000000;i++){
... db.c1.insert({name:"华安"+i,age:i});
... }
WriteResult({ "nInserted" : 1 })
>
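The loop above sends one insert per document, which is one reason the load takes so long. A common alternative is to send documents in batches with insertMany (available since MongoDB 3.2). The following is a minimal sketch, not a benchmarked recipe: the helper names buildBatch and loadInBatches and the batch size of 10000 are assumptions made for this example. The helper takes the collection as a parameter, so it only touches a database when you call it with one.

```javascript
// Build documents shaped like the ones in this article: {name: "华安<i>", age: i}.
function buildBatch(start, size) {
  var docs = [];
  for (var i = start; i < start + size; i++) {
    docs.push({ name: "华安" + i, age: i });
  }
  return docs;
}

// Insert `total` documents into `coll` in batches of `batchSize`.
// `coll` is any object with an insertMany(arrayOfDocs) method, e.g. db.c1.
function loadInBatches(coll, total, batchSize) {
  var inserted = 0;
  for (var start = 0; start < total; start += batchSize) {
    var size = Math.min(batchSize, total - start);
    coll.insertMany(buildBatch(start, size));
    inserted += size;
  }
  return inserted;
}

// Hypothetical usage in the mongo shell:
//   loadInBatches(db.c1, 5000000, 10000);
```

How much time this saves depends on the machine and MongoDB version, so measure it on your own setup before relying on it.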
- As shown below, you can open a second cmd window, connect to MongoDB, and watch the insert progress in real time.
> use mydb2
switched to db mydb2
> show tables
c1
> db.c1.find().count()
308624
> db.c1.find().count()
311139
> db.c1.find().count()
314849
> db.c1.find().count()
411957
> db.c1.find().count()
432314
> db.c1.find().count()
505586
> db.c1.find().count()
625517
> db.c1.find().count()
632710
> db.c1.find().count()
1817546
> db.c1.find().count()
4301947
> db.c1.find().count()
5000000
> db.c1.find().count()
5000000
> db.c1.find().count()
5000000
>
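- As shown below, with no index on the queried field the query scans all 5 million documents in c1. Finding the document whose name equals "华安500" takes about 2346 milliseconds.
- The Index Operations section below creates an index on the age field and repeats the lookup for comparison.
- explain("executionStats") returns detailed information about the query; it is covered in more detail later.
- executionTimeMillis is the time the query took, in milliseconds.
- totalDocsExamined is the total number of documents scanned.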
> db.c1.find({age:500})
{ "_id" : ObjectId("5b98716247640bc808f6ef2d"), "name" : "华安500", "age" : 500 }
> db.c1.find({name:"华安500"}).explain("executionStats");
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "mydb2.c1",
"indexFilterSet" : false,
"parsedQuery" : {
"name" : {
"$eq" : "华安500"
}
},
"winningPlan" : {
"stage" : "COLLSCAN",
"filter" : {
"name" : {
"$eq" : "华安500"
}
},
"direction" : "forward"
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1,
"executionTimeMillis" : 2346,
"totalKeysExamined" : 0,
"totalDocsExamined" : 5000000,
"executionStages" : {
"stage" : "COLLSCAN",
"filter" : {
"name" : {
"$eq" : "华安500"
}
},
"nReturned" : 1,
"executionTimeMillisEstimate" : 2113,
"works" : 5000002,
"advanced" : 1,
"needTime" : 5000000,
"needYield" : 0,
"saveState" : 39154,
"restoreState" : 39154,
"isEOF" : 1,
"invalidates" : 0,
"direction" : "forward",
"docsExamined" : 5000000
}
},
"serverInfo" : {
"host" : "SC-201707281232",
"port" : 27017,
"version" : "4.0.2-rc0",
"gitVersion" : "fc1573ba18aee42f97a3bb13b67af7d837826b47"
},
"ok" : 1
}
>
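When reading explain output, the first thing to check is usually whether the winning plan is a full collection scan. The sketch below walks the winningPlan object and collects the stage names. The helper names planStages and usesCollectionScan are made up for this example, and it assumes a simple plan in which each stage has at most one inputStage (plans with several child stages are not handled). It works on a plain object, so it can be tried in Node.js without a server.

```javascript
// Walk winningPlan down through inputStage and collect the stage names.
function planStages(explainResult) {
  var stages = [];
  var stage = explainResult.queryPlanner.winningPlan;
  while (stage) {
    stages.push(stage.stage);
    stage = stage.inputStage;   // undefined at the leaf, which ends the loop
  }
  return stages;
}

// True when any stage of the winning plan is a full collection scan.
function usesCollectionScan(explainResult) {
  return planStages(explainResult).indexOf("COLLSCAN") !== -1;
}

// Trimmed copy of the plan printed above.
var sample = {
  queryPlanner: { winningPlan: { stage: "COLLSCAN", direction: "forward" } }
};
console.log(planStages(sample).join(" > "), usesCollectionScan(sample));
```

In the mongo shell the same helpers could be applied to the object returned by explain(), for example usesCollectionScan(db.c1.find({name:"华安500"}).explain()).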
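Index Operations
- Create a regular index: db.collection.ensureIndex({key:1}). ensureIndex has been a deprecated alias of createIndex since MongoDB 3.0, so db.collection.createIndex({key:1}) is the preferred form.
- Create a unique index: db.collection.ensureIndex({key:1},{unique:true})
- View index-related information: db.collection.stats()
- Check whether a query uses an index: db.collection.find({key:value}).explain()
- Drop an index: db.collection.dropIndex({key:1})
- Dropping a collection also drops all of its indexes.
Creating a Regular Index
- Create a regular index: db.collection.ensureIndex({key:1})
- As shown below, a regular index is created on the age field (note: this article uses MongoDB 4.0.2).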
> db.c1.ensureIndex({age:1})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
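- After the client runs the index-creation command, the MongoDB server's cmd window prints progress information. The log below shows the index build over 5 million documents finishing in 14 seconds.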
2018-09-12T10:34:44.562+0800 I INDEX [conn1] building index using bulk method; build may temporarily use up to 500 megabytes of RAM
2018-09-12T10:34:47.000+0800 I - [conn1] Index Build: 1457800/5000000 29%
2018-09-12T10:34:50.000+0800 I - [conn1] Index Build: 3314600/5000000 66%
2018-09-12T10:34:58.882+0800 I INDEX [conn1] build index done. scanned 5000000 total records. 14 secs
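- As shown below, once age is indexed, a query on age returns essentially instantly.
- explain("executionStats") shows:
- executionTimeMillis : 0, meaning the query finished in under one millisecond (the value is in milliseconds, not seconds).
- totalDocsExamined : 1, meaning only one document was examined, because the index leads directly to it.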
> db.c1.find({age:1000});
{ "_id" : ObjectId("5b98716347640bc808f6f121"), "name" : "华安1000", "age" : 1000 }
> db.c1.find({age:1000}).explain("executionStats");
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "mydb2.c1",
"indexFilterSet" : false,
"parsedQuery" : {
"age" : {
"$eq" : 1000
}
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"age" : 1
},
"indexName" : "age_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"age" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"age" : [
"[1000.0, 1000.0]"
]
}
}
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1,
"executionTimeMillis" : 0,
"totalKeysExamined" : 1,
"totalDocsExamined" : 1,
"executionStages" : {
"stage" : "FETCH",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 2,
"advanced" : 1,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 1,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 2,
"advanced" : 1,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"age" : 1
},
"indexName" : "age_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"age" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"age" : [
"[1000.0, 1000.0]"
]
},
"keysExamined" : 1,
"seeks" : 1,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0
}
}
},
"serverInfo" : {
"host" : "SC-201707281232",
"port" : 27017,
"version" : "4.0.2-rc0",
"gitVersion" : "fc1573ba18aee42f97a3bb13b67af7d837826b47"
},
"ok" : 1
}
>
Viewing Index Information
- View index-related information: db.collection.stats()
- Check whether a query uses an index: db.collection.find({key:value}).explain()
> db.c1.stats()
{
"ns" : "mydb2.c1",
"size" : 293888890,
"count" : 5000000,
"avgObjSize" : 58,
"storageSize" : 93253632,
"capped" : false,
"wiredTiger" : {
"metadata" : {
"formatVersion" : 1
},
"creationString" : "access_pattern_hint=none,allocation_size=4KB,app_metadata=(formatVersion=1),assert=(commit_timestamp=none,read_timestamp=none),block_allocation=best,block_compressor=snappy,cache_resident=false,checksum=on,colgroups=,collator=,columns=,dictionary=0,encryption=(keyid=,name=),exclusive=false,extractor=,format=btree,huffman_key=,huffman_value=,ignore_in_memory_cache_size=false,immutable=false,internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=64MB,log=(enabled=true),lsm=(auto_throttle=true,bloom=true,bloom_bit_count=16,bloom_config=,bloom_hash_count=8,bloom_oldest=false,chunk_count_limit=0,chunk_max=5GB,chunk_size=10MB,merge_custom=(prefix=,start_generation=0,suffix=),merge_max=15,merge_min=0),memory_page_image_max=0,memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=false,prefix_compression_min=4,source=,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,type=file,value_format=u",
"type" : "file",
"uri" : "statistics:table:collection-0-7507112370645922167",
"LSM" : {
"bloom filter false positives" : 0,
"bloom filter hits" : 0,
"bloom filter misses" : 0,
"bloom filter pages evicted from cache" : 0,
"bloom filter pages read into cache" : 0,
"bloom filters in the LSM tree" : 0,
"chunks in the LSM tree" : 0,
"highest merge generation in the LSM tree" : 0,
"queries that could have benefited from a Bloom filter that did not exist" : 0,
"sleep for LSM checkpoint throttle" : 0,
"sleep for LSM merge throttle" : 0,
"total size of bloom filters" : 0
},
"block-manager" : {
"allocations requiring file extension" : 11465,
"blocks allocated" : 11560,
"blocks freed" : 73,
"checkpoint size" : 93204480,
"file allocation unit size" : 4096,
"file bytes available for reuse" : 32768,
"file magic number" : 120897,
"file major version number" : 1,
"file size in bytes" : 93253632,
"minor version number" : 0
},
"btree" : {
"btree checkpoint generation" : 100,
"column-store fixed-size leaf pages" : 0,
"column-store internal pages" : 0,
"column-store variable-size RLE encoded values" : 0,
"column-store variable-size deleted values" : 0,
"column-store variable-size leaf pages" : 0,
"fixed-record size" : 0,
"maximum internal page key size" : 368,
"maximum internal page size" : 4096,
"maximum leaf page key size" : 2867,
"maximum leaf page size" : 32768,
"maximum leaf page value size" : 67108864,
"maximum tree depth" : 3,
"number of key/value pairs" : 0,
"overflow pages" : 0,
"pages rewritten by compaction" : 0,
"row-store internal pages" : 0,
"row-store leaf pages" : 0
},
"cache" : {
"bytes currently in the cache" : 683932272,
"bytes read into cache" : 0,
"bytes written from cache" : 325368864,
"checkpoint blocked page eviction" : 0,
"data source pages selected for eviction unable to be evicted" : 0,
"eviction walk passes of a file" : 0,
"eviction walk target pages histogram - 0-9" : 0,
"eviction walk target pages histogram - 10-31" : 0,
"eviction walk target pages histogram - 128 and higher" : 0,
"eviction walk target pages histogram - 32-63" : 0,
"eviction walk target pages histogram - 64-128" : 0,
"eviction walks abandoned" : 0,
"eviction walks gave up because they restarted their walk twice" : 0,
"eviction walks gave up because they saw too many pages and found no candidates" : 0,
"eviction walks gave up because they saw too many pages and found too few candidates" : 0,
"eviction walks reached end of tree" : 0,
"eviction walks started from root of tree" : 0,
"eviction walks started from saved location in tree" : 0,
"hazard pointer blocked page eviction" : 0,
"in-memory page passed criteria to be split" : 150,
"in-memory page splits" : 75,
"internal pages evicted" : 0,
"internal pages split during eviction" : 0,
"leaf pages split during eviction" : 0,
"modified pages evicted" : 0,
"overflow pages read into cache" : 0,
"page split during eviction deepened the tree" : 0,
"page written requiring lookaside records" : 0,
"pages read into cache" : 0,
"pages read into cache after truncate" : 1,
"pages read into cache after truncate in prepare state" : 0,
"pages read into cache requiring lookaside entries" : 0,
"pages requested from the cache" : 5314066,
"pages seen by eviction walk" : 0,
"pages written from cache" : 11507,
"pages written requiring in-memory restoration" : 0,
"tracked dirty bytes in the cache" : 0,
"unmodified pages evicted" : 0
},
"cache_walk" : {
"Average difference between current eviction generation when the page was last considered" : 0,
"Average on-disk page image size seen" : 0,
"Average time in cache for pages that have been visited by the eviction server" : 0,
"Average time in cache for pages that have not been visited by the eviction server" : 0,
"Clean pages currently in cache" : 0,
"Current eviction generation" : 0,
"Dirty pages currently in cache" : 0,
"Entries in the root page" : 0,
"Internal pages currently in cache" : 0,
"Leaf pages currently in cache" : 0,
"Maximum difference between current eviction generation when the page was last considered" : 0,
"Maximum page size seen" : 0,
"Minimum on-disk page image size seen" : 0,
"Number of pages never visited by eviction server" : 0,
"On-disk page image sizes smaller than a single allocation unit" : 0,
"Pages created in memory and never written" : 0,
"Pages currently queued for eviction" : 0,
"Pages that could not be queued for eviction" : 0,
"Refs skipped during cache traversal" : 0,
"Size of the root page" : 0,
"Total number of pages currently in cache" : 0
},
"compression" : {
"compressed pages read" : 0,
"compressed pages written" : 11393,
"page written failed to compress" : 0,
"page written was too small to compress" : 114,
"raw compression call failed, additional data available" : 0,
"raw compression call failed, no additional data available" : 0,
"raw compression call succeeded" : 0
},
"cursor" : {
"bulk-loaded cursor-insert calls" : 0,
"create calls" : 3,
"cursor operation restarted" : 0,
"cursor-insert key and value bytes inserted" : 313806525,
"cursor-remove key bytes removed" : 0,
"cursor-update value bytes updated" : 0,
"cursors cached on close" : 0,
"cursors reused from cache" : 5000013,
"insert calls" : 5000000,
"modify calls" : 0,
"next calls" : 40000008,
"prev calls" : 1,
"remove calls" : 0,
"reserve calls" : 0,
"reset calls" : 10313408,
"search calls" : 8,
"search near calls" : 313375,
"truncate calls" : 0,
"update calls" : 0
},
"reconciliation" : {
"dictionary matches" : 0,
"fast-path pages deleted" : 0,
"internal page key bytes discarded using suffix compression" : 26736,
"internal page multi-block writes" : 27,
"internal-page overflow keys" : 0,
"leaf page key bytes discarded using prefix compression" : 0,
"leaf page multi-block writes" : 102,
"leaf-page overflow keys" : 0,
"maximum blocks required for a page" : 1,
"overflow values written" : 0,
"page checksum matches" : 2599,
"page reconciliation calls" : 156,
"page reconciliation calls for eviction" : 0,
"pages deleted" : 0
},
"session" : {
"cached cursor count" : 3,
"object compaction" : 0,
"open cursor count" : 0
},
"transaction" : {
"update conflicts" : 0
}
},
"nindexes" : 2,
"totalIndexSize" : 117092352,
"indexSizes" : {
"_id_" : 50466816,
"age_1" : 66625536
},
"ok" : 1
}
>
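- As shown above, the "indexSizes" section at the end lists two indexes, _id_ and age_1.
- _id_: the index on the primary key field (_id), which MongoDB maintains itself and creates by default.
- age_1: the index just created on the age field. The default index name is the field name, an underscore, and the sort direction (1 for ascending).
Dropping an Index
- Drop an index: db.collection.dropIndex({key:1}). As shown below, after the index on age is dropped, stats() no longer lists age_1.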
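The default naming rule can be written down as a small function: each indexed field is joined to its direction with an underscore, and the pairs are joined with underscores in turn. The function name defaultIndexName is made up for this sketch; it mirrors the rule and does not call MongoDB.

```javascript
// Derive the default index name from a key specification such as {age: 1}.
function defaultIndexName(keySpec) {
  return Object.keys(keySpec).map(function (field) {
    return field + "_" + keySpec[field];
  }).join("_");
}

console.log(defaultIndexName({ age: 1 }));
console.log(defaultIndexName({ name: 1, age: -1 }));
```

Knowing the name is useful because dropIndex also accepts the index name as a string, for example db.c1.dropIndex("age_1").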
> db.c1.dropIndex({age:1})
{ "nIndexesWas" : 2, "ok" : 1 }
> db.c1.stats()
{
"ns" : "mydb2.c1",
"size" : 293888890,
"count" : 5000000,
"avgObjSize" : 58,
"storageSize" : 93253632,
"capped" : false,
"wiredTiger" : {
..............
},
"nindexes" : 1,
"totalIndexSize" : 50466816,
"indexSizes" : {
"_id_" : 50466816
},
"ok" : 1
}
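Creating a Unique Index
- Create a unique index: db.collection.ensureIndex({key:1},{unique:true})
- With a unique index on a field, no two documents may hold the same value for that field, much like a unique constraint in MySQL.
- The only difference from a regular index is whether indexed values may repeat: a regular index allows duplicates, a unique index does not.
- As shown below, a unique index is created on the name field.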
> db.c1.ensureIndex({name:1},{unique:true})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
>
- As before, the build status appears in the MongoDB server's cmd window.
2018-09-12T11:31:08.471+0800 I INDEX [conn1] building index using bulk method; build may temporarily use up to 500 megabytes of RAM
2018-09-12T11:31:11.000+0800 I - [conn1] Index Build: 1377500/5000000 27%
2018-09-12T11:31:14.000+0800 I - [conn1] Index Build: 3082200/5000000 61%
2018-09-12T11:31:17.000+0800 I - [conn1] Index Build: 4731000/5000000 94%
2018-09-12T11:31:24.570+0800 I INDEX [conn1] build index done. scanned 5000000 total records. 16 secs
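- As shown below, a query on name is as fast as the earlier query on age with its regular index: both return essentially instantly, with no noticeable difference.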
> db.c1.find({name:"华安45000"})
{ "_id" : ObjectId("5b98717147640bc808f79d01"), "name" : "华安45000", "age" : 45000 }
> db.c1.find({name:"华安45000"}).explain("executionStats")
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "mydb2.c1",
"indexFilterSet" : false,
"parsedQuery" : {
"name" : {
"$eq" : "华安45000"
}
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"name" : 1
},
"indexName" : "name_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"name" : [ ]
},
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"name" : [
"[\"华安45000\", \"华安45000\"]"
]
}
}
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1,
"executionTimeMillis" : 0,
"totalKeysExamined" : 1,
"totalDocsExamined" : 1,
"executionStages" : {
"stage" : "FETCH",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 2,
"advanced" : 1,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 1,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 2,
"advanced" : 1,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"name" : 1
},
"indexName" : "name_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"name" : [ ]
},
"isUnique" : true,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"name" : [
"[\"华安45000\", \"华安45000\"]"
]
},
"keysExamined" : 1,
"seeks" : 1,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0
}
}
},
"serverInfo" : {
"host" : "SC-201707281232",
"port" : 27017,
"version" : "4.0.2-rc0",
"gitVersion" : "fc1573ba18aee42f97a3bb13b67af7d837826b47"
},
"ok" : 1
}
>
- As shown below, once name has a unique index, inserting a new document whose name duplicates an existing value fails.
> db.c1.find({name:"华安45000"})
{ "_id" : ObjectId("5b98717147640bc808f79d01"), "name" : "华安45000", "age" : 45000 }
> db.c1.insert({name:"华安45000",age:45000})
WriteResult({
"nInserted" : 0,
"writeError" : {
"code" : 11000,
"errmsg" : "E11000 duplicate key error collection: mydb2.c1 index: name_1 dup key: { : \"华安45000\" }"
}
})
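The behaviour of a unique index on insert can be mimicked with a toy in-memory collection. This is only a sketch of the idea, not how MongoDB implements it. The UniqueCollection constructor is made up for this example, and the result objects it returns only imitate the nInserted and writeError.code fields of the output above.

```javascript
// Toy collection that enforces uniqueness on one field.
function UniqueCollection(field) {
  this.field = field;
  this.docs = [];
  this.seen = new Set();      // values already present in the "index"
}

// Reject the document if its value for the unique field already exists.
UniqueCollection.prototype.insert = function (doc) {
  var key = doc[this.field];
  if (this.seen.has(key)) {
    // 11000 is the duplicate-key error code shown in the output above.
    return { nInserted: 0, writeError: { code: 11000 } };
  }
  this.seen.add(key);
  this.docs.push(doc);
  return { nInserted: 1 };
};

var c1 = new UniqueCollection("name");
console.log(c1.insert({ name: "华安45000", age: 45000 }).nInserted);
console.log(c1.insert({ name: "华安45000", age: 45000 }).nInserted);
```

The second insert is rejected because the value is already recorded. With a real unique index the check happens inside the server, so application code only needs to look for error code 11000 in the write result.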
The explain Tool
- explain is a very useful tool that reports a great deal about how a query runs.
- As shown below, calling explain with no argument returns the following:
> db.c1.find().explain();
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "mydb2.c1",
"indexFilterSet" : false,
"parsedQuery" : {
},
"winningPlan" : {
"stage" : "COLLSCAN",
"direction" : "forward"
},
"rejectedPlans" : [ ]
},
"serverInfo" : {
"host" : "SC-201707281232",
"port" : 27017,
"version" : "4.0.2-rc0",
"gitVersion" : "fc1573ba18aee42f97a3bb13b67af7d837826b47"
},
"ok" : 1
}
>
- It is usually better to pass the executionStats argument, i.e. .explain("executionStats"), which returns more complete information.
> db.c1.find().explain("executionStats");
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "mydb2.c1",
"indexFilterSet" : false,
"parsedQuery" : {
},
"winningPlan" : {
"stage" : "COLLSCAN",
"direction" : "forward"
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 5000000,
"executionTimeMillis" : 1381,
"totalKeysExamined" : 0,
"totalDocsExamined" : 5000000,
"executionStages" : {
"stage" : "COLLSCAN",
"nReturned" : 5000000,
"executionTimeMillisEstimate" : 952,
"works" : 5000002,
"advanced" : 5000000,
"needTime" : 1,
"needYield" : 0,
"saveState" : 39104,
"restoreState" : 39104,
"isEOF" : 1,
"invalidates" : 0,
"direction" : "forward",
"docsExamined" : 5000000
}
},
"serverInfo" : {
"host" : "SC-201707281232",
"port" : 27017,
"version" : "4.0.2-rc0",
"gitVersion" : "fc1573ba18aee42f97a3bb13b67af7d837826b47"
},
"ok" : 1
}
>
- Selected fields of executionStats:
- "executionSuccess" : true (whether execution succeeded; true means success)
- "nReturned" : 5000000 (number of documents returned, here 5 million)
- "executionTimeMillis" : 1381 (time taken by the query, here 1381 milliseconds, about 1.4 seconds)
- "totalDocsExamined" : 5000000 (number of documents scanned, here 5 million)
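One way to read these fields together is to compare how many documents were examined with how many were returned. A ratio near 1 means little wasted work, and a very large ratio suggests a missing index. The helper below is a sketch under that rule of thumb. The name docsExaminedPerReturned is made up for this example, and it operates on a plain object with the same field names as executionStats.

```javascript
// Documents examined per document returned; lower is better.
function docsExaminedPerReturned(stats) {
  if (stats.nReturned === 0) {
    // Nothing returned: any examined document was wasted work.
    return stats.totalDocsExamined === 0 ? 0 : Infinity;
  }
  return stats.totalDocsExamined / stats.nReturned;
}

// Values taken from the outputs in this article.
var unindexedByName = { nReturned: 1, totalDocsExamined: 5000000 };
var indexedByAge = { nReturned: 1, totalDocsExamined: 1 };
var findAll = { nReturned: 5000000, totalDocsExamined: 5000000 };

console.log(docsExaminedPerReturned(unindexedByName));
console.log(docsExaminedPerReturned(indexedByAge));
console.log(docsExaminedPerReturned(findAll));
```

Note that find() with no filter also scans the whole collection, yet its ratio is 1, because every scanned document is part of the result. The ratio flags wasted scanning, not scanning as such.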