Your browser is out-of-date!

Update your browser to view this website correctly. Update my browser now

Vinllen Chen


但行好事,莫问前程

MongoDB removeShard的流程

  1. 向mongos发送removeShard命令
  2. mongos向cs发送_configsvrRemoveShard命令
  3. cs查询除了要remove的shard,是否还有别的shard在drain状态,是的话则返回失败。{"_id": {$ne: "shard1"}, "drainging": true}}}
    表明同一时刻只能有一个removeShard操作。remove_shard.png
  4. cs查询是否要remove的shard是最后一个shard,是的话本次请求返回失败。
  5. cs查询要remove的shard是否在drain状态,如果不是的话,则置draining状态为true,并且reload shardRegistry(缓存的shard的路由表)。日志层面记录removeShard开始
  6. cs查询shard对应的chunk表,还遗留有多少条chunk
  7. cs查询shard对应的database表,还遗留有多少个以当前shard为primary的db
  8. 如果还遗留chunk或者database,则返回ongoing状态。chunk靠balancer移动,primary database需要手动移动(通常是调用movePrimary命令)。
  9. 如果都已经空了,则删除config.shards表里面对应的shard,并重新reload shardRegistry
  10. cs移除对应shard的连接和监控,并记录日志,标记完成removeShard,返回completed状态。

什么时候会发起move chunk的操作?3.4之后由cs的balancer线程定期轮询,发现有shard位于draining状态,则发起move chunk操作。相当于这个跟removeShard是一个异步的操作。

通常用户在removeShard返回中,如果state是ongoing表示还在move chunk,remaining字段会显示还没有move完毕的chunks数:

{
   "msg" : "draining ongoing",
   "state" : "ongoing",
   "remaining" : {
      "chunks" : NumberLong(2),
      "dbs" : NumberLong(2),
      "jumboChunks" : NumberLong(0)         // Available starting in 4.2.2 (and 4.0.14)
   },
   "note" : "you need to drop or movePrimary these databases",
   "dbsToMove" : [
      "fizz",
      "buzz"
   ],
   "ok" : 1,
   "operationTime" : Timestamp(1575399086, 1655),
   "$clusterTime" : {
      "clusterTime" : Timestamp(1575399086, 1655),
      "signature" : {
         "hash" : BinData(0,"XBrTmjMMe82fUtVLRm13GBVtRE8="),
         "keyId" : NumberLong("6766255701040824328")
      }
   }
}

如果没有remaining字段而只有dbsToMove表示move chunk已经完毕或者没有必要,用户需要手动移除dbsToMove下面的db,通常采用movePrimary命令把其挪到别的shard上:

{
   "msg" : "draining started successfully",
   "state" : "started",
   "shard" : "bristol01",
   "note" : "you need to drop or movePrimary these databases",
   "dbsToMove" : [
      "fizz",
      "buzz"
   ],
   "ok" : 1,
   "operationTime" : Timestamp(1575398919, 2),
   "$clusterTime" : {
      "clusterTime" : Timestamp(1575398919, 2),
      "signature" : {
         "hash" : BinData(0,"Oi68poWCFCA7b9kyhIcg+TzaGiA="),
         "keyId" : NumberLong("6766255701040824328")
      }
   }
}

如果都完毕了,state状态返回completed

{
   "msg" : "removeshard completed successfully",
   "state" : "completed",
   "shard" : "bristol01",
   "ok" : 1,
   "operationTime" : Timestamp(1575400370, 2),
   "$clusterTime" : {
      "clusterTime" : Timestamp(1575400370, 2),
      "signature" : {
         "hash" : BinData(0,"JjSRciHECXDBXo0e5nJv9mdRG8M="),
         "keyId" : NumberLong("6766255701040824328")
      }
   }
}

参考:

https://docs.mongodb.com/manual/reference/command/removeShard/


About the author

vinllen chen

Beijing, China

格物致知


Discussions

comments powered by Disqus