MongoDB: slow removeShard



  • I have a MongoDB cluster with 9 nodes (3 shards, 3 nodes each). I'm removing one shard, but the process is extremely slow. Each node of the shard being removed stores ~400 GB of data, which isn't much, I suppose. Yet at the current rate I estimate that draining will take 200+ days.

    Is there a way to speed this up? I have plenty of free resources (CPU, memory, I/O), roughly 3x more than the nodes consume now. I've already tried balancer settings such as _secondaryThrottle and _waitForDelete without much success.

    MongoDB 4.4.13



  • As it turns out from analyzing the events in the changelog, MongoDB waits too long on the transfer step inside every moveChunk.to/moveChunk.from event.

    For now, the only way MongoDB can keep queries consistent when they started on secondaries before a chunk migration is to sleep. By default (orphanCleanupDelaySecs = 900), MongoDB waits 15 minutes before deleting the migrated chunk from the source shard and moving on. So the migration of every chunk took more than 15 minutes.
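
To see how a fixed per-chunk wait adds up, here is a rough back-of-the-envelope sketch. It assumes the shard holds about 400 GB and that chunks average the 4.4 default maximum of 64 MB; the real average chunk size is not given in this thread, and smaller chunks mean more chunks and a longer drain. The function name and parameters are made up for illustration.

```python
import math


def estimate_drain_days(data_gb, avg_chunk_mb, wait_secs, transfer_secs=0.0):
    """Estimate draining time when every chunk migration pays a fixed wait.

    data_gb       -- data on the shard being drained (assumption: ~400)
    avg_chunk_mb  -- average chunk size (assumption: 64, the 4.4 default maximum)
    wait_secs     -- fixed wait per chunk (900 = default orphanCleanupDelaySecs)
    transfer_secs -- time to actually copy one chunk (ignored by default)
    """
    chunks = math.ceil(data_gb * 1024 / avg_chunk_mb)
    total_secs = chunks * (wait_secs + transfer_secs)
    return chunks, total_secs / 86400  # 86400 seconds per day


chunks, days = estimate_drain_days(400, 64, 900)
print(f"{chunks} chunks, about {days:.1f} days with a 900 s wait")

chunks, days = estimate_drain_days(400, 64, 60)
print(f"{chunks} chunks, about {days:.1f} days with a 60 s wait")
```

Under these assumptions the wait alone accounts for roughly 67 days, and an average chunk size near 20 MB would push the same estimate past 200 days, so the sleep dominates the total regardless of spare CPU, memory, or I/O.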

    Moreover, MongoDB has had a Jira issue about this for about 5 years (second link below).

    https://www.mongodb.com/docs/v4.4/reference/parameters/#mongodb-parameter-param.orphanCleanupDelaySecs

    https://jira.mongodb.org/browse/SERVER-31837
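
For reference, a minimal sketch of lowering orphanCleanupDelaySecs at runtime with PyMongo. The helper name, the URI, and the value of 60 seconds are hypothetical. It assumes the command is sent to the mongod that is the primary of the shard being drained (not to mongos), and that no long-running secondary reads depend on the default 15-minute window.

```python
def set_orphan_cleanup_delay(client, delay_secs):
    """Run setParameter for orphanCleanupDelaySecs on the given connection.

    `client` is expected to be a pymongo.MongoClient connected directly to
    a shard's primary mongod (assumption: not a mongos router).
    """
    if delay_secs < 0:
        raise ValueError("delay_secs must be non-negative")
    # Equivalent to the shell command:
    #   db.adminCommand({ setParameter: 1, orphanCleanupDelaySecs: <n> })
    return client.admin.command(
        "setParameter", 1, orphanCleanupDelaySecs=int(delay_secs)
    )


def connect_and_set(uri, delay_secs):
    """Connect to `uri` (hypothetical address) and apply the new delay."""
    from pymongo import MongoClient  # imported here so the helper above has no hard dependency

    client = MongoClient(uri)
    try:
        return set_orphan_cleanup_delay(client, delay_secs)
    finally:
        client.close()
```

A call such as `connect_and_set("mongodb://shard3-primary:27018", 60)` would apply the change to one node. A parameter set this way does not survive a restart, so it can serve as a temporary setting for the duration of the drain.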



