Database Compaction

Because CouchDB/BigCouch writes all changes to append-only files, you must monitor the size of those files on disk and periodically compact them. Compaction takes the current revisions in a database file and writes a new file containing only the active documents, effectively deleting all old revisions. Compaction can recover dramatic amounts of disk space and improve performance (seek times are reduced on smaller files).
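
A quick way to gauge how much of a database file is live data is the database-info endpoint (a minimal sketch; the field names vary by CouchDB version: disk_size/data_size on BigCouch/1.x, sizes.file vs. sizes.active on 2.x and later):

    # Inspect on-disk size vs. active data size for a database
    curl -s 'http://localhost:5984/{DATABASE}' | python -m json.tool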

Automatic Compaction

Kazoo allows the cluster operator to enable automatic compaction of all databases. A process walks the list of known databases (refreshed each time the process starts again) and checks each database against a heuristic to decide whether or not to compact it.
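
Automatic compaction is gated by the compact_automatically flag in the kazoo_couch system_config document (see the schema below). Assuming kapps_config's set_default SUP command is available (per-node overrides use set_node, shown under Per-node settings), a minimal sketch of enabling it cluster-wide:

    # Sketch: turn on automatic compaction for the whole cluster
    sup kapps_config set_default kazoo_couch compact_automatically true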

System Configuration Schema

Schema

Schema for kazoo_couch system_config

| Key | Description | Type | Default | Required | Support Level |
| --- | ----------- | ---- | ------- | -------- | ------------- |
| admin_port | The CouchDB API port, typically 5986 | integer() | 5986 | false | |
| allow_maintenance_db_delete | kazoo_couch allow maintenance db delete | boolean() | false | false | |
| api_port | The CouchDB API port, typically 5984 | integer() | 5984 | false | |
| compact_automatically | kazoo_couch compact automatically | boolean() | false | false | |
| default_chunk_size | kazoo_couch default chunk size | integer() | 1000 | false | |
| max_compacting_shards | kazoo_couch maximum compacting shards | integer() | 2 | false | |
| max_compacting_views | kazoo_couch maximum compacting views | integer() | 2 | false | |
| max_concurrent_docs_to_archive | kazoo_couch maximum concurrent docs to archive | integer() | 500 | false | |
| max_wait_for_compaction_pids | kazoo_couch maximum wait for compaction pids | integer() | 360000 | false | |
| min_data_size | kazoo_couch minimum data size | integer() | 131072 | false | |
| min_ratio | kazoo_couch minimum ratio | number() | 1.2 | false | |
| sleep_between_poll | kazoo_couch sleep between poll | integer() | 3000 | false | |
| use_bigcouch_direct | kazoo_couch use bigcouch direct | boolean() | true | false | |
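
The min_data_size and min_ratio keys drive the compaction heuristic: the usual reading is that a database is only compacted when its file is at least min_data_size bytes and its disk-size-to-data-size ratio exceeds min_ratio (treat this reading as an assumption about the heuristic, not a contract). A sketch of tuning them with the same SUP command:

    # Sketch: only compact files over 256 KiB with more than 1.5x overhead
    sup kapps_config set_default kazoo_couch min_data_size 262144
    sup kapps_config set_default kazoo_couch min_ratio 1.5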

Operations

The compaction module kt_compactor binds into the triggers for all databases. Each time the timer defined by tasks.browse_dbs_interval_s in system_config expires, every database is processed. The default interval is daily, but the next timer won't start until all databases have been processed by every task module bound to them. On large systems there can be so many databases, and operations on them can take so long, that the next timer won't start for more than a day.

Typically there's nothing wrong with that; just know that the timer isn't a strict daily timer but instead starts 86400 seconds after the last database is processed.
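
The interval itself lives in the tasks system_config document. Assuming kapps_config's set_default SUP command, a sketch of changing it (86400 keeps the daily default):

    # Sketch: adjust how often the database crawler fires
    sup kapps_config set_default tasks browse_dbs_interval_s 86400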

Per-node settings

It's important to set per-couch-node settings if your setup varies from the default ports (5984 and 5986).

sup kapps_config set_node kazoo_couch admin_port 35986 couchdb@db001.host.com

Repeat for admin_port and api_port for each database node in your setup (curl localhost:5984/_membership if you need your list of nodes).
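
For example, the matching api_port setting for the same node (the port and hostname here are placeholders for your own values), plus the membership query to list nodes:

    # Per-node API port override (hypothetical port/hostname)
    sup kapps_config set_node kazoo_couch api_port 35984 couchdb@db001.host.com
    # Enumerate the CouchDB/BigCouch nodes in the cluster
    curl localhost:5984/_membership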

API-initiated compaction

In addition to the automatic database crawler, the tasks API provides methods for initiating compaction of the whole cluster, a single node, or a single database.

Compact Everything

This is an input-less compaction job. At the end of the run you should have a CSV describing the database shards on each node, with their disk/data usage before and after compaction.

  1. Create the task:

    curl -v -X PUT \
    -H "X-Auth-Token: {AUTH_TOKEN}" \
    'http://{SERVER}:8000/v2/tasks?category=compaction&action=compact_all'
    
  2. Start the task:

    curl -v -X PATCH \
    -H "X-Auth-Token: {AUTH_TOKEN}" \
    'http://{SERVER}:8000/v2/tasks/{TASK_ID}'
    
  3. Query the task’s status:

    curl -v -X GET \
    -H "X-Auth-Token: {AUTH_TOKEN}" \
    'http://{SERVER}:8000/v2/tasks/{TASK_ID}'
    
  4. Export the results

    curl -v -X GET \
    -o compaction_results.csv \
    -H "Accept: text/csv" \
    -H "X-Auth-Token: {AUTH_TOKEN}" \
    'http://{SERVER}:8000/v2/tasks/{TASK_ID}?csv_name=out.csv'
    
    node,database,before_disk,before_data,after_disk,after_data
    node3@127.0.0.1,faxes,804116,1544,804116,1544
    node2@127.0.0.1,faxes,554260,1544,1230100,1544
    node1@127.0.0.1,faxes,132372,1544,132372,1544
    

Compact a node

If you prefer, you can supply a CSV of nodes to compact, with a flag per node to either force compaction or apply the heuristic.

CSV columns

| Column | Description |
| ------ | ----------- |
| node   | Which node(s) to run compaction on |
| force  | Whether to force compaction on each database or apply the ratio heuristic |

For instance, compact two nodes, forcing all databases to be compacted on the first node, and only those meeting the ratio criteria on the second node:

"node","force"
"bigcouch@db001.zone1.host.com","true"
"bigcouch@db004.zone3.host.co.uk","false"

API commands

Create the task:

    curl -v -X PUT \
    -H "X-Auth-Token: $AUTH_TOKEN" \
    -H "Content-Type: text/csv" \
    'http://localhost:8000/v2/tasks?category=compaction&action=compact_node' \
    --data-binary $'"node","force"\n"node1@127.0.0.1","false"\n"node2@127.0.0.1","true"'

Note: if you want to specify the CSV content in the args to cURL, you need to use $'...' to allow the \n to be passed to Crossbar properly. Otherwise create a CSV file and use @file.csv instead.
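
For example, the same request with the CSV in a file (the path below is hypothetical):

    # Sketch: upload the node list from a CSV file instead of inline data
    curl -v -X PUT \
    -H "X-Auth-Token: $AUTH_TOKEN" \
    -H "Content-Type: text/csv" \
    --data-binary @/path/to/nodes.csv \
    'http://localhost:8000/v2/tasks?category=compaction&action=compact_node'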

Start, query, and export the results as before.

Compact a database

Compacting a database will ensure all shards on relevant nodes are compacted.

CSV columns

| Column   | Description |
| -------- | ----------- |
| database | Which database to run compaction on |
| force    | Whether to force compaction on the database or apply the ratio heuristic |

For instance, compact two databases, forcing compaction on the first and applying the ratio heuristic to decide whether the second should be compacted:

"database","force"
"system_config","true"
"accounts","false"

API commands

  1. Create the task:
    curl -v -X PUT \
    -H "X-Auth-Token: {AUTH_TOKEN}" \
    -H "Content-Type: application/json" \
    --data-binary @/path/to/node.csv
    'http://{SERVER}:8000/v2/tasks?category=compaction&action=compact_db'
    

Start, query and export as before.

Via SUP

You can compact all databases on a node:

> sup kt_compactor compact_node {BIGCOUCH@SERVER1.COM}
node, database, before_disk, before_data, after_disk, after_data
{BIGCOUCH@SERVER1.COM}, webhooks, 41056, 8273, 24672, 8273
{BIGCOUCH@SERVER1.COM}, token_auth, 24672, 283, 8288, 283
{BIGCOUCH@SERVER1.COM}, tasks, 28768, 2724, 12384, 2724
...

You can compact a database across all nodes:

> sup kt_compactor compact_db {DATABASE}
node, database, before_disk, before_data, after_disk, after_data
{BIGCOUCH@SERVER1.COM}, {DATABASE}, 24672, 4192, 16480, 4192
{BIGCOUCH@SERVER2.COM}, {DATABASE}, 24672, 4192, 16480, 4192
{BIGCOUCH@SERVER3.COM}, {DATABASE}, 24672, 4192, 16480, 4192

You can compact account databases by using the account ID; compact MODBs with {ACCOUNT_ID}-{YYYY}{MM}.
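
For example (account ID and month are placeholders):

    # Compact an account database by account ID
    sup kt_compactor compact_db {ACCOUNT_ID}
    # Compact a single month's MODB for that account
    sup kt_compactor compact_db {ACCOUNT_ID}-{YYYY}{MM}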
