
ClickHouse

Signals Provided
  • Infrastructure Entities - Complete infrastructure topology including compute, storage, and networking resources
  • Metrics - Performance metrics from applications and infrastructure
  • Symptoms - Automatic symptom detection from metrics, traces, and external monitoring systems

Overview

Causely provides native integration with ClickHouse to help you identify and resolve database issues before they impact your users.

Instead of just monitoring symptoms, Causely analyzes real-time signals to surface the underlying causal factors driving database issues.

By setting up the ClickHouse integration, you will be able to do the following:

  • Identify causes for reliability issues originating from your ClickHouse database

  • Observe the database as an entity in the Topology Graph, including its relationships to other entities on the service map, infrastructure stack, and dataflow map.

  • Get insights into the slowest queries over a rolling 12-hour window, and troubleshoot them with Ask Causely directly from the UI.

The integration supports both self-hosted ClickHouse instances and cloud-managed deployments.

Setup Guide

Step 1: Create a user

Create a dedicated user in your ClickHouse instance and grant it access to the system tables that Causely requires:

CREATE USER causely_user IDENTIFIED BY 'your-password';

GRANT SELECT ON system.tables TO causely_user;
GRANT SELECT ON system.columns TO causely_user;
GRANT SELECT ON system.databases TO causely_user;
GRANT SELECT ON system.query_log TO causely_user;
GRANT SELECT ON system.mutations TO causely_user;
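
To confirm the privileges took effect, you can inspect the user's grants with standard ClickHouse syntax:

```sql
-- Run as admin; lists every privilege granted to the Causely user
SHOW GRANTS FOR causely_user;
```

The output should include a SELECT grant for each of the five system tables listed above.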

The integration reads from the following system tables:

Table               Purpose
system.tables       Table names, row counts, and sizes
system.columns      Column definitions and schema information
system.databases    Database discovery
system.query_log    Slow query analysis (top 10 by total execution time, rolling 12-hour window)
system.mutations    Active mutation detection for lock monitoring
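
The row counts and sizes come from columns such as total_rows and total_bytes in system.tables. A query of the following shape (illustrative only, not Causely's exact internal query) shows the kind of data collected:

```sql
-- List user tables with their row counts and on-disk sizes
SELECT database, name, total_rows, total_bytes
FROM system.tables
WHERE database NOT IN ('system', 'INFORMATION_SCHEMA', 'information_schema');
```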

Step 2: Create a Kubernetes secret for the user

Create a Kubernetes secret with the ClickHouse connection details. The secret supports two protocols:

  • Native (default): binary protocol on port 9000
  • HTTP: HTTP/HTTPS protocol on port 8123 / 8443

Option 1: Single Database Configuration

kubectl create secret generic clickhouse-credentials \
  --namespace causely \
  --from-literal=username="causely_user" \
  --from-literal=password='...' \
  --from-literal=host="..." \
  --from-literal=port="9000" \
  --from-literal=database="..." \
  --from-literal=protocol="native" \
  --from-literal=secure="false"

To connect over HTTP instead of the native protocol:

kubectl create secret generic clickhouse-credentials \
  --namespace causely \
  --from-literal=username="causely_user" \
  --from-literal=password='...' \
  --from-literal=host="..." \
  --from-literal=port="8123" \
  --from-literal=database="..." \
  --from-literal=protocol="http" \
  --from-literal=secure="false"

Option 2: Multiple Databases Configuration

To monitor multiple databases within the same ClickHouse instance, specify them as a comma-separated list using the databases field:

kubectl create secret generic clickhouse-credentials-multidb \
  --namespace causely \
  --from-literal=username="causely_user" \
  --from-literal=password='...' \
  --from-literal=host="..." \
  --from-literal=port="9000" \
  --from-literal=databases="database1,database2,database3" \
  --from-literal=protocol="native" \
  --from-literal=secure="false"

Alternatively, use a YAML manifest:

apiVersion: v1
kind: Secret
metadata:
  name: clickhouse-credentials-multidb
  namespace: causely
type: Opaque
stringData:
  username: 'causely_user'
  password: '...'
  host: '...'
  port: '9000'
  databases: 'database1,database2,database3'
  protocol: 'native'
  secure: 'false'

Note: Use either the database field for a single database or the databases field for multiple databases. Do not use both in the same secret.

Option 3: Database Auto-Discovery

Causely can automatically discover all databases on a ClickHouse server. Add auto_discovery: "true" to the secret:

kubectl create secret generic clickhouse-credentials \
  --namespace causely \
  --from-literal=username="causely_user" \
  --from-literal=password='...' \
  --from-literal=host="..." \
  --from-literal=port="9000" \
  --from-literal=protocol="native" \
  --from-literal=secure="false" \
  --from-literal=auto_discovery="true"

When auto-discovery is enabled, Causely queries system.databases (excluding system, INFORMATION_SCHEMA, and information_schema) and starts a scraper for each discovered database. Discovery runs periodically to pick up newly created databases.
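
The discovery step can be reproduced by hand. The following query is illustrative of the filter described above and lists the databases a scraper would be started for:

```sql
-- Databases eligible for auto-discovery (system schemas excluded)
SELECT name
FROM system.databases
WHERE name NOT IN ('system', 'INFORMATION_SCHEMA', 'information_schema');
```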

The host must be the FQDN of your ClickHouse instance, or an IP address if no DNS entry is set up. It must match the FQDN/IP Causely would discover either from the Kubernetes API or your cloud provider's API.

If you are connecting through a proxy, set host to the proxy address and host_overwrite to the actual ClickHouse instance address:

--from-literal=host="my-proxy.example.com"
--from-literal=host_overwrite="my-clickhouse.example.com"

Secret field reference

Field           Required  Default                       Description
host            Yes                                     Hostname or IP of the ClickHouse server
username        Yes                                     ClickHouse user name
password        Yes                                     ClickHouse user password
database        Yes*                                    Single database to monitor
databases       Yes*                                    Comma-separated list of databases to monitor
port            No        9000 (native) / 8123 (HTTP)   ClickHouse port
protocol        No        native                        Connection protocol: native or http
secure          No        false                         Enable TLS (true or false)
host_overwrite  No                                      Override the host used for entity resolution
port_overwrite  No                                      Override the port used for entity resolution
auto_discovery  No        false                         Automatically discover all databases

*Either database or databases must be set, unless auto_discovery is enabled.

Step 3: Update Causely Configuration

Once the secret is created, update the Causely configuration to enable scraping for the new instance:

scrapers:
  clickhouse:
    enabled: true
    instances:
      - secretName: clickhouse-credentials
        namespace: causely

Alternative: Enable Credentials Autodiscovery

Causely also supports credentials autodiscovery, which lets you add new scraping targets without modifying the Causely configuration. Label the Kubernetes secret to enable autodiscovery:

kubectl --namespace causely label secret clickhouse-credentials "causely.ai/scraper=ClickHouse"
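
If you manage secrets as manifests, the same label can be set declaratively. A sketch of the metadata section, reusing the secret name from the examples above:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: clickhouse-credentials
  namespace: causely
  labels:
    # Marks this secret as a target for Causely's ClickHouse scraper
    causely.ai/scraper: ClickHouse
type: Opaque
```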

Verify Your Configuration

After completing the setup, run these queries against your ClickHouse instance to verify that the Causely user has the required access.

Quick Access Check

SELECT
    (SELECT count() FROM system.tables) > 0 AS tables_ok,
    (SELECT count() FROM system.columns) > 0 AS columns_ok,
    (SELECT count() FROM system.databases) > 0 AS databases_ok,
    (SELECT count() FROM system.query_log) >= 0 AS query_log_ok,
    (SELECT count() FROM system.mutations) >= 0 AS mutations_ok;

All columns should return 1 (true).

Detailed Checks

1. System tables access

-- Each of these should return a result without error
SELECT 1 FROM system.tables LIMIT 1;
SELECT 1 FROM system.columns LIMIT 1;
SELECT 1 FROM system.databases LIMIT 1;
SELECT 1 FROM system.query_log LIMIT 1;
SELECT 1 FROM system.mutations LIMIT 1;

If any query fails with an access denied error, grant the missing privilege to your Causely user:

-- Run as admin
GRANT SELECT ON system.<table_name> TO causely_user;

2. Test slow query collection

Run this query to confirm Causely can collect slow query data:

SELECT
    normalized_query_hash,
    count() AS calls,
    sum(query_duration_ms) AS total_exec_time_ms
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time >= now() - toIntervalHour(12)
GROUP BY normalized_query_hash
ORDER BY total_exec_time_ms DESC
LIMIT 5;

This should return results without error. An empty result set is normal on a freshly configured instance—entries will appear as queries run.
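
One common reason for a persistently empty system.query_log is that query logging has been disabled for the connecting users. Logging is controlled by the log_queries setting, which is enabled by default; in a users.xml-based deployment it can be enforced in the profile, for example:

```xml
<profiles>
    <default>
        <!-- Ensure finished queries are written to system.query_log -->
        <log_queries>1</log_queries>
    </default>
</profiles>
```

The exact profile name and file layout depend on your deployment; cloud-managed services typically expose this as a user-level setting instead.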

Success

If all checks pass, your ClickHouse instance is correctly configured for Causely monitoring.

Setup Checklist

  • causely_user created in ClickHouse
  • SELECT granted on system.tables, system.columns, system.databases, system.query_log, system.mutations
  • Kubernetes secret created with correct host, username, password, database/databases, protocol, and port
  • Causely configuration updated (or secret labeled for autodiscovery)
  • Verification queries succeed without access errors

What Data is Collected

The ClickHouse scraper collects comprehensive metadata and performance information from your ClickHouse databases, including:

  • Database entities with names and relationships to hosting services
  • Service-to-database mappings (which service provides which database)
  • Connection details including host, port, and protocol configuration
  • Table information including names, row counts, and sizes (from system.tables)
  • Complete table schemas with column definitions, data types, default expressions, and comments (from system.columns)
  • Slow query analysis using system.query_log: top 10 queries by total execution time over a rolling 12-hour window, including call counts, total and average execution time, and rows read
  • Mutation lock monitoring using system.mutations: active mutations are tracked as exclusive locks to detect contention on tables
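
The mutation-lock signal in the last bullet corresponds to rows in system.mutations that have not yet finished. An illustrative query for the same condition:

```sql
-- Mutations still running; these are tracked as exclusive locks
SELECT database, table, mutation_id, command, create_time
FROM system.mutations
WHERE is_done = 0;
```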