ClickHouse

Signals Provided
  • Infrastructure Entities - Complete infrastructure topology including compute, storage, and networking resources
  • Metrics - Performance metrics from applications and infrastructure
  • Symptoms - Automatic symptom detection from metrics, traces, and external monitoring systems

Overview

Causely provides native integration with ClickHouse to help you identify and resolve database issues before they impact your users.

Instead of just monitoring symptoms, Causely analyzes real-time signals to surface the underlying causal factors driving database issues.

By setting up the ClickHouse integration, you will be able to do the following:

  • Identify root causes of reliability issues originating from your ClickHouse database.

  • Observe the database as an entity in the Topology Graph, including its relationships to other entities on the service map, infrastructure stack, and dataflow map.

  • Get insights into the slowest queries over a rolling 12-hour window, and troubleshoot them with Ask Causely directly from the UI.

The integration supports both self-hosted ClickHouse instances and cloud-managed deployments.

Setup Guide

Step 1: Create a user

Create a dedicated user in your ClickHouse instance and grant it access to the system tables that Causely requires:

CREATE USER causely_user IDENTIFIED BY 'your-password';

GRANT SELECT ON system.tables TO causely_user;
GRANT SELECT ON system.columns TO causely_user;
GRANT SELECT ON system.databases TO causely_user;
GRANT SELECT ON system.query_log TO causely_user;
GRANT SELECT ON system.mutations TO causely_user;

System-table grants alone are not sufficient. The ClickHouse user must also be able to see the actual application tables in the databases Causely should discover. Without SHOW/SELECT on those databases, Causely cannot build table entities or relate them to your services.

Grant database-wide visibility (choose the scope that fits your security model):

GRANT SHOW DATABASES ON *.* TO causely_user;
GRANT SELECT ON *.* TO causely_user;

For tighter scope, grant SELECT only on the application databases Causely should monitor:

GRANT SHOW DATABASES ON *.* TO causely_user;
GRANT SELECT ON <database_name>.* TO causely_user;

Or grant SELECT on several application databases individually:

GRANT SHOW DATABASES ON *.* TO causely_user;
GRANT SELECT ON db1.* TO causely_user;
GRANT SELECT ON db2.* TO causely_user;
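To confirm what the user ended up with, you can inspect its privileges; SHOW GRANTS is standard ClickHouse SQL:

```sql
-- Run as admin; lists every privilege held by causely_user
SHOW GRANTS FOR causely_user;
```

The output should include the system-table grants above plus SELECT on whichever application databases you chose.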

How this maps to discovery:

  • system.databases controls whether Causely can discover database names.
  • system.tables controls whether Causely can discover tables inside those databases.

If the user can see databases but not non-system tables, Causely will discover databases but create no useful entities for your application data.
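As a rough sketch (these are illustrative queries, not necessarily Causely's exact ones), discovery amounts to reading these two system tables as the monitoring user; `db1` is a placeholder database name:

```sql
-- Which databases are visible to this user?
SELECT name FROM system.databases;

-- Which tables are visible inside one of them?
SELECT name, engine
FROM system.tables
WHERE database = 'db1';
```

If the second query returns nothing for your application databases, the SELECT grants from the previous step are missing or too narrow.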

The integration reads from the following system tables:

| Table | Purpose |
| --- | --- |
| system.tables | Table names, row counts, and sizes |
| system.columns | Column definitions and schema information |
| system.databases | Database discovery |
| system.query_log | Slow query analysis (top 10 by total execution time, rolling 12-hour window) |
| system.mutations | Active mutation detection for lock monitoring |

Step 2: Create a Kubernetes secret for the user

Create a Kubernetes secret with the ClickHouse connection details. The secret supports two protocols:

  • Native (default): binary protocol on port 9000
  • HTTP: HTTP/HTTPS protocol on port 8123 / 8443
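The port defaults can be summarized as a small shell helper (illustrative only; the actual defaulting happens inside Causely). Per the values above, native maps to 9000, plain HTTP to 8123, and HTTPS to 8443:

```shell
# Illustrative sketch of the documented port defaults.
default_port() {
  local protocol="$1" secure="$2"
  case "$protocol" in
    native) echo 9000 ;;  # documented native default; a TLS-specific native port is not covered here
    http)
      if [ "$secure" = "true" ]; then
        echo 8443  # HTTPS
      else
        echo 8123  # plain HTTP
      fi ;;
    *) echo "unknown protocol: $protocol" >&2; return 1 ;;
  esac
}

default_port native false   # -> 9000
default_port http false     # -> 8123
default_port http true      # -> 8443
```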

Option 1: Single Database Configuration

kubectl create secret generic clickhouse-credentials \
  --namespace causely \
  --from-literal=username="causely_user" \
  --from-literal=password='...' \
  --from-literal=host="..." \
  --from-literal=port="9000" \
  --from-literal=database="..." \
  --from-literal=protocol="native" \
  --from-literal=secure="false"

To connect over HTTP instead of the native protocol:

kubectl create secret generic clickhouse-credentials \
  --namespace causely \
  --from-literal=username="causely_user" \
  --from-literal=password='...' \
  --from-literal=host="..." \
  --from-literal=port="8123" \
  --from-literal=database="..." \
  --from-literal=protocol="http" \
  --from-literal=secure="false"
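To double-check what was stored, you can read the secret back (assuming kubectl access to the causely namespace); field values are base64-encoded in the data map:

```shell
# Show which keys the secret contains
kubectl --namespace causely get secret clickhouse-credentials -o jsonpath='{.data}'

# Decode an individual field, for example the host
kubectl --namespace causely get secret clickhouse-credentials \
  -o jsonpath='{.data.host}' | base64 --decode
```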

Option 2: Multiple Databases Configuration

To monitor multiple databases within the same ClickHouse instance, specify them as a comma-separated list using the databases field:

kubectl create secret generic clickhouse-credentials-multidb \
  --namespace causely \
  --from-literal=username="causely_user" \
  --from-literal=password='...' \
  --from-literal=host="..." \
  --from-literal=port="9000" \
  --from-literal=databases="database1,database2,database3" \
  --from-literal=protocol="native" \
  --from-literal=secure="false"

Alternatively, use a YAML manifest:

apiVersion: v1
kind: Secret
metadata:
  name: clickhouse-credentials-multidb
  namespace: causely
type: Opaque
stringData:
  username: 'causely_user'
  password: '...'
  host: '...'
  port: '9000'
  databases: 'database1,database2,database3'
  protocol: 'native'
  secure: 'false'

Note: Use either the database field for a single database or the databases field for multiple databases. Do not use both in the same secret.

Option 3: Database Auto-Discovery

Causely can automatically discover all databases on a ClickHouse server. Add auto_discovery: "true" to the secret:

kubectl create secret generic clickhouse-credentials \
  --namespace causely \
  --from-literal=username="causely_user" \
  --from-literal=password='...' \
  --from-literal=host="..." \
  --from-literal=port="9000" \
  --from-literal=protocol="native" \
  --from-literal=secure="false" \
  --from-literal=auto_discovery="true"

When auto-discovery is enabled, Causely queries system.databases (excluding system, INFORMATION_SCHEMA, and information_schema) and starts a scraper for each discovered database. Discovery runs periodically to pick up newly created databases.

For external or VM-hosted ClickHouse, use a stable DNS hostname whenever possible. Raw IPs may connect successfully but can break service and entity resolution in Causely. The host value should align with how Causely discovers infrastructure (for example, the FQDN from the Kubernetes API or your cloud provider's API).

If connectivity must use an IP address, set host to the working IP and set host_overwrite to the stable DNS hostname that identifies that ClickHouse instance:

--from-literal=host="10.202.225.23"
--from-literal=host_overwrite="clickhouse.internal.company.net"

If you are connecting through a proxy, set host to the proxy address and host_overwrite to the actual ClickHouse instance hostname:

--from-literal=host="my-proxy.example.com"
--from-literal=host_overwrite="my-clickhouse.example.com"

Secret field reference

| Field | Required | Default | Description |
| --- | --- | --- | --- |
| host | Yes | | Hostname or IP used to connect to ClickHouse. Prefer a stable DNS hostname for normal setups; if you must use an IP or proxy here, set host_overwrite to the canonical hostname for topology and entity resolution. |
| username | Yes | | ClickHouse user name |
| password | Yes | | ClickHouse user password |
| database | Yes* | | Single database to monitor |
| databases | Yes* | | Comma-separated list of databases to monitor |
| port | No | 9000 (native) / 8123 (HTTP) | ClickHouse port |
| protocol | No | native | Connection protocol: native or http |
| secure | No | false | Enable TLS (true or false) |
| host_overwrite | No | | Override the host used for topology and entity resolution. Use when the connection host is an IP or proxy address, but ClickHouse should be identified by a stable DNS hostname. |
| port_overwrite | No | | Override the port used for entity resolution |
| auto_discovery | No | false | Automatically discover all databases |

*Either database or databases must be set, unless auto_discovery is enabled.
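The selection rules above can be sketched as a tiny shell function (hypothetical, for illustration; Causely performs this validation internally). It resolves which scraping mode a secret's fields imply and rejects the invalid combinations:

```shell
# Illustrative sketch: resolve the scraping mode from the secret's fields.
scrape_mode() {
  local database="$1" databases="$2" auto_discovery="$3"
  if [ -n "$database" ] && [ -n "$databases" ]; then
    echo "error: set either database or databases, not both" >&2
    return 1
  elif [ "$auto_discovery" = "true" ]; then
    echo auto-discovery
  elif [ -n "$databases" ]; then
    echo multi-database
  elif [ -n "$database" ]; then
    echo single-database
  else
    echo "error: no database selection configured" >&2
    return 1
  fi
}

scrape_mode mydb "" false        # -> single-database
scrape_mode "" "db1,db2" false   # -> multi-database
scrape_mode "" "" true           # -> auto-discovery
```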

Step 3: Update Causely Configuration

Once the secret is created, update the Causely configuration to enable scraping for the new instance:

scrapers:
  clickhouse:
    enabled: true
    instances:
      - secretName: clickhouse-credentials
        namespace: causely

Alternative: Enable Credentials Autodiscovery

Causely also supports credentials autodiscovery, which lets you add new scraping targets without modifying the Causely configuration. Label the Kubernetes secret to enable autodiscovery:

kubectl --namespace causely label secret clickhouse-credentials "causely.ai/scraper=ClickHouse"
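You can confirm the label took effect, and see which secrets are currently eligible for autodiscovery, with a standard label selector:

```shell
kubectl --namespace causely get secrets -l "causely.ai/scraper=ClickHouse"
```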

Verify Your Configuration

After completing the setup, run these queries against your ClickHouse instance to verify that the Causely user has the required access.

Troubleshooting

Use these symptoms to narrow down configuration issues:

  • If the UI only shows a scraper path ending in /_discovery, database auto-discovery is running but no per-database scrapers were created (often because no application databases were found, or discovery could not proceed as expected).
  • If the UI shows per-database scrapers with a message like configured database has no eligible tables, the user can see the configured databases (or target databases) but they have no eligible tables for the current scraper logic (see Eligible table types below).
  • If logs show failed to resolve hostname or service mapping errors, set host_overwrite to a stable DNS hostname and avoid relying on a raw IP alone for identity.

Quick Access Check

SELECT
  (SELECT count() FROM system.tables LIMIT 1) > 0 AS tables_ok,
  (SELECT count() FROM system.columns LIMIT 1) > 0 AS columns_ok,
  (SELECT count() FROM system.databases LIMIT 1) > 0 AS databases_ok,
  (SELECT count() FROM system.query_log LIMIT 1) >= 0 AS query_log_ok,
  (SELECT count() FROM system.mutations LIMIT 1) >= 0 AS mutations_ok;

All columns should return 1 (true).

Detailed Checks

1. System tables access

-- Each of these should return a result without error
SELECT 1 FROM system.tables LIMIT 1;
SELECT 1 FROM system.columns LIMIT 1;
SELECT 1 FROM system.databases LIMIT 1;
SELECT 1 FROM system.query_log LIMIT 1;
SELECT 1 FROM system.mutations LIMIT 1;

If any query fails with an access denied error, grant the missing privilege to your Causely user:

-- Run as admin
GRANT SELECT ON system.<table_name> TO causely_user;

2. Test slow query collection

Run this query to confirm Causely can collect slow query data:

SELECT
  normalized_query_hash,
  count() AS calls,
  sum(query_duration_ms) AS total_exec_time_ms
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time >= now() - toIntervalHour(12)
GROUP BY normalized_query_hash
ORDER BY total_exec_time_ms DESC
LIMIT 5;

This should return results without error. An empty result set is normal on a freshly configured instance; entries will appear as queries run.

3. Database discovery

Confirm application databases are visible to the Causely user (same filter Causely uses for auto-discovery):

SELECT name
FROM system.databases
WHERE name NOT IN ('system', 'INFORMATION_SCHEMA', 'information_schema')
ORDER BY name;

If this returns no application databases, auto-discovery will only create the _discovery scraper and no per-database scrapers.

4. Table visibility

Confirm the Causely user can see non-system tables:

SELECT database, name, engine
FROM system.tables
ORDER BY database, name
LIMIT 100;

This result set must include non-system tables. If it only returns system.* tables, Causely will not discover application entities.

5. Eligible table types

Replace <configured_db> with a target database from your configuration (or one returned by the Database discovery query above). Causely only creates table entities for engines it treats as eligible:

SELECT database, name, engine
FROM system.tables
WHERE database = '<configured_db>'
AND engine NOT IN ('View', 'MaterializedView', 'Dictionary')
ORDER BY name;

The current scraper excludes View, MaterializedView, and Dictionary. If this query returns zero rows, Causely reports that the configured database has no eligible tables and creates no ClickHouse table entities for that database.

Success

Your ClickHouse instance is correctly configured for Causely when all of the following are true:

  1. System table checks pass (system.tables, system.columns, system.databases, system.query_log, system.mutations).
  2. Application databases appear in system.databases for the Causely user (see Database discovery).
  3. Application (non-system) tables appear in system.tables for the Causely user (see Table visibility).
  4. At least one configured database contains at least one eligible table after excluding View, MaterializedView, and Dictionary (see Eligible table types).

Setup Checklist

  • causely_user created in ClickHouse
  • SELECT granted on system.tables, system.columns, system.databases, system.query_log, system.mutations
  • SHOW DATABASES granted (for example GRANT SHOW DATABASES ON *.*)
  • SELECT granted on the application databases or tables Causely should discover (broad *.* or scoped per database)
  • Kubernetes secret created with correct host, username, password, database/databases, protocol, and port
  • If using an IP address or proxy for connectivity, host_overwrite is set to a stable DNS hostname for topology and entity resolution
  • Causely configuration updated (or secret labeled for autodiscovery)
  • Verification queries succeed without access errors
  • system.tables returns non-system tables for the Causely user
  • At least one configured database contains eligible tables after excluding View, MaterializedView, and Dictionary

What Data is Collected

The ClickHouse scraper collects comprehensive metadata and performance information from your ClickHouse databases, including:

  • Database entities with names and relationships to hosting services
  • Service-to-database mappings (which service provides which database)
  • Connection details including host, port, and protocol configuration
  • Table information for eligible tables discovered from system.tables (names, row counts, and sizes). The current scraper excludes View, MaterializedView, and Dictionary engines.
  • Complete table schemas for those eligible tables from system.columns, including column definitions, data types, default expressions, and comments
  • Slow query analysis using system.query_log: top 10 queries by total execution time over a rolling 12-hour window, including call counts, total and average execution time, and rows read
  • Mutation lock monitoring using system.mutations: active mutations are tracked as exclusive locks to detect contention on tables