ClickHouse
- Infrastructure Entities - Complete infrastructure topology including compute, storage, and networking resources
- Metrics - Performance metrics from applications and infrastructure
- Symptoms - Automatic symptom detection from metrics, traces, and external monitoring systems
Overview
Causely provides native integration with ClickHouse to help you identify and resolve database issues before they impact your users.
Instead of just monitoring symptoms, Causely analyzes real-time signals to surface the underlying causal factors driving database issues.
By setting up the ClickHouse integration, you will be able to do the following:
- Identify causes for reliability issues originating from your ClickHouse database.
- Observe the database as an entity in the Topology Graph, including its relationships to other entities on the service map, infrastructure stack, and dataflow map.
- Get insights into the slowest queries over a rolling 12-hour window, and troubleshoot them with Ask Causely directly from the UI.
The integration supports both self-hosted ClickHouse instances and cloud-managed deployments.
Setup Guide
Step 1: Create a user
Create a dedicated user in your ClickHouse instance and grant it access to the system tables that Causely requires:
CREATE USER causely_user IDENTIFIED BY 'your-password';
GRANT SELECT ON system.tables TO causely_user;
GRANT SELECT ON system.columns TO causely_user;
GRANT SELECT ON system.databases TO causely_user;
GRANT SELECT ON system.query_log TO causely_user;
GRANT SELECT ON system.mutations TO causely_user;
System-table grants alone are not sufficient. The ClickHouse user must also be able to see the actual application tables in the databases Causely should discover. Without SHOW/SELECT on those databases, Causely cannot build table entities or relate them to your services.
Grant database-wide visibility (choose the scope that fits your security model):
GRANT SHOW DATABASES ON *.* TO causely_user;
GRANT SELECT ON *.* TO causely_user;
For tighter scope, grant SELECT only on the application databases Causely should monitor:
GRANT SHOW DATABASES ON *.* TO causely_user;
GRANT SELECT ON <database_name>.* TO causely_user;
Or, if you want multiple examples:
GRANT SHOW DATABASES ON *.* TO causely_user;
GRANT SELECT ON db1.* TO causely_user;
GRANT SELECT ON db2.* TO causely_user;
How this maps to discovery:
- system.databases controls whether Causely can discover database names.
- system.tables controls whether Causely can discover tables inside those databases.
If the user can see databases but not non-system tables, Causely will discover databases but create no useful entities for your application data.
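You can sanity-check both conditions in one query, run as the Causely user after the grants are in place. This is a sketch; the exact counts depend on your schemas, but both should be greater than zero for a useful setup:

```sql
-- Run as causely_user. app_databases confirms database discovery;
-- app_tables confirms visibility of non-system tables.
SELECT
    (SELECT count() FROM system.databases
     WHERE name NOT IN ('system', 'INFORMATION_SCHEMA', 'information_schema')) AS app_databases,
    (SELECT count() FROM system.tables
     WHERE database NOT IN ('system', 'INFORMATION_SCHEMA', 'information_schema')) AS app_tables;
```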
The integration reads from the following system tables:
| Table | Purpose |
|---|---|
| system.tables | Table names, row counts, and sizes |
| system.columns | Column definitions and schema information |
| system.databases | Database discovery |
| system.query_log | Slow query analysis (top 10 by total execution time, rolling 12-hour window) |
| system.mutations | Active mutation detection for lock monitoring |
Step 2: Create a Kubernetes secret for the user
Create a Kubernetes secret with the ClickHouse connection details. The secret supports two protocols:
- Native (default): binary protocol on port 9000
- HTTP: HTTP/HTTPS protocol on port 8123 or 8443
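Before creating the secret, it can help to confirm connectivity on the chosen protocol from a machine that can reach the server. A sketch; the hostname and password here are placeholders:

```shell
# Native protocol (port 9000); requires clickhouse-client to be installed
clickhouse-client --host my-clickhouse.example.com --port 9000 \
  --user causely_user --password 'your-password' --query "SELECT 1"

# HTTP protocol (port 8123)
curl --user 'causely_user:your-password' \
  'http://my-clickhouse.example.com:8123/?query=SELECT%201'
```

Both commands should print 1 when the user, password, and port are correct.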
Option 1: Single Database Configuration
kubectl create secret generic \
--namespace causely clickhouse-credentials \
--from-literal=username="causely_user" \
--from-literal=password='...' \
--from-literal=host="..." \
--from-literal=port="9000" \
--from-literal=database="..." \
--from-literal=protocol="native" \
--from-literal=secure="false"
To connect over HTTP instead of the native protocol:
kubectl create secret generic \
--namespace causely clickhouse-credentials \
--from-literal=username="causely_user" \
--from-literal=password='...' \
--from-literal=host="..." \
--from-literal=port="8123" \
--from-literal=database="..." \
--from-literal=protocol="http" \
--from-literal=secure="false"
Option 2: Multiple Databases Configuration
To monitor multiple databases within the same ClickHouse instance, specify them as a comma-separated list using the databases field:
kubectl create secret generic \
--namespace causely clickhouse-credentials-multidb \
--from-literal=username="causely_user" \
--from-literal=password='...' \
--from-literal=host="..." \
--from-literal=port="9000" \
--from-literal=databases="database1,database2,database3" \
--from-literal=protocol="native" \
--from-literal=secure="false"
Alternatively, use a YAML manifest:
apiVersion: v1
kind: Secret
metadata:
  name: clickhouse-credentials-multidb
  namespace: causely
type: Opaque
stringData:
  username: 'causely_user'
  password: '...'
  host: '...'
  port: '9000'
  databases: 'database1,database2,database3'
  protocol: 'native'
  secure: 'false'
Note: Use either the database field for a single database or the databases field for multiple databases. Do not use both in the same secret.
Option 3: Database Auto-Discovery
Causely can automatically discover all databases on a ClickHouse server. Add auto_discovery: "true" to the secret:
kubectl create secret generic \
--namespace causely clickhouse-credentials \
--from-literal=username="causely_user" \
--from-literal=password='...' \
--from-literal=host="..." \
--from-literal=port="9000" \
--from-literal=protocol="native" \
--from-literal=secure="false" \
--from-literal=auto_discovery="true"
When auto-discovery is enabled, Causely queries system.databases (excluding system, INFORMATION_SCHEMA, and information_schema) and starts a scraper for each discovered database. Discovery runs periodically to pick up newly created databases.
For external or VM-hosted ClickHouse, use a stable DNS hostname whenever possible. Raw IPs may connect successfully but can break service and entity resolution in Causely. The host value should align with how Causely discovers infrastructure (for example, the FQDN from the Kubernetes API or your cloud provider's API).
If connectivity must use an IP address, set host to the working IP and set host_overwrite to the stable DNS hostname that identifies that ClickHouse instance:
--from-literal=host="10.202.225.23"
--from-literal=host_overwrite="clickhouse.internal.company.net"
If you are connecting through a proxy, set host to the proxy address and host_overwrite to the actual ClickHouse instance hostname:
--from-literal=host="my-proxy.example.com"
--from-literal=host_overwrite="my-clickhouse.example.com"
Secret field reference
| Field | Required | Default | Description |
|---|---|---|---|
| host | Yes | | Hostname or IP used to connect to ClickHouse. Prefer a stable DNS hostname for normal setups; if you must use an IP or proxy here, set host_overwrite to the canonical hostname for topology and entity resolution. |
| username | Yes | | ClickHouse user name |
| password | Yes | | ClickHouse user password |
| database | Yes* | | Single database to monitor |
| databases | Yes* | | Comma-separated list of databases to monitor |
| port | No | 9000 (native) / 8123 (HTTP) | ClickHouse port |
| protocol | No | native | Connection protocol: native or http |
| secure | No | false | Enable TLS (true or false) |
| host_overwrite | No | | Override the host used for topology and entity resolution. Use when the connection host is an IP or proxy address, but ClickHouse should be identified by a stable DNS hostname. |
| port_overwrite | No | | Override the port used for entity resolution |
| auto_discovery | No | false | Automatically discover all databases |
*Either database or databases must be set, unless auto_discovery is enabled.
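Field names in the secret are easy to typo, so it can be worth inspecting what a created secret actually contains. A sketch with kubectl, assuming the secret name used above:

```shell
# Show which keys the secret contains
kubectl --namespace causely get secret clickhouse-credentials \
  -o jsonpath='{.data}' | tr ',' '\n'

# Decode a single field, for example host
kubectl --namespace causely get secret clickhouse-credentials \
  -o jsonpath='{.data.host}' | base64 --decode
```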
Step 3: Update Causely Configuration
Once the secret is created, update the Causely configuration to enable scraping for the new instance:
scrapers:
clickhouse:
enabled: true
instances:
- secretName: clickhouse-credentials
namespace: causely
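Since instances is a list, monitoring more than one ClickHouse deployment is a matter of creating one secret per instance and referencing each. A sketch, assuming a second secret named clickhouse-credentials-multidb exists in the same namespace:

```yaml
scrapers:
  clickhouse:
    enabled: true
    instances:
      - secretName: clickhouse-credentials
        namespace: causely
      - secretName: clickhouse-credentials-multidb
        namespace: causely
```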
Alternative: Enable Credentials Autodiscovery
Causely also supports credentials autodiscovery, which lets you add new scraping targets without modifying the Causely configuration. Label the Kubernetes secret to enable autodiscovery:
kubectl --namespace causely label secret clickhouse-credentials "causely.ai/scraper=ClickHouse"
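You can confirm the label was applied, and list every secret autodiscovery will pick up, with a label selector:

```shell
# Lists all secrets in the causely namespace tagged for the ClickHouse scraper
kubectl --namespace causely get secrets -l "causely.ai/scraper=ClickHouse"
```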
Verify Your Configuration
After completing the setup, run these queries against your ClickHouse instance to verify that the Causely user has the required access.
Troubleshooting
Use these symptoms to narrow down configuration issues:
- If the UI only shows a scraper path ending in /_discovery, database auto-discovery is running but no per-database scrapers were created (often because no application databases were found, or discovery could not proceed as expected).
- If the UI shows per-database scrapers with a message like "configured database has no eligible tables", the user can see the configured databases, but those databases contain no tables eligible for the current scraper logic (see Eligible table types below).
- If logs show "failed to resolve hostname" or service mapping errors, set host_overwrite to a stable DNS hostname and avoid relying on a raw IP alone for identity.
Quick Access Check
SELECT
(SELECT count() FROM system.tables LIMIT 1) > 0 AS tables_ok,
(SELECT count() FROM system.columns LIMIT 1) > 0 AS columns_ok,
(SELECT count() FROM system.databases LIMIT 1) > 0 AS databases_ok,
(SELECT count() FROM system.query_log LIMIT 1) >= 0 AS query_log_ok,
(SELECT count() FROM system.mutations LIMIT 1) >= 0 AS mutations_ok;
All columns should return 1 (true).
Detailed Checks
1. System tables access
-- Each of these should return a result without error
SELECT 1 FROM system.tables LIMIT 1;
SELECT 1 FROM system.columns LIMIT 1;
SELECT 1 FROM system.databases LIMIT 1;
SELECT 1 FROM system.query_log LIMIT 1;
SELECT 1 FROM system.mutations LIMIT 1;
If any query fails with an access denied error, grant the missing privilege to your Causely user:
-- Run as admin
GRANT SELECT ON system.<table_name> TO causely_user;
2. Test slow query collection
Run this query to confirm Causely can collect slow query data:
SELECT
normalized_query_hash,
count() AS calls,
sum(query_duration_ms) AS total_exec_time_ms
FROM system.query_log
WHERE type = 'QueryFinish'
AND event_time >= now() - toIntervalHour(12)
GROUP BY normalized_query_hash
ORDER BY total_exec_time_ms DESC
LIMIT 5;
This should return results without error. An empty result set is normal on a freshly configured instance—entries will appear as queries run.
3. Database discovery
Confirm application databases are visible to the Causely user (same filter Causely uses for auto-discovery):
SELECT name
FROM system.databases
WHERE name NOT IN ('system', 'INFORMATION_SCHEMA', 'information_schema')
ORDER BY name;
If this returns no application databases, auto-discovery will only create the _discovery scraper and no per-database scrapers.
4. Table visibility
Confirm the Causely user can see non-system tables:
SELECT database, name, engine
FROM system.tables
ORDER BY database, name
LIMIT 100;
This result set must include non-system tables. If it only returns system.* tables, Causely will not discover application entities.
5. Eligible table types
Replace <configured_db> with a target database from your configuration (or one returned by the Database discovery query above). Causely only creates table entities for engines it treats as eligible:
SELECT database, name, engine
FROM system.tables
WHERE database = '<configured_db>'
AND engine NOT IN ('View', 'MaterializedView', 'Dictionary')
ORDER BY name;
The current scraper excludes View, MaterializedView, and Dictionary. If this query returns zero rows, Causely reports that the configured database has no eligible tables and creates no ClickHouse table entities for that database.
Your ClickHouse instance is correctly configured for Causely when all of the following are true:
- System table checks pass (system.tables, system.columns, system.databases, system.query_log, system.mutations).
- Application databases appear in system.databases for the Causely user (see Database discovery).
- Application (non-system) tables appear in system.tables for the Causely user (see Table visibility).
- At least one configured database contains at least one eligible table after excluding View, MaterializedView, and Dictionary (see Eligible table types).
Setup Checklist
- causely_user created in ClickHouse
- SELECT granted on system.tables, system.columns, system.databases, system.query_log, system.mutations
- SHOW DATABASES granted (for example GRANT SHOW DATABASES ON *.*)
- SELECT granted on the application databases or tables Causely should discover (broad *.* or scoped per database)
- Kubernetes secret created with correct host, username, password, database/databases, protocol, and port
- If using an IP address or proxy for connectivity, host_overwrite is set to a stable DNS hostname for topology and entity resolution
- Causely configuration updated (or secret labeled for autodiscovery)
- Verification queries succeed without access errors
- system.tables returns non-system tables for the Causely user
- At least one configured database contains eligible tables after excluding View, MaterializedView, and Dictionary
What Data is Collected
The ClickHouse scraper collects comprehensive metadata and performance information from your ClickHouse databases, including:
- Database entities with names and relationships to hosting services
- Service-to-database mappings (which service provides which database)
- Connection details including host, port, and protocol configuration
- Table information for eligible tables discovered from system.tables (names, row counts, and sizes). The current scraper excludes View, MaterializedView, and Dictionary engines.
- Complete table schemas for those eligible tables from system.columns, including column definitions, data types, default expressions, and comments
- Slow query analysis using system.query_log: top 10 queries by total execution time over a rolling 12-hour window, including call counts, total and average execution time, and rows read
- Mutation lock monitoring using system.mutations: active mutations are tracked as exclusive locks to detect contention on tables