
Grafana

Causely can use Grafana as an observability backend, with Grafana Alloy as the telemetry collection pipeline and Grafana Beyla providing eBPF-based auto-instrumentation.

Note: If your logs are centralized in Grafana Loki, Causely can retrieve and display them directly.
See the Loki Integration section below for configuration details and benefits.

To forward metrics and traces from Alloy to the Causely mediator, use the following values for your Helm install:

Install with Causely

helm upgrade --install alloy grafana/alloy --create-namespace --namespace monitoring --values ./alloy-values.yaml
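The command above assumes the grafana chart repository is already available to Helm. If it is not, add it first:

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update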

alloy-values.yaml

controller:
  type: deployment
alloy:
  stabilityLevel: experimental
  extraPorts:
    - name: 'grpc'
      port: 4317
      targetPort: 4317
    - name: 'http'
      port: 4318
      targetPort: 4318
    - name: 'datadog'
      port: 8126
      targetPort: 8126
  configMap:
    content: |
      otelcol.exporter.otlp "causely" {
        client {
          endpoint = "mediator.causely:4317"
          tls {
            insecure = true
          }
        }
      }

      otelcol.processor.batch "default" {
        output {
          metrics = [otelcol.exporter.otlp.causely.input]
          traces = [otelcol.exporter.otlp.causely.input]
        }
      }

      otelcol.processor.k8sattributes "default" {
        extract {
          label {
            from = "pod"
          }

          metadata = [
            "k8s.namespace.name",
            "k8s.pod.name",
            "k8s.pod.uid",
            "k8s.deployment.name",
            "k8s.node.name",
            "k8s.pod.start_time",
            "container.id",
          ]
        }

        output {
          traces = [otelcol.processor.batch.default.input]
        }
      }

      otelcol.processor.deltatocumulative "default" {
        output {
          metrics = [otelcol.processor.batch.default.input]
        }
      }

      otelcol.receiver.datadog "default" {
        endpoint = "0.0.0.0:8126"
        output {
          metrics = [otelcol.processor.deltatocumulative.default.input]
          traces = [otelcol.processor.batch.default.input]
        }
      }

      otelcol.receiver.otlp "otlp" {
        grpc {
          endpoint = "0.0.0.0:4317"
        }
        http {
          endpoint = "0.0.0.0:4318"
        }

        output {
          metrics = [otelcol.processor.batch.default.input]
          traces = [otelcol.processor.k8sattributes.default.input]
        }
      }
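With this configuration, Alloy listens for OTLP traffic on ports 4317 (gRPC) and 4318 (HTTP) and forwards it to the Causely mediator. As a minimal sketch, assuming the chart's Service is named alloy in the monitoring namespace (following the release name and namespace used above), an instrumented application can be pointed at the collector via the standard OpenTelemetry environment variable:

# Assumes the Alloy Service is reachable at alloy.monitoring inside the cluster
export OTEL_EXPORTER_OTLP_ENDPOINT="http://alloy.monitoring:4317"   # gRPC receiver
# or, for the HTTP receiver:
# export OTEL_EXPORTER_OTLP_ENDPOINT="http://alloy.monitoring:4318"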

Install with Causely and Grafana Cloud

To send logs, metrics, and traces to Grafana Cloud in addition to Causely, use the following values for your Helm install. Replace the GRAFANA_CLOUD_INSTANCE_ID and GRAFANA_CLOUD_API_KEY placeholders with your Grafana Cloud credentials, and adjust the otlp-gateway endpoint to match your Grafana Cloud region.

helm upgrade --install alloy grafana/alloy --create-namespace --namespace monitoring --values ./grafana-alloy-values.yaml
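One quick (if unhardened) way to fill in those placeholders before installing is a plain substitution into a rendered copy; the instance ID and token below are only illustrative:

sed -e 's/GRAFANA_CLOUD_INSTANCE_ID/123456/' \
    -e 's/GRAFANA_CLOUD_API_KEY/glc_your_token_here/' \
    grafana-alloy-values.yaml > grafana-alloy-values.rendered.yaml

Pass the rendered file to --values instead of the template.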

grafana-alloy-values.yaml

controller:
  hostPID: true
alloy:
  securityContext:
    privileged: true
  stabilityLevel: experimental
  extraPorts:
    - name: 'grpc'
      port: 4317
      targetPort: 4317
    - name: 'http'
      port: 4318
      targetPort: 4318
    - name: 'datadog'
      port: 8126
      targetPort: 8126
  configMap:
    content: |
      otelcol.exporter.otlp "causely" {
        client {
          endpoint = "mediator.causely:4317"
          tls {
            insecure = true
          }
        }
      }

      otelcol.exporter.otlphttp "grafana" {
        client {
          endpoint = "https://otlp-gateway-prod-us-east-0.grafana.net/otlp"
          auth = otelcol.auth.basic.grafana.handler
        }
      }

      otelcol.auth.basic "grafana" {
        username = "GRAFANA_CLOUD_INSTANCE_ID"
        password = "GRAFANA_CLOUD_API_KEY"
      }

      // discovery.kubernetes allows you to find scrape targets from Kubernetes resources.
      // It watches cluster state and ensures targets are continually synced with what is currently running in your cluster.
      discovery.kubernetes "pod_logs" {
        role = "pod"
      }

      // discovery.relabel rewrites the label set of the input targets by applying one or more relabeling rules.
      // If no rules are defined, then the input targets are exported as-is.
      discovery.relabel "pod_logs" {
        targets = discovery.kubernetes.pod_logs.targets

        // Label creation - "namespace" field from "__meta_kubernetes_namespace"
        rule {
          source_labels = ["__meta_kubernetes_namespace"]
          action = "replace"
          target_label = "namespace"
        }

        // Label creation - "pod" field from "__meta_kubernetes_pod_name"
        rule {
          source_labels = ["__meta_kubernetes_pod_name"]
          action = "replace"
          target_label = "pod"
        }

        // Label creation - "container" field from "__meta_kubernetes_pod_container_name"
        rule {
          source_labels = ["__meta_kubernetes_pod_container_name"]
          action = "replace"
          target_label = "container"
        }

        // Label creation - "app" field from "__meta_kubernetes_pod_label_app_kubernetes_io_name"
        rule {
          source_labels = ["__meta_kubernetes_pod_label_app_kubernetes_io_name"]
          action = "replace"
          target_label = "app"
        }

        // Label creation - "job" field from "__meta_kubernetes_namespace" and "__meta_kubernetes_pod_container_name"
        // Concatenate values __meta_kubernetes_namespace/__meta_kubernetes_pod_container_name
        rule {
          source_labels = ["__meta_kubernetes_namespace", "__meta_kubernetes_pod_container_name"]
          action = "replace"
          target_label = "job"
          separator = "/"
          replacement = "$1"
        }

        // Label creation - "__path__" field from "__meta_kubernetes_pod_uid" and "__meta_kubernetes_pod_container_name"
        // Concatenate values __meta_kubernetes_pod_uid/__meta_kubernetes_pod_container_name.log
        rule {
          source_labels = ["__meta_kubernetes_pod_uid", "__meta_kubernetes_pod_container_name"]
          action = "replace"
          target_label = "__path__"
          separator = "/"
          replacement = "/var/log/pods/*$1/*.log"
        }

        // Label creation - "container_runtime" field from "__meta_kubernetes_pod_container_id"
        rule {
          source_labels = ["__meta_kubernetes_pod_container_id"]
          action = "replace"
          target_label = "container_runtime"
          regex = "^(\\S+):\\/\\/.+$"
          replacement = "$1"
        }
      }

      // loki.source.kubernetes tails logs from Kubernetes containers using the Kubernetes API.
      loki.source.kubernetes "pod_logs" {
        targets = discovery.relabel.pod_logs.output
        forward_to = [otelcol.receiver.loki.default.receiver]
      }

      otelcol.receiver.loki "default" {
        output {
          logs = [otelcol.processor.batch.default.input]
        }
      }

      beyla.ebpf "default" {
        attributes {
          kubernetes {
            enable = "true"
          }
          select {
            attr = "sql_client_duration"
            exclude = []
            include = ["db.query.text"]
          }
        }
        discovery {
          services {
            open_ports = "80,443,3000,8000-8999"
          }
        }
        output {
          traces = [otelcol.processor.batch.default.input]
        }
      }

      otelcol.processor.batch "default" {
        output {
          logs = [otelcol.exporter.otlphttp.grafana.input]
          metrics = [otelcol.exporter.otlphttp.grafana.input, otelcol.exporter.otlp.causely.input]
          traces = [otelcol.exporter.otlphttp.grafana.input, otelcol.exporter.otlp.causely.input]
        }
      }

      otelcol.processor.k8sattributes "default" {
        extract {
          label {
            from = "pod"
          }

          metadata = [
            "k8s.namespace.name",
            "k8s.pod.name",
            "k8s.pod.uid",
            "k8s.deployment.name",
            "k8s.node.name",
            "k8s.pod.start_time",
            "container.id",
          ]
        }

        output {
          traces = [otelcol.processor.batch.default.input]
        }
      }

      otelcol.processor.deltatocumulative "default" {
        output {
          metrics = [otelcol.processor.batch.default.input]
        }
      }

      otelcol.receiver.datadog "default" {
        endpoint = "0.0.0.0:8126"
        output {
          metrics = [otelcol.processor.deltatocumulative.default.input]
          traces = [otelcol.processor.batch.default.input]
        }
      }

      otelcol.receiver.otlp "otlp" {
        grpc {
          endpoint = "0.0.0.0:4317"
        }
        http {
          endpoint = "0.0.0.0:4318"
        }

        output {
          logs = [otelcol.processor.batch.default.input]
          metrics = [otelcol.processor.batch.default.input]
          traces = [otelcol.processor.k8sattributes.default.input]
        }
      }
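After applying the values, it is worth confirming that the Alloy pods came up and that the configuration loaded without errors. A quick sanity check (pod names will differ in your cluster) looks like:

kubectl get pods -n monitoring
kubectl logs -n monitoring <alloy-pod-name> | grep -i error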

Loki Integration

Causely integrates with Loki to retrieve logs already centralized in your Grafana environment.
When configured, Causely automatically surfaces relevant logs in the context of active root causes and service malfunctions.
This enables rapid validation of issues and faster time to understanding and resolution.

When to Use

Use this integration when:

  • Your logs are shipped to Loki rather than being read through the Kubernetes API.
  • You want Causely to display relevant log lines and exceptions alongside detected service degradations or root causes.
  • You prefer a centralized, scalable log pipeline already managed through Grafana Cloud or self-hosted Loki.

Benefits

  • Contextual Insight: Automatically surfaces logs correlated with active root causes or degraded services.
  • Accelerated RCA: Shows container log lines, stack traces, and error spikes precisely around the time of failure.
  • Unified View: Displays Loki logs directly within Causely, alongside metrics and traces for the same service.
  • Operational Efficiency: Reduces reliance on the Kubernetes API for log collection.

Configuration

Configure Causely with your Loki endpoint:

scrapers:
  kubernetes:
    loki_endpoint: "http://loki.monitoring:3100"

Once the endpoint is set, Causely automatically uses Loki as the preferred external log source.
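Before relying on the integration, you may want to confirm that the Loki endpoint is reachable from inside the cluster. A minimal check, assuming Loki is served at loki.monitoring:3100 as in the example above, is to hit Loki's readiness endpoint from a throwaway pod:

kubectl run loki-check --rm -it --restart=Never --image=curlimages/curl -- curl -s http://loki.monitoring:3100/ready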

Supported Log Types

  • Container logs (stdout/stderr)

Result

When Causely detects a service malfunction or identifies a root cause, it automatically retrieves related logs from Loki.
These logs appear:

  • Under affected services, when they exhibit abnormal behavior such as elevated errors or latency.
  • Alongside root causes, showing relevant exceptions or stack traces at the time of failure.

This provides clear evidence for what went wrong and why—dramatically shortening investigation and resolution times.