Kubernetes Job Metrics in Prometheus



  • I want to create a Kubernetes Job object like the one in this example: https://kubernetes.io/docs/concepts/workloads/controllers/job/

    Now imagine I want to have the result (in this case the value of Pi: 3.14159...) available in Prometheus.

    Is this possible?

    In a more complex example, imagine the output of my pod was JSON:

    {
      "foo": 200,
      "bar": 10,
      "baz": 999
    }
    

    I somehow need to denote that I'm interested in foo and baz (but not bar) "being available" in Prom. (Note: I'm not opinionated on whether the solution is push or pull.)
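    One way to "denote" the interesting fields is an explicit allow-list applied to the pod's JSON output, with the selected values rendered in the Prometheus text exposition format. This is a minimal sketch; the allow-list WANTED_KEYS, the helper names, and the metric prefix "job_result_" are hypothetical choices, and every value is exposed as a gauge.

```python
import json

# Hypothetical allow-list: expose "foo" and "baz", deliberately skip "bar".
WANTED_KEYS = ("foo", "baz")


def select_metrics(job_output: str, wanted=WANTED_KEYS) -> dict:
    """Parse the Job's JSON output and keep only the allow-listed numeric fields."""
    data = json.loads(job_output)
    return {key: float(data[key]) for key in wanted if key in data}


def to_exposition(metrics: dict, prefix: str = "job_result_") -> str:
    """Render the selected values as gauges in the Prometheus text format."""
    lines = []
    for name, value in metrics.items():
        metric = prefix + name  # hypothetical naming scheme
        lines.append(f"# TYPE {metric} gauge")
        lines.append(f"{metric} {value}")
    # Every line in the text format, including the last, ends with "\n".
    return "\n".join(lines) + "\n"
```

    Whether that text is then served for scraping or pushed somewhere is independent of the selection step, which keeps the push-vs-pull decision open.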

    Option 1

    The first option I've thought of is somehow attaching the result 3.14159 to the Job so that Prom can scrape it as normal.

    Option 2

    Don't run the Pi-generator container at all; instead, run something like a Python script that runs the Pi process. Then we can push from Python to Prom.
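    A minimal sketch of the wrapper in Option 2: the script starts the Pi process, captures its stdout, and parses the last line as a number, so the result is in Python's hands before any push happens. PI_COMMAND is an assumption that mirrors the command in the linked Kubernetes example; adjust it to whatever your container really runs. The helper name run_and_capture is hypothetical.

```python
import subprocess

# Assumption: mirrors the command from the linked Kubernetes Pi example.
PI_COMMAND = ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]


def run_and_capture(command) -> float:
    """Run the wrapped process and parse the last line of its stdout as a float.

    Raises subprocess.CalledProcessError if the process exits non-zero,
    so a failed run is never reported as a result.
    """
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    last_line = completed.stdout.strip().splitlines()[-1]
    # float() accepts the long decimal string and rounds it to double precision.
    return float(last_line)
```

    Calling run_and_capture(PI_COMMAND) returns the value as a float; note that Prometheus itself does not accept pushes, so "push to Prom" in practice means pushing to a Pushgateway, as the answer below describes.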

    Option 3

    Some other way?



  • I think what you need may be an anti-pattern, but I'm unsure.

    Metrics are used to measure things, and metrics are correlated with time.

    In your example, the job's output is a constant and is not correlated with time.

    Metrics are often (!) measurements of the health (of the state) of a system rather than the output (product) of a system.

    The job's duration, CPU, memory, success|failure etc. are conventional (!) measurements.

    While it's entirely reasonable to want to capture time-series data (from Jobs), it may be (!) that a database or some other persistence mechanism would be a better sink for your data.

    Answering your question as stated: Jobs are challenging because Prometheus's 'pull' mechanism may be unable to scrape them; they may not live long enough to be scraped.

    Batch jobs are a valid use case for pushing metrics (https://prometheus.io/docs/practices/pushing/), so your Jobs could push metrics to the Pushgateway to ensure that they are captured.
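    A minimal stdlib-only sketch of such a push, assuming a Pushgateway reachable in-cluster at the hypothetical address PUSHGATEWAY_URL (9091 is the Pushgateway's default port). The Pushgateway accepts the text exposition format at /metrics/job/<job_name>; a PUT replaces every metric in that job's group, whereas a POST replaces only metrics with the same names. The helper names are hypothetical.

```python
import urllib.parse
import urllib.request

# Assumption: in-cluster Pushgateway address; 9091 is its default port.
PUSHGATEWAY_URL = "http://pushgateway.monitoring.svc:9091"


def build_push_request(job_name: str, metrics: dict,
                       gateway: str = PUSHGATEWAY_URL) -> urllib.request.Request:
    """Build a PUT that replaces all metrics in the group for job_name."""
    lines = []
    for name, value in metrics.items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    # The text format requires every line, including the last, to end with "\n".
    body = ("\n".join(lines) + "\n").encode("utf-8")
    url = f"{gateway}/metrics/job/{urllib.parse.quote(job_name, safe='')}"
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


def push(job_name: str, metrics: dict, gateway: str = PUSHGATEWAY_URL) -> int:
    """Send the metrics to the Pushgateway and return the HTTP status code."""
    request = build_push_request(job_name, metrics, gateway)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

    For example, push("pi-job", {"pi_value": 3.14159}) would be the last step of the Job. The official Python client, prometheus_client, offers push_to_gateway for the same purpose; the hand-rolled request is shown only to make the wire format visible.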

    If the logs produced by your Jobs are persisted beyond the life of the Job, another approach is to derive metrics by parsing the logs. This approach is biased toward aggregating log data; the exemplar is counting HTTP 500s in log entries to determine a failure rate.
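    The HTTP 500 exemplar can be sketched in a few lines. This assumes access-log lines in Common Log Format, where the status code follows the quoted request; the regex and the helper name failure_rate are illustrative, and a real log pipeline would compute this continuously rather than over a fixed list.

```python
import re
from typing import Iterable

# Assumption: Common Log Format, e.g. '... "GET /index.html HTTP/1.1" 500 1043'.
# The status code is the three digits right after the closing quote of the request.
STATUS_RE = re.compile(r'"\s(\d{3})\s')


def failure_rate(lines: Iterable[str]) -> float:
    """Return the fraction of parsed requests that returned HTTP 500."""
    total = 0
    errors = 0
    for line in lines:
        match = STATUS_RE.search(line)
        if match is None:
            continue  # skip lines that don't look like access-log entries
        total += 1
        if match.group(1) == "500":
            errors += 1
    return errors / total if total else 0.0
```

    The result is an aggregate over many log entries, which is why this approach suits rates and counts rather than a single value such as Pi.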



