Kubernetes Resource Optimisation and Performance Stabilisation for Odoo ERP

Industry

ERP

Technology

Odoo, Kubernetes, PostgreSQL, Redis, Prometheus, Grafana

Overview

A leading enterprise organisation running a high-usage Odoo ERP system on Kubernetes needed to resolve recurring Exit Code 137 (OOMKilled) pod terminations that were disrupting accounting, inventory, and reporting operations. The platform served over 200 concurrent users during peak hours, with additional load from automated cron jobs and scheduled reporting modules running in parallel with live user sessions. The instability stemmed from memory limits that did not reflect Odoo's actual workload profile, worker process counts misaligned with node capacity, and a complete absence of proactive monitoring. Every OOMKilled event was discovered only after a pod had been terminated and users were already experiencing service interruptions.

Ksolves, an AI-first company, designed and implemented a six-component Kubernetes and Odoo optimisation strategy that resolved the instability through configuration precision rather than a disruptive architectural overhaul. The solution reduced OOMKilled events to near zero, cut response latency from between 5 and 8 seconds to under 2 seconds, and restored the platform to 99.9% system availability.

Key Challenges

The client faced the following challenges:

Frequent Pod Restarts Causing ERP Service Interruptions: Odoo containers were repeatedly killed by the kernel OOM killer once they exceeded their container memory limits, surfacing in Kubernetes as Exit Code 137, interrupting active user sessions and, in some cases, corrupting in-progress transactions across accounting, inventory, and reporting functions.
Memory Limits Misconfigured for Odoo's Actual Workload Profile: Kubernetes memory limits had been set without baselining real workload consumption, making OOMKilled events structurally inevitable regardless of user behaviour.
Odoo Worker Processes Not Aligned with Node Memory Capacity: Worker process counts were configured independently of available node memory, so aggregate worker consumption under concurrent load exceeded both pod limits and node capacity.
Sudden Memory Spikes from Reporting, Cron Jobs, and Large Queries: Heavy reporting executions, large PostgreSQL result sets, and scheduled cron jobs triggered unpredictable memory spikes with no mechanism in place to smooth or defer peak demand.
No Proactive Monitoring or Early Warning System: The organisation had no observability layer to track memory trends, pod restart frequency, or node pressure, so every failure was discovered reactively, after users had already been affected.
Node-Level Resource Contention Amplifying OOM Risk: Cluster nodes were provisioned with insufficient RAM relative to co-located workload demands, amplifying the impact of any single pod's memory spike.
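
As a small illustration of the Exit Code 137 symptom described above: container runtimes follow the shell convention of reporting 128 plus the signal number when a process is killed by a signal, so 137 corresponds to signal 9 (SIGKILL). The sketch below decodes such codes on a POSIX system. The function name is hypothetical, and the code alone does not prove an out-of-memory kill; Kubernetes reports the OOMKilled reason separately in the container status.

```python
import signal


def decode_exit_code(code: int) -> str:
    """Describe a container exit code using the 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            # Map the signal number to its symbolic name, e.g. 9 -> SIGKILL.
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"terminated by {name}"
    if code == 0:
        return "exited normally"
    return f"exited with error status {code}"


print(decode_exit_code(137))
```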

Our Solution

Ksolves implemented a structured, six-component optimisation strategy spanning Kubernetes configuration, Odoo application tuning, infrastructure scaling, and observability, governed by a single principle: configuration precision over architectural complexity.

Kubernetes Memory Requests and Limits Reconfiguration: Baselined actual Odoo memory consumption and reset resource specifications to memory requests of 1Gi and limits of 2Gi per pod, directly addressing the structural mismatch behind most OOMKilled events.
Odoo Worker Memory Tuning: Reconfigured worker count to 4, with a soft memory limit of 640MB and a hard limit of 768MB per worker, allowing Odoo's process manager to recycle heavy workers gracefully before the pod's memory limit triggered an OOM kill.
Horizontal Pod Autoscaling for Peak Load Resilience: Implemented HPA to trigger additional pod creation automatically when memory utilisation crossed defined thresholds, distributing load during peak user, reporting, and cron windows.
Prometheus and Grafana Observability Integration: Deployed real-time metric collection across pods and nodes, with threshold-based Grafana alerting that feeds back into HPA triggers and on-call engineers, shifting incident management from reactive to proactive.
Node Infrastructure Upgrade to High-RAM Instances: Migrated cluster nodes to higher-capacity RAM instances, reducing inter-pod contention and providing the headroom required for HPA scale-out events to operate cleanly.
Redis Caching, Query Optimisation, and Cron Rescheduling: Introduced Redis as an in-memory caching layer, optimised heavy PostgreSQL queries, and rescheduled cron jobs to off-peak windows to flatten the overall memory demand profile.
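
The worker and pod figures above can be turned into configuration values with a short sketch. It assumes that the quoted "MB" figures mean MiB and that the values map to Odoo's `workers`, `limit_memory_soft`, and `limit_memory_hard` options, which take byte counts. The helper names are hypothetical and are not taken from the engagement.

```python
MIB = 1024 * 1024


def odoo_memory_settings(workers: int, soft_mib: int, hard_mib: int) -> dict:
    """Return odoo.conf values; Odoo expects the memory limits in bytes."""
    if soft_mib >= hard_mib:
        raise ValueError("soft limit must be below the hard limit")
    return {
        "workers": workers,
        "limit_memory_soft": soft_mib * MIB,
        "limit_memory_hard": hard_mib * MIB,
    }


def worst_case_headroom_mib(workers: int, hard_mib: int, pod_limit_mib: int) -> int:
    """Pod limit minus the total if every worker sat at its hard limit at once."""
    return pod_limit_mib - workers * hard_mib


# Figures from the case study: 4 workers, 640/768 per worker, 2Gi pod limit.
settings = odoo_memory_settings(workers=4, soft_mib=640, hard_mib=768)
for key, value in settings.items():
    print(f"{key} = {value}")

print("worst-case headroom (MiB):", worst_case_headroom_mib(4, 768, 2048))
```

A negative headroom means that all workers reaching the hard limit simultaneously would exceed the pod limit, so the configuration relies on workers rarely peaking together and on the soft limit recycling them early. Whether that holds is something only the baselined consumption data can confirm.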

Technology Stack

Category	Technology
ERP Platform	Odoo (Worker and Configuration Tuning)
Container Orchestration	Kubernetes (Resource Requests, Limits, Horizontal Pod Autoscaler (HPA))
Database	PostgreSQL (Query Optimisation)
Caching	Redis (In-Memory Cache Layer)
Observability	Prometheus and Grafana (Real-Time Monitoring)
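
To make the Horizontal Pod Autoscaler entry more concrete, the sketch below reproduces the scaling rule documented for Kubernetes: the desired replica count is the current count multiplied by the ratio of observed to target metric value, rounded up, with no change when the ratio is within a tolerance (0.1 by default). For a memory utilisation target, utilisation is measured relative to the pod's memory request. The 75% target and the replica counts are assumptions for illustration, not values from the engagement, and the real controller additionally clamps to minimum and maximum replicas and applies stabilisation windows.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilisation: float,
    target_utilisation: float,
    tolerance: float = 0.1,
) -> int:
    """Approximate the HPA rule: ceil(current * observed / target)."""
    ratio = current_utilisation / target_utilisation
    # Within the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Hypothetical peak: 3 pods averaging 90% of requested memory, target 75%.
print(desired_replicas(3, 90.0, 75.0))
```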

Results

Exit Code 137 (OOMKilled) Pod Restarts Near-Eliminated: Recurring memory-driven pod terminations across accounting, inventory, and reporting functions were brought down to near-zero occurrences following the resource and worker recalibration.
ERP Response Latency Improved by 60 to 70%: Response times under concurrent load fell from between 5 and 8 seconds to under 2 seconds, driven by optimised memory allocation, Redis caching, and HPA-based workload distribution.
System Availability Reached 99.9%: Correctly sized Kubernetes resources, autoscaling, self-managing Odoo workers, and proactive alerting together eliminated memory-driven downtime in production.
Infrastructure Utilisation Balanced Without Over-Provisioning: Precise memory calibration combined with HPA and Redis caching delivered stability without resorting to blanket over-provisioning, keeping infrastructure costs efficient.
Operational Visibility Shifted from Reactive to Proactive: Prometheus and Grafana integration gave operations teams real-time memory trends and restart tracking, enabling intervention before OOMKilled events occurred rather than after.
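
The proactive alerting described above amounts to comparing a container's memory usage against its limit before the limit is reached. A minimal sketch of that comparison follows. The 80% and 90% thresholds and the function name are assumptions chosen for illustration, and in practice the inputs would come from metrics scraped by Prometheus rather than being passed in by hand.

```python
MIB = 1024 * 1024


def memory_pressure(
    working_set_bytes: int,
    limit_bytes: int,
    warn_ratio: float = 0.80,
    critical_ratio: float = 0.90,
) -> str:
    """Classify memory usage relative to the container limit."""
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    ratio = working_set_bytes / limit_bytes
    if ratio >= critical_ratio:
        return "critical"
    if ratio >= warn_ratio:
        return "warning"
    return "ok"


# Example against the 2Gi pod limit from the case study.
limit = 2048 * MIB
for used_mib in (1536, 1740, 1900):
    print(used_mib, memory_pressure(used_mib * MIB, limit))
```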

Conclusion

Ksolves helped the organisation resolve one of the most disruptive failure patterns in its Kubernetes-hosted Odoo ERP deployment. By combining targeted resource recalibration with application-level tuning, autoscaling, caching, and full-stack observability, the solution restored enterprise-grade stability without requiring a redesign of the hosting architecture.

The engagement demonstrates that resolving OOMKilled failures is not a capacity problem to be solved by allocating ever more memory. It is a configuration alignment problem that requires precise coordination between Kubernetes resource scheduling, application-level memory management, workload demand shaping, and observability. Through its Odoo development services, Ksolves enables enterprises to run mission-critical ERP workloads reliably at scale.
