Difficulty scaling Laravel Horizon across multiple instances (ECS / Auto Scaling)
Iagoss
DevOps Engineer · 2024-01-19
Hello everyone, I'm deploying a Laravel application in a containerized architecture (AWS ECS), where I've split services into three separate tasks: Web (HTTP requests), Horizon (job processing), and Scheduling (cron tasks). Each service has its own auto scaling policy, so new instances can be created dynamically as demand increases. The issue I'm facing is that when trying to scale H...
Distributed Horizon deployments require careful coordination because the supervisor processes must share visibility into job payloads, failures, and metrics without duplicating work. In ECS, auto scaling creates and destroys containers rapidly, which can reset Horizon state or cause supervisor count mismatches. This post presents production-tested configurations and operational checks.
Architecture Overview
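Separating web, Horizon, and scheduler tasks is sound, but each service needs its own scaling signal. Web tasks scale on request latency and CPU. Horizon tasks scale on queue depth and job throughput. Scheduler tasks usually stay at one instance, because overlapping cron workers can run the same command twice; if you run more than one, mark scheduled commands with onOneServer() so that a shared cache lock lets only one instance execute each run. Use distinct ECS service names and security groups per service. Horizon worker tasks do not listen on a port of their own: they only need outbound access to Redis, and the Horizon dashboard is served over HTTP by the Laravel application itself.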
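The duplicate-command risk for scheduler tasks comes down to whether instances share a lock. The sketch below models that idea in Python with an in-memory store standing in for a shared cache. The names SharedLockStore and should_run are hypothetical and are not part of Laravel; the key format is an assumption chosen for illustration.

```python
from datetime import datetime


class SharedLockStore:
    """In-memory stand-in for a shared cache such as Redis (assumption)."""

    def __init__(self):
        self._keys = set()

    def add_if_absent(self, key: str) -> bool:
        # Mirrors an atomic "set if not exists": only the first caller wins.
        if key in self._keys:
            return False
        self._keys.add(key)
        return True


def should_run(store: SharedLockStore, command: str, now: datetime) -> bool:
    """Return True for exactly one caller per command per scheduled minute."""
    key = f"schedule:{command}:{now:%Y%m%d%H%M}"
    return store.add_if_absent(key)


store = SharedLockStore()
tick = datetime(2024, 1, 19, 3, 0)
# Two scheduler instances evaluate the same command in the same minute.
results = [should_run(store, "reports:send", tick) for _ in range(2)]
print(results)  # [True, False]
```

With a real shared cache, the second scheduler instance would skip the command in the same way, which is why the lock must live outside the containers.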
Balancing and Visibility
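Horizon uses Redis to store metrics, job metadata, and supervisor state. When multiple Horizon instances scale out, they all read and write the same Redis keys, so every instance must use the same Redis connection and the same Horizon prefix. Inside each instance, set the supervisor's balance option to auto and its autoScalingStrategy to time, so that worker processes are allocated by estimated queue wait time rather than raw job count. This balancing happens within a container and is separate from ECS adding or removing tasks. To confirm that supervisor counts update in near-real time after ECS scales tasks, watch the dashboard or the keys under the prefix configured in config/horizon.php.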
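To make the difference between latency-based and count-based balancing concrete, here is a simplified Python model that splits a fixed worker budget across queues in proportion to estimated wait time. It is a sketch of the idea only, not Horizon's actual algorithm; allocate_workers and its parameters are hypothetical names.

```python
from __future__ import annotations


def allocate_workers(wait_seconds: dict[str, float], min_per_queue: int, max_total: int) -> dict[str, int]:
    """Split a worker budget across queues in proportion to estimated wait time."""
    queues = list(wait_seconds)
    # Every queue keeps its minimum so that no queue is starved.
    allocation = {q: min_per_queue for q in queues}
    spare = max_total - min_per_queue * len(queues)
    total_wait = sum(wait_seconds.values())
    if spare <= 0 or total_wait <= 0:
        return allocation
    # Hand out spare workers by share of total wait, rounding down.
    for q in queues:
        allocation[q] += int(spare * wait_seconds[q] / total_wait)
    # Give any remainder from rounding to the queue with the longest wait.
    leftover = max_total - sum(allocation.values())
    allocation[max(queues, key=lambda q: wait_seconds[q])] += leftover
    return allocation


print(allocate_workers({"default": 30.0, "emails": 90.0}, min_per_queue=1, max_total=10))
# {'default': 3, 'emails': 7}
```

A queue with few but slow jobs can have a long wait time despite a small message count, which is the case that count-based balancing misses.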
Failures and Retention
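Auto scaling can terminate Horizon instances mid-job. ECS sends SIGTERM before it stops a container, so give the task a stop timeout long enough for in-flight jobs to finish. Make jobs idempotent, and keep each job's timeout shorter than the queue connection's retry_after value so that an interrupted job is released back to the queue and picked up by another worker without running twice in parallel. Retry failed jobs with exponential backoff and review Horizon's failed jobs tab regularly. Pair these checks with broader observability strategies covered in Analytics dashboard to aggregate error rates across your entire estate.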
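The two safeguards named above, exponential backoff and idempotency, can be sketched in a few lines of Python. This is an illustration under assumptions: backoff_delays and IdempotentHandler are hypothetical names, and the in-memory set stands in for a durable store such as a database table or Redis key.

```python
def backoff_delays(base_seconds: int, attempts: int, cap_seconds: int) -> list:
    """Delay before each retry: base, 2x base, 4x base, ... up to a cap."""
    return [min(base_seconds * 2 ** n, cap_seconds) for n in range(attempts)]


class IdempotentHandler:
    """Runs the side effect at most once per idempotency key."""

    def __init__(self):
        self.completed = set()   # stand-in for a durable store (assumption)
        self.side_effects = []

    def handle(self, key: str, payload: str) -> bool:
        if key in self.completed:
            return False         # job was delivered again: skip the side effect
        self.side_effects.append(payload)
        self.completed.add(key)
        return True


print(backoff_delays(base_seconds=5, attempts=5, cap_seconds=60))  # [5, 10, 20, 40, 60]

handler = IdempotentHandler()
handler.handle("invoice-42", "charge card")
handler.handle("invoice-42", "charge card")  # same job delivered a second time
print(handler.side_effects)  # ['charge card']
```

In a real job the completion marker and the side effect should be committed together where possible, otherwise a container killed between the two steps can still cause a repeat.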
Persistent Runtime Considerations
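If you serve web requests with Laravel Octane instead of PHP-FPM, the per-request framework boot cost drops, but state that persists between requests changes the scaling dynamics of the web tasks. Horizon workers are already long-lived processes, so similar concerns about memory growth and reloads apply to them. Review Laravel Octane benchmark comparing Swoole, OpenSwoole, RoadRunner, FrankenPHP to understand how worker lifetimes and reload behavior affect Horizon deployment decisions.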
Conclusion
ECS auto scaling works with Horizon if you tune queue balancing, idempotency, and observability together. Treat each task family independently, and rely on Redis as the single source of truth for shared job state.
Related Posts
- Laravel Octane benchmark comparing Swoole, OpenSwoole, RoadRunner, FrankenPHP
- Analytics dashboard
These related posts cover runtime performance and dashboard monitoring.
Horizon's State Management in Distributed Systems
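Horizon stores its state (queue metrics, job metadata, worker and supervisor status) in Redis. Splitting Horizon into its own ECS task does not make that task the only writer: web and scheduler tasks push jobs onto the same Redis queues, and Horizon tasks reserve and process them. The challenge emerges when you scale the Horizon task horizontally, because every instance starts its own supervisors against the same queues. Laravel's Redis queue driver reserves each job atomically, so two instances should not pick up the same job at the same moment. Duplicate processing typically comes from jobs that run longer than retry_after or from containers killed mid-job, and metrics can drift when instances run with different Horizon prefixes or configurations.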
Horizon can run on several machines at once, with each instance registering its own master supervisor in Redis. It is still common to run a single Horizon task and let it scale worker processes internally, rather than scaling the number of Horizon tasks. Alternatively, run separate Horizon services that consume distinct queues so they do not overlap. If you do scale Horizon tasks with ECS service auto scaling, drive it from queue depth or wait time in Redis, published as a custom CloudWatch metric, not from CPU alone. ApproximateNumberOfMessagesVisible is an SQS metric, so it applies only when the queue lives in SQS, where Horizon is not used.
Monitoring and Observability
Instrument your setup with Laravel Telescope or a dedicated APM. Track queue latency, throughput, and failure rates per queue. CloudWatch can chart ECS metrics in dashboards, but add application-level metrics for real queue health. Jobs that exhaust their retries land in Laravel's failed jobs store, which plays the role of a dead-letter queue: problematic jobs are isolated without blocking the main pipeline.
See Laravel Octane benchmark comparing Swoole, OpenSwoole, RoadRunner, FrankenPHP for how runtime engines affect worker architecture, and Yajra datatables questions for data-handling patterns relevant to dashboarding job metrics.
Queue Driver Selection for ECS
Horizon works exclusively with Redis. In ECS, deploy Redis via Amazon ElastiCache or a self-managed container. For high availability, use replicas with automatic failover. Cluster mode adds sharding on top of that, and with a clustered Redis queue connection Laravel requires queue names to contain a hash tag, for example {default}. Ensure security groups allow traffic only from your ECS tasks. If you use SQS as the underlying queue driver, Horizon is not applicable; you would use SQS directly with Laravel's queue system. This is simpler for ECS but loses Horizon's metrics and UI.
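If Redis does run in cluster mode, every key belonging to one queue has to land in the same hash slot, which is what a hash tag in the queue name achieves. The Python sketch below computes slots the way the Redis Cluster specification describes (CRC16/XMODEM of the key, or of the hash tag if present, modulo 16384). The key names are modeled on those Laravel's Redis queue driver uses and are assumptions for illustration; key_slot is a hypothetical helper.

```python
import binascii


def key_slot(key: str) -> int:
    """Redis Cluster slot: CRC16 (XMODEM) of the key or its hash tag, mod 16384."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag between the first "{" and the next "}" counts.
        if end > start + 1:
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % 16384


tagged = [key_slot(k) for k in (
    "queues:{default}", "queues:{default}:reserved", "queues:{default}:delayed")]
untagged = [key_slot(k) for k in (
    "queues:default", "queues:default:reserved", "queues:default:delayed")]

print(len(set(tagged)))    # 1: every tagged key hashes only "default"
print(len(set(untagged)))  # number of distinct slots without a hash tag
```

Without the tag, the keys for a single queue can hash to different slots, and operations that touch several of those keys at once are rejected by a cluster.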
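If you need Horizon's UI and metrics with ECS, note that the dashboard is served over HTTP by the Laravel application, so it belongs behind the Application Load Balancer with the web tasks; Horizon worker tasks do not need a load balancer. Scale the Horizon service on queue depth rather than CPU. Use environment variables to select the Redis connection Horizon uses (the use option in config/horizon.php). Schedule the horizon:snapshot command so the metrics graphs are populated, and export those values to CloudWatch or Datadog for long-term retention and alerting.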
Autoscaling Policies
Base autoscaling on business metrics, not just infrastructure metrics. For web tasks, scale on request count or latency. For Horizon, scale on queue depth: publish the Redis queue length or wait time as a custom CloudWatch metric, and if it exceeds a threshold for more than N minutes, add another worker task. (ApproximateNumberOfMessagesVisible serves the same purpose for SQS-backed queues, which do not use Horizon.) The scheduler should not scale with load: keep its task count fixed, and if scheduled jobs consistently exhaust the container's CPU or memory, split the cron jobs into separate task definitions.
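The queue-depth rule can be turned into a target task count with simple arithmetic: divide the backlog by what one task can process within the time you are willing to wait, then clamp to the service limits. The Python sketch below shows that calculation; desired_tasks, the throughput figure, and the limits are illustrative assumptions, not measured values.

```python
import math


def desired_tasks(backlog: int, jobs_per_task_per_minute: float, target_drain_minutes: float,
                  min_tasks: int, max_tasks: int) -> int:
    """Tasks needed to drain the backlog within the target time, clamped to limits."""
    capacity_per_task = jobs_per_task_per_minute * target_drain_minutes
    needed = math.ceil(backlog / capacity_per_task)
    return max(min_tasks, min(needed, max_tasks))


# 4500 queued jobs, 300 jobs per task per minute, drain within 5 minutes.
print(desired_tasks(backlog=4500, jobs_per_task_per_minute=300, target_drain_minutes=5,
                    min_tasks=1, max_tasks=10))  # 3
```

Measuring jobs per task per minute from real traffic matters more than the formula itself, since throughput varies widely between job types.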
See Laravel Octane benchmark comparing Swoole, OpenSwoole, RoadRunner, FrankenPHP for runtime engine decisions that affect scaling, and Printer selections from laravel for offloading work to external systems.