Async job scheduling

[Blog] Asynchronous computing @Facebook (Private)

  • Problems with a simple priority queue:
    • large use cases dominate the queue
    • bad jobs get stuck
    • uneven utilization between peaks and valleys
  • Building to scale
    • introduce delay tolerance
    • Capacity optimization:
      • classifying use cases:
        • daily traffic: predictable
        • major events: semi-predictable
        • incident response: short and spiky, unpredictable
      • Time shifting:
        • Predictive: predict which data may be needed, then precompute and cache it
        • Deferred compute
      • batching:
        • reduces the number of requests to other components
        • potentially higher cache reuse and code warm-up
    • Capacity policy: quota and rate limiting
      • limits on CPU instruction utilization and memory: when exceeded, throttle and send an alert
      • rate limit on intake
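
A minimal sketch of how delay tolerance and an intake rate limit from the notes above might fit together. It is illustrative only and not the system the blog post describes: the names `Job`, `TokenBucket`, and `DelayTolerantScheduler`, the `spare_capacity` parameter, and the choice of a token bucket for rate limiting are all assumptions made for this example. Time is passed in explicitly so the behavior is deterministic.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    # Latest acceptable start time: submit time + delay tolerance (hypothetical field).
    run_by: float
    name: str = field(compare=False)
    use_case: str = field(compare=False)


class TokenBucket:
    """Intake rate limit for one use case (token bucket is an assumed choice)."""

    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec
        self.burst = burst
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class DelayTolerantScheduler:
    def __init__(self, limits: dict[str, TokenBucket]):
        self.limits = limits          # one bucket per use case
        self.queue: list[Job] = []    # min-heap ordered by run_by

    def submit(self, name: str, use_case: str, now: float, delay_tolerance: float) -> bool:
        # Rate limit on intake: reject when the use case has exhausted its quota.
        if not self.limits[use_case].allow(now):
            return False
        heapq.heappush(self.queue, Job(now + delay_tolerance, name, use_case))
        return True

    def next_batch(self, now: float, spare_capacity: int) -> list[str]:
        # Jobs whose deadline has arrived always run. Jobs that can still wait
        # run only while the batch is smaller than spare_capacity, so at peak
        # (spare_capacity=0) they are deferred to a later valley.
        batch: list[str] = []
        while self.queue and (self.queue[0].run_by <= now or len(batch) < spare_capacity):
            batch.append(heapq.heappop(self.queue).name)
        return batch
```

With one bucket per use case, a large use case that floods the intake is rejected once its burst is spent, while a job submitted with a long `delay_tolerance` stays queued during peaks and is picked up earliest-deadline-first when `spare_capacity` is available.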