Delayed Job and Throttling: Some Simple Solutions

by Erik Lyngved

While using my gem to work with Amazon MWS (Marketplace Web Service, an API for marketplace sellers), my main source of frustration was its aggressive throttling policy. I found that spacing my requests about 1.5 seconds apart gave the best results.

I use delayed_job to schedule background jobs that update a seller's orders every hour. Retrieving the items in each order takes one more request per order. That is not a very scalable design, and you can see where the frustration comes from: a popular seller means many successive calls, which can trip the throttle if I'm not careful. On the plus side, each job makes only one request, which makes things easier to manage.
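To put numbers on the scalability concern, here is a rough request budget in plain Ruby. It assumes one request for the order list plus one item request per order, all spaced 1.5 seconds apart; the constant and method names are illustrative and not part of the gem.

```ruby
# Rough request budget for one hourly orders update.
# Assumptions (illustrative, not from the gem): a single request fetches
# the order list, and each order then needs one more request for its items.
INTERVAL = 1.5    # seconds between requests
WINDOW   = 3600   # the update runs once an hour

# Minimum wall-clock seconds to issue every request for one run.
# The first request goes out immediately; each later one waits INTERVAL.
def seconds_needed(order_count, interval = INTERVAL)
  requests = 1 + order_count
  (requests - 1) * interval
end

# Largest order count whose requests still fit inside one hourly window.
def max_orders_per_run(interval = INTERVAL, window = WINDOW)
  (window / interval).floor
end

puts seconds_needed(200)   # → 300.0
puts max_orders_per_run    # → 2400
```

At that pace, 200 orders keep the job busy for five minutes, and a run with more than 2,400 orders cannot finish before the next hourly run is due.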

Attempt 1: run_at

First I tried using DJ's run_at option to schedule jobs in the future. I wrote a JobQueue model that manages several named queues and spaces out the jobs in each queue by that queue's own interval. I set up a queue called amazon_query with an interval of 1.5 seconds.

# lib/jobs/update_orders.rb

  def perform
    # process initial list of orders...

    # now schedule a job to fetch the items for each order
    orders.each do |order|
      JobQueue.amazon_query.enqueue { Jobs::UpdateOrderItems.new(order.amazon_order_id) }
    end
  end

# app/models/job_queue.rb

class JobQueue < ActiveRecord::Base
  # other methods to set and manage queues...

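  # Look up a queue by name, so JobQueue.amazon_query returns the
  # queue named "amazon_query" (or raises NoMethodError if none exists).
  def self.method_missing(method, *params)
    (q = where(:name => method.to_s)).present? ? q.first : super
  end

  # Schedule the job returned by the block at the later of now or
  # `interval` seconds after the last scheduled job, then record that time.
  def enqueue(options={})
    runtime = last_job.present? ? [last_job + interval, Time.now].max : Time.now
    Delayed::Job.enqueue yield, {:run_at => runtime}.merge(options)
    update_attributes :last_job => runtime
  end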

end

JobQueue.amazon_query retrieves the matching queue through method_missing. #enqueue schedules the job either now or one interval (1.5 seconds) after the last scheduled job, whichever is later. This has a pitfall, though: when a worker polls the database for new jobs, it picks up every pending job whose run_at time has passed. So what happens when a worker is tied up or unavailable for a while? Its next poll is delayed ("delayed" not being desirable in this case), and it then runs a backlog of jobs back to back to catch up, with no spacing between the API calls. Do not want.
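The pitfall is easier to see in a small simulation. This is plain Ruby with no Rails or Delayed Job: the hypothetical FakeQueue class copies the max(last + interval, now) rule from JobQueue#enqueue, and the hypothetical due_jobs method stands in for a worker poll that selects every job whose run_at has passed. Times are seconds from an arbitrary start.

```ruby
# Plain-Ruby simulation of the run_at approach (no Rails, no Delayed Job).
# FakeQueue and due_jobs are hypothetical stand-ins for illustration only.
class FakeQueue
  attr_reader :jobs

  def initialize(interval)
    @interval = interval
    @last_job = nil
    @jobs = []   # [name, run_at] pairs, in seconds from t = 0
  end

  # Same rule as JobQueue#enqueue: the later of now or last + interval.
  def enqueue(name, now)
    run_at = @last_job ? [@last_job + @interval, now].max : now
    @jobs << [name, run_at]
    @last_job = run_at
    run_at
  end
end

# A poll picks up every pending job whose run_at is at or before poll_time.
def due_jobs(jobs, poll_time)
  jobs.select { |_, run_at| run_at <= poll_time }.map(&:first)
end

queue = FakeQueue.new(1.5)
%w[order1 order2 order3 order4 order5].each { |name| queue.enqueue(name, 0.0) }
p queue.jobs.map(&:last)     # → [0.0, 1.5, 3.0, 4.5, 6.0]

# If the worker is busy until t = 5.0, its next poll finds four jobs
# already due, so their API calls go out back to back.
p due_jobs(queue.jobs, 5.0)  # → ["order1", "order2", "order3", "order4"]
```

The schedule itself is spaced correctly; the burst comes entirely from when the worker happens to poll.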

Attempt 2: sleep

Now for the dead simple solution: put the worker to sleep before each request. This is not very scalable, since it ties up the worker for the whole interval; if you care about preserving or optimizing worker time at all, you should probably look for a better solution. For my custom jobs, I use Delayed Job's before hook to handle this. (An after hook would work just as well, but I don't want to hold up a following job that doesn't make an API call.)

# lib/jobs/update_order_items.rb

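  # Delayed Job calls this hook right before #perform, so the worker
  # waits 1.5 seconds ahead of this job's API request.
  def before(job)
    sleep 1.5
  end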

For this small-scale app, this is what I will use. I could optimize it slightly by combining the queue with sleeping: a before hook that asks the queue when it is safe to make the next call and sleeps until then.
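A minimal sketch of that optimization, assuming a single worker process: the hypothetical Throttle class hands out the next safe time slot under a Mutex, and the caller sleeps only for the part of the interval that has not already elapsed. Unlike a flat sleep 1.5, a job that arrives after a long idle gap runs immediately. Several workers in separate processes would need this state in a shared store, such as the jobs database, rather than in memory.

```ruby
# Hypothetical in-process throttle: "ask the queue, then sleep until then".
# Assumes one worker process; the state lives in memory only.
class Throttle
  def initialize(interval)
    @interval = interval
    @next_slot = nil
    @lock = Mutex.new
  end

  # Reserve the next safe slot and return how many seconds to wait for it.
  # `now` defaults to a monotonic clock reading taken at call time.
  def reserve(now = Process.clock_gettime(Process::CLOCK_MONOTONIC))
    @lock.synchronize do
      slot = @next_slot ? [@next_slot, now].max : now
      @next_slot = slot + @interval
      slot - now
    end
  end

  # Block until it is safe to make the next call.
  def wait
    delay = reserve
    sleep(delay) if delay > 0
  end
end

# In a custom job, the before hook could then become something like:
#   def before(job)
#     AMAZON_THROTTLE.wait   # AMAZON_THROTTLE = Throttle.new(1.5), hypothetical
#   end
```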

For larger-scale apps in the future, I will look into Resque with Redis, which seems to be becoming a popular choice of background job manager.