
Following up on my question here, I'm trying to improve a search further. We first search a replays table (searching 2k records) and then get the unique players associated with that table (10 per replay, so 20k records) and render JSON. This is done in the controller; the search reads as:

def index
  @replays = Replay.includes(:players).where(map_id: params['map_id'].to_i).order(id: :desc).limit(2000)
  render json: @replays[0..2000].to_json(include: [:players])
end

The performance:

Completed 200 OK in 254032ms (Views: 34.1ms | ActiveRecord: 20682.4ms)

The actual Active Record queries read as:

Replay Load (80.4ms)  SELECT  "replays".* FROM "replays" WHERE "replays"."map_id" = $1 ORDER BY "replays"."id" DESC LIMIT $2  [["map_id", 1], ["LIMIT", 2000]]
Player Load (20602.0ms)  SELECT "players".* FROM "players" WHERE "players"."replay_id" IN (117217...

This mostly works, but still takes an exceptional amount of time. Is there a way to improve performance?

  • Just a quick FYI (as I followed the previous question) - you don't need the [0..2000] on the second line of index now. That's covered in the limit @GustavMauler. Commented Nov 24, 2017 at 16:33
  • Yeah, sorry about that. I've never actually done a personal project with a data size this large (mostly smaller), so will having that extra [0..2000] (beyond redundancy) affect performance that much? Commented Nov 24, 2017 at 16:48
  • Shouldn't have much impact - it's better to limit the query beforehand, as it asks less of the db. Which might be the downfall of my new answer :) It's definitely superfluous in your code there, though, and will also actually give you 2,001 records as it's counting from 0. Commented Nov 24, 2017 at 16:52
  • I wonder why the second query takes so long? Is there an index on the database column missing? Or are there too many players to fit into memory? How big is the returned JSON? Commented Nov 24, 2017 at 18:26
  • It's definitely an indexing issue. The JSON is fairly big (2k replays = 20k players) but not to the point where it should be taking this long. Commented Nov 25, 2017 at 17:25
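
If a missing index really is the culprit, a migration along these lines would add one on players.replay_id (a hypothetical sketch, since the schema isn't shown; adjust the migration superclass version to match your Rails version):

class AddIndexToPlayersOnReplayId < ActiveRecord::Migration[5.1]
  def change
    # Assumes players.replay_id currently has no index.
    add_index :players, :replay_id
  end
end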

2 Answers


You're getting bitten by this issue: https://postgres.cz/wiki/PostgreSQL_SQL_Tricks_I#Predicate_IN_optimalization

There is a note on pg_performance about an optimization possibility for the IN predicate when the list of values is longer than about eighty numbers. For longer lists it is better to create a constant subquery using multiple VALUES:

SELECT * FROM tab WHERE x IN (1,2,3,..n); -- n > 70

-- faster case
SELECT * FROM tab WHERE x IN (VALUES (10), (20));

Using VALUES is faster for a larger number of items, so don't use it for small sets of values.

Basically, SELECT * FROM tab WHERE x IN (1, 2, ...) with a long list of values is very slow. It's dramatically faster if you can convert it to a VALUES list: SELECT * FROM tab WHERE x IN (VALUES (1), (2), ...).

Unfortunately, since this is happening inside Active Record, it's a little tricky to exercise control over the query. One option is to avoid the includes call, manually construct the SQL to load all your child records, and then build up the associations yourself, as in the sketch below.
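
A minimal sketch of that manual approach, assuming integer primary keys and the Replay/Player models from the question (map_id stands in for the params lookup):

replays = Replay.where(map_id: map_id).order(id: :desc).limit(2000).to_a

# Guard against an empty VALUES list, which would be invalid SQL.
players_by_replay =
  if replays.empty?
    {}
  else
    values = replays.map { |r| "(#{r.id.to_i})" }.join(",")
    Player.where("replay_id IN (VALUES #{values})").group_by(&:replay_id)
  end

# Build the JSON by hand instead of relying on a preloaded association.
json = replays.map do |r|
  r.as_json.merge("players" => (players_by_replay[r.id] || []).map(&:as_json))
end.to_json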

Alternatively, you can monkey patch active record. Here's what I've done on rails 4.2, in an initializer.

module PreloaderPerformance
  private

  # Swap the plain IN (...) list for the faster IN (VALUES ...) form
  # whenever the preloader is fetching more than 100 ids.
  def query_scope(ids)
    if ids.count > 100
      # Look up the foreign key's column type so uuids get cast correctly.
      type = klass.columns_hash[association_key_name.to_s].sql_type
      values_list = ids.map do |id|
        if id.kind_of?(Integer)
          " (#{id})"
        elsif type == "uuid"
          " ('#{id}'::uuid)"
        else
          " ('#{id}')"
        end
      end.join(",")

      scope.where("#{association_key_name} in (VALUES #{values_list})")
    else
      # Small lists keep the stock IN (...) behaviour.
      super
    end
  end
end

module ActiveRecord
  module Associations
    class Preloader
      class Association #:nodoc:
        prepend PreloaderPerformance
      end
    end
  end
end
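
As a quick, hypothetical smoke test that the patch has taken effect, run a preload over more than 100 ids and check the Player Load line in the log for an IN (VALUES ... clause:

# Log to stdout, then trigger a large preload and inspect the emitted SQL.
ActiveRecord::Base.logger = Logger.new($stdout)
Replay.includes(:players).where(map_id: 1).limit(2000).to_a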

Doing this I've seen a 50x speed-up of some of my queries, with no issues as of yet. Note it's not fully battle tested, and I bet it will have some issues if your association is using an unusual data type for the foreign-key relationship. In my database, I only use uuids or integers for our associations. The usual caveats about monkey patching core Rails behavior apply.


1 Comment

  • The core underlying issue is also noted in this Stack Exchange question: dba.stackexchange.com/questions/91247/…

I know find_each can be used to batch queries, which might lighten the memory load here. Could you try out the following and see how it affects the time?

Replay.where(map_id: params['map_id'].to_i).includes(:players).find_each(batch_size: 100).map do |replay|
  replay.to_json(include: :players)
end

I'm not sure this will work. It might be that the mapping negates the benefits of batching - there are certainly more queries, but it'll use less memory as it doesn't need to hold all 20k+ records at once.

Have a play and see how it looks - fiddle with the batch size too, see how that affects things.

There's a caveat in that find_each ignores limit, so bear that in mind - one possible workaround is sketched below.
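
If the cap matters, one hedged workaround is to pluck the ids up front and batch the preload manually (the 2,000 cap and the batch size of 100 are taken from the question and my suggestion above):

# Cap the result set first, then batch the preload over id slices.
ids = Replay.where(map_id: params['map_id'].to_i).order(id: :desc).limit(2000).pluck(:id)
json = ids.each_slice(100).flat_map do |batch|
  Replay.includes(:players).where(id: batch).map { |r| r.as_json(include: :players) }
end.to_json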

I'm sure someone else'll come up with a far slicker solution, but hope this might help in the meantime. If it's awful when you check the speed, let me know and I'll delete this answer :)

3 Comments

  • I'll get around to it shortly. The main reason for the limit is actually to stop it from searching ALL the records so I have a reasonable sample size. So if the performance is good then it's actually beneficial to lose the limit.
  • Yeah, understandable - I'm not sure this will result in an improvement, though I know it's come in handy for me once or twice in the past! Keen to know how you get on.
  • After experimenting I think the eager loading gets somewhat better performance and allows me to impose a limit should I choose.
