13

Im trying to replicate the searching list style of crunchbase using ruby on rails. I have an array of filters that looks something like this:

[
   {
      "id":"0",
      "className":"Company",
      "field":"name",
      "operator":"starts with",
      "val":"a"
   },
   {
      "id":"1",
      "className":"Company",
      "field":"hq_city",
      "operator":"equals",
      "val":"Karachi"
   },
   {
      "id":"2",
      "className":"Category",
      "field":"name",
      "operator":"does not include",
      "val":"ECommerce"
   }
]

I send this json string to my ruby controller where I have implemented this logic:

filters = params[:q]
table_names = {}
filters.each do |filter|
    filter = filters[filter]
    className = filter["className"]
    fieldName = filter["field"]
    operator = filter["operator"]
    val = filter["val"]
    if table_names[className].blank? 
        table_names[className] = []
    end
    table_names[className].push({
        fieldName: fieldName,
        operator: operator,
        val: val
    })
end

table_names.each do |k, v|
    i = 0
    where_string = ''
    val_hash = {}
    v.each do |field|
        if i > 0
            where_string += ' AND '
        end
        where_string += "#{field[:fieldName]} = :#{field[:fieldName]}"
        val_hash[field[:fieldName].to_sym] = field[:val]
        i += 1
    end
    className = k.constantize
    puts className.where(where_string, val_hash)
end

What I do is, I loop over the json array and create a hash with keys as table names and values are the array with the name of the column, the operator and the value to apply that operator on. So I would have something like this after the table_names hash is created:

{
   'Company':[
      {
         fieldName:'name',
         operator:'starts with',
         val:'a'
      },
      {
         fieldName:'hq_city',
         operator:'equals',
         val:'karachi'
      }
   ],
   'Category':[
      {
         fieldName:'name',
         operator:'does not include',
         val:'ECommerce'
      }
   ]
}

Now I loop over the table_names hash and create a where query using the Model.where("column_name = :column_name", {column_name: 'abcd'}) syntax.

So I would be generating two queries:

SELECT "companies".* FROM "companies" WHERE (name = 'a' AND hq_city = 'b')
SELECT "categories".* FROM "categories" WHERE (name = 'c')

I have two problems now:

1. Operators:

I have many operators that can be applied on a column like 'starts with', 'ends with', 'equals', 'does not equals', 'includes', 'does not includes', 'greater than', 'less than'. I am guessing the best way would be to do a switch case on the operator and use the appropriate symbol while building the where string. So for example, if the operator is 'starts with', i'd do something like where_string += "#{field[:fieldName]} like %:#{field[:fieldName]}" and likewise for others.

So is this approach correct and is this type of wildcard syntax allowed in this kind of .where?

2. More than 1 table

As you saw, my approach builds 2 queries for more than 2 tables. I do not need 2 queries, I need the category name to be in the same query where the category belongs to the company.

Now what I want to do is I need to create a query like this:

Company.joins(:categories).where("name = :name and hq_city = :hq_city and categories.name = :categories[name]", {name: 'a', hq_city: 'Karachi', categories: {name: 'ECommerce'}})

But this is not it. The search can become very very complex. For example:

A Company has many FundingRound. FundingRound can have many Investment and Investment can have many IndividualInvestor. So I can select create a filter like:

{
  "id":"0",
  "className":"IndividualInvestor",
  "field":"first_name",
  "operator":"starts with",
  "val":"za"
} 

My approach would create a query like this:

SELECT "individual_investors".* FROM "individual_investors" WHERE (first_name like %za%)

This query is wrong. I want to query the individual investors of the investments of the funding round of the company. Which is a lot of joining tables.

The approach that I have used is applicable to a single model and cannot solve the problem that I stated above.

How would I solve this problem?

10
  • So the search will always be in the Company context? Commented Oct 30, 2017 at 16:43
  • Why not include something like a queryModel in the JSON when the query is in the context of a relationed model? Commented Oct 30, 2017 at 16:46
  • @ErvalhouS no it wont be Commented Oct 30, 2017 at 17:05
  • Umm what would queryModel do? Commented Oct 30, 2017 at 17:05
  • It would represent the model that you will use to search the corresponding class, i.e.: a Category query with the queryModel: 'Company' would concatenate it's search for categories inside the company query, instead of creating results for categories, it would handle results for companies with those category matchers. Commented Oct 30, 2017 at 17:20

4 Answers 4

3
+250

You can create a SQL query based on your hash. The most generic approach is raw SQL, which can be executed by ActiveRecord.

Here is some concept code that should give you the right idea:

query_select = "select * from "
query_where = ""
tables = [] # for selecting from all tables
hash.each do |table, values|
  table_name = table.constantize.table_name
  tables << table_name
  values.each do |q|
    query_where += " AND " unless query_string.empty?
    query_where += "'#{ActiveRecord::Base.connection.quote(table_name)}'."
    query_where += "'#{ActiveRecord::Base.connection.quote(q[fieldName)}'"
    if q[:operator] == "starts with" # this should be done with an appropriate method
      query_where += " LIKE '#{ActiveRecord::Base.connection.quote(q[val)}%'"
    end
  end
end
query_tables = tables.join(", ")
raw_query = query_select + query_tables + " where " + query_where 
result = ActiveRecord::Base.connection.execute(raw_query)
result.to_h # not required, but raw results are probably easier to handle as a hash

What this does:

  • query_select specifies what information you want in the result
  • query_where builds all the search conditions and escapes input to prevent SQL injections
  • query_tables is a list of all the tables you need to search
  • table_name = table.constantize.table_name will give you the SQL table_name as used by the model
  • raw_query is the actual combined sql query from the parts above
  • ActiveRecord::Base.connection.execute(raw_query) executes the sql on the database

Make sure to put any user submitted input in quotes and escape it properly to prevent SQL injections.

For your example the created query will look like this:

select * from companies, categories where 'companies'.'name' LIKE 'a%' AND 'companies'.'hq_city' = 'karachi' AND 'categories'.'name' NOT LIKE '%ECommerce%'

This approach might need additional logic for joining tables that are related. In your case, if company and category have an association, you have to add something like this to the query_where

"AND 'company'.'category_id' = 'categories'.'id'"

Easy approach: You can create a Hash for all pairs of models/tables that can be queried and store the appropriate join condition there. This Hash shouldn't be too complex even for a medium-sized project.

Hard approach: This can be done automatically, if you have has_many, has_one and belongs_to properly defined in your models. You can get the associations of a model using reflect_on_all_associations. Implement a Breath-First-Search or Depth-First Search algorithm and start with any model and search for matching associations to other models from your json input. Start new BFS/DFS runs until there are no unvisited models from the json input left. From the found information, you can derive all join conditions and then add them as expressions in the where clause of the raw sql approach as explained above. Even more complex, but also doable would be reading the database schema and using a similar approach as defined here by looking for foreign keys.

Using associations: If all of them are associated with has_many / has_one, you can handle the joins with ActiveRecord by using the joins method with inject on the "most significant" model like this:

base_model = "Company".constantize
assocations = [:categories]  # and so on
result = assocations.inject(base_model) { |model, assoc| model.joins(assoc) }.where(query_where)

What this does:

  • it passes the base_model as starting input to Enumerable.inject, which will repeatedly call input.send(:joins, :assoc) (for my example this would do Company.send(:joins, :categories) which is equivalent to `Company.categories
  • on the combined join, it executes the where conditions (constructed as described above)

Disclaimer The exact syntax you need might vary based on the SQL implementation you use.

Sign up to request clarification or add additional context in comments.

Comments

2

Full blown SQL string is a security issue, because it exposes your application to a SQL injection attack. If you can get your way around this, it is completely ok to make those query concatenations, as long as you make them compatible with your DB(yes, this solution is DB specific).

Other than that you can make some field that marks some querys as joined, as I have mentioned in the comment, you would have some variable to mark the desired table to be the output of the query, something like:

[
  {
    "id":"1",
    "className":"Category",
    "field":"name",
    "operator":"does not include",
    "val":"ECommerce",
    "queryModel":"Company"
  }
]

Which, when processing the query, you would use to output the result of this query as the queryModel instead of the className, in those cases the className would be used only to join the table conditions.

7 Comments

Some models aren't directly related to the queryModel
So for example, I need all the IndividualInvestors of the company. The Company has many Investments and Investment has many IndividualInvestors. How do u propose i'd join these tables?
Is there any way I could find a association between 2 models?
That sounds nice. Company also has many Founders and Founders belong to Person. How would I apply it to this?
So i could do something like company.joins(:founders).where(founders: {first_name: 'abcd'})
|
2

I would suggest altering your JSON data. Right now you only send name of the model, without the context, it would be easier if your model would have context.

In your example data would have to look like

data = [
  {
    id: '0',
    className: 'Company',
    relation: 'Company',
    field: 'name',
    operator: 'starts with',
    val: 'a'
  },
  {
    id: '1',
    className: 'Category',
    relation: 'Company.categories',
    field: 'name',
    operator: 'equals',
    val: '12'
  },  
  {
    id: '3',
    className: 'IndividualInvestor',
    relation:     'Company.founding_rounds.investments.individual_investors',
    field: 'name',
    operator: 'equals',
    val: '12'
  }
]

And you send this data to QueryBuilder

query = QueryBuilder.new(data) results = query.find_records

Note: find_records returns array of hashes per model on which you execute query.

For example it would return [{Company: [....]]

class QueryBuilder
  def initialize(data)
    @data = prepare_data(data)
  end

  def find_records
    queries = @data.group_by {|e| e[:model]}
    queries.map do |k, v|
      q = v.map do |f|
        {
          field: "#{f[:table_name]}.#{f[:field]} #{read_operator(f[:operator])} ?",
          value: value_based_on_operator(f[:val], f[:operator])
        }
      end

      db_query = q.map {|e| e[:field]}.join(" AND ")
      values = q.map {|e| e[:value]}

      {"#{k}": k.constantize.joins(join_hash(v)).where(db_query, *values)}
    end
  end

  private

  def join_hash(array_of_relations)
    hash = {}
    array_of_relations.each do |f|
      hash.merge!(array_to_hash(f[:joins]))
    end
    hash.map do |k, v|
      if v.nil?
        k
      else
        {"#{k}": v}
      end
    end
  end

  def read_operator(operator)
    case operator
    when 'equals'
      '='
    when 'starts with'
      'LIKE'
    end
  end

  def value_based_on_operator(value, operator)
    case operator
    when 'equals'
      value
    when 'starts with'
      "%#{value}"
    end
  end

  def prepare_data(data)
    data.each do |record|
      record.tap do |f|
        f[:model] = f[:relation].split('.')[0]
        f[:joins] = f[:relation].split('.').drop(1)
        f[:table_name] = f[:className].constantize.table_name
      end
    end
  end

  def array_to_hash(array)
    if array.length < 1
      {}
    elsif array.length == 1
      {"#{array[0]}": nil}
    elsif array.length == 2
      {"#{array[0]}": array[1]}
    else
      {"#{array[0]}": array_to_hash(array.drop(1))}
    end
  end
end

1 Comment

It's not necessary to add the relations to the input and he might not want to or be able to add these details into the json input. You can derive all associations by using a BFS or DFS with ` reflect_on_all_associations` as explained in my answer.
1

I feel you are over complicating things by having one single controller for everything. I would create a controller for every model or entity that you would want to show and then implement the filters like you said.

Implementing a dynamic where and order by is not very hard but if, as you said, you need to have also the logic to implement some joins you are not only over complicating the solution (because you will have to keep this controller updated every time you add a new model, entity or change the basic logic) but you are also enabling people start playing with your data.

I am not very familiar with Rails so sadly I cannot give you any specific cde other than saying that your approach seems OK to me. I would explode it into multiple controllers.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.