Optimizing MySQL queries isn't my expertise, so I was wondering if someone could help me formulate the most efficient query (and indices) here.

As background, I'm trying to find distinct visitor ids within a table of transactions, subject to certain WHERE criteria (date range, not a certain product, etc., as you can see in the query below). Visitors and transactions have a one-to-many relationship, so there can be many transactions for a single visitor.

Another requirement for the results is that if a visitor_id is found in the result, it must be the first instance of a visitor_id (by date_time) in the entire table. In other words, the visitor_id should only exist in the date range set in the primary query and at no time beforehand.
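To make the requirement concrete, here is a minimal sketch on a cut-down table. The table name `tx_demo`, the sample rows, and the literal dates are assumptions for illustration only; the real table is `pt_transactions`, and the campaign, affiliate, and product filters are left out here.

```sql
-- Hypothetical cut-down copy of the table, for illustration only.
CREATE TABLE tx_demo (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  visitor_id INT NOT NULL,
  date_time DATETIME NOT NULL
);

INSERT INTO tx_demo (visitor_id, date_time) VALUES
  (1, '2013-01-10 09:00:00'),  -- visitor 1 is first seen before the range
  (1, '2013-02-05 12:00:00'),  -- and appears again inside it
  (2, '2013-02-07 15:30:00'),  -- visitor 2 is first seen inside the range
  (2, '2013-02-20 08:00:00');

-- Keep only visitors whose earliest transaction in the whole table
-- falls inside the (made-up) range of February 2013.
SELECT visitor_id, MIN(date_time) AS first_seen
FROM tx_demo
GROUP BY visitor_id
HAVING MIN(date_time) >= '2013-02-01 00:00:00'
   AND MIN(date_time) <= '2013-02-28 23:59:59';
```

With these rows only visitor 2 qualifies: visitor 1 has a transaction inside the range, but it is not that visitor's first.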

Here's what I've put together so far. It uses NOT IN with a subquery, but this doesn't seem ideal: the query takes 2-3 seconds on a table with over 500k records. I've tried a few variations of indices, but nothing seems to really help.

Here's the query.

SELECT DISTINCT visitor_id, date_time
FROM pt_transactions
WHERE visitor_id NOT IN (SELECT visitor_id FROM pt_transactions WHERE date_time < '$this->_date_time_start')
AND campaign_id = $this->_campaign_id
AND a_aid = '$a_aid'
AND date_time >= '$this->_date_time_start'
AND date_time <= '$this->_date_time_end'                      
AND product_id != 65

And here's the complete table structure.

CREATE TABLE IF NOT EXISTS `pt_transactions` (
  `id` int(32) NOT NULL AUTO_INCREMENT,
  `type` varchar(2) NOT NULL COMMENT 'New Lead (NL), Raw Optin (RO), Base Sale (BS), Upsell Sale (US), Recurring Sale (RS), Base Refund (BR), Upsell Refund (UR), Recurring Refund (RR), Unknown Refund (XR),  or Chargeback (C)',
  `date_time` datetime NOT NULL,
  `amount` varchar(255) NOT NULL,
  `a_aid` varchar(255) NOT NULL,
  `subid1` varchar(255) NOT NULL,
  `subid2` varchar(255) NOT NULL,
  `subid3` varchar(255) NOT NULL,
  `product_id` int(16) NOT NULL,
  `visitor_id` int(32) NOT NULL,
  `campaign_id` int(16) NOT NULL,
  `last_click_id` int(16) NOT NULL,
  `trackback_type` varchar(255) NOT NULL COMMENT 'Shows if the transaction is tracked back to the original visitor via cookie or via IP.  Usually only applies to sales via pixel.',
  `original_transaction_id` int(32) NOT NULL COMMENT 'Reference to original transaction id, in this table, if type is RS, R, or C',
  `recurring_transaction_id` varchar(32) NOT NULL COMMENT 'Reference to existing RecurringTransaction if type is RS',
  PRIMARY KEY (`id`),
  KEY `visitor_id` (`visitor_id`),
  KEY `campaign_id` (`visitor_id`,`campaign_id`,`amount`,`product_id`),
  KEY `transaction_retrieval_group` (`campaign_id`,`date_time`,`a_aid`),
  KEY `type` (`type`),
  KEY `date_time` (`date_time`),
  KEY `original_source` (`campaign_id`,`a_aid`,`date_time`,`product_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1 AUTO_INCREMENT=574636 

2 Answers

You can try NOT EXISTS

SELECT DISTINCT visitor_id, date_time
  FROM pt_transactions t
 WHERE campaign_id = $this->_campaign_id
   AND a_aid = '$a_aid'
   AND date_time >= '$this->_date_time_start'
   AND date_time <= '$this->_date_time_end'                      
   AND product_id != 65
   AND NOT EXISTS 
(
  SELECT * 
    FROM pt_transactions 
   WHERE visitor_id = t.visitor_id
     AND date_time < '$this->_date_time_start'
)

Run EXPLAIN <query> to see how your indices are used. If you like, you can post the results in your question in textual form.
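As a sketch of that step, the statement below runs EXPLAIN on the NOT EXISTS query with placeholder literals in place of the PHP variables (the campaign id, affiliate id, and dates are made up). The composite index that follows is an assumption, not something tested against this data: the correlated subquery looks rows up by visitor_id and then filters on date_time, so an index on both columns in that order may let MySQL resolve it from the index alone. The index name `visitor_first_seen` is hypothetical.

```sql
-- Placeholder literals stand in for the PHP variables in the original query.
EXPLAIN
SELECT DISTINCT visitor_id, date_time
  FROM pt_transactions t
 WHERE campaign_id = 1
   AND a_aid = 'abc'
   AND date_time >= '2013-02-01 00:00:00'
   AND date_time <= '2013-02-28 23:59:59'
   AND product_id != 65
   AND NOT EXISTS
(
  SELECT 1
    FROM pt_transactions p
   WHERE p.visitor_id = t.visitor_id
     AND p.date_time < '2013-02-01 00:00:00'
);

-- Candidate index for the subquery lookup (an untested assumption).
ALTER TABLE pt_transactions
  ADD INDEX visitor_first_seen (visitor_id, date_time);
```

In the EXPLAIN output, the `key` column shows which index was chosen for each table reference and `rows` gives the estimated number of rows examined. Comparing the output before and after adding the index should show whether it is actually picked up.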

1 Comment

Yes, thank you! This did the trick. I ended up rewriting the query and taking another approach, but this led me in the right direction.

From what I can understand of your query, there is no need for the NOT IN clause,

because you are already checking

date_time >= '$this->_date_time_start'

so there is no need to check date_time < '$this->_date_time_start' in the NOT IN clause.

The query below alone should work fine :)

SELECT DISTINCT visitor_id, date_time
FROM pt_transactions
WHERE campaign_id = $this->_campaign_id
AND a_aid = '$a_aid'
AND date_time >= '$this->_date_time_start'
AND date_time <= '$this->_date_time_end'                      
AND product_id != 65

3 Comments

Yes. No need for subqueries.
Actually there is a need in a subquery or an outer join because OP wants ... the first instance of a visitor_id (by date_time) in the entire table.
Hmmm... being the table you are querying,
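The outer-join form mentioned in the comments can be sketched as follows. This is an illustration rather than a tested replacement: the literals are placeholders for the PHP variables, and the alias `earlier` is made up. It joins each candidate row to any earlier transaction by the same visitor and keeps only the rows for which no such match exists.

```sql
SELECT DISTINCT t.visitor_id, t.date_time
  FROM pt_transactions t
  LEFT JOIN pt_transactions earlier
    ON earlier.visitor_id = t.visitor_id
   AND earlier.date_time < '2013-02-01 00:00:00'
 WHERE t.campaign_id = 1
   AND t.a_aid = 'abc'
   AND t.date_time >= '2013-02-01 00:00:00'
   AND t.date_time <= '2013-02-28 23:59:59'
   AND t.product_id != 65
   -- id is the NOT NULL primary key, so NULL here means "no earlier row matched"
   AND earlier.id IS NULL;
```

The earlier-date condition has to stay in the ON clause. Moving it into WHERE would discard the unmatched rows and turn the anti-join back into an ordinary inner join.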
