0

I have few queries and am not able to figure out how to optimize them,

QUERY 1

select * 
from t_twitter_tracking 
where classified is null and tweetType='ENGLISH' 
order by id limit 500;

QUERY 2

Select 
  count(*) as cnt, 
  DATE_FORMAT(CONVERT_TZ(wrdTrk.createdOnGMTDate,'+00:00','+05:30'),'%Y-%m-%d') 
  as dat
from
  t_twitter_tracking wrdTrk 
where 
  wrdTrk.word like ('dell')
  and CONVERT_TZ(wrdTrk.createdOnGMTDate,'+00:00','+05:30')  
      between  '2010-12-12 00:00:00' and '2010-12-26 00:00:00'
group by dat;

Both these queries run on the same table,

CREATE TABLE `t_twitter_tracking` (
`id` BIGINT(20) NOT NULL AUTO_INCREMENT,
`word` VARCHAR(200) NOT NULL,
`tweetId` BIGINT(100) NOT NULL,
`twtText` VARCHAR(800) NULL DEFAULT NULL,
`language` TEXT NULL,
`links` TEXT NULL,
`tweetType` VARCHAR(20) NULL DEFAULT NULL,
`source` TEXT NULL,
`sourceStripped` TEXT NULL,
`isTruncated` VARCHAR(40) NULL DEFAULT NULL,
`inReplyToStatusId` BIGINT(30) NULL DEFAULT NULL,
`inReplyToUserId` INT(11) NULL DEFAULT NULL,
`rtUsrProfilePicUrl` TEXT NULL,
`isFavorited` VARCHAR(40) NULL DEFAULT NULL,
`inReplyToScreenName` VARCHAR(40) NULL DEFAULT NULL,
`latitude` BIGINT(100) NOT NULL,
`longitude` BIGINT(100) NOT NULL,
`retweetedStatus` VARCHAR(40) NULL DEFAULT NULL,
`statusInReplyToStatusId` BIGINT(100) NOT NULL,
`statusInReplyToUserId` BIGINT(100) NOT NULL,
`statusFavorited` VARCHAR(40) NULL DEFAULT NULL,
`statusInReplyToScreenName` TEXT NULL,
`screenName` TEXT NULL,
`profilePicUrl` TEXT NULL,
`twitterId` BIGINT(100) NOT NULL,
`name` TEXT NULL,
`location` VARCHAR(100) NULL DEFAULT NULL,
`bio` TEXT NULL,
`url` TEXT NULL COLLATE 'latin1_swedish_ci',
`utcOffset` INT(11) NULL DEFAULT NULL,
`timeZone` VARCHAR(100) NULL DEFAULT NULL,
`frenCnt` BIGINT(20) NULL DEFAULT '0',
`createdAt` DATETIME NULL DEFAULT NULL,
`createdOnGMT` VARCHAR(40) NULL DEFAULT NULL,
`createdOnServerTime` DATETIME NULL DEFAULT NULL,
`follCnt` BIGINT(20) NULL DEFAULT '0',
`favCnt` BIGINT(20) NULL DEFAULT '0',
`totStatusCnt` BIGINT(20) NULL DEFAULT NULL,
`usrCrtDate` VARCHAR(200) NULL DEFAULT NULL,
`humanSentiment` VARCHAR(30) NULL DEFAULT NULL,
`replied` BIT(1) NULL DEFAULT NULL,
`replyMsg` TEXT NULL,
`classified` INT(32) NULL DEFAULT NULL,
`createdOnGMTDate` DATETIME NULL DEFAULT NULL,
`locationDetail` TEXT NULL,
`geonameid` INT(11) NULL DEFAULT NULL,
`country` VARCHAR(255) NULL DEFAULT NULL,
`continent` CHAR(2) NULL DEFAULT NULL,
`placeLongitude` FLOAT NULL DEFAULT NULL,
`placeLatitude` FLOAT NULL DEFAULT NULL,
PRIMARY KEY (`id`),
INDEX `id` (`id`, `word`),
INDEX `createdOnGMT_index` (`createdOnGMT`) USING BTREE,
INDEX `word_index` (`word`) USING BTREE,
INDEX `location_index` (`location`) USING BTREE,
INDEX `classified_index` (`classified`) USING BTREE,
INDEX `tweetType_index` (`tweetType`) USING BTREE,
INDEX `getunclassified_index` (`classified`, `tweetType`) USING BTREE,
INDEX `timeline_index` (`word`, `createdOnGMTDate`, `classified`) USING BTREE,
INDEX `createdOnGMTDate_index` (`createdOnGMTDate`) USING BTREE,
INDEX `locdetail_index` (`country`, `id`) USING BTREE,
FULLTEXT INDEX `twtText_index` (`twtText`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM
ROW_FORMAT=DEFAULT
AUTO_INCREMENT=12608048;

The table has more than 10 million records. How can I optimize it?

EDITED

Explain on 2nd query

"id";"select_type";"table";"type";"possible_keys";"key";"key_len";"ref";"rows";"Extra" "1";"SIMPLE";"wrdTrk";"range";"word_index,word_createdOnGMT";"word_index";"602";NULL;"222847";"Using where; Using temporary; Using filesort"

Regards, Rohit

4
  • stackoverflow.com/faq#howtoask Commented Dec 25, 2010 at 18:44
  • Could you run EXPLAIN on both queries and post the output? Commented Dec 25, 2010 at 18:46
  • 2
    you could start by learning about datatypes bigint(100) - zomg Commented Dec 25, 2010 at 19:04
  • f00.. thanks for the suggestion. Will have a look . Commented Dec 26, 2010 at 6:19

2 Answers 2

1

In Query2, I suggest that:
1. remove DATE_FORMAT and CONVERT_TZ. You can process in PHP to be an output or between's condition.
2. like ('dell'): I don't see any '%', so you can use wrdTrk.word = 'dell' to let it faster.

Sign up to request clarification or add additional context in comments.

2 Comments

Hi erinus, the main problem is Convert_tz, a query without that run's in 2 - 4 second. While a query with CONVERY_TZ runs in over a minute and a half.
'Select count() as cnt, DATE_FORMAT(CONVERT_TZ(wrdTrk.createdOnGMTDate,'+00:00','+05:30'),'%Y-%m-%d') as dat from ...' can change to 'Select count() as cnt, wrdTrk.createdOnGMTDate as dat from ...', then in php, access and format by '$datedata = new DateTime($row["dat"], new DateTimeZone('GMT)); $datedata->setTimezone(new DateTimeZone('Your Timezone Name')); echo $datedata->format("Y-m-d");' So, you can avoid using CONVERT_TZ, and the same for 'WHERE ......'
0

The convert_tz in the where condition needs to be removed,

Select 
  count(*) as cnt, 
  DATE_FORMAT(CONVERT_TZ(wrdTrk.createdOnGMTDate,'+00:00','+05:30'),'%Y-%m-%d') 
  as dat
from
  t_twitter_tracking wrdTrk 
where 
  wrdTrk.word like ('dell')
  and CONVERT_TZ(wrdTrk.createdOnGMTDate,'+00:00','+05:30')  
      between  '2010-12-12 00:00:00' and '2010-12-26 00:00:00'
group by dat;

This will lead to comparing each row and finding out the right result, hence a tremendous improvement in the query result. Just passed the converted data to the query.

Select 
  count(*) as cnt, 
  DATE_FORMAT(CONVERT_TZ(wrdTrk.createdOnGMTDate,'+00:00','+05:30'),'%Y-%m-%d') 
  as dat
from
  t_twitter_tracking wrdTrk 
where 
  wrdTrk.word like ('dell')
  and CONVERT_TZ(wrdTrk.createdOnGMTDate,'+00:00','+05:30')  
      between  '2010-12-12 00:00:00' and '2010-12-26 00:00:00'
group by dat;

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.