I have over 1.7 million records in a table that stores IP address ranges (begin and end, which together form the primary key) and the corresponding details.

The table structure is:

mysql> desc csv;
+---------+-------------+------+-----+---------+-------+
| Field   | Type        | Null | Key | Default | Extra |
+---------+-------------+------+-----+---------+-------+
| begin   | bigint(20)  | NO   | PRI | 0       |       |
| end     | bigint(20)  | NO   | PRI | 0       |       |
| code    | char(2)     | YES  |     | NULL    |       |
| country | varchar(50) | YES  |     | NULL    |       |
| city    | varchar(50) | YES  |     | NULL    |       |
| area    | varchar(50) | YES  |     | NULL    |       |
+---------+-------------+------+-----+---------+-------+

Because the primary key is indexed, the search is fast when an exact match is made, like this:

mysql> SELECT * FROM csv WHERE begin=3338456576;
+------------+------------+------+---------------+----------+---------------+
| begin      | end        | code | country       | city     | area          |
+------------+------------+------+---------------+----------+---------------+
| 3338456576 | 3338456831 | US   | UNITED STATES | NEW YORK | NEW YORK CITY |
+------------+------------+------+---------------+----------+---------------+
1 row in set (0.03 sec)

But when I search within a range, it takes much longer:

mysql> SELECT * FROM csv WHERE begin<3338456592 AND end>3338456592;
+------------+------------+------+---------------+----------+---------------+
| begin      | end        | code | country       | city     | area          |
+------------+------------+------+---------------+----------+---------------+
| 3338456576 | 3338456831 | US   | UNITED STATES | NEW YORK | NEW YORK CITY |
+------------+------------+------+---------------+----------+---------------+
1 row in set (1.59 sec)

Is there any way I can optimize my query to look up an IP address within a range?

EDIT

CREATE TABLE statement:

CREATE TABLE `csv` (
  `begin` bigint(20) NOT NULL DEFAULT '0',
  `end` bigint(20) NOT NULL DEFAULT '0',
  `code` char(2) DEFAULT NULL,
  `country` varchar(50) DEFAULT NULL,
  `city` varchar(50) DEFAULT NULL,
  `area` varchar(50) DEFAULT NULL,
  PRIMARY KEY (`begin`,`end`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

6 Comments

  • Can you also post the CREATE TABLE statement output, and the output of some EXPLAIN statements? Commented Oct 18, 2013 at 18:47
  • How many rows does the query return? Could you post the result of `SELECT count(*) FROM csv WHERE begin<3338456592 AND end>3338456592;`? Commented Oct 18, 2013 at 19:05
  • Really only one row? That means that all remaining rows meet the opposite condition begin > 3338456592 AND end < 3338456592. Assuming that begin <= end (is my assumption true?), then 1.7 million records must have begin = end = 3338456592, and only one row has begin <> 3338456592 AND end <> 3338456592, am I right? Commented Oct 18, 2013 at 19:14
  • @kordirko take a look at this link showing 20 rows from my table. Commented Oct 18, 2013 at 19:25
  • See this: stackoverflow.com/questions/7955382/… Commented Oct 19, 2013 at 17:38

3 Answers

3

If the IP ranges do not overlap, so that the query never returns more than one row, you can use this:

SELECT q.*
FROM 
  ( SELECT csv.* 
    FROM csv
    WHERE csv.begin < 3338456592 
    ORDER BY csv.begin DESC
    LIMIT 1
  ) AS q
WHERE 3338456592 < q.end ;

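No index needs to be added. The primary index will be used: the subquery reads the index on begin in descending order and stops at the first row, instead of scanning every row with a smaller begin.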
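To try the idea without a MySQL server, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL here, the helper name `find_range` is hypothetical, and the 'CA' and 'GB' rows are made-up sample data; only the 'US' row comes from the question.

```python
import sqlite3

def find_range(conn, ip):
    """Return the row whose range strictly contains ip, or None.

    Assumes the ranges in the table do not overlap.
    """
    sql = """
        SELECT q.*
        FROM (
            SELECT * FROM csv
            WHERE "begin" < ?
            ORDER BY "begin" DESC   -- closest range starting below ip
            LIMIT 1
        ) AS q
        WHERE ? < q."end"           -- confirm ip is really inside it
    """
    return conn.execute(sql, (ip, ip)).fetchone()

conn = sqlite3.connect(":memory:")
# Column names are quoted because BEGIN and END are SQL keywords.
conn.execute("""
    CREATE TABLE csv (
        "begin" INTEGER NOT NULL,
        "end"   INTEGER NOT NULL,
        code    TEXT,
        PRIMARY KEY ("begin", "end")
    )
""")
conn.executemany(
    "INSERT INTO csv VALUES (?, ?, ?)",
    [
        (3338456064, 3338456319, "CA"),  # hypothetical sample row
        (3338456576, 3338456831, "US"),  # row shown in the question
        (3338457088, 3338457343, "GB"),  # hypothetical sample row
    ],
)

print(find_range(conn, 3338456592))  # (3338456576, 3338456831, 'US')
print(find_range(conn, 100))         # None: below every range
```

Like the original query, this uses strict comparisons, so an address equal to a range's begin or end is not matched; switching to `<=` would include the endpoints if that is what you need.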

3 Comments

Great! It ran in 0.06 seconds. Thanks.
Removing the last line also works. It returned the result in 0.0 seconds.
Keep the last line. Otherwise you may get false positives.
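
The false positives mentioned in the last comment show up for any address that falls in a gap between two ranges. The sketch below is an illustration only: it uses an in-memory SQLite table in place of MySQL, and the 'GB' row and the gap address are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names are quoted because BEGIN and END are SQL keywords.
conn.execute(
    'CREATE TABLE csv ("begin" INTEGER, "end" INTEGER, code TEXT, '
    'PRIMARY KEY ("begin", "end"))'
)
conn.executemany(
    "INSERT INTO csv VALUES (?, ?, ?)",
    [
        (3338456576, 3338456831, "US"),  # row shown in the question
        (3338457088, 3338457343, "GB"),  # hypothetical row, leaves a gap
    ],
)

# The subquery alone, i.e. the query with its last line removed.
INNER = 'SELECT * FROM csv WHERE "begin" < ? ORDER BY "begin" DESC LIMIT 1'
# The full query, including the check against the range end.
FULL = f'SELECT q.* FROM ({INNER}) AS q WHERE ? < q."end"'

gap_ip = 3338456900  # lies between the two ranges, belongs to neither

without_check = conn.execute(INNER, (gap_ip,)).fetchone()
with_check = conn.execute(FULL, (gap_ip, gap_ip)).fetchone()

print(without_check)  # (3338456576, 3338456831, 'US'), a false positive
print(with_check)     # None, which is the correct answer
```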
1

If the ranges overlap, you should:

  • define the IP range as a LineString column
  • define a spatial index on that column
  • use a geometric "contains" query

See more in Efficient data model for range queries

Comments

0

What timing do you get for SELECT begin, end, code, country, city, area FROM csv WHERE begin <> 3338456592 HAVING begin NOT BETWEEN MIN(begin) AND MAX(end)?

UPD: Here is my version of the table structure.

CREATE TABLE `csv` (
  `begin` INT(10) NOT NULL DEFAULT '0',
  `end` INT(10) NOT NULL DEFAULT '0',
  `code` char(2) DEFAULT NULL,
  `country` varchar(50) DEFAULT NULL,
  `city` varchar(45) DEFAULT NULL,
  `area` varchar(40) DEFAULT NULL,
  KEY `combined` (`begin`,`end`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

I think storing country and code as ENUM would be faster.

18 Comments

That looks like impossible SQL to me: with begin = 3338456592, MIN(begin) = 0 and MAX(begin) = 4294967295, how can begin be NOT BETWEEN those values?
@GauravSharma Actually, we can try to optimize your table structure. What does SELECT * FROM csv PROCEDURE ANALYSE(); return?
@GauravSharma And can you change the table engine to MyISAM? It is faster than InnoDB.
@GauravSharma Hmm, what about INT(10) UNSIGNED?
An unsigned INT can indeed hold the maximum value 4294967295 (IP 255.255.255.255). You can remove the (10); the display width only matters with ZEROFILL. I sometimes see INT(11) without ZEROFILL; maybe the programmer thinks the int can hold a larger value that way.
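
The arithmetic behind the signed versus unsigned discussion can be checked with a short sketch using Python's standard `ipaddress` module. The helper names `ip_to_int` and `int_to_ip` are hypothetical; the limits are those of a 32-bit integer, which is what MySQL's INT is.

```python
import ipaddress

SIGNED_INT_MAX = 2**31 - 1    # 2147483647, upper limit of a signed 32-bit INT
UNSIGNED_INT_MAX = 2**32 - 1  # 4294967295, upper limit of an unsigned 32-bit INT

def ip_to_int(dotted):
    """Convert a dotted-quad IPv4 string to its integer value."""
    return int(ipaddress.IPv4Address(dotted))

def int_to_ip(value):
    """Convert an integer back to a dotted-quad IPv4 string."""
    return str(ipaddress.IPv4Address(value))

sample = 3338456592  # the address searched for in the question

fits_signed = sample <= SIGNED_INT_MAX      # False: too large for signed INT
fits_unsigned = sample <= UNSIGNED_INT_MAX  # True

print(ip_to_int("255.255.255.255"))  # 4294967295
print(int_to_ip(sample))             # 198.252.206.16
print(fits_signed, fits_unsigned)    # False True
```

So the value from the question does not fit in a signed INT column, while an unsigned INT covers the whole IPv4 space.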