
I'm struggling to think of an efficient model to describe IPv4 address data. I want to be able to perform a 'whois' type lookup on a dataset within MySQL. Currently I have this:

CREATE TABLE inetnum (
 `from_ip` int(11) unsigned NOT NULL,
 `to_ip` int(11) unsigned NOT NULL,
 `netname` varchar(40) default NULL,
 `ip_txt` varchar(60) default NULL,
 `descr` varchar(60) default NULL,
 `country` varchar(2) default NULL,
 `recurse_limit` int(11) NOT NULL default '0',
 `unexpected` int(11) NOT NULL default '0',
 `rir` enum('APNIC','AFRINIC','ARIN','RIPE','LACNIC') NOT NULL default 'RIPE',
 PRIMARY KEY  (`from_ip`,`to_ip`)
) ENGINE=MyISAM DEFAULT CHARSET=ascii;

And I want to do queries like this:

SELECT *
FROM inetnum
WHERE INET_ATON('192.168.0.1') BETWEEN from_ip AND to_ip;
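The lookup depends on INET_ATON() turning a dotted quad a.b.c.d into the unsigned integer a*2^24 + b*2^16 + c*2^8 + d. The sketch below does the same conversion and the BETWEEN test in Python with the standard-library ipaddress module. The helper names inet_aton, inet_ntoa and in_range are hypothetical; they only mirror the MySQL functions. It handles full four-part addresses only, whereas MySQL's INET_ATON() also accepts some short forms such as '127.1'.

```python
import ipaddress


def inet_aton(dotted: str) -> int:
    """Dotted-quad IPv4 text -> unsigned 32-bit integer (like MySQL's INET_ATON)."""
    return int(ipaddress.IPv4Address(dotted))


def inet_ntoa(value: int) -> str:
    """Unsigned 32-bit integer -> dotted-quad text (like MySQL's INET_NTOA)."""
    return str(ipaddress.IPv4Address(value))


def in_range(ip: str, from_ip: int, to_ip: int) -> bool:
    """Python equivalent of: INET_ATON(ip) BETWEEN from_ip AND to_ip."""
    return from_ip <= inet_aton(ip) <= to_ip
```

For example, inet_aton("192.168.0.1") gives 3232235521 (192*16777216 + 168*65536 + 0*256 + 1).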

But because the upper and lower bounds of the address range are held in different fields, this results in a full table scan:

mysql> EXPLAIN SELECT * FROM `inetnum` WHERE INET_ATON('192.168.0.1') BETWEEN from_ip AND to_ip;
+----+-------------+---------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table   | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+---------+------+---------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | inetnum | ALL  | NULL          | NULL | NULL    | NULL | 3800440 | Using where |
+----+-------------+---------+------+---------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)

(And as I'm sure someone will try to point out: no, it's not because of the INET_ATON function. Using a literal integer makes no difference, and neither does using <= to_ip AND >= from_ip.)

This is currently running on MySQL 5.0.67. I only have limited scope for changing/upgrading the DBMS.

  • Since the underlying data is (at least notionally) hierarchical, I could model this as a tree - but for any sort of efficiency I'd need to add surrogate nodes to define arbitrary ranges. Another possibility would be to treat this as a 1-dimensional GIS - but I've never worked with geospatial data. Commented Dec 12, 2013 at 10:28
  • A simpler case (no overlapping ranges) of the same problem is described here: linuxdevcenter.com/pub/a/linux/2004/01/06/rangekeyed_1.html (but with no viable solution) Commented Dec 12, 2013 at 10:36
  • possible duplicate of How to store an IP in mySQL Commented Dec 12, 2013 at 11:59
  • Closely related: stackoverflow.com/a/19283828/1046007 Commented Jun 22, 2020 at 18:36

2 Answers


Actually, your primary key makes little sense for this kind of range query. It only guarantees that each <from_ip, to_ip> pair is unique, so MySQL will not be able to use that index for such range comparisons.

Unless a query constrains both parts of your primary key, the key won't help. (Strictly speaking, MySQL can also use it when the condition covers a leftmost prefix of the compound index, but that's not your case.) For example, this query will use the primary key:

-- @x and @y are derived from somewhere else
SELECT * FROM inetnum WHERE from_ip = @x AND to_ip = @y;

In your case, the compound key can stay as the primary key, but its only benefit will be to provide uniqueness. So you can leave it as it is, or create a surrogate id primary key and turn the current primary key into a UNIQUE constraint.
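The surrogate-key option can be sketched as follows. This uses an in-memory SQLite database from Python purely as a stand-in: the MySQL DDL would differ (AUTO_INCREMENT, UNIQUE KEY, the remaining columns), and the netname values are hypothetical. The point is that several networks can share a from_ip while an exact duplicate (from_ip, to_ip) pair is still rejected.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; only the schema shape matters.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE inetnum (
        id      INTEGER PRIMARY KEY,      -- surrogate key
        from_ip INTEGER NOT NULL,
        to_ip   INTEGER NOT NULL,
        netname TEXT,
        UNIQUE (from_ip, to_ip)           -- former primary key, kept for uniqueness
    )
    """
)

insert = "INSERT INTO inetnum (from_ip, to_ip, netname) VALUES (?, ?, ?)"
# A network and its first sub-net share from_ip (192.168.0.0)...
conn.execute(insert, (3232235520, 3232301055, "NET-A"))      # 192.168.0.0 - 192.168.255.255
conn.execute(insert, (3232235520, 3232235775, "NET-A-SUB"))  # 192.168.0.0 - 192.168.0.255

# ...but inserting the same (from_ip, to_ip) pair again violates the UNIQUE constraint.
try:
    conn.execute(insert, (3232235520, 3232235775, "DUPLICATE"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```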

One possible way to improve the situation is to create single-column indexes on from_ip and to_ip. Since they are integers, the resulting indexes are likely to have high cardinality. However, MySQL will use only one of those indexes for this query, so you lose 'half' of the efficient range comparison. Also remember that if a greater-than (or less-than) comparison matches too many rows, MySQL will not use the index at all, because there is no point in using it when most of the table has to be read anyway.
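The "half" problem can be illustrated with a small simulation. Here a B-tree index on from_ip is modelled as a list sorted by from_ip, searched with bisect. The index narrows the search to rows with from_ip <= ip, but every one of those rows still has to be checked against to_ip. The data is synthetic and the function name is hypothetical; this models the access pattern, not MySQL's optimizer.

```python
import bisect
import random

# Synthetic data (hypothetical): 10,000 short ranges standing in for inetnum rows.
random.seed(0)
rows = []
for _ in range(10_000):
    start = random.randrange(0, 2**32 - 256)
    rows.append((start, start + random.randrange(0, 256)))

# A B-tree index on from_ip is modelled as the rows sorted by from_ip.
rows.sort()
from_keys = [r[0] for r in rows]


def lookup_with_from_ip_index(ip):
    """Use the from_ip 'index' for from_ip <= ip, then test to_ip >= ip row by row.

    Returns (matching rows, number of rows examined)."""
    end = bisect.bisect_right(from_keys, ip)  # rows[:end] all have from_ip <= ip
    candidates = rows[:end]
    matches = [r for r in candidates if r[1] >= ip]
    return matches, len(candidates)


# Probe with the start of a range from the middle of the sorted data.
probe = rows[len(rows) // 2][0]
matches, examined = lookup_with_from_ip_index(probe)
print(f"{len(matches)} match(es), {examined} of {len(rows)} rows examined")
```

Only one or a few rows match, yet more than half of the rows are examined. An index on to_ip alone has the same problem from the other end.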

And yes, in general avoid using functions in the WHERE clause. MySQL will not always lose index usage in that case. A function applied to a constant, such as INET_ATON('192.168.0.1'), is evaluated only once; it is a function wrapped around an indexed column that prevents index use. But there is still the overhead of the function call. Even if it's small, you can always get rid of it by passing the correct value, computed by your application.
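Passing a precomputed value might look like the sketch below. It uses Python's sqlite3 module with an in-memory table as a stand-in for MySQL. MySQL drivers such as mysql-connector-python use %s placeholders instead of ?. The whois helper and the sample rows are hypothetical. The address is converted to an integer in the application, so the query only compares columns against a bound integer parameter.

```python
import ipaddress
import sqlite3

# In-memory SQLite table with a couple of hypothetical ranges.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inetnum (from_ip INTEGER NOT NULL, to_ip INTEGER NOT NULL, netname TEXT)")
conn.executemany(
    "INSERT INTO inetnum VALUES (?, ?, ?)",
    [
        (int(ipaddress.IPv4Address("192.168.0.0")), int(ipaddress.IPv4Address("192.168.255.255")), "NET-A"),
        (int(ipaddress.IPv4Address("10.0.0.0")), int(ipaddress.IPv4Address("10.255.255.255")), "NET-B"),
    ],
)


def whois(ip_text):
    """Return netnames of all ranges containing ip_text."""
    # Convert in the application: the SQL receives a plain integer parameter.
    ip_value = int(ipaddress.IPv4Address(ip_text))
    cur = conn.execute(
        "SELECT netname FROM inetnum WHERE from_ip <= ? AND to_ip >= ?",
        (ip_value, ip_value),
    )
    return [row[0] for row in cur]
```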


8 Comments

Sorry - this doesn't really help. Using single-column indexes just makes inserts more expensive. If I drop the current key, there's the complication of no longer having an intrinsic primary key (a network may contain multiple sub-nets, at least one of which will have the same from_ip). And I'm still running a query which the optimizer sees as WHERE value >= indexed_column, hence it still needs to consider the majority of rows in the database.
Well, you didn't mention inserts or other special conditions that force you to use such a primary key. My point is that this compound primary key will be useless for your task. You can't do anything about that (I assume you can't change the structure, as you said above). New keys will slow down inserts - no doubt about that - so you have to decide which is better: slow selects or slow inserts.
Insert/update performance and uniqueness are not major constraints - but adding single-column indexes doesn't help; the database can't use them to solve the problem :(
So then your select conditions are too broad (I mean there are too many rows to select, and thus MySQL will not use such keys).
No - the database doesn't know BEFORE fetching the data how many rows to SELECT, nor where to start looking from... 192.168.0.1 is in 0.0.0.0/0 and 128.0.0.0/1 and 192.0.0.0/2 and 192.0.0.0/3....
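The last comment's point is that one address falls inside a whole chain of nested blocks, so there is no single "starting row" for an index to seek to. A quick Python sketch with the standard-library ipaddress module (enclosing_networks is a hypothetical helper) lists those blocks:

```python
import ipaddress


def enclosing_networks(ip_text, min_prefix=0, max_prefix=32):
    """List every CIDR block with prefix length min_prefix..max_prefix containing ip_text."""
    ip = ipaddress.IPv4Address(ip_text)
    result = []
    for prefix in range(min_prefix, max_prefix + 1):
        # strict=False masks off the host bits, yielding the block's network address.
        net = ipaddress.IPv4Network(f"{ip}/{prefix}", strict=False)
        result.append(str(net))
    return result
```

For 192.168.0.1 the first four entries are 0.0.0.0/0, 128.0.0.0/1, 192.0.0.0/2 and 192.0.0.0/3. The list has 33 entries in all, ending at 192.168.0.1/32.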

I found a solution (using spatial data types) here on Stack Overflow - but note that the solution is not the accepted answer; it's the one from Quassnoi.

Please vote to close my question as a duplicate.

But for anyone trying this at home - there was an additional complication as I already had a table of data - hence I'm using a slightly different recipe:

mysql> alter table inetnum add column netrange linestring;
Query OK, 3800440 rows affected (22.41 sec)
Records: 3800440  Duplicates: 0  Warnings: 0

mysql> create spatial index rangelookup on inetnum(netrange);
ERROR 1252 (42000): All parts of a SPATIAL index must be NOT NULL

mysql> UPDATE inetnum
    -> SET netrange=GeomFromText(CONCAT('LINESTRING(', from_ip, ' -1, ', to_ip, ' 1)'))
    -> ;
Query OK, 3800440 rows affected (57.42 sec)
Rows matched: 3800440  Changed: 3800440  Warnings: 0

mysql> create spatial index rangelookup on inetnum(netrange);
ERROR 1252 (42000): All parts of a SPATIAL index must be NOT NULL

mysql> alter table inetnum modify netrange linestring not null;
Query OK, 3800440 rows affected (35.84 sec)
Records: 3800440  Duplicates: 0  Warnings: 0

mysql> create spatial index rangelookup on inetnum(netrange);
Query OK, 3800440 rows affected (1 min 19.69 sec)
Records: 3800440  Duplicates: 0  Warnings: 0

mysql> SELECT COUNT(*)
    -> FROM inetnum
    -> WHERE INET_ATON('88.104.22.241') BETWEEN from_ip AND to_ip;
+----------+
| COUNT(*) |
+----------+
|        3 |
+----------+
1 row in set (1.19 sec)

mysql> SELECT COUNT(*)
    -> FROM inetnum
    -> WHERE MBRCONTAINS(netrange, GEOMFROMTEXT(CONCAT('POINT(', INET_ATON('88.104.22.241'), ' 0)')));
+----------+
| COUNT(*) |
+----------+
|       10 |
+----------+
1 row in set (0.06 sec)
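Why the spatial trick works: each range [from_ip, to_ip] is stored as a diagonal line from (from_ip, -1) to (to_ip, 1). Its minimum bounding rectangle (MBR) is [from_ip, to_ip] x [-1, 1]. The probe POINT(ip 0) lies inside that rectangle exactly when from_ip <= ip <= to_ip, so MBRContains() answers the range question and can be served by the R-tree SPATIAL index. The Python sketch below models this. The function names are hypothetical, and the inclusive boundary test is an assumption about how MBRContains() treats points on the edge.

```python
def range_to_wkt(from_ip, to_ip):
    """Same text as CONCAT('LINESTRING(', from_ip, ' -1, ', to_ip, ' 1)') above."""
    return f"LINESTRING({from_ip} -1, {to_ip} 1)"


def point_wkt(ip):
    """Same text as CONCAT('POINT(', ip, ' 0)') in the lookup query."""
    return f"POINT({ip} 0)"


def mbr_of_line(x1, y1, x2, y2):
    """Minimum bounding rectangle of a two-point line: (min_x, min_y, max_x, max_y)."""
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def mbr_contains_point(mbr, x, y):
    """Inclusive point-in-rectangle test (assumed boundary behaviour)."""
    min_x, min_y, max_x, max_y = mbr
    return min_x <= x <= max_x and min_y <= y <= max_y


def spatial_match(from_ip, to_ip, ip):
    """Model of MBRContains(netrange, POINT(ip 0)) for a single row."""
    return mbr_contains_point(mbr_of_line(from_ip, -1, to_ip, 1), ip, 0)
```

For well-formed rows (from_ip <= to_ip), spatial_match agrees with BETWEEN. The y coordinates -1 and 1 only give the rectangle some height so that the y = 0 probe falls strictly inside it.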

1 Comment

(Note that the difference between the 2 results was due to an anomaly in the data: there were 7 records with a 'to_ip' value of 0 (0.0.0.0).)
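One way a to_ip of 0 can produce extra spatial matches is through how bounding boxes are built. A bounding box uses the minimum and maximum coordinates, so LINESTRING(from_ip -1, 0 1) covers every address from 0 up to from_ip. By contrast, BETWEEN from_ip AND 0 matches no address once from_ip is greater than 0. The sketch below models both predicates in Python (function names hypothetical) and flags such inverted rows. In SQL, a query along the lines of SELECT * FROM inetnum WHERE to_ip < from_ip would find them.

```python
def between_match(from_ip, to_ip, ip):
    """from_ip <= ip <= to_ip, as in the BETWEEN query."""
    return from_ip <= ip <= to_ip


def mbr_match(from_ip, to_ip, ip):
    """Model of MBRContains on LINESTRING(from_ip -1, to_ip 1): the bounding box
    uses min/max, so the order of the two endpoints does not matter."""
    return min(from_ip, to_ip) <= ip <= max(from_ip, to_ip)


def inverted_ranges(rows):
    """Rows (from_ip, to_ip) whose to_ip is below from_ip, e.g. to_ip = 0 (0.0.0.0)."""
    return [r for r in rows if r[1] < r[0]]
```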
