
I have a large text file (ipaddress.txt) with a lot of lines; each line is an IP address, e.g.:

44.XXX.XXX.XXX
45.XXX.XXX.XXX
46.XXX.XXX.XXX
47.XXX.XXX.XXX
48.XXX.XXX.XXX

I load this list into a TStringList, e.g.:

FIpAddressList := TStringList.Create();
FIpAddressList.Sorted := true;
FIpAddressList.LoadFromFile('ipaddress.txt');

and I want to check whether an IP address is in the list, e.g.:

function IsIPinList(const IPAddress : string): Boolean;
begin
   Result := (FIpAddressList.IndexOf(IPAddress) <> -1);
end;

It works... but it is slow with a huge TStringList.

Is there any way to make this process faster?

UPDATE

  • The list is static, with a monthly update, and has roughly 5,000 lines.
  • The function is invoked on each request to the server (e.g. 5 times per second).
  • The list is loaded only when the server service starts.
  • There are many ways to accomplish what you want: through a database, binary tree, hash lists, etc. It depends on how big your list is, whether it changes, and the restrictions of your application and environment. Commented Dec 16, 2016 at 9:07
  • You can use, for example, soft-gems.net/index.php/controls/virtual-treeview, which is quite fast. Commented Dec 16, 2016 at 9:09
  • 5000 lines is nothing. And 5 times a second is nothing either. You would barely notice that in a linear search. Are you loading the file from disk every time? Just sort it when you load it and use binary search. Or stuff it in a TDictionary<string, ...>. Commented Dec 16, 2016 at 9:37
  • For searching in a sorted list, use Find instead of IndexOf: docs.embarcadero.com/products/rad_studio/delphiAndcpp2009/… Commented Dec 16, 2016 at 9:45
  • It is rather silly to keep the list in memory as a string list. Convert the entries immediately into an array of Cardinal and search them as Cardinals. Comparing one Cardinal variable is ONE CPU command. And one IP as a UnicodeString takes enough memory to hold a dozen IPs as Cardinal addresses. You start your program, you use something like IOUtils.TFile.GetAllLines to read your config. You parse your IPs into a TList<Cardinal>, and finally save the list as an array of Cardinal - and from that moment on work with this array. Commented Dec 16, 2016 at 13:12

2 Answers


One way to make this quicker is to arrange for the list to be sorted so that the search can be performed using binary search, O(log n), rather than linear search, O(n).
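Since the list in the question is already created with Sorted := True, binary search is available directly through TStringList.Find. A minimal sketch of what that lookup might look like:

function IsIPinList(const IPAddress : string): Boolean;
var
  Index : Integer;
begin
  // Find performs a binary search when the list's Sorted property
  // is True, so this is O(log n) rather than the O(n) scan of IndexOf.
  Result := FIpAddressList.Find(IPAddress, Index);
end;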

If the list is static then the best thing to do is to sort the list outside your program so that you can sort it exactly once.

If the list is more dynamic then you will have to sort it, and keep any modifications ordered. In this scenario, sorting the list will only bring benefits if the number of searches you make is sufficient to overcome the extra cost of sorting and maintaining the order.

Another approach is to use a dictionary containing your items. Typically this will involve hashing each string. Whilst the lookup complexity is O(1), the cost of hashing each string can be significant.
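One possible shape for the dictionary approach, using TDictionary from System.Generics.Collections as a set (the Boolean value is just a placeholder, and the names are illustrative, not from the question):

uses System.Classes, System.Generics.Collections;

var
  FIpSet : TDictionary<string, Boolean>;

procedure LoadIPs(const FileName : string);
var
  Lines : TStringList;
  Line : string;
begin
  FIpSet := TDictionary<string, Boolean>.Create;
  Lines := TStringList.Create;
  try
    Lines.LoadFromFile(FileName);
    for Line in Lines do
      FIpSet.AddOrSetValue(Line, True); // the keys form the set; the value is unused
  finally
    Lines.Free;
  end;
end;

function IsIPinList(const IPAddress : string): Boolean;
begin
  Result := FIpSet.ContainsKey(IPAddress); // average O(1) hash lookup
end;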

Yet another way to speed things up is to convert the IP address strings to 32 bit integers. In fact this is sure to yield a huge performance benefit, and you should do this irrespective of any other concerns. It is always faster and clearer to work with a 32 bit integer than a string representation of an IP address.
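A sketch of how that conversion and lookup might look, parsing each dotted quad into a Cardinal once at load time and then binary-searching a sorted array (names and error handling are illustrative; the array must be sorted once with TArray.Sort<Cardinal> after loading):

uses System.SysUtils, System.Generics.Collections;

// Parse a dotted-quad string such as '192.168.0.1' into a 32-bit value.
function IPToCardinal(const IPAddress : string): Cardinal;
var
  Parts : TArray<string>;
begin
  Parts := IPAddress.Split(['.']);
  if Length(Parts) <> 4 then
    raise EConvertError.CreateFmt('Invalid IP address: %s', [IPAddress]);
  Result := (Cardinal(StrToInt(Parts[0])) shl 24) or
            (Cardinal(StrToInt(Parts[1])) shl 16) or
            (Cardinal(StrToInt(Parts[2])) shl 8) or
             Cardinal(StrToInt(Parts[3]));
end;

var
  FIpArray : TArray<Cardinal>; // filled and sorted once at load time

function IsIPinList(const IPAddress : string): Boolean;
var
  Index : Integer;
begin
  // O(log n) binary search over 32-bit integers; each comparison is a
  // single integer compare rather than a string comparison.
  Result := TArray.BinarySearch<Cardinal>(FIpArray, IPToCardinal(IPAddress), Index);
end;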

These are just some possible approaches, there are more. Which to choose depends very much on the usage trade offs.

Whilst you probably are just looking for some code to solve your problem, it is my view that this problem is more algorithmic than code based. You need to better understand the requirements, data size constraints, etc. before selecting an algorithm. Once you have decided on an algorithm, or narrowed the choice down to a small number to compare, implementation should be straightforward.


Another possibility is that you have misdiagnosed your problem. Even linear search over a list of 5000 IP addresses stored as strings should be quicker than you are reporting:

  • My computer can search such a list 2,000 times a second using linear search.
  • Once you sort the list and switch to binary search, then it can manage 1,000,000 searches a second.
  • Switch to storing an ordered array of integers, and it achieves over 10,000,000 searches a second.
  • A hash-based dictionary of integers gives performance twice as fast again.

So, although the performance of your search could easily be improved by a factor of 20,000, I still don't think that your performance problems are down to what you believe. I wonder if your real problem is that you are reading the file from disk every time you search.


3 Comments

After your answer update I investigated... the list was being loaded on each function call, and this caused the performance issue. Thanks, David!
Good. I think there's a lesson to be had here. Namely that understanding and identifying the problem really well is very often the most important task. Once you have done that, very often the solution reveals itself trivially.
@DavidHeffernan is that encouragement to use a profiler when there are performance issues?
function IsIPinList(const IPAddress : string): Boolean;
var dummy : Integer;
begin
   Result := FIpAddressList.Find(IPAddress,dummy);
end;

3 Comments

Yes, and it is: FIpAddressList.Sorted := true;
Understood. But Sorted := true is in the question. I just meant that maybe that should be emphasized in your answer - that Find requires a sorted list.
