I wrote a script in Python that scans a file and extracts strings from it into a big array. I do something like:
while (delimiter #1 found)
    search for delimiter #2
    if the string between #1 and #2 is not in the "final array", add it
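The scanning part of that loop might look roughly like this in C. This is only a sketch: it assumes the whole file has already been read into a NUL-terminated buffer, and the function name `next_between` and the parameters `open` and `close` (standing in for delimiters #1 and #2) are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Find the next substring enclosed by `open` ... `close`, starting at *cursor.
 * On success, returns a pointer to the first byte after `open`, stores the
 * substring length in *len, and advances *cursor past `close`.
 * Returns NULL when no complete pair remains.
 * The result points into the input buffer and is NOT NUL-terminated.
 */
const char *next_between(const char **cursor, const char *open,
                         const char *close, size_t *len)
{
    size_t open_len = strlen(open);
    size_t close_len = strlen(close);
    if (open_len == 0 || close_len == 0)
        return NULL;                        /* empty delimiters unsupported */

    const char *p = strstr(*cursor, open);  /* look for delimiter #1 */
    if (p == NULL)
        return NULL;

    const char *start = p + open_len;
    const char *end = strstr(start, close); /* look for delimiter #2 */
    if (end == NULL)
        return NULL;                        /* unterminated: give up */

    *len = (size_t)(end - start);
    *cursor = end + close_len;
    return start;
}
```

Calling it in a `while` loop until it returns NULL visits every candidate string in turn. Deciding whether each one is already in the "final array" is a separate step.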
It took me 1 hour to write the script in Python, but it's just too slow for big files (8 minutes for 400 files is far too long), so I decided to write this batch in C. After one day I still haven't finished it.
I've already looked at things like sorted arrays (GNU C sorted arrays). I'd like to check whether the string between #1 and #2 is already in an array of strings and, if not, add it. I thought there would be obvious functions for adding a string to a pre-sorted array (keeping it sorted), and/or for adding a string to a pre-sorted array only if it's not already in it.
The only solutions I've found are:
- use lsearch()
- use bsearch() and, if the string is not found, add it and re-sort the array with qsort()
The second solution takes ages (calling qsort() after every insertion is too slow), and the first one gets too slow after a few thousand elements (because they're not sorted, so every lookup is a linear scan).
Do you know where I could look / what I could do / which library I could use? I guess I'm not the only one on earth who wants to put a string in a pre-sorted string array only if it's not present (and keep it sorted)! ;)
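To make the operation concrete, here is a minimal sketch of "insert into a sorted array only if absent": a binary search finds either the existing entry or the slot where the string belongs, and `memmove` opens that slot so no re-sort is needed. The `struct strset` type and the `strset_*` names are invented for this example and do not come from any library.

```c
#include <stdlib.h>
#include <string.h>

/* Growable array of unique strings, always kept in strcmp() order. */
struct strset {
    char **items;
    size_t len;
    size_t cap;
};

/* Binary search: index of the first element >= key; *found set on a match. */
static size_t strset_lower_bound(const struct strset *s, const char *key,
                                 int *found)
{
    size_t lo = 0, hi = s->len;
    *found = 0;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = strcmp(s->items[mid], key);
        if (c == 0) {
            *found = 1;
            return mid;
        }
        if (c < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Returns 1 if key was added, 0 if already present, -1 if out of memory. */
int strset_add(struct strset *s, const char *key)
{
    int found;
    size_t pos = strset_lower_bound(s, key, &found);
    if (found)
        return 0;

    if (s->len == s->cap) {                 /* grow geometrically */
        size_t ncap = s->cap ? s->cap * 2 : 16;
        char **n = realloc(s->items, ncap * sizeof *n);
        if (n == NULL)
            return -1;
        s->items = n;
        s->cap = ncap;
    }

    size_t klen = strlen(key) + 1;
    char *copy = malloc(klen);              /* the set owns its own copy */
    if (copy == NULL)
        return -1;
    memcpy(copy, key, klen);

    /* Shift the tail one slot to the right, then drop the copy in place. */
    memmove(&s->items[pos + 1], &s->items[pos],
            (s->len - pos) * sizeof *s->items);
    s->items[pos] = copy;
    s->len++;
    return 1;
}
```

Starting from `struct strset s = {0};`, each call costs O(log n) string comparisons plus one `memmove` of pointers. That shift is still linear in the worst case, so for very large sets a tree such as POSIX `tsearch()` or a hash table may scale better.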
How long do uniq then sort take?
- time ./scriptpython = Python doing the whole stuff (parsing, sorting, uniq) = 2m31.596s
- time ./a.exe > tt = C program parsing = 0m2.170s
- time ./a.exe | uniq | sort > tt = whole stuff = 0m2.497s
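The same "collect everything, then sort and de-duplicate once" idea could also be done inside the C program instead of in a shell pipeline. The sketch below is one possible way, not a measured one: it sorts first and then drops adjacent duplicates, since (like `uniq`) the second pass only catches duplicates that sit next to each other. The names `cmp_str` and `sort_unique` are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of C strings. */
static int cmp_str(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/*
 * Sort `items` in place, then compact it so each distinct string appears
 * once at the front. Returns the number of distinct strings.
 * Duplicate pointers are simply overwritten, not freed.
 */
size_t sort_unique(const char **items, size_t n)
{
    if (n == 0)
        return 0;

    qsort(items, n, sizeof *items, cmp_str);    /* one sort at the very end */

    size_t out = 1;
    for (size_t i = 1; i < n; i++) {
        /* After sorting, equal strings are adjacent. */
        if (strcmp(items[i], items[out - 1]) != 0)
            items[out++] = items[i];
    }
    return out;
}
```

This calls `qsort()` exactly once over all the collected strings, rather than once per insertion.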