3

I want to write a PHP function that creates 50000 unique random alphanumeric strings of length 4 and inserts them into a DB table. How can I do it?

2
  • 1
    Did you mean to include a question? Commented May 20, 2012 at 19:30
  • @Alvi_1987 downvoted for no activity/feedback on this question. Commented May 21, 2012 at 21:38

3 Answers

4
for($i = 0; $i < 50000; $i++)
    $pdo->exec("INSERT INTO table_x (the_string) VALUES (UUID());");

As documented here

Edit: an alternative insert that strips the dashes

for($i = 0; $i < 50000; $i++)
    $pdo->exec("INSERT INTO table_x (the_string) VALUES (REPLACE(UUID(), '-', ''));");

Edit worth mentioning:

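On a single server, MySQL's UUID() returns a version 1 UUID: it encodes a high-precision timestamp, a clock sequence and a node identifier (typically the network card's MAC address, or a random number when none is available).

Thus collisions are only a practical concern across separate servers (for example when merging their data sets), and the node identifier is meant to keep even those values apart.

So long as the server does not generate UUIDs faster than the timestamp and clock sequence can tell them apart, it will not create duplicate values locally.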

As a matter of optimisation (speed of database reads) casting the UUID to a binary(16) field (and having the column in the table to match that datatype) is faster and more compact.
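
The same conversion can be done on the PHP side. This is only a minimal sketch, assuming the UUID arrives as the usual 36-character hyphenated string and the column is declared BINARY(16). The helper names uuidToBinary and binaryToUuid are hypothetical and not part of any library.

```php
<?php
// Hypothetical helpers: convert between the 36-character text form of a
// UUID and the 16 raw bytes that fit a BINARY(16) column.

function uuidToBinary(string $uuid): string
{
    $hex = str_replace('-', '', $uuid);
    if (strlen($hex) !== 32 || !ctype_xdigit($hex)) {
        throw new InvalidArgumentException('Not a valid UUID: ' . $uuid);
    }
    return hex2bin($hex); // 32 hex digits -> 16 bytes
}

function binaryToUuid(string $bytes): string
{
    if (strlen($bytes) !== 16) {
        throw new InvalidArgumentException('Expected exactly 16 bytes');
    }
    $hex = bin2hex($bytes);
    // Re-insert the dashes in the standard 8-4-4-4-12 layout.
    return sprintf(
        '%s-%s-%s-%s-%s',
        substr($hex, 0, 8),
        substr($hex, 8, 4),
        substr($hex, 12, 4),
        substr($hex, 16, 4),
        substr($hex, 20, 12)
    );
}
```

Inside MySQL the equivalent expression would be UNHEX(REPLACE(UUID(), '-', '')).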


13 Comments

The OP asked for 50000 unique strings; please change 5000 -> 50000.
Thanks for the perceptive observation.
+1, it's straightforward and simple. There are alternatives, but I think it would be a fairly good idea to go with this solution. Whoever wants a more general solution can read my comment at navnav's answer.
"only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%" (UUIDs on Wikipedia)
I've deleted my answer because I realized (after some research) that Mihai could be right about the dups. But I would just like to point out: you guys are talking about what the question required, yet this answer provides an unsuitable solution. The UUID() function being used inserts a string containing -. The OP required an alphanumeric string; nowhere did he/she mention special chars like hyphens. -1
2

This function generates an array of alphanumeric codes, using whatever character set and length you require. It never places the same character twice in a row within a code.

function GenCodes($howmany=50000) {

 // I, O, 0 and 1 are left out of the character set
 $charset = 'ABCDEFGHJKLMNPQRSTUVWXYZ23456789';
 $cl = strlen($charset);
 $codelength = 4;
 $result = array();

 for ($x = 1; $x <= $howmany; $x++) {
    $code = '';
    $lastchar = -1;  // index of the previous character, -1 = none yet
    for ($i = 1; $i <= $codelength; $i++) {
        // redraw until the index differs from the previous one;
        // the range starts at 0 so the first character of $charset is used too
        while (($l = rand(0, $cl-1)) == $lastchar)
            ;
        $code .= $charset[$l];
        $lastchar = $l;
    }
    $result[$x] = $code;
  }
  return $result;
}
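
GenCodes on its own can return the same code more than once, because every code is drawn independently. One way to end up with exactly the requested number of distinct codes before touching the database is to collect them as array keys. The following is a sketch rather than part of the original answer: the name genUniqueCodes is hypothetical, it assumes PHP 7 or later (random_int), and it drops the no-repeated-neighbour rule for brevity.

```php
<?php
// Hypothetical helper: returns $howmany distinct codes of $length characters.
function genUniqueCodes(
    int $howmany = 50000,
    int $length = 4,
    string $charset = 'ABCDEFGHJKLMNPQRSTUVWXYZ23456789'
): array {
    $cl = strlen($charset);
    // Guard against asking for more codes than can exist,
    // which would make the loop below run forever.
    if ($howmany > $cl ** $length) {
        throw new InvalidArgumentException('Not enough distinct codes available');
    }

    $seen = [];
    while (count($seen) < $howmany) {
        $code = '';
        for ($i = 0; $i < $length; $i++) {
            $code .= $charset[random_int(0, $cl - 1)];
        }
        $seen[$code] = true; // a repeated draw just overwrites the same key
    }

    // PHP turns all-digit keys such as "2345" into integers,
    // so cast every key back to a string.
    return array_map('strval', array_keys($seen));
}
```

With the default character set there are 32^4 = 1,048,576 possible codes, so 50000 distinct ones are found after only a few repeated draws.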

And this one saves them to a database table, skipping any code that is already stored, so the table ends up with no duplicates.

function SaveCodes($codes) {
 global $dbHost, $dbPort, $dbUser, $dbPass, $dbName;
 $insctr = 0;  // number of codes actually inserted
 $db = new db($dbHost, $dbPort, $dbUser, $dbPass, $dbName);
 foreach($codes as $code) {

    // look the code up first; codes only contain characters from the charset above
    $sql = "select code_id from codes where code_code='".$code."'";

    $result = $db->Query($sql);
    if ($db->NumRows() == 0) {  // don't insert an in-use code
        $sql = "insert into codes (code_code) values ('".$code."')";
        $result = $db->Query($sql);
        if ($result) {
            $insctr++;
        }
    }
 }
 return $insctr;  // may be lower than count($codes) if duplicates were skipped
}

7 Comments

As I commented on @navnav's example, doing this kind of thing without transactions is unsafe and has a higher failure rate.
The OP did not specify any specific database, so I have not assumed that transactions were available.
A solution should work for all possible cases. If it only works in some cases, the person who shares the answer should state when it works and why it fails in the others.
If repeated letters are allowed, a string of 10 letters over an alphabet of 30 different characters gives 30^10 combinations (I think, my math's a bit rusty).
But forbidding repeated letters leaves noticeably fewer possible combinations. For generating many unique strings that is bad: you'll hit collisions earlier, and the constraint makes you run out of possible strings faster.
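
The counts discussed here can be checked with a few lines. This is a sketch with a hypothetical function name, countCodes: for an alphabet of n characters and a length of k there are n^k strings in total, and n * (n-1)^(k-1) when only the same character twice in a row is forbidden, which is the rule GenCodes applies. It assumes the result fits in a 64-bit integer.

```php
<?php
// Hypothetical helper: how many codes of a given length exist.
function countCodes(int $alphabet, int $length, bool $allowAdjacentRepeats): int
{
    if ($alphabet < 1 || $length < 1) {
        throw new InvalidArgumentException('alphabet and length must be positive');
    }
    if ($allowAdjacentRepeats) {
        return $alphabet ** $length;                  // n^k
    }
    // First character: n choices; every later one: anything but its predecessor.
    return $alphabet * ($alphabet - 1) ** ($length - 1);
}

// The 32-character set and length 4 used by GenCodes:
echo countCodes(32, 4, true), "\n";   // 1048576
echo countCodes(32, 4, false), "\n";  // 953312
```

Both figures are well above 50000, so the restriction costs roughly 9% of the code space here without making the task infeasible.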
1

Use a string representation of numbers in base 100. The digits would be 0, 1, ..., 9, A, ..., Z, a, ..., z. These are only 62 digits, so you need 38 more characters, such as Á á É é Ó ó Ö ö Ő ő Ú ú Ű ű and so on. Each character of a four-character string must be one of your 100 possible digits. You start from 0000 and add a random number to it that is at least 1 and at most 2000 in base 10 (strictly, a maximum of 1999 guarantees that 50000 additions stay below 100^4 = 100,000,000). The result of the addition (represented as a string, with the abstract meaning of a number in base 100) is your first number. From the second number on, each number is generated by adding another random offset (between 1 and 2000) to the last generated number. Because the sequence is strictly increasing, your strings will be unique.
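
A minimal sketch of this idea follows. It makes one assumption that differs from the text above: it uses base 62 (0-9, A-Z, a-z) instead of 100 digits, so every character stays plain alphanumeric. The names BASE62_DIGITS, toFixedBase and genIncreasingCodes are hypothetical. With 62^4 = 14,776,336 values and 50000 codes, the largest safe offset works out to 295.

```php
<?php
const BASE62_DIGITS = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

// Write $n as exactly $length digits in the base given by $digits.
function toFixedBase(int $n, int $length, string $digits = BASE62_DIGITS): string
{
    if ($n < 0) {
        throw new InvalidArgumentException('Value must not be negative');
    }
    $base = strlen($digits);
    $out = '';
    for ($i = 0; $i < $length; $i++) {
        $out = $digits[$n % $base] . $out; // prepend the lowest digit
        $n = intdiv($n, $base);
    }
    if ($n !== 0) {
        throw new OverflowException('Value does not fit in ' . $length . ' digits');
    }
    return $out;
}

// Strictly increasing numbers give distinct strings without any comparison.
function genIncreasingCodes(int $howmany = 50000, int $length = 4): array
{
    $space = strlen(BASE62_DIGITS) ** $length;
    // Largest offset that cannot overflow even if it is drawn every time.
    $maxStep = intdiv($space - 1, $howmany);
    if ($maxStep < 1) {
        throw new InvalidArgumentException('Too many codes for this length');
    }
    $codes = [];
    $current = 0;
    for ($i = 0; $i < $howmany; $i++) {
        $current += random_int(1, $maxStep);
        $codes[] = toFixedBase($current, $length);
    }
    return $codes;
}
```

The codes come out in increasing order, so they are unique but not uniformly random; calling shuffle() on the result before inserting hides the order if that matters.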

Also you should generate batches for insertion, because it's better to have 50 inserts with 1000 rows to insert each than to have 50000 database requests.
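
The batching could look roughly like this. It is a sketch assuming a connected PDO instance and a storage engine that supports transactions; the table and column names codes and code_code are borrowed from the other answer, and buildBatchInsert and insertInBatches are hypothetical names. The table and column names are interpolated into the SQL, so they must come from trusted code, never from user input.

```php
<?php
// Build one multi-row INSERT with a "(?)" placeholder per value.
function buildBatchInsert(string $table, string $column, array $values): array
{
    if ($values === []) {
        throw new InvalidArgumentException('Nothing to insert');
    }
    $placeholders = implode(', ', array_fill(0, count($values), '(?)'));
    $sql = "INSERT INTO $table ($column) VALUES $placeholders";
    return [$sql, array_values($values)];
}

// Insert all codes in chunks of $batchSize rows inside one transaction.
function insertInBatches(PDO $pdo, array $codes, int $batchSize = 1000): int
{
    $inserted = 0;
    $pdo->beginTransaction();
    try {
        foreach (array_chunk($codes, $batchSize) as $chunk) {
            [$sql, $params] = buildBatchInsert('codes', 'code_code', $chunk);
            $stmt = $pdo->prepare($sql);
            $stmt->execute($params);
            $inserted += $stmt->rowCount();
        }
        $pdo->commit();
    } catch (Throwable $e) {
        $pdo->rollBack(); // leave the table untouched if any batch fails
        throw $e;
    }
    return $inserted;
}
```

For 50000 codes and the default batch size this issues 50 statements instead of 50000.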

6 Comments

In that case why not use PostgreSQL serial sequences? They are indeed integers, not strings (datatype BIGINT), but they have atomic incrementing and transactional safety, and they never reuse an id even if the transaction is rolled back after requesting it. They can be set up to increment by whichever value you like. Their storage and indexing are optimised, and they can be processed to be displayed as strings.
Good question. The answer is that the OP wants strings, so I generate 50000 strings. I'm guaranteeing that no duplicates will arise because I interpret them (but only logically) as numbers. Only a few functions are needed to handle the operations and we are good to go. This idea worked when I had a similar task months ago.
Also, you can't have 50000 unique decimal numbers in a four-character representation in base 10. But with this idea we can generate 50000 distinct alphanumeric strings without needing to compare them. Also, this is business logic; in my opinion it should be handled by the application server, not by the database server.
But PostgreSQL is an alternative, as you correctly pointed out.
PostgreSQL has custom datatypes, type inheritance and table inheritance. Custom types allow you to define functions for input and output of data, which means you can use optimised bigints in the background while inserting/retrieving textual strings. A compromise can be found between what the OP wants (I don't think he's following us too intently) and what the OP might in fact need. Your solution implemented in PgSQL would be pretty fine indeed.
