I have a table with two fields:
- id(UUID) that is primary Key and
- description (
var255)
I want to insert random data with SQL sentence. I would like that description would be something random.
PS: I am using PostgreSQL.
I dont know exactly if this fits the requirement for a "random description", and it's not clear if you want to generate the full data: but, for example, this generates 10 records with consecutive ids and random texts:
test=# SELECT generate_series(1,10) AS id, md5(random()::text) AS descr;
id | descr
----+----------------------------------
1 | 65c141ee1fdeb269d2e393cb1d3e1c09
2 | 269638b9061149e9228d1b2718cb035e
3 | 020bce01ba6a6623702c4da1bc6d556e
4 | 18fad4813efe3dcdb388d7d8c4b6d3b4
5 | a7859b3bcf7ff11f921ceef58dc1e5b5
6 | 63691d4a20f7f23843503349c32aa08c
7 | ca317278d40f2f3ac81224f6996d1c57
8 | bb4a284e1c53775a02ebd6ec91bbb847
9 | b444b5ea7966cd76174a618ec0bb9901
10 | 800495c53976f60641fb4d486be61dc6
(10 rows)
SELECT generate_series(1,10) AS id, md5(random()::text) AS descr;?The following worked for me:
create table t_random as select s, md5(random()::text) from generate_Series(1,5) s;
Here it is a more elegant way using the latest features. I will use the Unix dictionary (/usr/share/dict/words) and copy it into my PostgreSQL data:
cp /usr/share/dict/words data/pg95/words.list
Then, you can easily create a ton of no sense description BUT searchable using dictionary words with the following steps:
1) Create table and function. getNArrayS gets all the elements in an array and teh number of times it needs to concatenate.
CREATE TABLE randomTable(id serial PRIMARY KEY, description text);
CREATE OR REPLACE FUNCTION getNArrayS(el text[], count int) RETURNS text AS $$
SELECT string_agg(el[random()*(array_length(el,1)-1)+1], ' ') FROM generate_series(1,count) g(i)
$$
VOLATILE
LANGUAGE SQL;
Once you have all in place, run the insert using CTE:
WITH t(ray) AS(
SELECT (string_to_array(pg_read_file('words.list')::text,E'\n'))
)
INSERT INTO randomTable(description)
SELECT getNArrayS(T.ray, 3) FROM T, generate_series(1,10000);
And now, select as usual:
postgres=# select * from randomtable limit 3;
id | description
----+---------------------------------------------
1 | ultracentenarian splenodiagnosis manurially
2 | insequent monopolarity funipendulous
3 | ruminate geodic unconcludable
(3 rows)
I assume sentance == statement? You could use perl or plperl as perl has some good random data generators. Check out perl CPAN module Data::Random to start.
Here's a sample of a perl script to generate some different random stuff taken from CPAN.
use Data::Random qw(:all);
my @random_words = rand_words( size => 10 );
my @random_chars = rand_chars( set => 'all', min => 5, max => 8 );
my @random_set = rand_set( set => \@set, size => 5 );
my $random_enum = rand_enum( set => \@set );
my $random_date = rand_date();
my $random_time = rand_time();
my $random_datetime = rand_datetime();
open(FILE, ">rand_image.png") or die $!;
binmode(FILE);
print FILE rand_image( bgcolor => [0, 0, 0] );
close(FILE);
Generating random strings with the specified characters and length
The fact that md5(random()::text) only has hex digits and a fixed length saddens me a bit, which made me look for a more elegant function.
This question has some good answers: Generate random String in PostgreSQL I think I'll end up going with the following function:
CREATE OR REPLACE FUNCTION random_string(int) RETURNS TEXT as $$
select
string_agg(substr(characters, (random() * length(characters) + 1)::integer, 1), '') as random_word
from (values('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789- ')) as symbols(characters)
join generate_series(1, $1) on 1 = 1
$$ language sql;
which can then be used as:
CREATE TABLE "mytable" ("i" INTEGER, "j" INTEGER, "s" TEXT, "t" TEXT);
INSERT INTO "mytable" SELECT i, i*2, random_string(10), random_string(20)
FROM generate_series(1, 10) as s(i);
to produce data like:
i | j | s | t
----+----+------------+----------------------
1 | 2 | VgXg6 9c9T | 3SbRe0CmyDzL6tbIi s8
2 | 4 | 6Nqx ynI7 | nQhOZrnXI9HdwBlNzFB
3 | 6 | FXGl4Wn4va | RZ2zdOLFqErJ44MhoQVm
4 | 8 | 6i IrvUVE | bY7Gta1 c8tVdDpMTiJ2
5 | 10 | aU-30dwGVl | mzZPchylUL6FgQ04N Fy
6 | 12 | yQMnXFPIt | 1 fMb7wQE-gqCzkFoVD
7 | 14 | Okj7XypLbQ | TosYftFbbRSD4GnQOELV
8 | 16 | 1w P3zUW | kY1c kU0mJNURCCw1na
9 | 18 | vYcFmC8W82 | R6Omwp KcL H1MNJKzR
10 | 20 | v0I fPg32 | 8 cjc0q8DCOczxNNvF4N
Performance-wise it is not amazing however, with about 12s for 1 million rows. For comparison:
INSERT INTO "tmp" (i, j, s, t) SELECT i, i*2, md5(random()::text), md5(random()::text)
took 1.8 s, and:
INSERT INTO "tmp" (i, j, s, t) SELECT i, i*2, 'a', 'a'
took 0.5 s.
Generate a CSV with an external tool and import it with \copy FROM STDIN
Since every pure PostgreSQL option is a bit awkward and/or slow, perhaps the best option is to just use an external tool to generate the data, and then feed it to PostgreSQL as a CSV from stdin.
For example, given the table:
CREATE TABLE "mytable" ("i" INTEGER, "j" INTEGER, "s" TEXT, "t" TEXT);
a potentially slow but convenient option would be to just use a Python script such as:
generate_data.py
import csv
import random
import string
import sys
CHARS = string.ascii_letters + string.digits + ' '
def rands(n):
return ''.join(random.choice(CHARS) for _ in range(n))
if len(sys.argv) > 1:
n = int(sys.argv[1])
else:
n = 10
writer = csv.writer(sys.stdout)
for i in range(n):
writer.writerow((i, random.randrange(100), rands(10), rands(20)))
and then use it as:
python generate_data.py 10 |
psql mydb -c '\copy "mytable" FROM STDIN'
Related:
Tested on Ubuntu 24.04, PostgreSQL 16.6, Python 3.12.3, Lenovo ThinkPad P14s.