71

I have a table with two fields:

  1. id(UUID) that is primary Key and
  2. description (var255)

I want to insert random data with SQL sentence. I would like that description would be something random.

PS: I am using PostgreSQL.

5 Answers 5

100

I dont know exactly if this fits the requirement for a "random description", and it's not clear if you want to generate the full data: but, for example, this generates 10 records with consecutive ids and random texts:

  test=#  SELECT generate_series(1,10) AS id, md5(random()::text) AS descr;

     id |               descr
    ----+----------------------------------
      1 | 65c141ee1fdeb269d2e393cb1d3e1c09
      2 | 269638b9061149e9228d1b2718cb035e
      3 | 020bce01ba6a6623702c4da1bc6d556e
      4 | 18fad4813efe3dcdb388d7d8c4b6d3b4
      5 | a7859b3bcf7ff11f921ceef58dc1e5b5
      6 | 63691d4a20f7f23843503349c32aa08c
      7 | ca317278d40f2f3ac81224f6996d1c57
      8 | bb4a284e1c53775a02ebd6ec91bbb847
      9 | b444b5ea7966cd76174a618ec0bb9901
     10 | 800495c53976f60641fb4d486be61dc6
    (10 rows)
Sign up to request clarification or add additional context in comments.

2 Comments

Why not SELECT generate_series(1,10) AS id, md5(random()::text) AS descr;?
@AntonyHatchkins For no good reason. Fixed following your suggestion. Thanks
43

The following worked for me:

create table t_random as select s, md5(random()::text) from generate_Series(1,5) s;

1 Comment

Its useful to know how to do the time series in the from portion of the query. Thanks for the follow up.
26

Here it is a more elegant way using the latest features. I will use the Unix dictionary (/usr/share/dict/words) and copy it into my PostgreSQL data:

cp /usr/share/dict/words data/pg95/words.list

Then, you can easily create a ton of no sense description BUT searchable using dictionary words with the following steps:

1) Create table and function. getNArrayS gets all the elements in an array and teh number of times it needs to concatenate.

CREATE TABLE randomTable(id serial PRIMARY KEY, description text);

CREATE OR REPLACE FUNCTION getNArrayS(el text[], count int) RETURNS text AS $$
  SELECT string_agg(el[random()*(array_length(el,1)-1)+1], ' ') FROM generate_series(1,count) g(i)
$$
VOLATILE
LANGUAGE SQL;

Once you have all in place, run the insert using CTE:

WITH t(ray) AS(
  SELECT (string_to_array(pg_read_file('words.list')::text,E'\n')) 
) 
INSERT INTO randomTable(description)
SELECT getNArrayS(T.ray, 3) FROM T, generate_series(1,10000);

And now, select as usual:

postgres=# select * from randomtable limit 3;
 id |                 description                 
----+---------------------------------------------
  1 | ultracentenarian splenodiagnosis manurially
  2 | insequent monopolarity funipendulous
  3 | ruminate geodic unconcludable
(3 rows)

Comments

0

I assume sentance == statement? You could use perl or plperl as perl has some good random data generators. Check out perl CPAN module Data::Random to start.

Here's a sample of a perl script to generate some different random stuff taken from CPAN.

use Data::Random qw(:all);

  my @random_words = rand_words( size => 10 );

  my @random_chars = rand_chars( set => 'all', min => 5, max => 8 );

  my @random_set = rand_set( set => \@set, size => 5 );

  my $random_enum = rand_enum( set => \@set );

  my $random_date = rand_date();

  my $random_time = rand_time();

  my $random_datetime = rand_datetime();

  open(FILE, ">rand_image.png") or die $!;
  binmode(FILE);
  print FILE rand_image( bgcolor => [0, 0, 0] );
  close(FILE);

Comments

0

Generating random strings with the specified characters and length

The fact that md5(random()::text) only has hex digits and a fixed length saddens me a bit, which made me look for a more elegant function.

This question has some good answers: Generate random String in PostgreSQL I think I'll end up going with the following function:

CREATE OR REPLACE FUNCTION random_string(int) RETURNS TEXT as $$
select
string_agg(substr(characters, (random() * length(characters) + 1)::integer, 1), '') as random_word
from (values('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-    ')) as symbols(characters)
join generate_series(1, $1) on 1 = 1
$$ language sql;  

which can then be used as:

CREATE TABLE "mytable" ("i" INTEGER, "j" INTEGER, "s" TEXT, "t" TEXT);
INSERT INTO "mytable" SELECT i, i*2, random_string(10), random_string(20)
  FROM generate_series(1, 10) as s(i);

to produce data like:

 i  | j  |     s      |          t           
----+----+------------+----------------------
  1 |  2 | VgXg6 9c9T | 3SbRe0CmyDzL6tbIi s8
  2 |  4 | 6Nqx ynI7  |  nQhOZrnXI9HdwBlNzFB
  3 |  6 | FXGl4Wn4va | RZ2zdOLFqErJ44MhoQVm
  4 |  8 | 6i  IrvUVE | bY7Gta1 c8tVdDpMTiJ2
  5 | 10 | aU-30dwGVl | mzZPchylUL6FgQ04N Fy
  6 | 12 | yQMnXFPIt  |  1 fMb7wQE-gqCzkFoVD
  7 | 14 | Okj7XypLbQ | TosYftFbbRSD4GnQOELV
  8 | 16 | 1w P3zUW   |  kY1c kU0mJNURCCw1na
  9 | 18 | vYcFmC8W82 | R6Omwp KcL H1MNJKzR
 10 | 20 | v0I fPg32  | 8 cjc0q8DCOczxNNvF4N

Performance-wise it is not amazing however, with about 12s for 1 million rows. For comparison:

INSERT INTO "tmp" (i, j, s, t) SELECT i, i*2, md5(random()::text), md5(random()::text)

took 1.8 s, and:

INSERT INTO "tmp" (i, j, s, t) SELECT i, i*2, 'a', 'a'

took 0.5 s.

Generate a CSV with an external tool and import it with \copy FROM STDIN

Since every pure PostgreSQL option is a bit awkward and/or slow, perhaps the best option is to just use an external tool to generate the data, and then feed it to PostgreSQL as a CSV from stdin.

For example, given the table:

CREATE TABLE "mytable" ("i" INTEGER, "j" INTEGER, "s" TEXT, "t" TEXT);

a potentially slow but convenient option would be to just use a Python script such as:

generate_data.py

import csv
import random
import string
import sys

CHARS = string.ascii_letters + string.digits + ' '
def rands(n):
    return ''.join(random.choice(CHARS) for _ in range(n))
if len(sys.argv) > 1:
    n = int(sys.argv[1])
else:
    n = 10
writer = csv.writer(sys.stdout)
for i in range(n):
    writer.writerow((i, random.randrange(100), rands(10), rands(20)))

and then use it as:

python generate_data.py 10 |
psql mydb -c '\copy "mytable" FROM STDIN'

Related:

Tested on Ubuntu 24.04, PostgreSQL 16.6, Python 3.12.3, Lenovo ThinkPad P14s.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.