
I am working on characterizing an SSD drive to determine max TBW / life expectancy.

Currently I am using Bash to generate 500MB files with random (non-zero) content:

dd if=<(openssl enc -aes-128-cbc -pass pass:"$(dd if=/dev/urandom bs=128 count=1 2>/dev/null | base64)" -nosalt < /dev/zero) of=/media/m2_adv3d/abc${stamp1} bs=1MB count=500 iflag=fullblock&

Note: ${stamp1} is a time stamp used to ensure unique file names.

I am looking to accomplish the same result in Python but am not finding efficient ways to do this (generate the file quickly).

Looking for suggestions.

Thanks!


Update

I have been experimenting with the following and seem to have achieved a 2-second write; the files appear to be random and different:

import os

# open in binary mode ("wb"); text mode would raise TypeError on the bytes from os.urandom
newfile = open("testfile.001", "wb")
newfile.write(os.urandom(500000000))    # generate 500MB random content file
newfile.close()

A little skeptical that this is truly good enough to stress an SSD. Basically I am going to loop this indefinitely; once the drive is full, delete the oldest file and write a new one, and collect SMART data every 500 files written to trend the aging. A rough sketch of that loop is below.
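Something along these lines is what I have in mind for the full loop; the mount point, file-name scheme, and the smartctl call are placeholders, not necessarily what I will end up running:

import os
import subprocess
import time

TARGET_DIR = "/media/m2_adv3d"      # placeholder mount point for the drive under test
FILE_SIZE = 500 * 1000 * 1000       # 500MB per file
SMART_EVERY = 500                   # collect SMART data every 500 files written

def write_random_file(path):
    """Write one 500MB file of random content, as in the snippet above."""
    with open(path, "wb") as f:
        f.write(os.urandom(FILE_SIZE))

def oldest_file(directory):
    """Return the oldest generated file in directory, or None if there are none."""
    files = [os.path.join(directory, f) for f in os.listdir(directory)
             if f.startswith("abc")]
    return min(files, key=os.path.getmtime) if files else None

written = 0
while True:
    stamp = time.strftime("%Y%m%d%H%M%S")
    path = os.path.join(TARGET_DIR, f"abc{stamp}_{written}")
    try:
        write_random_file(path)
    except OSError:
        # drive full: drop the partial file and the oldest complete file, then retry
        if os.path.exists(path):
            os.remove(path)
        victim = oldest_file(TARGET_DIR)
        if victim:
            os.remove(victim)
        continue
    written += 1
    if written % SMART_EVERY == 0:
        # placeholder device name; adjust for the actual drive under test
        subprocess.run(["smartctl", "-A", "/dev/nvme0"])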

Thoughts?

Thanks,

Dan.

4 Comments

  • Perhaps if you edited the question to show the code you would like sped up, people will suggest improvements. Hard to answer without seeing the existing code. Commented Feb 28, 2019 at 16:22
  • Thank you for the feedback @holdenweb; updated with code. Commented Feb 28, 2019 at 17:16
  • One thought: since the IO operation is bound to take time, a threaded or asynchronous solution that allows a new random block to be generated while the last one is being written might speed things up (see the sketch after these comments). Commented Mar 1, 2019 at 12:42
  • @holdenweb; thank you for the suggestions. Tried threading and took a performance hit: while I can consistently write 500MB files at 3 ~ 5 seconds apiece (linear), when I attempt to write two in parallel using threads I am hitting between 10 ~ 17 seconds, mostly towards the 17-second end. Will post the code for reference and close this one off. Thanks! Commented Mar 5, 2019 at 23:17
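For reference, a minimal sketch of the double-buffering idea from the comment above (not necessarily the exact code tried here): a worker thread pre-generates random blocks into a small queue while the main thread writes them out. The block size, queue depth, and file name are placeholder choices, and as noted in the last comment this approach did not improve throughput in practice.

import os
import queue
import threading

BLOCK = 4 * 1000 * 1000             # 4MB random blocks
BLOCKS_PER_FILE = 125               # 125 * 4MB = 500MB
PREFETCH = 4                        # generate up to 4 blocks ahead of the writer

def producer(q, total_blocks):
    """Pre-generate random blocks so the writer does not wait on os.urandom."""
    for _ in range(total_blocks):
        q.put(os.urandom(BLOCK))
    q.put(None)                     # sentinel: nothing left to write

q = queue.Queue(maxsize=PREFETCH)
t = threading.Thread(target=producer, args=(q, BLOCKS_PER_FILE))
t.start()

with open("testfile.002", "wb") as f:   # placeholder file name
    while True:
        block = q.get()
        if block is None:
            break
        f.write(block)

t.join()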

2 Answers


The os.urandom option works best for generating large random files.
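For example, one way to use it without holding the whole 500MB in memory at once is to write the os.urandom output in chunks; the file name and chunk size below are arbitrary illustrative choices:

import os

CHUNK = 8 * 1000 * 1000          # 8MB of fresh random data per write
TOTAL = 500 * 1000 * 1000        # 500MB target file size

with open("random.bin", "wb") as f:   # placeholder file name
    remaining = TOTAL
    while remaining > 0:
        n = min(CHUNK, remaining)
        f.write(os.urandom(n))        # new random bytes for every chunk
        remaining -= n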




You could try something as easy as this.

import pandas as pd
import numpy as np

rows = 100000
cols = 10000

table_size = [rows, cols]

path = "testfile.csv"    # output destination (any path on the target drive)

x = np.ones(table_size)  # note: 100000 x 10000 float64 values is roughly 8GB in memory
pd.DataFrame(x).to_csv(path)

You can adjust the table size to make the file larger or smaller. I am not sure whether this is more or less efficient than what you are already trying.

1 Comment

Trying a different approach (see edited question above).
