In Python, what is a good, or the best way to generate some random text to prepend to a file(name) that I'm saving to a server, just to make sure it does not overwrite. Thank you!
15 Answers
You could use the UUID module for generating a random string:
import uuid
filename = str(uuid.uuid4())
This is a valid choice, given that an UUID generator is extremely unlikely to produce a duplicate identifier (a file name, in this case):
Only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%. The probability of one duplicate would be about 50% if every person on earth owns 600 million UUIDs.
3 Comments
uuid.uuid4().hex to get an hex string without dashes (-).Python has facilities to generate temporary file names, see http://docs.python.org/library/tempfile.html. For instance:
In [4]: import tempfile
Each call to tempfile.NamedTemporaryFile() results in a different temp file, and its name can be accessed with the .name attribute, e.g.:
In [5]: tf = tempfile.NamedTemporaryFile()
In [6]: tf.name
Out[6]: 'c:\\blabla\\locals~1\\temp\\tmptecp3i'
In [7]: tf = tempfile.NamedTemporaryFile()
In [8]: tf.name
Out[8]: 'c:\\blabla\\locals~1\\temp\\tmpr8vvme'
Once you have the unique filename it can be used like any regular file. Note: By default the file will be deleted when it is
closed. However, if the delete parameter is False, the file is not
automatically deleted.
Full parameter set:
tempfile.NamedTemporaryFile([mode='w+b'[, bufsize=-1[, suffix=''[, prefix='tmp'[, dir=None[, delete=True]]]]]])
it is also possible to specify the prefix for the temporary file (as one of the various parameters that can be supplied during the file creation):
In [9]: tf = tempfile.NamedTemporaryFile(prefix="zz")
In [10]: tf.name
Out[10]: 'c:\\blabla\\locals~1\\temp\\zzrc3pzk'
Additional examples for working with temporary files can be found here
4 Comments
a common approach is to add a timestamp as a prefix/suffix to the filename to have some temporal relation to the file. If you need more uniqueness you can still add a random string to this.
import datetime
basename = "mylogfile"
suffix = datetime.datetime.now().strftime("%y%m%d_%H%M%S")
filename = "_".join([basename, suffix]) # e.g. 'mylogfile_120508_171442'
4 Comments
1. Test if file exists, 2. create file. If another process interrupts yours between steps 1 and 2, and creates the file, when your code resumes it will overwrite the other process' file.tempfile module, which handles this for you. :)...strftime("%y%m%d_%H%M%S%f")The OP requested to create random filenames not random files. Times and UUIDs can collide. If you are working on a single machine (not a shared filesystem) and your process/thread will not stomp on itself, use os.getpid() to get your own PID and use this as an element of a unique filename. Other processes would obviously not get the same PID. If you are multithreaded, get the thread id. If you have other aspects of your code in which a single thread or process could generate multiple different temp files, you might need to use another technique. A rolling index can work (if you aren't keeping them so long or using so many files you would worry about rollover). Keeping a global hash/index to "active" files would suffice in that case.
So sorry for the longwinded explanation, but it does depend on your exact usage.
Comments
If you want to preserve the original filename as a part of the new filename, unique prefixes of uniform length can be generated by using MD5 hashes of the current time:
from hashlib import md5
from time import localtime
def add_prefix(filename):
prefix = md5(str(localtime()).encode('utf-8')).hexdigest()
return f"{prefix}_{filename}"
Calls to the add_prefix('style.css') generates sequence like:
a38ff35794ae366e442a0606e67035ba_style.css
7a5f8289323b0ebfdbc7c840ad3cb67b_style.css
2 Comments
Adding my two cents here:
In [19]: tempfile.mkstemp('.png', 'bingo', '/tmp')[1]
Out[19]: '/tmp/bingoy6s3_k.png'
According to the python doc for tempfile.mkstemp, it creates a temporary file in the most secure manner possible. Please note that the file will exist after this call:
In [20]: os.path.exists(tempfile.mkstemp('.png', 'bingo', '/tmp')[1])
Out[20]: True
Comments
As date and time both change after each second so you need to concatenate data-time with uuid (Universally Unique Identifiers) here is the complete code for your answer
import uuid
imageName = '{}{:-%Y%m%d%H%M%S}.jpeg'.format(str(uuid.uuid4().hex), datetime.now())
1 Comment
In some other cases if you need the random file name to be sensible, use the faker module. This will produce "sensible" file names with common extension. This method might have name collision after some time. I think prepend with uuid is probably better.
pip install faker
Then,
from faker import Faker
fake = Faker()
for _ in range(10):
print(fake.file_name())
Link to faker documentation: https://faker.readthedocs.io/en/master/index.html
Comments
I found considerable benefit to the following solution.
I needed to store hundreds of thousands of images in a directory on a remote server. There was an issue with space on the server we were using and many of these images would often be duplicates of each other, so I found a handy solution is using the SHA of the image to generate the filename, and store the file using that name. If it overwrites a file, that means that file was a duplicate so it's okay that it gets overwritten. SHA256 is considered to be physically incapable of producing collisions, so you can use it forever without worry.
import hashlib
image_path = '/tmp/my_image.png'
with open(image_path, 'rb') as f:
image_bytes = f.read()
file_name = f'{hashlib.sha256(image_bytes).hexdigest()}.png'
with open(filename, 'wb') as ff:
ff.write(image_bytes)
This method is only useful if the following apply to your situation.
- You do not care about file creation date.
- You do not rely on duplicate files for your system to work.
- You do not need to search through the images using their names (that would suck)
This method will also be of little benefit to you if the files you are saving almost certainly will never produce the same SHA, but it can still be useful if you are saving several thousand files and need to guarantee that none of them overwrite(besides duplicates).
Comments
I personally prefer to have my text to not be only random/unique but beautiful as well, that's why I like the hashids lib, which generates nice looking random text from integers. Can installed through
pip install hashids
Snippet:
import hashids
hashids = hashids.Hashids(salt="this is my salt", )
print hashids.encode(1, 2, 3)
>>> laHquq
Short Description:
Hashids is a small open-source library that generates short, unique, non-sequential ids from numbers.
Comments
I am not sure why no-one mentioned this, but one can get the time down to microseconds, so, one solution is:
image_name = datetime.datetime.now().strftime('%Y_%m_%d_%H-%M-%S.%f')
image_name = image_name+".jpeg" #or whatever extension is needed
The output would look something like:
2023_03_21_11-08-08.734965.jpeg
This has the advantage that files can be ordered by time and/or date since the ordering would be the same.
Comments
You could use the random package:
import random
file = random.random()
