20

The data_files parameter for setup takes input in the following format:

setup(...
    data_files = [(target_directory, [list of files to be put there])]
    ....)

Is there a way for me to specify an entire directory of data instead, so I don't have to name each file individually and update it as I change implementation in my project?

I tried os.listdir(), but I don't know how to use it with relative paths. I couldn't use os.getcwd() or os.path.realpath(__file__), since those don't point to my repository root correctly.

6 Answers

23

karelv has the right idea, but to answer the stated question more directly:

from glob import glob
from setuptools import setup

setup(
    #...
    data_files = [
        # files directly in source_dir only (not recursive)
        ('target_directory_1', glob('source_dir/*')),
        # recursive: descends into sub-folders (the match list also contains the sub-folders themselves)
        ('target_directory_2', glob('nested_source_dir/**/*', recursive=True)),
        # etc...
    ],
    #...
)
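As another answer on this page points out, a recursive glob also returns the sub-directories themselves, which setup() cannot install as files. A minimal sketch of a filter follows, assuming a hypothetical helper named `files_only` (the name is illustrative, not part of setuptools):

```python
import os
from glob import glob


def files_only(pattern, recursive=False):
    """Return the glob matches for `pattern` that are regular files.

    `files_only` is a hypothetical helper name, not part of setuptools.
    """
    matches = glob(pattern, recursive=recursive)
    # Drop directories, which a '**' pattern would otherwise include.
    return sorted(p for p in matches if os.path.isfile(p))
```

It could be used as ('target_directory_2', files_only('nested_source_dir/**/*', recursive=True)). Note that every matched file still lands directly in target_directory_2, so the sub-directory layout is flattened.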

3 Comments

The first glob('source_dir/**') should be called with recursive=True, or it only returns the top level: docs.python.org/3/library/glob.html#glob.glob
A good solution. I had been searching elsewhere for a way to include a data directory by pattern. But my data directory looks like 'a/file1.yaml', 'a/b/file2.yaml', so to cover both levels I have to do two globs: glob('data/*.yaml') and glob('data/*/*.yaml'). I wish a single glob pattern could scan both the parent directory and its sub-directories.
use iglob instead of glob to maintain the directory structure
4
import glob

for filename in glob.iglob('inner_dir/**/*', recursive=True):
    print(filename)

Doing this, you iterate over every matching path (files and sub-directories alike), each relative to the current directory.
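Because the recursive pattern matches directories as well as files, and a single data_files entry puts all of its files into one target, one way to keep the layout is to group the matches by their parent directory. A rough sketch, assuming a hypothetical helper `group_by_directory` and a `source_dir` given relative to the directory setup.py runs from:

```python
import glob
import os
from collections import defaultdict


def group_by_directory(source_dir, target_root):
    """Build data_files entries that mirror the layout of `source_dir`.

    Hypothetical helper: the name and arguments are illustrative.
    `source_dir` is assumed to be a relative path.
    """
    grouped = defaultdict(list)
    pattern = os.path.join(source_dir, '**', '*')
    for path in glob.iglob(pattern, recursive=True):
        if os.path.isfile(path):
            # Key each file by its install directory under target_root.
            target = os.path.join(target_root, os.path.dirname(path))
            grouped[target].append(path)
    return sorted((target, sorted(files)) for target, files in grouped.items())
```

The result can be passed as data_files=group_by_directory('inner_dir', 'share/<my-app>'), where the target root is a placeholder. By default glob skips names starting with a dot, so hidden files are left out.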


3

Based on @Paul Mundt's answer, which I had to rewrite, this is what worked for me:

import os

# Iterate through all the files and subdirectories
# to build the data files array (one entry per directory)
def generate_data_files(share_path, dir):
    data_files = []

    for path, _, files in os.walk(dir):
        # Install this directory's non-hidden files under share_path + path
        list_entry = (share_path + path, [os.path.join(path, f) for f in files if not f.startswith('.')])
        data_files.append(list_entry)

    return data_files

Example usage:

data_files=[
        ('share/' + package_name + '/maps', glob('maps/*')),
        ('share/' + package_name + '/world', glob('world/*'))
    ] + generate_data_files('share/' + package_name + '/', 'meshes'),

This will add the meshes folder under share with all the files and subfolders recursively.


2

I ran into the same problem with directories containing nested subdirectories. The glob solutions didn't work, as they would include the directory in the list, which setup would blow up on, and in cases where I excluded matched directories, it still dumped them all in the same directory, which is not what I wanted, either. I ended up just falling back on os.walk():

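import os
import sys
from itertools import chain

def generate_data_files():
    data_files = []
    data_dirs = ('data', 'plugins')
    # Walk every data directory in turn, one data_files entry per directory visited
    for path, dirs, files in chain.from_iterable(os.walk(data_dir) for data_dir in data_dirs):
        install_dir = os.path.join(sys.prefix, 'share/<my-app>/' + path)
        # Skip hidden files; keep the source path relative to setup.py
        list_entry = (install_dir, [os.path.join(path, f) for f in files if not f.startswith('.')])
        data_files.append(list_entry)

    return data_files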

and then setting data_files=generate_data_files() in the setup() block.


1

I don't know how to do that with relative paths

You need to get the path of the directory first, so...

Say you have this directory structure:

cur_directory
|- setup.py
|- inner_dir
   |- file2.py

To get the directory of the current file (in this case setup.py), use this:

cur_directory_path = os.path.abspath(os.path.dirname(__file__))

Then, to get a directory path relative to cur_directory, just join some other directories, e.g.:

inner_dir_path = os.path.join(cur_directory_path, 'inner_dir')

If you want to move up a directory, just use "..", for example:

parent_dir_path = os.path.join(cur_directory_path, '..')

Once you have that path, you can call os.listdir() on it.
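Putting those pieces together, a minimal sketch of listing a directory relative to setup.py might look like this. The helper name `list_files` is hypothetical; it returns paths relative to `base_dir`, on the assumption that data_files entries are written relative to setup.py:

```python
import os


def list_files(base_dir, subdir):
    """Return paths, relative to `base_dir`, of the regular files directly in `subdir`.

    Hypothetical helper; in setup.py, `base_dir` would be the directory
    computed from __file__.
    """
    full_dir = os.path.join(base_dir, subdir)
    return sorted(
        os.path.join(subdir, name)
        for name in os.listdir(full_dir)
        # os.listdir also returns sub-directories, so keep regular files only.
        if os.path.isfile(os.path.join(full_dir, name))
    )
```

In setup.py this could be used as data_files=[('target_directory', list_files(cur_directory_path, 'inner_dir'))]. It is not recursive: files in sub-directories of inner_dir are not included.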

For completeness:

If you want the path of a file, in this case "file2.py" relative to setup.py, you could do:

file2_path = os.path.join(cur_directory_path, 'inner_dir', 'file2.py') 


-1

With nested subdirectories, if you want to preserve the original directory structure, you can use os.walk(), as proposed in another answer.

However, an easier solution is the pbr library, which extends setuptools. See its documentation for how to use it to install an entire directory structure:

https://docs.openstack.org/pbr/latest/user/using.html#files

1 Comment

While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - From Review
