
I have a folder with a bunch of subfolders and files that I am fetching from a server and assigning to a variable. The folder structure is as follows:


└── main_folder
    ├── folder
    │   ├── folder
    │   │   ├── folder
    │   │   │   └── a.json
    │   │   ├── folder
    │   │   │   ├── folder
    │   │   │   │   └── b.json
    │   │   │   ├── folder
    │   │   │   │   └── c.json
    │   │   │   └── folder
    │   │   │       └── d.json
    │   │   └── folder
    │   │       └── e.json
    │   ├── folder
    │   │   └── f.json
    │   └── folder
    │       └── i.json

Now I want to upload this main_folder to an S3 bucket with the same structure using boto3. In boto3 there is no built-in way to upload a folder to S3.

I have seen the solutions at the links below, but they fetch the files from the local machine, whereas I am fetching the data from a server and assigning it to a variable:

Uploading a folder full of files to a specific folder in Amazon S3

upload a directory to s3 with boto

https://gist.github.com/feelinc/d1f541af4f31d09a2ec3

Has anybody faced the same type of issue?
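
To make it concrete, after fetching from the server the data ends up in memory, roughly like the sketch below (the variable, key, and bucket names are made up for illustration); I want each entry to become an object under the same relative key in the bucket:

import boto3

# Hypothetical illustration only: the fetched data sits in memory as a
# mapping of relative paths (matching the tree above) to file contents.
fetched_files = {
    "main_folder/folder/folder/folder/a.json": b'{"name": "a"}',
    "main_folder/folder/folder/folder/folder/b.json": b'{"name": "b"}',
    # ... remaining files from the tree
}

s3 = boto3.client("s3")
for relative_path, body in fetched_files.items():
    # put_object accepts in-memory bytes/str, so nothing needs to touch local disk;
    # using the relative path as the key preserves the folder structure in the bucket.
    s3.put_object(Bucket="my-bucket", Key=relative_path, Body=body)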

  • Do you specifically want to code it yourself, or would you be willing to use the AWS Command-Line Interface (CLI)? It can do it with one command (a sketch follows these comments). Commented Jun 3, 2019 at 12:17
  • I want to do it via code only @JohnRotenstein Commented Jun 3, 2019 at 12:23
  • It seems that you have data on "a server" and you want to put it in an Amazon S3 bucket. You could either run code on the "server" to send it to S3, or you could run code on another computer to retrieve it from the server and then upload it to S3. So, what precisely is your question? Can you tell us what problem you are facing? Commented Jun 3, 2019 at 21:05
  • Do you want something like stackoverflow.com/q/56428313/3220113 ? Commented Jun 5, 2019 at 14:12
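
For reference, the one-command CLI approach mentioned in the first comment would look roughly like this (the bucket name is made up):

# copy main_folder and everything under it, preserving the structure
aws s3 cp main_folder s3://my-bucket/main_folder --recursive

# or, equivalently for a first-time upload
aws s3 sync main_folder s3://my-bucket/main_folder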

3 Answers


Below is code that works for me, in pure Python 3.

""" upload one directory from the current working directory to aws """
from pathlib import Path
import os
import glob
import boto3

def upload_dir(localDir, awsInitDir, bucketName, tag, prefix='/'):
    """
    from the current working directory, upload 'localDir' with all of its contents (files and subdirectories)
    to an AWS bucket
    Parameters
    ----------
    localDir :   local directory to be uploaded, relative to the current working directory
    awsInitDir : prefix 'directory' in aws
    bucketName : bucket in aws
    tag :        tag to select files, like *png
                 NOTE: if you use tag it must be given like --tag '*txt', in some quotation marks... for argparse
    prefix :     to remove initial '/' from file names

    Returns
    -------
    None
    """
    s3 = boto3.resource('s3')
    cwd = str(Path.cwd())
    p = Path(os.path.join(Path.cwd(), localDir))
    mydirs = list(p.glob('**'))
    for mydir in mydirs:
        fileNames = glob.glob(os.path.join(mydir, tag))
        fileNames = [f for f in fileNames if not Path(f).is_dir()]
        for fileName in fileNames:
            fileName = str(fileName).replace(cwd, '')
            if fileName.startswith(prefix):  # only modify the text if it starts with the prefix
                fileName = fileName.replace(prefix, "", 1) # remove one instance of prefix
            print(f"fileName {fileName}")

            awsPath = os.path.join(awsInitDir, str(fileName))
            s3.meta.client.upload_file(fileName, bucketName, awsPath)

if __name__ == '__main__':
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument("--localDir", help="which dir to upload to aws")
    parser.add_argument("--bucketName", help="to which bucket to upload in aws")
    parser.add_argument("--awsInitDir", help="to which 'directory' in aws")
    parser.add_argument("--tag", help="some tag to select files, like *png", default='*')
    args = parser.parse_args()

    # cd whatever is above your dir, then run it
    # (below assuming this script is in ~/git/hu-libraries/netRoutines/uploadDir2Aws.py )
    # in the example below you have directory structure ~/Downloads/IO
    # you copy full directory of ~/Downloads/IO to aws bucket markus1 to 'directory' 2020/IO
    # NOTE: if you use tag it must be given like --tag '*txt', in some quotation marks...

    # cd ~/Downloads
    # python ~/git/hu-libraries/netRoutines/uploadDir2Aws.py --localDir IO --bucketName markus1 --awsInitDir 2020
    upload_dir(localDir=args.localDir, bucketName=args.bucketName,
               awsInitDir=args.awsInitDir, tag=args.tag)

1 Comment

+9999 this was the quickest blessing of my life. Couple quick changes and it worked like a charm

I had to solve this problem myself, so I thought I would include a snippet of my code here.

I also had the requirement to filter for specific file types, and upload the directory contents only (vs the directory itself).

import logging
import boto3

from pathlib import Path
from typing import Union


log = logging.getLogger(__name__)


def upload_dir(
    self,
    local_dir: Union[str, Path],
    s3_path: str = "/",
    file_type: str = "",
    contents_only: bool = False,
) -> dict:
    """
    Upload the content of a local directory to a bucket path.

    Args:
        local_dir (Union[str, Path]): Directory to upload files from.
        s3_path (str, optional): The path within the bucket to upload to.
            If omitted, the bucket root is used.
        file_type (str, optional): Upload files with extension only, e.g. txt.
        contents_only (bool): Used to copy only the directory contents to the
            specified path, not the directory itself.

    Returns:
        dict: key:value pair of file_name:upload_status.
            upload_status True if uploaded, False if failed.
    """
    resource = boto3.resource(
        "s3",
        aws_access_key_id="xxx",
        aws_secret_access_key="xxx",
        endpoint_url="xxx",
        region_name="xxx",
    )

    status_dict = {}

    local_dir_path = Path(local_dir).resolve()
    log.debug(f"Directory to upload: {local_dir_path}")

    all_subdirs = local_dir_path.glob("**")

    for dir_path in all_subdirs:

        log.debug(f"Searching for files in directory: {dir_path}")
        file_names = dir_path.glob(f"*{('.' + file_type) if file_type else ''}")

        # Only return valid files
        file_names = [f for f in file_names if f.is_file()]
        log.debug(f"Files found: {list(file_names)}")

        for file_name in file_names:
            s3_key = str(Path(s3_path) / file_name.relative_to(
                local_dir_path if contents_only else local_dir_path.parent
            ))
            log.debug(f"S3 key to upload: {s3_key}")
            status_dict[str(file_name)] = self.upload_file(s3_key, file_name)

    return status_dict
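
The snippet above is written as a method: it takes self and calls self.upload_file, which is not shown. A minimal sketch of what the surrounding class might look like (the class name and the single-file upload logic are my own assumptions, not part of the original answer):

import logging
from pathlib import Path
from typing import Union

import boto3
from botocore.exceptions import ClientError

log = logging.getLogger(__name__)


class S3Uploader:
    """Hypothetical wrapper class that the upload_dir method above could be attached to."""

    def __init__(self, bucket_name: str):
        self.bucket_name = bucket_name
        self.resource = boto3.resource("s3")  # credentials come from the environment here

    def upload_file(self, s3_key: str, file_name: Union[str, Path]) -> bool:
        """Upload a single file and report success/failure, as upload_dir expects."""
        try:
            self.resource.meta.client.upload_file(str(file_name), self.bucket_name, s3_key)
        except ClientError as e:
            log.error(e)
            return False
        return True

    # upload_dir from the answer above would live here as a method of this class.

With that in place, something like S3Uploader("my-bucket").upload_dir("main_folder", s3_path="backups", file_type="json") would upload only the .json files and recreate the main_folder layout under backups/ in the bucket.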


Well, I like recursive code, and this is an easy case:

import os
import boto3
from botocore.exceptions import ClientError
from os.path import isfile

BUCKET_NAME="my_bucket"

def upload_file(file_name, object_name=None):
  if object_name is None:
    object_name = os.path.basename(file_name)
  s3_client = boto3.client('s3')
  try:
    s3_client.upload_file(file_name, BUCKET_NAME, object_name)
  except ClientError as e:
    print(e)
    return False
  return True

def upload_dir_recursive(localDir, awsInitDir, space=""):
  print(space+"Processing dir: "+localDir)
  for file in os.listdir(localDir):
    file_path = localDir+file
    if file != "logs":
      if isfile(file_path):
        upload_file(file_path, awsInitDir+file)
      else:
        upload_dir_recursive(file_path+'/', awsInitDir+file+'/', space+"  ")
  print(space+"... Done")

if __name__ == '__main__':
  import argparse
  parser = argparse.ArgumentParser()
  parser.add_argument("--localDir", help="which dir to upload to aws")
  parser.add_argument("--awsInitDir", help="to which 'directory' in aws")
  args = parser.parse_args()
  upload_dir_recursive(localDir=args.localDir, awsInitDir=args.awsInitDir)

Note that upload_file is taken straight from the boto3 documentation. Compared with the other answers, this solution will go down all subdirectories, no matter how nested. I did not filter for specific file types, but it wouldn't be hard to add.
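
For example, assuming the script above is saved as upload_dir_recursive.py (the filename, bucket prefix, and paths are made up), it could be run like this; note the trailing slashes, which matter because the code joins paths with plain string concatenation rather than os.path.join:

# cd into the directory that contains main_folder, then:
# python upload_dir_recursive.py --localDir main_folder/ --awsInitDir backups/main_folder/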
