I'm training a custom model using a script in Amazon SageMaker and launching the job with the Python SDK. I want to pass some environment variables (like API keys or config flags) to the training job so they’re accessible inside the script via os.environ.
Here’s a simplified version of my code:
    from sagemaker.estimator import Estimator

    estimator = Estimator(
        image_uri='123456789012.dkr.ecr.us-west-2.amazonaws.com/my-custom-image:latest',
        role=role,
        instance_count=1,
        instance_type='ml.g5.xlarge',
        entry_point='train.py',
        source_dir='src',
        environment={
            'MY_API_KEY': 'abcdef123456',
            'DEBUG_MODE': 'true'
        }
    )
In my training script, I try to read the variable:
    import os

    api_key = os.environ.get('MY_API_KEY')
    print("API Key:", api_key)
Is this the correct way to pass environment variables to a SageMaker training job using the Python SDK? Are there any limitations or best practices I should be aware of, especially for sensitive information like API keys?
os.environ.get() is the standard way to read environment variables in Python, and your code looks OK. Some may say the only problem is that the value can be seen directly on the system by running the Linux command env. Another method is to keep the keys in a .env file and read it with a dedicated module, python-dotenv. This way you can have many projects with different keys. But if you push the code to GitHub or make a backup, you have to remember to remove this file, because otherwise someone could get your keys.
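As a rough illustration of reading these variables a bit more defensively, here is a minimal sketch using only the standard library. The helper names read_required_env, mask_secret and env_flag are hypothetical (they are not part of the SageMaker SDK or of python-dotenv). The sketch assumes you want the script to fail early when the key is missing, to avoid printing the full key (anything the script prints usually ends up in the job logs), and to turn the string 'true' into a real boolean, since environment variable values always arrive as strings.

```python
import os


def read_required_env(name: str) -> str:
    """Return the variable's value, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def mask_secret(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with asterisks."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def env_flag(name: str, default: bool = False) -> bool:
    """Interpret a variable as a boolean; unset variables yield `default`."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}
```

In train.py you could then call read_required_env('MY_API_KEY') and env_flag('DEBUG_MODE'), and log mask_secret(api_key) instead of the raw key.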