Boto3 – Amazon S3 As Python Object Store
Use Amazon Simple Storage Service (S3) as an object store to manage Python data structures.
1. Introduction
Amazon S3 is widely used to store and share files across the internet. At its core it is a simple key-value store that can hold any type of object, regardless of the programming language that created it (Java, JavaScript, Python, and so on). Because DynamoDB limits an item to 400 KB, AWS recommends storing larger items in S3. This article focuses on using S3 as an object store from Python.
2. Pre-requisites
Boto3 is the official AWS SDK for accessing AWS services from Python code. Please ensure that Boto3 and awscli are installed on your system.
$pip install boto3
$pip install awscli
Next, configure your AWS credentials, either with the “aws configure” command or by setting the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. Please DO NOT hard code your AWS keys inside your Python program.
For more details, refer to AWS CLI Setup and Boto3 Credentials in the References section.
Configure the AWS credentials using the command:
$aws configure
Do a quick check to ensure you can reach AWS.
$aws s3 ls
The above command should list the S3 buckets created in your AWS account. The account is selected based on the configured credentials. If multiple AWS accounts are configured, use the “--profile” option of the AWS CLI. If you omit the “--profile” option, the CLI uses the profile named “default”.
Use the commands below to configure a development profile named “dev” and validate the settings.
$aws configure --profile dev
$aws s3 ls --profile dev
The second command lists the S3 buckets in the account that belongs to the “dev” profile.
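The profiles created by “aws configure” are kept in an INI-style file, typically ~/.aws/credentials, with one section per profile. As a rough illustration only, the sketch below parses a sample of that layout with Python's configparser. The key values are placeholders, and SAMPLE_CREDENTIALS is a name made up for this example.

```python
import configparser

# Sample layout of ~/.aws/credentials (placeholder values, not real keys).
SAMPLE_CREDENTIALS = """
[default]
aws_access_key_id = EXAMPLE_DEFAULT_KEY_ID
aws_secret_access_key = EXAMPLE_DEFAULT_SECRET

[dev]
aws_access_key_id = EXAMPLE_DEV_KEY_ID
aws_secret_access_key = EXAMPLE_DEV_SECRET
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CREDENTIALS)

# Each section name is a profile that can be passed to --profile.
profiles = parser.sections()
print(profiles)
```

Each section name printed here corresponds to a value that the “--profile” option accepts.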
3. Connecting to S3
3.1 Connecting to Default Account (Profile)
The client() API connects to the specified service in AWS. The code snippet below connects to S3 using the default profile credentials and lists all the S3 buckets.
import boto3

# Connect to S3 using the default profile credentials
s3 = boto3.client('s3')

# List every bucket with its creation time
buckets = s3.list_buckets()
for bucket in buckets['Buckets']:
    print(bucket['CreationDate'].ctime(), bucket['Name'])
3.2 Connecting to Specific Account (Profile)
To connect to a specific account, first create a session using the Session() API. The Session() API lets you specify the profile name and region. It also accepts AWS credentials directly.
The code snippet below connects to the AWS account configured under the “dev” profile and lists all the S3 buckets.
import boto3

# Create a session bound to the "dev" profile and a region
session = boto3.Session(profile_name="dev", region_name="us-west-2")
s3 = session.client('s3')

# List every bucket with its creation time
buckets = s3.list_buckets()
for bucket in buckets['Buckets']:
    print(bucket['CreationDate'].ctime(), bucket['Name'])
4. Storing and Retrieving a Python LIST
Boto3 provides the put_object() and get_object() APIs to store and retrieve objects in S3, but the objects must be serialized before storing. The Python pickle library supports serialization and deserialization of objects and is part of the Python standard library.
The APIs pickle.dumps() and pickle.loads() are used to serialize and deserialize Python objects.
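Before involving S3 at all, it may help to see the round trip on its own. The following minimal sketch serializes a list and restores it locally; the variable names are illustrative.

```python
import pickle

my_list = [1, 2, 3, 4, 5]

# dumps() returns a bytes object, which is what put_object() accepts as Body.
serialized = pickle.dumps(my_list)

# loads() rebuilds an equal, but separate, list from those bytes.
restored = pickle.loads(serialized)

print(type(serialized).__name__, restored == my_list, restored is my_list)
```

One caution: pickle.loads() can execute arbitrary code while deserializing, so only unpickle objects from buckets and writers you trust.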
4.1 Storing a List in S3 Bucket
Serialize the Python object before writing it into the S3 bucket. The list object must be stored using a unique “key”. If the key is already present, the existing object will be overwritten.
import boto3
import pickle

s3 = boto3.client('s3')
myList = [1, 2, 3, 4, 5]

# Serialize the object
serializedListObject = pickle.dumps(myList)

# Write to the bucket named 'mytestbucket' and
# store the list using the key myList001
s3.put_object(Bucket='mytestbucket', Key='myList001', Body=serializedListObject)
The put_object() API may raise a “NoSuchBucket” exception if the bucket does not exist in your account.
NOTE: Please change the bucket name to your own S3 bucket name. I don’t own this bucket.
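One possible way to deal with a missing bucket is sketched below. The helper name store_pickled is hypothetical, and the S3 client is passed in as a parameter so the function itself does not depend on how the connection was created. It relies on the error code that Boto3 attaches to ClientError responses.

```python
import pickle


def store_pickled(s3, bucket, key, obj):
    """Pickle obj and write it to bucket/key; return False if the bucket is missing."""
    try:
        s3.put_object(Bucket=bucket, Key=key, Body=pickle.dumps(obj))
        return True
    except s3.exceptions.ClientError as err:
        # The error code is reported in the parsed response attached to the exception.
        if err.response["Error"]["Code"] == "NoSuchBucket":
            return False
        # Any other service error is left for the caller to handle.
        raise
```

With a real client this could be called as store_pickled(boto3.client('s3'), 'mytestbucket', 'myList001', [1, 2, 3]).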
4.2 Retrieving a List from S3 Bucket
The list is stored as a stream object inside Body. It can be read using the read() API of the value returned by get_object(). The call can raise a “NoSuchKey” exception if the key is not present.
import boto3
import pickle

# Connect to S3
s3 = boto3.client('s3')

# Read the object stored in key 'myList001'
response = s3.get_object(Bucket='mytestbucket', Key='myList001')
serializedObject = response['Body'].read()

# Deserialize the retrieved object
myList = pickle.loads(serializedObject)
print(myList)
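A missing key can be handled in a similar spirit. The sketch below uses the NoSuchKey exception class exposed on the S3 client; load_pickled and its default parameter are hypothetical names chosen for this example, and the client is again passed in by the caller.

```python
import pickle


def load_pickled(s3, bucket, key, default=None):
    """Return the unpickled object stored at bucket/key, or default if the key is absent."""
    try:
        response = s3.get_object(Bucket=bucket, Key=key)
    except s3.exceptions.NoSuchKey:
        return default
    # Body is a stream; read() returns the stored bytes.
    return pickle.loads(response["Body"].read())
```

For example, load_pickled(boto3.client('s3'), 'mytestbucket', 'myList001', default=[]) would fall back to an empty list when nothing has been stored yet.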
5. Storing and Retrieving a Python Dictionary
Python dictionary objects can be stored and retrieved in the same way using put_object() and get_object() APIs.
5.1 Storing a Python Dictionary Object in S3
import boto3
import pickle

# Connect to S3 using the default profile
s3 = boto3.client('s3')

myData = {'firstName': 'Saravanan', 'lastName': 'Subramanian', 'title': 'Manager', 'empId': '007'}

# Serialize the object
serializedMyData = pickle.dumps(myData)

# Write to S3 using the unique key EmpId007
s3.put_object(Bucket='mytestbucket', Key='EmpId007', Body=serializedMyData)
5.2 Retrieving Python Dictionary Object from S3 Bucket
Use the get_object() API to read the object. The data is stored as a stream inside the Body object and can be read using the read() API.
import boto3
import pickle

s3 = boto3.client('s3')

# Read the object stored in key 'EmpId007'
response = s3.get_object(Bucket='mytestbucket', Key='EmpId007')
serializedObject = response['Body'].read()

# Deserialize the retrieved object
myData = pickle.loads(serializedObject)
print(myData)
6. Working with JSON
When working with a Python dictionary, it is recommended to store it as JSON if the consumer applications are not written in Python or do not support the Pickle library.
The json.dumps() API converts a Python dictionary into a JSON string, and json.loads() converts JSON back into a Python dictionary.
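As a quick local illustration of that conversion, the sketch below runs a dictionary through json.dumps() and json.loads() without touching S3. Encoding the string to UTF-8 bytes is a choice made for this example, mirroring what read() hands back from a stored object.

```python
import json

my_data = {"firstName": "Saravanan", "lastName": "Subramanian",
           "title": "Manager", "empId": "007"}

# dumps() returns a str; encoding it gives the bytes that end up in S3.
payload = json.dumps(my_data).encode("utf-8")

# loads() accepts the bytes returned by read() and rebuilds the dictionary.
restored = json.loads(payload)

print(restored == my_data)
```

Unlike pickle, the payload is plain text that any language with a JSON parser can consume.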
6.1 Storing a Python Dictionary Object As JSON in S3 bucket
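import boto3
import json

s3 = boto3.client('s3')

myData = {'firstName': 'Saravanan', 'lastName': 'Subramanian', 'title': 'Manager', 'empId': '007'}

# Convert the dictionary to a JSON string
serializedMyData = json.dumps(myData)

# Write the JSON to S3 using the key EmpId007
s3.put_object(Bucket='mytestbucket', Key='EmpId007', Body=serializedMyData)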
6.2 Retrieving a JSON from S3 bucket
import boto3
import json

s3 = boto3.client('s3')

# Read the JSON stored in key 'EmpId007'
response = s3.get_object(Bucket='mytestbucket', Key='EmpId007')
serializedObject = response['Body'].read()

# Convert the JSON back to a dictionary
myData = json.loads(serializedObject)
print(myData)
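The two JSON steps could be wrapped into a pair of small helpers, as in the sketch below. The names put_json and get_json are hypothetical, the client is supplied by the caller, and setting ContentType to application/json is an optional extra assumed here so that other consumers can recognize the payload.

```python
import json


def put_json(s3, bucket, key, data):
    """Store data as UTF-8 JSON and label it with a JSON content type."""
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(data).encode("utf-8"),
        ContentType="application/json",
    )


def get_json(s3, bucket, key):
    """Read the object at bucket/key and parse it as JSON."""
    response = s3.get_object(Bucket=bucket, Key=key)
    return json.loads(response["Body"].read())
```

With s3 = boto3.client('s3'), put_json(s3, 'mytestbucket', 'EmpId007', myData) followed by get_json(s3, 'mytestbucket', 'EmpId007') should return an equal dictionary.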
7. Upload and Download a Text File
Boto3 provides the upload_file() and download_file() APIs to copy files between your local file system and S3. S3 has no real directories, but a “/” (forward slash) in a key is treated as a folder separator, so the key “subdir/abc.txt” appears as the file abc.txt inside the folder subdir.
7.1 Uploading a File
import boto3

s3 = boto3.client('s3')

# Upload the local file ./abc.txt to the key subdir/abc.txt
s3.upload_file(Bucket='mytestbucket', Key='subdir/abc.txt', Filename='./abc.txt')
7.2 Downloading a File from S3 Bucket
import boto3

s3 = boto3.client('s3')

# Download the key subdir/abc.txt to the local file ./abc.txt
s3.download_file(Bucket='mytestbucket', Key='subdir/abc.txt', Filename='./abc.txt')
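To see how keys containing “/” behave like sub folders, the keys under a prefix can be listed with list_objects_v2() and a delimiter. The sketch below is one way to do that; list_folder is a hypothetical helper name, the client is passed in by the caller, and pagination is deliberately left out, so it only covers the first page of results (up to 1000 keys).

```python
def list_folder(s3, bucket, prefix):
    """Return (file keys, sub-folder prefixes) found directly under prefix."""
    response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, Delimiter="/")
    # Contents and CommonPrefixes are omitted from the response when empty.
    files = [item["Key"] for item in response.get("Contents", [])]
    folders = [item["Prefix"] for item in response.get("CommonPrefixes", [])]
    return files, folders
```

For instance, list_folder(boto3.client('s3'), 'mytestbucket', 'subdir/') should include subdir/abc.txt among the files after the upload above.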
8. Error Handling
Boto3 APIs can raise various exceptions depending on the condition. “DataNotFoundError”, “NoSuchKey”, “HTTPClientError”, “ConnectionError”, and “SSLError” are a few of them. All Boto3 exceptions inherit from the Python “Exception” class, so a handler for Exception catches any of them.
import boto3

try:
    s3 = boto3.client('s3')
except Exception as e:
    print("Exception", e)
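Catching Exception works, but it hides which failure occurred. A narrower alternative, sketched below under the assumption that only service-side errors should be reported, is to catch ClientError and inspect its error code. The helper name try_get_bytes is hypothetical. Connection-level problems such as “ConnectionError” or “SSLError” are not ClientError instances, so they would still propagate to the caller.

```python
def try_get_bytes(s3, bucket, key):
    """Return (data, None) on success or (None, error_code) when S3 rejects the call."""
    try:
        response = s3.get_object(Bucket=bucket, Key=key)
        return response["Body"].read(), None
    except s3.exceptions.ClientError as err:
        # Service-side failures carry an error code such as "NoSuchKey" or "AccessDenied".
        return None, err.response["Error"]["Code"]
```

A caller can then branch on the returned code instead of parsing an exception message.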
9. Summary
Storing Python objects in an external store has many use cases. For example, a game developer can store the intermediate state of objects and fetch them when the gamer resumes from where they left off, and an API developer can use S3 as a simple key-value store. Please refer to the URLs in the References section to learn more. Thanks.
References
[i] Boto3 – https://boto3.amazonaws.com/v1/documentation/api/latest/index.html
[ii] Boto3 S3 API – https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html
[iii] AWS CLI – https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html
[iv] AWS Boto3 Credentials – https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
[v] Python Pickle Library – https://docs.python.org/3/library/pickle.html
[vi] Boto3 Exceptions – https://github.com/boto/botocore/blob/develop/botocore/exceptions.py
Published on Web Code Geeks with permission by Saravanan Subramanian, partner at our WCG program. See the original article here: Boto3 – Amazon S3 As Python Object Store. Opinions expressed by Web Code Geeks contributors are their own.