Automatically backing up your files on a web server and uploading them to Google Drive using a Python script

Creating an archive

To archive our directory, we will use bz2 compression, which typically gives a better compression ratio than gzip at the cost of some speed. Python's standard library includes a module called tarfile that helps us archive files and directories.

First, we need to import it into our script.

import tarfile

Let’s now create a function that accepts a directory path as an argument and returns the name of the backup archive it creates.

def archive(dir):
    print("Archiving directory %s." % dir)
    # Timestamp for the file name, e.g. 2020-01-15T10_30_45
    # (colons replaced so the name is filesystem-friendly, microseconds dropped)
    now = datetime.datetime.now().isoformat().replace(':', '_').split(".")[0]
    fileName = "backup_" + now + ".tar.bz2"
    # Create a bz2-compressed tar archive and add the whole directory to it
    with tarfile.open(fileName, "w:bz2") as tar:
        tar.add(dir)
        print("Directory successfully archived. Archive name: %s." % fileName)
        return fileName

Since the backup file's name includes the date and time at which it was created, we also need to import the datetime module at the top of the script.

import datetime

Here, we store the current date and time in a variable (with the colons replaced by underscores and the microseconds stripped, so the value can be used in a file name) and then concatenate it to “backup_” to get the name of the archive file.

Then, using Python's with statement, we open a tar file with bz2 compression (hence the .tar.bz2 extension) and add the directory to it. Once the archiving succeeds, the function returns the name of the file.
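For example, calling the function on a web root might look like this. The directory path here is only an illustration; use whichever directory you want to back up.

# A minimal usage sketch; the directory path is only an example.
backupFile = archive("/var/www/html")
# Prints something like:
#   Archiving directory /var/www/html.
#   Directory successfully archived. Archive name: backup_2020-01-15T10_30_45.tar.bz2.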

Now, we have our backup file.

Creating a Google Drive Service

We need to create a Google Drive service to upload our file to Google Drive. So, let's create a function that accepts the path to the service account key file (the JSON file downloaded from the Google Cloud Console) as an argument and returns the created service object.

First, you need to install the Google API Python Client using pip.

$ pip install --upgrade google-api-python-client

Then, import the following modules into the script.

from google.oauth2 import service_account
from googleapiclient.discovery import build

def createDriveService(token):
    # Full Drive scope, so the service account can list and delete files
    SCOPES = ['https://www.googleapis.com/auth/drive']
    SERVICE_ACCOUNT_FILE = token
    # Build credentials from the service account key file
    credentials = service_account.Credentials.from_service_account_file(SERVICE_ACCOUNT_FILE, scopes=SCOPES)
    # Return a Drive v3 service object
    return build('drive', 'v3', credentials=credentials)

Here, we request the full Drive scope so that we can access our Google Drive. Then we create a credentials object from the service account key file and pass it into the build method to get our service object. We can now access our Google Drive programmatically.
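With that in place, creating the service is a single call. The file name below is just a placeholder for whatever your service account key file is called:

# "service_account.json" is a placeholder name for your key file.
service = createDriveService("service_account.json")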

Removing old backups

As you may have already guessed, we are going to create another function that will accept the service object as an argument.

We will search our drive for archives with “backup” in their name. We will then store the id of each file as a key and its name as the value in a dictionary. If there is more than one file, we will sort the list of names in ascending order. Because each name embeds the created date and time in year-month-day order, sorting the names alphabetically also sorts them chronologically (as the short example below illustrates), so the last member of the list is the latest backup. Then, we will delete every file except the latest one.
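To see why a plain sort is enough, here is a quick sketch with made-up file names:

# Made-up names, just to illustrate the ordering.
names = ["backup_2020-01-13T02_00_00.tar.bz2",
         "backup_2020-01-15T02_00_00.tar.bz2",
         "backup_2020-01-14T02_00_00.tar.bz2"]
print(sorted(names)[-1])  # backup_2020-01-15T02_00_00.tar.bz2 -- the newest one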

def clearPastBackups(service):
    page = None
    filesObj = {}
    # Collect every matching file, following the page tokens until there are
    # no more pages of results.
    while True:
        files = service.files().list(q="mimeType='application/gzip' and name contains 'backup'", pageToken=page, fields="nextPageToken, files(id,name)").execute()
        for file in files.get('files', []):
            filesObj[file.get('id')] = file.get('name')
        page = files.get('nextPageToken', None)
        if page is None:
            break
    # Only clean up if there are at least two backups on the drive.
    if len(filesObj) >= 2:
        print("Two or more previous backups found.")
        # The names embed the creation timestamp, so the alphabetically last
        # name is the most recent backup.
        latest = sorted(filesObj.values())[-1]
        for l in sorted(filesObj.values()):
            print(l)
        print("Backup to be kept: %s." % latest)
        print("Deleting all but the latest backup...")
        # Delete every backup except the latest one.
        for file in filesObj:
            if filesObj[file] != latest:
                service.files().delete(fileId=file).execute()
                print("Backup named %s deleted." % filesObj[file])

We use service.files().list() to search for files in our drive. The mimeType application/gzip restricts the search to archive files, and name contains 'backup' matches our backup archives. If there are many matching files, Google splits the results into pages and returns only one page at a time, so the while loop follows the page token until every page has been read and we have all the matching files.

The rest of the code, I believe, is self-explanatory.
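As a rough sketch of how the pieces from this post fit together (the directory and key file paths are placeholders, and the upload step itself is not shown here), the script could be driven like this:

# A minimal sketch; paths are placeholders and the upload step is omitted.
if __name__ == "__main__":
    backupFile = archive("/var/www/html")
    service = createDriveService("service_account.json")
    clearPastBackups(service)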
