embroidered toolbox

Tutorial: Easy Script to Download Google Sheets

I often do content analysis with colleagues using Google Sheets—when it’s time to download the spreadsheet to my computer, I use a script instead of clicking through menus. (Don’t want a tutorial? Code is here.)

Why bother scripting? Three reasons:

  • Saving Time: I’m probably going to need to do it more than once (many times during the project, at the end of the project, after feedback, time-to-submit-is-anything-wrong-freakout…is this last one just me?).
  • Preventing Errors: A script guarantees me that I’ll do it the same way every time, naming it the same way (not the dreaded FinalData(5)).
  • Preventing Confusion: Also known as ‘kindness to future me’. Research work can have a strange lifecycle—with review timelines putting a project into a quiet phase for perhaps many months at a time, shiny other projects cropping up, wandering down blind alleys that don’t pan out, and life being wild in general, I may not instantly remember what and where everything is if it’s not written down. When I’m documenting how I did a project, it’s a lot easier to say ‘run fetcher.py to get data’ than to explain all the steps and hope I bothered to write complete notes to myself when I first did the work.

Enough justifying—here’s the how to!

What I assume:

  • You have python3 installed
  • You have a Google Sheet you want to download in the happy ways described above, and your data can safely be made “viewable by anyone with the link”. This strategy is not suitable for data with privacy concerns!
  • You can created and edit python scripts on your computer

You can either build the script yourself from the tutorial below, or you can grab it from gitlab and customize it. This script is heavily annotated with ## comments to walk you through it. Feedback welcome!

Step 1: Startup

Create a python script (call it something like fetchgsheet.py) with these lines at the top:

!/usr/bin/env python3 ## this line tells your system to use its default installation of python3
import os.path ## these lines bring in helpful libraries we use below
import requests

Step 2: Customize

The next step is to give the script the details of your spreadsheet. Share your Google Sheet to ‘anyone with the URL’ and get the value for sheet_key located toward the end of the sharing URL (it’s between but doesn’t include the slashes, and it’s just before ‘edit?usp=sharing’). Walk through your tabs and get the gid for each tab. The gid will appear at the end of the URL as you change which one you are using. See screenshots for a visual reference. Insert the sheet_key and gid into the code below as described in the ##comments.


##### Edit the below to match your situation
dataset_dir = 'data/'  ## this is where the files will save
sheet_key = 'someVeryLongAlpha-NumericStringGoesHere4242'
## sheet_key is revealed when you share a sheet to allow for access without a password 
## (is that ok for your data?)


                                     
sheets = { 'alpha' : 123456789, ## which tabs of the spreadsheet do you want?
        'beta' : 234567891, ## the sheets variable is a simple dictionary
        'omega' : 345678912  ## enter them here with a shortname in quotes 
        }                        ## then a colon : and the numeric gid for that sheet, 
                                 ## comma after each pair. Only one? No comma.

Step 3: Finish up

That’s all the customization you will need to do in most cases—you can treat the rest of this as a template and copy-paste.

#### May not need to edit below here

url_base = 'https://docs.google.com/spreadsheets/d/{id}/export'.format(id=sheet_key) 
http_params = { 'format' : 'csv',   ## if you wanted something other than CSV, this is the place to change it
                 'id' : sheet_key }
table_names = sheets.keys() ##the table_names are just the part in 'quotes' from the sheets 

for table in table_names:
    print(f"Now downloading {table}....")
    http_params['gid'] = sheets[table]
    req = requests.get(url_base, params=http_params) ## <== the actual fetching is just this bit!
    output_file = open(os.path.join(dataset_dir, "%s_dataset.csv" % table), 'w') ## filenames are the table_names
    print(req.text, file=output_file)                                            ## NOTE: Will overwrite the old! 
    output_file.close()

    print("INFO: Successfully downloaded table: %s" % table)                                  

Step 4: Ta da!

You can now run this script any time you need a fresh copy of your data from Google Sheets.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s