Guide to setting up a Virtual Machine using Google Cloud for Applied Economics


Why do I need my own virtual machine? Pros and Cons.
Setting up a Google Cloud Platform (GCP) account
Setting up a Virtual Machine Part 1: Creating the VM
Setting up a Virtual Machine Part 2: Adding relevant programmes
Setting up a Virtual Machine Part 3: Attaching a data disk & transferring data
Setting up a Virtual Machine Part 4: Accessing RStudio via Web Browser
Setting up a Virtual Machine Part 5: Accessing RStudio via VSCode
Setting up a Virtual Machine Part 6: Working in Teams
Coding in R for big data 1: You must use data.table!
Coding in R for big data 2: When to use databases like SQL and when not to
Coding in R for big data 3: Tips and tricks for parallel data processing
A discussion of total costs, and how to minimise them
A discussion of other great Google resources and APIs

A guide for academics on setting up a virtual machine that runs RStudio in the cloud on Google Cloud Platform (GCP)

This guide documents my experience setting up and using a virtual machine on the Google Cloud Platform (GCP). Similar setups can be achieved with AWS and Microsoft Azure, though I have no experience with these platforms. My virtual machine has been an invaluable tool since the beginning of my PhD, and has drastically increased my productivity across many projects.

This guide mixes my own experience and tips with more standard instructions on how to set up your own virtual work environment. It is written for applied economics researchers. I have borrowed from many sources, and have tried my best to attribute them correctly.

This guide is still in development. If you have any questions or suggestions, I would love to hear them, so please email me at

Why do I need my own virtual machine? Pros and Cons

There are at least three reasons to set up a virtual machine: (1) Flexibility, (2) Scalability, (3) Productivity. The reasons not to set one up are the technical barriers to entry (though this guide aims to lower these!) and cost. On cost, I would argue that the relevant counterfactual is purchasing a mid-range laptop (32 GB of memory or more), which depreciates, puts an upper bound on the computing power available to you, and requires liquidity to purchase.

(1) Flexibility

While it is very common for universities to have their own computational resources for working with big data, I have found that having my own server affords a great deal more flexibility. The most obvious advantage is that I can turn the machine on and off as needed, and do not have to submit jobs to a scheduler or compete with others for resources. Working in real time drastically increases productivity and avoids the lengthy wait times which (for me) substantially disrupt my workflow. A virtual machine also offers flexibility in costs. Apart from the cost of data storage, you do not pay for any computational resources while the machine is switched off (unlike personal hardware). And if a project is going to be dormant for a while, you can download the data to a personal hard disk and shutter the virtual machine completely, at which point your costs fall to zero.
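
To make the cost logic above concrete, here is a minimal sketch in R. The hourly and per-GB rates are hypothetical placeholders chosen purely for illustration, not actual GCP prices, and monthly_cost is a made-up helper rather than part of any package. It only encodes the idea that compute is billed while the machine is running, whereas storage is billed for as long as the disk exists.

```r
# Hypothetical rates for illustration only; check current GCP pricing
# for your machine type, disk type and region before relying on these.
vm_rate_per_hour <- 0.50  # assumed compute price while the VM is running (USD per hour)
disk_rate_per_gb <- 0.04  # assumed storage price (USD per GB per month)

# Hypothetical helper: approximate monthly bill for one VM with one data disk.
monthly_cost <- function(hours_running, disk_gb,
                         vm_rate = vm_rate_per_hour,
                         disk_rate = disk_rate_per_gb) {
  compute <- hours_running * vm_rate  # billed only while the VM is on
  storage <- disk_gb * disk_rate      # billed whether or not the VM is on
  c(compute = compute, storage = storage, total = compute + storage)
}

monthly_cost(hours_running = 80, disk_gb = 200)  # an active month
monthly_cost(hours_running = 0,  disk_gb = 200)  # dormant project, disk retained
monthly_cost(hours_running = 0,  disk_gb = 0)    # data downloaded, VM and disk removed
```

Under these assumed rates, the dormant month costs only the storage component, and the last case costs nothing, which is the flexibility argument in numerical form.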

(2) Scalability

The best feature of the Google Cloud