I am currently working on three public resources, which will be available soon.
Applied Economics in the Cloud
A guide to doing data work in R using cloud computing resources. It covers setting up a virtual machine, installing R and RStudio Server, and working with the cloud in R from VSCode. I also discuss the pros and cons of these coding environments relative to working on your local machine.
Splitting up Job Ad Text
Longer documents contain richer information, but their length can make text analysis imprecise or harder to interpret. To overcome this, I wrote a function that splits documents into smaller pieces while preserving their important structure. Crucially, the tool is fast and “dumb”: it relies only on document structure, not on document-specific information. I use it to pre-process hundreds of millions of job ads into billions of smaller documents.
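To give a flavour of what splitting on structure alone can look like, here is a minimal sketch in Python. It is not the function described above: the name `split_document`, the choice of blank lines and bullet markers as the only structural cues, and the example ad are all assumptions made for illustration.

```python
import re

# Assumed structural cue: a line starting with "-", "*", "•", or "1." / "1)".
BULLET = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


def split_document(text: str) -> list[str]:
    """Split a document into pieces using only its layout (hypothetical helper)."""
    pieces = []
    # Blank lines separate blocks; no vocabulary or document-specific rules are used.
    for block in re.split(r"\n\s*\n", text):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        if any(BULLET.match(ln) for ln in lines):
            # In a bulleted block, every line becomes its own piece (marker removed).
            pieces.extend(BULLET.sub("", ln) for ln in lines)
        else:
            # A plain paragraph stays together as one piece.
            pieces.append(" ".join(lines))
    return pieces


ad = (
    "We are hiring a data analyst.\nJoin our team.\n\n"
    "Requirements:\n- SQL\n- 3 years of experience\n"
)
for piece in split_document(ad):
    print(piece)
```

Because the rules look only at line breaks and list markers, the same function can run unchanged over ads from any source, which is what makes this kind of approach cheap to apply at scale.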
Cleaning Balance Sheet Data from ORBIS
My code for cleaning ORBIS balance sheet data. It covers mundane but very important steps, such as de-duplication and collapsing ownership structures to avoid double counting, among others.
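The two steps named above can be illustrated with a small Python sketch. This is not the cleaning code itself: the field names (`firm_id`, `year`, `parent_id`, `consolidated`, `total_assets`) are hypothetical rather than actual ORBIS variable names, and both rules (keep the first record per firm-year; drop a subsidiary when its parent files consolidated accounts for the same year) are simplifying assumptions.

```python
def clean_balance_sheets(rows):
    """De-duplicate firm-years, then drop subsidiaries covered by a parent's
    consolidated accounts (hypothetical rules, for illustration only)."""
    # Step 1: de-duplication. Keep the first record seen for each firm-year.
    first_seen = {}
    for row in rows:
        first_seen.setdefault((row["firm_id"], row["year"]), row)
    unique = list(first_seen.values())

    # Step 2: collapse ownership. Firm-years with consolidated accounts
    # already include their subsidiaries' balance sheets.
    consolidated = {(r["firm_id"], r["year"]) for r in unique if r["consolidated"]}
    return [r for r in unique if (r["parent_id"], r["year"]) not in consolidated]


# Made-up example data: A owns B and reports consolidated accounts; C is independent.
rows = [
    {"firm_id": "A", "year": 2020, "parent_id": None, "consolidated": True, "total_assets": 100},
    {"firm_id": "B", "year": 2020, "parent_id": "A", "consolidated": False, "total_assets": 40},
    {"firm_id": "B", "year": 2020, "parent_id": "A", "consolidated": False, "total_assets": 40},
    {"firm_id": "C", "year": 2020, "parent_id": None, "consolidated": False, "total_assets": 25},
]
cleaned = clean_balance_sheets(rows)
print(sum(r["total_assets"] for r in cleaned))
```

Summing the raw rows would count B's assets three times (twice directly and once inside A's consolidated figure), which is the kind of double counting the cleaning is meant to prevent.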