@queerterpreter I’ve generally found that it’s hard to incorporate corpora ad hoc into an existing linguistics project.
Whether you’re annotating data or just looking for n-gram counts, either one is relatively labor-intensive and is going to shape the direction of your project pretty dramatically from the start.