Head·word /ˈhedˌwɜː(ɹ)d/ n. @headword

**Ártemis López** @queerterpreter · Nov 29, 2022

#Corpus people: an engineer/programmer friend has a project coming up for a class, and they asked if there’s anything they could help with for my diss since they have no good project ideas. They could program something to help with my corpus stuff, turn that in, maybe get an article with me out of it.

Any… any good ideas on what could be generally useful for (ES) #corpora? I can try and come up with something just for me, but it’d be cool if it’s useful for the field at large too.

Ártemis López @queerterpreter@lingo.lol

The project is apparently specifically on large data management, so corpus seems like a great topic, but… what do we do. Help.

Nov 29, 2022, 11:21 AM · Metatext

**Dr. Angus Andrea Grieve-Smith** @grvsmth · Nov 29, 2022

@queerterpreter A universal annotation converter?

**Ártemis López** @queerterpreter · Nov 29, 2022

@grvsmth I've only used one program, so I'm not super sure how much conversion is needed to switch from one to another. Wouldn't it be a fairly straightforward find-and-replace RegEx? Like how translation tools will sometimes have <1>, or {1}, or a couple of things like that.
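For illustration, the find-and-replace described in this post could look like the minimal Python sketch below. The two placeholder styles, `<1>` and `{1}`, are the ones named in the post; the function name `angle_to_brace` and the sample sentence are hypothetical and not taken from any particular translation tool.

```python
import re

# Assumed source style: numbered placeholders written as <1>, <2>, ...
ANGLE_TAG = re.compile(r"<(\d+)>")

def angle_to_brace(text: str) -> str:
    """Rewrite numbered placeholders such as <1> as {1}."""
    # \1 re-inserts the captured number between literal braces.
    return ANGLE_TAG.sub(r"{\1}", text)

print(angle_to_brace("Click <1>here<2> to continue."))
```

This only covers a one-to-one swap of tag syntax, which is the simple case the post has in mind.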

**Dr. Angus Andrea Grieve-Smith** @grvsmth · Nov 29, 2022

@queerterpreter Even if it were, something that collects all the regexes for each system would be helpful!

But it's actually more complex. There are inline and offset annotation systems, for one thing!
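A rough sketch of why the inline/offset distinction goes beyond find-and-replace: in an offset (standoff) system, each annotation is stored as character positions in the untagged text, so every position depends on how much markup was stripped before it. The Python below assumes a simple, non-nested inline format of the form `<LABEL>text</LABEL>`; that format, the function name `inline_to_offsets`, and the example sentence are all hypothetical, not the conventions of any specific annotation tool.

```python
import re
from typing import List, Tuple

# Assumed inline format: <LABEL>text</LABEL>, with no nesting or attributes.
INLINE = re.compile(r"<(\w+)>(.*?)</\1>")

def inline_to_offsets(annotated: str) -> Tuple[str, List[Tuple[int, int, str]]]:
    """Return the plain text plus (start, end, label) spans into that text."""
    plain_parts: List[str] = []
    spans: List[Tuple[int, int, str]] = []
    cursor = 0      # read position in the annotated input
    plain_len = 0   # length of the plain text built so far
    for match in INLINE.finditer(annotated):
        # Copy the untagged text that precedes this annotation.
        before = annotated[cursor:match.start()]
        plain_parts.append(before)
        plain_len += len(before)
        # Record the span relative to the plain text, not the tagged input.
        label, content = match.group(1), match.group(2)
        spans.append((plain_len, plain_len + len(content), label))
        plain_parts.append(content)
        plain_len += len(content)
        cursor = match.end()
    plain_parts.append(annotated[cursor:])
    return "".join(plain_parts), spans

text, spans = inline_to_offsets("<PER>Ana</PER> vive en <LOC>Madrid</LOC>.")
print(text)
print(spans)
```

A real converter would also need to handle nesting, overlapping spans, and tag attributes, which this sketch deliberately leaves out.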

**Ártemis López** @queerterpreter · Nov 29, 2022

@grvsmth Hmmm, this could be interesting to her! Do you know of a good starting point I could direct her to (it's HER final project, after all) for looking into the different annotation systems out there?

**Dr. Angus Andrea Grieve-Smith** @grvsmth · Nov 29, 2022

@queerterpreter ahahahaha! Yes, I know people have compiled a list of annotation systems. Several lists, in fact! I've probably bookmarked some of them on my other computer, and there are some in my email inbox. Maybe there's a list of lists of annotation systems. This Wikipedia article is probably a good place to start!

https://en.wikipedia.org/wiki/Text_annotation

**reviewer 2** @thedansimonson · Nov 29, 2022

@queerterpreter I’ve generally found that it’s hard to incorporate corpora ad hoc into an existing linguistics project.

Whether you’re annotating data or just looking for n-gram counts, both of those are relatively labor-intensive and are going to shape the direction of your project pretty dramatically from the start.
