lingo.lol is one of the many independent Mastodon servers you can use to participate in the fediverse.
A place for linguists, philologists, and other lovers of languages.

#forensiclinguistics

Free access for two weeks to our new Cambridge Element (i.e. a mini book), “Decoding Terrorism: An interdisciplinary approach to a lone-actor case”, in which we analyze the extreme-right terror attack in Halle from various disciplinary perspectives, including forensic linguistic ones:

cambridge.org/core/elements/ab

Cambridge Core · Decoding Terrorism (Law: General Interest)

In this short blog post on R-bloggers (r-bloggers.com/2024/04/grammar), the amazing Valerio Gherardi gives a very concise yet comprehensive summary of our latest pre-print on the LambdaG method for authorship analysis.
#forensiclinguistics #nlp #language #llm

R-bloggers · Grammar as a biometric for Authorship Verification

About a month ago we finally managed to drop (Nini et al. 2024), “Authorship Verification based on the Likelihood Ratio of Grammar Models”, on the arXiv. Delving into topics such as authorship verification, grammar and forensics was quite a detour for me, and I’d like to summarize here some of the ideas and lessons I took from working with all this new and interesting material.

The main qualitative idea put forward by Ref. (Nini et al. 2024) is that grammar is a fundamentally personal and unique trait of an individual, therefore providing a sort of “behavioural biometric”. One first goal of this work was to put this general principle to the test by applying it to the problem of Authorship Verification (AV): the process of validating whether a certain document was written by a claimed author. Concretely, we built an algorithm for AV that relies entirely on the grammatical features of the examined textual data, and compared it with the state-of-the-art methods for AV.

The results were very encouraging. In fact, our method actually turned out to be generally superior to the previous state of the art on the benchmarks we examined. This is a notable result, also taking into account that our method uses less textual information (only the grammar part) than other methods to perform its inferences.

The algorithm. I sketch here a pseudo-implementation of our method in R. For the fit of k-gram models and perplexity computations, I use my package {kgrams}, which can be installed from CRAN. Model (hyper)parameters such as the number of impostors, the order of the k-gram models, etc. are hardcoded; see (Nini et al. 2024) for details. This is just for illustrating the essence of the method. For practical reasons, in the code chunk below I’m not reproducing the definition of the function extract_grammar(), which in our work is embodied by the POS-noise algorithm. This function should transform a regular sentence, such as “He wrote a sentence”, into its underlying grammatical structure, say “[Pronoun] [verb] a [noun]”.

#' @param q_doc character. Text document whose authorship is questioned.
#' @param auth_corpus character. Text corpus of claimed author.
#' @param imp_corpus character. Text corpus of impostors.
score …
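The likelihood-ratio idea described in the excerpt can be sketched in a few lines of Python. To be clear, this is not the authors' implementation: `extract_grammar()` below is a toy placeholder (the paper uses the POS-noise algorithm), an add-one-smoothed bigram model stands in for the k-gram models of {kgrams}, and a single reference model stands in for the impostor-sampling loop. It only illustrates the shape of the method: transform text to grammar tokens, fit one language model per side, and compare likelihoods of the questioned document.

```python
import math
from collections import Counter

def extract_grammar(text):
    # Toy stand-in for POS-noise: keep a few function words and
    # collapse every other word to a generic placeholder tag.
    function_words = {"a", "an", "the", "of", "in", "to", "and", "he", "she", "it"}
    return [w if w in function_words else "<W>" for w in text.lower().split()]

class BigramModel:
    """Add-one-smoothed bigram model over grammar tokens
    (a simple stand-in for the k-gram models fit with {kgrams})."""
    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        self.vocab = set()
        for toks in sentences:
            toks = ["<s>"] + toks + ["</s>"]
            self.vocab.update(toks)
            self.unigrams.update(toks[:-1])          # context counts
            self.bigrams.update(zip(toks, toks[1:])) # bigram counts

    def log_likelihood(self, sentence):
        toks = ["<s>"] + sentence + ["</s>"]
        V = len(self.vocab) + 1  # +1 so unseen tokens get nonzero mass
        ll = 0.0
        for prev, cur in zip(toks, toks[1:]):
            ll += math.log((self.bigrams[(prev, cur)] + 1) /
                           (self.unigrams[prev] + V))
        return ll

def lambda_g(q_doc, auth_corpus, ref_corpus):
    """Log likelihood ratio of the questioned document under the claimed
    author's grammar model vs. a reference-population grammar model."""
    author_model = BigramModel([extract_grammar(s) for s in auth_corpus])
    ref_model = BigramModel([extract_grammar(s) for s in ref_corpus])
    q = extract_grammar(q_doc)
    return author_model.log_likelihood(q) - ref_model.log_likelihood(q)
```

A positive value favors the claimed author. The actual method repeats this comparison against many sampled impostor models and aggregates the results, with the k-gram order, smoothing, and number of impostors set as in the paper.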

I’m extremely excited to announce the pre-print of our new paper: “Authorship Verification based on the Likelihood Ratio of Grammar Models”

arxiv.org/abs/2403.08462v1
with Oren Halvani, Lukas Graner, Valerio Gherardi and Shunichi Ishihara.

arXiv.org · Authorship Verification based on the Likelihood Ratio of Grammar Models

Authorship Verification (AV) is the process of analyzing a set of documents to determine whether they were written by a specific author. This problem often arises in forensic scenarios, e.g., in cases where the documents in question constitute evidence for a crime. Existing state-of-the-art AV methods use computational solutions that are not supported by a plausible scientific explanation for their functioning and that are often difficult for analysts to interpret. To address this, we propose a method relying on calculating a quantity we call $λ_G$ (LambdaG): the ratio between the likelihood of a document given a model of the Grammar for the candidate author and the likelihood of the same document given a model of the Grammar for a reference population. These Grammar Models are estimated using $n$-gram language models that are trained solely on grammatical features. Despite not needing large amounts of data for training, LambdaG still outperforms other established AV methods with higher computational complexity, including a fine-tuned Siamese Transformer network. Our empirical evaluation based on four baseline methods applied to twelve datasets shows that LambdaG leads to better results in terms of both accuracy and AUC in eleven cases, and in all twelve cases if considering only topic-agnostic methods. The algorithm is also highly robust to important variations in the genre of the reference population in many cross-genre comparisons. In addition to these properties, we demonstrate how LambdaG is easier to interpret than the current state of the art. We argue that the advantage of LambdaG over other methods is due to the fact that it is compatible with Cognitive Linguistic theories of language processing.

Corpus linguistics is a field I’ve become very interested in lately, as I find it very useful for conducting forensic linguistics studies. I still need a lot more training in the use of corpora, but here’s a great article by Römer and Cunningham on how corpus linguistics can be used for legal interpretation:
sciencedirect.com/science/arti

#linguistics #corpuslinguistics #forensiclinguistics @academicchatter @linguistics

Continuing to read “Language and Online Identities” (hope I’ll have time to finish it today):
interesting how the task of assuming the victim’s identity seems to clash with intelligence gathering, requiring different speech acts -> strings of interrogatives would be unusual for the victim

Came across this chapter by Tim Grant - I loved the sophisticated theoretical discussion of the idea of idiolect and of cognitive & stylistic frameworks!
Also, can I just say “4n6” is genius 😁 not going to spell the word any other way from now on

taylorfrancis.com/chapters/edi

Taylor & Francis · Txt 4n6: idiolect-free authorship analysis? (Text messaging forensics)