lingo.lol is one of the many independent Mastodon servers you can use to participate in the fediverse.
A place for linguists, philologists, and other lovers of languages.

#trainingdata

'The New York Times' takes OpenAI to court. ChatGPT's future could be on the line

A group of news organizations, led by The New York Times, took ChatGPT maker OpenAI to federal court on Tuesday in a hearing that could determine whether the tech company has to face the publishers in a high-profile copyright infringement trial.

#NYT #media #copyright #legal #ChatGPT #OpenAI #artificialintelligence #AI #LLM #data #TrainingData #technology #tech

npr.org/2025/01/14/nx-s1-52589

We've made #Swedish language training data for development of #HTR models available for download, riksarkivet.se/psidata/traning

This data, together with data from other archives whose training data is not for us to publish, is the basis for our HTR-model Swedish Lion Libre, huggingface.co/collections/Rik

If you use the training data or the model or, even better, have ground-truth data you'd like to share, just get in touch!

Riksarkivet · Training data for HTR models - Riksarkivet · The dataset contains carefully and manually transcribed and segmented texts from archival records at Riksarkivet.

Dialogue from 53,000 movies and 85,000 TV episodes is included in an AI-training data set that has been used by Apple, Anthropic, Meta, Nvidia, Salesforce, Bloomberg, and other companies.

It includes writing from every film nominated for Best Picture from 1950 to 2016 and at least 616 episodes of The Simpsons.

#OpenSubtitles #hollywood #TV #movies #copyright #ArtificialIntelligence #AI #LLM #TrainingData #data #bigdata #technology #tech

theatlantic.com/technology/arc

The Atlantic · The Hollywood AI Database · By Alex Reisner

Meta fed its AI on almost everything you’ve posted publicly since 2007

Meta acknowledged all text and photos that adult #Facebook and #Instagram users publicly published since 2007 have been fed into its artificial intelligence models.

"Meta has just decided to scrape all of the photos and all of the texts from every public post on Instagram or Facebook since 2007"

#privacy #socialmedia #meta #data #bigdata #trainingdata #ArtificialIntelligence #AI #technology #tech

theverge.com/2024/9/12/2424278

The Verge · Meta fed its AI on everything adults have publicly posted since 2007 · By Jess Weatherbed

Automated Plagiarism for BS (not me, the other BS, Harry Frankfurt's) :
> Robots.txt is a single bit of code that's been used since the late 1990s as a way for websites to tell bot crawlers they don't want their data scraped and collected. It was widely accepted as one of the unofficial rules supporting the web...
> The world's top two #AIStartups are ignoring requests by media publishers to stop scraping their web content for free model #TrainingData,
businessinsider.com/openai-ant
#AiSalami #LLM

Insider · OpenAI, Anthropic ignore rule that prevents bots scraping web content · By Kali Hays
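
For readers unfamiliar with the robots.txt convention quoted in the post above, here is a minimal sketch of how a well-behaved crawler might consult it before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt content, the bot name `ExampleAIBot`, and the `example.com` URLs are invented for illustration and do not come from the article; compliance is voluntary, which is the point the post is making.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, invented for illustration only.
# It blocks one named crawler entirely and blocks /private/ for everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleAIBot", "https://example.com/articles/1"),
        ("OtherBot", "https://example.com/articles/1"),
        ("OtherBot", "https://example.com/private/notes"),
    ]:
        print(agent, url, may_fetch(agent, url))
```

Nothing in this check stops a crawler from requesting the page anyway; the file only expresses the site's wishes, and honoring it is up to the crawler.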

⚠️ Resistance Is Futile - Maven to Assimilate All Mastodon Posts - Even Private Posts!

I just received notice of this Maven activity from a friend. I'll be deleting all of my private posts on Mastodon in case more groups are performing similar activities in the background.

Maven is being funded by OpenAI's Sam Altman 👎

wedistribute.org/2024/06/maven

#Maven #OpenAI #Fediverse #ActivityPub #Mastodon #Toots #Posts #Privacy #Invasion #AI #GAI #TrainingData #Copyright #Infringement

We Distribute · Maven Imported 1.12 Million Fediverse Posts (Updated) · A social network founded by a former OpenAI employee was caught importing public posts from Mastodon...and ran AI analysis to add tags to them.