lingo.lol is one of the many independent Mastodon servers you can use to participate in the fediverse.
A place for linguists, philologists, and other lovers of languages.


#datasets


Howard University: Howard University and Google Research Enhance A.I. Speech Recognition of African American English. “Researchers collected 600 hours of data from users of different [African American English] dialects in an effort to address implicit barriers to improving [automatic speech recognition] performance. Thirty-two states are represented in the dataset.”

https://rbfirehose.com/2025/06/26/howard-university-howard-university-and-google-research-enhance-a-i-speech-recognition-of-african-american-english/


Call For Manuscript Submissions - Real-Time GIS For Disaster Management
--
nature.com/collections/bjdhbfi <-- shared link to submission details
--
[note that I have NO affiliation with this journal, the guest editors, etc.]
[I wonder whether anybody from FEMA has compiled use cases for, or assessed the effectiveness and robustness of, the #WaffleHouseIndex in the southern USA, especially in relation to hurricanes?]
#GIS #paper #mapping #spatial #manuscripts #callforpapers #callformanuscripts #submissions #callforsubmissions #realtime #disaster #management #mitigation #prevention #preparedness #response #recovery #risk #hazard #naturalhazard #emergency #remotesensing #earthobservation #satellite #drone #sensor #socialmedia #WaffleHouseIndex #datasets #AI #InternetOfThings #research #monitoring #evacuation #planning #resourceallocation #hazardmapping #realworld #global

Ready to supercharge your #OpenScience profile?

With #OpenAIREEXPLORE + @ORCID_Org, you can seamlessly complete your #ORCID record with all your research outputs, from papers & #datasets to #software tools.

Backed by the @OpenAIREGraph, EXPLORE identifies and matches your work, including:

Journal articles
Research data
Software & more

Read the article to learn more: openaire.eu/openaire-explore-a

Visit explore.openaire.eu to make your contributions count publicly and properly.

Ready to supercharge your #OpenScience profile?

With #OpenAIREEXPLORE + @ORCID_Org, you can seamlessly complete your #ORCID record with all your research outputs, from papers & #datasets to #software tools.

Backed by the @OpenAIREGraph, EXPLORE identifies and matches your work, including:

-Journal articles
-Research data
-Software & more

Log in with your ORCID → check what’s missing → sync it to your profile in just a few clicks.

Read the article: explore.openaire.eu

Massive, Unarchivable #Datasets of #Cancer, #Covid, #HIV and #Alzheimer's Research Could Be Lost Forever
Days before RFK announced 10,000 #HHS staffers would lose their jobs, a message appeared on #NIH research repository sites saying they were "under review." Unlike other government datasets or web pages, downloading or otherwise archiving NIH data often requires a Data Use Agreement between a research institution and the agency.
404media.co/nih-archives-repos
archive.ph/Y8asq


#ListenBrainz / #MetaBrainz I'm confused. Aren't the sponsors the true customers? Why use this? 🤔

On one hand #Music: "Listen together", "Ethical forever"

On the other: #DATASETS

"Some of the world’s biggest platforms such as Google and Amazon, use our data"

"We ask commercial supporters to support us in order to help fund the creation and maintenance of these datasets."

"The following organizations make use of the data-sets published by MetaBrainz"

"Unicorn tier: #Google, #Amazon, #Spotify"

STAT: Gold-standard maternal mortality database in limbo as CDC staff placed on leave. “As part of the sweeping layoffs that rocked the Department of Health and Human Services on Tuesday, the entire staff that oversaw an annual survey to better understand infant and maternal health — and that was considered the gold standard in the field — was placed on administrative leave. The Pregnancy […]

https://rbfirehose.com/2025/04/02/stat-gold-standard-maternal-mortality-database-in-limbo-as-cdc-staff-placed-on-leave/

Clemson News: Study: Researchers’ choices could result in different conclusions from the same data. “If you give hundreds of researchers the same data and the same hypotheses to test, they will reach the same conclusions, right? Wrong, according to a recent study published in the journal BMC Biology. Two hundred forty-six researchers in the fields of ecology and evolutionary biology — […]

https://rbfirehose.com/2025/04/01/study-researchers-choices-could-result-in-different-conclusions-from-the-same-data-clemson-news/


From handling massive #DataSets to streamlining delivery, UC Berkeley #Library is ensuring that #ResearchData is well-managed, accessible, and compliant with licensing agreements through #Dataverse, so resources are discoverable and usable by the entire university community. #RDM #DataManagement youtu.be/XVBUna3wzgk?si=c_Ixa-

This data may vanish under Trump, so we charted it
Some of the most valuable #datasets in human history have vanished from #US #government websites; it felt like watching the Library of Alexandria go up in smoke
Many have gone on record describing the #Census Bureau’s #American Community Survey as a wonder of the modern world
Another loss? The #HouseholdPulse survey, an online survey that provided week-by-week data on income losses, economic struggles and precarious mental health
washingtonpost.com/business/20
archive.ph/mB512

The Washington Post · This data may vanish under Trump, so we charted it · By Andrew Van Dam

"On Friday, numerous essential #datasets were #purged from federal agency websites, including #data from #CDC PLACES (Population Level Analysis and Community Estimates), the Social Vulnerability Index (SVI), and the Climate and Economic Justice Screening Tool (CEJST)—to name just a few. While we don’t know when or if this data will return, we want to assure you that they are still accessible on our platform." policymap.com/blog/purged-fede #PolicyMap #PublicHealth #USPol #Project2025 #CivilRights


PLOS Biology: Linking citation and retraction data reveals the demographics of scientific retractions among highly cited authors. “Retractions are becoming increasingly common but still account for a small minority of published papers. It would be useful to generate databases where the presence of retractions can be linked to impact metrics of each scientist. We have thus incorporated […]

https://rbfirehose.com/2025/02/02/plos-biology-linking-citation-and-retraction-data-reveals-the-demographics-of-scientific-retractions-among-highly-cited-authors/

Federation of American Scientists: Kickstarting Collaborative, AI-Ready Datasets in the Life Sciences with Government-funded Projects. “In the age of Artificial Intelligence (AI), large high-quality datasets are needed to move the field of life science forward. However, the research community lacks strategies to incentivize collaboration on high-quality data acquisition and sharing. The […]

https://rbfirehose.com/2025/01/03/federation-of-american-scientists-kickstarting-collaborative-ai-ready-datasets-in-the-life-sciences-with-government-funded-projects/


University of Manitoba: FAUM researchers release new open data sets for data-driven urban design. “The Future Elements Studio, led by Dr. Yuhao Lu at the Faculty of Architecture, has released two new open data sets as part of its ongoing efforts to advance urban digital twin models for Canadian communities. Beginning with the City of Winnipeg, these datasets serve as data-rich and […]

https://rbfirehose.com/2024/12/19/university-of-manitoba-faum-researchers-release-new-open-data-sets-for-data-driven-urban-design/

For The Love Of The Web. Posting Publicly Is Going To Get Used In Some Way

Sam Cole over at 404 Media wrote an article about a Hugging Face Machine Learning Librarian making a public data set of 1 million Bluesky posts available to everyone for Machine Learning.

People were, of course, outraged. After all, it’s the Internet. People thrive on being outraged, pissed off, and otherwise salty.

What people seem to miss is that what they’re posting on Bluesky is public and scrapable.

The way this guy made the data set was a bit sloppy and, in my opinion, irresponsible. He didn’t anonymize the data and left personally identifiable information in the data set. He also didn’t get consent from people first.

Yeah, I agree it feels a bit icky that this was done, mostly without consent or anonymizing the data. But for the love of the Web, what you put online publicly is PUBLIC. People will see it and possibly use it for whatever they want. How hard is this to grasp?

This collection, according to Sam’s article, is also in a legal gray area right now and is going through the courts around the world.

To give the librarian some credit, he took down the data set after getting quite a bit of “feedback.” 😵‍💫😜

But that didn’t stop the trolls from making even bigger data sets and putting them out online.

I really do in fact understand why people are upset, but those posts are public. Don’t post stuff and expect it to be private when it’s PUBLIC!

Honestly, I’m fine with the content I post publicly being used to train LLMs and AI, because it will improve the technology that I benefit from.

I agree with Rand Fishkin, the founder of Moz and Sparktoro.

He posted on Bluesky:

I know others are probably upset about this, but LLM training is, for me, a benefit of participating in spaces like this. I *want* my word usage, brands, and content to be part of how AI answers questions in the future. Just like I wanted Google to index my websites.

— Rand Fishkin (@randfish.bsky.social) December 8, 2024 at 4:06 PM

I don’t think that’s a crazy desire. Right? Am I completely off-base? What do you think?

#AI #Bluesky #Data

University of Cambridge: New datasets will train AI models to think like scientists. “What can exploding stars teach us about how blood flows through an artery? Or swimming bacteria about how the ocean’s layers mix? A collaboration of researchers, including from the University of Cambridge, has reached a milestone toward training artificial intelligence models to find and use transferable […]

https://rbfirehose.com/2024/12/08/university-of-cambridge-new-datasets-will-train-ai-models-to-think-like-scientists/
