#sciop

jonny (good kind)

if you are seeding anything on #sciop (or anywhere else too) using qbittorrent (and probably other clients too), you should increase your max torrent size to something like 2GB - that's what's causing the recurring problem that many people have flagged to us where their torrents seem to disappear from their client after restarting:
https://github.com/arvidn/libtorrent/issues/8012

tools > options > advanced, set both torrent file size limit and bdecode token limit very high

v2 torrents are very very good for archives, but they are more rarely used in piracy, so there is comparatively less optimization pressure for them. so this explains why our seed stats are so spiky, because we encourage hybrid/v2, and by default any v2 torrent larger than a few dozen GB will just go poof on restart.

edit: this was actually fixed in qbt 5.1.2, so you can also just update

jonny (good kind)

Last week trump announced plans to "review" 8 Smithsonian museums. Today he doubled down, very explicit about the intent to revise history to reflect the ethno-nationalist fantasy of US history.

You can do something about that! We are backing up the digital archives of those museums on sciop: https://sciop.net/tags/smithsonian

You can take direct action to preserve the historical artifacts the right wants to destroy:

1) you can download a copy and seed it (https://sciop.net/docs/quickstart/#seed-anything), every seeder counts. Subscribe to the smithsonian RSS feed (https://sciop.net/rss/tag/smithsonian.rss) to auto-download torrents as they are scraped.

2) we have also written a crawler (https://codeberg.org/Safeguarding/sciop-scraping) connected to sciop that distributes the scraping work, and automatically creates and uploads a validated torrent that piggybacks off the s3 bucket as a webseed source while it lasts (instructions in reply).

The data from the 8 threatened museums is on the order of 10 TB, and we have split it up by jpg/tif so people without much spare storage can join in on the jpg's at least. The full contents of the public smithsonian bucket are ~700TB, so if we want to have a full independent copy we'll need lots more seeders.

All this code is being written flat out, on the run, as it's needed, by volunteers with exactly zero resources, so it's not polished or well documented, and if you're interested in helping damp the flames of the book burning by contributing to any of the code or docs, we'd love to have you.

#Smithsonian #Sciop

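The RSS subscription mentioned in the post above is normally handled by the torrent client itself (qBittorrent, for one, has a built-in RSS downloader). For anyone curious what that step boils down to, here is a minimal Python sketch that pulls torrent links out of an RSS document. The sample feed is made up, and the assumption that each item carries its .torrent link in an `<enclosure>` element is mine; the real sciop feed may be laid out differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, used so the sketch runs without network access.
# The layout of the real sciop feed is an assumption, not something verified here.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>example tag feed</title>
    <item>
      <title>dataset-a</title>
      <enclosure url="https://example.org/a.torrent"
                 type="application/x-bittorrent" length="1234"/>
    </item>
    <item>
      <title>dataset-b (no torrent attached)</title>
    </item>
    <item>
      <title>dataset-c</title>
      <enclosure url="https://example.org/c.torrent"
                 type="application/x-bittorrent" length="5678"/>
    </item>
  </channel>
</rss>
"""


def torrent_urls(rss_xml: str) -> list[str]:
    """Return the enclosure URL of every feed item that has one, in feed order."""
    root = ET.fromstring(rss_xml)
    urls = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is not None and enclosure.get("url"):
            urls.append(enclosure.get("url"))
    return urls


if __name__ == "__main__":
    for url in torrent_urls(SAMPLE_FEED):
        print(url)
```

A client would then hand each URL to its own "add torrent" routine; that part is left out because it depends entirely on the client.
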
Henrik Schönemann

The slides of my talk at #WHY2025 "Safeguarding Research & Culture: Save public data from the digital bookburnings!" are now online:
https://hu.berlin/SRC-WHY2025

Recording here (27min):
https://media.ccc.de/v/why2025-238-safeguarding-research-culture-save-public-data-from-the-digital-bookburnings
(Wow, awesome work by @c3voc 💜)

More context: https://program.why2025.org/why2025/talk/B8DANE/

#SafeguardingResearch #SciOp @SafeguardingResearch

Safeguarding Research/Culture

Some of you may have seen the news re National #Climate Assessment Reports
https://www.cnn.com/2025/08/07/climate/wright-national-climate-assessments-updating

A friendly reminder:
They are all accessible here in this archive from November https://globalchange.govarchive.us/

As well as on #SciOp (from April)
https://sciop.net/datasets/globalchange-gov-webrip/pdf

Teun 🌏 ❤️ 🏳️‍🌈 🇺🇦 🇵🇸

Digital archival projects are crucial in the fight against fascism. I wrote about the why and the how.

And if you're reading this, that means you have a computer, so you too can contribute!

https://carefullmusings.bearblog.dev/the-urgency-of-digital-archiving/

#ArchiveTeam #SciOp #fascism #archive #resistance #DigitalPreservation

ℒӱḏɩę :blahaj: 💾

I just downloaded every single #torrent from #SciOp and am ensuring that all of them are now in the #antifa torrent server.

btw does anyone want the complete 1.2 gigs of torrents from SciOp in one ZIP package?

#resist #fascism #datahoarder #digitalpreservation #archive

man i just had a series of extremely good ideas* that are very simple and very implementable for #sciop that i think will cause an absolutely disgusting amount of (good, intrinsically deduplicating, actually decrease server load by creating a supporting swarm of peers) public data scraping to happen and basically lower the barrier to scouting endangered datasets to zero

*if you received the message flood of me having them you are not allowed to tell people if they are actually bad

:crt_w_green_lines: Hackathon: Data Under Threat / Data Rescuing (Aug 7) in #München

The LMU Open Science Center (@lmu_osc) runs a hackathon to support the #SciOp #SafeguardingResearch initiative: Rescuing research data that is deleted by the Trump administration.

Bonus: @lavaeolus will give an ignition talk!

📅 Thursday, 2025-08-07, 16 – 19 (only in-person)
👉 Details and signup: github.com/lmu-osc/safeguar.de

Become a data rescuer by turning your own laptop into a Research Data Rescue Node, scraping at-risk data sets, and breathing new life into your old HDD as part of a global, decentralised network.

#LMUMünchen #OpenScience #OpenData #DataRescue
CC @SafeguardingResearch @bitsUndBaeumeAuxMuc

I revived an old HDD with a #RaspberryPi Zero 2 W for #DataRescue:

It runs ...

(a) a Bittorrent client that seeds at-risk data sets from the #SciOp database
(b) the `sciop-scrape` script to get new datasets into the swarm

Setup instructions for the Pi Zero: codeberg.org/nicebread/HiveSee

Setup instructions for `sciop-scrape` (on macOS & RPi): codeberg.org/nicebread/HiveSee

Let me know if the instructions work for you; happy to collaborate on the manual.

#WasFehlt (?): A torrent client for #SciOp and similar projects where I don't have to specify what I want to seed, but which decides that automatically, depending on where capacity is needed and on how much storage space I make available to it.
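
The wished-for behaviour can be sketched as a simple greedy rule: prefer the torrents with the fewest seeders and keep adding them while they still fit into the storage budget. This is only an illustration under my own assumptions. The `Torrent` record, the `pick_torrents` function, and the sample numbers are all hypothetical; no existing client or sciop API is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Torrent:
    """Hypothetical record: what a client would need to know about a torrent."""
    name: str
    size_gb: int
    seeders: int


def pick_torrents(candidates, budget_gb):
    """Greedy choice: least-seeded first, skipping anything that no longer fits.

    Ties on seeder count are broken in favour of the smaller torrent.
    """
    chosen = []
    remaining = budget_gb
    for torrent in sorted(candidates, key=lambda t: (t.seeders, t.size_gb)):
        if torrent.size_gb <= remaining:
            chosen.append(torrent)
            remaining -= torrent.size_gb
    return chosen


# Made-up sample data for illustration only.
CANDIDATES = [
    Torrent("a", size_gb=40, seeders=1),
    Torrent("b", size_gb=10, seeders=0),
    Torrent("c", size_gb=30, seeders=5),
    Torrent("d", size_gb=5, seeders=2),
]

if __name__ == "__main__":
    for torrent in pick_torrents(CANDIDATES, budget_gb=50):
        print(torrent.name, torrent.size_gb, torrent.seeders)
```

A greedy pass like this is not an optimal packing of the budget; it just makes sure the scarcest data gets picked up first, which seems to be the point of the wish.
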

@jonny With the updated commands I got it to run now (with minor modifications) on macOS. On RPi I will try again tomorrow (currently no access to the machine).

I am currently scraping "rp_enchanter_ver02" with 24 GB and counting. Three questions:

(1) Can I know how large the download will be?
(2) Can I stop the scraping, or will the download then be corrupted?
(3) I assume that after downloading it automatically starts seeding?

Should we keep this conversation on (a) Mastodon, (b) the safeguar.de forum or (c) Codeberg issues? Where would most people benefit from it?

@jonny this entire thread is amazing, top-notch tool development for a noble cause.
@ #academia : if you feel desperate about the wholesale breakdown of science under the current US administration, consider helping out with #SciOp: Decentralized backups of datasets under threat, in a torrent swarm.

Have a disused laptop or Raspi? Make it part of the swarm and take the data outside the US (or any) administration's grasp!

@jonny Very cool. A couple months back, I resurrected an 8 TB NAS I'd slated for donation when I came across #sciop

So far I've been creating WARCs using zimit and #deluge for the torrent because the client/server is convenient for a headless unit.

Anyway, I'm giving this a try and it's grabbed 10 GB very quickly, which seems much faster than zimit. I'm not exactly sure how to turn this around into a torrent and get it up to Sci-Op, but I'll keep an eye on it and am happy to provide feedback.

check this out if you want to help preserve the archive of "most local newspapers through most of US history" that had its funding pulled. even if you only have a couple dozen gigabytes to spare, you can
a) make an account on sciop.net/ ,
b) run a qbittorrent instance, go to preferences>web ui and click enable,

and just do this

python -m pip install sciop-scraping
sciop-cli login
sciop-cli client login
sciop-scrape chronicling-america --next

and that's all.

if you have spare storage, you can sort by seeders, ascending, and start from there. or subscribe to the rss feed and auto-download it.

this is an archive funded by the library of congress (threatened) and the national endowment for the humanities (actively being eliminated). the alternative is that an enormous amount of US history that doesn't percolate into history books is owned and operated by lexisnexis and other for-profit data brokers.

this is the first run of some tooling to lower the bar for participatory scraping - at the moment, the archive is still online, and the scraper will automatically embed a webseed URL in the created torrent. so even if you don't have space to seed, you can scrape the data, upload the torrent, and make it possible for waiting peers to become mirrors
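
For readers wondering what "embedding a webseed URL" means at the file level: in a .torrent file a webseed is a `url-list` entry in the bencoded metainfo dictionary (BEP 19). The following is a minimal Python sketch of that idea, not how `sciop-scrape` actually does it. The bencoder, the `with_webseed` helper, the tracker and file URLs, and the tiny v1-only example torrent are all my own illustrative assumptions.

```python
import hashlib


def bencode(value) -> bytes:
    """Minimal bencoder for ints, str/bytes, lists and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must be written in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def with_webseed(metainfo: dict, url: str) -> dict:
    """Return a copy of the metainfo with a BEP 19 `url-list` entry added."""
    updated = dict(metainfo)
    updated["url-list"] = [url]
    return updated


# Hypothetical single-file v1 torrent for a 5-byte file containing b"hello".
DATA = b"hello"
METAINFO = {
    "announce": "https://tracker.example/announce",  # placeholder tracker
    "info": {
        "name": "example.txt",
        "length": len(DATA),
        "piece length": 16384,
        "pieces": hashlib.sha1(DATA).digest(),  # one piece, one SHA-1 hash
    },
}
WEBSEED_URL = "https://example.org/data/example.txt"  # placeholder URL

if __name__ == "__main__":
    encoded = bencode(with_webseed(METAINFO, WEBSEED_URL))
    print(len(encoded), "bytes of bencoded metainfo")
```

Because `url-list` sits outside the `info` dictionary, adding or removing it does not change the torrent's infohash. Clients that support webseeds can fetch pieces over HTTP from that URL while the original host is still online, and verify them against the piece hashes like any other download.
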

Link: sciop.net, "Chronicling America - Dataset - SciOp" (Preserving Public Information)

Sciop is as easy to run as a bittorrent client. The idea will be to have it serve as a companion to a client, where we are going to implement a minor mutation of the FEP for mobile identity so you can mirror an identity from your personal client companion to any other instance that chooses to mirror yours. So this isn't like "come help our website" this is "get the fun parts of this website ready for when it's time to talk to other websites"