#ocr

your auntifa liza 🇵🇷 🦛 🦦<p class="quote-inline">RE: <a href="https://mastodon.social/@SouthDakota/115520508392114834" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">mastodon.social/@SouthDakota/1</span><span class="invisible">15520508392114834</span></a></p><p>okeydockey, since we now have embedded toots here’s an <a href="https://mastodon.social/tags/AltTxt" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AltTxt</span></a> <a href="https://mastodon.social/tags/yaddaYadda" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>yaddaYadda</span></a> because it seems <a href="https://mastodon.social/tags/OCR" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OCR</span></a> doesn’t exist in South Dakota… </p><p>AltTxt: there is a masked man with a green face-features-altering mask and a backwards baseball cap inside a car. he is pointing a gun at the photographer, like in a drive-by shooting. ironically, the masked guy looks ambiguously brown-skinned. the caption: "Your tax dollars at work in Chicago yesterday."</p><p>(implying the gang banger is working for Trump’s <a href="https://mastodon.social/tags/DHS" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DHS</span></a> <a href="https://mastodon.social/tags/ICE" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ICE</span></a>)</p>
Oliver Ammann<p>Is there a viable <a href="https://swiss.social/tags/OCR" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OCR</span></a> solution for <a href="https://swiss.social/tags/ancientGreek" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ancientGreek</span></a> text in 16th century prints? Ideas? Experiences?</p><p><a href="https://swiss.social/tags/rarebooks" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>rarebooks</span></a> <a href="https://swiss.social/tags/greek" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>greek</span></a> <a href="https://swiss.social/tags/tesseract" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>tesseract</span></a> <a href="https://swiss.social/tags/transkribus" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>transkribus</span></a> <a href="https://swiss.social/tags/16thcentury" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>16thcentury</span></a></p>
athmane mokraoui [BoF] ⏚ꝃ⌁⁂<p>Trilingual street sign in an Algerian municipality.</p><p><a href="https://mstdn.fr/tags/OCR" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OCR</span></a> does not work for Kabyle.</p>
Jeanne (spellboundblog)<p>When will AI understand document semantics? Examining how LLMs may (or may not?) help improve OCR of text - with examples of reading across column breaks.</p><p><a href="https://pdfa.org/when-will-ai-understand-document-semantics" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">pdfa.org/when-will-ai-understa</span><span class="invisible">nd-document-semantics</span></a></p><p><a href="https://digipres.club/tags/ipres2025" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ipres2025</span></a> <a href="https://digipres.club/tags/ocr" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ocr</span></a></p>
flumen_calculi<p>Suppose I wanted to convert scans/photos (pdf/jpg/png) of handwritten notes into text on my Mac. It doesn't need any elaborate automation, and the texts are fairly short.</p><p>Do you know an app that is suited to this and that can also cope with the Vereinfachte Ausgangsschrift (the simplified German school handwriting)?</p><p><a href="https://ruhr.social/tags/boost" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>boost</span></a> <a href="https://ruhr.social/tags/followerpower" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>followerpower</span></a> <a href="https://ruhr.social/tags/ocr" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ocr</span></a> <a href="https://ruhr.social/tags/macos" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>macos</span></a></p>
Lambert Heller<p>"Cutting-edge Open OCR Models / We’ve seen an incredible wave of new models this past year. Because so much work is happening in the open, these players build on and benefit from each other’s work. A great example is AllenAI’s release of OlmOCR, which not only released a model but also the dataset used to train it. With these, others can build upon them in new directions. The field is incredibly active, but it’s not always obvious which model to use."</p><p><a href="https://openbiblio.social/tags/vlm" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>vlm</span></a> <a href="https://openbiblio.social/tags/atr" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>atr</span></a> <a href="https://openbiblio.social/tags/ocr" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ocr</span></a> </p><p><a href="https://toot.cafe/@tomayac/115418110661215543" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">toot.cafe/@tomayac/11541811066</span><span class="invisible">1215543</span></a></p>
athmane mokraoui [BoF] ⏚ꝃ⌁⁂<p>I find it rather "comical" that <a href="https://mstdn.fr/tags/OCR" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OCR</span></a> forces us to choose a language.</p><p>As if the resulting text were going to come out in the language you picked 🤭</p><p>If we were talking about a script or a situation, that would be better, but to go as far as talking about an output language, it's just, I can't find the words, it's comical.</p><p>We could talk about an output language if there were language detection, but classic OCR does not do language detection.</p>
Terence Eden<p>What's working:</p><p>✅ Download JSON of each page from Amazon.<br>✅ Deobfuscate the SVG "DRM".<br>✅ Draw each letter on the page with the correct indent, placement, and font (italics, etc).</p><p>What's mostly working:<br>🚧 OCR. Tesseract gets most of the text, but some errors.</p><p>What's not working:<br>❌ OCR doesn't output italics.<br>❌ Linebreaks are hardcoded.<br>❌ Doesn't integrate into the original ePub code - so no chapters etc.<br>❌ No idea about footnotes, images, etc.</p><p><a href="https://mastodon.social/tags/Kindle" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Kindle</span></a> <a href="https://mastodon.social/tags/DRM" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DRM</span></a> <a href="https://mastodon.social/tags/OCR" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OCR</span></a></p>
Terence Eden<p>I'm using <a href="https://mastodon.social/tags/tesseract" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>tesseract</span></a> 5 to do the <a href="https://mastodon.social/tags/OCR" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OCR</span></a> locally.</p><p>It is *mostly* pretty good - because the default font is nice and clear.</p><p>But it does have some problems with spaces. Ideally I want this to be fully automated, I don't want to manually correct stuff.</p><p>Any recommendations for a *local* OCR which runs on <a href="https://mastodon.social/tags/Linux" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Linux</span></a></p>
Oblomov<p><a href="https://sociale.network/tags/AskFedi" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AskFedi</span></a> is there an <a href="https://sociale.network/tags/OCR" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OCR</span></a> for <a href="https://sociale.network/tags/music" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>music</span></a> notation? Something that can convert scanned sheet music in some standardized music notation format that can be typeset with appropriate programs?</p>
Mia<p>In which Sak Supple shares his experiments with using LLMs to transcribe 18th and 19th century playbills from the British Library - manicules, long 's' and all! 'blplaybills.org: better search results using LLMs' <a href="https://www.bl.uk/stories/blogs/posts/blplaybills-org-better-search-results-using-llms" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">bl.uk/stories/blogs/posts/blpl</span><span class="invisible">aybills-org-better-search-results-using-llms</span></a></p><p><a href="https://hcommons.social/tags/OCR" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OCR</span></a> <a href="https://hcommons.social/tags/TheaterHistory" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>TheaterHistory</span></a> <a href="https://hcommons.social/tags/LLMs" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>LLMs</span></a> <a href="https://hcommons.social/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://hcommons.social/tags/ATR" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ATR</span></a></p>
your auntifa liza 🇵🇷 🦛 🦦<p>just the top stub for posts without <a href="https://mastodon.social/tags/AltText" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AltText</span></a> that i <a href="https://mastodon.social/tags/OCR" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OCR</span></a> because <a href="https://mastodon.social/tags/yaddaYadda" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>yaddaYadda</span></a></p>
Jürgen Hubert<p>Question: Can anyone recommend good <a href="https://mementomori.social/tags/FOSS" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>FOSS</span></a> <a href="https://mementomori.social/tags/OCR" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OCR</span></a> tools that cope well with <a href="https://mementomori.social/tags/Fraktur" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Fraktur</span></a> type?</p>
Alexander Winkler<p><span class="h-card" translate="no"><a href="https://fedihum.org/@sarahalang" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>sarahalang</span></a></span> at "Digital Neo-Latin studies: ideas and perspectives" on efficient <a href="https://openbiblio.social/tags/OCR" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OCR</span></a> Post-Correction.</p><p><a href="https://openbiblio.social/tags/neolatin" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>neolatin</span></a></p>
Ethan Black<p><span class="h-card" translate="no"><a href="https://fandom.ink/@Fragglemuppet" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>Fragglemuppet</span></a></span> <span class="h-card" translate="no"><a href="https://mastodon.social/@JKrotkov" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>JKrotkov</span></a></span> You'd think modern accessibility tech would use <a href="https://fosstodon.org/tags/OCR" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OCR</span></a> ...</p>
⚯ Michel de Cryptadamus ⚯<p>Released v1.17.0 of The Pdfalyzer, the surprisingly popular tool for analyzing (possibly malicious) PDFs I created after my own unpleasant experience. Now ships with two command line tools for extracting stuff from PDF files:</p><p>1. extract_text_from_pdfs() - brute force extract all text from a PDF, including doing an <a href="https://universeodon.com/tags/OCR" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OCR</span></a> extraction of any embedded images</p><p>2. extract_pdf_pages() - rip a page range from a <a href="https://universeodon.com/tags/PDF" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>PDF</span></a> and write them to a new one</p><p>* Github: <a href="https://github.com/michelcrypt4d4mus/pdfalyzer" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">github.com/michelcrypt4d4mus/p</span><span class="invisible">dfalyzer</span></a><br>* Pypi: <a href="https://pypi.org/project/pdfalyzer/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">pypi.org/project/pdfalyzer/</span><span class="invisible"></span></a><br>* Homebrew: <a href="https://formulae.brew.sh/formula/pdfalyzer" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">formulae.brew.sh/formula/pdfal</span><span class="invisible">yzer</span></a><br>* Fun thread someone made last week using Pdfalyzer to explain some of how byzantine the PDF format is: <a href="https://x.com/VikParuchuri/status/1965773078585344215" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">x.com/VikParuchuri/status/1965</span><span class="invisible">773078585344215</span></a></p><p><a href="https://universeodon.com/tags/pypi" class="mention hashtag" rel="nofollow noopener" 
target="_blank">#<span>pypi</span></a> <a href="https://universeodon.com/tags/python" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>python</span></a> <a href="https://universeodon.com/tags/pdf" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>pdf</span></a> <a href="https://universeodon.com/tags/pdfs" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>pdfs</span></a> <a href="https://universeodon.com/tags/malware" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>malware</span></a> <a href="https://universeodon.com/tags/Threatassessment" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Threatassessment</span></a> <a href="https://universeodon.com/tags/maldoc" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>maldoc</span></a> <a href="https://universeodon.com/tags/malwareanalysis" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>malwareanalysis</span></a> <a href="https://universeodon.com/tags/homebrew" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>homebrew</span></a> <a href="https://universeodon.com/tags/infosec" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>infosec</span></a> <a href="https://universeodon.com/tags/cybersecurity" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>cybersecurity</span></a> <a href="https://universeodon.com/tags/yararule" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>yararule</span></a> <a href="https://universeodon.com/tags/PdfFies" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>PdfFies</span></a></p>
CrossAsia<p>Major milestone! 121 million Chinese characters from 1,937 historical titles are now fully searchable thanks to OCR collaboration between <span class="h-card" translate="no"><a href="https://openbiblio.social/@stabi_berlin" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>stabi_berlin</span></a></span> East Asia Department and Academia Sinica Taiwan. Read more: <a href="http://sbb.berlin/x05w2" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">http://</span><span class="">sbb.berlin/x05w2</span><span class="invisible"></span></a><br><a href="https://openbiblio.social/tags/DigitalHumanities" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DigitalHumanities</span></a> <a href="https://openbiblio.social/tags/OCR" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OCR</span></a> <a href="https://openbiblio.social/tags/OpenAcess" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenAcess</span></a> <a href="https://openbiblio.social/tags/CrossAsia" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CrossAsia</span></a></p>
de.hypotheses<p>Ctrl + F instead of digital page-turning: <span class="h-card" translate="no"><a href="https://troet.cafe/@manuel_kamenzin" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>manuel_kamenzin</span></a></span> takes automatic text recognition as the starting point for introducing students to the world of <a href="https://fedihum.org/tags/DigitalHumanities" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DigitalHumanities</span></a>. </p><p>A report from teaching practice, including exercises that show students how digital methods can make their everyday work decidedly easier, with manageable effort 👇</p><p><a href="https://digitrip.hypotheses.org/3787" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">digitrip.hypotheses.org/3787</span><span class="invisible"></span></a> via <span class="h-card" translate="no"><a href="https://fedihum.org/@digitrip" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>digitrip</span></a></span><br>fedihum.org</p><p><a href="https://fedihum.org/tags/Lehre" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Lehre</span></a> <a href="https://fedihum.org/tags/OCR" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OCR</span></a> <a href="https://fedihum.org/tags/WissenschaftlichesArbeiten" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WissenschaftlichesArbeiten</span></a></p>
Alíz Horváth<p>Coming up on October 3-4, 2025 at Central European University in Vienna: OCR/HTR Workshop for Under-resourced and Under-represented Languages in Digital Humanities, funded by the Cluster of Excellence EurAsian Transformations and by CLARIAH-AT! (Main organizer: yours truly) <a href="https://fedihum.org/tags/digitalhumanities" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>digitalhumanities</span></a> <a href="https://fedihum.org/tags/multilingualdh" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>multilingualdh</span></a> <a href="https://fedihum.org/tags/textrecognition" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>textrecognition</span></a> <a href="https://fedihum.org/tags/ocr" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ocr</span></a> <a href="https://fedihum.org/tags/htr" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>htr</span></a></p>
ZfdG<p>Anyone who still needs something interesting to read today can dig into this research article, freshly published with us:</p><p>Norbert Fischer, Dominik Kimmel and Frank Puppe report on an experiment in which the texts on labelled picture cards from the Leibniz-Zentrum für Archäologie (LEIZA) are indexed with a classic <a href="https://fedihum.org/tags/Deep" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Deep</span></a>-learning pipeline on the one hand and with <a href="https://fedihum.org/tags/Large" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Large</span></a> language models (LLMs) on the other: </p><p><a href="https://doi.org/10.17175/2025_09" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">doi.org/10.17175/2025_09</span><span class="invisible"></span></a></p><p><a href="https://fedihum.org/tags/DigitalHumanities" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DigitalHumanities</span></a> <a href="https://fedihum.org/tags/dh" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>dh</span></a> <a href="https://fedihum.org/tags/archive" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>archive</span></a> <a href="https://fedihum.org/tags/llm" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>llm</span></a> <a href="https://fedihum.org/tags/ocr" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ocr</span></a></p>