lingo.lol is one of the many independent Mastodon servers you can use to participate in the fediverse.
A place for linguists, philologists, and other lovers of languages.

#pymupdf


Ever felt the need to convert a #PDF into a fixed-layout #EPUB that preserves the table of contents, internal cross-references and hyperlinks? Finding no out-of-the-box solution, I've developed one myself using #Python and the #PyMuPDF library. Here it is, open source, and ready for use:

github.com/aourednik/pdf2epub3

My script is particularly suitable for the conversion of complex layout PDFs generated with variants of #TeXLaTeX.
Enjoy!

GitHub - aourednik/pdf2epub3fixed: Convert PDF to fixed-layout EPUB, conserving the table of contents, inner cross-references and hyperlinks.
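To illustrate the table-of-contents part of such a conversion, here is a minimal sketch, not the code from the linked repository. It assumes the outline has already been read with PyMuPDF's `Document.get_toc()`, which returns entries of the form `[level, title, page]`, and that levels never jump by more than one. The function name `toc_to_nav` and the `page_0001.xhtml` file naming are hypothetical choices made for this example.

```python
from html import escape


def toc_to_nav(toc, href_for_page=lambda n: f"page_{n:04d}.xhtml"):
    """Render [level, title, page] entries as a nested EPUB 3 <nav> list.

    Assumes the first entry has level 1 and that levels increase by at
    most one step at a time. The href naming scheme is hypothetical.
    """
    out = ['<nav epub:type="toc">']
    depth = 0
    for level, title, page, *_ in toc:
        if level > depth:
            # Going deeper: open a nested list inside the current item.
            out.append("<ol>" * (level - depth))
        else:
            # Same level or shallower: close the previous item and
            # any nested lists that are now finished.
            out.append("</li>" + "</ol></li>" * (depth - level))
        out.append(f'<li><a href="{href_for_page(page)}">{escape(title)}</a>')
        depth = level
    if depth:
        # Close the last item and every list that is still open.
        out.append("</li>" + "</ol></li>" * (depth - 1) + "</ol>")
    out.append("</nav>")
    return "".join(out)
```

With PyMuPDF installed, the input could come from something like `fitz.open("book.pdf").get_toc()`; each fixed-layout page would then need a file matching whatever `href_for_page` returns.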

Today I managed to cobble together a #Python script to remove your name from #PDF annotations using #PyMuPDF and #FreeSimpleGUI, then I tried #pyinstaller and I have something that seems to run on Linux... so many steps!!!

It never ceases to amaze me how hard it is to provide software for other people to run!

If you think it could be useful to you or someone, I AGPL licensed it here:

github.com/villares/anonymize-

UPDATE: @Introscopia built a Windows .exe version for me, also using pyinstaller, yay!
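For readers curious what the annotation step might look like, here is a rough sketch, not the code from the linked repository. It relies on PyMuPDF storing the annotation author under the `"title"` key of `annot.info`, and on `annot.set_info(title=...)` followed by `annot.update()` to change it. The function names and the `"Anonymous"` placeholder are assumptions for this example; a non-empty placeholder is used because an empty string may simply be ignored by `set_info`.

```python
def anonymize_page(page, placeholder="Anonymous"):
    """Replace the author of every annotation on one page.

    Returns the number of annotations changed. `page` is expected to
    behave like a PyMuPDF Page (it must provide .annots()).
    """
    changed = 0
    for annot in page.annots():
        # In PyMuPDF, the "title" entry of annot.info holds the author name.
        if annot.info.get("title"):
            annot.set_info(title=placeholder)
            annot.update()
            changed += 1
    return changed


def anonymize_pdf(src, dst, placeholder="Anonymous"):
    """Anonymize all annotations in `src` and write the result to `dst`."""
    import fitz  # PyMuPDF; imported here so anonymize_page works without it

    doc = fitz.open(src)
    total = sum(anonymize_page(page, placeholder) for page in doc)
    doc.save(dst)
    doc.close()
    return total
```

Calling `anonymize_pdf("annotated.pdf", "anonymous.pdf")` with hypothetical file names would write a copy and leave the original untouched.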

#python #linguistics #NLP #pymupdf

Let's say I have raw text that I got from a PDF, where the authors of said PDF are too boomer to release it as structured text.

But there are keywords and chapters.

Do you have good advice or a good resource for how to get that structure back from the content?

(I'm going into it with the agenda to prove that they are badly written, so if I can't identify what a paragraph is about, that's "good")
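One possible starting point, sketched with the standard library only: split the raw text on lines that look like chapter headings, split each chapter into paragraphs on blank lines, and count which known keywords each paragraph mentions. The heading pattern is an assumption (lines such as "Chapter 3: Title" or "3.1 Title") and would need adapting to the actual document; all function names are made up for this example.

```python
import re

# Assumed heading shapes: "Chapter 3: Title" or "3.1 Title" on a line of its own.
HEADING = re.compile(
    r"^(?:Chapter[ \t]+\d+\b.*|\d+(?:\.\d+)*[ \t]+[A-Z].*)$",
    re.MULTILINE,
)


def split_chapters(text):
    """Return (heading, body) pairs; text before the first heading gets None."""
    chapters = []
    heading, start = None, 0
    for match in HEADING.finditer(text):
        body = text[start:match.start()].strip()
        if heading is not None or body:
            chapters.append((heading, body))
        heading, start = match.group().strip(), match.end()
    body = text[start:].strip()
    if heading is not None or body:
        chapters.append((heading, body))
    return chapters


def paragraphs(body):
    """Split a chapter body into paragraphs separated by blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", body) if p.strip()]


def keyword_hits(paragraph, keywords):
    """Count single-word keywords in a paragraph (case-insensitive)."""
    words = re.findall(r"\w+", paragraph.lower())
    return {k: words.count(k.lower()) for k in keywords if k.lower() in words}
```

A paragraph for which `keyword_hits` returns an empty dict mentions none of the listed keywords, which is one crude signal that it is hard to tell what the paragraph is about.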

I used to be able to point my #ThonnyIDE to a #conda env but on this other computer I can't seem to make it work anymore :((

Update 1: Well, it runs, but some libraries seem to break :((

(maybe the lib is not well behaved, but I don't have the energy to chase this right now)

Update 2: #PyMuPDF I'm looking at you!

(runs fine from the command line or from Thonny's other env, go figure)
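When a library behaves differently in Thonny than on the command line, one generic first check is to run the same small script in both places and compare which interpreter is running and where it would load the module from. The sketch below uses only the standard library; the function name `describe_environment` and the default module names (`fitz` and `pymupdf`, the two import names PyMuPDF has used) are choices made for this example, and it may not pinpoint the actual cause.

```python
import importlib.util
import sys


def describe_environment(module_names=("fitz", "pymupdf")):
    """Report which interpreter is running and where modules would load from."""
    report = {
        "executable": sys.executable,  # path of the running Python binary
        "prefix": sys.prefix,          # root of the active environment
        "version": sys.version.split()[0],
        "modules": {},
    }
    for name in module_names:
        # find_spec locates a top-level module without importing it.
        spec = importlib.util.find_spec(name)
        report["modules"][name] = spec.origin if spec else None
    return report


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the two runs show different `executable` or module paths, Thonny is probably not using the conda env you expect.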