#dataengineering

Posit
What makes tools truly useful?

Episode 2 of #TheTestSet features Wes McKinney (Part 1 of 2!) sharing his experience building Pandas & Arrow, plus his surprising past in speedrun communities.

Tune in for his story at thetestset.co, on Spotify, or Apple Podcasts

#DataStack #DataEngineering #OpenSource #Podcast #Python

HackerNoon
Discover how CocoIndex transforms data orchestration with a pure Data Flow Programming model, ensuring traceable, immutable, and declarative pipelines for know https://hackernoon.com/redefining-data-operations-with-data-flow-programming-in-cocoindex-u486ao8 #dataengineering

Posit
Ever wonder about the mind behind Pandas & Apache Arrow? 🤔 Ep. 2 of #TheTestSet (Part 1!) unpacks Wes McKinney's journey – including his speedrunning past! What makes good tools good?

🎧 Listen at https://thetestset.co, on Spotify, or Apple Podcasts

#DataStack #DataEngineering #Pandas #OpenSource #PodcastLaunch #Python

blaze.email
🔍 Excited about AXLearn for modular ML training, Pinterest's Moka for massive data processing, and PromiseTune for causal configuration tuning! #MachineLearning #DataEngineering

https://blaze.email/Machine-Learning-Engineer

🧙‍♂️ One does not simply build reports on OLTP data…

This week on The Drill Down with Ahmad & James, our special guest Kristyna Ferris will be presenting a session titled "The Fellowship of the Star Schema: Transforming OLTP Data for Power BI"

🛠️ This session is packed with:
- Clear distinctions between OLTP & OLAP
- Tips for building Power BI-ready models
- A sprinkle of Slowly Changing Dimension magic

💡Whether you’re a data wizard 🧙, business hobbit 🧝‍♀️, or SQL ranger 🏹 — this is your quest.

🗓️ Join us LIVE on LinkedIn | Wednesday, July 2nd @ 2PM Central
lnkd.in/eWh4SsBb

pro tip for user interface designers:

if you have hundreds of millions of dollars of venture capital and you want to make a user facing data analytics tool of some kind and you think it's reasonable to ask an average human being to type this:

CAST('2023-05-01' AS TIMESTAMP)

to do literally anything with a date or time in your application's user interface, just stop right there. do not pass go, do not collect $200, and do not ever attempt to offer feedback to a UX designer ever again. something is deeply broken inside you that means there are certain mysteries of the universe that even the guys who designed the postgres command line can access that you will never know, and that's ok. You can still live a really rad life.

Pinpointing differences between two tables matters for tasks like validating data migrations or spotting corruption. When those tables live in different databases, it becomes tricky because of network costs and differing SQL dialects. In this article, Erez Shinan explains how Reladiff tackles these challenges and describes its development journey.

eshsoft.com/blog/how-reladiff-

eshsoft.com · How Reladiff Works | Esh Software Blog · A deep dive into the workings of Reladiff, exploring the challenges and techniques in data engineering with SQL.
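
To make the table-diffing task concrete, here is a minimal sketch that compares two small in-memory tables by per-row checksums. It is a generic illustration, not Reladiff's algorithm or API: the function names, the `id` key column, and the sample rows are all made up for this example.

```python
import hashlib


def row_checksum(row):
    """Return a stable hex digest of a row's column names and values."""
    payload = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_tables(rows_a, rows_b, key="id"):
    """Compare two tables (lists of dicts) by primary key and row checksum."""
    sums_a = {r[key]: row_checksum(r) for r in rows_a}
    sums_b = {r[key]: row_checksum(r) for r in rows_b}
    shared = sums_a.keys() & sums_b.keys()
    return {
        "only_in_a": sorted(sums_a.keys() - sums_b.keys()),
        "only_in_b": sorted(sums_b.keys() - sums_a.keys()),
        "changed": sorted(k for k in shared if sums_a[k] != sums_b[k]),
    }


# Hypothetical sample data standing in for two database tables.
source = [{"id": 1, "name": "ada"}, {"id": 2, "name": "bob"}, {"id": 3, "name": "cy"}]
target = [{"id": 1, "name": "ada"}, {"id": 2, "name": "rob"}, {"id": 4, "name": "di"}]

result = diff_tables(source, target)
print(result)
```

Comparing checksums rather than full rows is one common way to keep the amount of data exchanged between two databases small; a real cross-database tool would also have to deal with dialect differences in how values are formatted before hashing.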

The @huggingface team has created tiny-agents, a new feature that lets their huggingface_hub software act as a Model Context Protocol (MCP) Client. In their recent article, they explained how to set up these tiny agents to give new abilities to your LLMs to interact with the world and perform complex tasks.

huggingface.co/blog/python-tin

huggingface.co · Tiny Agents in Python: a MCP-powered agent in ~70 lines of code · We’re on a journey to advance and democratize artificial intelligence through open source and open science.

🔔 Slides on Legal Data Engineering 🔔

What is Legal Data Engineering? What does working with legal data look like in practice in Germany? What legal problems arise in connection with Legal Data Engineering? This presentation offers an introduction to Legal Data Engineering and looks for answers to these questions.

Slides: zenodo.org/records/15575231/fi

Legal Data Engineering is the main focus of every Legal Data Science project. At the core of data engineering is the ETL process: extracting, transforming, and loading data. The slides give a generally accessible overview of it.
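
As a rough illustration of the three ETL steps, the following sketch reads a tiny CSV, normalizes it, and loads it into an in-memory SQLite table. Everything in it is assumed for the example: the column names, the `decisions` table, and the two sample rows are invented and do not come from the slides.

```python
import csv
import io
import sqlite3

# Hypothetical raw input with inconsistent casing and stray whitespace.
RAW_CSV = """court,date,file_number
 BGH ,2024-01-15,I ZR 1/23
bverfg,2024-02-01,1 BvR 2/23
"""


def extract(text):
    """Extract: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace and normalize court names to upper case."""
    return [
        {
            "court": r["court"].strip().upper(),
            "date": r["date"].strip(),
            "file_number": r["file_number"].strip(),
        }
        for r in rows
    ]


def load(rows, conn):
    """Load: write the cleaned rows into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS decisions (court TEXT, date TEXT, file_number TEXT)"
    )
    conn.executemany(
        "INSERT INTO decisions VALUES (:court, :date, :file_number)", rows
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
courts = [row[0] for row in conn.execute("SELECT court FROM decisions ORDER BY date")]
print(courts)
```

Keeping each step as its own function makes it easy to swap the source (for example an API instead of a CSV) or the target without touching the cleaning logic.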

Further practical topics are the availability of legal data in Germany (in particular structured data and APIs), tokenization problems in large language models, and the misrecognition of gene names in Microsoft Excel.

On the legal questions of Legal Data Engineering, I cover the traditional legal position, the new Data Use Act (Datennutzungsgesetz, DNG), and Bavaria as a negative example of a closed legal data culture. A discussion of the data protection lawsuit against OpenJur and the open data lawsuit brought by the Gesellschaft für Freiheitsrechte (GFF) against the Federal Police covers current developments in this area of law.

The @trailofbits team cut PyPI’s test suite runtime from 163 seconds to just 30, an 81% improvement. Their write-up details the steps that made it possible: using pytest-xdist for parallel test execution, leveraging Python 3.12’s sys.monitoring, removing unnecessary imports, and more.

blog.trailofbits.com/2025/05/0

The Trail of Bits Blog · Making PyPI's test suite 81% faster · See how we slashed PyPI’s test suite runtime from 163 to 30 seconds. The techniques we share can help you dramatically improve your own project’s testing performance without sacrificing coverage.
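
The headline figure can be sanity-checked from the two runtimes quoted in the post. The inputs below are the rounded values from the post, so the result is only approximate; the variable names are chosen for this example.

```python
before_s = 163  # runtime before the changes, in seconds (as quoted)
after_s = 30    # runtime after the changes, in seconds (as quoted)

# Relative reduction in runtime and the equivalent speedup factor.
reduction = (before_s - after_s) / before_s
speedup = before_s / after_s

print(f"reduction: {reduction:.1%}, speedup: {speedup:.1f}x")
```

With these rounded inputs the reduction comes out at about 81.6%, in line with the roughly 81% quoted, and corresponds to a speedup of about 5.4x.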