One does not simply build reports on OLTP data…
This week on The Drill Down with Ahmad & James, our special guest
Kristyna Ferris will be presenting a session titled "The Fellowship of the Star Schema: Transforming OLTP Data for Power BI"
This session is packed with:
- Clear distinctions between OLTP & OLAP
- Tips for building Power BI-ready models
- A sprinkle of Slowly Changing Dimension magic
Whether you’re a data wizard, business hobbit, or SQL ranger, this is your quest.
Join us LIVE on LinkedIn | Wednesday, July 2nd @ 2PM Central
https://lnkd.in/eWh4SsBb
pro tip for user interface designers:
if you have hundreds of millions of dollars of venture capital and you want to make a user facing data analytics tool of some kind and you think it's reasonable to ask an average human being to type this:
CAST('2023-05-01' AS TIMESTAMP)
to do literally anything with a date or time in your application's user interface, just stop right there. do not pass go, do not collect $200, and do not ever attempt to offer feedback to a UX designer ever again. something is deeply broken inside you that means there are certain mysteries of the universe that even the guys who designed the postgres command line can access that you will never know, and that's ok. You can still live a really rad life.
scariest shit i've seen in years
Pinpointing differences between two tables matters for tasks like validating data migrations or spotting corruption. But when those tables live in different databases, it gets tricky because of network costs and differing SQL dialects. In this article, Erez Shinan shares how Reladiff tackles these challenges, along with its development journey.
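One common way to keep network costs down when diffing tables across databases is to compare per-segment checksums first and fetch rows only for segments that disagree. The sketch below illustrates that general idea with two in-memory SQLite databases. It is not Reladiff's API or necessarily its algorithm: the `users` table, the `row_hash` SQL function, and every helper name are assumptions made up for illustration, and it assumes non-negative integer keys.

```python
import hashlib
import sqlite3


def row_hash(key, value):
    """Map one row to a 48-bit integer so small per-segment sums fit in SQLite's 64-bit integers."""
    digest = hashlib.sha256(f"{key}|{value}".encode()).digest()
    return int.from_bytes(digest[:6], "big")


def make_db(rows):
    """Create an in-memory database with a hypothetical users(id, name) table."""
    conn = sqlite3.connect(":memory:")
    # Registering the hash as a SQL function lets the aggregation run inside the query,
    # standing in for a hash function that a real database server would provide.
    conn.create_function("row_hash", 2, row_hash)
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
    return conn


def segment_checksums(conn, segment_size):
    """Return {segment: (row_count, checksum)}; only these small summaries leave the database."""
    query = (
        "SELECT id / ? AS seg, COUNT(*), SUM(row_hash(id, name)) "
        "FROM users GROUP BY seg"
    )
    return {seg: (count, total) for seg, count, total in conn.execute(query, (segment_size,))}


def fetch_segment(conn, seg, segment_size):
    """Fetch the actual rows of one segment as a set of (id, name) tuples."""
    lo = seg * segment_size
    query = "SELECT id, name FROM users WHERE id >= ? AND id < ?"
    return set(conn.execute(query, (lo, lo + segment_size)))


def diff_tables(conn_a, conn_b, segment_size=2):
    """Compare checksums per segment, then download rows only for mismatching segments."""
    sums_a = segment_checksums(conn_a, segment_size)
    sums_b = segment_checksums(conn_b, segment_size)
    suspects = sorted(s for s in sums_a.keys() | sums_b.keys() if sums_a.get(s) != sums_b.get(s))
    diffs = []
    for seg in suspects:
        rows_a = fetch_segment(conn_a, seg, segment_size)
        rows_b = fetch_segment(conn_b, seg, segment_size)
        diffs += [("-", row) for row in sorted(rows_a - rows_b)]  # only in A
        diffs += [("+", row) for row in sorted(rows_b - rows_a)]  # only in B
    return diffs


db_a = make_db([(0, "ann"), (1, "bob"), (2, "cy"), (3, "dee")])
db_b = make_db([(0, "ann"), (1, "bob"), (2, "cy"), (3, "dan"), (5, "eve")])
diffs = diff_tables(db_a, db_b)
for sign, row in diffs:
    print(sign, row)
```

Segment 0 (ids 0 and 1) matches on both sides, so its rows are never fetched; only the two mismatching segments are. A production tool would also have to cope with the differing SQL dialects the post mentions, which this sketch sidesteps by using SQLite on both sides.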
The @huggingface team has created tiny-agents, a new feature that lets their huggingface_hub software act as a Model Context Protocol (MCP) Client. In their recent article, they explained how to set up these tiny agents to give new abilities to your LLMs to interact with the world and perform complex tasks.
A great job with a fantastic group: https://www.dataorchard.org.uk/analytics-engineer-vacancy
#DataScience #DataEngineering #RStats #JobFairy #FediHire @data_orchard
Slides on Legal Data Engineering
What is legal data engineering? What does legal data look like in practice in Germany? What legal problems arise in connection with legal data engineering? This presentation offers an introduction to legal data engineering and looks for answers to these questions.
Slides: https://zenodo.org/records/15575231/files/Fobbe_2025-05-28_Legal-Data-Engineering.pdf?download=1
Legal data engineering is the main focus of every legal data science project. The core of data engineering is the ETL process: extracting, transforming, and loading data. The slides give a generally accessible overview of it.
Further practical topics are the availability of legal data in Germany (in particular structured data and APIs), problems with tokenization in large language models, and the misrecognition of gene names in Microsoft Excel.
On the legal questions of legal data engineering, I cover the traditional legal situation, the new Data Use Act (Datennutzungsgesetz, DNG), and Bavaria as a negative example of a closed legal data culture. A discussion of the data protection lawsuit against OpenJur and the open data lawsuit brought by the Gesellschaft für Freiheitsrechte (GFF) against the Federal Police (Bundespolizei) sheds light on current developments in this area of law.
Tired of babysitting DIY scraping scripts that crash the moment you scale?
You’re not alone.
PromptCloud takes the pain out of large-scale data extraction with fully managed, reliable solutions — so you can focus on what really matters: insights.
The @trailofbits team cut PyPI’s test suite runtime from 163 seconds to 30, an 81% improvement. In their write-up, they detail the steps that made it possible: using pytest-xdist for parallel test execution, leveraging Python 3.12’s sys.monitoring, and removing unnecessary imports, among other changes.
https://blog.trailofbits.com/2025/05/01/making-pypis-test-suite-81-faster/
#OpenToWork | Data Engineer | ETL & Quality Control
CV (PDF): http://gabriel.chandesris.free.fr/gabysblog/docs/CVGabrielChandesris.pdf
Expert #ETL #Python #SQL #DataQuality #BigData
Ready to optimize your data pipelines! RT plz #i4emploi #Recrutement #Emploi #DataEngineering #Spark #Scala ...
OPEN SOURCE
The Problem
There have been many instances where I needed to compare two dataframes and analyze their differences. To address this need, I created a fast Python library called "data_fingerprint" that does exactly that.
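For readers who want a feel for what comparing two dataframes involves, here is a minimal sketch using plain pandas. It does not use or reproduce the data_fingerprint API; `diff_frames` and the sample `id`/`price` columns are hypothetical, and the sketch assumes both frames share the same columns and a unique key.

```python
import pandas as pd


def diff_frames(left, right, key):
    """Report keys present on one side only and columns that changed for shared keys."""
    merged = left.merge(
        right, on=key, how="outer", suffixes=("_left", "_right"), indicator=True
    )
    only_left = merged.loc[merged["_merge"] == "left_only", key].tolist()
    only_right = merged.loc[merged["_merge"] == "right_only", key].tolist()

    both = merged[merged["_merge"] == "both"]
    changed = {}
    for col in (c for c in left.columns if c != key):
        # Note: NaN != NaN, so missing values on both sides are reported as changed.
        mismatch = both[both[f"{col}_left"] != both[f"{col}_right"]]
        for k in mismatch[key].tolist():
            changed.setdefault(k, []).append(col)

    return {"only_left": only_left, "only_right": only_right, "changed": changed}


old = pd.DataFrame({"id": [1, 2, 3], "price": [10, 20, 30]})
new = pd.DataFrame({"id": [2, 3, 4], "price": [20, 35, 40]})
result = diff_frames(old, new, key="id")
print(result)
```

A dedicated library would presumably handle duplicate keys, dtype differences, and large frames more carefully than this outer-merge approach does.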
Check it out and let me know what you think!
https://github.com/SimpleSimpler/data_fingerprint
Dust.tt lets your AI agents talk to data like a pro. No more format headaches. Pure data power at your fingertips.
#DataEngineering #AI #DeveloperTools https://blog.dust.tt/spreadsheets-databases-and-beyond-creating-a-universal-ai-query-layer/
Identify NumPy Universal Functions! #pythonprogramming #dataengineering #softwareengineering #python #datascience #numpy #arrays #machinelearning #dataanalytics #coding
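A quick sketch of one way to tell whether a NumPy callable is a universal function: ufuncs are instances of `np.ufunc`. The helper name `is_ufunc` is made up for this example and is not taken from the linked video.

```python
import numpy as np


def is_ufunc(obj):
    """True when obj is a NumPy universal function (an instance of np.ufunc)."""
    return isinstance(obj, np.ufunc)


# np.add and np.sqrt are ufuncs; np.mean and np.sort are ordinary functions.
checks = {name: is_ufunc(getattr(np, name)) for name in ("add", "sqrt", "mean", "sort")}
print(checks)

# ufuncs also expose metadata such as their number of inputs and outputs.
print(np.add.nin, np.add.nout)
```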
NumPy Vector vs Loop on Array Speed #pythonprogramming #dataengineering #softwareengineering #python #datascience #numpy #arrays #machinelearning #dataanalytics #coding #dataanalysis
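To try the comparison yourself, a small timing sketch along these lines should do. The array size, the squaring workload, and the function names are arbitrary choices for illustration, and the measured times will vary by machine.

```python
import timeit

import numpy as np

values = np.arange(100_000, dtype=np.float64)


def loop_square(arr):
    """Square each element with an explicit Python loop."""
    out = np.empty_like(arr)
    for i in range(arr.size):
        out[i] = arr[i] * arr[i]
    return out


def vector_square(arr):
    """Square every element with one vectorized expression."""
    return arr * arr


loop_s = timeit.timeit(lambda: loop_square(values), number=3)
vector_s = timeit.timeit(lambda: vector_square(values), number=3)
print(f"loop:   {loop_s:.4f} s for 3 runs")
print(f"vector: {vector_s:.4f} s for 3 runs")
```

Both functions return the same result; the vectorized version avoids per-element Python overhead by running the loop in compiled code.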
NumPy Array Copy vs View in Python #softwareengineering #dataengineering #datascience #dataanalytics #pythonprogramming #arrays #python #numpy #coding
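A short sketch of the copy-versus-view distinction: basic slicing returns a view that shares memory with the original array, while `.copy()` and integer-list ("fancy") indexing return independent data. The variable names are illustrative only.

```python
import numpy as np

original = np.arange(6)

view = original[1:4]            # basic slice: a view onto the same buffer
copy = original[1:4].copy()     # explicit copy: owns its own buffer
fancy = original[[1, 2, 3]]     # fancy indexing: also a copy

view[0] = 100                   # writes through to original[1]
copy[1] = -1                    # leaves original untouched

print(original)                             # original[1] is now 100
print(np.shares_memory(original, view))     # view shares the buffer
print(np.shares_memory(original, copy))     # copy does not
print(view.base is original, copy.base is None)
```

Checking `.base` or calling `np.shares_memory` is a handy way to confirm which one you have before mutating an array in place.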