
Kubernetes curly: a deployment with autoscaling, where each pod depends on, and occasionally writes to, an external database.
To minimise database reads, an in-memory cache is implemented in the application.

However, when a pod writes to the database it should invalidate that key in the cache for all pods.
This works fine for the local cache, but how to distribute that invalidation?

I suppose we could use a statefulset and then hit the service for each other running pod but that seems... messy.
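One way to do that without a statefulset, sketched below purely for illustration: expose the pods through a headless Service so peers can be enumerated via DNS, and have the writing pod POST the invalidation to each of them. The service name, port, and /invalidate endpoint are assumptions for the sketch, not anything from this thread.

```python
import socket
import urllib.request

# Hypothetical headless Service fronting the deployment's pods.
HEADLESS_SVC = "myapp-headless.default.svc.cluster.local"
PORT = 8080

def invalidate_everywhere(key: str) -> None:
    """Best-effort: tell every peer pod to drop `key` from its local cache."""
    # A headless Service resolves to the IPs of all ready pods behind it.
    addrs = {
        info[4][0]
        for info in socket.getaddrinfo(HEADLESS_SVC, PORT, proto=socket.IPPROTO_TCP)
    }
    for ip in addrs:
        try:
            # Assumes each pod exposes a hypothetical POST /invalidate endpoint.
            req = urllib.request.Request(
                f"http://{ip}:{PORT}/invalidate?key={key}", method="POST"
            )
            urllib.request.urlopen(req, timeout=1)
        except OSError:
            pass  # peer may be restarting; it will repopulate from the DB anyway
```

The other common options are a shared pub/sub channel (Redis or similar) that every pod subscribes to for invalidation events, or simply accepting a short TTL on the local cache.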

Replied in thread

@phil well, it's definitely not what you want, but it does what you need; have you tried minikube?

#k8s gives you the ability to cut a deployment that either works and spins down your previously running code, or fails and uh, doesn't.

So, having a home #k8s cluster, something about Postgres HA has been bugging me a lot. When there's an electric blackout (this is #Spain after all), all the pods go down. But what happens with Postgres is that the replicas go into a process to sync and elect a new master, and this takes time.

Meanwhile, pgpool will happily hand out database connections to the apps in the pods, but they're read-only.

What happens with an app like #Matrix #Synapse is that, I think, it gets database connections in a pool at start-up, and since that succeeds, it just continues. However, when it actually tries to make updates and inserts, it gets errors, but it only logs them; they aren't fatal. Or it would log them, except those logs are off by default for privacy and security reasons.

The initial read-only database connections are never upgraded to read-write, because the application doesn't expect this kind of failure, even after the new master is chosen.

Meanwhile the Matrix server continues in a highly degraded mode, unable to persist sent messages. It can only relay them to currently connected online clients. This leads to users getting diverging views of the messages in channels.

I solved this by adding an initContainer that checks for a read-write connection to Postgres before the Synapse pod starts up, but it's a hack.
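For illustration only, a minimal sketch of what that kind of initContainer check might look like, assuming psycopg2 is available in the init image and the usual PGHOST/PGUSER/PGPASSWORD/PGDATABASE environment variables are set (the 5-second retry is arbitrary):

```python
import sys
import time

import psycopg2

# Block pod start-up until Postgres (or pgpool) hands out a read-write connection.
while True:
    try:
        # Empty DSN: libpq falls back to PGHOST, PGUSER, PGPASSWORD, PGDATABASE, etc.
        conn = psycopg2.connect("")
        with conn.cursor() as cur:
            cur.execute("SHOW transaction_read_only")
            read_only = cur.fetchone()[0]
        conn.close()
        if read_only == "off":
            print("Postgres is read-write, letting Synapse start")
            sys.exit(0)
        print("Postgres is read-only, waiting for failover to finish...")
    except psycopg2.OperationalError as exc:
        print(f"Postgres not reachable yet: {exc}")
    time.sleep(5)
```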

So, I've been using Thanos to receive and store my Prometheus metrics long term in a self-hosted S3 bucket. Thanos also acts as a datasource for my dashboards in Grafana, and provides a Ruler, which evaluates alerting rules and forwards them to my alertmanager. It's OK. It's certainly got its downsides, which I can go into later, but I've been thinking... what about Mimir?

How do you all feel about Grafana's Mimir (source on GitHub)? It's AGPL and seems to literally be a replacement for Thanos, which is Apache 2.0.

Thanos description from their website:

Open source, highly available Prometheus setup with long term storage capabilities.

Mimir description from their website:

...open source software project that provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus and OpenTelemetry metrics.

Both work with Alloy and Prometheus alike. Both require you to configure initially confusing hashrings and replication parameters. Both have a bunch of large companies adopting them, so... now I feel conflicted. Should I try Mimir? Poll in reply.


Man, Prometheus is a pain to recover once its data store is in any way out of shape. Did NOT help that it was buried inside Kubernetes inside a PVC.

Thankfully it was only the Dev environment today, but if this ever pages on Prod, we're losing data as it stands.

I'll write something up for a runbook, but eesh.