DevOps & SRE notes

Kubernetes Goat is a "Vulnerable by Design" cluster environment to learn and practice Kubernetes security using an interactive hands-on playground 🚀

https://github.com/madhuakula/kubernetes-goat

🔥7👍3

2.63K views · tutunak, 08:50

DevOps & SRE notes

The article features an interview with Landon Clipp, who built a multi-tenant, GPU-based containers-as-a-service (CaaS) platform. Topics covered:
- Bypassing the NVIDIA GPU Operator
- Why gVisor Fails for GPUs
- VM Boot Delays
- Firmware and Memory Security
- Ideal Workload

https://kube.fm/gpu-containers-as-a-service-landon

KubeFM

GPU Containers as a Service | KubeFM

👍6

2.68K views · tutunak, 06:38

DevOps & SRE notes

Bulk port forwarding Kubernetes services for local development.
https://github.com/txn2/kubefwd

👍5

2.59K views · tutunak, 07:46

DevOps & SRE notes

It's time to update your kernel. An unprivileged local user can write 4 controlled bytes into the page cache of any readable file on a Linux system and use that to gain root. https://copy.fail/

The Dirty Frag vulnerability class, first discovered and reported by Hyunwoo Kim (@v4bel), can be used to obtain root privileges on major Linux distributions by chaining the xfrm-ESP Page-Cache Write vulnerability and the RxRPC Page-Cache Write vulnerability.

https://github.com/V4bel/dirtyfrag

X (formerly Twitter)

V4bel (@v4bel) on X

Independent Vuln. Researcher / Pwn2Own Berlin 2025, 2026 / Google kernelCTF 0-day / Pwnie Awards 2025

🔥4❤2

2.7K views · tutunak, edited 09:35

DevOps & SRE notes

The article explains that while Kubernetes excels at scheduling and isolating workloads, it lacks the context to secure Large Language Models (LLMs), which process untrusted natural language inputs. Highlighting four key risks from the OWASP Top 10 for LLMs, the author argues that security controls shouldn't live within the model runtime (like Ollama). Instead, organizations need a dedicated, LLM-aware policy layer (such as LiteLLM, Kong AI Gateway, or Portkey) in front of the model to enforce validation, filtering, and authorization.
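
To make the "policy layer in front of the model" idea concrete, here is a minimal sketch of such a gate in plain Python. It is not the API of LiteLLM, Kong AI Gateway, or Portkey, and it is not taken from the article. The role table, model names, size limit, and regex patterns are all hypothetical values chosen for illustration, and `backend` stands in for whatever actually calls the model.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical policy values, chosen only for illustration.
MAX_PROMPT_CHARS = 4000
ALLOWED_MODELS = {
    "analyst": {"small-model"},
    "admin": {"small-model", "large-model"},
}
INJECTION_PATTERNS = [re.compile(r"ignore (all )?previous instructions", re.I)]
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]{16,}")


class PolicyError(Exception):
    """Raised when a request is rejected before it reaches the model."""


@dataclass
class Request:
    user_role: str
    model: str
    prompt: str


def guarded_call(req: Request, backend: Callable[[str, str], str]) -> str:
    """Apply authorization, validation, and filtering around a model call."""
    # Authorization: is this role allowed to use this model at all?
    if req.model not in ALLOWED_MODELS.get(req.user_role, set()):
        raise PolicyError("role is not allowed to use this model")
    # Validation: reject empty or oversized prompts.
    if not req.prompt.strip() or len(req.prompt) > MAX_PROMPT_CHARS:
        raise PolicyError("prompt is empty or too long")
    # Input filtering: a naive pattern check, easy to bypass in practice.
    if any(p.search(req.prompt) for p in INJECTION_PATTERNS):
        raise PolicyError("prompt matched a blocked pattern")
    # Output filtering: redact anything that looks like a secret.
    answer = backend(req.model, req.prompt)
    return SECRET_PATTERN.sub("[REDACTED]", answer)
```

The point of the sketch is placement: every check runs outside the model runtime, so the same rules apply regardless of which model serves the request. Real gateways use far stronger detection than a couple of regular expressions.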

https://www.cncf.io/blog/2026/03/30/llms-on-kubernetes-part-1-understanding-the-threat-model/

CNCF

LLMs on Kubernetes Part 1: Understanding the threat model

Let’s say you’ve got an LLM running on Kubernetes. Pods are healthy, logs are clean, users are chatting. Everything looks fine. But here’s the thing: Kubernetes is great at scheduling workloads and…

❤4👍4

2.88K views · tutunak, 06:50

DevOps & SRE notes

Uber engineered an automated approach to migrate its massive Java monorepo (over 600,000 tests, 15 million lines of code) from the deprecated JUnit 4 to JUnit 5. Facing challenges like the lack of native JUnit 5 support in their Bazel build system and custom test configurations, they successfully migrated over 75,000 test classes and 1.25 million lines of code in just four months without disrupting developer workflows.
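
For a feel of what the mechanical part of such a migration looks like, here is a toy rewrite script. It is not Uber's tooling (the post does not show their implementation), and it only covers a handful of well-known JUnit 4 to JUnit 5 renames. The function name `migrate` and the regex approach are my own assumptions. Real migrations also have to deal with rules, runners, and assertion argument order.

```python
import re

# Well-known JUnit 4 -> JUnit 5 (Jupiter) renames; deliberately incomplete.
CLASS_RENAMES = {
    "Test": "Test",
    "Before": "BeforeEach",
    "After": "AfterEach",
    "BeforeClass": "BeforeAll",
    "AfterClass": "AfterAll",
    "Ignore": "Disabled",
}

# Matches only top-level JUnit 4 imports such as "import org.junit.Before;".
IMPORT_RE = re.compile(r"^import org\.junit\.(\w+);", re.M)
# Longer names come first; \b keeps "@Before" from matching "@BeforeEach".
ANNOTATION_RE = re.compile(r"@(BeforeClass|AfterClass|Before|After|Ignore)\b")


def migrate(source: str) -> str:
    """Rewrite simple JUnit 4 imports and annotations to JUnit 5 names."""

    def fix_import(match: re.Match) -> str:
        name = match.group(1)
        if name not in CLASS_RENAMES:
            # Unknown imports (Assert, Rule, ...) are left for manual review.
            return match.group(0)
        return f"import org.junit.jupiter.api.{CLASS_RENAMES[name]};"

    source = IMPORT_RE.sub(fix_import, source)
    return ANNOTATION_RE.sub(lambda m: "@" + CLASS_RENAMES[m.group(1)], source)
```

Running `migrate` twice gives the same result as running it once, which matters when a tool is applied repeatedly across a large repository.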

https://www.uber.com/us/en/blog/junit-migration/

🔥7

2.79K views · tutunak, 08:03

DevOps & SRE notes

Claude Code gave me three "tickets" for a free week. ~~You can grab them using this link:~~ ~~https://claude.ai/referral/NXtyf-cgbQ~~

Claude

Join Claude!

You've been invited to try Claude

❤6👍1👎1

2.71K views · tutunak, edited 11:18

DevOps & SRE notes

The observability market is shifting from volume-based data ingestion to a value-driven model due to the unsustainable costs of scaling cloud-native and AI workloads. Driven by innovations like Chronosphere’s "Logs 2.0" and its subsequent acquisition by Palo Alto Networks, the industry is prioritizing "signal discipline"—retaining only actionable telemetry—and integrating observability directly into broader AI and security platforms.

https://siliconangle.com/2026/02/05/observability-cost-ai-scale-chronosphere-opensourcesummit/

SiliconANGLE

Cloud-native observability enters a new phase as the market pivots from volume to value

Observability is entering a new phase. As cloud-native architectures scale and AI workloads intensify, enterprises are being forced to rethink how they collect, manage and pay for telemetry data — a

❤3👍3

2.86K views · tutunak, 06:34

DevOps & SRE notes

A popular & widely deployed Open Source Container Native Storage platform for Stateful Persistent Applications on Kubernetes.

https://github.com/openebs/openebs

👍3

2.48K views · tutunak, 07:59

DevOps & SRE notes

Managing expenses in the cloud requires a strategic approach beyond just looking at bills. A senior engineer shares valuable insight into optimizing costs effectively in this detailed read.
https://medium.com/@razkevich8/cloud-cost-optimization-a-senior-engineers-guide-d49ed4606de1

Medium

Cloud Cost Optimization: A Senior Engineer’s Guide

👍3❤1

2.51K views · tutunak, 06:11

DevOps & SRE notes

This post details a method for securing Grafana on Kubernetes with Google Cloud Identity-Aware Proxy (GCP IAP), wiring the two together through Gateway API and Terraform for tighter access control.
https://www.vidbregar.com/blog/grafana-gcp-iap

Vid Bregar

Securing Grafana on Kubernetes with GCP IAP, Gateway API, and Terraform

Follow a step-by-step guide to secure your Grafana deployment on Kubernetes using Google Cloud Identity-Aware Proxy (GCP IAP), Gateway API, and Terraform. This setup helps mitigate CVE risks, enables granular access control, protects against DDoS attacks…

👍3

2.78K views · tutunak, 07:59

DevOps & SRE notes

kro | Kube Resource Orchestrator
https://github.com/kubernetes-sigs/kro

👍3🤣1

2.37K views · tutunak, 07:59

DevOps & SRE notes

Many organizations are looking for more efficient logging solutions than the traditional stack. This comparison highlights a modern alternative to ELK that aims to reduce complexity and resource usage.
https://osuite.io/articles/modern-alternative-to-elk

osuite.io

ELK alternative: Modern log management setup with Opentelemetry and Opensearch

Full stack observability designed for scale

👍2

2.37K views · tutunak, 09:36

DevOps & SRE notes

The article details how to implement production-grade distributed tracing for complex multi-agent AI workflows using OpenTelemetry.
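
What ties the spans of a multi-agent workflow together is context propagation: every hop keeps the same trace ID and mints a new span ID. The sketch below shows that with the W3C `traceparent` header format using only the standard library. It is not the article's code, and in a real setup the OpenTelemetry SDK propagators do this for you. The helper names `new_traceparent` and `child_traceparent` are hypothetical.

```python
import re
import secrets

# W3C Trace Context, version 00: "00-<32 hex trace id>-<16 hex span id>-<2 hex flags>"
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent() -> str:
    """Start a new trace, e.g. when the routing agent receives a user request."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex characters
    return f"00-{trace_id}-{span_id}-01"  # flags 01 = sampled


def child_traceparent(parent: str) -> str:
    """Derive the header a downstream agent or tool call should carry.

    The trace ID and flags are inherited and only the span ID changes.
    A malformed parent header starts a fresh trace instead of failing.
    """
    match = TRACEPARENT_RE.match(parent)
    if match is None:
        return new_traceparent()
    trace_id, _parent_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"
```

A routing agent would call `new_traceparent()` once per request and pass `child_traceparent(header)` along with each call to a specialist agent or MCP server, so the backend can stitch all spans into a single trace.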

https://developers.redhat.com/articles/2026/04/06/distributed-tracing-agentic-workflows-opentelemetry#

Red Hat Developer

Distributed tracing for agentic workflows with OpenTelemetry | Red Hat Developer

Agentic applications often involve complex interactions between routing agents, specialist agents, knowledge bases, Model Context Protocol (MCP) servers, and external systems. This complexity makes

👍4❤1

1.81K views · tutunak, 08:01

DevOps & SRE notes

Networking within container orchestration can often seem like a black box to developers. This explanation aims to demystify Kubernetes CNI providers and how they manage connectivity.
https://medium.com/@csinclair11/demystifying-kubernetes-cni-providers-5ed79569c797

Medium

Demystifying Kubernetes CNI Providers

Computer networks have changed. It makes sense, computing platforms have been changing for several years now. From the old days of beefy…

❤4👍1

1.56K views · tutunak, 08:10

DevOps & SRE notes

I found a good example of why autoscaling based only on CPU utilization can cause an outage.

About a week ago, Twingate had an incident that affected us as a customer. They've published a postmortem, and it shows why CPU alone isn't a good metric to rely on when autoscaling your services. From the postmortem:

"The incident was triggered by elevated network latency affecting communication paths used by the Authorization service. As requests took longer to complete, individual service instances were able to process fewer requests than normal."

"This reduction in throughput exposed a limitation in our auto-scaling configuration, which primarily relied on CPU utilization to determine service capacity requirements."

So from the CPU utilization perspective everything looked fine, but the number of processed requests decreased.
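
A back-of-the-envelope model of that effect, assuming an instance with a fixed pool of workers where each request burns a little CPU and spends the rest of its time waiting on the network. The worker count, CPU time, and wait times are made-up numbers for illustration, not Twingate's.

```python
def instance_stats(workers: int, cpu_ms: float, wait_ms: float) -> tuple[float, float]:
    """Return (max requests per second, CPU utilization as a fraction of one core).

    Models a saturated instance: every worker handles one request at a time,
    spending cpu_ms on the CPU and wait_ms blocked on downstream calls.
    """
    latency_ms = cpu_ms + wait_ms
    throughput = workers * 1000.0 / latency_ms  # requests per second
    cpu_util = throughput * cpu_ms / 1000.0     # CPU-seconds used per second
    return throughput, cpu_util


def compare(workers: int, cpu_ms: float, normal_wait_ms: float, slow_wait_ms: float) -> dict:
    """Compare capacity and CPU usage before and after a latency increase."""
    normal_rps, normal_cpu = instance_stats(workers, cpu_ms, normal_wait_ms)
    slow_rps, slow_cpu = instance_stats(workers, cpu_ms, slow_wait_ms)
    return {
        "normal_rps": normal_rps,
        "normal_cpu": normal_cpu,
        "slow_rps": slow_rps,
        "slow_cpu": slow_cpu,
    }
```

With 20 workers, 2 ms of CPU per request, and the network wait growing from 48 ms to 198 ms, capacity drops from about 400 to about 100 requests per second while CPU utilization falls from roughly 80% to 20%. A CPU-based autoscaler sees no reason to add instances exactly when more are needed, which is why request rate, in-flight requests, or latency are safer scaling signals for I/O-bound services.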

https://status.twingate.com/incidents/49qvqk7swjpq

Twingate

Twingate Service Incident

Twingate's Status Page - Twingate Service Incident.

👍6🔥2

1.54K views · tutunak, 13:02

DevOps & SRE notes

Forwarded from AI Vibe Notes

kagent runs your agents where your workloads already live — on Kubernetes. Deploy, observe, and govern AI agents with the tools your platform team already trusts. Open source. Production grade. Built by the founders of Istio.

https://github.com/kagent-dev/kagent

GitHub

GitHub - kagent-dev/kagent: Cloud Native Agentic AI | Discord: https://bit.ly/kagentdiscord

Cloud Native Agentic AI | Discord: https://bit.ly/kagentdiscord - kagent-dev/kagent

👍4❤2

1.41K views · tutunak, 09:32

DevOps & SRE notes

When you have special math to calculate your uptime, you always have 100%.

🤣6👏3😱1

1.18K views · tutunak, 08:33

DevOps & SRE notes

The article covers the DNSTracking feature in the Red Hat network observability operator 1.11, which now captures DNS query names directly via eBPF without additional configuration.

https://developers.redhat.com/articles/2026/04/09/how-dns-name-tracking-enhances-network-observability#

Red Hat Developer

How DNS name tracking enhances network observability | Red Hat Developer

Network observability has long had a feature that reports the DNS latencies and response codes for the DNS resolutions in your Kubernetes cluster

👍4

984 views · tutunak, 07:59

DevOps & SRE notes

CLI tool for linting and testing Helm charts
https://github.com/helm/chart-testing

👍5🔥2❤1

630 views · tutunak, 08:03
