<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:webfeeds="http://webfeeds.org/rss/1.0" xmlns:media="http://search.yahoo.com/mrss/" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>Joseph Velliah</title>
    <description>Learn best practices, news, tips, scenarios and code samples about Cloud Computing, Security, Kubernetes, DevOps, IaC, Microsoft 365, Azure, AWS and SharePoint.</description>
    <link>https://blog.josephvelliah.com/</link>
    <image>
      <url>https://blog.josephvelliah.com/assets/images/joseph.jpg</url>
      <title>Joseph Velliah</title>
      <link>https://blog.josephvelliah.com/</link>
    </image>
    <atom:link href="https://blog.josephvelliah.com/feed.xml" rel="self" type="application/rss+xml"/>
    <pubDate>Tue, 31 Mar 2026 00:04:41 +0000</pubDate>
    <lastBuildDate>Tue, 31 Mar 2026 00:04:41 +0000</lastBuildDate>
    <generator>Jekyll v4.4.1</generator>
    <webfeeds:analytics id="G-B1XQNQTJXT" engine="GoogleAnalytics"/>
    <ttl>60</ttl>
    
      <item>
        <title>Building a Rust gRPC AI Security Gateway for LLM Traffic</title>
        <description>&lt;p&gt;I wanted a &lt;strong&gt;small, honest implementation&lt;/strong&gt; of the GenAI governance &lt;em&gt;shape&lt;/em&gt; in code: a component in the path of &lt;strong&gt;every LLM call&lt;/strong&gt; that applies policy first, optionally scrubs prompts and responses, and emits metrics—without pretending to be enterprise inline inspection. This repo is a &lt;strong&gt;Rust gRPC MVP&lt;/strong&gt; with a keyword blocklist, rate limits, regex redaction, Prometheus counters, and pluggable providers (OpenAI, Anthropic, mock).&lt;/p&gt;

&lt;p&gt;Industry collateral often uses the same vocabulary—visibility, inline policy, sensitive data in prompts and answers—for example &lt;a href=&quot;https://www.zscaler.com/products-and-solutions/securing-generative-ai&quot;&gt;Zscaler on securing generative AI&lt;/a&gt; and &lt;a href=&quot;https://www.zscaler.com/products-and-solutions/ai-guardrails&quot;&gt;AI Guardrails&lt;/a&gt;. &lt;em&gt;No affiliation with Zscaler; not an endorsement or a capability comparison.&lt;/em&gt;&lt;/p&gt;

&lt;h2 id=&quot;problem&quot;&gt;Problem&lt;/h2&gt;

&lt;p&gt;When many clients talk straight to a provider API, you get recurring failure modes: &lt;strong&gt;no single choke point&lt;/strong&gt; for policy, &lt;strong&gt;accidental or careless PII&lt;/strong&gt; in prompts or model output, &lt;strong&gt;abuse and cost spikes&lt;/strong&gt;, and &lt;strong&gt;weak signals&lt;/strong&gt; for operators who need to know what was allowed, blocked, or altered.&lt;/p&gt;

&lt;p&gt;A gateway in front of the provider gives you that choke point: enforce rules before the model runs, redact or block on the way in and out, and emit metrics so you are not flying blind.&lt;/p&gt;

&lt;p&gt;This repo implements that as an MVP in Rust (Tokio, gRPC/&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tonic&lt;/code&gt;, pluggable backends): keyword blocklist, fixed-window rate limit, regex redaction—not a full DLP catalog or ML classifiers, but the same narrative line: &lt;strong&gt;inspect and govern the path&lt;/strong&gt;, then call the model.&lt;/p&gt;

&lt;h2 id=&quot;why-rust-and-grpc-for-this-kind-of-gateway&quot;&gt;Why Rust and gRPC for this kind of gateway&lt;/h2&gt;

&lt;p&gt;The gateway sits &lt;strong&gt;inline&lt;/strong&gt;: if that hop jitters, people stop trusting “govern every call.” &lt;strong&gt;Rust&lt;/strong&gt; buys predictable latency in the enforcement path—no GC pauses while you scan and rewrite text—and memory safety while doing it. &lt;strong&gt;gRPC with Protobuf&lt;/strong&gt; gives a &lt;strong&gt;versioned&lt;/strong&gt; request/response contract (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SecureCompletionRequest&lt;/code&gt; / &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SecureCompletionResponse&lt;/code&gt;), compact wire encoding, and generated server stubs so callers share one schema instead of ad hoc JSON that drifts quietly as fields change. The same surface extends cleanly to &lt;strong&gt;server streaming&lt;/strong&gt; when you want token-by-token replies without a bespoke HTTP contract per client.&lt;/p&gt;
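&lt;p&gt;To make that contract concrete without quoting the generated Rust, here is a rough Python mirror of the messages named above (field and enum shapes follow this post’s description; treat them as illustrative, not the repo’s actual proto):&lt;/p&gt;

```python
from dataclasses import dataclass
from enum import Enum

class SecureCompletionDecision(Enum):
    ALLOWED = 0
    BLOCKED = 1

@dataclass
class SecureCompletionRequest:
    user_id: str
    prompt: str

@dataclass
class SecureCompletionResponse:
    decision: SecureCompletionDecision
    response_text: str
    reason: str

# A blocked call never reaches the provider; the verdict and reason
# travel back in the same typed envelope as an allowed completion.
resp = SecureCompletionResponse(
    decision=SecureCompletionDecision.BLOCKED,
    response_text="",
    reason="banned keyword",
)
```

&lt;p&gt;The point of the versioned schema is exactly this: every caller shares one definition of those fields instead of re-deriving them from ad hoc JSON.&lt;/p&gt;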

&lt;h2 id=&quot;architecture&quot;&gt;Architecture&lt;/h2&gt;

&lt;p&gt;The one-liner version:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Client → gRPC Gateway (Rust) → Policy Pipeline → Pluggable LLM Provider → Response + Metrics
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The diagram below is &lt;strong&gt;inspired by the “at a glance” story&lt;/strong&gt; common in GenAI security collateral (visibility, inline control, data-in-motion)—for example &lt;a href=&quot;https://www.zscaler.com/resources/data-sheets/zscaler-gen-ai-security-at-a-glance.pdf&quot;&gt;Zscaler’s Gen AI Security at-a-glance PDF&lt;/a&gt;—but redrawn for &lt;strong&gt;this open-source MVP only&lt;/strong&gt;. It is not a depiction of Zscaler’s product or deployment model.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2026/03/rust-grpc-ai-security-gateway.png&quot; alt=&quot;Architecture diagram&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The three stacked stages echo the &lt;strong&gt;access · data · visibility&lt;/strong&gt; framing used in GenAI security “at a glance” sheets: one inline choke point, with &lt;strong&gt;metrics&lt;/strong&gt; as the separate HTTP scrape surface (port 8080), not an extra hop on the gRPC path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Components:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Gateway:&lt;/strong&gt; Tokio async server. gRPC on port 50051 for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SecureCompletion&lt;/code&gt;, HTTP on 8080 for Prometheus &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/metrics&lt;/code&gt; only.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Policy before the model:&lt;/strong&gt; Keyword blocklist and per-user rate limit run inside &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PolicyEngine&lt;/code&gt; before any LLM call.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Redaction:&lt;/strong&gt; Regex-based scrubbing on the prompt and/or response in the gRPC handler when enabled (not part of the allow/block decision).&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Providers:&lt;/strong&gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ChatCompletionProvider&lt;/code&gt; trait. Swap via &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LLM_PROVIDER&lt;/code&gt; env: OpenAI, Anthropic, or in-process &lt;strong&gt;mock&lt;/strong&gt; (no HTTP; see README for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ghz&lt;/code&gt; benchmarks).&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Observability:&lt;/strong&gt; Prometheus metrics (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gateway_total_requests&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gateway_blocked_requests&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gateway_allowed_requests&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gateway_provider_errors_total&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gateway_request_latency_seconds&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gateway_tokens_used_total&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;
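&lt;p&gt;The ordering in that list is the load-bearing part: policy decides before any provider call, and redaction wraps the call on both sides. A minimal Python sketch of that handler shape (names are illustrative, not the repo’s Rust API):&lt;/p&gt;

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact(text):
    # Regex scrub; mirrors the gateway's email pattern in spirit.
    return EMAIL.sub("[REDACTED]", text)

def handle(request, policy, provider, metrics):
    metrics["total"] += 1
    block_reason = policy(request["user_id"], request["prompt"])
    if block_reason is not None:        # policy runs before the model
        metrics["blocked"] += 1
        return {"decision": "BLOCKED", "text": "", "reason": block_reason}
    answer = provider(redact(request["prompt"]))     # scrub inbound
    metrics["allowed"] += 1
    return {"decision": "ALLOWED", "text": redact(answer), "reason": ""}

metrics = {"total": 0, "blocked": 0, "allowed": 0}
out = handle(
    {"user_id": "user-1", "prompt": "contact me at a@b.io"},
    policy=lambda uid, p: None,         # allow everything
    provider=lambda p: "echo: " + p,    # in-process mock, no HTTP
    metrics=metrics,
)
```

&lt;p&gt;Blocked traffic never touches the provider, and the counters increment on every path, which is what makes the Prometheus surface trustworthy.&lt;/p&gt;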

&lt;h2 id=&quot;implementation&quot;&gt;Implementation&lt;/h2&gt;

&lt;h3 id=&quot;async-and-grpc&quot;&gt;Async and gRPC&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Tokio&lt;/strong&gt; for async runtime. gRPC server uses &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tonic&lt;/code&gt;; HTTP uses &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;axum&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Protobuf&lt;/strong&gt; defines &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SecureCompletionRequest&lt;/code&gt; (user_id, prompt) and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SecureCompletionResponse&lt;/code&gt; with &lt;strong&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SecureCompletionDecision&lt;/code&gt;&lt;/strong&gt; enum (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ALLOWED&lt;/code&gt; / &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BLOCKED&lt;/code&gt;), plus response text and reason. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tonic-build&lt;/code&gt; compiles proto to Rust at build time.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Dual servers:&lt;/strong&gt; gRPC and HTTP run on separate ports. HTTP serves Prometheus scrape only; chat flows through gRPC only.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;policies&quot;&gt;Policies&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Keywords:&lt;/strong&gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BANNED_KEYWORDS&lt;/code&gt; env (comma-separated). Case-insensitive match; blocks before LLM call.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Rate limit:&lt;/strong&gt; In-memory &lt;strong&gt;fixed window&lt;/strong&gt; per &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;user_id&lt;/code&gt; (counter resets after the window elapses). Configurable via &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RATE_LIMIT_REQUESTS&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RATE_LIMIT_WINDOW_SECS&lt;/code&gt;; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RATE_LIMIT_MAX_TRACKED_USERS&lt;/code&gt; caps how many distinct IDs are tracked (eviction when full).&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Redaction:&lt;/strong&gt; Regex-based. Built-in patterns for email, API keys, SSN, credit cards, private IPs. Custom patterns via &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REDACT_CUSTOM_PATTERNS&lt;/code&gt; (JSON file path); each custom rule’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;id&lt;/code&gt; must also appear in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;REDACT_PATTERNS&lt;/code&gt; to run. Runs on prompt (before LLM) and response (before client).&lt;/li&gt;
&lt;/ul&gt;
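&lt;p&gt;The fixed-window limiter is simple enough to sketch end to end. This is a Python illustration of the idea only (the repo implements it in Rust; the env variables above drive the two constructor arguments):&lt;/p&gt;

```python
import time

class FixedWindowLimiter:
    """Per-user fixed window: at most `limit` requests per window;
    the counter resets once the window elapses (RATE_LIMIT_REQUESTS /
    RATE_LIMIT_WINDOW_SECS in the gateway's config)."""

    def __init__(self, limit, window_secs, clock=time.monotonic):
        self.limit = limit
        self.window = window_secs
        self.clock = clock
        self.state = {}  # user_id: (window_start, count)

    def allow(self, user_id):
        now = self.clock()
        start, count = self.state.get(user_id, (now, 0))
        if now - start >= self.window:   # window elapsed: reset
            start, count = now, 0
        if count >= self.limit:
            return False
        self.state[user_id] = (start, count + 1)
        return True

t = [0.0]  # fake clock so the behavior is deterministic
limiter = FixedWindowLimiter(limit=2, window_secs=60, clock=lambda: t[0])
```

&lt;p&gt;With &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;limit=2&lt;/code&gt;, the first two calls for a user pass, the third is rejected, and the counter resets once the window rolls over; the classic fixed-window caveat is a burst straddling two windows.&lt;/p&gt;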

&lt;h3 id=&quot;pluggable-providers&quot;&gt;Pluggable Providers&lt;/h3&gt;

&lt;p&gt;Each provider implements &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ChatCompletionProvider&lt;/code&gt;. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;from_config()&lt;/code&gt; reads &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LLM_PROVIDER&lt;/code&gt; and instantiates OpenAI, Anthropic, or an in-process &lt;strong&gt;mock&lt;/strong&gt; that returns a fixed string with no HTTP—useful for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ghz&lt;/code&gt; runs that isolate gRPC, policy, and redaction from real LLM latency (see README &lt;em&gt;Gateway-only benchmark&lt;/em&gt;).&lt;/p&gt;
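&lt;p&gt;The trait boundary translates almost one to one into any language. A hedged Python analogue of the provider switch (method and branch names are illustrative; only the mock branch is filled in here):&lt;/p&gt;

```python
import os

class MockProvider:
    """In-process stand-in: fixed reply, no HTTP, so load tests
    measure the gateway path rather than provider latency."""
    def complete(self, prompt):
        return "mock response"

class OpenAIProvider:
    def complete(self, prompt):
        raise NotImplementedError("real HTTP call lives here")

def from_config(env=os.environ):
    # Mirrors the LLM_PROVIDER switch described above; an Anthropic
    # branch would follow the same pattern.
    name = env.get("LLM_PROVIDER", "mock")
    if name == "openai":
        return OpenAIProvider()
    return MockProvider()

provider = from_config({"LLM_PROVIDER": "mock"})
```

&lt;p&gt;Adding a backend is a new type plus a branch in the factory; the gRPC handler never changes.&lt;/p&gt;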

&lt;h2 id=&quot;running-it&quot;&gt;Running It&lt;/h2&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nb&quot;&gt;export &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;sk-...
cargo run
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;grpcurl &lt;span class=&quot;nt&quot;&gt;-plaintext&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-import-path&lt;/span&gt; proto &lt;span class=&quot;nt&quot;&gt;-proto&lt;/span&gt; ai_security_gateway.proto &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;-d&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;{&quot;user_id&quot;:&quot;user-1&quot;,&quot;prompt&quot;:&quot;Say hello&quot;}&apos;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  localhost:50051 ai_security.AiSecurityGateway/SecureCompletion
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Docker (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Dockerfile&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker-compose.yml&lt;/code&gt;) and Kubernetes manifests (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;k8s/&lt;/code&gt;) support local images and kind/minikube-style deploys. For cluster runs, the README covers loading the image; creating the API key &lt;strong&gt;Secret&lt;/strong&gt; before pods start when using OpenAI or Anthropic (otherwise the container exits with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OPENAI_API_KEY must be set&lt;/code&gt;); a &lt;strong&gt;rollout restart&lt;/strong&gt; after ConfigMap or Secret changes; and port-forward smoke tests with &lt;strong&gt;grpcurl&lt;/strong&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/metrics&lt;/code&gt;.&lt;/p&gt;

&lt;h3 id=&quot;gateway-only-load-check&quot;&gt;Gateway-only load check&lt;/h3&gt;

&lt;p&gt;With &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LLM_PROVIDER=mock&lt;/code&gt; and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ghz&lt;/code&gt; commands in the README, you can stress &lt;strong&gt;gRPC + policy + redaction&lt;/strong&gt; without spending tokens. Latency and RPS depend on your machine and concurrency; turn keyword checks and redaction back on when you want those paths included. For long runs with a single &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;user_id&lt;/code&gt;, raise &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RATE_LIMIT_REQUESTS&lt;/code&gt; and clear &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BANNED_KEYWORDS&lt;/code&gt; as the README describes so you are measuring the stack, not the default rate limit.&lt;/p&gt;

&lt;h2 id=&quot;what-worked&quot;&gt;What worked&lt;/h2&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Separate ports for gRPC and HTTP:&lt;/strong&gt; Prometheus &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/metrics&lt;/code&gt; on HTTP; chat only on gRPC. No gRPC-Web or transcoding in this MVP.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Keywords and rate limits before the LLM call.&lt;/strong&gt; Redaction on allowed traffic mirrors the “sensitive data in prompts &lt;em&gt;and&lt;/em&gt; answers” theme—bidirectional scrub, separate from the allow/block decision.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Trait-based providers:&lt;/strong&gt; new backend = new type + &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;from_config()&lt;/code&gt; branch.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;In-memory rate limit&lt;/strong&gt; is enough for one replica; multiple replicas need a shared store (e.g. Redis).&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;next-steps&quot;&gt;Next steps&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Streaming:&lt;/strong&gt; gRPC server-streaming for token-by-token responses.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Distributed rate limiting:&lt;/strong&gt; Redis-backed for horizontal scaling.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;More backends:&lt;/strong&gt; Vertex AI, Azure OpenAI, Ollama.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Jailbreak / prompt-injection classifiers:&lt;/strong&gt; Closer to the guardrails pages’ “inspect before harm” story than a static keyword list (still out of scope for this MVP).&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Response caching:&lt;/strong&gt; Cache by prompt hash to reduce LLM calls.&lt;/li&gt;
&lt;/ul&gt;
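&lt;p&gt;Of these, response caching is the most self-contained to sketch. A Python illustration of the prompt-hash idea (keying on a SHA-256 of the prompt; a real version would also scope the key by model and sampling parameters):&lt;/p&gt;

```python
import hashlib

class PromptCache:
    """Cache by prompt hash: identical prompts skip the provider
    call entirely. Illustrative only; not part of the current MVP."""
    def __init__(self):
        self.store = {}
        self.hits = 0

    def get_or_call(self, prompt, call):
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key in self.store:
            self.hits += 1
            return self.store[key]
        answer = call(prompt)
        self.store[key] = answer
        return answer

cache = PromptCache()
first = cache.get_or_call("hello", lambda p: "reply:" + p)
second = cache.get_or_call("hello", lambda p: "fresh")  # served from cache
```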

&lt;p&gt;&lt;strong&gt;Repository:&lt;/strong&gt; &lt;a href=&quot;https://github.com/sprider/rust-grpc-ai-security-gateway&quot;&gt;github.com/sprider/rust-grpc-ai-security-gateway&lt;/a&gt;&lt;/p&gt;

</description>
        <pubDate>Sun, 29 Mar 2026 00:00:00 +0000</pubDate>
        <link>https://blog.josephvelliah.com/building-rust-grpc-ai-security-gateway</link>
        <guid isPermaLink="true">https://blog.josephvelliah.com/building-rust-grpc-ai-security-gateway</guid>
        
        <category>Rust</category>
        
        <category>gRPC</category>
        
        <category>AI-Security</category>
        
        
      </item>
    
      <item>
        <title>Claude Code Security: The Smart Way to Integrate AI</title>
        <description>&lt;p&gt;Anthropic just dropped Claude Code Security, and if you’re anywhere near AppSec or DevSecOps, you’ve probably already seen the debate lighting up on LinkedIn and Hacker News. The tool promises to scan entire repositories, reason about code the way a human researcher would, and even suggest patches your team can review before merging.&lt;/p&gt;

&lt;p&gt;Here’s my take on how to actually use it—without throwing away everything you’ve already built.&lt;/p&gt;

&lt;h2 id=&quot;key-takeaways&quot;&gt;Key Takeaways&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Don’t replace your existing tools.&lt;/strong&gt; Deterministic rules stay as the hard gate; AI sits on top as an advisory layer.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Use AI for triage, not truth.&lt;/strong&gt; Claude excels at sorting findings by exploitability and risk—not at being the final word on what ships.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Deploy a “two-net” architecture.&lt;/strong&gt; Your SAST and linters catch known-bad patterns; Claude catches the subtle stuff they miss.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Lock down variability.&lt;/strong&gt; Pin model versions, log everything, and never let AI merge to protected branches on its own.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;the-four-pillar-framework&quot;&gt;The Four-Pillar Framework&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2026/02/4-pillar-framework.drawio.svg&quot; alt=&quot;Four-Pillar Framework: Baseline → Triage → Coverage → Control&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;what-is-claude-code-security&quot;&gt;What is Claude Code Security?&lt;/h2&gt;

&lt;p&gt;If you haven’t seen the announcement yet, Claude Code Security is a new capability baked into Claude Code. It’s currently in limited research preview for Enterprise and Team customers, with broader availability expected later this year.&lt;/p&gt;

&lt;p&gt;The short version: it scans your codebase for vulnerabilities and suggests fixes—but unlike traditional static analysis, it actually &lt;em&gt;reasons&lt;/em&gt; about your code. It traces data flows across files, understands business logic, and catches issues that pattern-matching tools routinely miss. Anthropic claims Claude Opus 4.6 found over 500 vulnerabilities in production open-source projects that had gone undetected for years.&lt;/p&gt;

&lt;p&gt;What caught my attention is the multi-stage verification. Every finding goes through an adversarial self-review before it reaches your dashboard, which (in theory) should cut down on the false positive noise that makes most SAST tools unbearable at scale.&lt;/p&gt;

&lt;p&gt;But here’s the thing: reasoning-based detection is powerful, and it’s also non-deterministic. That’s not a flaw you can toggle off. It’s baked into how large language models work. So the question isn’t whether Claude Code Security is useful—it clearly is. The question is how you integrate it without losing the guarantees your compliance and governance teams depend on.&lt;/p&gt;

&lt;h3 id=&quot;how-claude-differs-from-traditional-sast&quot;&gt;How Claude Differs from Traditional SAST&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2026/02/claude-vs-traditional-sast.drawio.svg&quot; alt=&quot;Comparison: Traditional SAST vs Claude Code Security&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;keep-rules-as-the-baseline-gate&quot;&gt;Keep Rules as the Baseline Gate&lt;/h2&gt;

&lt;p&gt;Most mature teams already run a stack of deterministic controls in CI/CD: linters, SAST scanners, secret detection, dependency checks, policy-as-code gates. These tools aren’t glamorous, but they give you something AI fundamentally cannot: predictable coverage.&lt;/p&gt;

&lt;p&gt;Every rule executes on every build, in exactly the same way. You can reason about that behavior when you write policy. You can audit it. You can explain it to regulators.&lt;/p&gt;

&lt;p&gt;And look, I know the pain points. A 2023 Ponemon study found that developers consider nearly half of all security alerts to be false positives, with the average engineer burning six hours a week just chasing down noise. Some SAST configurations hit false positive rates above 60-70%, depending on the language and ruleset. That’s brutal.&lt;/p&gt;

&lt;p&gt;But turning off your deterministic tools in favor of an LLM doesn’t fix that problem—it trades one kind of uncertainty for another. The issue in most organizations isn’t “we can’t find vulnerabilities.” It’s “we can’t keep up with the ones we already find” and “we don’t know which ones actually matter.”&lt;/p&gt;

&lt;p&gt;So the first principle here is non-negotiable: &lt;strong&gt;your existing static tools remain the hard gate in the pipeline&lt;/strong&gt;. If a critical or high-severity rule fires, the build fails. Full stop. Claude can add signal, flag additional risks, even open blocking findings of its own—but it should never be able to override a deterministic rule that’s already failing. That’s how you preserve the guarantees your governance story is built on.&lt;/p&gt;

&lt;h2 id=&quot;use-claude-primarily-for-triage&quot;&gt;Use Claude Primarily for Triage&lt;/h2&gt;

&lt;p&gt;Where Claude Code Security genuinely moves the needle isn’t raw detection—it’s &lt;strong&gt;triage and explanation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Anyone who’s run static analysis at scale knows exactly what I’m talking about. You roll out a new scanner, it dumps three thousand findings on your backlog, and within two weeks your developers have learned to ignore it entirely. Not because they don’t care about security. Because the signal-to-noise ratio is terrible and nothing in that wall of warnings tells them which issues are actually exploitable.&lt;/p&gt;

&lt;p&gt;This is precisely the kind of problem large language models are good at.&lt;/p&gt;

&lt;p&gt;Claude can look at a finding and answer questions like:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Is this actually exploitable given how data flows through this specific code path?&lt;/li&gt;
  &lt;li&gt;What’s the realistic blast radius if an attacker hits this?&lt;/li&gt;
  &lt;li&gt;How would I fix it in a way that fits this repository’s patterns and conventions?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of handing your team a flat list sorted by severity label, you can pipe your SAST results into Claude and ask it to rank findings by real-world risk. The output isn’t just another label—it’s a narrative explanation and a patch suggestion that makes sense in context.&lt;/p&gt;

&lt;p&gt;The key is that &lt;strong&gt;triage is advisory, not authoritative&lt;/strong&gt;. You’re still enforcing your rules. But now you’re giving engineers a prioritized, annotated backlog instead of an undifferentiated wall of warnings. That cuts alert fatigue, shortens time-to-remediation, and honestly makes your legacy tools feel a lot less “legacy” because they’re plugging into a smarter workflow.&lt;/p&gt;
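&lt;p&gt;That workflow is easy to prototype before wiring in any real model call. In this Python sketch the assessor is a stub standing in for the model (nothing here is a real Anthropic API); the point is only the shape: findings go in, and an annotated, re-ranked backlog comes out while severity labels stay untouched:&lt;/p&gt;

```python
def triage(findings, assess):
    """Advisory re-ranking: `assess` stands in for the model and
    returns (score, rationale). Nothing is dropped or gated here."""
    ranked = []
    for f in findings:
        score, why = assess(f)
        ranked.append({**f, "risk_score": score, "rationale": why})
    ranked.sort(key=lambda f: f["risk_score"], reverse=True)
    return ranked

findings = [
    {"id": "F1", "severity": "HIGH", "rule": "sql-injection"},
    {"id": "F2", "severity": "HIGH", "rule": "weak-hash"},
]

def fake_assess(f):
    # Stub: a real pipeline would ask the model whether the sink is
    # reachable from user input and what the blast radius is.
    if f["rule"] == "sql-injection":
        return 0.9, "user input reaches the query builder"
    return 0.2, "hash only used for cache keys"

backlog = triage(findings, fake_assess)
```

&lt;p&gt;Two findings with the same severity label come back in a defensible order, each with a rationale a reviewer can check.&lt;/p&gt;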

&lt;h2 id=&quot;use-claude-as-a-second-net-for-coverage&quot;&gt;Use Claude as a “Second Net” for Coverage&lt;/h2&gt;

&lt;p&gt;Once your baseline and triage story are solid, you can start thinking about Claude as a second net—an additional layer that catches what your rules miss.&lt;/p&gt;

&lt;p&gt;Traditional static tools are excellent at the patterns they were explicitly built to find: SQL injection sinks, missing output encoding, direct use of dangerous APIs, weak cryptographic primitives. They’re much less effective at anything that requires understanding business logic, tracing data across multiple files, or reasoning about authorization invariants. That’s where a model that can read and summarize code like a human starts to earn its keep.&lt;/p&gt;

&lt;p&gt;Claude Code Security builds an internal model of how your application works—where data enters, how it transforms, what the code is trying to accomplish. In practice, that means it can surface vulnerabilities that never trip a regex or AST pattern. Things like:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;An authorization check applied in most controllers but quietly bypassed in one edge-case endpoint&lt;/li&gt;
  &lt;li&gt;A multi-step workflow where an assumption about state can be violated if services execute out of order&lt;/li&gt;
  &lt;li&gt;A data path that’s harmless in default configuration but dangerous when a specific feature flag is enabled&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here’s how I’d wire this into a pipeline:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2026/02/second-net.drawio.svg&quot; alt=&quot;Two-Net Security Pipeline: Deterministic Tools → Claude Security → Human Review&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The asymmetry here is intentional. If Claude misses something your rules caught, the build still fails. If Claude finds something your rules missed, you’ve just upgraded your coverage. &lt;strong&gt;AI can only help you win more—it can’t redefine what “safe enough to ship” means on its own.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A reasonable policy might look like:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Finding Source&lt;/th&gt;
      &lt;th&gt;Severity&lt;/th&gt;
      &lt;th&gt;Action&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Deterministic tool&lt;/td&gt;
      &lt;td&gt;Critical/High&lt;/td&gt;
      &lt;td&gt;Auto-block PR&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Deterministic tool&lt;/td&gt;
      &lt;td&gt;Medium/Low&lt;/td&gt;
      &lt;td&gt;Create ticket, don’t gate&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Claude only&lt;/td&gt;
      &lt;td&gt;Critical/High&lt;/td&gt;
      &lt;td&gt;Block after human confirms&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Claude only&lt;/td&gt;
      &lt;td&gt;Medium/Low&lt;/td&gt;
      &lt;td&gt;Comment on PR, create ticket&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;That gives you a practical balance. You’re not ignoring AI insights, but you’re not handing over the keys either.&lt;/p&gt;
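&lt;p&gt;The table above collapses into a small gating function. A Python sketch (action names are illustrative; the &lt;em&gt;await-human&lt;/em&gt; state renders the &quot;block after human confirms&quot; row):&lt;/p&gt;

```python
def gate(source, severity, human_confirmed=False):
    """Implements the policy table: deterministic criticals always
    block; Claude-only criticals block only once a human confirms."""
    critical = severity in ("critical", "high")
    if source == "deterministic":
        return "block" if critical else "ticket"
    if critical:                      # Claude-only finding
        return "block" if human_confirmed else "await-human"
    return "comment-and-ticket"
```

&lt;p&gt;The asymmetry from the previous section is visible in the code: no branch lets an AI-only signal downgrade a deterministic block.&lt;/p&gt;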

&lt;h2 id=&quot;how-claude-compares-to-other-tools&quot;&gt;How Claude Compares to Other Tools&lt;/h2&gt;

&lt;p&gt;It’s worth understanding where Claude Code Security sits relative to the other options you’re probably already evaluating.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Capability&lt;/th&gt;
      &lt;th&gt;Claude Code Security&lt;/th&gt;
      &lt;th&gt;Snyk Code&lt;/th&gt;
      &lt;th&gt;Semgrep&lt;/th&gt;
      &lt;th&gt;GitHub Advanced Security&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Detection approach&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;LLM reasoning + self-verification&lt;/td&gt;
      &lt;td&gt;AI + rules (DeepCode)&lt;/td&gt;
      &lt;td&gt;Pattern-based YAML rules&lt;/td&gt;
      &lt;td&gt;Semantic analysis (CodeQL)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Cross-file data flow&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Strong&lt;/td&gt;
      &lt;td&gt;Moderate&lt;/td&gt;
      &lt;td&gt;Limited&lt;/td&gt;
      &lt;td&gt;Strong&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Business logic flaws&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Yes&lt;/td&gt;
      &lt;td&gt;Limited&lt;/td&gt;
      &lt;td&gt;No&lt;/td&gt;
      &lt;td&gt;Limited&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;False positive handling&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Adversarial self-review&lt;/td&gt;
      &lt;td&gt;ML-based filtering&lt;/td&gt;
      &lt;td&gt;Rule tuning&lt;/td&gt;
      &lt;td&gt;Manual triage&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Custom rules&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Natural language prompts&lt;/td&gt;
      &lt;td&gt;Limited (Enterprise)&lt;/td&gt;
      &lt;td&gt;YAML (minutes to write)&lt;/td&gt;
      &lt;td&gt;QL queries (hours to learn)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Scan speed&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Minutes (depends on repo size)&lt;/td&gt;
      &lt;td&gt;Fast&lt;/td&gt;
      &lt;td&gt;Very fast (~10 sec)&lt;/td&gt;
      &lt;td&gt;Slow (minutes to 30+ min)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Pricing&lt;/strong&gt;*&lt;/td&gt;
      &lt;td&gt;Enterprise (custom)&lt;/td&gt;
      &lt;td&gt;$25/month per product (Team)&lt;/td&gt;
      &lt;td&gt;$40/month per contributor&lt;/td&gt;
      &lt;td&gt;$30/month per committer&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;*Pricing as of February 2026. Snyk Team plan requires minimum 5 developers; Enterprise is custom. GitHub unbundled GHAS in April 2025 into Code Security ($30) and Secret Protection ($19) per committer. See vendor sites for current pricing.&lt;/p&gt;

&lt;p&gt;The honest assessment: Claude isn’t trying to replace your SAST tooling. It’s trying to do something those tools can’t—reason about code semantically and explain its findings in plain language. The tradeoff is non-determinism, which is why the two-net architecture makes sense. Use Semgrep or CodeQL for the predictable baseline, and use Claude for the intelligent layer on top.&lt;/p&gt;

&lt;h2 id=&quot;lock-down-variability-where-it-matters&quot;&gt;Lock Down Variability Where It Matters&lt;/h2&gt;

&lt;p&gt;Everything I’ve described only works if you’re honest about how large language models behave. Even with temperature cranked down and prompts held constant, you won’t get identical output every time. That’s not a bug. It’s the nature of the technology.&lt;/p&gt;

&lt;p&gt;So instead of pretending otherwise, deliberately lock down where that variability can affect outcomes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At the configuration level:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Pin model versions and prompt templates where the platform allows—you want behavior to stay stable across builds&lt;/li&gt;
  &lt;li&gt;Define exactly which branches and events trigger AI scans (every PR for smaller services, nightly for monoliths)&lt;/li&gt;
  &lt;li&gt;Log all requests and responses so you can audit what the system did when it influenced a decision&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;At the process level:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;AI &lt;em&gt;can&lt;/em&gt; propose patches, open PRs, annotate findings, and request human review&lt;/li&gt;
  &lt;li&gt;AI &lt;em&gt;cannot&lt;/em&gt; merge to protected branches or override mandatory controls&lt;/li&gt;
  &lt;li&gt;AI-suggested changes go through the same code review standards as any human commit&lt;/li&gt;
&lt;/ul&gt;
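&lt;p&gt;That can/cannot split is effectively an allowlist, and deny-by-default is the safer encoding: anything not explicitly granted, including actions nobody anticipated, is refused. A Python sketch of the boundary (action names are illustrative):&lt;/p&gt;

```python
ALLOWED_AI_ACTIONS = {
    "propose_patch",
    "open_pr",
    "annotate_finding",
    "request_review",
}

def ai_may(action):
    # Deny by default: merge-to-protected and override-control are
    # forbidden simply by never appearing in the allowlist.
    return action in ALLOWED_AI_ACTIONS
```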

&lt;h3 id=&quot;ai-permissions-boundary&quot;&gt;AI Permissions Boundary&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2026/02/ai-permissions-boundary.drawio.svg&quot; alt=&quot;AI Permissions: What AI Can and Cannot Do&quot; /&gt;&lt;/p&gt;

&lt;p&gt;And finally, treat this like any other production system. Threat-model its inputs—yes, including prompt injection risks from code comments and config files. Monitor its behavior over time. Build feedback loops for when it gets things wrong.&lt;/p&gt;

&lt;p&gt;We’ve all seen examples of models confidently suggesting insecure patterns or ignoring instructions under the right (wrong?) conditions. Those stories aren’t reasons to avoid AI entirely. But they’re strong arguments for never putting it in sole control of your deployment gates.&lt;/p&gt;

&lt;h2 id=&quot;final-thoughts&quot;&gt;Final Thoughts&lt;/h2&gt;

&lt;p&gt;The question in 2026 isn’t “should we use AI in application security?” The marginal cost of additional signal is low, and the upside for developer experience is significant. The real question is &lt;em&gt;how&lt;/em&gt; we integrate it.&lt;/p&gt;

&lt;p&gt;If you keep deterministic rules as your baseline gate, use Claude primarily for triage, deploy it as a second net for additional coverage, and deliberately constrain where its variability can influence outcomes—you get the best of both worlds. You keep the guarantees and auditability that security and compliance teams require, while giving your engineers a much more usable experience on top of the tools they already know.&lt;/p&gt;

&lt;p&gt;That’s not about replacing “legacy” tooling. It’s about surrounding those tools with enough intelligence that they finally deliver on their original promise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ready to try it?&lt;/strong&gt; Claude Code Security is currently in limited research preview for &lt;a href=&quot;https://www.anthropic.com/news/claude-code-security&quot;&gt;Anthropic Enterprise and Team customers&lt;/a&gt;. Access it through the Claude Code web interface, where you can scan repositories, review findings in the dashboard, and approve suggested patches—all within the tools you already use. Open-source maintainers can also apply for free, expedited access.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Have questions or want to share how you’re approaching AI in your security stack? Drop me a note—I’m always interested in hearing what’s working (and what isn’t) in production environments.&lt;/em&gt;&lt;/p&gt;
</description>
        <pubDate>Sat, 21 Feb 2026 00:00:00 +0000</pubDate>
        <link>https://blog.josephvelliah.com/claude-code-security-the-smart-way-to-integrate-ai</link>
        <guid isPermaLink="true">https://blog.josephvelliah.com/claude-code-security-the-smart-way-to-integrate-ai</guid>
        
        <category>Claude Code Security</category>
        
        <category>AI Security</category>
        
        <category>DevSecOps</category>
        
        <category>AppSec</category>
        
        <category>Vulnerability Detection</category>
        
        <category>SAST</category>
        
        <category>CI/CD Security</category>
        
        
      </item>
    
      <item>
        <title>How I Built a Semantic Cache Using Only AWS Services</title>
        <description>&lt;p&gt;LLM calls are expensive and slow, but here’s the thing - users ask the same questions in different ways all the time. “What’s your refund policy?” and “How do I get my money back?” are different strings but the same question. Without semantic caching, you’re paying full price to answer identical questions over and over again.&lt;/p&gt;

&lt;p&gt;I spent a weekend building a semantic cache that matches queries by meaning, not exact text - using only AWS-native services. S3 Vectors for similarity search, Bedrock for embeddings and LLM, Lambda for compute. Fully serverless, no external dependencies. The result? Cache hits that return ~10x faster and skip the expensive LLM call entirely.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2026/01/semantic_cache_architecture_updated.png&quot; alt=&quot;Semantic Cache Architecture&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;the-problem&quot;&gt;The Problem&lt;/h2&gt;

&lt;p&gt;Every call to Amazon Bedrock costs money and takes 1-3 seconds. Yet a huge chunk of queries are semantically identical to ones you’ve already answered. You’re literally paying to answer the same question over and over.&lt;/p&gt;

&lt;h2 id=&quot;the-solution&quot;&gt;The Solution&lt;/h2&gt;

&lt;p&gt;Instead of matching exact strings, I used vector embeddings to match &lt;em&gt;meaning&lt;/em&gt;. When a new query comes in:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Convert the query into a vector embedding (using Titan V2)&lt;/li&gt;
  &lt;li&gt;Search for similar queries in the cache (using S3 Vectors)&lt;/li&gt;
  &lt;li&gt;If similarity is above 85%, return the cached response&lt;/li&gt;
  &lt;li&gt;Otherwise, call the LLM and cache the result for next time&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Simple concept. The trick was making it work with AWS-native services only.&lt;/p&gt;
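The four steps above can be sketched in plain Python. This is a toy in-memory stand-in, not the real system: the `embed` and `llm` callables stand in for Bedrock Titan V2 and Claude, and the list of entries stands in for an S3 Vectors index. The flow and the 0.85 threshold are the same.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class SemanticCache:
    """Toy stand-in for the S3 Vectors + Bedrock flow described above."""

    def __init__(self, embed, llm, threshold=0.85):
        self.embed = embed        # real system: Titan V2 via Bedrock
        self.llm = llm            # real system: InvokeModel on Claude
        self.threshold = threshold
        self.entries = []         # real system: an S3 Vectors index

    def query(self, text):
        vec = self.embed(text)                                      # step 1
        best = max(self.entries,
                   key=lambda e: cosine(vec, e[0]), default=None)   # step 2
        if best and cosine(vec, best[0]) >= self.threshold:
            return best[1], "hit"                                   # step 3
        answer = self.llm(text)                                     # step 4
        self.entries.append((vec, answer))
        return answer, "miss"
```

Swap the two callables and the entries list for real Bedrock and S3 Vectors clients and the control flow carries over unchanged.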

&lt;h2 id=&quot;the-tech-stack&quot;&gt;The Tech Stack&lt;/h2&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Component&lt;/th&gt;
      &lt;th&gt;Service&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Vector Storage&lt;/td&gt;
      &lt;td&gt;Amazon S3 Vectors&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Embeddings&lt;/td&gt;
      &lt;td&gt;Bedrock Titan V2&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;LLM&lt;/td&gt;
      &lt;td&gt;Bedrock Claude Haiku 4.5&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Compute&lt;/td&gt;
      &lt;td&gt;Lambda&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;API&lt;/td&gt;
      &lt;td&gt;API Gateway HTTP API&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The best part? It’s fully serverless. No baseline costs. Pay only for what you use.&lt;/p&gt;

&lt;h2 id=&quot;results&quot;&gt;Results&lt;/h2&gt;

&lt;p&gt;After running some tests:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Cache hits are ~10x faster&lt;/strong&gt; than calling the LLM&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Semantic matching works&lt;/strong&gt; - “capital of France” matches “France’s capital city”&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Graceful degradation&lt;/strong&gt; - if the cache fails, it falls back to the LLM&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;what-i-learned&quot;&gt;What I Learned&lt;/h2&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;S3 Vectors is underrated&lt;/strong&gt; - native similarity search without managing infrastructure&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Serverless means fast startup&lt;/strong&gt; - requests start processing in ~300ms&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Similarity threshold matters&lt;/strong&gt; - 0.85 worked well to avoid false matches while still catching rephrased questions&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;try-it-yourself&quot;&gt;Try It Yourself&lt;/h2&gt;

&lt;p&gt;The complete code is available on GitHub. One-click deploy, one-click cleanup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href=&quot;https://github.com/sprider/semantic-cache-demo&quot;&gt;github.com/sprider/semantic-cache-demo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The repo includes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Full infrastructure as code (ready to deploy)&lt;/li&gt;
  &lt;li&gt;71 unit tests&lt;/li&gt;
  &lt;li&gt;One-click deploy and cleanup scripts&lt;/li&gt;
  &lt;li&gt;Architecture diagrams&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fair warning: it creates AWS resources that cost money. But the scripts make cleanup easy, and a few hours of testing costs less than a dollar.&lt;/p&gt;

&lt;h2 id=&quot;whats-next&quot;&gt;What’s Next?&lt;/h2&gt;

&lt;p&gt;This is a demo, not production-ready code. For real use, you’d want:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;API authentication&lt;/li&gt;
  &lt;li&gt;Cache invalidation strategy&lt;/li&gt;
  &lt;li&gt;Multi-region deployment&lt;/li&gt;
  &lt;li&gt;Better observability&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;design-alternatives&quot;&gt;Design alternatives&lt;/h2&gt;

&lt;p&gt;This demo uses S3 Vectors only for the cache layer. S3 Vectors has its own trade-offs (e.g. no built-in TTL, 40 KB metadata limit per vector). Combining S3 Vectors with DynamoDB—for example, storing vectors in S3 Vectors for similarity search and payloads or TTL in DynamoDB—lets you design differently for larger payloads, expiry, or exact-key lookups without changing the core flow shown here.&lt;/p&gt;
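One way the DynamoDB side of that split might look: keep the vector in S3 Vectors for similarity search and write the payload with an expiry epoch to a side table. The key and attribute names here are illustrative, and `expires_at` assumes you enable it as the table's TTL attribute.

```python
import time

def build_cache_item(cache_key: str, payload: str, ttl_days: int = 7) -> dict:
    """Shape a DynamoDB item for a hypothetical payload/TTL side table.

    DynamoDB TTL deletes items whose configured attribute holds an epoch
    timestamp in the past, giving the cache expiry S3 Vectors lacks.
    """
    return {
        "cache_key": {"S": cache_key},   # partition key, e.g. a query hash
        "payload": {"S": payload},       # full LLM response, no 40 KB cap
        "expires_at": {"N": str(int(time.time()) + ttl_days * 86400)},
    }
```

On a cache hit you would fetch the payload by key from DynamoDB instead of reading it from vector metadata.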

&lt;p&gt;But as a proof of concept? It works. And it’s a pattern worth knowing.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Questions? Found a bug? Open an issue on the repo. Happy to chat about semantic caching, AWS architecture, or why vector databases are the future.&lt;/em&gt;&lt;/p&gt;
</description>
        <pubDate>Sun, 25 Jan 2026 00:00:00 +0000</pubDate>
        <link>https://blog.josephvelliah.com/semantic-cache-aws-services</link>
        <guid isPermaLink="true">https://blog.josephvelliah.com/semantic-cache-aws-services</guid>
        
        <category>AWS</category>
        
        <category>Semantic Cache</category>
        
        <category>S3 Vectors</category>
        
        <category>Bedrock</category>
        
        <category>Serverless</category>
        
        <category>LLM Optimization</category>
        
        
      </item>
    
      <item>
        <title>How to Build Better AI Agent Tools: Cut Costs by 70% (MCP Server Case Study)</title>
        <description>&lt;p&gt;Building tools for AI agents isn’t the same as building regular APIs. This guide shows you how to design tools that reduce token costs by 60-70% while improving accuracy. Whether you’re building Model Context Protocol (MCP) servers, LangChain tools, or custom agent functions—these principles apply.&lt;/p&gt;

&lt;h2 id=&quot;quick-take&quot;&gt;Quick Take&lt;/h2&gt;

&lt;p&gt;I reduced my AI tool count from 30 to 8 (73% reduction) and cut token usage by 60-70% per response. This guide shows you how to:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Consolidate tools using action parameters&lt;/li&gt;
  &lt;li&gt;Optimize response formats to reduce costs&lt;/li&gt;
  &lt;li&gt;Write tool descriptions that AI agents understand&lt;/li&gt;
  &lt;li&gt;Avoid common pitfalls in AI tool design&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;the-problem-why-multiple-ai-tools-are-costing-you-money&quot;&gt;The Problem: Why Multiple AI Tools Are Costing You Money&lt;/h2&gt;

&lt;p&gt;I thought I was being smart when I built 30 separate tools for my AI agent. Each tool did exactly one thing. Clean. Organized. Professional.&lt;/p&gt;

&lt;p&gt;Then I got the token bill.&lt;/p&gt;

&lt;p&gt;And watched my AI agent pick the wrong tool on 38% of test queries—calling three tools when it only needed one, requesting detailed responses when summaries would work, and burning through my budget.&lt;/p&gt;

&lt;p&gt;Here’s what happened. I was building a Model Context Protocol (MCP) server for SharePoint integration and did what seemed logical: create one tool for every API endpoint. Need to get site info? That’s a tool. Need to list subsites? Another tool. Need to search? Yet another tool.&lt;/p&gt;

&lt;p&gt;I ended up with 30 tools. It seemed organized on paper.&lt;/p&gt;

&lt;p&gt;But when I tested it, the reality hit hard. The agent kept picking the wrong tool, and the token costs were way higher than expected.&lt;/p&gt;

&lt;h2 id=&quot;the-solution-consolidating-ai-tools-for-better-performance&quot;&gt;The Solution: Consolidating AI Tools for Better Performance&lt;/h2&gt;

&lt;p&gt;I took a step back and asked: “What are people actually trying to do?”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The key insight: Think about tasks, not API endpoints.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of wrapping each API call in its own tool, I focused on what users were trying to accomplish. This single mindset shift changed everything.&lt;/p&gt;

&lt;p&gt;I combined 30 tools into 8. That’s a 73% reduction. Here’s what it looked like:&lt;/p&gt;

&lt;h3 id=&quot;visual-the-transformation&quot;&gt;Visual: The Transformation&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2026/01/mcp-tool-consolidation-strategy.png&quot; alt=&quot;MCP Tool Consolidation Strategy&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;before&quot;&gt;Before&lt;/h3&gt;

&lt;p&gt;❌ get_site_info&lt;br /&gt;
❌ get_site_lists&lt;br /&gt;
❌ get_site_libraries&lt;br /&gt;
❌ get_site_pages&lt;br /&gt;
❌ search_sites&lt;br /&gt;
… (25 more tools)&lt;/p&gt;

&lt;h3 id=&quot;after&quot;&gt;After&lt;/h3&gt;

&lt;p&gt;✅ sharepoint_site (actions: get_info, list_subsites, search)&lt;br /&gt;
✅ sharepoint_list (actions: get_lists, get_items, create_item)&lt;br /&gt;
✅ sharepoint_files (actions: search, get_metadata, download)&lt;/p&gt;

&lt;h2 id=&quot;two-small-changes-that-made-a-big-difference&quot;&gt;Two Small Changes That Made a Big Difference&lt;/h2&gt;

&lt;h3 id=&quot;1-action-parameter---one-tool-can-do-multiple-things&quot;&gt;1. Action parameter - One tool can do multiple things&lt;/h3&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;action&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Literal&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;get_info&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;list_subsites&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;search&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;2-response-format-parameter---control-how-much-detail-you-get-back&quot;&gt;2. Response format parameter - Control how much detail you get back&lt;/h3&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;response_format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Literal&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;concise&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;detailed&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;concise&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;how-token-costs-impact-ai-development-budget&quot;&gt;How Token Costs Impact AI Development Budget&lt;/h2&gt;

&lt;p&gt;Every API call your AI agent makes costs money. When your agent calls the wrong tool or requests more data than needed, those costs add up fast.&lt;/p&gt;

&lt;h3 id=&quot;tokens-are-expensive&quot;&gt;Tokens Are Expensive&lt;/h3&gt;

&lt;p&gt;Here’s a real example from my SharePoint server that made me rethink everything. When you ask for a site’s information, you can get back a lot of detail:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Detailed response (~280 tokens):&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-json highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;@odata.context&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;https://graph.microsoft.com/v1.0/$metadata#sites/$entity&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;@microsoft.graph.tips&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;Use $select to choose only the properties your app needs, as this can lead to performance improvements. For example: GET sites(&apos;&amp;lt;key&amp;gt;&apos;)/microsoft.graph.getByPath(path=&amp;lt;key&amp;gt;)?$select=displayName,error&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;createdDateTime&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;2025-04-12T16:40:22.963Z&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;description&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;A centralized repository for accessing country-specific HR policies and procedures across ACME Corporation&apos;s global operations.&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;spridermvp.sharepoint.com,506b7692-04ba-4be9-afc6-df146925948b,c7f4ceb0-f301-4280-8cc4-a8dba8560b64&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;lastModifiedDateTime&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;2026-01-24T13:42:56Z&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;acme-global-hr-policies-portal&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;webUrl&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;https://spridermvp.sharepoint.com/sites/acme-global-hr-policies-portal&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;displayName&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;ACME Global HR Policies Portal&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;root&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{},&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;siteCollection&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
        &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;hostname&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;spridermvp.sharepoint.com&quot;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Concise response (~88 tokens):&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-json highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;description&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;A centralized repository for accessing country-specific HR policies and procedures across ACME Corporation&apos;s global operations.&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;lastModifiedDateTime&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;2026-01-24T13:42:56Z&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;name&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;acme-global-hr-policies-portal&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;webUrl&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;https://spridermvp.sharepoint.com/sites/acme-global-hr-policies-portal&quot;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Most of the time, you just need the name and URL. You don’t need all those IDs and timestamps. So I made &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;concise&lt;/code&gt; the default. If the agent needs the technical details for a follow-up call, it can ask for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;detailed&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This approach can reduce token usage by 60-70% per response.&lt;/p&gt;
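The concise view is just a field filter over the full Graph payload. A minimal sketch, using the four fields from the concise example above (the function name is mine, not part of the MCP SDK):

```python
# Fields kept in the concise view, per the example above.
CONCISE_FIELDS = ("description", "lastModifiedDateTime", "name", "webUrl")

def shape_response(site: dict, response_format: str = "concise") -> dict:
    """Return the full Graph payload, or just the human-useful subset."""
    if response_format == "detailed":
        return site
    return {k: v for k, v in site.items() if k in CONCISE_FIELDS}
```

Because `concise` is the default, the filter runs unless the agent explicitly asks for `detailed`.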

&lt;h2 id=&quot;how-does-the-ai-know-which-action-and-format-to-use&quot;&gt;How Does the AI Know Which Action and Format to Use?&lt;/h2&gt;

&lt;p&gt;You might be wondering: “How does the AI agent pick the right action and response format?”&lt;/p&gt;

&lt;h3 id=&quot;the-tool-description-pattern&quot;&gt;The Tool Description Pattern&lt;/h3&gt;

&lt;p&gt;The answer is in your tool description. The AI reads it like instructions. Here’s an example:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nd&quot;&gt;@mcp.tool&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;sharepoint_site&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;action&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Literal&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;get_info&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;list_subsites&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;search&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;site_url&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;response_format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Literal&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;concise&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;detailed&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;concise&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;sh&quot;&gt;&quot;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;
    Work with SharePoint sites.
    
    Actions:
    - get_info: Get details about a specific site (requires site_url)
    - list_subsites: List all subsites under a parent site (requires site_url)
    - search: Find sites matching a query (requires query)
    
    Response formats:
    - concise: Returns only essential information (names, titles, URLs)
    - detailed: Returns full metadata including IDs for follow-up operations
    
    Use &lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;detailed&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt; only when you need technical IDs for subsequent tool calls.
    &lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&quot;&quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;example-walkthrough-finding-a-marketing-site&quot;&gt;Example Walkthrough: Finding a Marketing Site&lt;/h3&gt;

&lt;p&gt;When a user asks “Find the marketing site”, the AI:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Reads the tool description&lt;/li&gt;
  &lt;li&gt;Sees that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;search&lt;/code&gt; action requires a query&lt;/li&gt;
  &lt;li&gt;Picks &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;action=&quot;search&quot;&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;query=&quot;marketing&quot;&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Uses default &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;response_format=&quot;concise&quot;&lt;/code&gt; since it just needs to show results&lt;/li&gt;
&lt;/ol&gt;
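The walkthrough above ends with a single tool call. Roughly, the agent emits something like this (the exact wire format depends on your MCP client; this shape is illustrative):

```python
# What the agent's tool call boils down to for "Find the marketing site".
tool_call = {
    "tool": "sharepoint_site",
    "arguments": {
        "action": "search",
        "query": "marketing",
        # response_format omitted, so the "concise" default applies
    },
}
```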

&lt;h3 id=&quot;example-walkthrough-fetching-documents&quot;&gt;Example Walkthrough: Fetching Documents&lt;/h3&gt;

&lt;p&gt;If the user then says “Get all the documents from that site”, the AI:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Remembers it needs the site ID for the next call&lt;/li&gt;
  &lt;li&gt;Goes back and calls the same tool with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;response_format=&quot;detailed&quot;&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Gets the technical IDs it needs&lt;/li&gt;
  &lt;li&gt;Uses those IDs in the next tool call&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;the-key-principle&quot;&gt;The Key Principle&lt;/h3&gt;

&lt;p&gt;💡 &lt;strong&gt;Key Insight&lt;/strong&gt;: The AI isn’t magic—it’s following your instructions. The better you explain what each action does and when to use each format, the better it performs.&lt;/p&gt;

&lt;h2 id=&quot;the-tradeoffs&quot;&gt;The Tradeoffs&lt;/h2&gt;

&lt;p&gt;Nothing is perfect. Here are the downsides I ran into:&lt;/p&gt;

&lt;h3 id=&quot;1-more-complex-tool-descriptions&quot;&gt;1. More Complex Tool Descriptions&lt;/h3&gt;

&lt;p&gt;Before, each tool was simple: “Get site info.” Done.&lt;/p&gt;

&lt;p&gt;Now, I have to explain multiple actions in one description. The tool description got longer. If you have 5-6 actions in one tool, it can get messy and the AI might get confused.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My rule:&lt;/strong&gt; Keep it to 3-4 actions max per tool. If you need more, split it into two tools.&lt;/p&gt;

&lt;h3 id=&quot;2-harder-to-debug&quot;&gt;2. Harder to Debug&lt;/h3&gt;

&lt;p&gt;When something goes wrong, it’s trickier to figure out what happened. With 30 separate tools, if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;get_site_info&lt;/code&gt; failed, I knew exactly where to look.&lt;/p&gt;

&lt;p&gt;Now, if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sharepoint_site&lt;/code&gt; fails, I have to check: Which action was called? What parameters were passed? Was it a problem with the action logic or the parameter validation?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My solution:&lt;/strong&gt; Add detailed logging for each action within the tool. Log the action name, parameters, and response format every time.&lt;/p&gt;
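That per-action logging can be a small decorator so every consolidated tool gets it for free. A sketch, with a stand-in handler in place of the real SharePoint logic:

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("mcp-tools")

def log_action(fn):
    """Record which action and response format a consolidated tool was called with."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        log.info("%s action=%s format=%s", fn.__name__,
                 kwargs.get("action"), kwargs.get("response_format", "concise"))
        return fn(*args, **kwargs)
    return wrapper

@log_action
def sharepoint_site(action=None, response_format="concise", **params):
    return f"{action}:{response_format}"  # stand-in for the real handler
```

When `sharepoint_site` misbehaves, the log line tells you immediately which action branch to inspect.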

&lt;h3 id=&quot;3-the-ai-can-still-pick-wrong&quot;&gt;3. The AI Can Still Pick Wrong&lt;/h3&gt;

&lt;p&gt;Even with clear descriptions, the AI sometimes picks the wrong action or forgets to use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;detailed&lt;/code&gt; when it needs IDs for the next call.&lt;/p&gt;

&lt;p&gt;This happens maybe 5-10% of the time. That’s far better than the 38% wrong-tool rate I saw with 30 tools, but it’s not zero.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What helps:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add examples in your tool description&lt;/li&gt;
  &lt;li&gt;Test with real user queries&lt;/li&gt;
  &lt;li&gt;Use clear parameter names (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;site_url&lt;/code&gt; not just &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;url&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;4-not-every-tool-should-be-consolidated&quot;&gt;4. Not Every Tool Should Be Consolidated&lt;/h3&gt;

&lt;p&gt;Some tools are better left separate. If two operations are completely different and rarely used together, don’t force them into one tool just to reduce the count.&lt;/p&gt;

&lt;p&gt;For example, I kept &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;user_profile&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;user_search&lt;/code&gt; as separate tools. They serve different purposes and combining them would make the description confusing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The test:&lt;/strong&gt; Ask yourself: “Would a person naturally think these actions belong together?” If not, keep them separate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reflection point:&lt;/strong&gt; Which of these tradeoffs concerns you most for your use case? The debugging complexity or the risk of AI confusion?&lt;/p&gt;

&lt;h2 id=&quot;when-this-approach-works-best&quot;&gt;When This Approach Works Best&lt;/h2&gt;

&lt;p&gt;This works great when:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;You have multiple tools that operate on the same resource (sites, files, users)&lt;/li&gt;
  &lt;li&gt;The actions are related and often used in sequence&lt;/li&gt;
  &lt;li&gt;You’re dealing with high token costs&lt;/li&gt;
  &lt;li&gt;Your users do varied tasks (not just one specific workflow)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This might not work if:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;You have very specialized, single-purpose tools&lt;/li&gt;
  &lt;li&gt;Each tool has completely different parameters&lt;/li&gt;
  &lt;li&gt;You need extremely precise error handling for each operation&lt;/li&gt;
  &lt;li&gt;Your users only do one or two specific tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;what-i-learned-key-principles-for-ai-tool-design&quot;&gt;What I Learned: Key Principles for AI Tool Design&lt;/h2&gt;

&lt;h3 id=&quot;1-think-about-tasks-not-api-endpoints-most-important&quot;&gt;1. Think about tasks, not API endpoints (Most Important!)&lt;/h3&gt;

&lt;p&gt;Don’t just wrap your API. Think about what people are trying to accomplish. This is the most important principle that drives everything else.&lt;/p&gt;

&lt;p&gt;❌ Three separate tools: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;list_users&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;list_events&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;create_event&lt;/code&gt;&lt;br /&gt;
✅ One tool: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;schedule_event&lt;/code&gt; (finds availability and creates the event)&lt;/p&gt;
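&lt;p&gt;A toy, self-contained sketch of the task-oriented version. The busy data and the hour-based slot model are invented stand-ins for a real calendar API:&lt;/p&gt;

```python
# Invented stand-in data: busy hours per attendee.
BUSY = {"sarah": [(9, 10)], "raj": [(9, 11)]}

def find_free_slot(attendees, duration_minutes, day_start=9, day_end=17):
    # Return the first whole hour where nobody is busy
    # (duration is ignored in this toy model).
    for hour in range(day_start, day_end):
        if all(not any(hour in range(s, e) for s, e in BUSY.get(a, []))
               for a in attendees):
            return hour
    return None

def schedule_event(attendees, duration_minutes, title):
    """One task-oriented call: find availability, then book the event."""
    slot = find_free_slot(attendees, duration_minutes)
    if slot is None:
        return "No common availability found."
    # A real implementation would call the calendar create-event API here.
    return f"Booked '{title}' at {slot}:00 for {duration_minutes} minutes."
```

&lt;p&gt;In an MCP server this sits behind one tool registration instead of three, so the agent makes a single call for the whole task.&lt;/p&gt;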

&lt;h3 id=&quot;2-return-information-people-can-actually-read&quot;&gt;2. Return information people can actually read&lt;/h3&gt;

&lt;p&gt;AI agents do better with names than with cryptic IDs.&lt;/p&gt;

&lt;p&gt;❌ &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;user_uuid: &quot;e1b2c3d4-e5f6-7890&quot;&lt;/code&gt;&lt;br /&gt;
✅ &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;user_name: &quot;Sarah Chen, Engineering Manager&quot;&lt;/code&gt;&lt;/p&gt;

&lt;h3 id=&quot;3-use-smart-defaults&quot;&gt;3. Use smart defaults&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Start with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;concise&lt;/code&gt; responses&lt;/li&gt;
  &lt;li&gt;Add pagination (I limit responses to 25,000 tokens)&lt;/li&gt;
  &lt;li&gt;Let agents filter results to get exactly what they need&lt;/li&gt;
&lt;/ul&gt;
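&lt;p&gt;A simple way to enforce that 25,000-token cap; the four-characters-per-token heuristic and the truncation notice are my own choices, not part of any spec:&lt;/p&gt;

```python
MAX_RESPONSE_TOKENS = 25_000   # the cap mentioned above
APPROX_CHARS_PER_TOKEN = 4     # rough heuristic, an assumption

def cap_response(text: str, max_tokens: int = MAX_RESPONSE_TOKENS) -> str:
    """Truncate oversized tool output and tell the agent how to get
    the rest, instead of silently blowing the context window."""
    limit = max_tokens * APPROX_CHARS_PER_TOKEN
    if len(text) > limit:
        notice = "\n[truncated: refine the query or request the next page]"
        return text[:limit - len(notice)] + notice
    return text
```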

&lt;h3 id=&quot;4-write-tool-descriptions-like-youre-explaining-to-a-coworker&quot;&gt;4. Write tool descriptions like you’re explaining to a coworker&lt;/h3&gt;

&lt;p&gt;The AI reads your tool description. Make it clear and helpful.&lt;/p&gt;

&lt;p&gt;❌ “Searches SharePoint”&lt;br /&gt;
✅ “Search across SharePoint sites, documents, and lists. Use filters to narrow results. Returns top 10 matches by default.”&lt;/p&gt;

&lt;h2 id=&quot;the-results&quot;&gt;The Results&lt;/h2&gt;

&lt;p&gt;Based on the consolidation and MCP best practices:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Metric&lt;/th&gt;
      &lt;th&gt;Impact&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Total Tools&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;73% reduction (30 → 8)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Token Efficiency&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;~70% fewer tokens per response&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Agent Performance&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Faster tool selection, fewer errors&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Monthly Cost Savings&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;50-80% reduction (varies by query complexity)*&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;em&gt;Note: These are projected savings based on tool consolidation and response format optimization. Actual results will vary depending on your specific use cases and query patterns.&lt;/em&gt;&lt;/p&gt;

&lt;h2 id=&quot;how-to-do-this-yourself&quot;&gt;How to Do This Yourself&lt;/h2&gt;

&lt;p&gt;Here’s a basic template you can use:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;enum&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Enum&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;typing&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Literal&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ResponseFormat&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Enum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;DETAILED&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;detailed&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;CONCISE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;concise&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;

&lt;span class=&quot;nd&quot;&gt;@mcp.tool&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;my_action_tool&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;action&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Literal&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;search&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;list&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;response_format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ResponseFormat&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ResponseFormat&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CONCISE&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;sh&quot;&gt;&quot;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;
    Multi-purpose tool for [resource].
    
    Actions:
    - search: Find items matching query
    - get: Retrieve specific item details
    - list: Show all available items
    
    Use &lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;concise&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt; for human-readable summaries.
    Use &lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;detailed&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt; when you need IDs for follow-up calls.
    &lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&quot;&quot;&lt;/span&gt;
    
    &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;perform_action&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;action&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;response_format&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ResponseFormat&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CONCISE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;format_concise&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;format_detailed&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
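&lt;p&gt;The template assumes &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;perform_action&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;format_concise&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;format_detailed&lt;/code&gt; exist. Here are minimal stand-ins over fake data so the dispatch pattern runs end to end:&lt;/p&gt;

```python
# Minimal stand-ins for the helpers the template assumes.
# ITEMS is fake data for illustration only.
ITEMS = [
    {"id": "a1b2", "name": "Quarterly Report", "owner": "Sarah Chen"},
    {"id": "c3d4", "name": "Budget 2026", "owner": "Raj Patel"},
]

def perform_action(action, query):
    if action == "search":
        return [i for i in ITEMS if query.lower() in i["name"].lower()]
    if action == "get":
        return [i for i in ITEMS if i["id"] == query]
    return list(ITEMS)  # action == "list"

def format_concise(result):
    # Names only: cheap on tokens, readable for the agent.
    return "; ".join(f"{i['name']} (owner: {i['owner']})" for i in result)

def format_detailed(result):
    # Include IDs so the agent can make follow-up calls.
    return "; ".join(f"{i['id']}: {i['name']} (owner: {i['owner']})" for i in result)
```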

&lt;h2 id=&quot;quick-summary&quot;&gt;Quick Summary&lt;/h2&gt;

&lt;p&gt;Before you dive in, here’s the roadmap:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Combine related tools&lt;/strong&gt; using action parameters → reduces tool count and confusion&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Add a response_format option&lt;/strong&gt; (concise vs detailed) → cuts token usage by 60-70%&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Default to concise&lt;/strong&gt; to save tokens → agents request detailed only when needed&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Return human-readable information&lt;/strong&gt;, not just IDs → improves agent decision-making&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Write clear tool descriptions&lt;/strong&gt; → think of them as instructions for a coworker&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Test with real tasks&lt;/strong&gt; and measure results → validate your optimizations&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;other-token-reduction-techniques&quot;&gt;Other Token Reduction Techniques&lt;/h2&gt;

&lt;p&gt;Beyond tool design, consider these approaches:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;&lt;a href=&quot;https://www.tensorlake.ai/blog/toon-vs-json&quot;&gt;TOON Format&lt;/a&gt;&lt;/strong&gt; – A JSON alternative designed for LLMs, reducing tokens by 30-60%&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;&lt;a href=&quot;https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching&quot;&gt;Prompt Caching&lt;/a&gt;&lt;/strong&gt; – Cache repeated context for 75% cheaper tokens&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;&lt;a href=&quot;https://medium.com/elementor-engineers/optimizing-token-usage-in-agent-based-assistants-ffd1822ece9c&quot;&gt;Model Cascading&lt;/a&gt;&lt;/strong&gt; – Use cheaper models for simple tasks, up to 90% savings&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;&lt;a href=&quot;https://www.flowhunt.io/blog/context-engineering-ai-agents-token-optimization/&quot;&gt;RAG&lt;/a&gt;&lt;/strong&gt; – Retrieve only relevant context instead of full documents&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;want-to-learn-more&quot;&gt;Want to Learn More?&lt;/h2&gt;

&lt;p&gt;The official MCP documentation has a great guide on this topic: &lt;a href=&quot;https://modelcontextprotocol.io/docs/tools/best-practices&quot;&gt;Writing Effective Tools for Agents&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;take-action&quot;&gt;Take Action&lt;/h2&gt;

&lt;p&gt;Ready to optimize your AI tools?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Next Steps:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Audit your current tools - how many could be combined?&lt;/li&gt;
  &lt;li&gt;Identify which tools could benefit from response format options&lt;/li&gt;
  &lt;li&gt;Start with your highest-traffic tools for maximum impact&lt;/li&gt;
  &lt;li&gt;Measure token usage before and after&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Questions or feedback?&lt;/strong&gt; I’d love to hear about your optimization results or challenges you’re facing. What’s your tool count, and which optimization would help your use case most?&lt;/p&gt;

&lt;h2 id=&quot;the-bottom-line&quot;&gt;The Bottom Line&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The key insight: Think about tasks, not API endpoints.&lt;/strong&gt; This single principle drives everything else in AI tool design.&lt;/p&gt;

&lt;p&gt;Building tools for AI isn’t the same as building regular APIs. I cut my tool count by 73%, and the response-format option can reduce token usage by 60-70% per response, depending on data complexity. The agent worked better, costs went down, and maintenance became simpler.&lt;/p&gt;

&lt;p&gt;Sometimes less really is more.&lt;/p&gt;
</description>
        <pubDate>Sat, 24 Jan 2026 00:00:00 +0000</pubDate>
        <link>https://blog.josephvelliah.com/ai-tool-optimization-guide-mcp-server-case-study</link>
        <guid isPermaLink="true">https://blog.josephvelliah.com/ai-tool-optimization-guide-mcp-server-case-study</guid>
        
        <category>AI Optimization</category>
        
        <category>MCP Server</category>
        
        <category>Token Costs</category>
        
        <category>AI Tools</category>
        
        <category>Cost Reduction</category>
        
        
      </item>
    
      <item>
        <title>Building a DevSecOps Pipeline on AWS (And You Can Too)</title>
        <description>&lt;p&gt;I have been working with CI/CD pipelines for a while now, and honestly, most of them just focus on getting code deployed fast. But what about security? That is usually an afterthought. So I decided to build something different—a platform where security checks happen automatically at every step.&lt;/p&gt;

&lt;h2 id=&quot;why-i-did-this&quot;&gt;Why I Did This&lt;/h2&gt;

&lt;p&gt;Look, pushing code fast is great until you realize you just deployed a vulnerability to production. I needed something that could:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Scan for security issues before deployment&lt;/li&gt;
  &lt;li&gt;Block builds that do not meet security standards&lt;/li&gt;
  &lt;li&gt;Keep an audit trail (because compliance audits are fun, right?)&lt;/li&gt;
  &lt;li&gt;Run without me babysitting it&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;what-is-inside&quot;&gt;What is Inside&lt;/h2&gt;

&lt;p&gt;I built this on AWS using EKS on Fargate. No EC2 instances to patch, which is nice. The whole thing runs on a custom VPC with multi-AZ setup for redundancy.&lt;/p&gt;

&lt;p&gt;Here is how it works:&lt;/p&gt;

&lt;p&gt;Every time code gets pushed, CodePipeline kicks off. The build stage runs security scans—SBOM generation (Syft), container vulnerability scanning (Trivy/Grype), SAST checks (Semgrep), secrets detection (detect-secrets), and OPA policy validation. If anything fails, the pipeline stops. No exceptions.&lt;/p&gt;
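&lt;p&gt;As a rough sketch, the gate commands in a CodeBuild buildspec might look like this; the image name, policy path, and severity thresholds are placeholders, and each tool’s nonzero exit stops the pipeline:&lt;/p&gt;

```yaml
# Sketch of the security-gate portion of a buildspec. Placeholders:
# myapp:latest, policies/, deploy.json. Tune thresholds to taste.
version: 0.2
phases:
  build:
    commands:
      - syft dir:. -o cyclonedx-json > sbom.json               # SBOM generation
      - trivy image --exit-code 1 --severity HIGH,CRITICAL myapp:latest
      - grype myapp:latest --fail-on high                      # second opinion on CVEs
      - semgrep scan --config auto --error                     # SAST, fails on findings
      - detect-secrets scan --all-files > .secrets.report      # secrets detection
      - opa eval --fail-defined -d policies/ -i deploy.json 'data.main.deny[x]'
```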

&lt;p&gt;I intentionally picked open-source tools for the security gates. This keeps costs down and makes the whole setup reproducible without vendor lock-in. You can swap them out for commercial alternatives if you want, but these work great.&lt;/p&gt;

&lt;p&gt;For access, I’m using Cognito for auth and WAF sits in front of the ALB to block sketchy traffic. CloudWatch alarms watch for anything weird—security events, performance drops, unexpected costs.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2026/01/aws-devsecops-pipeline-architecture.png&quot; alt=&quot;AWS DevSecOps Pipeline Architecture&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;what-i-learned&quot;&gt;What I Learned&lt;/h2&gt;

&lt;p&gt;The automated scans actually caught stuff I missed. SBOM generation showed me I had some old dependencies with known CVEs that I did not even know were there.&lt;/p&gt;

&lt;p&gt;Running on Fargate removed a lot of headaches. No patching EC2 instances, no worrying about the control plane. I just focus on securing my containers.&lt;/p&gt;

&lt;p&gt;OPA policies are great once you write them. They enforce the same rules on every deployment without me having to remember anything.&lt;/p&gt;

&lt;p&gt;Terraform makes this whole thing reproducible. I can destroy everything and rebuild it in 30 minutes flat. No clicking around in the console.&lt;/p&gt;

&lt;p&gt;One thing to note: some verification steps need manual commands (like checking EKS addons or testing WAF rules). I kept these manual instead of fully automating them because they are useful for learning. You get to see exactly what is happening at each step. Once you are comfortable, you can script them if you want.&lt;/p&gt;

&lt;h2 id=&quot;what-it-costs&quot;&gt;What It Costs&lt;/h2&gt;

&lt;p&gt;I tested this for a while then ran &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;terraform destroy&lt;/code&gt; to clean up. While it was running, costs were around $200-300/month. That is mostly the EKS control plane, Fargate pods, ALB, and NAT gateways. Not cheap for a demo, but reasonable for a production workload with this much security built in.&lt;/p&gt;

&lt;h2 id=&quot;check-out-the-code&quot;&gt;Check Out the Code&lt;/h2&gt;

&lt;p&gt;I put everything on GitHub: &lt;a href=&quot;https://github.com/sprider/aws-devsecops-demo&quot;&gt;https://github.com/sprider/aws-devsecops-demo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The repo has the full deployment guide, architecture diagrams, security configs, and screenshots from when I deployed it. I masked all the sensitive stuff so you can clone it and try it yourself.&lt;/p&gt;

&lt;h2 id=&quot;who-is-this-for&quot;&gt;Who is This For&lt;/h2&gt;

&lt;p&gt;This is not a perfect production-ready solution. There are things I would do differently for a real enterprise setup. But if you are trying to understand how to build a secure CI/CD pipeline or want a reference implementation to learn from, this is a solid starting point.&lt;/p&gt;

&lt;p&gt;It is useful if you are:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Learning AWS security patterns&lt;/li&gt;
  &lt;li&gt;Building a reference pipeline for your team&lt;/li&gt;
  &lt;li&gt;Setting up security automation&lt;/li&gt;
  &lt;li&gt;Prepping for SOC2 or ISO 27001 audits&lt;/li&gt;
  &lt;li&gt;Understanding how security gates fit together&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;what-you-could-add&quot;&gt;What You Could Add&lt;/h2&gt;

&lt;p&gt;If you want to extend this setup, here are some ideas worth exploring:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Multi-region setup for DR&lt;/li&gt;
  &lt;li&gt;GitOps with ArgoCD&lt;/li&gt;
  &lt;li&gt;GuardDuty integration&lt;/li&gt;
  &lt;li&gt;Spot instances to cut costs&lt;/li&gt;
  &lt;li&gt;Runtime security monitoring with Falco&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Clone it, break it, improve it. That is how you learn.&lt;/p&gt;
</description>
        <pubDate>Sun, 04 Jan 2026 00:00:00 +0000</pubDate>
        <link>https://blog.josephvelliah.com/aws-devsecops-pipeline</link>
        <guid isPermaLink="true">https://blog.josephvelliah.com/aws-devsecops-pipeline</guid>
        
        <category>AWS</category>
        
        <category>DevSecOps</category>
        
        <category>Kubernetes</category>
        
        <category>EKS</category>
        
        <category>Terraform</category>
        
        <category>CloudSecurity</category>
        
        <category>CICD</category>
        
        
      </item>
    
      <item>
        <title>AWS DevOps Agent: AI-Powered Incident Investigation in Seconds</title>
        <description>&lt;p&gt;Stop spending 30 minutes investigating incidents. Let AI do it in seconds. Here is a hands-on demo you can practice in 15 minutes.&lt;/p&gt;

&lt;h2 id=&quot;the-problem&quot;&gt;The Problem&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;3 AM. Production is down. You are doing this:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Open CloudWatch → Check metrics&lt;/li&gt;
  &lt;li&gt;Open Datadog → Review traces&lt;/li&gt;
  &lt;li&gt;Open Splunk → Search logs&lt;/li&gt;
  &lt;li&gt;Check GitHub → Find recent deployments&lt;/li&gt;
  &lt;li&gt;Correlate everything manually → Find root cause&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Time:&lt;/strong&gt; 20-40 minutes of context switching and log correlation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What if AI could do all of this in seconds?&lt;/strong&gt;&lt;/p&gt;

&lt;h2 id=&quot;the-solution-aws-devops-agent&quot;&gt;The Solution: AWS DevOps Agent&lt;/h2&gt;

&lt;p&gt;Announced at &lt;strong&gt;AWS re:Invent 2025&lt;/strong&gt;, AWS DevOps Agent is an AI service that automatically investigates incidents by:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Analyzing logs, metrics, and traces across multiple tools&lt;/li&gt;
  &lt;li&gt;Mapping infrastructure dependencies automatically&lt;/li&gt;
  &lt;li&gt;Recommending fixes to prevent future incidents&lt;/li&gt;
  &lt;li&gt;Integrating with your existing DevOps stack&lt;/li&gt;
&lt;/ul&gt;

&lt;table&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Status:&lt;/strong&gt; Public preview (us-east-1)&lt;/td&gt;
      &lt;td&gt;Free during preview&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h2 id=&quot;who-should-use-this&quot;&gt;Who Should Use This?&lt;/h2&gt;

&lt;h3 id=&quot;perfect-for&quot;&gt;Perfect For&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;On-call engineers&lt;/strong&gt; who spend hours investigating incidents&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;SREs&lt;/strong&gt; managing complex distributed systems&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Platform teams&lt;/strong&gt; running multi-account AWS environments&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;DevOps engineers&lt;/strong&gt; correlating deployments with failures&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;skip-if&quot;&gt;Skip If&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Simple applications with clear failure modes&lt;/li&gt;
  &lt;li&gt;Rarely experience incidents&lt;/li&gt;
  &lt;li&gt;Not heavily using AWS services&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;my-test-real-results&quot;&gt;My Test: Real Results&lt;/h2&gt;

&lt;p&gt;I deployed a Lambda function with an intentional error and let the AI investigate.&lt;/p&gt;

&lt;h3 id=&quot;setup&quot;&gt;Setup&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Lambda function with division-by-zero error&lt;/li&gt;
  &lt;li&gt;CloudWatch alarm monitoring failures&lt;/li&gt;
  &lt;li&gt;3 error-generating invocations&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;results&quot;&gt;Results&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What the AI found in seconds:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Lambda function contains intentional test code that throws ZeroDivisionError at line 9 in lambda_test.py with the literal expression ‘result = 1 / 0’. This is not a production bug but an expected test behavior.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What impressed me:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Context-aware&lt;/strong&gt;: Understood it was test code, not a bug&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Complete timeline&lt;/strong&gt;: Linked deployment time to first error&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Exact location&lt;/strong&gt;: Found the error on line 9&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Impact analysis&lt;/strong&gt;: Calculated 100% failure rate&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Fast&lt;/strong&gt;: AI analysis in seconds, about 4 minutes for the whole demo&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;before-vs-after&quot;&gt;Before vs After&lt;/h3&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Task&lt;/th&gt;
      &lt;th&gt;Manual&lt;/th&gt;
      &lt;th&gt;AI Agent&lt;/th&gt;
      &lt;th&gt;Savings&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Check metrics&lt;/td&gt;
      &lt;td&gt;2-3 min&lt;/td&gt;
      &lt;td&gt;Auto&lt;/td&gt;
      &lt;td&gt;100%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Review logs&lt;/td&gt;
      &lt;td&gt;3-5 min&lt;/td&gt;
      &lt;td&gt;Auto&lt;/td&gt;
      &lt;td&gt;100%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Check deployments&lt;/td&gt;
      &lt;td&gt;5-10 min&lt;/td&gt;
      &lt;td&gt;Auto&lt;/td&gt;
      &lt;td&gt;100%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Correlate timeline&lt;/td&gt;
      &lt;td&gt;5-10 min&lt;/td&gt;
      &lt;td&gt;Auto&lt;/td&gt;
      &lt;td&gt;100%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Root cause&lt;/td&gt;
      &lt;td&gt;5-10 min&lt;/td&gt;
      &lt;td&gt;Seconds&lt;/td&gt;
      &lt;td&gt;90%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;20-40 min&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;~4 min&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;80-90%&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h2 id=&quot;three-core-features&quot;&gt;Three Core Features&lt;/h2&gt;

&lt;h3 id=&quot;1-ai-investigation&quot;&gt;1. AI Investigation&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Auto-triggers from:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;ServiceNow tickets&lt;/li&gt;
  &lt;li&gt;PagerDuty alerts&lt;/li&gt;
  &lt;li&gt;Datadog/Dynatrace/Splunk webhooks&lt;/li&gt;
  &lt;li&gt;Slack commands&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What it analyzes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;CloudWatch metrics, logs, alarms&lt;/li&gt;
  &lt;li&gt;Third-party observability data&lt;/li&gt;
  &lt;li&gt;Deployment history from GitHub/GitLab&lt;/li&gt;
  &lt;li&gt;Infrastructure topology&lt;/li&gt;
  &lt;li&gt;Historical incident patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Delivers:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Root cause with reasoning&lt;/li&gt;
  &lt;li&gt;Event timeline&lt;/li&gt;
  &lt;li&gt;Blast radius analysis&lt;/li&gt;
  &lt;li&gt;Mitigation steps&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;2-topology-discovery&quot;&gt;2. Topology Discovery&lt;/h3&gt;

&lt;p&gt;Automatically maps your AWS infrastructure:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Resources across all accounts&lt;/li&gt;
  &lt;li&gt;Service dependencies&lt;/li&gt;
  &lt;li&gt;Links to source code&lt;/li&gt;
  &lt;li&gt;Deployment history&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use it to:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Understand blast radius during incidents&lt;/li&gt;
  &lt;li&gt;See cascading failure patterns&lt;/li&gt;
  &lt;li&gt;Assess change impact&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;3-incident-prevention&quot;&gt;3. Incident Prevention&lt;/h3&gt;

&lt;p&gt;After analyzing multiple incidents, the AI recommends:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Observability&lt;/strong&gt;: “Add alarm for Lambda cold starts”&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Testing&lt;/strong&gt;: “Add load testing to pipeline”&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Code&lt;/strong&gt;: “Implement retry logic for API calls”&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Infrastructure&lt;/strong&gt;: “Enable Multi-AZ for RDS”&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;integrations&quot;&gt;Integrations&lt;/h2&gt;

&lt;p&gt;Works with your existing tools:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability:&lt;/strong&gt; CloudWatch • Datadog • Dynatrace • New Relic • Splunk&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CI/CD:&lt;/strong&gt; GitHub • GitLab&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ticketing:&lt;/strong&gt; ServiceNow • PagerDuty&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chat:&lt;/strong&gt; Slack&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kubernetes:&lt;/strong&gt; Amazon EKS&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Custom:&lt;/strong&gt; MCP servers for proprietary tools&lt;/p&gt;

&lt;h2 id=&quot;try-it-15-minute-demo&quot;&gt;Try It: 15-Minute Demo&lt;/h2&gt;

&lt;p&gt;A hands-on demo using Terraform for infrastructure and manual Agent Space setup through the AWS Console.&lt;/p&gt;

&lt;h3 id=&quot;prerequisites&quot;&gt;Prerequisites&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;AWS account with admin access&lt;/li&gt;
  &lt;li&gt;AWS CLI v2 + Terraform installed&lt;/li&gt;
  &lt;li&gt;Region: us-east-1&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;quick-start&quot;&gt;Quick Start&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Clone &amp;amp; Deploy Infrastructure&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone https://github.com/sprider/aws-devops-agent-demo.git
&lt;span class=&quot;nb&quot;&gt;cd &lt;/span&gt;aws-devops-agent-demo
&lt;span class=&quot;nb&quot;&gt;chmod&lt;/span&gt; +x lambda-test.sh
./lambda-test.sh deploy
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This automatically creates:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Lambda function with intentional error&lt;/li&gt;
  &lt;li&gt;CloudWatch alarm&lt;/li&gt;
&lt;/ul&gt;
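&lt;p&gt;The handler itself is tiny. Based on the investigation output shown earlier (ZeroDivisionError at line 9 of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lambda_test.py&lt;/code&gt;), it is presumably something like:&lt;/p&gt;

```python
def lambda_handler(event, context):
    # Intentional failure: every invocation raises ZeroDivisionError,
    # driving the Errors metric that the CloudWatch alarm watches.
    result = 1 / 0
    return {"statusCode": 200, "body": str(result)}
```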

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/12/01-terraform-deploy.png&quot; alt=&quot;Terraform Deploy&quot; /&gt;
&lt;img src=&quot;/assets/images/posts/2025/12/02-terraform-output.png&quot; alt=&quot;Terraform Output&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Create Agent Space (Manual - AWS Console)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Agent Space must be created through the AWS Console to ensure proper Primary source configuration.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Open the &lt;a href=&quot;https://console.aws.amazon.com/aidevops/home?region=us-east-1&quot;&gt;AWS DevOps Agent Console&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Click &lt;strong&gt;“Begin setup”&lt;/strong&gt; or &lt;strong&gt;“Create Agent Space”&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;Configure:
    &lt;ul&gt;
      &lt;li&gt;&lt;strong&gt;Name&lt;/strong&gt;: TestAgentSpace (or your preferred name)&lt;/li&gt;
      &lt;li&gt;&lt;strong&gt;Description&lt;/strong&gt;: Test Agent Space for Lambda error investigation demo&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Click &lt;strong&gt;“Create”&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/12/03-devops-agent-console.png&quot; alt=&quot;DevOps Agent Console&quot; /&gt;
&lt;img src=&quot;/assets/images/posts/2025/12/04-create-agent-space.png&quot; alt=&quot;Create Agent Space&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Configure Cloud Capabilities (Primary Source)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After Agent Space creation, configure AWS account access:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;In your Agent Space, go to &lt;strong&gt;“Settings”&lt;/strong&gt; → &lt;strong&gt;“Cloud capabilities”&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;Click &lt;strong&gt;“Add cloud capability”&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;Select &lt;strong&gt;“AWS”&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;Choose &lt;strong&gt;“Primary source”&lt;/strong&gt; (not Secondary)&lt;/li&gt;
  &lt;li&gt;Configuration:
    &lt;ul&gt;
      &lt;li&gt;&lt;strong&gt;Account ID&lt;/strong&gt;: Your AWS account (from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;terraform output aws_account_id&lt;/code&gt;)&lt;/li&gt;
      &lt;li&gt;&lt;strong&gt;IAM Role&lt;/strong&gt;: Use &lt;strong&gt;“Auto-create role”&lt;/strong&gt; option&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Click &lt;strong&gt;“Add”&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/12/05-cloud-capabilities.png&quot; alt=&quot;Cloud Capabilities&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The IAM roles required for the DevOps Agent are automatically created by AWS when you select “Auto-create role” - you do not need to create them manually. The Primary source configuration ensures the agent can properly access CloudWatch alarms, Lambda logs, and other AWS resources needed for investigations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Generate Lambda Errors&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;./lambda-test.sh &lt;span class=&quot;nb&quot;&gt;test&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/12/06-lambda-errors-generated.png&quot; alt=&quot;Lambda Errors Generated&quot; /&gt;&lt;/p&gt;
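&lt;p&gt;The repo’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lambda_test.py&lt;/code&gt; fails by dividing by zero. The exact file contents may differ; this is only a minimal sketch of such a handler:&lt;/p&gt;

```python
# Minimal sketch of an intentionally failing Lambda handler.
# The real lambda_test.py in the repo may differ; the point is that
# every invocation raises ZeroDivisionError, which CloudWatch records
# as an error data point for the alarm to evaluate.

def lambda_handler(event, context):
    # Intentional bug: "records" defaults to 0, so this always divides by zero
    records = event.get("records", 0)
    return 100 / records
```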

&lt;p&gt;&lt;strong&gt;5. Wait for Alarm to Trigger&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After generating errors, wait 1-2 minutes for the CloudWatch alarm to evaluate and enter ALARM state:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;./lambda-test.sh status
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Wait until you see &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AlarmState: ALARM&lt;/code&gt; before proceeding to the next step.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/12/07-cloudwatch-alarm-triggered.png&quot; alt=&quot;CloudWatch Alarm Triggered&quot; /&gt;&lt;/p&gt;
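&lt;p&gt;If you would rather script the wait than re-run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;status&lt;/code&gt; by hand, a small polling loop over the CloudWatch &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;describe_alarms&lt;/code&gt; API does the job. A minimal sketch (the helper name and retry policy are my own; it accepts any boto3-style client object):&lt;/p&gt;

```python
import time

def wait_for_alarm(cw, alarm_name, timeout=180, interval=10):
    """Poll until the named CloudWatch alarm reaches ALARM state.

    cw is any object with a boto3-style describe_alarms method,
    e.g. boto3.client("cloudwatch"). Returns True once the alarm
    fires, False if all attempts are exhausted first.
    """
    attempts = max(1, timeout // max(interval, 1))
    for _ in range(attempts):
        resp = cw.describe_alarms(AlarmNames=[alarm_name])
        alarms = resp.get("MetricAlarms", [])
        if alarms and alarms[0].get("StateValue") == "ALARM":
            return True
        time.sleep(interval)
    return False
```

&lt;p&gt;With real credentials this would be called as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;wait_for_alarm(boto3.client(&quot;cloudwatch&quot;), &quot;AWS-AIDevOps-Lambda-Error-Test&quot;)&lt;/code&gt;.&lt;/p&gt;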

&lt;p&gt;&lt;strong&gt;6. Start Investigation&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;In the AWS DevOps Agent Console, click on your Agent Space name (e.g., &lt;strong&gt;“TestAgentSpace”&lt;/strong&gt;)&lt;/li&gt;
  &lt;li&gt;Click the &lt;strong&gt;“Incident Response”&lt;/strong&gt; tab&lt;/li&gt;
  &lt;li&gt;In the “Start an investigation” text box, type: &lt;strong&gt;Lambda function throwing errors&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;Click &lt;strong&gt;“Start investigation”&lt;/strong&gt; button&lt;/li&gt;
  &lt;li&gt;A modal will appear - fill in the investigation details:
    &lt;ul&gt;
      &lt;li&gt;&lt;strong&gt;Investigation details&lt;/strong&gt;: Keep “Lambda function throwing errors”&lt;/li&gt;
      &lt;li&gt;&lt;strong&gt;Investigation starting point&lt;/strong&gt;: CloudWatch alarm AWS-AIDevOps-Lambda-Error-Test&lt;/li&gt;
      &lt;li&gt;&lt;strong&gt;Date and time of incident&lt;/strong&gt;: Get current time with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;date -u +&quot;%Y-%m-%dT%H:%M:%SZ&quot;&lt;/code&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Click &lt;strong&gt;“Start investigating…”&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/12/08-incident-response-dashboard.png&quot; alt=&quot;Start Investigation&quot; /&gt;
&lt;img src=&quot;/assets/images/posts/2025/12/10-investigation-details-modal.png&quot; alt=&quot;Investigation Details Modal&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7. Watch AI Work&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Watch the investigation in real-time. The AI will:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Detect the alarm&lt;/li&gt;
  &lt;li&gt;Pull Lambda logs&lt;/li&gt;
  &lt;li&gt;Identify ZeroDivisionError&lt;/li&gt;
  &lt;li&gt;Correlate deployment time&lt;/li&gt;
  &lt;li&gt;Provide root cause&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/12/11-investigation-in-progress.png&quot; alt=&quot;Investigation In Progress&quot; /&gt;
&lt;img src=&quot;/assets/images/posts/2025/12/12-investigation-completed.png&quot; alt=&quot;Investigation Completed&quot; /&gt;
&lt;img src=&quot;/assets/images/posts/2025/12/13-investigation-summary.png&quot; alt=&quot;Investigation Summary&quot; /&gt;
&lt;img src=&quot;/assets/images/posts/2025/12/14-mitigation-plan.png&quot; alt=&quot;Mitigation Plan&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Investigation time: a matter of seconds.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;8. Cleanup Everything&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; Manually delete the Agent Space and the auto-created resources from the AWS Console first; only then run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;./lambda-test.sh destroy&lt;/code&gt; to tear down the Terraform-managed infrastructure.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Delete Agent Space:&lt;/strong&gt;
    &lt;ul&gt;
      &lt;li&gt;Go to AWS DevOps Agent Console&lt;/li&gt;
      &lt;li&gt;Select your Agent Space&lt;/li&gt;
      &lt;li&gt;Click &lt;strong&gt;“Actions”&lt;/strong&gt; → &lt;strong&gt;“Delete Agent Space”&lt;/strong&gt;&lt;/li&gt;
      &lt;li&gt;Confirm deletion&lt;/li&gt;
      &lt;li&gt;Note: This automatically removes the IAM roles created by the Agent Space&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Delete Lambda Log Group:&lt;/strong&gt;
    &lt;ul&gt;
      &lt;li&gt;Go to CloudWatch Console → Log groups&lt;/li&gt;
      &lt;li&gt;Find &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/aws/lambda/AWS-AIDevOps-test-lambda&lt;/code&gt;&lt;/li&gt;
      &lt;li&gt;Select it and click &lt;strong&gt;“Actions”&lt;/strong&gt; → &lt;strong&gt;“Delete log group(s)”&lt;/strong&gt;&lt;/li&gt;
      &lt;li&gt;Confirm deletion&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Verify IAM Roles Cleanup (Optional):&lt;/strong&gt;
    &lt;ul&gt;
      &lt;li&gt;Go to IAM Console → Roles&lt;/li&gt;
      &lt;li&gt;Search for roles created by the Agent Space (they usually have “DevOpsAgent” or “AIDevOps” in the name)&lt;/li&gt;
      &lt;li&gt;These should be automatically deleted when the Agent Space is deleted&lt;/li&gt;
      &lt;li&gt;If any remain, manually delete them&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Then run: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;./lambda-test.sh destroy&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/12/15-terraform-destroy.png&quot; alt=&quot;Terraform Destroy&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;all-available-commands&quot;&gt;All Available Commands&lt;/h3&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;./lambda-test.sh deploy    &lt;span class=&quot;c&quot;&gt;# Deploy Lambda and CloudWatch alarm&lt;/span&gt;
./lambda-test.sh &lt;span class=&quot;nb&quot;&gt;test&lt;/span&gt;      &lt;span class=&quot;c&quot;&gt;# Generate Lambda errors (invoke 3 times)&lt;/span&gt;
./lambda-test.sh status    &lt;span class=&quot;c&quot;&gt;# Check CloudWatch alarm status&lt;/span&gt;
./lambda-test.sh logs      &lt;span class=&quot;c&quot;&gt;# View Lambda function logs&lt;/span&gt;
./lambda-test.sh destroy   &lt;span class=&quot;c&quot;&gt;# Destroy all infrastructure&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;cost&quot;&gt;Cost&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;$0.00&lt;/strong&gt; - Everything covered by AWS Free Tier&lt;/p&gt;

&lt;h2 id=&quot;troubleshooting&quot;&gt;Troubleshooting&lt;/h2&gt;

&lt;h3 id=&quot;issue-aws-account-is-not-accessible-or-monitor-association-not-found&quot;&gt;Issue: “AWS account is not accessible” or “Monitor Association not found”&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error message in investigation:&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-txt&quot;&gt;Unable to investigate the Lambda function errors because AWS account XXX
is not accessible. The error &apos;Monitor Association with AgentSpace agentSpaceId
XXX not found&apos; indicates this account is not associated with the monitoring system.
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; Your AWS account is not configured as a Primary source in Cloud Capabilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Open your Agent Space in AWS Console&lt;/li&gt;
  &lt;li&gt;Go to &lt;strong&gt;Settings&lt;/strong&gt; → &lt;strong&gt;Cloud capabilities&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;Check if your AWS account is listed under “Primary sources”&lt;/li&gt;
  &lt;li&gt;If not listed or listed under “Secondary sources”:
    &lt;ul&gt;
      &lt;li&gt;Click &lt;strong&gt;“Add cloud capability”&lt;/strong&gt;&lt;/li&gt;
      &lt;li&gt;Select &lt;strong&gt;“AWS”&lt;/strong&gt;&lt;/li&gt;
      &lt;li&gt;&lt;strong&gt;CRITICAL:&lt;/strong&gt; Choose &lt;strong&gt;“Primary source”&lt;/strong&gt; (NOT Secondary)&lt;/li&gt;
      &lt;li&gt;Enter your AWS account ID (from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;terraform output aws_account_id&lt;/code&gt;)&lt;/li&gt;
      &lt;li&gt;Use &lt;strong&gt;“Auto-create role”&lt;/strong&gt; option&lt;/li&gt;
      &lt;li&gt;Click &lt;strong&gt;“Add”&lt;/strong&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Verify your account now appears under “Primary sources”&lt;/li&gt;
  &lt;li&gt;Try the investigation again&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt; Only Primary sources give the AI agent full access to CloudWatch alarms, Lambda logs, and other AWS resources needed for investigations.&lt;/p&gt;

&lt;h2 id=&quot;key-facts&quot;&gt;Key Facts&lt;/h2&gt;

&lt;h3 id=&quot;what-it-is&quot;&gt;What It Is&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;AI layer that connects your existing tools&lt;/li&gt;
  &lt;li&gt;Not a monitoring tool replacement&lt;/li&gt;
  &lt;li&gt;Reduces investigation time by 80-90%&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;limitations-preview&quot;&gt;Limitations (Preview)&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Region:&lt;/strong&gt; us-east-1 only&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Quotas:&lt;/strong&gt; 20 investigation hours/month, 10 prevention hours/month&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Pricing:&lt;/strong&gt; Free now, pricing TBD at GA&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;security&quot;&gt;Security&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Read-only permissions by default&lt;/li&gt;
  &lt;li&gt;IAM-based access control&lt;/li&gt;
  &lt;li&gt;Agent Space isolation&lt;/li&gt;
  &lt;li&gt;AWS IAM Identity Center support&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;common-questions&quot;&gt;Common Questions&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q: Does it replace my observability tools?&lt;/strong&gt;
A: No. It sits on top of them, connecting data across tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: What if the AI is wrong?&lt;/strong&gt;
A: You are in control. Ask follow-up questions, steer investigations, or escalate to AWS Support.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: How secure is it?&lt;/strong&gt;
A: Very. Read-only by default, IAM-controlled, data stays in your account.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Works with non-AWS tools?&lt;/strong&gt;
A: Yes. Integrates with Datadog, Dynatrace, New Relic, Splunk, GitHub, GitLab, ServiceNow, Slack.&lt;/p&gt;

&lt;h2 id=&quot;next-steps&quot;&gt;Next Steps&lt;/h2&gt;

&lt;p&gt;After testing:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Connect production&lt;/strong&gt; - Create Agent Space for real environment&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Enable auto-triggers&lt;/strong&gt; - Set up ServiceNow/PagerDuty webhooks&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Review recommendations&lt;/strong&gt; - Implement prevention suggestions&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Expand scope&lt;/strong&gt; - Connect multiple AWS accounts&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;files-in-this-repo&quot;&gt;Files in This Repo&lt;/h2&gt;

&lt;pre&gt;&lt;code class=&quot;language-txt&quot;&gt;aws-devops-agent-demo/
├── README.md                 # This guide
├── lambda-test.tf            # Terraform: Lambda and CloudWatch alarm
├── lambda_test.py            # Test Lambda function (division by zero)
├── lambda-test.sh            # Automation script for deployment
├── .gitignore                # Git ignore file
└── screenshots/              # Step-by-step screenshots of the demo
    ├── 01-terraform-deploy.png
    ├── 02-terraform-output.png
    ├── 03-devops-agent-console.png
    ├── 04-create-agent-space.png
    ├── 05-cloud-capabilities.png
    ├── 06-lambda-errors-generated.png
    ├── 07-cloudwatch-alarm-triggered.png
    ├── 08-incident-response-dashboard.png
    ├── 10-investigation-details-modal.png
    ├── 11-investigation-in-progress.png
    ├── 12-investigation-completed.png
    ├── 13-investigation-summary.png
    ├── 14-mitigation-plan.png
    └── 15-terraform-destroy.png
&lt;/code&gt;&lt;/pre&gt;

&lt;h3 id=&quot;what-is-automated-vs-manual&quot;&gt;What is Automated vs Manual?&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Automated via Terraform:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Lambda function with intentional error&lt;/li&gt;
  &lt;li&gt;CloudWatch alarm monitoring&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Manual via AWS Console:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Agent Space creation&lt;/li&gt;
  &lt;li&gt;Cloud Capabilities configuration (Primary source setup + IAM role auto-creation)&lt;/li&gt;
  &lt;li&gt;Agent Space deletion (which automatically removes auto-created IAM roles)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why Manual?&lt;/strong&gt; The Agent Space requires Primary source configuration through the console to ensure the AI agent can properly access AWS resources during investigations. The AWS CLI cannot currently configure this correctly. When you delete the Agent Space, AWS automatically cleans up the auto-created IAM roles.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;About This Article:&lt;/strong&gt; This article and the accompanying automation scripts were developed with assistance from Claude Code (Anthropic). All code has been tested in my personal AWS environment and verified against the official AWS DevOps Agent User Guide.&lt;/p&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://docs.aws.amazon.com/devops-agent/&quot;&gt;AWS DevOps Agent User Guide&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;lambda-test.tf&quot;&gt;Terraform Configuration&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;lambda_test.py&quot;&gt;Lambda Test Function&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        <pubDate>Fri, 05 Dec 2025 00:00:00 +0000</pubDate>
        <link>https://blog.josephvelliah.com/aws-devops-agent-preview</link>
        <guid isPermaLink="true">https://blog.josephvelliah.com/aws-devops-agent-preview</guid>
        
        <category>AWS</category>
        
        <category>DevOps</category>
        
        <category>AI</category>
        
        <category>Lambda</category>
        
        <category>CloudWatch</category>
        
        <category>Incident-Response</category>
        
        <category>SRE</category>
        
        
      </item>
    
      <item>
        <title>DynamoDB Just Made Your Life Easier: Multi-Attribute Composite Keys Explained</title>
        <description>&lt;p&gt;AWS just dropped a feature on November 19, 2025 that is going to save you from one of DynamoDB’s most annoying workarounds: &lt;strong&gt;multi-attribute composite keys for Global Secondary Indexes (GSIs)&lt;/strong&gt;. Let me show you why this matters with a real-world example.&lt;/p&gt;

&lt;h2 id=&quot;important-this-is-a-gsi-feature&quot;&gt;Important: This is a GSI Feature&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Critical Clarification:&lt;/strong&gt; This new capability applies to &lt;strong&gt;Global Secondary Indexes (GSIs) only&lt;/strong&gt; - NOT to your base table’s primary key. Your base table still uses the traditional structure of a single partition key + optional single sort key. However, when you create GSIs on your table, you can now use up to 4 partition key attributes and 4 sort key attributes!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/11/dynamodb-multi-attribute-2.png&quot; alt=&quot;dynamodb-multi-attribute-2&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;the-scenario-e-commerce-order-tracking&quot;&gt;The Scenario: E-Commerce Order Tracking&lt;/h2&gt;

&lt;p&gt;Imagine you are building an order management system. You need to query orders by:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Customer ID&lt;/strong&gt; + &lt;strong&gt;Order Date&lt;/strong&gt; + &lt;strong&gt;Order Status&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Seems simple, right? Wrong. Until now, this was a pain.&lt;/p&gt;

&lt;h2 id=&quot;the-old-way-aka-the-painful-way&quot;&gt;The Old Way (aka The Painful Way)&lt;/h2&gt;

&lt;p&gt;Before this update, you had two bad options:&lt;/p&gt;

&lt;h3 id=&quot;option-1-concatenate-fields-yuck&quot;&gt;Option 1: Concatenate Fields (Yuck!)&lt;/h3&gt;

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;GSI Configuration:
Partition Key: customer_id
Sort Key: date_status &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;concatenated: &lt;span class=&quot;s2&quot;&gt;&quot;2024-11-24_SHIPPED&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This meant:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Extra fields cluttering your table&lt;/li&gt;
  &lt;li&gt;String manipulation everywhere in your code&lt;/li&gt;
  &lt;li&gt;Maintenance nightmares when requirements change&lt;/li&gt;
  &lt;li&gt;Code that looks like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sortKey = `${date}_${status}`&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
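&lt;p&gt;To see why the concatenation hack is brittle, here is a tiny sketch (hypothetical helper, not part of any SDK) of the encode/decode dance it forces on you:&lt;/p&gt;

```python
def make_sort_key(order_date, status):
    # The classic workaround: pack two attributes into one string
    return order_date + "_" + status

# Works for well-behaved values...
assert make_sort_key("2024-11-24", "SHIPPED") == "2024-11-24_SHIPPED"

# ...but a naive decode mangles any value containing the delimiter
parts = make_sort_key("2024-11-24", "ON_HOLD").split("_")
print(parts)  # ['2024-11-24', 'ON', 'HOLD'] -- the status is corrupted
```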

&lt;h3 id=&quot;option-2-full-table-scan-nope&quot;&gt;Option 2: Full Table Scan (Nope!)&lt;/h3&gt;

&lt;p&gt;Just scan the entire table filtering by all three fields. Slow, expensive, and scales terribly.&lt;/p&gt;

&lt;h2 id=&quot;the-new-way-hello-beautiful&quot;&gt;The New Way (Hello, Beautiful!)&lt;/h2&gt;

&lt;p&gt;Now you can do this with your GSI:&lt;/p&gt;

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Global Secondary Index Configuration:
  Partition Key: customer_id
  Sort Key 1: order_date
  Sort Key 2: order_status
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Up to 4 partition key fields and 4 sort key fields per GSI!&lt;/strong&gt;&lt;/p&gt;

&lt;h2 id=&quot;what-this-means-for-you&quot;&gt;What This Means for You&lt;/h2&gt;

&lt;div class=&quot;language-javascript highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;// Before: String concatenation madness&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;sortKey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;orderDate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;status&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// After: Clean, intuitive queries on your GSI&lt;/span&gt;
&lt;span class=&quot;nx&quot;&gt;queryParams&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;IndexName&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;dl&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;CustomerOrdersIndex&lt;/span&gt;&lt;span class=&quot;dl&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;partitionKey&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;customerId&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;sortKey1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;orderDate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;sortKey2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nx&quot;&gt;status&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;real-benefits&quot;&gt;Real Benefits:&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Cleaner Code&lt;/strong&gt;: No more string concatenation hacks&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Better Performance&lt;/strong&gt;: Query (not scan) with multiple attributes on GSIs&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Easier Maintenance&lt;/strong&gt;: Add/remove query patterns without refactoring&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Native Support&lt;/strong&gt;: Let DynamoDB handle the complexity&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Native Data Types&lt;/strong&gt;: Keep numbers as numbers, dates as dates - no string conversion needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/11/dynamodb-multi-attribute-1.png&quot; alt=&quot;dynamodb-multi-attribute-1&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;quick-example&quot;&gt;Quick Example&lt;/h2&gt;

&lt;p&gt;Let us say you need to find all orders for customer “C123” placed on “2024-11-24” with status “PENDING”:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;# Had to concatenate
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sort_key&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sa&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;2024-11-24_PENDING&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;response&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;table&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;IndexName&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;CustomerOrdersIndex&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;KeyConditionExpression&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;customer_id = :cid AND date_status = :ds&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ExpressionAttributeValues&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;:cid&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;C123&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;:ds&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sort_key&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;# Clean and intuitive with multi-attribute GSI
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;response&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;table&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;IndexName&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;CustomerOrdersIndex&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;KeyConditionExpression&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;customer_id = :cid AND order_date = :date AND order_status = :status&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ExpressionAttributeValues&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;:cid&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;C123&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;:date&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;2024-11-24&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;:status&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;PENDING&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;pro-tips&quot;&gt;Pro Tips&lt;/h2&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Do Not Go Crazy&lt;/strong&gt;: Just because you CAN add 8 fields does not mean you SHOULD. GSIs consume additional storage and throughput.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Watch Your Capacity&lt;/strong&gt;: Each GSI needs its own read/write capacity units. Plan accordingly.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Eventually Consistent&lt;/strong&gt;: Remember, GSIs are eventually consistent. The more fields, the longer it might take to sync.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Query Pattern Rules&lt;/strong&gt;:&lt;/p&gt;

    &lt;ul&gt;
      &lt;li&gt;All partition key attributes must use equality (=) conditions&lt;/li&gt;
      &lt;li&gt;Range conditions (&amp;lt;, &amp;gt;, BETWEEN) only work on the &lt;strong&gt;last&lt;/strong&gt; sort key attribute&lt;/li&gt;
      &lt;li&gt;You cannot skip sort keys - use them left-to-right (SK1, or SK1+SK2, or SK1+SK2+SK3)&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Base Table vs GSI&lt;/strong&gt;: Your base table’s primary key structure has not changed - this feature is exclusively for GSIs to give you more flexible query patterns!&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;
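&lt;p&gt;The left-to-right rule in tip 4 can be captured as a quick sanity check. This is illustrative only - DynamoDB enforces the rule server-side, the function below is not part of any SDK, and the third sort key &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;region&lt;/code&gt; is a made-up example:&lt;/p&gt;

```python
def validate_key_condition(sort_keys, eq_keys, range_key=None):
    # Equality conditions must form a left-to-right prefix of the
    # declared sort keys (SK1, SK1+SK2, ...); skipping is not allowed.
    if list(eq_keys) != list(sort_keys)[:len(eq_keys)]:
        return False
    if range_key is None:
        return True
    # A range condition (BETWEEN, comparisons) may only target the
    # sort key immediately after the equality prefix, i.e. the last
    # key the query actually uses.
    remaining = list(sort_keys)[len(eq_keys):]
    return bool(remaining) and range_key == remaining[0]

SKS = ["order_date", "order_status", "region"]
assert validate_key_condition(SKS, ["order_date"], "order_status")   # valid
assert not validate_key_condition(SKS, ["order_status"])             # skipped SK1
assert not validate_key_condition(SKS, ["order_date"], "region")     # range skips SK2
```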

&lt;h2 id=&quot;the-bottom-line&quot;&gt;The Bottom Line&lt;/h2&gt;

&lt;p&gt;This is one of those updates that makes you wonder, “How did we live without this?” If you have been dealing with concatenated fields or complex workarounds in your GSIs, it is time to refactor and simplify.&lt;/p&gt;

&lt;p&gt;Your future self (and your teammates) will thank you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ready to try it?&lt;/strong&gt; Head to your DynamoDB console and create a GSI with multiple attributes. It is available now in all AWS regions at no additional cost beyond standard GSI pricing!&lt;/p&gt;
</description>
        <pubDate>Tue, 25 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://blog.josephvelliah.com/dynamodb-multi-attribute-composite-keys</link>
        <guid isPermaLink="true">https://blog.josephvelliah.com/dynamodb-multi-attribute-composite-keys</guid>
        
        <category>AWS</category>
        
        <category>DynamoDB</category>
        
        
      </item>
    
      <item>
        <title>How Putting CDN After Your Reverse Proxy Creates a Single Point of Failure</title>
        <description>&lt;p&gt;This is a universal architecture anti-pattern that affects teams across all cloud providers and technology stacks. Whether you are using nginx, HAProxy, Envoy or cloud load balancers, the problem is the same: placing a CDN after your reverse proxy instead of before it defeats the CDN’s distributed architecture.&lt;/p&gt;

&lt;p&gt;Let us understand why this happens and why it is dangerous, regardless of your technology choices.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/11/cdn-anit-pattern-img1.png&quot; alt=&quot;cdn-anit-pattern-img1&quot; /&gt;&lt;/p&gt;
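&lt;p&gt;A toy model of geolocation-based routing makes the failure mode concrete. The PoP names and IP-prefix-to-region mapping below are made up purely for illustration; real CDN geo-IP routing is far more sophisticated, but the effect is the same:&lt;/p&gt;

```python
# Map the first octet of a client IP to a region, then pick that
# region's PoP -- a crude stand-in for real CDN geo-IP routing.
PREFIX_TO_REGION = {"10": "us-east", "52": "eu-west", "13": "ap-northeast"}
REGION_TO_POP = {"us-east": "Virginia PoP", "eu-west": "London PoP",
                 "ap-northeast": "Tokyo PoP"}

def route(client_ip):
    region = PREFIX_TO_REGION.get(client_ip.split(".")[0], "us-east")
    return REGION_TO_POP[region]

# Users hitting the CDN directly spread across PoPs...
user_ips = ["13.2.3.4", "52.9.9.9", "10.1.1.1"]
print({ip: route(ip) for ip in user_ips})   # three different PoPs

# ...but with the CDN behind the proxy, the CDN only ever sees
# the proxy cluster's IPs, all in one region:
proxy_ips = ["10.0.1.5", "10.0.1.8", "10.0.2.12"]
print({route(ip) for ip in proxy_ips})      # {'Virginia PoP'} -- one PoP
```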

&lt;h3 id=&quot;why-this-architecture-fails-the-fundamental-problem&quot;&gt;Why This Architecture Fails: The Fundamental Problem&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Your reverse proxy operates from a limited IP address space&lt;/strong&gt;: Even with auto-scaling, multi-zone deployment and load balancing, your reverse proxy cluster runs on a finite set of IP addresses within your data center or cloud VPC. These IPs are geographically concentrated.&lt;/p&gt;

    &lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  &lt;span class=&quot;c&quot;&gt;# Example: nginx cluster with 10 nodes&lt;/span&gt;
  nginx-1:  10.0.1.5
  nginx-2:  10.0.1.8
  nginx-3:  10.0.2.12
  ...
  nginx-10: 10.0.3.45

  &lt;span class=&quot;c&quot;&gt;# All IPs from the same /16 or /24 subnet&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;# All IPs in the same geographic region&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;# Even with 100 nodes, still limited IP space!&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;CDN sees your entire proxy cluster as a single client location&lt;/strong&gt;: CDNs route traffic based on source IP geolocation. When all requests originate from your proxy’s data center (even from multiple IPs), the CDN’s routing algorithm treats this as one geographic location requesting content.&lt;/p&gt;

    &lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  &lt;span class=&quot;c&quot;&gt;# Normal (user-facing CDN)&lt;/span&gt;
  User &lt;span class=&quot;k&quot;&gt;in &lt;/span&gt;Tokyo → CDN routes to Tokyo PoP
    
  &lt;span class=&quot;c&quot;&gt;# This anti-pattern &lt;/span&gt;
  All traffic from proxy IPs &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;e.g. US-East&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; → CDN routes ALL to US-East PoP
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;100% of your traffic flows through ONE CDN Point of Presence&lt;/strong&gt;: Instead of distributing globally across hundreds of edge locations, all your traffic is routed to the single PoP nearest to your reverse proxy cluster.&lt;/p&gt;

    &lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;  &lt;span class=&quot;c&quot;&gt;# CORRECT: User-facing CDN&lt;/span&gt;
  User &lt;span class=&quot;k&quot;&gt;in &lt;/span&gt;Tokyo → Tokyo PoP
  User &lt;span class=&quot;k&quot;&gt;in &lt;/span&gt;London → London PoP
  User &lt;span class=&quot;k&quot;&gt;in &lt;/span&gt;New York → Virginia PoP
  User &lt;span class=&quot;k&quot;&gt;in &lt;/span&gt;Sydney → Sydney PoP

  Result: Distributed across 4 PoPs ✅

  &lt;span class=&quot;c&quot;&gt;# WRONG: CDN behind reverse proxy&lt;/span&gt;
  All Users → Reverse Proxy &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;US-East IPs&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; → Virginia PoP ONLY

  Result: Single PoP &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; Single Point of Failure ❌
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;When that one PoP fails → 100% of users are impacted&lt;/strong&gt;: Your carefully architected multi-zone reverse proxy becomes irrelevant. If the single CDN PoP experiences:&lt;/p&gt;
    &lt;ul&gt;
      &lt;li&gt;Network congestion&lt;/li&gt;
      &lt;li&gt;Hardware failure&lt;/li&gt;
      &lt;li&gt;Software bug&lt;/li&gt;
      &lt;li&gt;DDoS attack&lt;/li&gt;
      &lt;li&gt;Maintenance downtime&lt;/li&gt;
    &lt;/ul&gt;

    &lt;p&gt;ALL your users worldwide experience an outage.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
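
&lt;p&gt;The collapse in PoP diversity is easy to see in a toy model. This is a sketch, not a real CDN routing algorithm; the regions and PoP names are illustrative placeholders:&lt;/p&gt;

```python
# Toy model: a CDN routes each request to the PoP for the request's *source* region.
POPS = {"tokyo": "Tokyo PoP", "london": "London PoP",
        "us-east": "Virginia PoP", "sydney": "Sydney PoP"}

def route(source_region):
    # real CDNs geolocate the source IP; here the region name stands in for that lookup
    return POPS[source_region]

users = ["tokyo", "london", "us-east", "sydney"]

# User-facing CDN: each request carries the user's own location
direct = {route(region) for region in users}

# CDN behind a proxy: every request now appears to come from the proxy's region
via_proxy = {route("us-east") for _ in users}

print(len(direct), "PoPs vs.", len(via_proxy), "PoP")  # → 4 PoPs vs. 1 PoP
```

&lt;p&gt;Four users spread across four PoPs when the CDN is user-facing; the same four users all collapse onto the Virginia PoP once the proxy sits in front.&lt;/p&gt;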

&lt;h3 id=&quot;why-teams-build-this-by-accident&quot;&gt;Why Teams Build This (By Accident)&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Organic evolution&lt;/strong&gt;: Started with Users → Proxy → Backend, then added “caching” without understanding CDN routing&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Wrong mental model&lt;/strong&gt;: Treating CDN as a cache layer (like Redis) instead of an edge network&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Legacy migration&lt;/strong&gt;: Lifted-and-shifted on-prem architecture without redesign&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Quick fix that stuck&lt;/strong&gt;: “Backend is slow, let us add caching here!” without proper architecture review&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;the-correct-aws-architectures&quot;&gt;The Correct AWS Architectures&lt;/h3&gt;

&lt;p&gt;Now that we understand the problem, let us look at three correct ways to implement caching and content delivery in AWS. Each pattern solves specific use cases and eliminates the single point of failure.&lt;/p&gt;

&lt;h4 id=&quot;elasticache-for-internal-caching&quot;&gt;ElastiCache for Internal Caching&lt;/h4&gt;

&lt;p&gt;Move caching inside your VPC using ElastiCache (Redis or Memcached). This provides distributed caching with true multi-AZ high availability, without any external routing layer to become a bottleneck.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/11/cdn-anit-pattern-img2.png&quot; alt=&quot;cdn-anit-pattern-img2&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why This Works:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;No external routing layer to become a bottleneck&lt;/li&gt;
  &lt;li&gt;Sub-millisecond latency within VPC&lt;/li&gt;
  &lt;li&gt;True multi-AZ high availability with automatic failover&lt;/li&gt;
  &lt;li&gt;Fine-grained cache control in application code&lt;/li&gt;
  &lt;li&gt;ElastiCache Cluster Mode for horizontal scaling&lt;/li&gt;
&lt;/ul&gt;
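
&lt;p&gt;The typical access pattern with ElastiCache is cache-aside. Here is a minimal sketch; the &lt;code&gt;cache&lt;/code&gt; argument is assumed to be any Redis-compatible client (for example &lt;code&gt;redis.Redis&lt;/code&gt; pointed at your ElastiCache endpoint), and the key naming and TTL are illustrative:&lt;/p&gt;

```python
import json

def get_user(cache, db_fetch, user_id, ttl=300):
    """Cache-aside: try the cache, fall back to the database, then populate the cache."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)             # cache hit: skip the database entirely
    user = db_fetch(user_id)                  # cache miss: read from the source of truth
    cache.set(key, json.dumps(user), ex=ttl)  # write back with a TTL so stale data expires
    return user
```

&lt;p&gt;Because the cache client lives inside your VPC, the round trip is sub-millisecond and there is no external routing layer involved.&lt;/p&gt;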

&lt;h4 id=&quot;cloudfront-in-front-of-alb-user-facing&quot;&gt;CloudFront in Front of ALB (User-Facing)&lt;/h4&gt;

&lt;p&gt;Place CloudFront where it belongs: directly facing users. This restores the CDN’s global distribution, provides DDoS protection, and delivers edge caching benefits without creating routing bottlenecks.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/11/cdn-anit-pattern-img3.png&quot; alt=&quot;cdn-anit-pattern-img3&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why This Works:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Users route to nearest CloudFront edge → low latency for all&lt;/li&gt;
  &lt;li&gt;True global distribution across hundreds of locations&lt;/li&gt;
  &lt;li&gt;Built-in DDoS protection with AWS Shield Standard&lt;/li&gt;
  &lt;li&gt;SSL/TLS termination at edge reduces origin load&lt;/li&gt;
  &lt;li&gt;No single point of failure in routing layer&lt;/li&gt;
  &lt;li&gt;Cache static AND dynamic content at edge&lt;/li&gt;
&lt;/ul&gt;
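
&lt;p&gt;As a rough sketch of what this looks like in code, here is an illustrative &lt;code&gt;DistributionConfig&lt;/code&gt; in the shape boto3’s &lt;code&gt;cloudfront.create_distribution&lt;/code&gt; expects. The origin ID, managed cache policy ID and placeholder values are assumptions to verify against the current AWS documentation before use:&lt;/p&gt;

```python
def user_facing_distribution(alb_dns, cert_arn):
    # Illustrative config: CloudFront terminates TLS at the edge and forwards to the ALB origin
    return {
        "CallerReference": "user-facing-cdn-example",
        "Comment": "CloudFront in front of ALB",
        "Enabled": True,
        "Origins": {"Quantity": 1, "Items": [{
            "Id": "alb-origin",
            "DomainName": alb_dns,
            "CustomOriginConfig": {
                "HTTPPort": 80,
                "HTTPSPort": 443,
                "OriginProtocolPolicy": "https-only",  # edge-to-origin traffic stays encrypted
            },
        }]},
        "DefaultCacheBehavior": {
            "TargetOriginId": "alb-origin",
            "ViewerProtocolPolicy": "redirect-to-https",
            # AWS managed "CachingOptimized" policy ID; confirm against current AWS docs
            "CachePolicyId": "658327ea-f89d-4fab-a63d-7e88639e58f6",
        },
        "ViewerCertificate": {"ACMCertificateArn": cert_arn, "SSLSupportMethod": "sni-only"},
    }
```

&lt;p&gt;Users resolve the CloudFront domain, not the ALB, so every request enters AWS at the nearest edge location.&lt;/p&gt;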

&lt;h4 id=&quot;lambdaedge-for-edge-computing&quot;&gt;Lambda@Edge for Edge Computing&lt;/h4&gt;

&lt;p&gt;Execute routing, A/B testing and composition logic at CloudFront edge locations using Lambda@Edge. This can eliminate the reverse proxy layer entirely for certain workloads.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/11/cdn-anit-pattern-img4.png&quot; alt=&quot;cdn-anit-pattern-img4&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why This Works:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Logic executes at hundreds of edge locations (ultra-low latency)&lt;/li&gt;
  &lt;li&gt;No centralized reverse proxy to become a bottleneck&lt;/li&gt;
  &lt;li&gt;Dynamic routing, A/B testing, auth at edge&lt;/li&gt;
  &lt;li&gt;Can eliminate ALB costs for some workloads&lt;/li&gt;
  &lt;li&gt;Geo-based content delivery&lt;/li&gt;
  &lt;li&gt;Request/response manipulation at edge&lt;/li&gt;
&lt;/ul&gt;
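
&lt;p&gt;A typical use is a viewer-request function that buckets users for an A/B test. A minimal sketch follows; the 10% split, IP-hash bucketing and &lt;code&gt;/beta&lt;/code&gt; URI prefix are illustrative choices, not a recommendation:&lt;/p&gt;

```python
import hashlib

def handler(event, context):
    # Lambda@Edge viewer-request: the incoming request lives under Records[0].cf.request
    request = event["Records"][0]["cf"]["request"]
    # Hash the client IP into a stable 0-99 bucket so a user always sees the same variant
    bucket = int(hashlib.md5(request["clientIp"].encode()).hexdigest(), 16) % 100
    if bucket >= 90:
        # top ~10% of buckets: rewrite the URI so CloudFront serves the experimental version
        request["uri"] = "/beta" + request["uri"]
    return request  # the (possibly rewritten) request continues to the cache/origin
```

&lt;p&gt;Because this runs at every edge location, the bucketing decision happens before the request ever touches your origin.&lt;/p&gt;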

&lt;h3 id=&quot;quick-decision-guide&quot;&gt;Quick Decision Guide&lt;/h3&gt;

&lt;h4 id=&quot;choose-elasticache-when&quot;&gt;Choose &lt;strong&gt;ElastiCache&lt;/strong&gt; when&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;✅ You need internal caching within your VPC&lt;/li&gt;
  &lt;li&gt;✅ Sub-millisecond latency is critical&lt;/li&gt;
  &lt;li&gt;✅ You want full control over cache logic&lt;/li&gt;
  &lt;li&gt;✅ Session management across microservices&lt;/li&gt;
  &lt;li&gt;✅ Database query result caching&lt;/li&gt;
  &lt;li&gt;❌ Do not need global distribution&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;choose-cloudfront--alb-when&quot;&gt;Choose &lt;strong&gt;CloudFront + ALB&lt;/strong&gt; when&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;✅ You have a global user base&lt;/li&gt;
  &lt;li&gt;✅ You need DDoS protection&lt;/li&gt;
  &lt;li&gt;✅ You serve static or semi-static content&lt;/li&gt;
  &lt;li&gt;✅ SSL/TLS termination at edge is desired&lt;/li&gt;
  &lt;li&gt;✅ Cost-effective solution needed&lt;/li&gt;
  &lt;li&gt;❌ Do not need complex edge logic&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;choose-lambdaedge-when&quot;&gt;Choose &lt;strong&gt;Lambda@Edge&lt;/strong&gt; when&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;✅ You need complex logic at the edge&lt;/li&gt;
  &lt;li&gt;✅ A/B testing or personalization required&lt;/li&gt;
  &lt;li&gt;✅ Geographic content routing needed&lt;/li&gt;
  &lt;li&gt;✅ Authentication/authorization at edge&lt;/li&gt;
  &lt;li&gt;✅ You want to eliminate ALB for some workloads&lt;/li&gt;
  &lt;li&gt;❌ Do not mind higher complexity&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;common-objection-what-about-multi-region-proxy-clusters&quot;&gt;Common Objection: What About Multi-Region Proxy Clusters?&lt;/h3&gt;

&lt;p&gt;A common question arises: “If I deploy my proxy clusters in multiple regions (US-East, EU-West, AP-Southeast), doesn’t that solve the single point of failure problem?”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Short answer: It reduces the blast radius but does not eliminate the anti-pattern.&lt;/strong&gt;&lt;/p&gt;

&lt;h4 id=&quot;what-multi-region-proxies-give-you&quot;&gt;What Multi-Region Proxies Give You&lt;/h4&gt;

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# Multi-region proxy setup&lt;/span&gt;
US Users → US Proxy Cluster → CDN Virginia PoP → Backend
EU Users → EU Proxy Cluster → CDN London PoP → Backend
Asia Users → Asia Proxy Cluster → CDN Singapore PoP → Backend
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;At first glance, this seems better—you are no longer routing all traffic through a single CDN PoP. Each regional proxy cluster routes to its nearest CDN location.&lt;/p&gt;

&lt;h4 id=&quot;what-still-remains-wrong&quot;&gt;What Still Remains Wrong&lt;/h4&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Traffic still flows through your infrastructure first&lt;/strong&gt;
    &lt;ul&gt;
      &lt;li&gt;Users must reach your proxy before getting any CDN benefits&lt;/li&gt;
      &lt;li&gt;Adds unnecessary latency: User → Your Proxy → CDN → Backend&lt;/li&gt;
      &lt;li&gt;The CDN cannot optimize routing based on actual user location&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;You are duplicating what the CDN already does natively&lt;/strong&gt;
    &lt;ul&gt;
      &lt;li&gt;CDNs have hundreds of edge locations with intelligent routing&lt;/li&gt;
      &lt;li&gt;You are building a 3-region routing layer when CDN offers hundreds of locations&lt;/li&gt;
      &lt;li&gt;Your proxies become an expensive, manual version of CDN anycast&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;CDN routing is based on proxy location, not user location&lt;/strong&gt;
    &lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# The Problem&lt;/span&gt;
User &lt;span class=&quot;k&quot;&gt;in &lt;/span&gt;Mumbai → Routes to Asia Proxy &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;Singapore&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; → CDN Singapore PoP
&lt;span class=&quot;c&quot;&gt;# CDN cannot optimize: Maybe Tokyo PoP would be faster for this user&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# Correct Approach&lt;/span&gt;
User &lt;span class=&quot;k&quot;&gt;in &lt;/span&gt;Mumbai → CloudFront routes to closest PoP &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;Mumbai/Chennai&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; → Origin
&lt;span class=&quot;c&quot;&gt;# CDN intelligently selects from hundreds of locations&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Cost and operational complexity&lt;/strong&gt;
    &lt;ul&gt;
      &lt;li&gt;Running multi-region proxy infrastructure is expensive&lt;/li&gt;
      &lt;li&gt;Manual failover configuration between regions&lt;/li&gt;
      &lt;li&gt;More moving parts = more failure modes&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;h4 id=&quot;the-correct-multi-region-approach&quot;&gt;The Correct Multi-Region Approach&lt;/h4&gt;

&lt;p&gt;Instead of multi-region proxies, use CloudFront with native multi-region support:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CloudFront + Origin Groups (Automatic Failover):&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;User Anywhere → CloudFront &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;automatic routing to nearest PoP&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
              → Primary Origin &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;US-East ALB&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
              → Secondary Origin &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;EU-West ALB&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;automatic failover]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;CloudFront handles global routing automatically&lt;/li&gt;
  &lt;li&gt;Origin Groups provide automatic failover between regions&lt;/li&gt;
  &lt;li&gt;No proxy infrastructure to manage&lt;/li&gt;
  &lt;li&gt;Users always route to their nearest edge location&lt;/li&gt;
  &lt;li&gt;Lower latency, lower cost, higher availability&lt;/li&gt;
&lt;/ul&gt;
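
&lt;p&gt;In boto3 terms, the failover pair is an &lt;code&gt;OriginGroups&lt;/code&gt; entry on the distribution config. A sketch follows; the group ID and status codes are illustrative, so verify the field names against the CloudFront API reference:&lt;/p&gt;

```python
def failover_origin_group(primary_origin_id, secondary_origin_id):
    # CloudFront retries the secondary origin when the primary returns one of these codes
    return {
        "Id": "multi-region-failover",
        "FailoverCriteria": {
            "StatusCodes": {"Quantity": 3, "Items": [500, 502, 503]},
        },
        "Members": {"Quantity": 2, "Items": [
            {"OriginId": primary_origin_id},    # e.g. the US-East ALB origin
            {"OriginId": secondary_origin_id},  # e.g. the EU-West ALB origin
        ]},
    }
```

&lt;p&gt;Failover happens inside CloudFront itself, so there is no proxy fleet to configure or keep healthy.&lt;/p&gt;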

&lt;p&gt;&lt;strong&gt;The fundamental issue is not single vs. multiple regions—it is placing the CDN after your infrastructure instead of in front of it.&lt;/strong&gt; Multi-region proxies add cost and complexity while still defeating the CDN’s core purpose.&lt;/p&gt;

&lt;h3 id=&quot;key-takeaways&quot;&gt;Key Takeaways&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The anti-pattern is universal&lt;/strong&gt;: Placing a CDN after your reverse proxy (whether it is nginx, HAProxy, ALB or API Gateway) defeats the CDN’s distributed architecture and creates a hidden single point of failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In AWS specifically&lt;/strong&gt;: CloudFront must be user-facing to work correctly. Choose ElastiCache for internal caching needs, CloudFront in front of ALB for global content delivery or Lambda@Edge for edge computing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix is architectural&lt;/strong&gt;: This is not about tweaking configurations—it is about placing components in the right order. CDNs belong between users and your infrastructure, not between your infrastructure components.&lt;/p&gt;
</description>
        <pubDate>Sat, 08 Nov 2025 00:00:00 +0000</pubDate>
        <link>https://blog.josephvelliah.com/cdn-placement-antipattern</link>
        <guid isPermaLink="true">https://blog.josephvelliah.com/cdn-placement-antipattern</guid>
        
        <category>AWS</category>
        
        <category>CloudFront</category>
        
        <category>ALB</category>
        
        <category>CDN</category>
        
        <category>Architecture</category>
        
        <category>Anti-Pattern</category>
        
        <category>ElastiCache</category>
        
        <category>Lambda@Edge</category>
        
        <category>Microservices</category>
        
        
      </item>
    
      <item>
        <title>The Game-Changer: Docker MCP Catalog and Toolkit</title>
        <description>&lt;p&gt;I will be honest – setting up MCP (Model Context Protocol) servers for AI agents has been a pain. You would spend time digging through documentation, manually editing JSON config files, and hoping you got the syntax right. Docker changed that with their MCP Toolkit, and it is pretty clever.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/10/docker-mcp-conceptual-infographic.png&quot; alt=&quot;Docker MCP Toolkit&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;what-is-mcp-and-why-should-you-care&quot;&gt;What Is MCP and Why Should You Care?&lt;/h3&gt;

&lt;p&gt;If you are unfamiliar, MCP is Anthropic’s protocol that lets AI agents (like Claude) interact with external services. Think of it as a standardized way for your AI to read Slack messages, create GitHub issues, fetch YouTube transcripts, or query databases. Before this, each integration required its own setup dance.&lt;/p&gt;

&lt;p&gt;The problem was not the concept – it was the configuration overhead. Every MCP server required manual JSON editing in your Claude config, and if you wanted to try multiple servers, well… you would be editing that file a lot.&lt;/p&gt;

&lt;h3 id=&quot;what-docker-actually-built&quot;&gt;What Docker Actually Built&lt;/h3&gt;

&lt;p&gt;Docker Desktop now has a built-in &lt;a href=&quot;https://docs.docker.com/ai/mcp-catalog-and-toolkit/&quot;&gt;MCP Toolkit&lt;/a&gt; that fundamentally changes how this works. Here is what makes it different:&lt;/p&gt;

&lt;h3 id=&quot;the-catalog-interface&quot;&gt;The Catalog Interface&lt;/h3&gt;

&lt;p&gt;Instead of hunting down MCP servers on GitHub and figuring out how to configure them, you get a curated catalog in Docker Desktop. It shows you what is popular, what each server does, and you can add them with a single click.&lt;/p&gt;

&lt;p&gt;The catalog handles the JSON configuration automatically. Adding a server updates the config file for any connected clients – Claude Desktop, Cursor, whatever you are using—no more manual editing.&lt;/p&gt;
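
&lt;p&gt;For reference, the entry the toolkit manages in a client config such as Claude Desktop’s &lt;code&gt;claude_desktop_config.json&lt;/code&gt; looks roughly like this: a single gateway entry rather than one entry per server. The exact contents may vary by Docker Desktop version:&lt;/p&gt;

```json
{
  "mcpServers": {
    "MCP_DOCKER": {
      "command": "docker",
      "args": ["mcp", "gateway", "run"]
    }
  }
}
```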

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/10/docker-mcp-toolkit-catalog.png&quot; alt=&quot;Docker MCP Toolkit Catalog&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;how-the-containers-work&quot;&gt;How the Containers Work&lt;/h3&gt;

&lt;p&gt;This is the part that makes sense from a Docker perspective. Each MCP server runs in its own container, but Docker’s implementation is smarter than you might expect:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Containers only spin up when a tool is actually called&lt;/li&gt;
  &lt;li&gt;They shut down automatically when the task completes&lt;/li&gt;
  &lt;li&gt;When idle, they consume zero memory&lt;/li&gt;
  &lt;li&gt;You get the isolation benefits of containers without the overhead of running everything 24/7&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So if you have 10 MCP servers configured but only use one of them, only that one container is running. It is efficient.&lt;/p&gt;

&lt;h3 id=&quot;authentication-that-does-not-suck&quot;&gt;Authentication That Does Not Suck&lt;/h3&gt;

&lt;p&gt;Here is something that usually takes forever: OAuth flows. The catalog has built-in OAuth support for services like GitHub. You click to authenticate, and it handles the token dance. You are done. For API-key-based services, there is a straightforward interface to add your credentials.&lt;/p&gt;

&lt;p&gt;Compare that to manually managing environment variables or config files. Yeah, this is better.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/10/docker-mcp-toolkit-oauth.png&quot; alt=&quot;Docker MCP Toolkit OAuth&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;whats-actually-available&quot;&gt;What’s Actually Available&lt;/h3&gt;

&lt;p&gt;The catalog has the servers you would expect if you have been following the MCP ecosystem:&lt;/p&gt;

&lt;h4 id=&quot;core-productivity-tools&quot;&gt;Core Productivity Tools&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;YouTube – grab transcripts, summarize videos&lt;/li&gt;
  &lt;li&gt;Slack – read channels, post messages (helpful for monitoring or notifications)&lt;/li&gt;
  &lt;li&gt;GitHub – create issues, read repos, manage PRs&lt;/li&gt;
  &lt;li&gt;Notion, Obsidian – knowledge base integration&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;development-tools&quot;&gt;Development Tools&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;Database connectors (PostgreSQL, SQLite, etc.)&lt;/li&gt;
  &lt;li&gt;File system access&lt;/li&gt;
  &lt;li&gt;Memory/cache systems like ChromaDB&lt;/li&gt;
  &lt;li&gt;Fetch (for web scraping and HTTP requests)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There is also Gordon, Docker’s built-in test agent. It is in beta, but it is useful for quickly checking if an MCP server is working before you try using it with your actual workflow.&lt;/p&gt;

&lt;h3 id=&quot;a-real-workflow-example&quot;&gt;A Real Workflow Example&lt;/h3&gt;

&lt;p&gt;Let me give you a practical example of why this matters. Say you are researching a technical topic:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Use the YouTube MCP server to pull transcripts from conference talks&lt;/li&gt;
  &lt;li&gt;Have Claude summarize the key points&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/10/docker-mcp-toolkit-demo.png&quot; alt=&quot;Docker MCP Toolkit Demo&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Before the catalog, setting up the MCP servers for a workflow like this meant editing JSON configs, debugging path issues, and restarting things a few times. Now it is 10 minutes of clicking through the catalog.&lt;/p&gt;

&lt;h3 id=&quot;setting-it-up&quot;&gt;Setting It Up&lt;/h3&gt;

&lt;p&gt;The actual setup is straightforward:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Make sure you have Docker Desktop installed&lt;/li&gt;
  &lt;li&gt;Enable beta features in Settings → Beta features → Enable Docker MCP Toolkit&lt;/li&gt;
  &lt;li&gt;Open the MCP Catalog from the Docker Desktop’s MCP Toolkit section&lt;/li&gt;
  &lt;li&gt;Browse servers and click “Add MCP Server” for what you need&lt;/li&gt;
  &lt;li&gt;Configure any API keys or OAuth in the configuration section of the MCP server&lt;/li&gt;
  &lt;li&gt;In the MCP Toolkit → Clients section, click to connect your MCP clients&lt;/li&gt;
  &lt;li&gt;Restart your MCP clients&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The whole thing takes less time than it took me to write this section.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/10/docker-mcp-toolkit-clients.png&quot; alt=&quot;Docker MCP Toolkit Clients&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;for-custom-integrations&quot;&gt;For Custom Integrations&lt;/h3&gt;

&lt;p&gt;If you are building your own agents or using frameworks like N8N or Python-based systems, Docker has open-sourced the MCP Gateway. It lets you orchestrate MCP servers through HTTP with streamable protocol support.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/10/docker-mcp-toolkit-gateway.png&quot; alt=&quot;Docker MCP Toolkit Gateway&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;bottom-line&quot;&gt;Bottom Line&lt;/h3&gt;

&lt;p&gt;The Docker MCP Toolkit is worth checking out if you use AI agents for anything beyond basic chat. It automates the tedious parts of MCP server management while giving you the control and isolation of containers.&lt;/p&gt;

&lt;p&gt;The fact that it is built into Docker Desktop means there is one less tool to install, one less service to manage, and one less thing to forget when switching between projects.&lt;/p&gt;

&lt;p&gt;Give it a try. Set up a few servers, test them with Gordon/Claude, and see if it fits your workflow. Worst case, you waste 15 minutes. Best case, you never manually edit an MCP config file again.&lt;/p&gt;
</description>
        <pubDate>Sat, 18 Oct 2025 00:00:00 +0000</pubDate>
        <link>https://blog.josephvelliah.com/docker-mcp-catalog-and-toolkit</link>
        <guid isPermaLink="true">https://blog.josephvelliah.com/docker-mcp-catalog-and-toolkit</guid>
        
        <category>Docker</category>
        
        <category>MCP</category>
        
        <category>AI</category>
        
        <category>Claude AI</category>
        
        
      </item>
    
      <item>
        <title>Kafka Crash Course: Learn with a Parent&apos;s Return to Office Mandate Use Case</title>
        <description>&lt;p&gt;Companies are mandating return-to-office. Parents now face a coordination challenge:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;School bus drops kids at 3:15 PM at the community bus stop&lt;/li&gt;
  &lt;li&gt;Parents need to be there, but meetings run over&lt;/li&gt;
  &lt;li&gt;Group chats don’t work - messages get buried, no confirmation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Real scenario:&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-txt&quot;&gt;3:10 PM - Sarah&apos;s meeting runs over
3:11 PM - Posts in group chat: &quot;Can someone watch Jake?&quot;
3:15 PM - Bus arrives, no response yet
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Neighbors want to help. They just need a reliable system.&lt;/p&gt;

&lt;h3 id=&quot;why-kafka-fits-this-use-case&quot;&gt;Why Kafka Fits This Use Case&lt;/h3&gt;

&lt;h4 id=&quot;before-tightly-coupled-services&quot;&gt;Before: Tightly Coupled Services&lt;/h4&gt;

&lt;pre&gt;&lt;code class=&quot;language-txt&quot;&gt;Parent App → Notification Service → Database → Neighbor App
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Problems:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Notification service crashes = everything stops&lt;/li&gt;
  &lt;li&gt;Parent waits for entire chain to respond&lt;/li&gt;
  &lt;li&gt;Neighbor offline = message lost forever&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;with-kafka-decoupled&quot;&gt;With Kafka: Decoupled&lt;/h4&gt;

&lt;pre&gt;&lt;code class=&quot;language-txt&quot;&gt;Parent App → Kafka ← Neighbor Apps
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Parent sends alert, doesn’t wait&lt;/li&gt;
  &lt;li&gt;Message stored safely in Kafka&lt;/li&gt;
  &lt;li&gt;Neighbors read when ready (even if offline before)&lt;/li&gt;
  &lt;li&gt;Multiple neighbors can all see it&lt;/li&gt;
  &lt;li&gt;Add new features without breaking existing ones&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of Kafka as a bulletin board. Pin a message, walk away. Everyone sees it. First person to help responds.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/09/kafka-01.png&quot; alt=&quot;Kafka Architecture Diagram&quot; /&gt;&lt;/p&gt;
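
&lt;p&gt;The bulletin-board behaviour (pin once, everyone reads at their own pace) can be sketched as a toy append-only log. This is an analogy for Kafka’s log and consumer offsets, not how Kafka is actually implemented:&lt;/p&gt;

```python
# Minimal "bulletin board": an append-only log where each reader keeps its own offset,
# mimicking how Kafka decouples the parent (producer) from neighbors (consumers).
class MiniLog:
    def __init__(self):
        self.messages = []            # the pinned notes, in arrival order

    def produce(self, msg):
        self.messages.append(msg)     # the producer appends and walks away

    def consume(self, offset):
        # each consumer reads from its own position; messages are never removed
        new = self.messages[offset:]
        return new, offset + len(new)

log = MiniLog()
log.produce("3:11 PM Sarah: Can someone watch Jake?")

# Neighbor A is online and reads immediately
msgs_a, off_a = log.consume(0)

# Neighbor B was offline; reading later still sees the same message
msgs_b, off_b = log.consume(0)
print(msgs_a == msgs_b)  # → True: the message is not lost or consumed away
```

&lt;p&gt;Both neighbors see Sarah’s alert regardless of when they come online, because reading never deletes anything from the log.&lt;/p&gt;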

&lt;hr /&gt;

&lt;h3 id=&quot;lets-build-it&quot;&gt;Let’s Build It&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What we need:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Docker (to run Kafka)&lt;/li&gt;
  &lt;li&gt;Python (to write producer/consumer)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;virtual-environment-setup&quot;&gt;Virtual Environment Setup&lt;/h4&gt;

&lt;ol&gt;
  &lt;li&gt;Create the project folder and navigate to it:&lt;/li&gt;
&lt;/ol&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nb&quot;&gt;mkdir &lt;/span&gt;bus-stop-kafka
&lt;span class=&quot;nb&quot;&gt;cd &lt;/span&gt;bus-stop-kafka
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;ol start=&quot;2&quot;&gt;
  &lt;li&gt;Create a virtual environment:&lt;/li&gt;
&lt;/ol&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;python3 &lt;span class=&quot;nt&quot;&gt;-m&lt;/span&gt; venv venv
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;ol start=&quot;3&quot;&gt;
  &lt;li&gt;Activate the virtual environment:&lt;/li&gt;
&lt;/ol&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nb&quot;&gt;source &lt;/span&gt;venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;ol start=&quot;4&quot;&gt;
  &lt;li&gt;Install librdkafka (the C library that confluent-kafka wraps; via Homebrew on macOS):&lt;/li&gt;
&lt;/ol&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;brew &lt;span class=&quot;nb&quot;&gt;install &lt;/span&gt;librdkafka
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;ol start=&quot;5&quot;&gt;
  &lt;li&gt;Upgrade pip and install dependencies:&lt;/li&gt;
&lt;/ol&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;pip &lt;span class=&quot;nb&quot;&gt;install&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--upgrade&lt;/span&gt; pip
pip &lt;span class=&quot;nb&quot;&gt;install &lt;/span&gt;confluent-kafka
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The virtual environment is now set up and isolated from your system Python installation.&lt;/p&gt;

&lt;h4 id=&quot;start-kafka&quot;&gt;Start Kafka&lt;/h4&gt;

&lt;p&gt;Create &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker-compose.yml&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-yaml highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;na&quot;&gt;version&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;3.8&apos;&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;services&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;kafka&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;confluentinc/cp-kafka:7.8.3&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;container_name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;kafka&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;ports&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;9092:9092&quot;&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;environment&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;KAFKA_KRAFT_MODE&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;true&quot;&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;CLUSTER_ID&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;bus-stop-demo&quot;&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;KAFKA_NODE_ID&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;KAFKA_PROCESS_ROLES&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;broker,controller&quot;&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;KAFKA_LISTENERS&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093&quot;&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;KAFKA_ADVERTISED_LISTENERS&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;PLAINTEXT://localhost:9092&quot;&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;KAFKA_CONTROLLER_LISTENER_NAMES&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;CONTROLLER&quot;&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;KAFKA_CONTROLLER_QUORUM_VOTERS&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;1@kafka:9093&quot;&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;KAFKA_LOG_DIRS&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;/var/lib/kafka/data&quot;&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;volumes&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;kafka-data:/var/lib/kafka/data&lt;/span&gt;

&lt;span class=&quot;na&quot;&gt;volumes&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;kafka-data&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Start it:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker-compose up &lt;span class=&quot;nt&quot;&gt;-d&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;sleep &lt;/span&gt;30  &lt;span class=&quot;c&quot;&gt;# Wait for Kafka to start&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 id=&quot;producer-sarah-sends-alert&quot;&gt;Producer (Sarah Sends Alert)&lt;/h4&gt;

&lt;p&gt;Create &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;producer.py&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;confluent_kafka&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Producer&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;json&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# Connect to Kafka
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;producer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Producer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;({&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;bootstrap.servers&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;localhost:9092&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;})&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# Create alert message
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;alert&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;parent_name&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Sarah&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;child_name&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Jake&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;location&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Oak Street Bus Stop&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;message&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Meeting ran over, will be 10 mins late&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# Send to Kafka topic
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;producer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;produce&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;topic&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;bus-stop-alerts&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;           &lt;span class=&quot;c1&quot;&gt;# Topic name
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;json&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;dumps&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;alert&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;encode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;   &lt;span class=&quot;c1&quot;&gt;# Convert to bytes
&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;producer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;flush&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# Ensure it&apos;s sent
&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sa&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;✅ Alert sent: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;alert&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;parent_name&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt; needs help&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Run it (ensure your virtual environment is activated):&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;python producer.py
&lt;span class=&quot;c&quot;&gt;# Output: ✅ Alert sent: Sarah needs help&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Connected to Kafka at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;localhost:9092&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Created JSON message with alert details&lt;/li&gt;
  &lt;li&gt;Sent to topic called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bus-stop-alerts&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Kafka stored it&lt;/li&gt;
&lt;/ol&gt;
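&lt;p&gt;The value handed to Kafka is just bytes, so the JSON encode in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;producer.py&lt;/code&gt; must mirror the decode in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;consumer.py&lt;/code&gt;. A minimal sketch of that round trip, with no broker involved:&lt;/p&gt;

```python
import json

# The alert dict the producer builds (same fields as producer.py)
alert = {
    "parent_name": "Sarah",
    "child_name": "Jake",
    "location": "Oak Street Bus Stop",
    "message": "Meeting ran over, will be 10 mins late",
}

# Producer side: serialize to bytes before handing to Kafka
payload = json.dumps(alert).encode("utf-8")

# Consumer side: turn the bytes back into a dict
received = json.loads(payload.decode("utf-8"))

assert received == alert
print("Round trip OK for", received["parent_name"])
```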

&lt;h4 id=&quot;consumer-mike-receives-alert&quot;&gt;Consumer (Mike Receives Alert)&lt;/h4&gt;

&lt;p&gt;Create &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;consumer.py&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;confluent_kafka&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Consumer&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;json&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# Connect to Kafka
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;consumer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Consumer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;({&lt;/span&gt;
    &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;bootstrap.servers&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;localhost:9092&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;group.id&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;neighbors&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;              &lt;span class=&quot;c1&quot;&gt;# Consumer group
&lt;/span&gt;    &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;auto.offset.reset&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;earliest&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;       &lt;span class=&quot;c1&quot;&gt;# Read from beginning
&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;})&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;consumer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;subscribe&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;bus-stop-alerts&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
&lt;span class=&quot;nf&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;🔔 Listening for alerts...&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;try&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;msg&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;consumer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;poll&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;1.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# Check every second
&lt;/span&gt;        
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;msg&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;is&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;continue&lt;/span&gt;
        
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;msg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;error&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;():&lt;/span&gt;
            &lt;span class=&quot;nf&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sa&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Error: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;msg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;error&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;continue&lt;/span&gt;
        
        &lt;span class=&quot;c1&quot;&gt;# Got a message!
&lt;/span&gt;        &lt;span class=&quot;n&quot;&gt;alert&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;json&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;loads&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;msg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;decode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;
        
        &lt;span class=&quot;nf&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sa&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;🚨 &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;alert&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;parent_name&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt; needs help!&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;nf&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sa&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;   Child: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;alert&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;child_name&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;nf&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sa&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;   Location: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;alert&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;location&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;nf&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sa&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;   Message: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;alert&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;message&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;except&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;KeyboardInterrupt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;nf&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;Stopped&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;finally&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;consumer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;close&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Run it (ensure your virtual environment is activated):&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;python consumer.py
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Output:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-txt&quot;&gt;🔔 Listening for alerts...

🚨 Sarah needs help!
   Child: Jake
   Location: Oak Street Bus Stop
   Message: Meeting ran over, will be 10 mins late
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Consumer connected to Kafka&lt;/li&gt;
  &lt;li&gt;Subscribed to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bus-stop-alerts&lt;/code&gt; topic&lt;/li&gt;
  &lt;li&gt;Read the message Sarah sent&lt;/li&gt;
  &lt;li&gt;Keeps running, waiting for more&lt;/li&gt;
&lt;/ol&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;understanding-kafka-concepts&quot;&gt;Understanding Kafka Concepts&lt;/h3&gt;

&lt;h4 id=&quot;topics&quot;&gt;Topics&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;Like folders for messages&lt;/li&gt;
  &lt;li&gt;We used: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bus-stop-alerts&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Organizes different types of messages&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;producers&quot;&gt;Producers&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;Send messages to topics&lt;/li&gt;
  &lt;li&gt;Don’t wait for consumers&lt;/li&gt;
  &lt;li&gt;Don’t know who will read it&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;consumers&quot;&gt;Consumers&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;Read messages from topics&lt;/li&gt;
  &lt;li&gt;Can start from beginning or latest&lt;/li&gt;
  &lt;li&gt;Keep polling for new messages&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;consumer-groups&quot;&gt;Consumer Groups&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;Multiple consumers with same &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;group.id&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Kafka distributes messages among them&lt;/li&gt;
  &lt;li&gt;Load balancing automatically&lt;/li&gt;
&lt;/ul&gt;
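&lt;p&gt;Kafka’s real assignor protocols are more involved, but the core idea of splitting a topic’s partitions across one group can be sketched in plain Python (the partition numbers and consumer names here are illustrative):&lt;/p&gt;

```python
def assign_partitions(partitions, consumers):
    """Round-robin partitions across the consumers in one group (simplified sketch)."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 3 partitions, 2 consumers in the "neighbors" group: work is shared
print(assign_partitions([0, 1, 2], ["mike", "lisa"]))
# {'mike': [0, 2], 'lisa': [1]}

# 1 partition, 3 consumers: two consumers sit idle
print(assign_partitions([0], ["mike", "lisa", "david"]))
# {'mike': [0], 'lisa': [], 'david': []}
```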

&lt;h3 id=&quot;try-this-messages-persist&quot;&gt;Try This: Messages Persist&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Shows:&lt;/strong&gt; Messages don’t disappear&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Start consumer, then stop it (Ctrl+C)&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Send 3 alerts:&lt;/p&gt;

    &lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;python producer.py
python producer.py
python producer.py
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;Start consumer again&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Consumer shows all 3 alerts!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt; If Mike’s phone was off when Sarah sent the alert, he still sees it when his phone comes back online.&lt;/p&gt;

&lt;h4 id=&quot;important-consumer-offset-tracking&quot;&gt;Important: Consumer Offset Tracking&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Question:&lt;/strong&gt; “If I sent 1 alert earlier and 3 alerts now, why don’t I see all 4 alerts?”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Answer:&lt;/strong&gt; Kafka tracks where each consumer group left off reading using &lt;strong&gt;offsets&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here’s what happens:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;First run: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;producer.py&lt;/code&gt; sends alert #1&lt;/li&gt;
  &lt;li&gt;First run: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;consumer.py&lt;/code&gt; reads alert #1, Kafka marks “neighbors group read up to offset 0”&lt;/li&gt;
  &lt;li&gt;Second run: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;producer.py&lt;/code&gt; sends alerts #2, #3, #4&lt;/li&gt;
  &lt;li&gt;Second run: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;consumer.py&lt;/code&gt; only shows #2, #3, #4 (skips #1 because it was already read)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is a &lt;strong&gt;feature&lt;/strong&gt;, not a bug! Imagine if neighbors saw every alert from the past month every time they checked.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To see ALL messages from the beginning:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Option 1 - Change the consumer group name (the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;group.id&lt;/code&gt; setting in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;consumer.py&lt;/code&gt;):&lt;/p&gt;
&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;group.id&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;neighbors-v2&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# New group = starts fresh
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Option 2 - Delete the consumer group offset tracking:&lt;/p&gt;
&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker &lt;span class=&quot;nb&quot;&gt;exec&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-it&lt;/span&gt; kafka kafka-consumer-groups &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--bootstrap-server&lt;/span&gt; localhost:9092 &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--delete&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--group&lt;/span&gt; neighbors
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
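&lt;p&gt;The offset bookkeeping can be imitated with a toy append-only list and a committed-offset dictionary (a pure-Python sketch of the idea, not the real protocol):&lt;/p&gt;

```python
log = []        # the topic's single partition: an append-only list
committed = {}  # group id -> next offset that group should read

def produce(msg):
    log.append(msg)

def consume(group):
    """Return only messages this group has not yet read, then commit."""
    start = committed.get(group, 0)
    new = log[start:]
    committed[group] = len(log)
    return new

produce("alert #1")
print(consume("neighbors"))      # ['alert #1']
produce("alert #2")
produce("alert #3")
produce("alert #4")
print(consume("neighbors"))      # ['alert #2', 'alert #3', 'alert #4'] (skips #1)
print(consume("neighbors-v2"))   # a brand-new group starts at offset 0: all 4 alerts
```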

&lt;h3 id=&quot;try-this-multiple-neighbors&quot;&gt;Try This: Multiple Neighbors&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Shows:&lt;/strong&gt; Multiple consumers share work&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Open 3 terminals&lt;/li&gt;
  &lt;li&gt;Run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;python consumer.py&lt;/code&gt; in each (with venv activated)&lt;/li&gt;
  &lt;li&gt;Send alerts from 4th terminal&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Expected result:&lt;/strong&gt; Each consumer gets different messages (load balancing). In practice, with the default single-partition topic, one consumer receives everything; the next section explains why.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt; Multiple neighbors can watch the bus stop at once; all see alerts, and the first one available responds.&lt;/p&gt;

&lt;h4 id=&quot;important-partitions-enable-load-balancing&quot;&gt;Important: Partitions Enable Load Balancing&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Question:&lt;/strong&gt; “All messages go to one consumer. Is load balancing actually working?”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Answer:&lt;/strong&gt; With the default setup (1 partition), load balancing &lt;strong&gt;cannot work&lt;/strong&gt;. Here’s why:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/09/kafka-02.png&quot; alt=&quot;Kafka Partitions Enable Load Balancing Diagram&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Partition Rule:&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Maximum parallel consumers = Number of partitions
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;By default, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bus-stop-alerts&lt;/code&gt; has &lt;strong&gt;1 partition&lt;/strong&gt;, so:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Consumer #1 gets partition 0 (receives all messages)&lt;/li&gt;
  &lt;li&gt;Consumer #2 gets nothing (no partitions left)&lt;/li&gt;
  &lt;li&gt;Consumer #3 gets nothing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;To see actual load balancing:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Delete the topic:
    &lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker &lt;span class=&quot;nb&quot;&gt;exec&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-it&lt;/span&gt; kafka kafka-topics &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--bootstrap-server&lt;/span&gt; localhost:9092 &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--delete&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--topic&lt;/span&gt; bus-stop-alerts
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;Recreate with 3 partitions:
    &lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker &lt;span class=&quot;nb&quot;&gt;exec&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-it&lt;/span&gt; kafka kafka-topics &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--bootstrap-server&lt;/span&gt; localhost:9092 &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--create&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--topic&lt;/span&gt; bus-stop-alerts &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--partitions&lt;/span&gt; 3 &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;--replication-factor&lt;/span&gt; 1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;Run 3 consumers in separate terminals:
    &lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;python consumer.py  &lt;span class=&quot;c&quot;&gt;# Terminal 1&lt;/span&gt;
python consumer.py  &lt;span class=&quot;c&quot;&gt;# Terminal 2&lt;/span&gt;
python consumer.py  &lt;span class=&quot;c&quot;&gt;# Terminal 3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;Send multiple alerts:
    &lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;python producer.py  &lt;span class=&quot;c&quot;&gt;# Run this 6+ times&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Now you’ll see:&lt;/strong&gt; Messages distributed across all 3 consumers!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key insight:&lt;/strong&gt; More partitions = more parallelism. This is how Kafka scales to handle massive throughput.&lt;/p&gt;
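&lt;p&gt;When a message is produced with a key, the partition is chosen by hashing that key, so all messages for one key stay in order on one partition. Kafka actually uses murmur2 for this; the sketch below substitutes Python’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;zlib.crc32&lt;/code&gt; as an illustrative stand-in:&lt;/p&gt;

```python
import zlib

NUM_PARTITIONS = 3

def partition_for(key):
    """Map a message key to a partition (sketch; Kafka really uses murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS

# Every alert from the same parent hashes to the same partition,
# so one parent's alerts are always delivered in order.
for parent in ["Sarah", "Mike", "Lisa"]:
    print(parent, "->", partition_for(parent))

assert partition_for("Sarah") == partition_for("Sarah")  # deterministic
```

&lt;p&gt;With &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;confluent_kafka&lt;/code&gt; you get this behavior by passing a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;key&lt;/code&gt; argument to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;produce()&lt;/code&gt;, for example the parent’s name.&lt;/p&gt;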

&lt;hr /&gt;

&lt;h3 id=&quot;the-power-of-kafka&quot;&gt;The Power of Kafka&lt;/h3&gt;

&lt;h4 id=&quot;real-world-flow&quot;&gt;Real-World Flow&lt;/h4&gt;

&lt;pre&gt;&lt;code class=&quot;language-txt&quot;&gt;Sarah (3:10 PM)
  ↓ sends alert
Kafka (stores it)
  ↓ notifies consumers
Mike (3:11 PM) - sees alert
Lisa (3:11 PM) - sees alert
David (3:12 PM) - phone was locked, sees it now
  ↓
Mike responds &quot;I&apos;ll watch Jake&quot;
  ↓ sends confirmation through Kafka
Sarah (3:12 PM) - sees confirmation
&lt;/code&gt;&lt;/pre&gt;

&lt;h4 id=&quot;why-this-architecture-works&quot;&gt;Why This Architecture Works&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Decoupling:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Services don’t talk directly&lt;/li&gt;
  &lt;li&gt;Add/remove services without breaking others&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Persistence:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Messages stored on disk&lt;/li&gt;
  &lt;li&gt;Survive crashes and restarts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Scalability:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Add more consumers = faster processing&lt;/li&gt;
  &lt;li&gt;Add more producers = handle more load&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Reliability:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;One service down? Others keep working&lt;/li&gt;
  &lt;li&gt;Messages don’t get lost&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;real-world-use-cases&quot;&gt;Real-World Use Cases&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Same pattern, different use cases:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;E-commerce:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Order placed → Kafka&lt;/li&gt;
  &lt;li&gt;Payment service charges card&lt;/li&gt;
  &lt;li&gt;Inventory service updates stock&lt;/li&gt;
  &lt;li&gt;Email service sends confirmation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Uber:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Ride requested → Kafka&lt;/li&gt;
  &lt;li&gt;Driver matching finds nearby driver&lt;/li&gt;
  &lt;li&gt;Pricing calculates fare&lt;/li&gt;
  &lt;li&gt;Notifications alert driver&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Your bus stop:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Alert sent → Kafka&lt;/li&gt;
  &lt;li&gt;Notification service alerts neighbors&lt;/li&gt;
  &lt;li&gt;Database logs the event&lt;/li&gt;
  &lt;li&gt;Analytics tracks usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;All use the same Kafka pattern you just learned.&lt;/strong&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;common-questions&quot;&gt;Common Questions&lt;/h3&gt;

&lt;p&gt;“Why not just use a database?”&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Database: Consumer constantly polls “any new data?”&lt;/li&gt;
  &lt;li&gt;Kafka: Consumer long-polls the broker, which hands over new messages as soon as they arrive&lt;/li&gt;
  &lt;li&gt;Result: Real-time, less load&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;“Why not just use REST API?”&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;REST: Consumer must be online NOW&lt;/li&gt;
  &lt;li&gt;Kafka: Consumer reads when ready&lt;/li&gt;
  &lt;li&gt;Result: More reliable, works offline&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;“When should I use Kafka?”&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;✅ High message volume&lt;/li&gt;
  &lt;li&gt;✅ Multiple systems need same data&lt;/li&gt;
  &lt;li&gt;✅ Can’t lose messages&lt;/li&gt;
  &lt;li&gt;✅ Need message history&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;what-you-built&quot;&gt;What You Built&lt;/h3&gt;

&lt;pre&gt;&lt;code class=&quot;language-txt&quot;&gt;bus-stop-kafka/
├── docker-compose.yml  # Kafka setup
├── producer.py         # Send alerts
├── consumer.py         # Receive alerts
├── venv/               # Virtual environment
├── .gitignore          # Git ignore file
└── README.md           # Project documentation
&lt;/code&gt;&lt;/pre&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;summary&quot;&gt;Summary&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;You learned:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;What Kafka is (message broker)&lt;/li&gt;
  &lt;li&gt;Why it’s useful (decoupling, persistence)&lt;/li&gt;
  &lt;li&gt;How to produce messages&lt;/li&gt;
  &lt;li&gt;How to consume messages&lt;/li&gt;
  &lt;li&gt;Consumer groups concept&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;You built:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Working producer that sends alerts&lt;/li&gt;
  &lt;li&gt;Working consumer that receives alerts&lt;/li&gt;
  &lt;li&gt;Everything runs locally with Docker&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;You can now:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Explain Kafka to anyone&lt;/li&gt;
  &lt;li&gt;Build event-driven systems&lt;/li&gt;
  &lt;li&gt;Apply this to other use cases&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;resources&quot;&gt;Resources&lt;/h3&gt;

&lt;p&gt;📦 &lt;strong&gt;Code:&lt;/strong&gt; &lt;a href=&quot;https://github.com/sprider/bus-stop-kafka&quot;&gt;github.com/sprider/bus-stop-kafka&lt;/a&gt;&lt;br /&gt;
📚 &lt;strong&gt;Learn More:&lt;/strong&gt; &lt;a href=&quot;https://kafka.apache.org/documentation/&quot;&gt;Kafka Docs&lt;/a&gt;&lt;br /&gt;
🎥 &lt;strong&gt;Watch:&lt;/strong&gt; &lt;a href=&quot;https://www.youtube.com/watch?v=B7CwU_tNYIE&quot;&gt;Nana’s Kafka Video&lt;/a&gt;&lt;/p&gt;
</description>
        <pubDate>Sat, 13 Sep 2025 00:00:00 +0000</pubDate>
        <link>https://blog.josephvelliah.com/kafka-crash-course-bus-stop-demo</link>
        <guid isPermaLink="true">https://blog.josephvelliah.com/kafka-crash-course-bus-stop-demo</guid>
        
        <category>Kafka</category>
        
        <category>Docker</category>
        
        <category>Python</category>
        
        <category>Event Driven</category>
        
        <category>Distributed Systems</category>
        
        
      </item>
    
      <item>
        <title>How to Build a Bible MCP Server: Complete Guide to Creating Custom AI Tools</title>
        <description>&lt;p&gt;A few months ago, I found something that changed my views on AI tools. It’s called &lt;a href=&quot;https://modelcontextprotocol.io/&quot;&gt;MCP (Model Context Protocol)&lt;/a&gt;; it allows AI models to connect to external tools and data sources.&lt;/p&gt;

&lt;h2 id=&quot;what-is-model-context-protocol-mcp-and-why-should-you-care&quot;&gt;What is Model Context Protocol (MCP) and Why Should You Care?&lt;/h2&gt;

&lt;p&gt;Think of MCP as a standard connector between AI models and the outside world. Before MCP, if I wanted Claude to help me with Bible study, I had to copy and paste verses, look up references manually, or jump between apps. It was cumbersome and interrupted my flow.&lt;/p&gt;

&lt;p&gt;With MCP, I can create a custom server that gives &lt;a href=&quot;https://claude.ai/&quot;&gt;Claude AI&lt;/a&gt; direct access to Bible data—verses, cross-references, commentaries, and more. It’s like having a personal research assistant who never gets tired.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/08/bible-mcp-server.png&quot; alt=&quot;mcp-claude-in-action&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;step-by-step-guide-building-a-bible-mcp-server-on-cloudflare-workers&quot;&gt;Step-by-Step Guide: Building a Bible MCP Server on Cloudflare Workers&lt;/h2&gt;

&lt;p&gt;I am a hands-on solutions architect, and I wanted to make something useful for my daily Bible study. Here’s how it went down:&lt;/p&gt;

&lt;h3 id=&quot;identifying-the-problem-why-build-a-custom-bible-ai-tool&quot;&gt;Identifying the Problem: Why Build a Custom Bible AI Tool?&lt;/h3&gt;

&lt;p&gt;Every morning, I read scripture and take notes. I often wondered, “What other verses relate to this theme?” But looking this up meant opening multiple tabs, losing my place, and breaking my focus.&lt;/p&gt;

&lt;p&gt;I thought, “What if Claude could just know this stuff?”&lt;/p&gt;

&lt;h3 id=&quot;how-to-build-your-first-mcp-server-technical-implementation&quot;&gt;How to Build Your First MCP Server: Technical Implementation&lt;/h3&gt;

&lt;p&gt;The beauty of building MCP servers is that you don’t need to be an expert programmer. I used &lt;a href=&quot;https://workers.cloudflare.com/&quot;&gt;Cloudflare Workers&lt;/a&gt; because they are free for small projects and easy to set up for custom AI integrations.&lt;/p&gt;

&lt;p&gt;My server exposes two consolidated MCP tools (updated from an earlier 6-tool design):&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;bible_content&lt;/strong&gt; — Search verses, get a single verse, passage, or full chapter (4 actions in one tool)&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;bible_reference&lt;/strong&gt; — List books or chapters to navigate Bible structure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each tool supports a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;response_format&lt;/code&gt; option (concise/detailed) to control token usage—I applied the principles from my &lt;a href=&quot;https://blog.josephvelliah.com/ai-tool-optimization-guide-mcp-server-case-study&quot;&gt;AI Tool Optimization Guide&lt;/a&gt; to reduce tool count by 67% and token usage by 60–70%.&lt;/p&gt;
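
&lt;p&gt;To make the consolidation concrete, here is roughly what the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bible_content&lt;/code&gt; tool schema looks like, sketched as a Python dict. Treat the parameter names here as illustrative, not the published schema of my server:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def bible_content_schema():
    # One tool, four actions: search / verse / passage / chapter.
    # response_format trades detail for tokens (concise by default).
    return {
        'name': 'bible_content',
        'inputSchema': {
            'type': 'object',
            'properties': {
                'action': {'enum': ['search', 'verse', 'passage', 'chapter']},
                'reference': {'type': 'string'},   # e.g. a verse or chapter reference
                'query': {'type': 'string'},       # used when action is search
                'response_format': {'enum': ['concise', 'detailed']},
            },
            'required': ['action'],
        },
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Grouping by task like this is what lets the agent pick one tool and one action instead of choosing among six overlapping tools.&lt;/p&gt;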

&lt;p&gt;The MCP server code is surprisingly simple. Model Context Protocol handles all the complex communication, so I just focused on the Bible API integration and data logic.&lt;/p&gt;

&lt;h3 id=&quot;deploying-mcp-server-on-cloudflare-workers-free-and-fast&quot;&gt;Deploying MCP Server on Cloudflare Workers: Free and Fast&lt;/h3&gt;

&lt;p&gt;Here is where it gets interesting. I deployed my server to Cloudflare Workers, and now it runs all the time without my involvement. There’s no server maintenance, no hosting fees (thanks to Cloudflare’s generous free tier), and it responds quickly thanks to Cloudflare’s global edge network.&lt;/p&gt;

&lt;p&gt;Then I connected it to Claude AI through the MCP protocol integration, and suddenly, my AI assistant became a personalized Bible study companion.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/08/0-mcp-server-testing.png&quot; alt=&quot;mcp-server-testing&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/08/3-mcp-claude-dev-settings.png&quot; alt=&quot;mcp-claude-dev-settings&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/08/6a-mcp-claude-in-action.png&quot; alt=&quot;mcp-claude-in-action&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;mcp-server-benefits-real-world-ai-integration-results&quot;&gt;MCP Server Benefits: Real-World AI Integration Results&lt;/h2&gt;

&lt;p&gt;Now, when I study, I can ask Claude questions like:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;How did Jesus feed 5,000 people?&lt;/li&gt;
  &lt;li&gt;What is the context around Romans 8:28?&lt;/li&gt;
  &lt;li&gt;Compare this verse across different translations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Claude doesn’t just give me generic answers; it pulls real data from my server and provides exactly what I need.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/08/6b-mcp-claude-in-action.png&quot; alt=&quot;mcp-claude-in-action&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;beyond-bible-study-mcp-use-cases-for-custom-ai-development&quot;&gt;Beyond Bible Study: MCP Use Cases for Custom AI Development&lt;/h2&gt;

&lt;p&gt;What excites me most is not just my Bible server. It’s the concept behind MCP. We are transitioning from AI language models that act as isolated information bubbles to AI assistants engaging with real-world data and APIs through custom MCP servers.&lt;/p&gt;

&lt;p&gt;Imagine connecting your AI to:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Your company’s internal database&lt;/li&gt;
  &lt;li&gt;Your personal calendar and task manager&lt;/li&gt;
  &lt;li&gt;Stock market APIs for real-time trading info&lt;/li&gt;
  &lt;li&gt;Your smart home devices&lt;/li&gt;
  &lt;li&gt;Medical databases for health research&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The options are endless, and the barrier to entry is surprisingly low.&lt;/p&gt;

&lt;h2 id=&quot;mcp-development-best-practices-lessons-from-building-ai-tools&quot;&gt;MCP Development Best Practices: Lessons from Building AI Tools&lt;/h2&gt;

&lt;p&gt;Building this MCP server for Bible study taught me several important lessons about custom AI development:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Start small&lt;/strong&gt;: I didn’t try to build everything all at once. My first version just returned single verses. Then I added search. Iteration is your friend.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Think in tasks, not endpoints&lt;/strong&gt;: I later consolidated six tools into two by grouping related actions (search, verse, passage, chapter into &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bible_content&lt;/code&gt;). Adding a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;response_format&lt;/code&gt; parameter (concise vs. detailed) cut token usage by 60–70%. See my &lt;a href=&quot;https://blog.josephvelliah.com/ai-tool-optimization-guide-mcp-server-case-study&quot;&gt;AI Tool Optimization Guide&lt;/a&gt; for the full approach.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;MCP does the heavy lifting&lt;/strong&gt;: I spent much more time thinking about the Bible data structure than the protocol. MCP simplifies all the connection challenges.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Free tier is powerful&lt;/strong&gt;: I built something useful without spending a dime using Cloudflare Workers and free Bible APIs.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Documentation matters&lt;/strong&gt;: When I got stuck, the MCP documentation and community examples saved me hours of debugging.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;advanced-mcp-features-scaling-your-custom-ai-server&quot;&gt;Advanced MCP Features: Scaling Your Custom AI Server&lt;/h2&gt;

&lt;p&gt;I’ve already optimized the server for token efficiency: tools are consolidated by task, and a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;response_format&lt;/code&gt; parameter lets the agent request concise or detailed responses. Next, I’m considering commentary integration or connecting to my personal notes and highlights. The great thing about MCP is that adding new features doesn’t require rebuilding everything—I just extend the existing tools.&lt;/p&gt;

&lt;h2 id=&quot;the-future-of-ai-customization-why-mcp-matters-for-developers&quot;&gt;The Future of AI Customization: Why MCP Matters for Developers&lt;/h2&gt;

&lt;p&gt;We are witnessing the rise of personalized AI. Not AI that knows everything about everyone, but AI that knows exactly what you need it to know. MCP makes this possible by letting us build connections between AI models and our specific data sources.&lt;/p&gt;

&lt;p&gt;My Bible MCP server is just one example, but it represents something larger: the opening up of AI customization. You don’t need to work at a tech company to empower your AI assistant. You just need an idea and a weekend.&lt;/p&gt;

&lt;p&gt;If you’re interested in building your own MCP server, start with something you care about. For me, it was Bible study. For you, it might be cooking recipes, fitness tracking, or managing your book collection. The technical side is the easy part—the harder part is deciding how you want your AI to assist you.&lt;/p&gt;

&lt;p&gt;Once you experience an AI assistant that truly understands your specific needs and can access your data, you won’t want to go back to generic chatbots.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;em&gt;You can find my &lt;a href=&quot;https://github.com/sprider/cloudflare-mcp-server-bible&quot;&gt;Bible MCP server code on GitHub&lt;/a&gt; if you want to see how it works or build something similar. For more MCP examples, check out the &lt;a href=&quot;https://modelcontextprotocol.io/docs&quot;&gt;official MCP documentation&lt;/a&gt; and &lt;a href=&quot;https://github.com/modelcontextprotocol/servers&quot;&gt;MCP server examples&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
</description>
        <pubDate>Sun, 24 Aug 2025 00:00:00 +0000</pubDate>
        <link>https://blog.josephvelliah.com/bible-mcp-server</link>
        <guid isPermaLink="true">https://blog.josephvelliah.com/bible-mcp-server</guid>
        
        <category>MCP server</category>
        
        <category>Model Context Protocol</category>
        
        <category>API</category>
        
        <category>Claude AI</category>
        
        <category>Cloudflare Workers</category>
        
        <category>AI</category>
        
        <category>LLM</category>
        
        <category>Faith</category>
        
        
      </item>
    
      <item>
        <title>AI Not Working Consistently? Here is How to Control Your Results</title>
<description>&lt;p&gt;Imagine you are using a high-end espresso machine at work. Sometimes you get the perfect cup (rich, smooth, and exactly the right strength). Other times, even when you seem to follow the same process, you end up with bitter, weak, or overpowering coffee. You would probably think the machine is unreliable, right?&lt;/p&gt;

&lt;p&gt;This is what business professionals face with AI tools every day. You ask for a marketing email and receive something brilliant. But ask again with the same prompt, and suddenly it sounds robotic. The issue is not that AI is unreliable; it is that most people do not realize there are hidden settings controlling every response.&lt;/p&gt;

&lt;p&gt;Just like that espresso machine has temperature controls, grind settings, and pressure adjustments you might not notice, AI tools have seven key parameters that act as invisible control knobs. Once you understand what these knobs do, you can stop getting random results and start producing consistently excellent AI responses.&lt;/p&gt;

&lt;p&gt;Today, we are going to uncover these seven hidden controls (LLM parameters) and show you how to adjust them. You will not need any technical expertise—just practical examples that will change your AI results from unpredictable to reliable and consistent.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/07/llm-parameters.svg&quot; alt=&quot;llm-parameters&quot; /&gt;&lt;/p&gt;
</description>
        <pubDate>Sat, 12 Jul 2025 00:00:00 +0000</pubDate>
        <link>https://blog.josephvelliah.com/how-to-control-ai-results</link>
        <guid isPermaLink="true">https://blog.josephvelliah.com/how-to-control-ai-results</guid>
        
        <category>LLM</category>
        
        <category>AI</category>
        
        
      </item>
    
      <item>
        <title>Streamline Location-Relevant Answers with SharePoint, Amazon Nova and Bedrock</title>
<description>&lt;p&gt;Global companies often face challenges in providing employees with location-relevant policies. For instance, leave policies in the USA differ significantly from those in India. When documents are stored together in systems like SharePoint without proper filtering, employees may waste time searching or risk following incorrect policies. And when that unfiltered content is ingested into an Amazon Bedrock knowledge base, the mixed-region documents can surface in answers and produce incorrect guidance.&lt;/p&gt;

&lt;h2 id=&quot;the-solution-metadata-filtering-with-sharepoint-and-amazon-bedrock&quot;&gt;The Solution: Metadata Filtering with SharePoint and Amazon Bedrock&lt;/h2&gt;

&lt;p&gt;By integrating &lt;strong&gt;Amazon Bedrock Knowledge Bases&lt;/strong&gt; with &lt;strong&gt;SharePoint&lt;/strong&gt; and leveraging metadata filtering, companies can create intelligent Retrieval-Augmented Generation (RAG) systems. These systems automatically retrieve relevant policy documents based on location filters so employees get the right location-relevant information.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/04/aws-bedrock-kb-sharepoint.svg&quot; alt=&quot;aws-bedrock-kb-sharepoint&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;how-it-works-in-simple-terms&quot;&gt;How It Works (In Simple Terms)&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Organize Documents in SharePoint&lt;/strong&gt;: Assign metadata (e.g., country-relevant tags) to each document.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Connect SharePoint to Amazon Bedrock&lt;/strong&gt;: Sync SharePoint as a data source for Amazon Bedrock Knowledge Bases.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Apply Metadata Filters&lt;/strong&gt;: Use filters to retrieve only location-relevant content when employees query the system.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;real-world-example-leave-policies&quot;&gt;Real-World Example: Leave Policies&lt;/h2&gt;

&lt;p&gt;Consider leave policies for the USA and India:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;USA Policy&lt;/strong&gt;: Based on ACME Corporation’s USA Employee Leave Policy, employees receive Vacation Leave of 10 days (80 hours) for 0-2 years of service and Sick Leave of 5 days (40 hours) per calendar year. Additionally, employees receive 11 paid holidays, bereavement leave, and jury duty leave. Eligible employees may receive up to 12 weeks of parental leave.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;India Policy&lt;/strong&gt;: According to ACME Corporation India’s leave policy, employees are entitled to Privilege/Earned Leave of 24 days per year, Sick/Casual Leave of 12 days per calendar year, and 2 optional holidays per year. The policy also covers other leave types such as Maternity Leave of 26 weeks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Disclaimer: The leave policies uploaded to SharePoint for this demonstration were created using AI. The AI-generated policies are for illustrative purposes only.&lt;/p&gt;

&lt;p&gt;Using metadata filtering:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Employees in the USA see only the USA policy.&lt;/li&gt;
  &lt;li&gt;Employees in India see only the India policy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This eliminates confusion and ensures compliance.&lt;/p&gt;

&lt;h2 id=&quot;implementation-steps&quot;&gt;Implementation Steps&lt;/h2&gt;

&lt;h3 id=&quot;add-metadata-to-your-sharepoint-documents&quot;&gt;Add metadata to your SharePoint documents&lt;/h3&gt;

&lt;p&gt;First, ensure your documents have the right metadata in SharePoint:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Use the default Title column in your SharePoint document library&lt;/li&gt;
  &lt;li&gt;Assign “Leave_Policy_USA” or “Leave_Policy_India” to the appropriate documents&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/04/aws-bedrock-sp-0.png&quot; alt=&quot;aws-bedrock-sp-0&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;set-up-a-connection-between-sharepoint-and-amazon-bedrock&quot;&gt;Set up a connection between SharePoint and Amazon Bedrock&lt;/h3&gt;

&lt;p&gt;Next, &lt;a href=&quot;https://docs.aws.amazon.com/bedrock/latest/userguide/sharepoint-data-source-connector.html&quot;&gt;set up&lt;/a&gt; a connection between SharePoint and Amazon Bedrock:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;In AWS console, create a new Knowledge Base&lt;/li&gt;
  &lt;li&gt;Select SharePoint as your data source&lt;/li&gt;
  &lt;li&gt;Set up SharePoint App-Only authentication to connect to SharePoint&lt;/li&gt;
  &lt;li&gt;Sync the data source to begin indexing content from SharePoint&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/04/aws-bedrock-sp-5.png&quot; alt=&quot;aws-bedrock-sp-5&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/04/aws-bedrock-sp-10.png&quot; alt=&quot;aws-bedrock-sp-10&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/04/aws-bedrock-sp-11.png&quot; alt=&quot;aws-bedrock-sp-11&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Note: I’m still exploring how custom metadata columns can be used for unstructured data formats. If I find a solution, I’ll create a separate blog post. For now, we’ll focus on using the out-of-the-box metadata fields generated by the OpenSearch collection.&lt;/p&gt;

&lt;h3 id=&quot;test-metadata-filtering-using-sample-queries-to-ensure-accuracy&quot;&gt;Test metadata filtering using sample queries to ensure accuracy&lt;/h3&gt;

&lt;p&gt;Let us test a few questions both with and without filters to see how the selected model generates responses. This will help demonstrate the difference in relevance and accuracy when metadata filtering is used. For this example, I’ve used the Nova Pro 1.0 model to generate the responses.&lt;/p&gt;

&lt;h3 id=&quot;no-filter&quot;&gt;No Filter&lt;/h3&gt;

&lt;p&gt;As you can see, the answers are a mix of both USA and India policies, with chunks being pulled from documents for both regions.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/04/aws-bedrock-sp-6.png&quot; alt=&quot;aws-bedrock-sp-6&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;with-x-amz-bedrock-kb-title--leave_policy_usa-filter&quot;&gt;With x-amz-bedrock-kb-title ^ Leave_Policy_USA Filter&lt;/h3&gt;

&lt;p&gt;With the filter x-amz-bedrock-kb-title ^ Leave_Policy_USA, the response is clearly relevant to the USA, showing only the relevant policy for that region.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/04/aws-bedrock-sp-7.png&quot; alt=&quot;aws-bedrock-sp-7&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;with-x-amz-bedrock-kb-title--leave_policy_india-filter&quot;&gt;With x-amz-bedrock-kb-title ^ Leave_Policy_India Filter&lt;/h3&gt;

&lt;p&gt;With the filter x-amz-bedrock-kb-title ^ Leave_Policy_India, the response is clearly relevant to India, showing only the relevant policy for that region.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/04/aws-bedrock-sp-8.png&quot; alt=&quot;aws-bedrock-sp-8&quot; /&gt;&lt;/p&gt;
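
&lt;p&gt;The same filter can be applied programmatically. Here is a minimal Python sketch against the Bedrock Agent Runtime Retrieve API (the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;^&lt;/code&gt; operator in the console corresponds to a startsWith filter; the knowledge base ID and query below are placeholders):&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def title_filter_config(prefix, num_results=5):
    # Build a vector-search configuration that keeps only chunks whose
    # built-in x-amz-bedrock-kb-title field starts with the given prefix
    return {
        'vectorSearchConfiguration': {
            'numberOfResults': num_results,
            'filter': {
                'startsWith': {'key': 'x-amz-bedrock-kb-title', 'value': prefix}
            },
        }
    }

# import boto3
# client = boto3.client('bedrock-agent-runtime')
# response = client.retrieve(
#     knowledgeBaseId='KB_ID',  # placeholder
#     retrievalQuery={'text': 'How many sick days do I get?'},
#     retrievalConfiguration=title_filter_config('Leave_Policy_USA'),
# )
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Swapping the prefix to Leave_Policy_India scopes the same query to the India documents.&lt;/p&gt;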

&lt;h2 id=&quot;benefits-of-metadata-filtering&quot;&gt;Benefits of Metadata Filtering&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Accurate Information&lt;/strong&gt;: Employees access policies relevant to their region.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Time-Saving&lt;/strong&gt;: Reduces time spent sifting through irrelevant documents.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Improved Compliance&lt;/strong&gt;: Ensures employees follow the correct policies.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Centralized Management&lt;/strong&gt;: All policies remain in one system for easy updates.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Combining SharePoint’s document management capabilities with Amazon Bedrock’s metadata filtering creates a powerful solution for global organizations. This approach simplifies policy management and ensures employees receive accurate, location-relevant information without requiring complex coding or major system changes.&lt;/p&gt;
</description>
        <pubDate>Sat, 19 Apr 2025 00:00:00 +0000</pubDate>
        <link>https://blog.josephvelliah.com/sharepoint-amazon-bedrock</link>
        <guid isPermaLink="true">https://blog.josephvelliah.com/sharepoint-amazon-bedrock</guid>
        
        <category>Amazon Bedrock</category>
        
        <category>SharePoint</category>
        
        <category>RAG</category>
        
        <category>Knowledge Base</category>
        
        
      </item>
    
      <item>
        <title>Docker Model Runner: Run AI Models Locally with Seamless Integration</title>
        <description>&lt;p&gt;In the fast-paced world of AI development, tools that simplify the process of running and integrating AI models locally are in high demand. Docker’s latest beta feature, Model Runner, offers developers a seamless way to work with AI models directly within their existing Docker environment. This article explores the features, benefits, and practical applications of Docker Model Runner, making it an essential resource for developers looking to optimize their AI workflows.&lt;/p&gt;

&lt;h2 id=&quot;what-is-docker-model-runner&quot;&gt;What is Docker Model Runner?&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Docker Model Runner&lt;/strong&gt; is a beta &lt;a href=&quot;https://docs.docker.com/desktop/features/model-runner/&quot;&gt;feature&lt;/a&gt; designed for &lt;strong&gt;Docker Desktop&lt;/strong&gt; users, enabling them to download, run, and manage &lt;strong&gt;AI models locally&lt;/strong&gt;. By pulling models from &lt;strong&gt;Docker Hub&lt;/strong&gt;, storing them locally, and loading them into memory only when needed, it optimizes system resources. For developers already familiar with Docker’s containerization tools, Model Runner provides a streamlined experience with OpenAI-compatible APIs for easy integration into applications.&lt;/p&gt;

&lt;h3 id=&quot;key-benefits-of-docker-model-runner&quot;&gt;Key Benefits of Docker Model Runner&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Local AI model management&lt;/strong&gt;: Pull, run, and remove models directly from the command line.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Resource optimization&lt;/strong&gt;: Models are only loaded at runtime and unloaded when idle.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;OpenAI-compatible APIs&lt;/strong&gt;: Simplify integration with existing applications.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Familiar workflows&lt;/strong&gt;: Leverage Docker commands you already know.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;key-features-of-docker-model-runner&quot;&gt;Key Features of Docker Model Runner&lt;/h2&gt;

&lt;p&gt;To make the most of Docker Model Runner, here are its standout features:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Pull AI models&lt;/strong&gt; directly from Docker Hub.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Run models locally&lt;/strong&gt; using simple commands.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Manage local models&lt;/strong&gt; with options to add, list, or remove them.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Interact with models&lt;/strong&gt; via prompts or chat mode.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Optimize resource usage&lt;/strong&gt;, ensuring efficient memory management.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Access OpenAI-compatible APIs&lt;/strong&gt;, enabling seamless integration into your applications.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These features make Docker Model Runner a game-changer for developers aiming to build custom AI assistants or agents.&lt;/p&gt;

&lt;h2 id=&quot;how-to-get-started-with-docker-model-runner&quot;&gt;How to Get Started with Docker Model Runner&lt;/h2&gt;

&lt;h3 id=&quot;prerequisites&quot;&gt;Prerequisites&lt;/h3&gt;

&lt;p&gt;To start using Docker Model Runner, you’ll need:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Docker Desktop version 4.40 or later&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;A Mac with Apple Silicon (the only platform supported at the time of writing)&lt;/li&gt;
  &lt;li&gt;Beta features enabled in Docker Desktop under “Features in development”&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;basic-commands&quot;&gt;Basic Commands&lt;/h3&gt;

&lt;p&gt;Here’s a quick guide to essential commands:&lt;/p&gt;

&lt;p&gt;Check if Model Runner is active&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker model status
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/04/docker-model-status.png&quot; alt=&quot;docker-model-status&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Pull a model from Docker Hub&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker model pull ai/smollm2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/04/docker-model-pull.png&quot; alt=&quot;docker-model-pull&quot; /&gt;&lt;/p&gt;

&lt;p&gt;List downloaded models&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker model list
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/04/docker-model-list.png&quot; alt=&quot;docker-model-list&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Run a model with a single prompt&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker model run ai/smollm2 &lt;span class=&quot;s2&quot;&gt;&quot;What is Kubernetes?&quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/04/docker-model-run.png&quot; alt=&quot;docker-model-run&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Remove a model&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker model &lt;span class=&quot;nb&quot;&gt;rm &lt;/span&gt;ai/smollm2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/04/docker-model-rm.png&quot; alt=&quot;docker-model-rm&quot; /&gt;&lt;/p&gt;

&lt;p&gt;These commands allow you to efficiently manage and interact with AI models directly from your terminal.&lt;/p&gt;

&lt;h2 id=&quot;building-ai-assistants-with-docker-model-runner&quot;&gt;Building AI Assistants with Docker Model Runner&lt;/h2&gt;

&lt;p&gt;One of the most exciting use cases for Docker Model Runner is building custom AI assistants. Developers can integrate these assistants into applications using OpenAI-compatible APIs. Here’s how you can access these APIs:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;From Within Containers&lt;/strong&gt;:&lt;/p&gt;

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;   &lt;span class=&quot;c&quot;&gt;#!/bin/sh&lt;/span&gt;

   curl http://model-runner.docker.internal/engines/llama.cpp/v1/chat/completions &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
      &lt;span class=&quot;nt&quot;&gt;-H&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;Content-Type: application/json&quot;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
      &lt;span class=&quot;nt&quot;&gt;-d&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;{
         &quot;model&quot;: &quot;ai/smollm2&quot;,
         &quot;messages&quot;: [
               {
                  &quot;role&quot;: &quot;system&quot;,
                  &quot;content&quot;: &quot;You are a helpful assistant.&quot;
               },
               {
                  &quot;role&quot;: &quot;user&quot;,
                  &quot;content&quot;: &quot;What is Kubernetes?&quot;
               }
         ]
      }&apos;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;From the Host (Unix Socket)&lt;/strong&gt;:&lt;/p&gt;

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;   &lt;span class=&quot;c&quot;&gt;#!/bin/sh&lt;/span&gt;

   curl &lt;span class=&quot;nt&quot;&gt;--unix-socket&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$HOME&lt;/span&gt;/.docker/run/docker.sock &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
      localhost/exp/vDD4.40/engines/llama.cpp/v1/chat/completions &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
      &lt;span class=&quot;nt&quot;&gt;-H&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;Content-Type: application/json&quot;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
      &lt;span class=&quot;nt&quot;&gt;-d&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;{
         &quot;model&quot;: &quot;ai/smollm2&quot;,
         &quot;messages&quot;: [
               {
                  &quot;role&quot;: &quot;system&quot;,
                  &quot;content&quot;: &quot;You are a helpful assistant.&quot;
               },
               {
                  &quot;role&quot;: &quot;user&quot;,
                  &quot;content&quot;: &quot;What is Kubernetes?&quot;
               }
         ]
      }&apos;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;From the host using TCP&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;If you prefer to interact with the API directly from your host machine over TCP rather than through the Docker socket, you can enable this functionality. TCP support can be activated either through the Docker Desktop graphical interface or from the command line with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker desktop enable model-runner --tcp &amp;lt;port&amp;gt;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Once TCP support is enabled, you can communicate with the API through localhost using either your specified port number or the default port, following the same request format shown in the previous examples.&lt;/p&gt;

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;   &lt;span class=&quot;c&quot;&gt;#!/bin/sh&lt;/span&gt;

   curl http://localhost:12434/engines/llama.cpp/v1/chat/completions &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
      &lt;span class=&quot;nt&quot;&gt;-H&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;Content-Type: application/json&quot;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
      &lt;span class=&quot;nt&quot;&gt;-d&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;{
         &quot;model&quot;: &quot;ai/smollm2&quot;,
         &quot;messages&quot;: [
               {
                  &quot;role&quot;: &quot;system&quot;,
                  &quot;content&quot;: &quot;You are a helpful assistant.&quot;
               },
               {
                  &quot;role&quot;: &quot;user&quot;,
                  &quot;content&quot;: &quot;What is Kubernetes?&quot;
               }
         ]
      }&apos;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For hands-on examples, check out the official &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hello-genai&lt;/code&gt; repository at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;https://github.com/docker/hello-genai.git&lt;/code&gt;. It includes sample applications in Python, Node.js, and Go.&lt;/p&gt;

&lt;h2 id=&quot;where-to-find-models&quot;&gt;Where to Find Models&lt;/h2&gt;

&lt;p&gt;Docker provides an extensive collection of pre-trained AI models on its &lt;strong&gt;Gen AI Catalog&lt;/strong&gt; at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;https://hub.docker.com/catalogs/gen-ai&lt;/code&gt;. Popular options include:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;SmolLM2&lt;/strong&gt;: Tiny LLM built for speed, edge devices, and local development.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Llama Models&lt;/strong&gt;: Available in various sizes for different use cases.&lt;/li&gt;
  &lt;li&gt;Other optimized models tailored for specific applications.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This centralized hub simplifies finding and deploying the right model for your needs.&lt;/p&gt;

&lt;h2 id=&quot;known-limitations&quot;&gt;Known Limitations&lt;/h2&gt;

&lt;p&gt;While Docker Model Runner shows great promise, it’s important to note some current limitations:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Lack of safeguards for oversized models that may exceed system resources.&lt;/li&gt;
  &lt;li&gt;Chat interface may still launch even if the model pull fails.&lt;/li&gt;
  &lt;li&gt;Progress reporting during model pulls can be inconsistent.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These issues are expected to improve as the feature evolves beyond its beta phase.&lt;/p&gt;

&lt;h2 id=&quot;why-choose-docker-model-runner&quot;&gt;Why Choose Docker Model Runner?&lt;/h2&gt;

&lt;p&gt;For developers already working within the Docker ecosystem, Model Runner offers several compelling advantages:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Unified platform&lt;/strong&gt;: Manage both containers and AI models in one environment.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Familiar commands&lt;/strong&gt;: No steep learning curve for existing Docker users.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Resource efficiency&lt;/strong&gt;: Load models only when needed to save memory.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Seamless integration&lt;/strong&gt;: Easily connect AI capabilities to your applications via OpenAI-compatible APIs.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By leveraging these benefits, developers can enhance their productivity while simplifying their workflows.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;As artificial intelligence becomes increasingly integral to modern applications, tools like Docker Model Runner are paving the way for more accessible and efficient development processes. With its ability to integrate seamlessly into existing workflows while optimizing resource usage, this beta feature holds immense potential for developers and DevOps engineers alike.&lt;/p&gt;

&lt;p&gt;Start exploring Docker Model Runner today and take your AI development workflow to the next level!&lt;/p&gt;
</description>
        <pubDate>Mon, 07 Apr 2025 00:00:00 +0000</pubDate>
        <link>https://blog.josephvelliah.com/docker_model_runner_intro</link>
        <guid isPermaLink="true">https://blog.josephvelliah.com/docker_model_runner_intro</guid>
        
        <category>Docker</category>
        
        <category>AI</category>
        
        <category>LLM</category>
        
        
      </item>
    
      <item>
        <title>DeepSeek vs. OpenAI: The AI Race Heats Up!</title>
        <description>&lt;p&gt;The world is buzzing about how DeepSeek has outperformed OpenAI on various benchmarks. This got me thinking that AI is evolving at an incredible speed, with endless opportunities. But beyond competition, what excites me is how AI can meaningfully solve real-world problems.&lt;/p&gt;

&lt;h2 id=&quot;-a-personal-story&quot;&gt;💡 A Personal Story&lt;/h2&gt;

&lt;p&gt;Due to my job, I have traveled to various cities in India and abroad. No matter where I go, one of the first things I look for is a church to attend. Over the years, I have noticed a familiar pattern: churches want to engage members meaningfully, and members wish to contribute through volunteering (kids’ ministry, small groups, outreach, etc.). However, most churches still rely on weekly announcements and flyers to match people with opportunities.&lt;/p&gt;

&lt;p&gt;That is when I thought: Can AI help? 🤔&lt;/p&gt;

&lt;h2 id=&quot;-introducing-ministry-matcher&quot;&gt;🎯 Introducing Ministry Matcher&lt;/h2&gt;

&lt;p&gt;I built an AI-driven hobby project “Ministry Matcher” to connect church members with service opportunities based on their backgrounds and interests. Instead of waiting for announcements, members can explore personalized recommendations in seconds!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/02/mm-app1.png&quot; alt=&quot;Ministry Matcher screenshot&quot; /&gt;&lt;/p&gt;

&lt;p&gt;To keep this hobby project simple, I wrapped the logic in Python using OpenAI’s chat completion API with custom prompt instructions (which can be changed at any time), packaged it in a Docker image, and deployed it to Azure Container Apps.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/02/mm-app2.png&quot; alt=&quot;Ministry Matcher screenshot&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2025/02/mm-app3.png&quot; alt=&quot;Ministry Matcher screenshot&quot; /&gt;&lt;/p&gt;
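&lt;p&gt;Under the hood there is little more than a container image around the API call. As a rough sketch (the base image, file names, environment variables, and port below are illustrative placeholders, not the actual project files), the Dockerfile looks something like this:&lt;/p&gt;

&lt;div class=&quot;language-dockerfile highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# The API key and the prompt instructions are injected at deploy time,
# so the matching behaviour can be changed without rebuilding the image.
ENV OPENAI_API_KEY=&quot;&quot; PROMPT_INSTRUCTIONS=&quot;&quot;
EXPOSE 8000
CMD [&quot;python&quot;, &quot;app.py&quot;]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;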

&lt;p&gt;The best part? This concept can be applied well beyond churches:&lt;/p&gt;

&lt;p&gt;✅ New employees in a company finding the right team activities&lt;/p&gt;

&lt;p&gt;✅ New customers in a bank discovering tailored services&lt;/p&gt;

&lt;p&gt;✅ Community groups onboarding new members seamlessly&lt;/p&gt;

&lt;p&gt;🌟 AI is more than just a race between models. It is about impact. Let us use technology to make life easier and more meaningful.&lt;/p&gt;

&lt;p&gt;I would love to hear your thoughts! What are some ways AI can help in community engagement? 💙&lt;/p&gt;
</description>
        <pubDate>Sat, 01 Feb 2025 00:00:00 +0000</pubDate>
        <link>https://blog.josephvelliah.com/deepseek-vs-openai-the-ai-race-heats-up</link>
        <guid isPermaLink="true">https://blog.josephvelliah.com/deepseek-vs-openai-the-ai-race-heats-up</guid>
        
        <category>AI</category>
        
        <category>DeepSeek</category>
        
        <category>OpenAI</category>
        
        <category>Faith</category>
        
        <category>Community</category>
        
        <category>Azure</category>
        
        <category>Docker</category>
        
        
      </item>
    
      <item>
        <title>Modernizing Bot Infrastructure: A Kubernetes Success Story</title>
        <description>&lt;p&gt;I led a project transforming our scattered bot infrastructure to Kubernetes. With bots spread across multiple servers and tech stacks, our teams faced maintenance challenges and rising costs.&lt;/p&gt;

&lt;h2 id=&quot;-the-challenge&quot;&gt;🎲 The challenge&lt;/h2&gt;

&lt;p&gt;Bots had been created for various projects using different tech stacks and deployed across multiple servers. This created a complex system with:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Inconsistent deployment processes&lt;/li&gt;
  &lt;li&gt;Varied maintenance requirements&lt;/li&gt;
  &lt;li&gt;Redundant infrastructure costs&lt;/li&gt;
  &lt;li&gt;Limited scalability options&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;how-we-tackled&quot;&gt;How We Tackled It&lt;/h2&gt;

&lt;p&gt;💪 Here is how we tackled it at a high level using the Assess, Mobilize, and Modernize framework:&lt;/p&gt;

&lt;h3 id=&quot;-assess-aws-application-discovery-service-ads-revealed-crucial-insights&quot;&gt;🔍 Assess: AWS Application Discovery Service (ADS) revealed crucial insights&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Mapped bot dependencies across different environments&lt;/li&gt;
  &lt;li&gt;Identified resource utilization overlap&lt;/li&gt;
  &lt;li&gt;Uncovered opportunities to standardize common functionalities&lt;/li&gt;
  &lt;li&gt;Created detailed migration paths for each bot’s unique requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;️-mobilize-established-our-kubernetes-foundation&quot;&gt;🏗️ Mobilize: Established our Kubernetes foundation&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Prepared an existing Kubernetes cluster for hosting bot applications&lt;/li&gt;
  &lt;li&gt;Created standardized templates for bot containerization&lt;/li&gt;
  &lt;li&gt;Conducted hands-on workshops for team upskilling&lt;/li&gt;
  &lt;li&gt;Implemented centralized monitoring and logging&lt;/li&gt;
&lt;/ul&gt;
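&lt;p&gt;In spirit, the standardized template was a parameterized Deployment like the sketch below; the names, labels, registry, and probe path are illustrative placeholders rather than our actual manifests:&lt;/p&gt;

&lt;div class=&quot;language-yaml highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-bot                # placeholder: one Deployment per bot
  labels:
    app: example-bot
    team: bots                     # shared label for centralized monitoring
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-bot
  template:
    metadata:
      labels:
        app: example-bot
    spec:
      containers:
      - name: bot
        image: registry.example.com/bots/example-bot:1.0.0  # placeholder image
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 256Mi
        livenessProbe:
          httpGet:
            path: /healthz         # placeholder health endpoint
            port: 8080
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;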

&lt;h3 id=&quot;-modernize-executed-our-transformation&quot;&gt;⚡ Modernize: Executed our transformation&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Refactored bots into containerized applications&lt;/li&gt;
  &lt;li&gt;Established automated testing and validation&lt;/li&gt;
  &lt;li&gt;Deployed the bots via DevSecOps pipelines&lt;/li&gt;
  &lt;li&gt;Monitored and refined deployed resources&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;-key-learnings&quot;&gt;📕 Key Learnings&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Using AWS Application Discovery Service helped us understand how our systems were connected and being used, which guided our migration planning&lt;/li&gt;
  &lt;li&gt;Team adoption hinged on hands-on workshops and thorough documentation&lt;/li&gt;
  &lt;li&gt;Standardized templates accelerated the containerization process&lt;/li&gt;
  &lt;li&gt;Ongoing feedback loops played a crucial role in improving our migration approach&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;-impact&quot;&gt;🎯 Impact&lt;/h2&gt;

&lt;p&gt;The migration changed our operations. Deployment cycles shrank from hours to minutes. We cut our monthly spending by 60%. Our new infrastructure maintains consistent uptime with zero-downtime deployments as standard practice.&lt;/p&gt;

&lt;p&gt;The impact extended beyond technical enhancements. The shift reshaped our work culture: development cycles moved faster, inspiring innovation throughout our projects, and teams that used to work separately began collaborating regularly, exchanging knowledge and resources.&lt;/p&gt;

&lt;p&gt;🤝 Would love to hear your modernization story! What challenges have you encountered so far?&lt;/p&gt;
</description>
        <pubDate>Sun, 26 Jan 2025 00:00:00 +0000</pubDate>
        <link>https://blog.josephvelliah.com/modernizing-bot-infrastructure-a-kubernetes-success-story</link>
        <guid isPermaLink="true">https://blog.josephvelliah.com/modernizing-bot-infrastructure-a-kubernetes-success-story</guid>
        
        <category>Cloud Computing</category>
        
        <category>AWS</category>
        
        <category>Docker</category>
        
        <category>Kubernetes</category>
        
        <category>DevOps</category>
        
        <category>Software Engineering</category>
        
        <category>Migration</category>
        
        
      </item>
    
      <item>
        <title>Hidden Powers of Kubernetes Sidecar Containers: Beyond the Basics</title>
        <description>&lt;p&gt;While most Kubernetes engineers know what sidecar containers are and why they are necessary for logging and service mesh, these useful little components have other less obvious capabilities that are not well known. Let’s look at some powerful features that could revolutionize your containerized applications.&lt;/p&gt;

&lt;h2 id=&quot;shared-filesystem-superpowers&quot;&gt;Shared Filesystem Superpowers&lt;/h2&gt;

&lt;p&gt;Shared volume mounts let a sidecar container work directly with the main container’s filesystem. This enables real-time file watching, dynamic SSL certificate renewal, and live log processing: imagine hot-reloading configuration or running security scans without ever touching the primary application.&lt;/p&gt;

&lt;p&gt;This example demonstrates filesystem sharing between containers in a pod using an emptyDir volume. The main nginx container and a sidecar container share access to the same volume, where the sidecar writes a timestamp every 10 seconds that nginx then serves as its index page. This showcases how containers within a pod can communicate through a shared filesystem.&lt;/p&gt;

&lt;p&gt;shared-filesystem.yaml&lt;/p&gt;

&lt;div class=&quot;language-yaml highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;na&quot;&gt;apiVersion&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;v1&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;kind&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;Pod&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;metadata&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;shared-filesystem-demo&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;spec&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;containers&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;main-app&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;nginx:latest&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;volumeMounts&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;shared-data&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;mountPath&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;/usr/share/nginx/html&lt;/span&gt;
  &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;sidecar-config&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;busybox&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;command&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;/bin/sh&quot;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;-c&quot;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;while&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;true;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;do&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;/data/index.html;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;sleep&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;10;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;done&quot;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;volumeMounts&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;shared-data&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;mountPath&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;/data&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;volumes&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;shared-data&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;emptyDir&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# Deploy the pod&lt;/span&gt;
kubectl apply &lt;span class=&quot;nt&quot;&gt;-f&lt;/span&gt; shared-filesystem.yaml

&lt;span class=&quot;c&quot;&gt;# Watch the main container&apos;s filesystem changes&lt;/span&gt;
kubectl &lt;span class=&quot;nb&quot;&gt;exec &lt;/span&gt;shared-filesystem-demo &lt;span class=&quot;nt&quot;&gt;-c&lt;/span&gt; main-app &lt;span class=&quot;nt&quot;&gt;--&lt;/span&gt; /bin/sh &lt;span class=&quot;nt&quot;&gt;-c&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;while true; do clear; cat /usr/share/nginx/html/index.html; sleep 2; done&apos;&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# Verify the content is updating&lt;/span&gt;
kubectl port-forward shared-filesystem-demo 8080:80
curl localhost:8080
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;the-init-sidecar-pattern&quot;&gt;The Init Sidecar Pattern&lt;/h2&gt;

&lt;p&gt;Less well-known is the init sidecar pattern, which pairs an init container that must finish before the main container starts with a long-running sidecar that keeps the same data fresh afterwards. This pattern is particularly useful for dynamic resource configuration and runtime dependency injection, offering more flexibility than init containers alone.&lt;/p&gt;

&lt;p&gt;This Kubernetes manifest demonstrates sequential configuration management across three containers: an init container writes an initial config file, the main container then reads that file every 30 seconds, and a sidecar container simultaneously refreshes it with a timestamp every 60 seconds. All three share a common emptyDir volume to enable this file-based communication.&lt;/p&gt;

&lt;p&gt;init-sidecar.yaml&lt;/p&gt;

&lt;div class=&quot;language-yaml highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;na&quot;&gt;apiVersion&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;v1&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;kind&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;Pod&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;metadata&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;init-sidecar-demo&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;spec&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;initContainers&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;init-config&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;busybox&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;command&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;sh&apos;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;-c&apos;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;echo&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Initial&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;config&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;/config/config.ini&apos;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;volumeMounts&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;config-vol&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;mountPath&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;/config&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;containers&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;main-app&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;ubuntu:latest&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;command&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;sh&apos;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;-c&apos;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;while&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;true;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;do&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;cat&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;/app/config.ini;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;sleep&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;30;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;done&apos;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;volumeMounts&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;config-vol&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;mountPath&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;/app&lt;/span&gt;
  &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;sidecar-config-updater&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;busybox&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;command&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;sh&apos;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;-c&apos;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;while&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;true;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;do&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;echo&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Updated&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;$(date)&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;/config/config.ini;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;sleep&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;60;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;done&apos;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;volumeMounts&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;config-vol&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;mountPath&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;/config&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;volumes&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;config-vol&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;emptyDir&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# Deploy and monitor&lt;/span&gt;
kubectl apply &lt;span class=&quot;nt&quot;&gt;-f&lt;/span&gt; init-sidecar.yaml
kubectl logs init-sidecar-demo &lt;span class=&quot;nt&quot;&gt;-c&lt;/span&gt; main-app
kubectl logs init-sidecar-demo &lt;span class=&quot;nt&quot;&gt;-c&lt;/span&gt; sidecar-config-updater
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;process-level-communication&quot;&gt;Process-Level Communication&lt;/h2&gt;

&lt;p&gt;When a pod is configured with shareProcessNamespace: true, a sidecar can send UNIX signals to processes inside the main container, enabling graceful shutdowns, health management, and other sophisticated debugging techniques that standard container isolation does not allow.&lt;/p&gt;

&lt;p&gt;This manifest demonstrates inter-container process communication using Linux signals. With the shared process namespace enabled, the signal-sender container sends a SIGHUP signal every 10 seconds to the main container, which uses a trap handler to print “Received SIGHUP!” whenever the signal arrives, while also printing the date every 5 seconds.&lt;/p&gt;

&lt;p&gt;signal-demo.yaml&lt;/p&gt;

&lt;div class=&quot;language-yaml highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;na&quot;&gt;apiVersion&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;v1&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;kind&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;Pod&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;metadata&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;signal-demo&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;spec&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;shareProcessNamespace&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# This enables processes to see each other across containers&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;containers&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;main-app&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;busybox&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;command&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;/bin/sh&quot;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;-c&quot;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;|&lt;/span&gt;
        &lt;span class=&quot;s&quot;&gt;echo &quot;Starting main process...&quot;&lt;/span&gt;
        &lt;span class=&quot;s&quot;&gt;# trap command sets up a signal handler&lt;/span&gt;
        &lt;span class=&quot;s&quot;&gt;# When SIGHUP is received, it will execute &apos;echo &quot;Received SIGHUP!&quot;&apos;&lt;/span&gt;
        &lt;span class=&quot;s&quot;&gt;trap &apos;echo &quot;Received SIGHUP!&quot;&apos; HUP&lt;/span&gt;
        &lt;span class=&quot;s&quot;&gt;while true; do&lt;/span&gt;
          &lt;span class=&quot;s&quot;&gt;date&lt;/span&gt;
          &lt;span class=&quot;s&quot;&gt;sleep 5&lt;/span&gt;
        &lt;span class=&quot;s&quot;&gt;done&lt;/span&gt;
  &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;signal-sender&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;busybox&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;command&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;/bin/sh&quot;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;-c&quot;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;|&lt;/span&gt;
        &lt;span class=&quot;s&quot;&gt;sleep 2&lt;/span&gt;
        &lt;span class=&quot;s&quot;&gt;while true; do&lt;/span&gt;
          &lt;span class=&quot;s&quot;&gt;echo &quot;Sending SIGHUP to main process...&quot;&lt;/span&gt;
          &lt;span class=&quot;s&quot;&gt;# pkill finds and sends signals to processes based on their name/pattern&lt;/span&gt;
          &lt;span class=&quot;s&quot;&gt;# -HUP: sends SIGHUP signal&lt;/span&gt;
          &lt;span class=&quot;s&quot;&gt;# -f: matches against full command line&lt;/span&gt;
          &lt;span class=&quot;s&quot;&gt;# &quot;Starting main process&quot;: pattern to match (from the echo in main-app)&lt;/span&gt;
          &lt;span class=&quot;s&quot;&gt;pkill -HUP -f &quot;Starting main process&quot;&lt;/span&gt;
          &lt;span class=&quot;s&quot;&gt;sleep 10&lt;/span&gt;
        &lt;span class=&quot;s&quot;&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# Deploy the signal demo&lt;/span&gt;
kubectl apply &lt;span class=&quot;nt&quot;&gt;-f&lt;/span&gt; signal-demo.yaml

&lt;span class=&quot;c&quot;&gt;# Watch the logs from both containers&lt;/span&gt;
kubectl logs signal-demo &lt;span class=&quot;nt&quot;&gt;-c&lt;/span&gt; main-app &lt;span class=&quot;nt&quot;&gt;-f&lt;/span&gt;
kubectl logs signal-demo &lt;span class=&quot;nt&quot;&gt;-c&lt;/span&gt; signal-sender &lt;span class=&quot;nt&quot;&gt;-f&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;dynamic-configuration-management&quot;&gt;Dynamic Configuration Management&lt;/h2&gt;

&lt;p&gt;Sidecar containers can surface pod metadata through the Kubernetes Downward API even after the pod has been created: values projected as volume files are refreshed by the kubelet when the underlying metadata changes, whereas Downward API environment variables are fixed at container start. Combining this with the shared filesystem capability allows you to perform runtime secret rotation and other adaptive container techniques.&lt;/p&gt;

&lt;p&gt;The manifest creates a pod where a sidecar container updates the pod’s version label every 30 seconds using the current timestamp, while the main container continuously reads this version through a Downward API volume mount and prints it alongside the current time. The RBAC configuration grants the pod permissions to modify its own labels.&lt;/p&gt;

&lt;p&gt;env-inherit.yaml&lt;/p&gt;

&lt;div class=&quot;language-yaml highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;na&quot;&gt;apiVersion&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;v1&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;kind&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;Pod&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;metadata&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;env-inherit-demo&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;labels&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;version&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;1.0.0&quot;&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;spec&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;serviceAccountName&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;pod-labeler&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;containers&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;main-app&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;ubuntu:latest&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;command&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;/bin/bash&apos;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;-c&apos;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;|&lt;/span&gt;
        &lt;span class=&quot;s&quot;&gt;while true; do &lt;/span&gt;
          &lt;span class=&quot;s&quot;&gt;echo &quot;Current time: $(date)&quot;&lt;/span&gt;
          &lt;span class=&quot;s&quot;&gt;echo &quot;APP_VERSION=$(cat /etc/podinfo/version)&quot;&lt;/span&gt;
          &lt;span class=&quot;s&quot;&gt;sleep 10&lt;/span&gt;
        &lt;span class=&quot;s&quot;&gt;done&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;volumeMounts&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;podinfo&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;mountPath&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;/etc/podinfo&lt;/span&gt;
  &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;version-updater&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;bitnami/kubectl&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;command&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;/bin/bash&apos;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;-c&apos;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;|&lt;/span&gt;
        &lt;span class=&quot;s&quot;&gt;while true; do&lt;/span&gt;
          &lt;span class=&quot;s&quot;&gt;NEW_VERSION=&quot;$(date +%H-%M-%S)&quot;&lt;/span&gt;
          &lt;span class=&quot;s&quot;&gt;echo &quot;Updating version to $NEW_VERSION&quot;&lt;/span&gt;
          &lt;span class=&quot;s&quot;&gt;kubectl label pod $POD_NAME version=$NEW_VERSION --overwrite&lt;/span&gt;
          &lt;span class=&quot;s&quot;&gt;sleep 30&lt;/span&gt;
        &lt;span class=&quot;s&quot;&gt;done&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;POD_NAME&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;valueFrom&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;fieldRef&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
          &lt;span class=&quot;na&quot;&gt;fieldPath&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;metadata.name&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;volumes&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;podinfo&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;downwardAPI&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;items&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;version&quot;&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;fieldRef&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
          &lt;span class=&quot;na&quot;&gt;fieldPath&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;metadata.labels[&apos;version&apos;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;rbac.yaml&lt;/p&gt;

&lt;div class=&quot;language-yaml highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;na&quot;&gt;apiVersion&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;v1&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;kind&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;ServiceAccount&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;metadata&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;pod-labeler&lt;/span&gt;
&lt;span class=&quot;nn&quot;&gt;---&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;apiVersion&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;rbac.authorization.k8s.io/v1&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;kind&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;Role&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;metadata&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;pod-labeler&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;rules&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
&lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;apiGroups&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;]&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;resources&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;pods&quot;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;]&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;verbs&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;get&quot;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;patch&quot;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;nn&quot;&gt;---&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;apiVersion&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;rbac.authorization.k8s.io/v1&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;kind&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;RoleBinding&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;metadata&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;pod-labeler&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;subjects&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
&lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;kind&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;ServiceAccount&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;pod-labeler&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;roleRef&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;kind&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;Role&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;pod-labeler&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;apiGroup&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;rbac.authorization.k8s.io&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# First create the RBAC resources&lt;/span&gt;
kubectl apply &lt;span class=&quot;nt&quot;&gt;-f&lt;/span&gt; rbac.yaml

&lt;span class=&quot;c&quot;&gt;# Then create the pod&lt;/span&gt;
kubectl apply &lt;span class=&quot;nt&quot;&gt;-f&lt;/span&gt; env-inherit.yaml

&lt;span class=&quot;c&quot;&gt;# Watch the main-app logs to see the version changes&lt;/span&gt;
kubectl logs env-inherit-demo &lt;span class=&quot;nt&quot;&gt;-c&lt;/span&gt; main-app &lt;span class=&quot;nt&quot;&gt;-f&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# In another terminal, you can watch the version-updater logs&lt;/span&gt;
kubectl logs env-inherit-demo &lt;span class=&quot;nt&quot;&gt;-c&lt;/span&gt; version-updater &lt;span class=&quot;nt&quot;&gt;-f&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# You can also verify the label changes&lt;/span&gt;
kubectl get pod env-inherit-demo &lt;span class=&quot;nt&quot;&gt;--show-labels&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;resource-management&quot;&gt;Resource Management&lt;/h2&gt;

&lt;p&gt;Sidecar containers and the primary container each declare their own CPU and memory requests and limits, which determine the pod’s Kubernetes Quality of Service (QoS) class. Setting these deliberately lets the scheduler place workloads efficiently and prevents a noisy sidecar from starving the primary container, which improves cluster efficiency and helps with cost-effective resource management.&lt;/p&gt;

&lt;p&gt;The manifest creates a pod with three containers sharing a process namespace: an nginx container, a monitoring container that displays system stats every 5 seconds, and a load generator running multiple dd commands. Each container has specific CPU and memory limits, demonstrating Kubernetes resource management and container resource isolation.&lt;/p&gt;

&lt;p&gt;resource-demo.yaml&lt;/p&gt;

&lt;div class=&quot;language-yaml highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;na&quot;&gt;apiVersion&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;v1&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;kind&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;Pod&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;metadata&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;resource-demo&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;spec&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;shareProcessNamespace&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;containers&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;main-app&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;nginx:latest&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;resources&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;requests&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;memory&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;64Mi&quot;&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;cpu&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;250m&quot;&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;limits&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;memory&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;128Mi&quot;&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;cpu&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;500m&quot;&lt;/span&gt;
  &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;resource-monitor&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;busybox&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;command&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;sh&apos;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;-c&apos;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;while&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;true;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;do&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;echo&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;---$(date)---&quot;;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;top&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;-b&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;-n&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;head&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;-n&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;20;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;sleep&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;5;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span 
class=&quot;s&quot;&gt;done&apos;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;resources&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;requests&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;memory&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;32Mi&quot;&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;cpu&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;100m&quot;&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;limits&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;memory&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;64Mi&quot;&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;cpu&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;200m&quot;&lt;/span&gt;
  &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;load-generator&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;image&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;busybox&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;command&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;sh&apos;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;-c&apos;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&apos;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;while&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;true;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;do&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;for&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;in&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;$(seq&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;4);&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;do&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;dd&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;if=/dev/zero&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;of=/dev/null&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;bs=1M&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;count=1024&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span 
class=&quot;s&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;done;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;wait;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s&quot;&gt;done&apos;&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;resources&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;requests&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;cpu&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;100m&quot;&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;memory&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;32Mi&quot;&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;limits&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;cpu&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;200m&quot;&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;memory&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;64Mi&quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# Deploy the new version&lt;/span&gt;
kubectl apply &lt;span class=&quot;nt&quot;&gt;-f&lt;/span&gt; resource-demo.yaml

&lt;span class=&quot;c&quot;&gt;# Watch the resource monitor output&lt;/span&gt;
kubectl logs resource-demo &lt;span class=&quot;nt&quot;&gt;-c&lt;/span&gt; resource-monitor &lt;span class=&quot;nt&quot;&gt;-f&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
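&lt;p&gt;As a side note, the QoS class Kubernetes assigns to this pod follows mechanically from the requests and limits declared above. A minimal Python sketch of the assignment rules (simplified; the kubelet additionally normalizes units and considers init containers):&lt;/p&gt;

```python
def qos_class(containers):
    """Mirror Kubernetes QoS class assignment (simplified sketch).

    containers: list of dicts with optional "requests" and "limits" maps.
    """
    # BestEffort: no container sets any request or limit.
    if not any(c.get("requests") or c.get("limits") for c in containers):
        return "BestEffort"

    def is_guaranteed(c):
        lim = c.get("limits", {})
        # Guaranteed requires both cpu and memory limits on every container...
        if set(lim) != {"cpu", "memory"}:
            return False
        # ...with requests equal to limits (requests default to limits if unset).
        req = c.get("requests") or lim
        return req == lim

    return "Guaranteed" if all(is_guaranteed(c) for c in containers) else "Burstable"
```

&lt;p&gt;With the values in resource-demo.yaml, every container has requests below its limits, so the pod lands in the Burstable class.&lt;/p&gt;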

&lt;h2 id=&quot;wrap-up&quot;&gt;Wrap-Up&lt;/h2&gt;

&lt;p&gt;These capabilities make sidecars very powerful, but they must be used deliberately. Sharing filesystems and process namespaces creates strong coupling between containers, so design for security and failure scenarios. The key is balancing these advanced features with maintainable and reliable architectures.&lt;/p&gt;

&lt;p&gt;By understanding and properly using these underutilized features, you can build sophisticated yet efficient and manageable container-based applications. As you adopt them, remember that with great power comes great responsibility: use them wisely and document them for your team.&lt;/p&gt;
</description>
        <pubDate>Sun, 29 Dec 2024 00:00:00 +0000</pubDate>
        <link>https://blog.josephvelliah.com/hidden-powers-of-kubernetes-sidecar-containers</link>
        <guid isPermaLink="true">https://blog.josephvelliah.com/hidden-powers-of-kubernetes-sidecar-containers</guid>
        
        <category>AI</category>
        
        <category>Kubernetes</category>
        
        <category>Docker</category>
        
        <category>OpenAI</category>
        
        <category>Python</category>
        
        <category>DevOps</category>
        
        
      </item>
    
      <item>
        <title>Intelligent Kubernetes Event Summarizer: A Step-by-Step Guide with a Demo</title>
        <description>&lt;p&gt;Kubernetes is powerful, but the sheer volume of events and logs it generates can make it complicated to operate. When an issue occurs, digging the root cause out of all those logs and error traces is tedious, so it is crucial to have tools in place that can summarize these events quickly. This article proposes an intelligent Kubernetes event summarizer that uses OpenAI’s language model to condense complex event logs into an abstract, human-readable form. Such a tool can also help the operations team save time and respond to issues faster.&lt;/p&gt;

&lt;h2 id=&quot;what-this-tool-does&quot;&gt;What This Tool Does&lt;/h2&gt;

&lt;p&gt;This tool connects to your Kubernetes cluster, pulls pod events, and uses an OpenAI language model under the hood to summarize them in a concise format. Teams can easily access the summarization output because it is exposed through a RESTful API.&lt;/p&gt;

&lt;h3 id=&quot;key-features&quot;&gt;Key Features&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Fetches events reported for a pod from the Kubernetes cluster.&lt;/li&gt;
  &lt;li&gt;Summarizes events using the OpenAI API.&lt;/li&gt;
  &lt;li&gt;Exposes a REST API to access the event summary.&lt;/li&gt;
&lt;/ul&gt;
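&lt;p&gt;Conceptually, the summarization step flattens the raw event objects into a single prompt before calling the model. An illustrative sketch (the field names mirror Kubernetes event fields; this is not the repo’s actual code):&lt;/p&gt;

```python
def build_prompt(events):
    """Flatten Kubernetes pod events into one prompt for the language model."""
    lines = [f"{e['type']} {e['reason']}: {e['message']}" for e in events]
    return ("Summarize these Kubernetes pod events for an operator:\n"
            + "\n".join(lines))
```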

&lt;h2 id=&quot;technical-architecture&quot;&gt;Technical Architecture&lt;/h2&gt;

&lt;h3 id=&quot;overview&quot;&gt;Overview&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;This tutorial uses Kubernetes v1.30.2 running locally via Docker Desktop 4.35.1 (173168), a simple setup for experimentation.&lt;/li&gt;
  &lt;li&gt;I use the Python Kubernetes client to connect to the cluster and Flask as my web framework for creating the API.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2024/11/kubernetes_event_summarizer.png&quot; alt=&quot;kubernetes_event_summarizer&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;components-explained&quot;&gt;Components Explained&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Kubernetes Cluster: The source of pod events.&lt;/li&gt;
  &lt;li&gt;OpenAI API: Summarizes the events into human-readable text.&lt;/li&gt;
  &lt;li&gt;Flask App: Serves as an API that returns the event summary.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note for Other Clusters:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;If you use a different cluster setup (e.g., Minikube, EKS, GKE, or AKS), make sure to handle authentication correctly. The line &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.load_kube_config()&lt;/code&gt; in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;app.py&lt;/code&gt; loads the default kubeconfig file for authentication. Adjustments may be needed to match your cluster’s authentication method.&lt;/li&gt;
  &lt;li&gt;For in-cluster deployments, use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;config.load_incluster_config()&lt;/code&gt; to load the cluster’s service account credentials for proper authentication.&lt;/li&gt;
&lt;/ul&gt;
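&lt;p&gt;One way to support both cases without code changes is to detect the environment at startup. Here is a small sketch; it relies on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;KUBERNETES_SERVICE_HOST&lt;/code&gt; variable, which Kubernetes injects into every pod:&lt;/p&gt;

```python
import os

def running_in_cluster() -> bool:
    # Kubernetes injects this variable into every pod's environment.
    return "KUBERNETES_SERVICE_HOST" in os.environ

def load_k8s_config():
    """Pick the right authentication method for the current environment."""
    from kubernetes import config  # requires the `kubernetes` package
    if running_in_cluster():
        config.load_incluster_config()  # service account credentials
    else:
        config.load_kube_config()       # default kubeconfig file
```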

&lt;h2 id=&quot;demo-walkthrough&quot;&gt;Demo Walkthrough&lt;/h2&gt;

&lt;h3 id=&quot;setting-up-the-environment&quot;&gt;Setting Up the Environment&lt;/h3&gt;

&lt;p&gt;Clone the Repository:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone https://github.com/sprider/k8s-event-summarizer.git
&lt;span class=&quot;nb&quot;&gt;cd &lt;/span&gt;k8s-event-summarizer
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Create a Virtual Environment and Install Dependencies:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;python3 &lt;span class=&quot;nt&quot;&gt;-m&lt;/span&gt; venv venv
&lt;span class=&quot;nb&quot;&gt;source &lt;/span&gt;venv/bin/activate
pip3 &lt;span class=&quot;nb&quot;&gt;install&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-r&lt;/span&gt; app/requirements.txt
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Set Up Environment Variables:&lt;/p&gt;

&lt;p&gt;Create a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.env&lt;/code&gt; file in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;app/&lt;/code&gt; directory and add the OpenAI API key:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;OPENAI_API_KEY=your_openai_api_key
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;running-the-flask-app-locally&quot;&gt;Running the Flask App Locally&lt;/h3&gt;

&lt;p&gt;Start the Flask App:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;python3 app/app.py
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Access the API:&lt;/p&gt;

&lt;p&gt;Open a browser or use Postman to visit &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;http://localhost:8000/summarize/&amp;lt;pod-name&amp;gt;&lt;/code&gt; to get the summary for a specific pod.&lt;/p&gt;

&lt;h3 id=&quot;using-docker&quot;&gt;Using Docker&lt;/h3&gt;

&lt;p&gt;Build the Docker Image:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker build &lt;span class=&quot;nt&quot;&gt;-t&lt;/span&gt; k8s-event-summarizer &lt;span class=&quot;nb&quot;&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Run the Docker Container:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker run &lt;span class=&quot;nt&quot;&gt;-p&lt;/span&gt; 8000:8000 &lt;span class=&quot;nt&quot;&gt;--env-file&lt;/span&gt; app/.env k8s-event-summarizer
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;deploying-a-test-pod-in-your-cluster&quot;&gt;Deploying a Test Pod in Your Cluster&lt;/h2&gt;

&lt;p&gt;To generate events for testing, use the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;test-pod.yaml&lt;/code&gt; file provided in the repository:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl apply &lt;span class=&quot;nt&quot;&gt;-f&lt;/span&gt; test-pod.yaml
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This will create a test pod that can trigger events, which the event summarizer can then process and summarize.&lt;/p&gt;

&lt;h2 id=&quot;accessing-summarized-events-via-the-api&quot;&gt;Accessing Summarized Events via the API&lt;/h2&gt;

&lt;p&gt;Use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;curl&lt;/code&gt; or Postman to send a GET request to the API endpoint:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;curl http://localhost:8000/summarize/test-pod
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Replace &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;test-pod&lt;/code&gt; with the name of the pod for which you want to retrieve the event summary.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2024/11/postman-output.png&quot; alt=&quot;postman&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;understanding-the-code-structure&quot;&gt;Understanding the Code Structure&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;app.py&lt;/code&gt;: The main Python script that sets up the Flask API, connects to the Kubernetes cluster, fetches events, and sends them to OpenAI for summarization.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;requirements.txt&lt;/code&gt;: Lists all the Python dependencies needed for the app.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.env&lt;/code&gt;: Stores environment variables such as the OpenAI API key.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Dockerfile&lt;/code&gt;: Contains the instructions to build and run the app as a Docker container.&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;test-pod.yaml&lt;/code&gt;: A sample YAML file for creating a test pod in the Kubernetes cluster to generate events.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;wrap-up&quot;&gt;Wrap-Up&lt;/h2&gt;

&lt;p&gt;The Kubernetes Event Summarizer turns raw pod events into human-friendly summaries. This project is a basic version that can be extended to address a variety of operational requirements, and when coupled with alerting systems or dashboards it can be a game changer for your DevOps workflow.&lt;/p&gt;

&lt;p&gt;While this demo focuses on summarizing pod events, the same approach can be applied to monitoring other types of events, such as deployment events, service and ingress events, and more.&lt;/p&gt;

&lt;p&gt;In real-world scenarios, clusters can generate millions of events. A practical next step would be to ingest these events into a vector database with embeddings, allowing the tool to function as a question-and-answer system. This would enable users to ask specific questions, with NLP capabilities extracting key attributes and calling relevant methods, resulting in a more versatile system that covers a wide range of use cases.&lt;/p&gt;

&lt;p&gt;I hope you found this article helpful!&lt;/p&gt;
</description>
        <pubDate>Sun, 10 Nov 2024 00:00:00 +0000</pubDate>
        <link>https://blog.josephvelliah.com/intelligent-kubernetes-event-summarizer</link>
        <guid isPermaLink="true">https://blog.josephvelliah.com/intelligent-kubernetes-event-summarizer</guid>
        
        <category>AI</category>
        
        <category>Kubernetes</category>
        
        <category>Docker</category>
        
        <category>OpenAI</category>
        
        <category>Python</category>
        
        <category>DevOps</category>
        
        
      </item>
    
      <item>
        <title>Leveraging eBPF for Container Network Monitoring with Cilium</title>
<description>&lt;p&gt;In modern Kubernetes environments, monitoring network traffic and securing containers can be challenging due to their dynamic nature. Cilium, powered by eBPF (Extended Berkeley Packet Filter), offers high-performance visibility and security for containerized applications. This article will guide you through setting up Cilium on a Docker Desktop-based Kubernetes cluster to monitor network traffic and detect suspicious outbound connections.&lt;/p&gt;

&lt;h2 id=&quot;what-is-cilium-and-ebpf&quot;&gt;What is Cilium and eBPF?&lt;/h2&gt;

&lt;p&gt;Cilium is an open-source networking and security project that leverages eBPF to provide efficient and flexible network observability and enforcement. eBPF runs programs directly in the Linux kernel, enabling fine-grained monitoring of network activity with minimal performance overhead.&lt;/p&gt;

&lt;h2 id=&quot;setting-up-cilium-on-docker-desktop-kubernetes&quot;&gt;Setting Up Cilium on Docker Desktop Kubernetes&lt;/h2&gt;

&lt;p&gt;Before we dive into monitoring traffic, let’s get Cilium up and running on your Docker Desktop Kubernetes setup.&lt;/p&gt;

&lt;p&gt;Ensure Kubernetes is enabled in Docker Desktop:&lt;/p&gt;

&lt;p&gt;Open the Docker Desktop settings and enable Kubernetes. After Kubernetes starts, verify the current context with:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl config current-context
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Install Cilium using Helm:&lt;/p&gt;

&lt;p&gt;Helm is the easiest way to install Cilium in Kubernetes. First, add the Cilium Helm repository:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;helm repo add cilium https://helm.cilium.io/
helm repo update
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then install Cilium:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;helm &lt;span class=&quot;nb&quot;&gt;install &lt;/span&gt;cilium cilium/cilium &lt;span class=&quot;nt&quot;&gt;--namespace&lt;/span&gt; kube-system
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Verify Cilium is running:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl get pods &lt;span class=&quot;nt&quot;&gt;-n&lt;/span&gt; kube-system
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Check Cilium’s status:&lt;/p&gt;

&lt;p&gt;After installation, run the following to ensure Cilium is functioning correctly:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl &lt;span class=&quot;nb&quot;&gt;exec&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-n&lt;/span&gt; kube-system &lt;span class=&quot;nt&quot;&gt;-it&lt;/span&gt; cilium-xxxxx &lt;span class=&quot;nt&quot;&gt;--&lt;/span&gt; cilium status
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;monitoring-network-traffic-with-cilium&quot;&gt;Monitoring Network Traffic with Cilium&lt;/h2&gt;

&lt;p&gt;Now that Cilium is installed, you can begin monitoring network traffic. Cilium’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;monitor&lt;/code&gt; command gives you detailed, real-time visibility into all packets flowing through your cluster.&lt;/p&gt;

&lt;p&gt;Start monitoring network traffic:&lt;/p&gt;

&lt;p&gt;To see live network flows, run the following inside a Cilium agent pod (for example via &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;kubectl exec&lt;/code&gt; as shown above):&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;cilium monitor
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You’ll start seeing information about network packets, such as source and destination IPs, ports, and protocols. This visibility is powered by eBPF, which runs directly in the kernel for low-latency, high-performance monitoring.&lt;/p&gt;

&lt;h2 id=&quot;simulating-suspicious-outbound-connection&quot;&gt;Simulating Suspicious Outbound Connection&lt;/h2&gt;

&lt;p&gt;Now let’s simulate a suspicious outbound network connection to demonstrate how Cilium can be used to monitor and potentially block unexpected traffic.&lt;/p&gt;

&lt;p&gt;Run a container making an outbound HTTP request:&lt;/p&gt;

&lt;p&gt;To simulate suspicious behavior, run a pod that makes an HTTP request to an external server (e.g., &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;example.com&lt;/code&gt;):&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl run external-connect &lt;span class=&quot;nt&quot;&gt;--image&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;curlimages/curl &lt;span class=&quot;nt&quot;&gt;--restart&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;Never &lt;span class=&quot;nt&quot;&gt;--stdin&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--tty&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--command&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--&lt;/span&gt; curl http://example.com
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Monitor traffic with Cilium:&lt;/p&gt;

&lt;p&gt;With the pod making a connection, use Cilium’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;monitor&lt;/code&gt; command to capture and analyze the network flow:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;cilium monitor &lt;span class=&quot;nt&quot;&gt;-t&lt;/span&gt; flow
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You should see output like:&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;12:34:56.789 10.244.1.4 -&amp;gt; 93.184.216.34 HTTP
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This shows an HTTP request from the pod (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;10.244.1.4&lt;/code&gt;) to the external IP (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;93.184.216.34&lt;/code&gt;, which resolves to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;example.com&lt;/code&gt;). While this may be harmless in some cases, in a real-world scenario, unexpected outbound traffic like this could be a sign of data exfiltration, malware activity, or an unauthorized connection attempt.&lt;/p&gt;

&lt;h2 id=&quot;applying-network-policies-to-control-outbound-traffic&quot;&gt;Applying Network Policies to Control Outbound Traffic&lt;/h2&gt;

&lt;p&gt;To enforce security, you can apply network policies in Cilium to control which services or pods are allowed to make outbound connections.&lt;/p&gt;

&lt;p&gt;Create a network policy that restricts outbound traffic:&lt;/p&gt;

&lt;p&gt;Here’s an example policy that blocks all outbound traffic except within the same namespace:&lt;/p&gt;

&lt;div class=&quot;language-yaml highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;na&quot;&gt;apiVersion&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;networking.k8s.io/v1&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;kind&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;NetworkPolicy&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;metadata&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;deny-all-except-same-namespace&lt;/span&gt;
&lt;span class=&quot;na&quot;&gt;spec&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;podSelector&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;{}&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;policyTypes&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;Ingress&lt;/span&gt;
    &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;Egress&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;egress&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;to&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;podSelector&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;{}&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;ingress&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;from&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;na&quot;&gt;podSelector&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;pi&quot;&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Apply the network policy:&lt;/p&gt;

&lt;p&gt;Apply the policy using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;kubectl&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl apply &lt;span class=&quot;nt&quot;&gt;-f&lt;/span&gt; deny-all-except-same-namespace.yaml
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;With this policy in place, any attempt by a pod to send traffic outside its namespace will be blocked, including the suspicious outbound request to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;example.com&lt;/code&gt;.&lt;/p&gt;
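&lt;p&gt;Standard NetworkPolicies match namespaces and IP blocks, but Cilium’s own CRD can also express DNS-aware rules, for example allowing egress only to an approved list of domain names. The following is a sketch of that idea, not part of the demo above; the policy name and the approved FQDN are hypothetical:&lt;/p&gt;

```yaml
# Sketch of a Cilium-specific policy (CiliumNetworkPolicy CRD): pods may
# resolve DNS via kube-dns and reach only approved FQDNs.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: egress-fqdn-allowlist
spec:
  endpointSelector: {}
  egress:
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY
          rules:
            dns:
              - matchPattern: "*"   # enables DNS visibility for FQDN matching
    - toFQDNs:
        - matchName: "api.example.org"   # hypothetical approved destination
```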

&lt;p&gt;Monitor policy violations:&lt;/p&gt;

&lt;p&gt;If the pod tries to make a connection outside the allowed network, Cilium will log a policy violation, helping you identify unauthorized traffic attempts:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;cilium monitor &lt;span class=&quot;nt&quot;&gt;--type&lt;/span&gt; drop
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;cleaning-up&quot;&gt;Cleaning Up&lt;/h2&gt;

&lt;p&gt;After the demo, you can clean up the resources:&lt;/p&gt;

&lt;p&gt;Delete the test pod:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;kubectl delete pod external-connect
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Uninstall Cilium (if no longer needed):&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;helm uninstall cilium &lt;span class=&quot;nt&quot;&gt;--namespace&lt;/span&gt; kube-system
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;By using Cilium with eBPF on Docker Desktop’s Kubernetes cluster, you can gain real-time visibility into network traffic, making it easier to detect suspicious outbound connections. With Cilium, monitoring becomes more efficient and detailed, providing granular insights into network flows. By applying network policies, you can also proactively block unauthorized outbound traffic, adding an extra layer of security to your containerized environment.&lt;/p&gt;

&lt;p&gt;Cilium’s eBPF-based approach allows for high-performance monitoring and security enforcement with minimal overhead, making it an excellent choice for Kubernetes environments, whether for local demos or production use.&lt;/p&gt;
</description>
        <pubDate>Thu, 10 Oct 2024 00:00:00 +0000</pubDate>
        <link>https://blog.josephvelliah.com/test</link>
        <guid isPermaLink="true">https://blog.josephvelliah.com/test</guid>
        
        <category>AI</category>
        
        <category>Kubernetes</category>
        
        <category>Docker</category>
        
        <category>OpenAI</category>
        
        <category>Python</category>
        
        <category>DevOps</category>
        
        
      </item>
    
      <item>
        <title>Building a Unified Bible Platform: Q&amp;A, Insights, and Ministry Matching</title>
<description>&lt;p&gt;In today’s ever-evolving technological landscape, there are countless ways to explore and interact with Bible verses. I have recently evolved my personal endeavor from a single Q&amp;amp;A platform into a comprehensive unified application that combines three Bible-focused services: AI-powered Q&amp;amp;A tailored to the King James Version (KJV), curated insights browsing, and ministry recommendation matching.&lt;/p&gt;

&lt;h2 id=&quot;extracting-the-kjv-bible&quot;&gt;Extracting the KJV Bible&lt;/h2&gt;

&lt;p&gt;The first step was to acquire the content. Thanks to &lt;a href=&quot;https://api.bible&quot;&gt;API.Bible&lt;/a&gt;, extracting the full text of the KJV Bible was straightforward: the API returns the verses in a structured format, making the extraction seamless.&lt;/p&gt;

&lt;p&gt;Note: Bible verses taken from the King James Version (KJV) and sourced from api.bible (bible id: de4e12af7f28f599-02).&lt;/p&gt;

&lt;h2 id=&quot;generating-embeddings-with-openai&quot;&gt;Generating Embeddings with OpenAI&lt;/h2&gt;

&lt;p&gt;After obtaining the Bible verses, the next challenge was understanding and contextualizing them. I employed OpenAI’s text-embedding-ada-002 &lt;a href=&quot;https://platform.openai.com/docs/guides/embeddings/use-cases&quot;&gt;model&lt;/a&gt;. This model, renowned for its capabilities, transforms textual data into embeddings - numerical representations of the text that capture the essence and context of the content.&lt;/p&gt;
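&lt;p&gt;The embedding step can be sketched as follows. The batching helper and batch size are my assumptions, not values from the original project; only the model name comes from the post:&lt;/p&gt;

```python
# Sketch of the embedding step; batch size is an assumption chosen to keep
# each API request within reasonable limits.

def batched(items, size=100):
    """Yield fixed-size chunks of a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def embed_verses(verses):
    from openai import OpenAI  # requires the `openai` package
    client = OpenAI()          # reads OPENAI_API_KEY from the environment
    vectors = []
    for chunk in batched(verses):
        resp = client.embeddings.create(
            model="text-embedding-ada-002",  # model named in the post
            input=chunk,
        )
        vectors.extend(d.embedding for d in resp.data)
    return vectors
```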

&lt;h2 id=&quot;storing-embeddings-in-pinecone&quot;&gt;Storing Embeddings in Pinecone&lt;/h2&gt;

&lt;p&gt;Quickly retrieving relevant embeddings was crucial to creating a responsive Q&amp;amp;A interface. &lt;a href=&quot;https://www.pinecone.io/&quot;&gt;Pinecone&lt;/a&gt;, a vector search service, provided the solution. After generating embeddings from the Bible verses, I stored them in a Pinecone index, which ensured efficient and quick retrieval during user interactions.&lt;/p&gt;
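&lt;p&gt;A minimal sketch of the store-and-retrieve flow with the Pinecone client. The index name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;kjv-verses&lt;/code&gt; and the metadata fields are my assumptions, not the project’s actual configuration:&lt;/p&gt;

```python
# Sketch: pair verse references and text with their embeddings, upsert them
# into a Pinecone index, and query for the closest matches to a question.

def build_records(refs, texts, vectors):
    """Assemble upsert records: one id/values/metadata dict per verse."""
    return [
        {"id": ref, "values": vec, "metadata": {"text": text}}
        for ref, text, vec in zip(refs, texts, vectors)
    ]

def upsert_and_query(records, question_vector):
    from pinecone import Pinecone  # requires the `pinecone` package
    pc = Pinecone()                # reads PINECONE_API_KEY from the environment
    index = pc.Index("kjv-verses")
    index.upsert(vectors=records)
    # Return the five verses closest to the question embedding.
    return index.query(vector=question_vector, top_k=5, include_metadata=True)
```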

&lt;h2 id=&quot;creating-the-unified-platform-with-python-flask&quot;&gt;Creating the Unified Platform with Python Flask&lt;/h2&gt;

&lt;p&gt;With the backend prepared, it was time to focus on the user interface. Python’s Flask framework was the tool of choice, now enhanced with a modular Blueprint architecture. Flask, known for its lightweight nature and flexibility, was perfect for setting up a unified platform that houses three distinct services:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Bible Q&amp;amp;A Service&lt;/strong&gt; - The original AI-powered question answering&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Bible Insights Service&lt;/strong&gt; - Browse curated insights from Bible books and chapters&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Ministry Matching Service&lt;/strong&gt; - Church ministry recommendation system using OpenAI&lt;/li&gt;
&lt;/ol&gt;
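&lt;p&gt;A minimal sketch of how three such services might be assembled with Flask Blueprints. The URL prefixes and view names are my assumptions, not the platform’s actual routes:&lt;/p&gt;

```python
# Sketch of a modular Blueprint layout: each service lives in its own
# Blueprint and the app factory wires them together.
from flask import Flask, Blueprint  # requires the `flask` package

qa_bp = Blueprint("qa", __name__, url_prefix="/qa")
insights_bp = Blueprint("insights", __name__, url_prefix="/insights")
ministry_bp = Blueprint("ministry", __name__, url_prefix="/ministry")

@qa_bp.route("/")
def qa_home():
    return "Bible Q&A"

def create_app():
    """Assemble the three services into one Flask app."""
    app = Flask(__name__)
    for bp in (qa_bp, insights_bp, ministry_bp):
        app.register_blueprint(bp)
    return app
```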

&lt;h2 id=&quot;leveraging-openai-for-response-generation&quot;&gt;Leveraging OpenAI for Response Generation&lt;/h2&gt;

&lt;p&gt;When users posed questions, the system needed to provide meaningful responses. The user’s question is embedded and, via &lt;a href=&quot;https://python.langchain.com/docs/integrations/vectorstores/pinecone&quot;&gt;LangChain’s Pinecone vector store&lt;/a&gt;, matched against the stored verse embeddings to pinpoint the verses most relevant to the question; those verses are then passed to the OpenAI &lt;a href=&quot;https://platform.openai.com/docs/guides/gpt/chat-completions-api&quot;&gt;chat model&lt;/a&gt; to generate the response.&lt;/p&gt;

&lt;p&gt;Additionally, the ministry matching service leverages OpenAI’s conversational capabilities to provide personalized ministry recommendations based on user interests and calling.&lt;/p&gt;

&lt;h2 id=&quot;displaying-results&quot;&gt;Displaying Results&lt;/h2&gt;

&lt;p&gt;The unified platform now displays results from all three services through a cohesive interface. Users can seamlessly navigate between AI-powered Q&amp;amp;A, browse curated biblical insights, or discover ministry opportunities - all within a single application with shared navigation and consistent styling.&lt;/p&gt;

&lt;h2 id=&quot;modern-deployment-with-azure-container-apps&quot;&gt;Modern Deployment with Azure Container Apps&lt;/h2&gt;

&lt;p&gt;Moving beyond traditional hosting, I deployed the unified application using Azure Container Apps, which provides several advantages over the previous AWS EC2 deployment:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Serverless scaling&lt;/strong&gt;: Automatically scales from 1-10 replicas based on demand&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Managed SSL certificates&lt;/strong&gt;: Automatic HTTPS with custom domain support&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Container-based deployment&lt;/strong&gt;: Docker containers for consistent environments&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Integrated monitoring&lt;/strong&gt;: Built-in logging and health checks&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Cost optimization&lt;/strong&gt;: Pay-per-use model with efficient resource utilization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The unified platform is now live at &lt;a href=&quot;https://uba.josephvelliah.com&quot;&gt;https://uba.josephvelliah.com&lt;/a&gt;, showcasing all three services in one cohesive experience.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/posts/2023/09/uba.png&quot; alt=&quot;uba&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The platform has evolved to include additional services beyond the original Q&amp;amp;A functionality. Curated insights are now integrated directly into the unified platform as the “Bible Insights” service, making scripture more accessible and engaging for users. The containerized deployment on Azure Container Apps ensures smooth operation and automatic scaling based on user demand.&lt;/p&gt;

&lt;h2 id=&quot;enhanced-technical-architecture&quot;&gt;Enhanced Technical Architecture&lt;/h2&gt;

&lt;p&gt;The unified platform demonstrates modern application design principles:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Modular Services&lt;/strong&gt;: Clean separation of concerns with independent service functionality&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Graceful Degradation&lt;/strong&gt;: Services work independently; missing configuration doesn’t break the app&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Security Features&lt;/strong&gt;: Rate limiting, input sanitization, and Content Security Policy headers&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Production Ready&lt;/strong&gt;: Health checks, structured logging, and monitoring capabilities&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Docker Containerization&lt;/strong&gt;: Consistent deployment across environments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In conclusion, evolving from a single Bible Q&amp;amp;A application to a unified platform combining OpenAI’s powerful models, Pinecone’s efficient storage, curated insights, and ministry matching capabilities resulted in a comprehensive Bible study resource. It demonstrates how technology can enhance our understanding of Bible verses while providing practical tools for spiritual growth and community engagement, all brought closer to the modern user through thoughtful platform design.&lt;/p&gt;

&lt;p&gt;Visit the live unified platform at &lt;a href=&quot;https://uba.josephvelliah.com&quot;&gt;https://uba.josephvelliah.com&lt;/a&gt; to experience all three services.&lt;/p&gt;
</description>
        <pubDate>Thu, 10 Oct 2024 00:00:00 +0000</pubDate>
        <link>https://blog.josephvelliah.com/unified-bible-platform-with-ai-services</link>
        <guid isPermaLink="true">https://blog.josephvelliah.com/unified-bible-platform-with-ai-services</guid>
        
        <category>Faith</category>
        
        <category>API</category>
        
        <category>OpenAI</category>
        
        <category>Machine Learning</category>
        
        <category>Pinecone</category>
        
        <category>Vector Database</category>
        
        <category>Python</category>
        
        <category>Flask</category>
        
        <category>Azure Container Apps</category>
        
        <category>Docker</category>
        
        
      </item>
    
  </channel>
</rss>