Ask, see, and probe the network — from one chat.
NetBot joins three things you used to do separately. It answers questions about the live VWAN, shows you what's healthy and what isn't, and runs real network tests from disposable probe VMs sitting on every VNet — all under a human-in-the-loop approval gate.
What NetBot does, and what it isn't.
NetBot is the conversational front door for diagnosing and (carefully) acting on PBFS's network. It does three things, and only those three.
- Ask questions about the live network. "Why is wesuite-prod getting 502s?" · "Who's been talking to the Private Endpoint at 10.100.48.x in the last hour?"
- See network health at a glance. WAF posture, top denied flows, DNS failures, and firewall status, refreshed every 30 seconds.
- Run real probes from real vantage points. Ping, traceroute, DNS lookups, port scans, and packet captures, all from disposable VMs that already live on every VWAN-connected VNet. Every probe needs a human click.
Who it's for
| Role | What NetBot saves them |
|---|---|
| On-call engineer | Triaging an outage at 2am without opening six portal tabs to answer one question. |
| Network engineer | Vetting a change with a quick probe from a real source-VNet vantage. |
| Security analyst | Joined firewall + flow log + WAF rule context in one query. |
| Project lead | "Is everything healthy?" answered without opening Sentinel. |
What it isn't
- Not a configuration tool. You can't change firewall rules, NSGs, or DNS records from inside NetBot. Future admin actions will be gated behind a separate group with mandatory MFA and full audit; today, that surface doesn't exist.
- Not a replacement for Sentinel or Grafana for sustained monitoring. Use it for ad-hoc questions and event-driven probing.
- Not magic. It can be wrong. Treat the chat output as a senior engineer's first guess — verify before acting.
Getting started.
Sign in
Go to netbot.cloud.pyebarkerfs.com. You'll be redirected to Microsoft sign-in; use your PBFS account. You need to be a member of the sg-netbot-users Entra group — your manager or the platform team can add you. First sign-in takes about ten seconds; subsequent visits within an hour are silent.
Layout
NetBot has a header, a tab bar, and a content area. The header carries the wordmark on the left and a Help menu and Sign out link on the right. Tabs sit just below.
- Overview — default landing. Live network-health dashboard.
- Chat — conversational troubleshooting with the AI agent.
- Probes — form-driven probe execution under approval.
The Overview tab — a single pane for network health.
Overview is what loads first. It's the answer to "is anything on fire right now?" — refreshed every thirty seconds.
Top to bottom
- KPI row — four big-number cards: firewall status, App Gateways in Prevention mode, DNS failures in the last hour, probe VMs running.
- VWAN App Gateways — WAF Posture — every AG with its operational state, WAF mode (Prevention / Detection), and the WAF policy attached. Highlights gateways with no policy or in Detection-only mode.
- Top Denied at Firewall — top source IPs generating denies, and top destination IPs being denied to. Click a row to seed a chat question about it.
- Top WAF Hits by Client — top external client IPs and the AG hostname they're hitting most.
- DNS Health — count of NXDOMAIN responses plus the top failing FQDNs.
- Footer — last refresh timestamp and a "Refresh now" link.
Workflow · Are we under attack? Glance at the KPI row. If Top WAF Hits is dominated by a single client IP and that AG has been logging 4xx codes for five or more minutes, switch to the Chat tab and ask "summarize WAF blocks for that AG in the last 30 minutes."
| Gateway | State | WAF Mode | Policy |
|---|---|---|---|
| tagquest-prod-appgw-fnvw | ● Running | Prevention | tq-prod-waf-policy |
| wesuite-prod-appgw-bxlm | ● Running | Detection | ws-prod-waf-policy |
| avd-prod-fleetingress-mmpc | ● Running | No policy | — |

| Source IP | Denies | Top destination |
|---|---|---|
| 10.100.48.42 | 2,341 denies | → 8.8.8.8 |
| 10.100.51.17 | 847 denies | → 13.107.6.152 |
| 10.100.18.5 | 412 denies | → 23.23.x.x |

| Client IP | Hits | Top AG |
|---|---|---|
| 203.0.113.42 | 3,108 | tagquest-prod |
| 198.51.100.7 | 1,022 | wesuite-prod |
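
The "dominated by a single client IP" test in the "Are we under attack?" workflow can be made concrete. The sketch below is illustrative only: the function name `dominant_client` and the 60% threshold are assumptions, not values NetBot documents, and the sample counts are the two rows shown above.

```python
from typing import Dict, Optional, Tuple


def dominant_client(hits: Dict[str, int], threshold: float = 0.6) -> Optional[Tuple[str, float]]:
    """Return (client_ip, share) if one client produced at least
    `threshold` of all hits, otherwise None.

    The 0.6 default is an assumed cut-off, not a NetBot setting.
    """
    total = sum(hits.values())
    if total == 0:
        return None
    # Pick the client with the highest hit count.
    ip, count = max(hits.items(), key=lambda kv: kv[1])
    share = count / total
    return (ip, share) if share >= threshold else None


# Sample rows from the Top WAF Hits table.
sample = {"203.0.113.42": 3108, "198.51.100.7": 1022}
result = dominant_client(sample)
```

With the sample rows, the first client accounts for roughly three quarters of the hits, so it would count as dominant under this assumed threshold.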
The Chat tab — talk to the agent.
Chat is where you ask. The agent reads logs, queries Resource Graph, and shows its work. It runs Claude Sonnet 4.6 via Azure AI Foundry, with about thirty tools wired into the live network.
What it can reach
- Resource Graph — list and find VMs, VNets, NSGs, App Gateways, public IPs, private endpoints.
- Azure Monitor & Log Analytics — firewall logs, WAF logs, AG access logs, NSG flow logs, DNS resolver logs.
- Topology — VNet peerings, route tables, effective routes, "find the VNet by CIDR," "which AG fronts which app."
- WAF & firewall policy — list rules, find rules, check what's enabled.
- Probes — propose ping, traceroute, DNS, TCP, port scan, HTTPS, and packet capture. These always need approval — see §5.
What you can do with it
- Follow-ups understand prior turns. "…and what about port 443 on the same target?" works.
- Conversation persists across page reloads. Click New chat in the tab toolbar to start fresh.
- Markdown and Mermaid. Network diagrams render inline when the agent thinks they help.
- File attachments. Paperclip accepts images (Vision reads screenshots and CLI captures) and packet captures (.pcap, .cap, .pcapng, analyzed via pcap_summary).
- Starter prompts appear on an empty chat: "Show me VWAN health," "Top WAF blocks last hour."
What it's good at
- "Why is <app> slow?" — walks the path: AG status → WAF mode → backend pool health → flow logs → probe-VM ping if needed.
- "Who's been talking to 10.100.X.Y?" — flow-log query joined with NSG and firewall context.
- "Summarize WAF blocks for tagquest-prod in the last 30 minutes." — aggregated rule hits + top clients.
What it's not yet good at
- Multi-step workflows that need "wait and check again in ten minutes." Sessions don't pause and resume mid-tool-chain. Use a probe instead.
- Anything that requires writing to Azure — not in scope.
- Speculative architecture decisions. It can describe state but won't recommend changes.
tagquest-prod-appgw-fnvw is "Running" (true at L4), but the backend pool health probe is failing for two of three backends.
The Probes tab — real tests, with a human gate.
Probes are network tests run from disposable VMs that sit on every VWAN-connected VNet. They give you ground-truth reachability from a real source vantage — answering questions logs can't.
The catalog
| Probe | What it does | Typical answer |
|---|---|---|
| Ping | ICMP reachability | Can A see B at all? |
| Traceroute | Hop chain to a target | Where is traffic dying? |
| DNS lookup | Resolve an FQDN via dig +short | Is private DNS resolving from this VNet? |
| TCP port | Single-port reachability check | Is port 443 open from A to B? |
| Port scan | Up to 32 ports in one shot | What's listening here from this vantage? |
| HTTPS | TLS handshake + HTTP status | Is the cert valid and the service up? |
| Packet capture | Network Watcher pcap → downloadable .cap | What's actually on the wire? |
How a probe runs — the human-in-the-loop gate
This is the most important thing to understand about NetBot.
- You fill in a form. Target, port, source VM. The source VM is explicit so the probe runs from the network vantage you intend.
- You click Submit. The form sends a natural-language prompt to the agent.
- The agent proposes a probe. It reconstructs the command from a templated allowlist. You see the exact command before you click anything else.
- An approval card appears. "NetBot wants to run a ping probe from net-test-plat-nonprod to 10.100.18.5 (count 4). Approve / Deny."
- You click Approve. The probe runs, typically in 30 to 90 seconds. Packet captures take their advertised duration plus a ~30 second flush.
- The result appears. stdout, stderr, exit code, executed command. For pcaps: a Download .cap link and an Analyze capture button that ships the file into the embedded chat for summary.
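
The gate described in the steps above can be sketched as two parts: a command built only from fixed templates, and a runner that refuses anything not yet approved. This is a minimal illustration, not NetBot's implementation. The template contents, the `Proposal` class, and the `propose` and `run` functions are all hypothetical, and the sketch accepts only literal IP targets as a simplification.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical templates: the real allowlist and its flags are not documented here.
TEMPLATES: Dict[str, List[str]] = {
    "ping": ["ping", "-c", "{count}", "{target}"],
    "tcp": ["nc", "-z", "-w", "5", "{target}", "{port}"],
}


@dataclass
class Proposal:
    probe: str
    source_vm: str
    argv: List[str]
    approved: bool = False  # flipped only by the human Approve click


def propose(probe: str, source_vm: str, target: str, **params: int) -> Proposal:
    """Build the exact command from a template; reject anything off-list."""
    if probe not in TEMPLATES:
        raise ValueError(f"probe {probe!r} is not on the allowlist")
    # Raises ValueError for anything that is not a literal IP address,
    # which also rules out shell metacharacters in the target.
    ipaddress.ip_address(target)
    values = {"target": target, **{k: str(int(v)) for k, v in params.items()}}
    argv = [part.format(**values) for part in TEMPLATES[probe]]
    return Proposal(probe, source_vm, argv)


def run(p: Proposal) -> List[str]:
    """Refuse to run unapproved proposals; a real runner would dispatch argv to the source VM."""
    if not p.approved:
        raise PermissionError("probe has not been approved by a human")
    return p.argv


proposal = propose("ping", "net-test-plat-nonprod", "10.100.18.5", count=4)
```

Building the command as an argument list rather than a shell string means user input is never interpreted by a shell, which is one common reason for a templated design like this.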
The probe VMs
Twelve of them, named net-test-{env}-{slot}, one per VWAN-connected VNet across AVD, TagQuest, WeSuite, Workday, PAYG-routing, and Platform. Small Linux VMs (~$8/month each), deallocated by default and auto-stopped after 30 minutes idle. Approved probes auto-wake them in about sixty seconds.
Workflow · Can the AVD desktop reach SQL? Probes tab → Port scan → source = net-test-avd-prod → target = SQL server private IP → ports = 1433, 5432, 3389. Approve. Once the probe VM is awake, the result tells you in as little as thirty seconds whether ports are open, filtered, or refused, from the actual AVD VNet. Definitive, no log archaeology.
Attachments — images and packet captures.
Both the Chat tab and the Probes-tab embedded chat support file attachments. Click the paperclip in the composer.
| Type | Max size | What happens |
|---|---|---|
| Image | 10 MB | .png, .jpg, .webp, .gif — inlined to Claude as a base64 image block. Vision describes and OCRs it. |
| Packet capture | 10 MB | .pcap, .cap, .pcapng — stored in blob; the agent calls pcap_summary, which returns top protocols, top conversations, top destinations. |
Up to four files per message. Each becomes a chip above the text input. Click × on a chip to remove it. Files are dropped from the tray after sending.
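
The limits above amount to a small validation step before upload. A rough sketch, assuming "10 MB" means 10 MiB and that the extension alone decides the type; the `classify` function is hypothetical and not part of NetBot:

```python
import os
from typing import List, Tuple

MAX_FILES = 4
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" is read as 10 MiB
IMAGE_EXT = {".png", ".jpg", ".webp", ".gif"}
PCAP_EXT = {".pcap", ".cap", ".pcapng"}


def classify(files: List[Tuple[str, int]]) -> List[str]:
    """Take (filename, size_in_bytes) pairs and return 'image' or 'pcap'
    for each, raising ValueError when a documented limit is exceeded."""
    if len(files) > MAX_FILES:
        raise ValueError(f"at most {MAX_FILES} files per message")
    kinds = []
    for name, size in files:
        if size > MAX_BYTES:
            raise ValueError(f"{name} exceeds the per-file size limit")
        ext = os.path.splitext(name)[1].lower()
        if ext in IMAGE_EXT:
            kinds.append("image")
        elif ext in PCAP_EXT:
            kinds.append("pcap")
        else:
            raise ValueError(f"{name} has an unsupported extension")
    return kinds


kinds = classify([("portal.png", 250_000), ("trace.PCAPNG", 4_000_000)])
```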
Privacy on attachments
- Images and pcaps live in PBFS-owned blob storage (netbotec2c3340, chat-uploads container).
- A 7-day lifecycle policy auto-deletes them.
- Read URLs are short-lived (1 hour) user-delegation SAS — they expire even if leaked.
- Don't upload secrets-bearing material. Chat history is persisted server-side and recoverable in the audit log.
Analyze capture — the zero-upload path
After a packet_capture probe completes, the result card has an Analyze capture button. Clicking it copies the pcap server-side from probe storage into chat-uploads and stages it as an attachment in the embedded chat. No download/re-upload roundtrip; the agent reads it the moment you press Send.
Feedback & support — the Help menu.
NetBot is actively developed. The header's Help dropdown has two flows that file work directly into the team's ADO backlog.
| Flow | Lands as | Use it for |
|---|---|---|
| Feature request | User Story under Feature #14697 (Epic #11437 Coding Agents) | Any improvement to NetBot or the broader tooling environment. |
| Open a ticket | User Story under Feature #14698 (Epic #7443 Networking) | Network-ops requests: new firewall rules, DNS zones, TLS certs, VNet peerings, public IPs. |
Both open the same simple modal — title plus free-text description. The backend automatically attaches a triage block containing your UPN, the page URL you were on, your NetBot session id, and an excerpt of your last ten chat turns — so triage doesn't have to ask "where were you when this happened." A toast confirms the new work item id and links to it.
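
As a rough picture of what such a triage block might contain, the sketch below assembles the four fields named above. The field names, the `build_triage_block` function, and the 200-character excerpt length are assumptions for illustration; the actual payload format is not documented here.

```python
from typing import Dict, List


def build_triage_block(
    upn: str,
    page_url: str,
    session_id: str,
    turns: List[str],
    max_turns: int = 10,       # "last ten chat turns" from the text above
    excerpt_chars: int = 200,  # assumed excerpt length
) -> Dict[str, object]:
    """Collect the context that triage would otherwise have to ask for."""
    recent = turns[-max_turns:]
    return {
        "upn": upn,
        "page_url": page_url,
        "session_id": session_id,
        "recent_turns": [t[:excerpt_chars] for t in recent],
    }


# Placeholder values only.
block = build_triage_block(
    "user@example.com",
    "https://netbot.example/probes",
    "session-123",
    [f"turn {i}" for i in range(12)],
)
```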
Common workflows.
Quick triage — under sixty seconds
- Land on Overview. KPI red? Click into the affected card.
- WAF Posture has a "Detection only" callout? That's the row to investigate first.
- Top Denied shows a familiar internal IP? Switch to Chat. "Why is <IP> denied to <destination>?"
Is this a network problem? — under five minutes
- Chat: "Show me firewall denies for <my host> in the last 10 minutes."
- If empty, Probes: Ping from the closest probe VM to the target.
- If ping works but app fails, Probes: TCP port check on the actual application port.
- If TCP works but the app still fails: It's the app, not the network.
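
The checks above form a short decision chain. A minimal sketch follows, where the function name and the wording of each verdict are illustrative. The branches for a failed ping or a failed TCP check are an interpretation of the steps, not documented NetBot behavior.

```python
def diagnose(firewall_denies: int, ping_ok: bool, tcp_ok: bool) -> str:
    """Walk the triage steps in order and return the first conclusive verdict."""
    if firewall_denies > 0:
        # Step 1: the firewall is already denying traffic for this host.
        return "firewall: investigate the denied flows"
    if not ping_ok:
        # Step 2: no ICMP reachability from the probe VM.
        return "network: target unreachable by ping"
    if not tcp_ok:
        # Step 3: host answers ping but the application port does not.
        return "network: application port unreachable"
    # Step 4: every network check passed.
    return "application: the network path looks clean"


verdict = diagnose(firewall_denies=0, ping_ok=True, tcp_ok=True)
```

One caveat on the second branch: some targets drop ICMP by policy, so a failed ping is weaker evidence than a failed TCP check on the real application port.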
Investigate an attack signal
- Overview → Top WAF Hits. One client IP dominating? Click through.
- Chat: "Pull WAF rule hits for <IP> in the last hour, grouped by rule id."
- Probes → packet capture. From the AG's backend VNet, filtered to that source IP, 30 seconds.
- Click Analyze capture. "Summarize this capture: what's the attacker doing?"
Verify a change you just made
- Wait sixty seconds. Let routes converge.
- Probes → ping or traceroute. From the affected VNet's probe VM.
- Chat: "Compare denied flows for <host> in the last 5 minutes vs. an hour ago."
Frequently asked.
Can NetBot break anything?
No mutations are wired today. Probes run from disposable VMs and only execute commands that match a fixed allowlist (ping, traceroute, dig, nc, curl, mtr, Network Watcher pcap). Every probe needs your explicit Approve click. Future "admin" actions will be a separate gated feature with mandatory MFA + audit.
Who can use NetBot?
Anyone in the sg-netbot-users Entra group. Membership requests go through your manager or the platform team.
Where does my chat history go?
Sessions are stored in PBFS-owned Azure Table Storage in the platform subscription. Only your own user can read your sessions; the agent fetches by (user_oid, session_id). Sessions persist across page reloads but are not shared between users.
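
The isolation property follows from the composite key: a lookup that always includes the caller's own user_oid cannot return another user's session, even if the session id is known. The in-memory sketch below illustrates the idea only. The `SessionStore` class is hypothetical, and how the key maps onto Azure Table Storage entities is not stated in this guide.

```python
from typing import Dict, List, Tuple


class SessionStore:
    """Toy store keyed by (user_oid, session_id)."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], List[str]] = {}

    def append(self, user_oid: str, session_id: str, turn: str) -> None:
        self._rows.setdefault((user_oid, session_id), []).append(turn)

    def fetch(self, user_oid: str, session_id: str) -> List[str]:
        # The caller's oid is part of the key, so a guessed session id
        # belonging to someone else resolves to an empty result.
        return list(self._rows.get((user_oid, session_id), []))


store = SessionStore()
store.append("oid-alice", "s1", "Why is wesuite-prod getting 502s?")
own = store.fetch("oid-alice", "s1")
other = store.fetch("oid-bob", "s1")
```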
How fresh is the data?
Resource Graph queries are seconds-fresh. Firewall, WAF, and flow logs land in Sentinel within ~2 minutes. The Overview tab refreshes every 30 seconds. A probe is real-time as of the moment it runs.
How accurate is the AI?
The agent is right most of the time on factual queries grounded in live data, and occasionally wrong about causation. Treat its output as a senior engineer's first guess. The Overview tab and probe results are not LLM-generated — they're rendered straight from Azure Monitor and ARM, so they're authoritative.
What happens if the AI is rate-limited?
Per-user caps are 30 chats/minute, 200/hour, 1000/day. Feedback is 3/minute, 10/hour, 30/day. If you hit a cap, you get a clear "rate-limited" message and can retry once the window slides forward.
Can I share a probe result with a coworker?
Today, no — probe sessions are per-user. The fastest path is to open a ticket from the result and paste a description; the work item inherits your chat context automatically.
What if the agent proposes the wrong probe?
Click Deny. Re-prompt with corrections. Each Approve / Deny is logged in the audit table.
My capture didn't produce a blob — what now?
Network Watcher's flush window can occasionally miss the 75-second polling deadline on the busiest backends. Re-run the capture; if it fails twice, file a ticket via Help → Open a ticket and the platform team will check NW health.
Can I use it on mobile?
It works in mobile browsers but isn't optimized for small screens. Desk-first use is intended.
Limits & quotas.
| Limit | Value | Why |
|---|---|---|
| Chat — per minute | 30 | Burst protection |
| Chat — per hour | 200 | Sustained-use cap |
| Chat — per day | 1,000 | Per-user budget |
| Chat — org-wide per hour | 500 | LLM cost ceiling |
| Feedback — per minute | 3 | Anti-spam on ADO writes |
| Feedback — per day | 30 | Daily cap |
| Probes — concurrent per user | 1 in flight | Probe VM warm-up serializes |
| Probes — pcap duration | ≤ 60 s | Network Watcher hard cap |
| Attachments — files / message | 4 | Vision context budget |
| Attachments — bytes / file | 10 MB | Practical Anthropic API cap |
| Attachments — retention | 7 days | Lifecycle policy on chat-uploads |
| Probe pcaps — retention | 7 days | Same lifecycle |
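
The per-user chat caps in the table behave like a sliding-window limiter: a request is allowed only if every window still has room, and capacity returns as old requests age out. The sketch below shows one way to model that. The class and its in-memory storage are assumptions; NetBot's actual limiter is not described in this guide.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple

# (name, window length in seconds, cap) for the per-user chat limits above.
WINDOWS: Tuple[Tuple[str, int, int], ...] = (
    ("minute", 60, 30),
    ("hour", 3600, 200),
    ("day", 86400, 1000),
)


class SlidingWindowLimiter:
    def __init__(self, windows: Tuple[Tuple[str, int, int], ...] = WINDOWS) -> None:
        self._windows = windows
        self._events: Dict[str, Deque[float]] = {}

    def check(self, user: str, now: float) -> Optional[str]:
        """Record a request at time `now` if allowed and return None;
        otherwise return the name of the first exceeded window."""
        events = self._events.setdefault(user, deque())
        longest = max(seconds for _, seconds, _ in self._windows)
        # Drop timestamps older than the longest window.
        while events and events[0] <= now - longest:
            events.popleft()
        for name, seconds, cap in self._windows:
            recent = sum(1 for t in events if t > now - seconds)
            if recent >= cap:
                return name  # rejected requests are not recorded
        events.append(now)
        return None


limiter = SlidingWindowLimiter()
first_thirty = [limiter.check("user-a", float(t)) for t in range(30)]
thirty_first = limiter.check("user-a", 30.0)
after_slide = limiter.check("user-a", 60.5)
```

In this run the first thirty requests pass, the thirty-first is rejected by the minute window, and a request shortly after the earliest one ages out is accepted again.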
Privacy, security & data handling.
What we store
- Sessions: chat history, tool calls, tool results, in Azure Table Storage (auditlog + sessions). Encrypted at rest.
- Attachments: chat-uploads blob container, 7-day TTL, user-delegation SAS read URLs (1h validity).
- Audit rows: every chat turn, every probe approval/denial, every feedback submission, every rate-limit reject. Append-only.
- Probe pcaps: per-subscription npcap* storage accounts, 7-day TTL.
Where the AI runs
The agent runs Claude Sonnet 4.6 via Azure AI Foundry in the david-mouks6j7-eastus2 deployment. Calls go through PBFS managed identity (netbot-ai-uami) — no API keys leave PBFS infrastructure. Foundry deployments have no data retention for prompts or completions.
What you should not put in chat
- Customer PII outside of normal PBFS support workflows.
- Bearer tokens, PATs, cloud credentials, private keys.
- Anything covered by a stricter retention or DLP policy.
When in doubt, treat NetBot like an internal Slack channel: the conversation is logged for support and audit.
Auth model
Sign-in via Entra ID (PBFS tenant). MFA required per the org's standard CA policy. Backend trusts every request that reaches it — APIM has already validated the JWT before forwarding. NetBot doesn't store passwords or session tokens itself. An expired session redirects to /login — no flash of UI, no leaked data.
Glossary.
- VWAN: Azure Virtual WAN, the hub-and-spoke topology PBFS uses for cloud networking. NetBot focuses on VWAN-connected resources.
- Probe VM: A small Linux VM that NetBot can run network tests from. Twelve of them, one per VWAN-connected VNet. Deallocated when idle.
- HITL: "Human in the loop." Every probe needs a person to click Approve before it executes.
- WAF: Web Application Firewall, attached to App Gateways. Modes are Detection (logs only) and Prevention (blocks).
- AG: Azure Application Gateway, the Layer-7 load balancer in front of public-facing apps.
- Flow log: Per-flow record from Azure Network Watcher describing traffic: source IP, destination IP, port, action, bytes.
- Resource Graph: A real-time queryable index of all Azure resources. NetBot uses it for inventory questions.
- NXDOMAIN: DNS response code meaning "this name does not exist." Spikes can indicate misconfig or malware beaconing.
- Session: One conversation thread with the agent. The Chat tab and Probes tab keep separate sessions.
- APIM: Azure API Management, which validates your Entra JWT before requests reach NetBot's backend.
- bff-auth: The platform's standard OAuth client that handles sign-in for NetBot and other PBFS apps.
- ADO: Azure DevOps, where Help-menu tickets and feature requests land as User Stories.