NetBot — User Guide

i.

What NetBot does, and what it isn't.

NetBot is the conversational front door for diagnosing and (carefully) acting on PBFS's network. It does three things, and only those three.

Ask questions about the live network

"Why is wesuite-prod getting 502s?" · "Who's been talking to the Private Endpoint at 10.100.48.x in the last hour?"

See network health at a glance

WAF posture, top denied flows, DNS failures, firewall status — refreshed every 30 seconds.

Run real probes from real vantage points

Ping, traceroute, DNS lookups, port scans, packet captures — from disposable VMs that already live on every VWAN-connected VNet. Every probe needs a human click.

Who it's for

Role	What NetBot saves them
On-call engineer	Triaging an outage at 2am without opening six portal tabs to answer one question.
Network engineer	Vetting a change with a quick probe from a real source-VNet vantage.
Security analyst	Joined firewall + flow log + WAF rule context in one query.
Project lead	"Is everything healthy?" answered without opening Sentinel.

What it isn't

Not a configuration tool. You can't change firewall rules, NSGs, or DNS records from inside NetBot. Future admin actions will be gated behind a separate group with mandatory MFA and full audit; today, that surface doesn't exist.
Not a replacement for Sentinel or Grafana for sustained monitoring. Use it for ad-hoc questions and event-driven probing.
Not magic. It can be wrong. Treat the chat output as a senior engineer's first guess — verify before acting.

ii.

Getting started.

Sign in

Go to netbot.cloud.pyebarkerfs.com. You'll be redirected to Microsoft sign-in; use your PBFS account. You need to be a member of the sg-netbot-users Entra group — your manager or the platform team can add you. First sign-in takes about ten seconds; subsequent visits within an hour are silent.

Layout

NetBot has a header, a tab bar, and a content area. The header carries the wordmark on the left and a Help menu and Sign out link on the right. Tabs sit just below.

Overview — default landing. Live network-health dashboard.
Chat — conversational troubleshooting with the AI agent.
Probes — form-driven probe execution under approval.

Fig. 01. Header and tab bar — present on every screen.

iii.

The Overview tab — a single pane for network health.

Overview is what loads first. It's the answer to "is anything on fire right now?" — refreshed every thirty seconds.

Top to bottom

KPI row — four big-number cards: firewall status, App Gateways in Prevention mode, DNS failures in the last hour, probe VMs running.
VWAN App Gateways — WAF Posture — every AG with its operational state, WAF mode (Prevention / Detection), and the WAF policy attached. Highlights gateways with no policy or in Detection-only mode.
Top Denied at Firewall — top source IPs generating denies, and top destination IPs that traffic is being denied to. Click a row to seed a chat question about it.
Top WAF Hits by Client — top external client IPs and the AG hostname they're hitting most.
DNS Health — count of NXDOMAIN responses plus the top failing FQDNs.
Footer — last refresh timestamp and a "Refresh now" link.

Workflow · Are we under attack? Glance at the KPI row. If Top WAF Hits is dominated by a single client IP and that AG has been logging 4xx codes for five or more minutes, switch to the Chat tab and ask "summarize WAF blocks for that AG in the last 30 minutes."
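
The "dominated by a single client IP" check in the workflow above can be sketched as a share-of-total test. This is a minimal illustration, not NetBot's own logic: the function name, the 50% threshold, and the sample counts are all assumptions chosen for the example.

```python
from typing import Dict, Optional


def dominant_client(hits_by_client: Dict[str, int],
                    share_threshold: float = 0.5) -> Optional[str]:
    """Return the client IP whose share of all hits meets the threshold, else None.

    The 0.5 default is a hypothetical cut-off, not a NetBot setting.
    """
    total = sum(hits_by_client.values())
    if total == 0:
        return None
    # Pick the client with the most hits, then compare its share to the threshold.
    top_ip, top_hits = max(hits_by_client.items(), key=lambda kv: kv[1])
    return top_ip if top_hits / total >= share_threshold else None


# Illustrative counts only.
sample = {"203.0.113.42": 3108, "198.51.100.7": 1022}
print(dominant_client(sample))
```

If the function returns an IP, that is the client to name in the follow-up chat question; if it returns None, the traffic is spread across clients and the KPI row is the better place to look.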

Fig. 02. Overview tab — KPI row, WAF posture, top denied, top WAF clients.

iv.

The Chat tab — talk to the agent.

Chat is where you ask. The agent reads logs, queries Resource Graph, and shows its work. It runs Claude Sonnet 4.6 via Azure AI Foundry, with about thirty tools wired into the live network.

What it can reach

Resource Graph — list and find VMs, VNets, NSGs, App Gateways, public IPs, private endpoints.
Azure Monitor & Log Analytics — firewall logs, WAF logs, AG access logs, NSG flow logs, DNS resolver logs.
Topology — VNet peerings, route tables, effective routes, "find the VNet by CIDR," "which AG fronts which app."
WAF & firewall policy — list rules, find rules, check what's enabled.
Probes — propose ping, traceroute, DNS, TCP, port scan, HTTPS, and packet capture. These always need approval; see section v, The Probes tab.

What you can do with it

Follow-ups understand prior turns. "…and what about port 443 on the same target?" works.
Conversation persists across page reloads. Click New chat in the tab toolbar to start fresh.
Markdown and Mermaid. Network diagrams render inline when the agent thinks they help.
File attachments. Paperclip accepts images (Vision reads screenshots and CLI captures) and packet captures (.pcap, .cap, .pcapng — analyzed via pcap_summary).
Starter prompts appear on an empty chat — "Show me VWAN health," "Top WAF blocks last hour."

What it's good at

"Why is <app> slow?" — walks the path: AG status → WAF mode → backend pool health → flow logs → probe-VM ping if needed.
"Who's been talking to 10.100.X.Y?" — flow-log query joined with NSG and firewall context.
"Summarize WAF blocks for tagquest-prod in the last 30 minutes." — aggregated rule hits + top clients.

What it's not yet good at

Multi-step workflows that need "wait and check again in ten minutes." Sessions don't pause and resume mid-tool-chain. Use a probe instead.
Anything that requires writing to Azure — not in scope.
Speculative architecture decisions. It can describe state but won't recommend changes.

Example exchange, with a dashboard screenshot (dashboard.png) attached:

User: What's wrong here? This dashboard says my app is healthy but users are getting 502s.

Agent: Looking at this. The dashboard you screenshotted is showing the AG tagquest-prod-appgw-fnvw is "Running" — true at L4 — but the backend pool health probe is failing for two of three backends.

Backend pool: tq-prod-backend
├ tq-prod-vm-01 ▲ healthy
├ tq-prod-vm-02 ✕ probe failed (HTTP 503)
└ tq-prod-vm-03 ✕ probe failed (HTTP 503)

The 502s the users see are AG saying "no healthy backend." Want me to run a probe from the closest probe VM to confirm those backends are reachable on port 443?

Fig. 03. Chat tab — image attachment, agent response with backend pool diagnosis.

v.

The Probes tab — real tests, with a human gate.

Probes are network tests run from disposable VMs that sit on every VWAN-connected VNet. They give you ground-truth reachability from a real source vantage — answering questions logs can't.

The catalog

Probe	What it does	Typical answer
Ping	ICMP reachability	Can A see B at all?
Traceroute	Hop chain to a target	Where is traffic dying?
DNS lookup	Resolve an FQDN via `dig +short`	Is private DNS resolving from this VNet?
TCP port	Single-port reachability check	Is port 443 open from A to B?
Port scan	Up to 32 ports in one shot	What's listening here from this vantage?
HTTPS	TLS handshake + HTTP status	Is the cert valid and the service up?
Packet capture	Network Watcher pcap → downloadable .cap	What's actually on the wire?

How a probe runs — the human-in-the-loop gate

This is the most important thing to understand about NetBot.

1. You fill in a form. Target, port, source VM. The source VM is explicit so the probe runs from the network vantage you intend.
2. You click Submit. The form sends a natural-language prompt to the agent.
3. The agent proposes a probe. It reconstructs the command from a templated allowlist. You see the exact command before you click anything else.
4. An approval card appears. "NetBot wants to run a ping probe from net-test-plat-nonprod to 10.100.18.5 (count 4). Approve / Deny."
5. You click Approve. The probe runs — typically 30 to 90 seconds. Packet captures take their advertised duration plus a ~30 second flush.
6. The result appears. stdout, stderr, exit code, executed command. For pcaps: a Download .cap link and an Analyze capture button that ships the file into the embedded chat for summary.
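
To make the "reconstructs the command from a templated allowlist" step concrete, here is a minimal sketch of the idea. The template table, the flags, and the function name are hypothetical; the guide does not show NetBot's real templates. The sketch also assumes the target is a literal IP address, which is stricter than the real tool needs to be (DNS probes take FQDNs).

```python
import ipaddress
from typing import Dict, List

# Hypothetical templates. The real allowlist and its flags are not shown in this guide.
TEMPLATES: Dict[str, List[str]] = {
    "ping": ["ping", "-c", "{count}", "{target}"],
    "tcp_port": ["nc", "-z", "-w", "3", "{target}", "{port}"],
}


def build_command(probe: str, target: str, **params: int) -> List[str]:
    """Rebuild an argv list from a fixed template; reject anything off the list."""
    if probe not in TEMPLATES:
        raise ValueError(f"probe {probe!r} is not on the allowlist")
    # Raises ValueError unless target parses as a literal IPv4/IPv6 address.
    ipaddress.ip_address(target)
    values = {"target": target}
    for name, value in params.items():
        # Only small positive integers are accepted as template parameters.
        if isinstance(value, bool) or not isinstance(value, int) or not 0 < value <= 65535:
            raise ValueError(f"parameter {name!r} must be an integer in 1..65535")
        values[name] = str(value)
    # A missing parameter surfaces as KeyError rather than a half-filled command.
    return [part.format(**values) for part in TEMPLATES[probe]]


print(build_command("ping", "10.100.18.5", count=4))
```

The point of the design is that free text from the chat never reaches a shell: the agent can only pick a template and supply validated values, and the resulting argv list is what the approval card would display.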

The probe VMs

Twelve of them, named net-test-{env}-{slot}, one per VWAN-connected VNet across AVD, TagQuest, WeSuite, Workday, PAYG-routing, and Platform. Small Linux VMs (~$8/month each), deallocated by default, auto-stop after 30 minutes of idle. Approved probes auto-wake them in about sixty seconds.

Workflow · Can the AVD desktop reach SQL? Probes tab → Port scan → source net-test-avd-prod → target = SQL server private IP → ports 1433, 5432, 3389. Approve. The result, typically back within 30 to 90 seconds, tells you whether each port is open, filtered, or refused, from the actual AVD VNet. Definitive, no log archaeology.
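
The open / filtered / refused distinction in the workflow above maps onto how a TCP connect attempt ends. The sketch below shows that mapping with Python's standard socket module; it is an illustration of the concept, not the command NetBot runs, and the function name and 3-second timeout are assumptions.

```python
import socket


def classify_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify one TCP port by how a connect attempt ends."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "open"          # handshake completed
    except ConnectionRefusedError:
        return "refused"       # host answered with a reset: nothing is listening
    except socket.timeout:
        return "filtered"      # no answer at all: something is dropping the packets
    except OSError:
        return "error"         # for example, no route to host
    finally:
        sock.close()
```

Calling `classify_port(target, 1433)` from a machine in the source VNet would give one of the four labels. A "refused" answer means the path is fine and the service is not listening; "filtered" usually points at an NSG or firewall in between.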

The example in Fig. 04 is a port scan from net-test-avd-prod to 10.100.81.42 on ports 1433, 5432, 3389. The approval card in the embedded chat reads "NetBot wants to run a port_scan probe" and shows the command:

for p in 1433 5432 3389; do nc -z -w 3 10.100.81.42 $p; done

Fig. 04. Probes tab — form on the left, approval card in the embedded chat on the right.

vi.

Attachments — images and packet captures.

Both the Chat tab and the Probes-tab embedded chat support file attachments. Click the paperclip in the composer.

Type	Max size	What happens
Image	10 MB	`.png`, `.jpg`, `.webp`, `.gif` — inlined to Claude as a base64 image block. Vision describes and OCRs it.
Packet capture	10 MB	`.pcap`, `.cap`, `.pcapng` — stored in blob; the agent calls `pcap_summary`, which returns top protocols, top conversations, top destinations.

Up to four files per message. Each becomes a chip above the text input. Click × on a chip to remove it. Files are dropped from the tray after sending.
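
The attachment rules above (four files per message, 10 MB each, a fixed set of extensions) can be sketched as a single validation pass. This is an illustrative check only: the function name is hypothetical, and treating "10 MB" as 10 MiB is an assumption.

```python
import os
from typing import List, Tuple

MAX_FILES = 4
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" is read as 10 MiB
IMAGE_EXTS = {".png", ".jpg", ".webp", ".gif"}
PCAP_EXTS = {".pcap", ".cap", ".pcapng"}


def validate_attachments(files: List[Tuple[str, int]]) -> List[str]:
    """Take (filename, size_in_bytes) pairs; return a list of problems, empty if OK."""
    problems = []
    if len(files) > MAX_FILES:
        problems.append(f"too many files: {len(files)} > {MAX_FILES}")
    for name, size in files:
        ext = os.path.splitext(name)[1].lower()
        if ext not in IMAGE_EXTS | PCAP_EXTS:
            problems.append(f"{name}: unsupported type {ext or '(none)'}")
        if size > MAX_BYTES:
            problems.append(f"{name}: {size} bytes exceeds {MAX_BYTES}")
    return problems


print(validate_attachments([("dashboard.png", 142_000), ("trace.pcapng", 2_000_000)]))
```

Returning every problem at once, rather than stopping at the first, lets a composer show all rejected chips in one pass.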

Privacy on attachments

Images and pcaps live in PBFS-owned blob storage (netbotec2c3340, chat-uploads container).
A 7-day lifecycle policy auto-deletes them.
Read URLs are short-lived (1 hour) user-delegation SAS — they expire even if leaked.
Don't upload secrets-bearing material. Chat history is persisted server-side and recoverable in the audit log.

Analyze capture — the zero-upload path

After a packet_capture probe completes, the result card has an Analyze capture button. Clicking it copies the pcap server-side from probe storage into chat-uploads and stages it as an attachment in the embedded chat. There is no download and re-upload round trip; the agent reads it the moment you press Send.

vii.

Feedback & support — the Help menu.

NetBot is actively developed. The header's Help dropdown has two flows that file work directly into the team's ADO backlog.

Flow	Lands as	Use it for
Feature request	User Story under Feature `#14697` (Epic `#11437 Coding Agents`)	Any improvement to NetBot or the broader tooling environment.
Open a ticket	User Story under Feature `#14698` (Epic `#7443 Networking`)	Network-ops requests: new firewall rules, DNS zones, TLS certs, VNet peerings, public IPs.

Both open the same simple modal — title plus free-text description. The backend automatically attaches a triage block containing your UPN, the page URL you were on, your NetBot session id, and an excerpt of your last ten chat turns — so triage doesn't have to ask "where were you when this happened." A toast confirms the new work item id and links to it.
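
As a rough picture of the triage block described above, the sketch below assembles the four pieces of context into one record. The field names and function name are hypothetical; only the contents (UPN, page URL, session id, last ten chat turns) come from this guide.

```python
from typing import Dict, List


def build_triage_block(upn: str, page_url: str, session_id: str,
                       chat_turns: List[str], max_turns: int = 10) -> Dict[str, object]:
    """Bundle the context that gets attached to a new work item."""
    # Keep only the most recent turns; an empty list if max_turns is zero or less.
    recent = chat_turns[-max_turns:] if max_turns > 0 else []
    return {
        "upn": upn,
        "page_url": page_url,
        "session_id": session_id,
        "recent_turns": recent,
    }
```

With fifteen turns in a session, only the last ten end up in the block, which is why a ticket filed late in a long conversation may not include the original question.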

Fig. 05. The Help dropdown in the header — two flows, both file directly into ADO.

Fig. 06. Help → Open a ticket — lands directly in ADO with full chat context attached.

viii.

Common workflows.

Quick triage — under sixty seconds

1. Land on Overview. KPI red? Click into the affected card.
2. WAF Posture has a "Detection only" callout? That's the row to investigate first.
3. Top Denied shows a familiar internal IP? Switch to Chat. "Why is <IP> denied to <destination>?"

Is this a network problem? — under five minutes

1. Chat: "Show me firewall denies for <my host> in the last 10 minutes."
2. If empty, Probes: Ping from the closest probe VM to the target.
3. If ping works but app fails, Probes: TCP port check on the actual application port.
4. If TCP works but the app still fails: It's the app, not the network.
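
The ladder above is a small decision tree, sketched here as one function. The function name and return strings are illustrative. The two failure branches (ping fails, TCP check fails) are not spelled out in this guide, so their wording is an assumption.

```python
from typing import Optional


def next_step(deny_count: int, ping_ok: Optional[bool] = None,
              tcp_ok: Optional[bool] = None) -> str:
    """Walk the triage ladder; None means that check has not been run yet."""
    if deny_count > 0:
        return "firewall: review the denies in Chat"
    if ping_ok is None:
        return "probe: ping from the closest probe VM"
    if not ping_ok:
        # Assumption: the guide does not cover this branch.
        return "network: target unreachable by ICMP (or ICMP is blocked)"
    if tcp_ok is None:
        return "probe: TCP port check on the application port"
    if not tcp_ok:
        # Assumption: the guide does not cover this branch.
        return "network: application port not reachable"
    return "app: not the network"


print(next_step(0, ping_ok=True, tcp_ok=True))
```

Each call answers "what do I do next given what I know so far", so the same function covers every rung of the ladder as results come in.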

Investigate an attack signal

1. Overview → Top WAF Hits. One client IP dominating? Click through.
2. Chat: "Pull WAF rule hits for <IP> in the last hour, grouped by rule id."
3. Probes → packet capture. From the AG's backend VNet, filtered to that source IP, 30 seconds.
4. Click Analyze capture. "Summarize this capture — what's the attacker doing?"

Verify a change you just made

1. Wait sixty seconds. Let routes converge.
2. Probes → ping or traceroute. From the affected VNet's probe VM.
3. Chat: "Compare denied flows for <host> in the last 5 minutes vs. an hour ago."

ix.

Frequently asked.

Can NetBot break anything?

No mutations are wired today. Probes run from disposable VMs and only execute commands that match a fixed allowlist (ping, traceroute, dig, nc, curl, mtr, Network Watcher pcap). Every probe needs your explicit Approve click. Future "admin" actions will be a separate gated feature with mandatory MFA + audit.

Who can use NetBot?

Anyone in the sg-netbot-users Entra group. Membership requests go through your manager or the platform team.

Where does my chat history go?

Sessions are stored in PBFS-owned Azure Table Storage in the platform subscription. Only your own user can read your sessions; the agent fetches by (user_oid, session_id). Sessions persist across page reloads but are not shared between users.
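
The isolation property described above follows from the lookup key: a session can only be fetched with both the user's object id and the session id. The in-memory sketch below illustrates that keying. It is not the Table Storage schema, and the class and method names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class SessionStore:
    """In-memory stand-in for a store keyed by (user_oid, session_id)."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], List[str]] = {}

    def append_turn(self, user_oid: str, session_id: str, turn: str) -> None:
        self._rows.setdefault((user_oid, session_id), []).append(turn)

    def get(self, user_oid: str, session_id: str) -> Optional[List[str]]:
        # A different user_oid with the same session_id is a different key.
        return self._rows.get((user_oid, session_id))


store = SessionStore()
store.append_turn("oid-alice", "s1", "Why is wesuite-prod getting 502s?")
print(store.get("oid-bob", "s1"))
```

Because the user id is part of the key rather than a filter applied afterwards, guessing someone else's session id is not enough to read their conversation.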

How fresh is the data?

Resource Graph queries are seconds-fresh. Firewall, WAF, and flow logs land in Sentinel within ~2 minutes. The Overview tab refreshes every 30 seconds. A probe is real-time as of the moment it runs.

How accurate is the AI?

The agent is right most of the time on factual queries grounded in live data, and occasionally wrong about causation. Treat its output as a senior engineer's first guess. The Overview tab and probe results are not LLM-generated — they're rendered straight from Azure Monitor and ARM, so they're authoritative.

What happens if the AI is rate-limited?

Per-user caps are 30 chats/minute, 200/hour, 1000/day. Feedback is 3/minute, 10/hour, 30/day. If you hit a cap, you get a clear "rate-limited" message and can retry once the window slides forward.
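
"Retry once the window slides forward" suggests a sliding-window limiter. The sketch below shows one way the three per-user chat caps could be enforced together. It is an assumption about the mechanism, not NetBot's implementation, and the class name is hypothetical.

```python
import time
from collections import deque
from typing import Deque, Dict, Optional, Sequence, Tuple

# (window_seconds, max_requests) pairs, taken from the per-user chat caps above.
CHAT_CAPS: Tuple[Tuple[int, int], ...] = ((60, 30), (3600, 200), (86400, 1000))


class SlidingWindowLimiter:
    def __init__(self, caps: Sequence[Tuple[int, int]] = CHAT_CAPS) -> None:
        self.caps = tuple(caps)
        self.longest = max(window for window, _ in self.caps)
        self.events: Dict[str, Deque[float]] = {}

    def allow(self, user: str, now: Optional[float] = None) -> bool:
        """Record and allow the request unless any window is already full."""
        now = time.monotonic() if now is None else now
        q = self.events.setdefault(user, deque())
        # Forget events older than the longest window; they can no longer matter.
        while q and now - q[0] >= self.longest:
            q.popleft()
        for window, limit in self.caps:
            recent = sum(1 for t in q if now - t < window)
            if recent >= limit:
                return False
        q.append(now)
        return True
```

A rejected request is not recorded, so hammering the endpoint while rate-limited does not push the retry time further out. The per-call scan over stored timestamps is fine at these volumes (at most 1,000 per user); a production version would likely keep counters per window.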

Can I share a probe result with a coworker?

Today, no — probe sessions are per-user. The fastest path is to open a ticket from the result and paste a description; the work item inherits your chat context automatically.

What if the agent proposes the wrong probe?

Click Deny. Re-prompt with corrections. Each Approve / Deny is logged in the audit table.

My capture didn't produce a blob — what now?

Network Watcher's flush window can occasionally miss the 75-second polling deadline on the busiest backends. Re-run the capture; if it fails twice, file a ticket via Help → Open a ticket and the platform team will check NW health.
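
The failure mode above is a poll loop giving up before the blob appears. A minimal sketch of such a loop follows; the 75-second deadline comes from this guide, while the 5-second interval, the function name, and the injected `clock` and `sleep` parameters are assumptions made so the example can be exercised without waiting.

```python
import time
from typing import Callable


def wait_for_blob(blob_exists: Callable[[], bool],
                  deadline_s: float = 75.0,
                  interval_s: float = 5.0,
                  clock: Callable[[], float] = time.monotonic,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until blob_exists() is true or the next poll would pass the deadline."""
    start = clock()
    while True:
        if blob_exists():
            return True
        if clock() - start + interval_s > deadline_s:
            return False  # the capture may still land later; the caller re-runs it
        sleep(interval_s)
```

A False result here means only that the blob was not visible in time, which is why re-running the capture is the first remedy.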

Can I use it on mobile?

It works in mobile browsers but isn't optimized for small screens. Desk-first use is intended.

x.

Limits & quotas.

Limit	Value	Why
Chat — per minute	30	Burst protection
Chat — per hour	200	Sustained-use cap
Chat — per day	1,000	Per-user budget
Chat — org-wide per hour	500	LLM cost ceiling
Feedback — per minute	3	Anti-spam on ADO writes
Feedback — per hour	10	Sustained-use cap
Feedback — per day	30	Daily cap
Probes — concurrent per user	1 in flight	Probe VM warm-up serializes
Probes — pcap duration	≤ 60 s	Network Watcher hard cap
Attachments — files / message	4	Vision context budget
Attachments — bytes / file	10 MB	Practical Anthropic API cap
Attachments — retention	7 days	Lifecycle policy on chat-uploads
Probe pcaps — retention	7 days	Same lifecycle

xi.

Privacy, security & data handling.

What we store

Sessions — chat history, tool calls, tool results, in Azure Table Storage (auditlog + sessions). Encrypted at rest.
Attachments — chat-uploads blob container, 7-day TTL, user-delegation SAS read URLs (1h validity).
Audit rows — every chat turn, every probe approval/denial, every feedback submission, every rate-limit reject. Append-only.
Probe pcaps — per-subscription npcap* storage accounts, 7-day TTL.

Where the AI runs

The agent runs Claude Sonnet 4.6 via Azure AI Foundry in the david-mouks6j7-eastus2 deployment. Calls go through PBFS managed identity (netbot-ai-uami) — no API keys leave PBFS infrastructure. Foundry deployments have no data retention for prompts or completions.

What you should not put in chat

Customer PII outside of normal PBFS support workflows.
Bearer tokens, PATs, cloud credentials, private keys.
Anything covered by a stricter retention or DLP policy.

When in doubt, treat NetBot like an internal Slack channel: the conversation is logged for support and audit.

Auth model

Sign-in via Entra ID (PBFS tenant). MFA required per the org's standard CA policy. Backend trusts every request that reaches it — APIM has already validated the JWT before forwarding. NetBot doesn't store passwords or session tokens itself. An expired session redirects to /login — no flash of UI, no leaked data.

xii.

Glossary.

VWAN: Azure Virtual WAN — the hub-and-spoke topology PBFS uses for cloud networking. NetBot focuses on VWAN-connected resources.
Probe VM: A small Linux VM that NetBot can run network tests from. Twelve of them, one per VWAN-connected VNet. Deallocated when idle.
HITL: "Human in the loop" — every probe needs a person to click Approve before it executes.
WAF: Web Application Firewall, attached to App Gateways. Modes are Detection (logs only) and Prevention (blocks).
AG: Azure Application Gateway — Layer-7 load balancer in front of public-facing apps.
Flow log: Per-flow record from Azure Network Watcher describing traffic: source IP, destination IP, port, action, bytes.
Resource Graph: A real-time queryable index of all Azure resources. NetBot uses it for inventory questions.
NXDOMAIN: DNS response code meaning "this name does not exist." Spikes can indicate misconfig or malware beaconing.
Session: One conversation thread with the agent. The Chat tab and Probes tab keep separate sessions.
APIM: Azure API Management — validates your Entra JWT before requests reach NetBot's backend.
bff-auth: The platform's standard OAuth client that handles sign-in for NetBot and other PBFS apps.
ADO: Azure DevOps — where Help-menu tickets and feature requests land as User Stories.

Ask, see, and probe the network — from one chat.
