NetBot User Guide
End-user guide · PBFS network troubleshooter

Ask, see, and probe the network — from one chat.

NetBot joins three things you used to do separately. It answers questions about the live VWAN, shows you what's healthy and what isn't, and runs real network tests from disposable probe VMs sitting on every VNet — all under a human-in-the-loop approval gate.

Live at: netbot.cloud.pyebarkerfs.com
Access: sg-netbot-users
Audience: Network & IT operators
Posture: Read-only by design
i.

What NetBot does, and what it isn't.

NetBot is the conversational front door for diagnosing and (carefully) acting on PBFS's network. It does three things, and only those three.

  1. Ask questions about the live network
    "Why is wesuite-prod getting 502s?" · "Who's been talking to the Private Endpoint at 10.100.48.x in the last hour?"
  2. See network health at a glance
    WAF posture, top denied flows, DNS failures, firewall status — refreshed every 30 seconds.
  3. Run real probes from real vantage points
    Ping, traceroute, DNS lookups, port scans, packet captures — from disposable VMs that already live on every VWAN-connected VNet. Every probe needs a human click.

Who it's for

Role | What NetBot saves them
On-call engineer | Triaging an outage at 2am without opening six portal tabs to answer one question.
Network engineer | Vetting a change with a quick probe from a real source-VNet vantage.
Security analyst | Joined firewall + flow log + WAF rule context in one query.
Project lead | "Is everything healthy?" answered without opening Sentinel.

What it isn't

  • Not a configuration tool. You can't change firewall rules, NSGs, or DNS records from inside NetBot. Future admin actions will be gated behind a separate group with mandatory MFA and full audit; today, that surface doesn't exist.
  • Not a replacement for Sentinel or Grafana for sustained monitoring. Use it for ad-hoc questions and event-driven probing.
  • Not magic. It can be wrong. Treat the chat output as a senior engineer's first guess — verify before acting.
ii.

Getting started.

Sign in

Go to netbot.cloud.pyebarkerfs.com. You'll be redirected to Microsoft sign-in; use your PBFS account. You need to be a member of the sg-netbot-users Entra group — your manager or the platform team can add you. First sign-in takes about ten seconds; subsequent visits within an hour are silent.

Layout

NetBot has a header, a tab bar, and a content area. The header carries the wordmark on the left and a Help menu and Sign out link on the right. Tabs sit just below.

  • Overview — default landing. Live network-health dashboard.
  • Chat — conversational troubleshooting with the AI agent.
  • Probes — form-driven probe execution under approval.
Fig. 01. Header and tab bar, present on every screen.
iii.

The Overview tab — a single pane for network health.

Overview is what loads first. It's the answer to "is anything on fire right now?" — refreshed every thirty seconds.

Top to bottom

  • KPI row — four big-number cards: firewall status, App Gateways in Prevention mode, DNS failures in the last hour, probe VMs running.
  • VWAN App Gateways — WAF Posture — every AG with its operational state, WAF mode (Prevention / Detection), and the WAF policy attached. Highlights gateways with no policy or in Detection-only mode.
  • Top Denied at Firewall — top source IPs generating denies, and top destination IPs being denied to. Click a row to seed a chat question about it.
  • Top WAF Hits by Client — top external client IPs and the AG hostname they're hitting most.
  • DNS Health — count of NXDOMAIN responses plus the top failing FQDNs.
  • Footer — last refresh timestamp and a "Refresh now" link.
Workflow · Are we under attack? Glance at the KPI row. If Top WAF Hits is dominated by a single client IP and that AG has been logging 4xx codes for five or more minutes, switch to the Chat tab and ask "summarize WAF blocks for that AG in the last 30 minutes."
Fig. 02. Overview tab: KPI row (Firewall, VWAN App Gateways in Prevention, DNS Failures, Probe VMs warm), the WAF Posture table (Gateway, State, WAF Mode, Policy), Top Denied at Firewall (last 1h), Top WAF Hits by Client (last 1h), and the last-refresh footer.
iv.

The Chat tab — talk to the agent.

Chat is where you ask. The agent reads logs, queries Resource Graph, and shows its work. It runs Claude Sonnet 4.6 via Azure AI Foundry, with about thirty tools wired into the live network.

What it can reach

  • Resource Graph — list and find VMs, VNets, NSGs, App Gateways, public IPs, private endpoints.
  • Azure Monitor & Log Analytics — firewall logs, WAF logs, AG access logs, NSG flow logs, DNS resolver logs.
  • Topology — VNet peerings, route tables, effective routes, "find the VNet by CIDR," "which AG fronts which app."
  • WAF & firewall policy — list rules, find rules, check what's enabled.
  • Probes — propose ping, traceroute, DNS, TCP, port scan, HTTPS, and packet capture. These always need approval; see section v.

What you can do with it

  • Follow-ups understand prior turns. "…and what about port 443 on the same target?" works.
  • Conversation persists across page reloads. Click New chat in the tab toolbar to start fresh.
  • Markdown and Mermaid. Network diagrams render inline when the agent thinks they help.
  • File attachments. Paperclip accepts images (Vision reads screenshots and CLI captures) and packet captures (.pcap, .cap, .pcapng — analyzed via pcap_summary).
  • Starter prompts appear on an empty chat — "Show me VWAN health," "Top WAF blocks last hour."

What it's good at

  • "Why is <app> slow?" — walks the path: AG status → WAF mode → backend pool health → flow logs → probe-VM ping if needed.
  • "Who's been talking to 10.100.X.Y?" — flow-log query joined with NSG and firewall context.
  • "Summarize WAF blocks for tagquest-prod in the last 30 minutes." — aggregated rule hits + top clients.

What it's not yet good at

  • Multi-step workflows that need "wait and check again in ten minutes." Sessions don't pause and resume mid-tool-chain. Use a probe instead.
  • Anything that requires writing to Azure — not in scope.
  • Speculative architecture decisions. It can describe state but won't recommend changes.
Fig. 03. Chat tab: image attachment and agent response with a backend pool diagnosis. The attached screenshot (dashboard.png, 142 KB) is a TagQuest production dashboard reporting "All systems operational", with tagquest-prod-appgw Running and 3 / 3 backend instances listed.

Example exchange from the figure:

You: What's wrong here? This dashboard says my app is healthy but users are getting 502s.

NetBot: Looking at this. The dashboard you screenshotted shows the AG tagquest-prod-appgw-fnvw as "Running" (true at L4), but the backend pool health probe is failing for two of three backends.

    Backend pool: tq-prod-backend
    ├ tq-prod-vm-01  ▲ healthy
    ├ tq-prod-vm-02  ✕ probe failed (HTTP 503)
    └ tq-prod-vm-03  ✕ probe failed (HTTP 503)

The 502s the users see are the AG saying "no healthy backend." Want me to run a probe from the closest probe VM to confirm those backends are reachable on port 443?
v.

The Probes tab — real tests, with a human gate.

Probes are network tests run from disposable VMs that sit on every VWAN-connected VNet. They give you ground-truth reachability from a real source vantage — answering questions logs can't.

The catalog

Probe | What it does | Question it answers
Ping | ICMP reachability | Can A see B at all?
Traceroute | Hop chain to a target | Where is traffic dying?
DNS lookup | Resolve an FQDN via dig +short | Is private DNS resolving from this VNet?
TCP port | Single-port reachability check | Is port 443 open from A to B?
Port scan | Up to 32 ports in one shot | What's listening here from this vantage?
HTTPS | TLS handshake + HTTP status | Is the cert valid and the service up?
Packet capture | Network Watcher pcap → downloadable .cap | What's actually on the wire?

How a probe runs — the human-in-the-loop gate

This is the most important thing to understand about NetBot.

  1. You fill in a form.
    Target, port, source VM. The source VM is explicit so the probe runs from the network vantage you intend.
  2. You click Submit.
    The form sends a natural-language prompt to the agent.
  3. The agent proposes a probe.
    It reconstructs the command from a templated allowlist. You see the exact command before you click anything else.
  4. An approval card appears.
    "NetBot wants to run a ping probe from net-test-plat-nonprod to 10.100.18.5 (count 4). Approve / Deny."
  5. You click Approve.
    The probe runs — typically 30 to 90 seconds. Packet captures take their advertised duration plus a ~30 second flush.
  6. The result appears.
    stdout, stderr, exit code, executed command. For pcaps: a Download .cap link and an Analyze capture button that ships the file into the embedded chat for summary.
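
The gate described above can be pictured with a short Python sketch. It is an illustration only, not NetBot's implementation: the template table, the `Proposal` class, and the `propose` and `run` functions are hypothetical names, and the sketch assumes targets are literal IP addresses and that extra parameters are numeric.

```python
import shlex
from dataclasses import dataclass
from ipaddress import ip_address

# Hypothetical templates (probe kind -> argv). Not NetBot's real allowlist.
TEMPLATES = {
    "ping": ["ping", "-c", "{count}", "{target}"],
    "tcp_port": ["nc", "-z", "-w", "3", "{target}", "{port}"],
}


@dataclass
class Proposal:
    kind: str
    source_vm: str
    argv: list
    approved: bool = False  # flipped only by a human clicking Approve


def propose(kind, source_vm, target, **params):
    """Build a command from a fixed template; reject anything off-list."""
    if kind not in TEMPLATES:
        raise ValueError(f"probe kind not on allowlist: {kind}")
    ip_address(target)  # raises ValueError unless target is a literal IP
    fields = {"target": target}
    for name, value in params.items():
        fields[name] = str(int(value))  # numeric parameters only
    argv = [part.format(**fields) for part in TEMPLATES[kind]]
    return Proposal(kind, source_vm, argv)


def run(proposal):
    """Refuse to execute until the proposal has been approved."""
    if not proposal.approved:
        raise PermissionError("probe has not been approved")
    # A real system would dispatch to the source VM; here we only
    # return the exact command string the user was shown.
    return shlex.join(proposal.argv)
```

Because the command is assembled from fixed pieces rather than taken from free text, what the user approves is exactly what would run; `run` raises until `approved` is set.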

The probe VMs

There are twelve of them, named net-test-{env}-{slot}, one per VWAN-connected VNet across AVD, TagQuest, WeSuite, Workday, PAYG-routing, and Platform. They are small Linux VMs (~$8/month each), deallocated by default, that auto-stop after 30 idle minutes. An approved probe wakes its source VM automatically in about sixty seconds.

Workflow · Can the AVD desktop reach SQL? Probes tab → Port scan → source net-test-avd-prod → target = SQL server private IP → ports 1433, 5432, 3389. Approve. Once the probe completes (typically 30 to 90 seconds), the result shows whether each port is open, filtered, or refused, as seen from the actual AVD VNet. Definitive, no log archaeology.
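
To make the open / refused / filtered distinction concrete, here is a minimal Python sketch of a TCP port check. It is not what NetBot runs (the approval card in this guide shows an nc loop); `check_port` and `port_scan` are hypothetical names, and treating every timeout or other network error as "filtered" is a simplification.

```python
import socket

MAX_PORTS = 32  # the catalog lists "up to 32 ports in one shot"


def check_port(host, port, timeout=3.0):
    """Classify one TCP port as 'open', 'refused', or 'filtered'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"  # handshake completed
    except ConnectionRefusedError:
        return "refused"  # host answered with a reset: nothing listening
    except OSError:
        return "filtered"  # timeout or unreachable: likely dropped on the path


def port_scan(host, ports, timeout=3.0):
    """Check several ports and return {port: state}."""
    if len(ports) > MAX_PORTS:
        raise ValueError(f"at most {MAX_PORTS} ports per scan")
    return {port: check_port(host, port, timeout) for port in ports}
```

A call such as `port_scan("10.100.81.42", [1433, 5432, 3389])` would return one state per port. The answer only reflects the vantage the code runs from, which is why the source VM matters.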
Fig. 04. Probes tab: form on the left, approval card in the embedded chat on the right. The form offers Ping, Port scan (selected), Traceroute, and Packet capture. The approval card reads:

    NetBot wants to run a port_scan probe
    Source VM: net-test-avd-prod
    Target: 10.100.81.42
    for p in 1433 5432 3389; do nc -z -w 3 10.100.81.42 $p; done
vi.

Attachments — images and packet captures.

Both the Chat tab and the Probes-tab embedded chat support file attachments. Click the paperclip in the composer.

Type | Max size | What happens
Image | 10 MB | Accepts .png, .jpg, .webp, .gif. Inlined to Claude as a base64 image block; Vision describes and OCRs it.
Packet capture | 10 MB | Accepts .pcap, .cap, .pcapng. Stored in blob; the agent calls pcap_summary, which returns top protocols, top conversations, top destinations.

Up to four files per message. Each becomes a chip above the text input. Click × on a chip to remove it. Files are dropped from the tray after sending.
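
The attachment rules above amount to a small validation step. The following Python sketch is an assumption about how such a check could look, not NetBot's code: `classify` and `validate` are hypothetical names, and "10 MB" is read here as 10 MiB.

```python
from pathlib import PurePath

MAX_FILES = 4
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" means 10 MiB
IMAGE_EXT = {".png", ".jpg", ".webp", ".gif"}
PCAP_EXT = {".pcap", ".cap", ".pcapng"}


def classify(filename):
    """Return 'image', 'pcap', or None for an unsupported extension."""
    ext = PurePath(filename).suffix.lower()
    if ext in IMAGE_EXT:
        return "image"
    if ext in PCAP_EXT:
        return "pcap"
    return None


def validate(files):
    """files: list of (filename, size_in_bytes). Returns error strings."""
    errors = []
    if len(files) > MAX_FILES:
        errors.append(f"too many files: {len(files)} > {MAX_FILES}")
    for name, size in files:
        if classify(name) is None:
            errors.append(f"{name}: unsupported file type")
        elif size > MAX_BYTES:
            errors.append(f"{name}: exceeds 10 MB")
    return errors
```

An empty list means the message can be sent; anything else would be shown to the user before upload.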

Privacy on attachments

  • Images and pcaps live in PBFS-owned blob storage (netbotec2c3340, chat-uploads container).
  • A 7-day lifecycle policy auto-deletes them.
  • Read URLs are short-lived (1 hour) user-delegation SAS — they expire even if leaked.
  • Don't upload secrets-bearing material. Chat history is persisted server-side and recoverable in the audit log.

Analyze capture — the zero-upload path

After a packet_capture probe completes, the result card has an Analyze capture button. Clicking it copies the pcap server-side from probe storage into chat-uploads and stages it as an attachment in the embedded chat. There is no download and re-upload round trip; the agent reads the file the moment you press Send.
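
Once a capture is staged, the agent summarizes it by top protocols, top conversations, and top destinations. A rough Python sketch of that kind of aggregation follows. It assumes the packets have already been parsed into (protocol, source, destination) tuples; the `summarize` function and its output shape are hypothetical and say nothing about how pcap_summary is actually built.

```python
from collections import Counter


def summarize(packets, top=3):
    """packets: iterable of (protocol, src_ip, dst_ip) tuples."""
    protocols, conversations, destinations = Counter(), Counter(), Counter()
    for proto, src, dst in packets:
        protocols[proto] += 1
        # Sort the pair so A->B and B->A count as one conversation.
        conversations[tuple(sorted((src, dst)))] += 1
        destinations[dst] += 1
    return {
        "top_protocols": protocols.most_common(top),
        "top_conversations": conversations.most_common(top),
        "top_destinations": destinations.most_common(top),
    }
```

Counting conversations direction-agnostically is a design choice of this sketch; a per-direction count would answer "who is sending the most" instead.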

vii.

Feedback & support — the Help menu.

NetBot is actively developed. The header's Help dropdown has two flows that file work directly into the team's ADO backlog.

Flow | Lands as | Use it for
Feature request | User Story under Feature #14697 (Epic #11437 Coding Agents) | Any improvement to NetBot or the broader tooling environment.
Open a ticket | User Story under Feature #14698 (Epic #7443 Networking) | Network-ops requests: new firewall rules, DNS zones, TLS certs, VNet peerings, public IPs.

Both open the same simple modal — title plus free-text description. The backend automatically attaches a triage block containing your UPN, the page URL you were on, your NetBot session id, and an excerpt of your last ten chat turns — so triage doesn't have to ask "where were you when this happened." A toast confirms the new work item id and links to it.
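
As an illustration of the triage block described above, here is a Python sketch that assembles the four pieces of context. The function name, field labels, and text layout are assumptions; only the contents (UPN, page URL, session id, last ten chat turns) come from this guide.

```python
def build_triage_block(upn, page_url, session_id, turns, max_turns=10):
    """turns: list of (role, text) pairs, oldest first."""
    excerpt = turns[-max_turns:]  # keep only the most recent turns
    lines = [
        f"UPN: {upn}",
        f"Page: {page_url}",
        f"Session: {session_id}",
        f"Last {len(excerpt)} chat turns:",
    ]
    lines += [f"  {role}: {text}" for role, text in excerpt]
    return "\n".join(lines)
```

With twelve turns in the session, the two oldest are dropped and the remaining ten are attached in order.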

Fig. 05. The Help dropdown in the header, with Feature request ("Improve NetBot or PBFS tooling"), Open a ticket ("Request a network-ops change"), and a View user guide link. Both flows file directly into ADO.
Fig. 06. Help → Open a ticket: lands directly in ADO with full chat context attached.
viii.

Common workflows.

Quick triage — under sixty seconds

  1. Land on Overview.
    KPI red? Click into the affected card.
  2. WAF Posture has a "Detection only" callout?
    That's the row to investigate first.
  3. Top Denied shows a familiar internal IP?
    Switch to Chat. "Why is <IP> denied to <destination>?"

Is this a network problem? — under five minutes

  1. Chat:
    "Show me firewall denies for <my host> in the last 10 minutes."
  2. If empty, Probes:
    Ping from the closest probe VM to the target.
  3. If ping works but app fails, Probes:
    TCP port check on the actual application port.
  4. If TCP works but the app still fails:
    It's the app, not the network.
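
The four steps above form a simple decision ladder, sketched here in Python. The `next_step` function and its messages are hypothetical. The branch for a failed ping is an inference (the steps only spell out the success path), and a failed ping can also mean ICMP is blocked rather than the host being down.

```python
def next_step(deny_count, ping_ok=None, tcp_ok=None):
    """Return the verdict or the next action. None means 'not run yet'."""
    if deny_count > 0:
        return "network: firewall is denying this host"
    if ping_ok is None:
        return "run ping from the closest probe VM"
    if not ping_ok:
        return "network: target unreachable from the probe VM"
    if tcp_ok is None:
        return "run TCP port check on the application port"
    if not tcp_ok:
        return "network: application port not reachable"
    return "application: the network path is clean"
```

Each call takes what is known so far, so the same function can be re-evaluated after every probe result arrives.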

Investigate an attack signal

  1. Overview → Top WAF Hits.
    One client IP dominating? Click through.
  2. Chat:
    "Pull WAF rule hits for <IP> in the last hour, grouped by rule id."
  3. Probes → packet capture
    From the AG's backend VNet, filtered to that source IP, 30 seconds.
  4. Click Analyze capture.
    "Summarize this capture — what's the attacker doing?"

Verify a change you just made

  1. Wait sixty seconds.
    Let routes converge.
  2. Probes → ping or traceroute
    From the affected VNet's probe VM.
  3. Chat:
    "Compare denied flows for <host> in the last 5 minutes vs. an hour ago."
ix.

Frequently asked.

Can NetBot break anything?

No mutations are wired today. Probes run from disposable VMs and only execute commands that match a fixed allowlist (ping, traceroute, dig, nc, curl, mtr, Network Watcher pcap). Every probe needs your explicit Approve click. Future "admin" actions will be a separate gated feature with mandatory MFA + audit.

Who can use NetBot?

Anyone in the sg-netbot-users Entra group. Membership requests go through your manager or the platform team.

Where does my chat history go?

Sessions are stored in PBFS-owned Azure Table Storage in the platform subscription. Only your own user can read your sessions; the agent fetches by (user_oid, session_id). Sessions persist across page reloads but are not shared between users.
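
The isolation property above follows from the lookup key. A toy in-memory Python sketch, with a dict standing in for the table, shows the idea. `SessionStore` is a hypothetical name, and mapping user_oid and session_id to a partition key and row key is an assumption about the storage layout, not a documented detail.

```python
class SessionStore:
    """In-memory stand-in for a table keyed by (user_oid, session_id)."""

    def __init__(self):
        self._rows = {}

    def save(self, user_oid, session_id, turns):
        self._rows[(user_oid, session_id)] = list(turns)

    def load(self, user_oid, session_id):
        # The caller's own oid is part of the key, so knowing someone
        # else's session id alone is not enough to read their session.
        return self._rows.get((user_oid, session_id))
```

In this sketch the oid would come from the validated sign-in token, never from user input.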

How fresh is the data?

Resource Graph queries are seconds-fresh. Firewall, WAF, and flow logs land in Sentinel within ~2 minutes. The Overview tab refreshes every 30 seconds. A probe is real-time as of the moment it runs.

How accurate is the AI?

The agent is right most of the time on factual queries grounded in live data, and occasionally wrong about causation. Treat its output as a senior engineer's first guess. The Overview tab and probe results are not LLM-generated — they're rendered straight from Azure Monitor and ARM, so they're authoritative.

What happens if the AI is rate-limited?

Per-user caps are 30 chats/minute, 200/hour, 1000/day. Feedback is 3/minute, 10/hour, 30/day. If you hit a cap, you get a clear "rate-limited" message and can retry once the window slides forward.
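
The phrase "once the window slides forward" suggests a sliding-window limiter. The Python sketch below shows one way the three chat caps could be enforced together. It is a guess at the mechanism, not the deployed code, and `ChatLimiter` is a hypothetical name.

```python
import time
from collections import deque

# (window in seconds, max chats) taken from the caps quoted above
LIMITS = [(60, 30), (3600, 200), (86400, 1000)]


class ChatLimiter:
    def __init__(self, limits=LIMITS):
        self.limits = limits
        self.events = {}  # user -> deque of accepted request timestamps

    def allow(self, user, now=None):
        """Record and allow the request unless any window is already full."""
        now = time.time() if now is None else now
        q = self.events.setdefault(user, deque())
        longest = max(window for window, _ in self.limits)
        while q and q[0] <= now - longest:
            q.popleft()  # forget events older than the longest window
        for window, cap in self.limits:
            recent = sum(1 for t in q if t > now - window)
            if recent >= cap:
                return False  # caller would show the "rate-limited" message
        q.append(now)
        return True
```

Rejected requests are not recorded, so a user who keeps retrying does not push their own window further out.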

Can I share a probe result with a coworker?

Today, no — probe sessions are per-user. The fastest path is to open a ticket from the result and paste a description; the work item inherits your chat context automatically.

What if the agent proposes the wrong probe?

Click Deny. Re-prompt with corrections. Each Approve / Deny is logged in the audit table.

My capture didn't produce a blob — what now?

Network Watcher's flush window can occasionally miss the 75-second polling deadline on the busiest backends. Re-run the capture; if it fails twice, file a ticket via Help → Open a ticket and the platform team will check NW health.
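
A polling deadline of this kind can be sketched in a few lines of Python. The 75-second figure comes from the answer above; the 5-second interval, the `wait_for_blob` name, and the `blob_exists` callback are assumptions made for illustration.

```python
import time


def wait_for_blob(blob_exists, deadline_s=75.0, interval_s=5.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll blob_exists() until it returns True or the deadline passes."""
    start = clock()
    while True:
        if blob_exists():
            return True
        if clock() - start >= deadline_s:
            return False  # capture flushed too late, or not at all
        sleep(interval_s)
```

The clock and sleep functions are injectable so the loop can be exercised without waiting in real time. A False result corresponds to the "didn't produce a blob" case, where re-running the capture is the first remedy.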

Can I use it on mobile?

It works in mobile browsers but isn't optimized for small screens. Desk-first use is intended.

x.

Limits & quotas.

Limit | Value | Why
Chat (per minute) | 30 | Burst protection
Chat (per hour) | 200 | Sustained-use cap
Chat (per day) | 1,000 | Per-user budget
Chat (org-wide, per hour) | 500 | LLM cost ceiling
Feedback (per minute) | 3 | Anti-spam on ADO writes
Feedback (per day) | 30 | Daily cap
Probes (concurrent per user) | 1 in flight | Probe VM warm-up serializes
Probes (pcap duration) | ≤ 60 s | Network Watcher hard cap
Attachments (files per message) | 4 | Vision context budget
Attachments (bytes per file) | 10 MB | Practical Anthropic API cap
Attachments (retention) | 7 days | Lifecycle policy on chat-uploads
Probe pcaps (retention) | 7 days | Same lifecycle
xi.

Privacy, security & data handling.

What we store

  • Sessions — chat history, tool calls, tool results, in Azure Table Storage (auditlog + sessions). Encrypted at rest.
  • Attachments — chat-uploads blob container, 7-day TTL, user-delegation SAS read URLs (1h validity).
  • Audit rows — every chat turn, every probe approval/denial, every feedback submission, every rate-limit reject. Append-only.
  • Probe pcaps — per-subscription npcap* storage accounts, 7-day TTL.

Where the AI runs

The agent runs Claude Sonnet 4.6 via Azure AI Foundry in the david-mouks6j7-eastus2 deployment. Calls go through PBFS managed identity (netbot-ai-uami) — no API keys leave PBFS infrastructure. Foundry deployments have no data retention for prompts or completions.

What you should not put in chat

  • Customer PII outside of normal PBFS support workflows.
  • Bearer tokens, PATs, cloud credentials, private keys.
  • Anything covered by a stricter retention or DLP policy.

When in doubt, treat NetBot like an internal Slack channel: the conversation is logged for support and audit.

Auth model

Sign-in is via Entra ID (PBFS tenant), with MFA required per the org's standard CA policy. The backend trusts every request that reaches it because APIM has already validated the JWT before forwarding. NetBot doesn't store passwords or session tokens itself. An expired session redirects to /login, with no flash of UI and no leaked data.

xii.

Glossary.

VWAN
Azure Virtual WAN — the hub-and-spoke topology PBFS uses for cloud networking. NetBot focuses on VWAN-connected resources.
Probe VM
A small Linux VM that NetBot can run network tests from. Twelve of them, one per VWAN-connected VNet. Deallocated when idle.
HITL
"Human in the loop" — every probe needs a person to click Approve before it executes.
WAF
Web Application Firewall, attached to App Gateways. Modes are Detection (logs only) and Prevention (blocks).
AG
Azure Application Gateway — Layer-7 load balancer in front of public-facing apps.
Flow log
Per-flow record from Azure Network Watcher describing traffic: source IP, destination IP, port, action, bytes.
Resource Graph
A near-real-time queryable index of all Azure resources. NetBot uses it for inventory questions.
NXDOMAIN
DNS response code meaning "this name does not exist." Spikes can indicate misconfig or malware beaconing.
Session
One conversation thread with the agent. The Chat tab and Probes tab keep separate sessions.
APIM
Azure API Management — validates your Entra JWT before requests reach NetBot's backend.
bff-auth
The platform's standard OAuth client that handles sign-in for NetBot and other PBFS apps.
ADO
Azure DevOps — where Help-menu tickets and feature requests land as User Stories.