In 30 minutes, you can know which AI tools are actually in use in your company. Not the tools you think are in use. The tools your DNS and proxy logs say are in use.
That matters because most AI risk work starts too late. Teams draft policy first, then hunt for usage. Reverse it. Find the traffic. Rank the tools by volume. Then decide what to control.
This is a practical Wednesday audit. No new agent. No procurement cycle. Just queries against data you already have.
The 30-minute AI tool discovery audit
Pull three data sources:
- DNS logs
- Proxy or secure web gateway logs
- Endpoint or MDM inventory
Start with DNS. It is usually the fastest path to a real answer. Search for these domains first:
- chat.openai.com
- chatgpt.com
- claude.ai
- gemini.google.com
- copilot.microsoft.com
- perplexity.ai
- poe.com
- character.ai
- mistral.ai
- cohere.ai
- huggingface.co
- replicate.com
If you use a SIEM, keep the first query blunt. You are not proving policy violations yet. You are establishing surface area.
index=dns domain IN (chat.openai.com, claude.ai, gemini.google.com) | stats count by user, domain
Expand that pattern to the full list above. If your DNS logs do not carry user identity, group by source IP first, then map IP to device owner from DHCP, EDR, or MDM.
What you want from the DNS pass:
- Which domains appear at all
- Which users or source IPs hit them most
- Which business units are overrepresented
- Which days and hours the traffic clusters around
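If you would rather do the aggregation outside the SIEM, the same per-user, per-domain tally can be sketched in a few lines of Python. This assumes a CSV export of DNS logs with user and domain columns; the field names are illustrative, so adjust them to your log schema.

```python
import csv
from collections import Counter

# The known AI domains from the list above
AI_DOMAINS = {
    "chat.openai.com", "chatgpt.com", "claude.ai", "gemini.google.com",
    "copilot.microsoft.com", "perplexity.ai", "poe.com", "character.ai",
    "mistral.ai", "cohere.ai", "huggingface.co", "replicate.com",
}

def tally_dns(rows):
    """Count hits per (user, domain) pair, AI domains only.

    `rows` is any iterable of dicts with 'user' and 'domain' keys,
    e.g. csv.DictReader over a DNS log export.
    """
    counts = Counter()
    for row in rows:
        domain = row["domain"].lower().strip()
        if domain in AI_DOMAINS:
            counts[(row["user"], domain)] += 1
    return counts

# Inline sample rows; in practice pass csv.DictReader(open("dns_export.csv"))
sample = [
    {"user": "alice", "domain": "chatgpt.com"},
    {"user": "alice", "domain": "chatgpt.com"},
    {"user": "bob", "domain": "intranet.local"},  # ignored: not an AI domain
]
print(tally_dns(sample).most_common())
```

If your logs carry source IP instead of user, swap the `user` key for the IP field and do the owner mapping as a second pass.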
Then move to proxy logs. DNS tells you intent. Proxy tells you whether there was actual session activity and how much.
sourcetype=proxy bytes>1000 destination IN (chat.openai.com, chatgpt.com) | table user, destination, bytes, timestamp
That bytes>1000 filter removes a lot of background noise, prefetching, and accidental page loads. Tune it if your proxy logs are sparse.
A stronger version looks like this:
index=proxy destination IN (chat.openai.com, chatgpt.com, claude.ai, gemini.google.com, copilot.microsoft.com, perplexity.ai, poe.com, character.ai, mistral.ai, cohere.ai, huggingface.co, replicate.com) | stats count sum(bytes) as total_bytes earliest(_time) as first_seen latest(_time) as last_seen by user, device, destination
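If your gateway only offers raw exports rather than a SIEM search head, that stats pass can be reproduced in plain Python. This is a minimal sketch assuming rows with user, device, destination, bytes, and a Unix timestamp field named `_time`; adapt the names to your export.

```python
def proxy_stats(rows, min_bytes=1000):
    """Aggregate proxy rows into count, total bytes, and first/last seen
    per (user, device, destination), dropping low-byte background noise."""
    stats = {}
    for r in rows:
        if r["bytes"] <= min_bytes:
            continue  # mirrors the bytes>1000 filter above
        key = (r["user"], r["device"], r["destination"])
        s = stats.setdefault(key, {
            "count": 0, "total_bytes": 0,
            "first_seen": r["_time"], "last_seen": r["_time"],
        })
        s["count"] += 1
        s["total_bytes"] += r["bytes"]
        s["first_seen"] = min(s["first_seen"], r["_time"])
        s["last_seen"] = max(s["last_seen"], r["_time"])
    return stats

rows = [
    {"user": "alice", "device": "lt-042", "destination": "claude.ai", "bytes": 5200, "_time": 100},
    {"user": "alice", "device": "lt-042", "destination": "claude.ai", "bytes": 800, "_time": 150},  # filtered out
    {"user": "alice", "device": "lt-042", "destination": "claude.ai", "bytes": 9100, "_time": 200},
]
print(proxy_stats(rows))
```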
Now you have a first-pass inventory.
Cross-reference with endpoint and MDM
This is where the audit becomes useful.
Take your top users and top devices from DNS or proxy logs and match them against endpoint telemetry. You want three answers:
- Which user visited the tool
- From which managed or unmanaged device
- When the visits happened
Good MDM data will give you device ownership, serial number, OS version, browser status, and whether the machine is compliant. If you have Microsoft Intune, Jamf, Kandji, Workspace ONE, or similar, export a device list and join on hostname, serial, Azure AD device ID, or the last known IP.
The point is not forensic perfection. The point is knowing whether your traffic is coming from:
- Managed corporate laptops
- BYOD endpoints
- Shared support machines
- Developer workstations with local admin rights
That last category often matters most.
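One way to sketch the join, assuming a device inventory exported from your MDM and keyed by hostname (the field names here are illustrative, not any specific MDM's schema):

```python
def classify_traffic(traffic_rows, mdm_by_hostname):
    """Tag each traffic row with its device's management state.

    `mdm_by_hostname` maps hostname -> {"managed": bool, "owner": str};
    hostnames missing from the inventory are treated as unmanaged/BYOD.
    """
    tagged = []
    for row in traffic_rows:
        device = mdm_by_hostname.get(row["device"])
        tagged.append({
            **row,
            "managed": bool(device and device["managed"]),
            "owner": device["owner"] if device else "unknown",
        })
    return tagged

mdm = {"lt-042": {"managed": True, "owner": "alice"}}
traffic = [
    {"user": "alice", "device": "lt-042", "destination": "claude.ai"},
    {"user": "?", "device": "byod-phone", "destination": "chatgpt.com"},
]
for t in classify_traffic(traffic, mdm):
    print(t["device"], t["managed"], t["owner"])
```

Treating a missing hostname as unmanaged is the conservative default: it surfaces BYOD and stale-inventory gaps as things to investigate rather than silently counting them as covered.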
A compliance operator may see 200 visits to Claude and panic. But if 170 came from six engineering devices in one week, you have a contained rollout problem, not a company-wide one.
Don’t miss the shadow category
Public AI websites are the easy part. The shadow category is self-hosted or local LLM tooling.
Look for internal IP traffic and process evidence tied to ollama and LM Studio. In logs, that may show up as:
- Internal connections to ports commonly used by local inference servers
- Device process names such as ollama, ollama serve, or LM Studio
- Browser access to internal endpoints on RFC1918 space such as http://10.x.x.x:11434
- Repeated localhost or local subnet API calls from developer machines
This matters because local tooling often bypasses your normal SaaS review path. There is no obvious vendor login page, no new procurement record, and no external DNS hit if the model is being served inside your network.
If you have EDR, search process execution over the last 14 days. If you only have proxy and DNS, search for unusual browser or API traffic to internal IPs from power users. Self-hosted LLM use is common in engineering teams and almost invisible to policy-only controls.
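The process-name and port check can be sketched over an EDR event export like this. The patterns and the port list are assumptions to extend for your environment (11434 is Ollama's documented default API port; 1234 is LM Studio's default local server port).

```python
import re

# Process names commonly tied to local inference tooling; extend as needed.
LOCAL_LLM_PROCS = re.compile(r"ollama|lm[ -]?studio|llama\.cpp", re.I)
# Default ports: 11434 = Ollama, 1234 = LM Studio local server.
LOCAL_LLM_PORTS = {11434, 1234}

def flag_local_llm(events):
    """Return events whose process name or destination port suggests
    a local inference server is running or being called."""
    hits = []
    for e in events:
        if LOCAL_LLM_PROCS.search(e.get("process", "")):
            hits.append(e)
        elif e.get("dest_port") in LOCAL_LLM_PORTS:
            hits.append(e)
    return hits

events = [
    {"host": "dev-07", "process": "ollama serve", "dest_port": None},
    {"host": "dev-07", "process": "chrome", "dest_port": 11434},
    {"host": "fin-01", "process": "excel", "dest_port": 443},
]
print([e["host"] for e in flag_local_llm(events)])  # two hits, both dev-07
```

Port matches alone are weak signals on shared ports, so treat them as leads for a closer look at the host, not as findings.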
Rank by volume, not by concern
Once you have the list, avoid the usual mistake.
Do not start with the tool that feels scariest. Start with the tool used most.
The most-used tools create the biggest exposure first. They generate the most prompts, the most copy-paste events, and the most chances for customer data, deal terms, source code, health information, or case notes to leave the browser.
So rank your results by:
- Total sessions or DNS hits
- Distinct users
- Managed versus unmanaged devices
- Repeat usage over 7 to 14 days
That gives you the order for instrumentation. If ChatGPT and Copilot dominate usage, those are your first control points. If Perplexity appears only twice, leave it lower on the list until the heavy traffic is covered.
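The ranking itself is mechanical once the rollup exists. A minimal sketch, sorting by raw sessions first with distinct users and unmanaged-device share as tiebreakers (the field names and sample numbers are illustrative):

```python
def rank_tools(tools):
    """Order tools for instrumentation: most-used first, then most users,
    then the largest share of unmanaged devices as a tiebreaker."""
    return sorted(
        tools,
        key=lambda t: (t["sessions"], t["distinct_users"], t["unmanaged_share"]),
        reverse=True,
    )

tools = [
    {"name": "perplexity.ai", "sessions": 2, "distinct_users": 1, "unmanaged_share": 0.0},
    {"name": "chatgpt.com", "sessions": 540, "distinct_users": 63, "unmanaged_share": 0.2},
    {"name": "claude.ai", "sessions": 200, "distinct_users": 8, "unmanaged_share": 0.9},
]
print([t["name"] for t in rank_tools(tools)])  # most-used tool first
```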
DNS-level visibility is a good start. Prompt-level visibility is the full picture.
Once you can see prompts before they leave the browser, you stop guessing about whether a visit turned into a data exposure event. You can tell which tool was used, by whom, and what category of sensitive data was caught, without storing the raw sensitive text.
What good looks like after the audit
By the end of this exercise, you should have:
- A ranked list of AI tools actually used in your environment
- Named users and devices behind the top traffic
- A separate list of likely self-hosted LLM activity
- A short list of the first 2-3 tools to instrument
That is enough to move from policy theatre to operational control.
If you report into compliance, tie this back to real obligations. GDPR Article 5(1)(f) requires integrity and confidentiality of personal data. GDPR Article 32 requires appropriate technical and organisational measures. For US health data, the HIPAA Security Rule at 45 CFR §164.312 requires technical safeguards around access and transmission. You cannot meet those standards on AI usage you have not inventoried.
The 5-step playbook you can do this afternoon
1. Query DNS for the known AI domains
Search for:
- chat.openai.com
- chatgpt.com
- claude.ai
- gemini.google.com
- copilot.microsoft.com
- perplexity.ai
- poe.com
- character.ai
- mistral.ai
- cohere.ai
- huggingface.co
- replicate.com
Group by user, source IP, and domain.
2. Validate with proxy or web gateway data
Filter for meaningful traffic volume and extract user, destination, bytes, and timestamp. Confirm which tools were really used, not just resolved.
3. Join against MDM or endpoint inventory
Identify the user, device type, management state, and last-seen time. Separate managed laptops from BYOD and developer machines.
4. Check for shadow local LLM use
Search EDR and internal traffic for ollama, lm-studio, localhost inference servers, and internal IP endpoints.
5. Rank by volume and pick the first controls
Start with the top 2-3 tools by usage. Instrument those first. That is where your risk reduction happens fastest.
Run the audit today. Then run a 14-day Prytive pilot on the top 2-3 tools you find.