AI-Powered Reconnaissance: OSINT Automation with Language Models
OSINT used to take days. A capable LLM with the right toolset can compress that into hours, and it can surface connections human analysts miss, because linking disparate data sources means holding a lot of context at once.
The reconnaissance pipeline
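from anthropic import Anthropic

# Project-local wrappers around each data source (not shown in this post)
from tools import whois, dns_lookup, shodan_search, github_search, linkedin_search

client = Anthropic()

# ReconReport, handle_tool_calls and parse_report are assumed to be defined elsewhere
def recon_target(organization: str) -> ReconReport:
    # The Messages API requires an input_schema per tool; a single "query" string is assumed here
    schema = {"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]}
    tools = [
        {"name": "whois", "description": "WHOIS lookup for domains", "input_schema": schema},
        {"name": "dns_lookup", "description": "DNS enumeration", "input_schema": schema},
        {"name": "shodan_search", "description": "Shodan internet scan data", "input_schema": schema},
        {"name": "github_search", "description": "Search GitHub for org code", "input_schema": schema},
    ]
    messages = [{"role": "user", "content": f"""
    Perform comprehensive OSINT on: {organization}
    Build a complete attack surface map including:
    - IP ranges and ASNs
    - Exposed services and versions
    - Employee names and roles (from LinkedIn)
    - Code repositories and any leaked secrets
    - Technology stack from job postings
    """}]

    # Agentic loop: keep going while the model is requesting tool calls
    while True:
        response = client.messages.create(
            model="claude-opus-4-8",  # substitute a model ID available to your account
            max_tokens=4096,  # required by the Messages API
            tools=tools,
            messages=messages,
        )
        # Stop on anything other than a tool request (end_turn, max_tokens, ...)
        if response.stop_reason != "tool_use":
            break
        # Process tool calls, append results, continue
        messages = handle_tool_calls(response, messages)
    return parse_report(response)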
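The loop relies on handle_tool_calls, which the post does not show. Here is a minimal sketch of what it might look like, assuming the response carries content blocks with type, id, name and input attributes, as the Anthropic SDK's tool_use blocks do. TOOL_REGISTRY and its stub entry are hypothetical; in a real pipeline the registry would map to the wrappers imported from tools.

```python
from types import SimpleNamespace
from typing import Any, Callable

# Hypothetical registry mapping tool names to local callables.
# The stub below stands in for a real WHOIS wrapper.
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {
    "whois": lambda query: f"(stub) whois record for {query}",
}


def handle_tool_calls(response, messages, registry=TOOL_REGISTRY):
    """Run every tool_use block in the response and append the results."""
    # Echo the assistant turn back so each tool_use id gets a matching result
    messages = messages + [{"role": "assistant", "content": response.content}]
    results = []
    for block in response.content:
        if getattr(block, "type", None) != "tool_use":
            continue  # skip plain text blocks
        result = {"type": "tool_result", "tool_use_id": block.id}
        func = registry.get(block.name)
        if func is None:
            result.update(content=f"Unknown tool: {block.name}", is_error=True)
        else:
            try:
                result["content"] = str(func(**block.input))
            except Exception as exc:  # report the failure to the model
                result.update(content=f"Tool failed: {exc}", is_error=True)
        results.append(result)
    if results:
        messages.append({"role": "user", "content": results})
    return messages


# Stand-in for an SDK response, so the sketch runs without network access
fake_response = SimpleNamespace(content=[
    SimpleNamespace(type="text", text="Checking registration data."),
    SimpleNamespace(type="tool_use", id="t1", name="whois",
                    input={"query": "example.com"}),
    SimpleNamespace(type="tool_use", id="t2", name="missing_tool", input={}),
])
updated = handle_tool_calls(fake_response, [{"role": "user", "content": "start"}])
```

Returning errors as tool results, rather than raising, lets the model see the failure and choose another route instead of crashing the loop.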
What LLMs add beyond automation
The real value isn't speed; it's synthesis. A human analyst finds an S3 bucket URL in a GitHub commit from 2019. An LLM agent cross-references it against the current domain list, checks whether the bucket is still accessible (an active step, so only when it is in scope), correlates it with the employee who pushed the commit (now at a different company), and surfaces it as a live finding.
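To make the cross-referencing step concrete, here is a small sketch of that part only: matching hosts mentioned in old findings against the organization's current domain list. The Finding type, its fields, and the sample data are hypothetical, and the sketch does not contact any of the hosts.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Finding:
    source: str  # where the reference was seen, e.g. "github-commit"
    url: str     # URL mentioned in that source
    year: int    # year the reference was observed


def still_relevant(findings, current_domains):
    """Return findings whose host is, or is a subdomain of, a current domain."""
    matches = []
    for finding in findings:
        host = (urlparse(finding.url).hostname or "").lower()
        if any(host == d or host.endswith("." + d) for d in current_domains):
            matches.append(finding)
    return matches


findings = [
    Finding("github-commit", "https://assets.example.com/backup.zip", 2019),
    Finding("web-archive", "https://old.example.net/login", 2017),
    Finding("github-commit", "https://notexample.com/data", 2020),
]
live = still_relevant(findings, {"example.com"})
```

Matching on a leading dot keeps lookalike hosts such as notexample.com out of the result.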
Passive vs active techniques
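Passive (no direct contact with the target; generally lawful, though site terms of service and data-protection law still apply): WHOIS, DNS, Shodan, GitHub, LinkedIn, job postings, certificate transparency logs, web archives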
Active (requires authorization): port scanning, web crawling, API fuzzing
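Be explicit about the line. A well-configured LLM agent will usually respect scope boundaries that are defined clearly in the system prompt, but a prompt is guidance rather than enforcement, so check scope in the tool layer as well before any call runs.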
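One way to back the prompt with a hard check is a guard that runs before every tool call. This is a minimal sketch under stated assumptions: the Scope type, check_call, OutOfScopeError, and the active tool names are all hypothetical, and the passive names simply mirror the tool list in the pipeline.

```python
from dataclasses import dataclass

PASSIVE_TOOLS = {"whois", "dns_lookup", "shodan_search", "github_search"}
ACTIVE_TOOLS = {"port_scan", "web_crawl", "api_fuzz"}  # hypothetical names


class OutOfScopeError(Exception):
    """Raised when a tool call falls outside the agreed engagement scope."""


@dataclass
class Scope:
    domains: set[str]               # domains named in the engagement
    active_authorized: bool = False  # set True only with written permission

    def covers(self, target: str) -> bool:
        """True if target is a scoped domain or one of its subdomains."""
        target = target.lower().rstrip(".")
        return any(target == d or target.endswith("." + d) for d in self.domains)


def check_call(scope: Scope, tool: str, target: str) -> None:
    """Raise OutOfScopeError unless this tool call is allowed by the scope."""
    if tool not in PASSIVE_TOOLS | ACTIVE_TOOLS:
        raise OutOfScopeError(f"unknown tool: {tool}")
    if tool in ACTIVE_TOOLS and not scope.active_authorized:
        raise OutOfScopeError(f"{tool} is active and not authorized")
    if not scope.covers(target):
        raise OutOfScopeError(f"{target} is outside the engagement scope")


scope = Scope(domains={"example.com"})
check_call(scope, "dns_lookup", "mail.example.com")  # allowed, no exception
```

Calling check_call inside the tool dispatcher means an out-of-scope request fails in code even if the model ignores the system prompt.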
Opsec for AI-assisted recon
AI recon tools make API calls that can be logged. If operational security matters:
- Use a VPN or Tor for API queries
- Avoid querying proprietary threat intel about your own infrastructure (paradoxically, it tells vendors what you are interested in)
- Shodan and Censys queries are logged — use pre-downloaded data for sensitive work