AI-Powered Reconnaissance: OSINT Automation with Language Models

Manual OSINT on a single organization used to take days. A capable LLM with the right toolset can compress that into hours, and it can surface connections human analysts miss, because linking disparate data sources means holding a lot of context at once.

The reconnaissance pipeline

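from anthropic import Anthropic
# Local wrappers around each data source (not shown in this post)
from tools import whois, dns_lookup, shodan_search, github_search, linkedin_search

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-opus-4-8"  # substitute a model ID your account can use
MAX_TURNS = 25  # hard cap so a confused agent cannot loop forever


def tool(name: str, description: str) -> dict:
    # The Messages API expects an input_schema for every tool. A single
    # free-text "query" parameter is assumed here for all of them.
    return {
        "name": name,
        "description": description,
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }


def recon_target(organization: str) -> "ReconReport":
    tools = [
        tool("whois", "WHOIS lookup for domains"),
        tool("dns_lookup", "DNS enumeration"),
        tool("shodan_search", "Shodan internet scan data"),
        tool("github_search", "Search GitHub for org code"),
        tool("linkedin_search", "Search LinkedIn for employees and roles"),
    ]

    messages = [{"role": "user", "content": f"""
    Perform comprehensive OSINT on: {organization}
    Build a complete attack surface map including:
    - IP ranges and ASNs
    - Exposed services and versions
    - Employee names and roles (from LinkedIn)
    - Code repositories and any leaked secrets
    - Technology stack from job postings
    """}]

    # Agentic loop: continue only while the model is asking for tools
    for _ in range(MAX_TURNS):
        response = client.messages.create(
            model=MODEL,
            max_tokens=4096,  # required by the Messages API
            tools=tools,
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            break  # end_turn, max_tokens, etc.
        # Run the requested tools, append their results, continue
        messages = handle_tool_calls(response, messages)

    # handle_tool_calls, parse_report and ReconReport are not defined in this listing
    return parse_report(response)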
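
The loop above calls handle_tool_calls without showing it. What follows is a minimal sketch of one way it could look, not the implementation behind this post. The TOOL_REGISTRY name is hypothetical, and the sketch assumes each tool function accepts the model's input as keyword arguments and returns something printable. The block fields (type, id, name, input) and the tool_result message shape follow the Anthropic Messages API.

```python
from typing import Any, Callable

# Hypothetical registry mapping tool names to local callables. In the
# pipeline above these would be the functions imported from `tools`.
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {}


def handle_tool_calls(response: Any, messages: list[dict]) -> list[dict]:
    """Run every tool the model requested and return the extended history."""
    results = []
    for block in response.content:
        if block.type != "tool_use":
            continue  # text blocks carry no tool request
        fn = TOOL_REGISTRY.get(block.name)
        if fn is None:
            output, is_error = f"unknown tool: {block.name}", True
        else:
            try:
                output, is_error = str(fn(**block.input)), False
            except Exception as exc:
                # Report the failure to the model instead of crashing the loop
                output, is_error = f"{type(exc).__name__}: {exc}", True
        results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": output,
            "is_error": is_error,
        })
    # The assistant turn must come before the user turn carrying its results.
    return messages + [
        {"role": "assistant", "content": response.content},
        {"role": "user", "content": results},
    ]
```

Returning a new list rather than mutating the old one keeps the caller's history intact if a turn has to be retried.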

What LLMs add beyond automation

The real value isn't speed; it's synthesis. A human analyst might find an S3 bucket URL in a GitHub commit from 2019 and stop there. An LLM agent can cross-reference it against the current domain list, check whether the bucket is still accessible, correlate it with the employee who pushed the commit (now at a different company), and surface it as a live finding.
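
The cross-referencing step in that example can be pictured as a small join over data that has already been collected. The sketch below is illustrative only: Commit, Lead and correlate are hypothetical names, the regex covers virtual-hosted-style S3 URLs only, and nothing here touches the network. Whether a bucket is still reachable is a separate check that belongs inside an authorized engagement.

```python
import re
from dataclasses import dataclass

# Virtual-hosted-style S3 URLs, with or without a region label, e.g.
# https://my-bucket.s3.amazonaws.com or https://my-bucket.s3.eu-west-1.amazonaws.com
S3_URL = re.compile(r"https?://([a-z0-9.-]+)\.s3(?:[.-][a-z0-9-]+)?\.amazonaws\.com")


@dataclass(frozen=True)
class Commit:
    author: str
    year: int
    text: str


@dataclass(frozen=True)
class Lead:
    bucket: str
    author: str
    year: int
    matched_keyword: str  # which current org keyword the bucket name contains


def correlate(commits: list[Commit], org_keywords: set[str]) -> list[Lead]:
    """Flag bucket names in old commits that contain a current org keyword."""
    leads = []
    for commit in commits:
        for bucket in S3_URL.findall(commit.text.lower()):
            for keyword in sorted(org_keywords):
                if keyword in bucket:
                    leads.append(Lead(bucket, commit.author, commit.year, keyword))
                    break  # one lead per bucket is enough
    return leads


commits = [
    Commit("alice", 2019, "backup to https://acme-backups.s3.amazonaws.com/db.sql"),
    Commit("bob", 2021, "docs at https://unrelated.s3.amazonaws.com/readme"),
]
leads = correlate(commits, {"acme"})
```

Here org_keywords stands in for labels taken from the current domain list. An LLM does the same join less rigidly, which is where the extra findings come from.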

Passive vs active techniques

Passive (generally lawful, but site terms of service and privacy law still apply): WHOIS, DNS, Shodan, GitHub, LinkedIn, job postings, certificate transparency logs, web archives

Active (requires authorization): port scanning, web crawling, API fuzzing

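Be explicit about the line. A well-configured LLM agent will generally respect scope boundaries if you define them clearly in the system prompt, but a prompt is guidance rather than enforcement, so check scope again in the code that actually runs the tools.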
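
One way to make the boundary explicit is to express the scope as data, render it into the system prompt, and check the same data again before any tool runs. This is a minimal sketch under stated assumptions: Scope, system_prompt and check_call are hypothetical names, and the active tool names are placeholders for whatever your agent exposes.

```python
from dataclasses import dataclass, field

# Illustrative classification; the active tool names are placeholders.
PASSIVE_TOOLS = {"whois", "dns_lookup", "shodan_search", "github_search"}
ACTIVE_TOOLS = {"port_scan", "web_crawl", "api_fuzz"}


@dataclass
class Scope:
    """The engagement's written authorization, reduced to data."""
    authorized_domains: set[str] = field(default_factory=set)
    active_allowed: bool = False

    def covers(self, host: str) -> bool:
        host = host.lower().rstrip(".")
        return any(
            host == d or host.endswith("." + d) for d in self.authorized_domains
        )


def system_prompt(scope: Scope) -> str:
    domains = ", ".join(sorted(scope.authorized_domains)) or "none"
    mode = (
        "permitted only against in-scope hosts"
        if scope.active_allowed
        else "forbidden"
    )
    return (
        f"In-scope domains: {domains}. Active techniques are {mode}. "
        "If a step would leave scope, stop and report instead of proceeding."
    )


def check_call(scope: Scope, tool_name: str, host: str) -> None:
    """Raise PermissionError unless the call is inside the authorized scope."""
    if tool_name in PASSIVE_TOOLS:
        return
    if tool_name not in ACTIVE_TOOLS:
        raise PermissionError(f"unclassified tool: {tool_name}")
    if not scope.active_allowed:
        raise PermissionError(f"{tool_name} is active and not authorized")
    if not scope.covers(host):
        raise PermissionError(f"{host} is outside the authorized scope")
```

The prompt string can be passed as the system argument to client.messages.create, and check_call can run just before each tool is dispatched. Unclassified tools are rejected by default, so a newly added tool cannot slip past the check unnoticed.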

Opsec for AI-assisted recon

AI recon tools make API calls that can be logged. If operational security matters:

Use a VPN or Tor for API queries
Avoid querying proprietary threat intel about your own infrastructure (the queries themselves tell the vendor what you are interested in)
Shodan and Censys queries are logged, so use pre-downloaded data for sensitive work
