LLM-Augmented DFIR-IRIS Case Templates: Embedding AI Prompts Directly in Your IR Reports

In a previous post I released a library of DFIR-IRIS case templates covering common incident types. Those templates give you a pre-built task list, structured note directories, and a report scaffold, but the actual narrative content still needs to be written by a human analyst at the end of a long and usually exhausting investigation.

I've been experimenting with a different approach: embedding structured LLM prompts directly inside the case template's summary field, so that when the investigation is complete, an AI can draft the report narrative from the case data automatically. This post describes the concept, shows how the prompts are structured, and discusses where it works well and where it still needs a human.

Experimental status: These are experimental templates. They are not a replacement for analyst judgment and should not be used to generate reports that go to stakeholders without review. The intent is to reduce the time cost of first-draft report writing, not to remove analysts from the loop.

The Problem with IR Report Writing

DFIR-IRIS does a good job of structuring case data: tasks, notes, IOCs, assets, and timelines all live in one place by the time an investigation closes. The problem is that translating all of that structured data into a coherent written report (an executive summary, a MITRE ATT&CK analysis section, a CTI findings narrative, a conclusion) is time-consuming and cognitively expensive at exactly the moment when the team is most fatigued.

The standard case summary in IRIS is a free-text markdown field. The non-LLM version of my Ransomware template uses that field as a report scaffold with placeholder comments and empty tables for analysts to populate. That works, but it still requires an analyst to synthesise and write every narrative section from scratch.

The LLM-augmented version replaces those placeholder comments with explicit, tightly constrained prompts that tell an LLM exactly what to write, what data to use, and what not to fabricate.


How It Works

The template uses a simple {{AI_PROMPT: ...}} marker syntax embedded in the case summary markdown. Each marker contains a detailed instruction to an LLM, specifying what section to generate, what source data to use, what constraints apply, and how to handle uncertainty.

For example, the executive summary section looks like this:

{{AI_PROMPT: Using the case data provided, write a high-level, non-technical executive summary suitable for senior leadership and stakeholders. Cover: (1) the nature of the incident and ransomware variant if identified, (2) how and when it was detected and by whom, (3) the estimated scope of impact including affected systems, accounts, and business functions, (4) key investigation findings to date, and (5) the current response status and phase. Write in clear prose across two to four paragraphs. Do not use technical jargon without explanation. Base all statements strictly on case data, do not infer or fabricate details not present in the source material.}}

And a tactic-level ATT&CK section looks like this:

{{AI_PROMPT: Using the case data provided, identify and list the specific MITRE ATT&CK technique IDs and names observed for Initial Access (e.g. T1566.001 Spearphishing Attachment, T1190 Exploit Public-Facing Application). If no techniques were confirmed, state this explicitly. Base all statements strictly on case data.}}

The workflow is: complete the investigation in IRIS as normal, populating tasks, notes, IOCs, and assets throughout. When the case is ready to close, export or pass the full case data to an LLM alongside the template, and the prompts drive generation of each narrative section.
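The substitution step can be sketched in a few lines of Python. This is a minimal illustration, not the template's shipped tooling: the `render_report` helper and its `generate` callable are hypothetical names, and the actual LLM client is left abstract.

```python
import re

# Matches the {{AI_PROMPT: ...}} markers embedded in the case summary.
# re.DOTALL lets a single prompt span multiple lines.
PROMPT_RE = re.compile(r"\{\{AI_PROMPT:\s*(.*?)\}\}", re.DOTALL)

def render_report(summary_md: str, case_data: str, generate) -> str:
    """Replace each embedded prompt with LLM-drafted text.

    `generate` is any callable taking (instruction, case_data) and
    returning a drafted section -- e.g. a thin wrapper around your
    LLM client of choice.
    """
    def _fill(match: re.Match) -> str:
        instruction = match.group(1).strip()
        return generate(instruction, case_data)

    return PROMPT_RE.sub(_fill, summary_md)
```

Each prompt is passed to the model together with the exported case data, and the returned prose replaces the marker in place, leaving the rest of the report scaffold untouched.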


Prompt Design Principles

The prompts are written with a few consistent constraints that I found necessary to get useful output rather than hallucinated nonsense:

1. Ground every prompt in case data

Every prompt ends with some variation of: "Base all statements strictly on case data, do not infer or fabricate details not present in the source material." This is the most important constraint. Without it, LLMs will fill gaps with plausible-sounding but fabricated findings, which is worse than a blank field in an IR report.

2. Handle absence explicitly

Rather than leaving a section blank or populated with placeholders when a tactic had no observed activity, the prompts instruct the model to state the absence explicitly. For example, a tactic with no evidence should produce: "No activity was observed for this tactic during the investigation period." This is meaningful in a forensic report: it signals that the tactic was considered, not overlooked.

3. Distinguish confirmed from suspected

Prompts for investigation sections explicitly ask the model to distinguish confirmed findings from current hypotheses, and to state confidence levels where attribution or scope is uncertain. IR reports that conflate confirmed evidence with working theories are a liability.

4. Audience-appropriate tone per section

The executive summary prompt specifically asks for non-technical language and prohibits unexplained jargon. The ATT&CK analysis prompts ask for technique IDs, evidence tables, and precise language. The conclusion prompt specifies a "clear, authoritative, and forward-looking" tone suitable for leadership. Each section has a different reader in mind.

5. Structural completeness

One prompt at the top of the ATT&CK section instructs the model to review all tactic subsections and replace any that remain blank or placeholder-filled with an explicit "no activity" statement. This prevents a half-populated report from going out.
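That completeness check can also be backstopped programmatically before a draft leaves the pipeline: scan the output for any markers the model failed to replace. This is my own suggested safety net, not part of the template itself, and the function name is illustrative.

```python
import re

# Any {{AI_PROMPT: ...}} marker still present after generation means a
# section was never drafted and the report is not ready for review.
LEFTOVER_RE = re.compile(r"\{\{AI_PROMPT:.*?\}\}", re.DOTALL)

def unresolved_prompts(report_md: str) -> list[str]:
    """Return markers that survived generation; an empty list means
    every embedded prompt was replaced with drafted text."""
    return LEFTOVER_RE.findall(report_md)
```

Anything this returns should block the draft from advancing past review, in the same spirit as the "no activity" instruction.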


What the Template Covers

The Ransomware LLM template generates AI-assisted drafts for the following report sections:

| Section | What the LLM drafts |
| --- | --- |
| Executive Summary | Non-technical 2–4 paragraph overview for leadership |
| Scope of Investigation | In-scope/out-of-scope definition, evidence limitations |
| ATT&CK Framework Intro | Overall attack chain summary, tactic coverage overview |
| TA0001 — Initial Access | Technique IDs, narrative, evidence |
| TA0002 — Execution | Technique IDs, tools/commands, narrative |
| TA0003 — Persistence | Technique IDs, artifact paths, remediation status narrative |
| TA0004 — Privilege Escalation | Technique IDs, accounts, privilege level narrative |
| TA0005 — Defense Evasion | Technique IDs, evasion actions, log coverage impact |
| TA0006 — Credential Access | Technique IDs, credential types, account scope |
| TA0007 — Discovery | Technique IDs, enumeration scope narrative |
| TA0008 — Lateral Movement | Technique IDs, movement sequence narrative |
| TA0011 — Command and Control | Technique IDs, C2 infrastructure and beaconing narrative |
| TA0010 — Exfiltration | Technique IDs, staging/transfer narrative, data types at risk |
| TA0040 — Impact | Technique IDs, encryption scope, recovery inhibition narrative |
| CTI Findings | Threat actor attribution narrative, confidence assessment |
| Threat Actor Profile | Group profile, TTP alignment, campaign links |
| Remediation Intro | Containment/eradication posture summary, gap framing |
| Conclusion | Attack chain summary, dwell time, attribution confidence, recovery path |

The evidence tables (per-tactic and per-section) remain as structured markdown for analysts to populate — the LLM handles prose, humans handle tabular evidence documentation.


What This Is Not

It is worth being direct about the limitations, because IR reporting is a context where the cost of error is high.

It does not investigate for you. The quality of the generated report is entirely dependent on the quality of the case data in IRIS. Poorly documented investigations produce poorly generated reports. The template does not compensate for gaps in the underlying investigation.

It does not replace analyst review. Every section marked {{AI_PROMPT: ...}} produces a first draft, not a final product. ATT&CK technique mappings, attribution statements, and exfiltration assessments in particular need human verification before going to stakeholders, legal, or regulators.

It does not handle data sensitivity automatically. If your IRIS case contains information that should not leave a particular environment (PII, attorney-client privileged communications, classified data), you need to think carefully about what you pass to any external LLM API. Run these templates against a locally hosted model or an enterprise API with appropriate data handling controls if that is a concern in your environment.


Where It Goes Next

The current implementation requires manually passing case data to an LLM alongside the template prompts. The logical next step is to wire this into an n8n workflow that pulls the completed case from IRIS via API, constructs the prompt payload automatically, and writes the generated report back to the case, or delivers it directly to a reporting pipeline. That integration is something I'm actively working on.
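Pending that integration, the IRIS side of such a workflow might start with a request like the one below. The endpoint path, query parameter, and response shape shown here are assumptions for illustration only; verify them against the API reference for your DFIR-IRIS version before automating anything.

```python
import urllib.request

def build_summary_request(base_url: str, case_id: int, api_key: str) -> urllib.request.Request:
    """Build an authenticated request for a case's summary field.

    NOTE: the /case/summary/fetch path and cid parameter are
    illustrative placeholders, not a confirmed IRIS endpoint --
    check your instance's API documentation.
    """
    return urllib.request.Request(
        f"{base_url}/case/summary/fetch?cid={case_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
```

An n8n HTTP Request node would perform the same call; the generated report would then be written back to the case with a matching POST, or handed off to a reporting pipeline.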

I'm also exploring the same prompt-embedding approach for other incident types in the library, particularly the Data Breach and Supply Chain templates, where the notification obligation sections and regulatory framework analysis are the most time-consuming parts of report writing.


Get the Templates

The LLM-augmented templates are available alongside the standard templates on GitHub at https://github.com/zach115th/DFIR-IRIS-Templates/tree/main/Templates/Case/LLM. Look for files with the LLM suffix in the display name.

The goal isn't to have an AI write your IR reports. The goal is to get a defensible first draft in front of a tired analyst in five minutes instead of two hours.
