Sohrabi Safa, Nader ORCID: https://orcid.org/0000-0003-4897-0084
(2026)
Trustworthy AI: AI Agents Information Security and AI Agents for Information Security.
MDPI, Information.
Generative Artificial Intelligence (AI) refers to pretrained models capable of producing text, images, code, and other content in response to prompts. Large Language Models (LLMs) are prominent examples of such systems and have become widely adopted across industry and research settings. Building upon these models, AI agents are autonomous or semi‑autonomous software entities that use AI and LLM capabilities to pursue bounded, goal‑driven tasks across a range of domains. When multiple specialized agents collaborate within an orchestration framework, they form agentic AI systems capable of executing complex, multi‑stage objectives—including tasks that enhance the information security posture of systems, infrastructures, or entire organizations. By integrating planning, reasoning, and action, such agents bring substantial productivity gains and high levels of automation to security workflows.
However, the increasing autonomy and capability of AI agents also introduce new information security risks. As these agents read sensitive context, ingest untrusted data, call external tools, and operate across networks, they expand the attack surface in ways fundamentally different from traditional software. Instructions encoded in natural language, large context windows that accept potentially malicious inputs, and agent tool‑use that executes real-world actions all create opportunities for exploitation. Challenges such as hallucinations, adversarial manipulation, and false positives further complicate the secure deployment of these systems.
At the same time, malicious actors are leveraging AI agents to enhance offensive operations. Attackers now use AI to automate end‑to‑end attack chains, conduct cyber‑espionage, generate novel malware variants, perform large‑scale vulnerability scanning, and target agent orchestration frameworks themselves. This dual use illustrates a critical tension at the heart of modern cybersecurity: AI agents are increasingly powerful tools both for defending systems and for compromising them.
The scope of this Special Issue includes, but is not limited to, the following subjects:
Trustworthy autonomous AI agents for cyber defense;
Secure multi-agent collaboration frameworks in enterprise networks;
Adversarial attacks against AI agent orchestration systems;
Prompt injection and jailbreak attacks in agentic AI systems;
Hallucination detection and mitigation in security-oriented AI agents;
Explainable and auditable AI agents for information security operations;
AI agents for automated threat hunting and incident response;
Privacy-preserving AI agents for sensitive data environments;
Zero-trust architectures for agentic AI ecosystems;
AI-agent-driven malware analysis and reverse engineering;
Offensive use of AI agents in cybercrime and cyber espionage;
Detection of malicious AI agents in cloud and edge environments;
AI agents for vulnerability discovery, prioritization, and patch management;
Ethical, legal, and governance challenges of autonomous AI agents in cybersecurity;
Resilient and robust AI agents against adversarial manipulation and data poisoning;
Vertical federated learning and AI agents.
This Special Issue seeks to explore this emerging landscape. We invite contributions from researchers, academics, and practitioners that investigate AI agents as both defensive enablers and potential security threats. Our goal is to gather high‑quality insights, empirical studies, theoretical analyses, and practical advances that illuminate the evolving role of agentic AI in information security, for better and for worse.
Dr. Nader Sohrabi Safa (Department of Computing, Worcester Business School, University of Worcester) was Guest Editor for this Special Issue about Trustworthy AI.