5 min read Intermediate AI / LLM

LLM Security Assessment

LLM pentesting (Large Language Model penetration testing) is the process of testing an AI language model to find weaknesses in its behavior, safety, and reliability.

It involves checking how the model responds to different inputs to understand whether it can be misled, produce incorrect or unsafe outputs, or fail to follow intended rules. The goal is to improve the model’s security, accuracy, and robustness before real-world use.

OWASP Top 10

https://genai.owasp.org/llm-top-10/

LLM Basics

Concept	Meaning	Simple Example
Model	The AI system that understands input and generates responses based on patterns it learned	Think of it like a super advanced autocomplete that can also write essays, solve problems, or code
Prompt	The instruction or question you give to the AI	“Explain photosynthesis like I’m 10 years old”
Context Window	How much conversation text the AI can keep in mind at once	If you chat too long, the AI may forget what you said at the beginning
Tokens	Small chunks of text the AI reads (not always full words)	“ChatGPT is great” becomes pieces like “Chat”, “GPT”, “is”, “great”
System Message	Hidden rules that control how the AI should behave	“Be helpful, safe, and don’t give harmful instructions” (you don’t normally see this)
Developer Message	App-level instructions that shape how the AI behaves in a specific app	A chatbot app saying: “Keep answers short and formal”
User Message	What you directly type to the AI	“What is machine learning?”
Inference	The moment the AI generates an answer	You ask a question and instantly get a response back
Training Data	The huge amount of text used to teach the AI	Books, websites, articles, code used before the AI was released
Temperature	Controls how creative or random the answer is	Low = factual answer, High = more creative or story-like response
Alignment	How well the AI follows human rules and stays safe	The AI refusing harmful requests and staying helpful
Tool Use (Function Calling)	When AI uses external systems like APIs or tools	Asking “weather today” → AI calls a weather service instead of guessing
Memory (if available)	Ability to remember past chats	AI remembering your name or preferences in later conversations

Core Concept

Model = brain
Prompt = question
Context = short-term memory
Tokens = words broken into pieces
System message = hidden teacher rules
Inference = thinking and answering process

LLM Vulnerabilities

Vulnerability	OWASP Mapping	Description	LLM-Specific Mechanism	Impact
Prompt Injection	LLM01	Malicious instructions override system behavior	Injected text in prompts, webpages, or documents	Policy bypass, unauthorized actions
Indirect Prompt Injection	LLM01	Hidden instructions in external data sources	RAG content, PDFs, web pages influencing model	Stealth control of outputs
Jailbreaking	LLM01 / LLM07	Bypassing safety alignment rules	Role-play, obfuscation, multi-turn persuasion	Restricted content generation
System Prompt Leakage	LLM07	Exposure of hidden system/developer prompts	“Repeat instructions above” attacks	Loss of control/security logic
Context Window Injection	LLM01	Hidden malicious instructions inside long context	Large documents overriding earlier rules	Silent behavioral manipulation
Retrieval Poisoning (RAG Attack)	LLM08	Poisoning external knowledge base	Malicious embeddings or documents in vector DB	Wrong or manipulated responses
Data Extraction (Memorization Leak)	LLM02	Extracting training or sensitive data	Targeted prompting to retrieve memorized content	Privacy leakage
Membership Inference	LLM02	Detecting if data was in training set	Confidence probing, response pattern analysis	Privacy violation
Model Inversion	LLM02	Reconstructing original private data	Query-based reconstruction attacks	Sensitive data recovery
Tool / Function Call Manipulation	LLM06	Misuse of external tools or APIs	Prompt forces unsafe function execution	Data exfiltration, unauthorized actions
Excessive Agency Exploitation	LLM06	Over-permissioned autonomous systems	Agent performs actions beyond intended scope	Financial/data/system damage
Output Handling Vulnerabilities	LLM05	Unsafe downstream processing of outputs	Injected SQL/HTML/code from model output	XSS, SQL injection, command injection
Token Smuggling	LLM01	Hidden instructions in encoded formats	Base64, Unicode tricks, formatting bypass	Safety filter evasion
Multi-turn Jailbreak	LLM01	Gradual manipulation across conversations	Step-by-step trust building	Policy bypass over time
Prompt Chaining Attack	LLM01	Splitting malicious intent into safe steps	Each prompt appears harmless individually	Combined harmful outcome
Instruction Hierarchy Confusion	LLM01	Conflicting system/user/context rules	Ambiguity in priority of instructions	Unpredictable or unsafe outputs
Hallucination Exploitation	LLM09	Leveraging false but confident outputs	Model fabricates answers under uncertainty	Misinformation propagation
Bias Exploitation	LLM09	Triggering learned biases in outputs	Leading or framed prompts	Harmful or discriminatory content
Overlong Context Degradation	LLM10	Performance degradation in long inputs	Attention dilution across large context	Missed constraints or errors
Unbounded Resource Consumption	LLM10	Forcing excessive computation or loops	Recursive prompts or tool loops	Cost explosion / denial of service
Data Poisoning (Training/Fine-tune)	LLM03 / LLM04	Corrupting training or tuning data	Injected malicious dataset entries	Persistent harmful model behavior
Supply Chain Model Attack	LLM03	Vulnerabilities in third-party models/tools	External APIs, plugins, model weights	Backdoors or compromised behavior

OWASP Top 10​

LLM Basics​

Core Concept​

LLM Vulnerabilities

OWASP Top 10

LLM Basics

Core Concept