Machine Learning-Based AES Key Recovery via Side-Channel Analysis on the ASCAD Dataset

Investigating the application of machine learning (ML) and deep learning (DL) models to exploit electromagnetic (EM) side-channel leakage for AES key recovery. This project uses the public ASCAD dataset and focuses on the Key Rank metric for evaluation.

A paper detailing this work is currently in progress.


**Status: Paper in progress, preliminary results available.**


The Vulnerability

Cryptographic algorithms like Advanced Encryption Standard (AES) are mathematically robust. However, their physical implementations on devices can leak information through side channels, such as power consumption or electromagnetic (EM) emissions. This leakage can potentially compromise theoretically secure algorithms. Electromagnetic analysis (EMA) is a potent form of side-channel analysis (SCA) where attackers measure EM fields radiating from a device during cryptographic operations. These emissions often contain subtle variations correlated with the intermediate data being processed, which can be linked to the secret key.

Recent advances show that ML and DL models are powerful tools for automatically learning these complex correlations, often outperforming traditional statistical SCA techniques. This project focuses on leveraging ML/DL for AES key recovery using the ASCAD dataset.


Key Aspects & Contributions

Comparative Model Analysis

A comparative performance analysis of standard classifiers (Random Forest, Support Vector Machine) and a tailored Convolutional Neural Network (CNN) for AES key byte recovery on the ASCAD fixed-key and variable-key datasets.

Feature Importance & Reduction

Exploration of Random Forest-based feature importance for dimensionality reduction and its impact on model efficiency and effectiveness in the SCA context.

Key Rank Metric Demonstration

A clear demonstration of the necessity and superiority of the domain-specific Key Rank metric over standard accuracy for evaluating ML-based SCA success, especially in low Signal-to-Noise Ratio scenarios.

Successful Key Recovery

Confirmation of successful key recovery using both CNN and feature-selected RF models, highlighting the practical feasibility of ML-based side-channel attacks despite low per-trace classification accuracy.

Technical Methodology

Target: AES S-Box Operation

The attack targets the output of the first-round AES S-box operation. The S-box input for a byte $i$ is $\text{Plaintext}[i] \oplus \text{Key}[i]$. The output is:

$\text{SboxOutput}[i] = \text{Sbox}(\text{Plaintext}[i] \oplus \text{Key}[i])$

Because the plaintext is known, predicting this 256-class output allows the key byte to be deduced. We target the third key byte (index 2).
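For illustration, the label computation can be sketched in a few lines of NumPy; `plaintexts`, `keys`, and `sbox` are placeholder names for the per-trace metadata arrays and the standard 256-entry AES S-box table, not the project's exact code.

```python
import numpy as np

def sbox_labels(plaintexts, keys, sbox, byte_idx=2):
    """256-class labels for profiling: Sbox(plaintext[byte] XOR key[byte]).

    plaintexts, keys : uint8 arrays of shape (n_traces, 16), per-trace metadata
    sbox             : length-256 AES S-box lookup table (np.uint8)
    byte_idx         : targeted key-byte position (index 2 here)
    """
    return sbox[plaintexts[:, byte_idx] ^ keys[:, byte_idx]]
```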

Fig 1: Basic Steps of an AES Encryption Round

Attack Mechanics in Detail

The side-channel leakage arises primarily in the first masked multiplier of the S-box operation, where the XOR gates undergo different numbers of signal transitions depending on the data being processed. This produces distinctive power-consumption and EM patterns that correlate directly with the processed data values.

Our attack adopts a value-based leakage model, assuming the EM trace contains information correlated with the specific value (0-255) of the S-box output. Since this output depends on both the known plaintext and unknown key, predicting it allows us to deduce the key byte through a 256-class classification problem.

Key Rank Metric: Technical Details

The superiority of Key Rank over standard accuracy stems from the nature of side-channel attacks. With low signal-to-noise ratio, perfect classification of every trace is unrealistic. Instead, our goal is to distinguish the correct key from 255 incorrect hypotheses by aggregating subtle evidence across numerous traces.

For each key hypothesis $k_{\text{guess}}$ (0-255), we calculate:

$\text{Score}(k_{\text{guess}}) = \sum_{i=1}^{N} \log\left(P(\text{label} = Z_{\text{hyp},i} \mid \text{trace}_i) + \varepsilon\right)$

where $Z_{\text{hyp},i} = \text{Sbox}(\text{plaintext}_i \oplus k_{\text{guess}})$ for each trace $i$, and $\varepsilon$ is a small constant to prevent $\log(0)$. Working in log-probabilities turns the product of per-trace probabilities into a sum, which keeps the accumulation numerically stable and efficient.
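The score accumulation and ranking can be sketched as follows; this is an illustrative NumPy implementation that assumes `probs` holds the model's per-trace class probabilities and `sbox` the AES S-box table (all names are hypothetical).

```python
import numpy as np

def key_rank(probs, plaintexts, sbox, true_key, eps=1e-36):
    """Rank of the true key byte after accumulating evidence over N attack traces.

    probs      : (n_traces, 256) array of model probabilities P(label | trace)
    plaintexts : (n_traces,) plaintext byte at the targeted position
    sbox       : length-256 AES S-box lookup table (np.uint8)
    true_key   : known key byte, used only to report its rank
    """
    n_traces = probs.shape[0]
    scores = np.zeros(256)
    for k_guess in range(256):
        z_hyp = sbox[plaintexts ^ k_guess]                       # hypothetical S-box outputs
        scores[k_guess] = np.log(probs[np.arange(n_traces), z_hyp] + eps).sum()
    ranking = np.argsort(scores)[::-1]                           # best-scoring guess first
    return int(np.where(ranking == true_key)[0][0])              # 0 means key recovered
```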

Feature Importance Analysis

Our feature selection approach using Random Forest's Gini importance showed that EM leakage is distributed across the trace but concentrated in specific time regions. By selecting only the top 100 features, we reduced the number of attack traces required by approximately 50% for ASCADf and 40% for ASCADv.

This dimensionality reduction mitigates overfitting and focuses on the most informative leakage points, significantly improving model efficiency while maintaining attack effectiveness.
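A sketch of this selection step with scikit-learn might look as follows, using the Random Forest hyperparameters listed under Machine Learning Models; `X_profiling`, `y_profiling`, and `X_attack` are placeholder names for the standardized traces and their S-box labels.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Illustrative sketch: rank time samples by Gini importance and keep the top 100.
rf = RandomForestClassifier(n_estimators=100, max_depth=20,
                            min_samples_leaf=10, n_jobs=-1)
rf.fit(X_profiling, y_profiling)

top_idx = np.argsort(rf.feature_importances_)[::-1][:100]   # most informative sample points
X_profiling_red = X_profiling[:, top_idx]
X_attack_red = X_attack[:, top_idx]                          # same selection for attack traces
```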

Dataset & Preprocessing

Utilizes the public ASCAD 'fixed-key' (ASCADf: 50k training, 10k attack traces, 700 samples/trace) and 'variable-key' (ASCADv: 200k training, 100k attack traces, 1400 samples/trace) datasets. Raw EM traces are standardized (zero mean, unit variance) based on the profiling set.
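As a sketch, loading and standardization could be done as below; the file name and HDF5 group names follow the public ASCAD database layout, but treat them as assumptions rather than the project's exact pipeline.

```python
import numpy as np
import h5py
from sklearn.preprocessing import StandardScaler

# Illustrative loading/standardization sketch for one ASCAD HDF5 file.
with h5py.File("ASCAD.h5", "r") as f:
    X_profiling = np.array(f["Profiling_traces/traces"], dtype=np.float32)
    X_attack = np.array(f["Attack_traces/traces"], dtype=np.float32)

# Zero mean, unit variance per sample point, fitted on the profiling set only.
scaler = StandardScaler().fit(X_profiling)
X_profiling = scaler.transform(X_profiling)
X_attack = scaler.transform(X_attack)   # reuse profiling statistics on the attack traces
```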

Machine Learning Models

Random Forest (RF): Ensemble of decision trees ($n\_estimators=100$, $max\_depth=20$, $min\_samples\_leaf=10$). Used for classification and for Gini importance-based feature selection (top 100 features).
Support Vector Machine (SVM): Trained on the reduced feature set with an RBF kernel.
Convolutional Neural Network (CNN): Custom PyTorch CNN with four convolutional blocks (Conv1D, BatchNorm, ReLU, AvgPool1D) followed by dense layers, inspired by existing SCA literature.
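A minimal scikit-learn sketch of the SVM stage, reusing the reduced-feature arrays from the feature-selection sketch above (all names are placeholders):

```python
from sklearn.svm import SVC

# Illustrative sketch: RBF-kernel SVM on the 100 selected features.
# probability=True enables per-class probability estimates, which the
# Key Rank computation needs; predict_proba columns follow svm.classes_.
svm = SVC(kernel="rbf", probability=True)
svm.fit(X_profiling_red, y_profiling)
attack_probs = svm.predict_proba(X_attack_red)   # per-trace class probabilities
```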

Fig 2: CNN Architecture for SCA
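A minimal PyTorch sketch of such an architecture is shown below; the channel counts, kernel size, and dense-layer widths are illustrative assumptions rather than the exact trained network.

```python
import torch.nn as nn

class SCACNN(nn.Module):
    """Illustrative 1D CNN for SCA: four conv blocks followed by a dense head."""

    def __init__(self, trace_len=700, n_classes=256):
        super().__init__()
        blocks, in_ch = [], 1
        for out_ch in (64, 128, 256, 512):            # four convolutional blocks
            blocks += [nn.Conv1d(in_ch, out_ch, kernel_size=11, padding=5),
                       nn.BatchNorm1d(out_ch),
                       nn.ReLU(),
                       nn.AvgPool1d(2)]
            in_ch = out_ch
        self.features = nn.Sequential(*blocks)
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * (trace_len // 16), 4096),  # trace length halved 4 times
            nn.ReLU(),
            nn.Linear(4096, n_classes),
        )

    def forward(self, x):                  # x: (batch, 1, trace_len)
        return self.head(self.features(x))
```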

Evaluation: Key Rank

The primary metric is Key Rank. For N attack traces, it involves:

1. Obtaining the model's probability distribution over the 256 S-box output values for each trace.
2. For each key-byte hypothesis (0-255), calculating the hypothetical S-box outputs and summing the corresponding log-probabilities from the model.
3. Ranking the key hypotheses by their total score.

Rank 0 for the true key means successful recovery. The metric aggregates evidence across traces and remains effective even with low per-trace accuracy.
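Tying this back to the `key_rank` sketch above, a hypothetical evaluation call on the attack set could look like this (array names are placeholders):

```python
# attack_probs: (n_attack_traces, 256) class probabilities from the trained model
# attack_plaintexts, attack_keys: ASCAD attack-set metadata (uint8, shape (n, 16))
rank = key_rank(attack_probs,
                attack_plaintexts[:, 2],            # targeted byte (index 2)
                sbox,
                true_key=int(attack_keys[0, 2]))
print(f"Key rank after {attack_probs.shape[0]} traces: {rank}")  # 0 => key recovered
```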

Fig 3: Example Key Rank Chart

Experimental Results & Outcomes

ASCADf = ASCAD fixed-key dataset; ASCADv = ASCAD variable-key dataset.
Full Features = all 700 samples per trace for ASCADf (1400 for ASCADv); Reduced Features = top 100 features selected by Gini importance.

| Model | Dataset | Feature Type | Attack Traces for Rank 0* |
| --- | --- | --- | --- |
| CNN | ASCADf | Full Features | ~65 traces |
| CNN | ASCADv | Full Features | -- |
| Random Forest | ASCADf | Reduced Features | ~200 traces |
| Random Forest | ASCADf | Full Features | ~492 traces |
| Random Forest | ASCADv | Full Features | ~750 traces |
| Random Forest | ASCADv | Reduced Features | ~470 traces |
| SVM | ASCADf | Reduced Features | ~320 traces |
| SVM | ASCADv | Reduced Features | ~320 traces |

Note: The Key Rank metric is crucial for evaluating side-channel attacks, demonstrating that models with low per-trace classification accuracy can still recover the key when evidence is aggregated across multiple traces.

*The results are preliminary and may differ from those in the final paper.

Key Contributions

The key contributions of this work are summarized under Key Aspects & Contributions above.

References & Further Reading

This work builds upon existing research in side-channel analysis and machine learning; key references are included in the paper draft.

For comprehensive results and methodology, please request a draft of the research paper.