Figure 3. Proposed feature extractors for both modalities. (A) CNN4 architecture for malware image encoding, featuring a 32 → 64 → 128 channel progression with residual connections and adaptive pooling; (B) Simple Transformer architecture for API sequence encoding, with two hidden layers and two attention heads. CNN: convolutional neural network; API: application programming interface.
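For readers who want a concrete picture of the two branches, the following is a minimal PyTorch sketch. It fills in several details that the figure does not state, and each of these is an assumption: 3×3 kernels, batch normalization, a 1×1 projection on the skip path, max pooling between stages, single-channel (grayscale) input images, a 64-dimensional token embedding with learned positional embeddings, token id 0 reserved for padding, "two hidden layers" read as two stacked encoder layers, and mean pooling over non-padding tokens. The class names `ResidualBlock`, `CNN4Encoder`, and `SimpleTransformerEncoder` are hypothetical, and the layer count here need not match the authors' CNN4.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Hypothetical residual stage: two 3x3 convs plus a skip path, then 2x downsampling."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        # 1x1 projection so the skip path matches the new channel count (assumption).
        self.skip = (
            nn.Identity()
            if in_ch == out_ch
            else nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        )
        self.act = nn.ReLU(inplace=True)
        self.down = nn.MaxPool2d(2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(self.act(self.body(x) + self.skip(x)))


class CNN4Encoder(nn.Module):
    """Image branch: 32 -> 64 -> 128 channels, then adaptive pooling to a fixed-size vector."""

    def __init__(self, in_channels: int = 1):  # grayscale input is an assumption
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
        )
        self.stages = nn.Sequential(ResidualBlock(32, 64), ResidualBlock(64, 128))
        # Adaptive pooling makes the output size independent of the input image size.
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pool(self.stages(self.stem(x))).flatten(1)  # (batch, 128)


class SimpleTransformerEncoder(nn.Module):
    """API-sequence branch: token + position embeddings, 2 encoder layers, 2 heads."""

    def __init__(self, vocab_size: int, d_model: int = 64, nhead: int = 2,
                 num_layers: int = 2, max_len: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=0)  # id 0 = padding (assumption)
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))     # learned positions (assumption)
        layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers,
                                             enable_nested_tensor=False)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        pad = tokens == 0  # True where the position is padding
        h = self.embed(tokens) + self.pos[:, : tokens.size(1)]
        h = self.encoder(h, src_key_padding_mask=pad)
        keep = (~pad).unsqueeze(-1).float()
        # Mean over real (non-padding) tokens only.
        return (h * keep).sum(dim=1) / keep.sum(dim=1).clamp(min=1.0)  # (batch, d_model)


if __name__ == "__main__":
    img_feat = CNN4Encoder()(torch.randn(2, 1, 64, 64))
    api_feat = SimpleTransformerEncoder(vocab_size=300)(torch.randint(1, 300, (2, 50)))
    print(img_feat.shape, api_feat.shape)  # torch.Size([2, 128]) torch.Size([2, 64])
```

Because of the adaptive pooling layer, images of different sizes map to the same 128-dimensional vector, which is convenient when the two branches' outputs are later combined; how the features are fused is not modeled in this sketch.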







