Evaluating the Robustness of Large Language Models to Abuse in the Swiss Cyber-Defense Landscape
This project aims to lay the groundwork that will allow the Swiss cyber-defense ecosystem to prepare for the large-scale deployment of LLMs and the new attack surface such a deployment would entail. It contributes to the CYD Campus missions of developing the means to counter novel cyber threats and training the partners responsible for defense in cyberspace within the Confederation. To achieve this goal, we plan to leverage (i) manual and automated LLM red-teaming, (ii) domain-specific knowledge screening, (iii) generation bias evaluation, and (iv) self-censored generation to evaluate the robustness of the current generation of LLMs to misuse in cyber-offensive operations against targets within Switzerland.