LLM Powered Git-Scan is an experimental, security-focused CLI tool that analyses Git commit diffs for potentially sensitive data, such as API keys, credentials, database connection details, and other secrets that should not be committed to a repository.
The tool was developed independently in Python as part of a JetBrains internship assignment, where the goal was to explore how Large Language Models could be integrated into a cybersecurity automation workflow. It supports both local Git repositories and public GitHub repositories; remote repositories are temporarily cloned before analysis.
Instead of scanning an entire repository, the tool focuses specifically on the diffs from a configurable number of recent commits. Added and removed lines are extracted together with commit and file context, then processed in batches and analysed through the OpenAI API. Potential findings are written to a structured JSON report containing the commit, file, modified line, change type, detection result, confidence score, and explanation.
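The extraction and batching step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `ChangedLine`, `extract_changed_lines`, and `batched` are hypothetical, and the parser assumes the diff text is in the unified format produced by `git diff` or `git show` (including the `diff --git` file headers).

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class ChangedLine:
    """One added or removed line plus its context (hypothetical structure)."""
    commit: str       # hash of the commit the diff belongs to
    file: str         # path of the modified file
    change_type: str  # "added" or "removed"
    line: str         # line content without the leading +/- marker


def extract_changed_lines(commit: str, diff_text: str) -> List[ChangedLine]:
    """Collect added and removed lines from the unified diff of one commit."""
    changes: List[ChangedLine] = []
    current_file = "unknown"
    in_hunk = False
    for raw in diff_text.splitlines():
        if raw.startswith("diff --git "):
            # A new file section starts; its header lines follow.
            in_hunk = False
            current_file = "unknown"
        elif raw.startswith("@@"):
            # Hunk header: everything after it is diff content.
            in_hunk = True
        elif not in_hunk:
            # File header lines such as "--- a/path" and "+++ b/path".
            if raw.startswith(("--- ", "+++ ")):
                path = raw[4:].strip()
                if path != "/dev/null":
                    current_file = path[2:] if path.startswith(("a/", "b/")) else path
        elif raw.startswith("+"):
            changes.append(ChangedLine(commit, current_file, "added", raw[1:]))
        elif raw.startswith("-"):
            changes.append(ChangedLine(commit, current_file, "removed", raw[1:]))
    return changes


def batched(items: List[ChangedLine], size: int) -> Iterator[List[ChangedLine]]:
    """Yield consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Each batch could then be serialised and sent to the model in a single request, which keeps the number of API calls proportional to the number of batches rather than the number of changed lines.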
The project also includes CLI arguments, repository validation, Python version checks, automatic package checking/installation, progress indicators, temporary repository handling, and configurable output generation. These features were added to make the tool more usable as a standalone command-line utility.
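As an illustration of the CLI handling and Python version check, here is a minimal sketch using the standard library's `argparse`. The flag names (`--repo`, `--commits`, `--output`), their defaults, and the minimum version of 3.8 are assumptions made for this example and are not taken from the project itself.

```python
import argparse
import sys
from typing import Optional, Tuple


def positive_int(value: str) -> int:
    """argparse type that accepts only integers greater than zero."""
    number = int(value)
    if number < 1:
        raise argparse.ArgumentTypeError("must be a positive integer")
    return number


def build_parser() -> argparse.ArgumentParser:
    """Build the command-line parser (flag names are hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Scan recent Git commit diffs for potential secrets."
    )
    parser.add_argument("--repo", required=True,
                        help="local repository path or public GitHub URL")
    parser.add_argument("--commits", type=positive_int, default=10,
                        help="number of recent commits to analyse")
    parser.add_argument("--output", default="report.json",
                        help="path of the JSON report to write")
    return parser


def python_version_ok(minimum: Tuple[int, int],
                      current: Optional[Tuple[int, int]] = None) -> bool:
    """Return True if the running interpreter meets the minimum version."""
    if current is None:
        current = (sys.version_info.major, sys.version_info.minor)
    return current >= minimum


if __name__ == "__main__":
    if not python_version_ok((3, 8)):  # assumed minimum version
        sys.exit("Python 3.8 or newer is required.")
    args = build_parser().parse_args()
    print(f"Scanning {args.commits} commits of {args.repo} into {args.output}")
```

Validating arguments at the parser level, as `positive_int` does, means invalid input is rejected with a usage message before any repository is cloned or any API call is made.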
Although the concept proved technically feasible during development, the project also exposed an important limitation of using generative AI in deterministic tooling. The tool depended on consistent JSON responses from the LLM, but changes and variation in response formatting later caused reliability issues in the parsing step. Because of this, the project is now mainly presented as an experimental prototype and learning project rather than a production-ready security scanner.
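One common way to reduce this kind of fragility is to parse the model's reply defensively instead of assuming it is pure JSON. The sketch below is a possible mitigation under stated assumptions, not the approach the project used: `parse_findings` and the key names in `REQUIRED_KEYS` are hypothetical, chosen to mirror the report fields described earlier.

```python
import json
import re
from typing import Any, Dict, List

# Hypothetical field names, modelled on the report fields described above.
REQUIRED_KEYS = {"commit", "file", "line", "change_type",
                 "detected", "confidence", "explanation"}

# Matches a Markdown code fence, optionally labelled "json".
_FENCE = re.compile(r"```(?:json)?\s*(.*?)```", re.DOTALL | re.IGNORECASE)


def _load_json(text: str) -> Any:
    """Try progressively more tolerant ways of finding JSON in `text`."""
    text = text.strip()
    # 1. The reply is already valid JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # 2. The JSON is wrapped in a Markdown code fence.
    fenced = _FENCE.search(text)
    if fenced:
        try:
            return json.loads(fenced.group(1).strip())
        except json.JSONDecodeError:
            pass
    # 3. The JSON is embedded in surrounding prose.
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char in "[{":
            try:
                value, _ = decoder.raw_decode(text, index)
                return value
            except json.JSONDecodeError:
                continue
    raise ValueError("no JSON value found in model response")


def parse_findings(response_text: str) -> List[Dict[str, Any]]:
    """Return validated findings, raising ValueError on malformed replies."""
    data = _load_json(response_text)
    if isinstance(data, dict):
        data = [data]  # tolerate a single object instead of a list
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of findings")
    for item in data:
        if not isinstance(item, dict):
            raise ValueError("each finding must be a JSON object")
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            raise ValueError(f"finding is missing keys: {sorted(missing)}")
    return data
```

Raising `ValueError` on anything that cannot be validated lets the caller decide whether to retry the request, skip the batch, or record the failure in the report. This makes the parsing step more tolerant, but it does not remove the underlying non-determinism of the model's output.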
Built with Python, GitPython, the OpenAI API, dotenv, and tqdm, with a focus on CLI tooling, Git diff analysis, and JSON reporting.