Rhesis Changelog

All notable changes to the Rhesis project are documented in this file.

This is the aggregated changelog for the entire Rhesis repository. For detailed component-specific changes, refer to each component's own changelog.

[0.6.2] - 2026-01-29

Platform Release

This release includes the following component versions:

Backend 0.6.1
Frontend 0.6.2
SDK 0.6.2

Summary

This release introduces Garak LLM vulnerability scanner integration for comprehensive security testing of LLM applications, a 3-level metrics hierarchy for flexible test execution configuration, and enhanced MCP integrations including Jira/Confluence, GitHub, and observability support. The platform also receives significant performance optimizations and infrastructure improvements across all components.

Featured Capabilities

Garak LLM Vulnerability Scanner

The platform now integrates Garak, one of the most popular vulnerability scanners for LLM and agentic applications:

65+ Security Test Cases: Import Garak’s extensive probe library directly into Rhesis for testing prompt injection, jailbreaks, toxic outputs, data leakage, and other vulnerabilities
No Command-Line Required: Import test cases, execute them, and inspect results entirely through the UI—no terminal commands or JSON parsing needed
Team Collaboration: Review and discuss security test results with your team in one centralized platform
Redis-Cached Probe Enumeration: Efficient probe management with caching for improved performance
Detector Metrics: Built-in support for Garak detector metrics in the SDK
CI/CD Integration: Run security scans as part of your automated testing pipeline

3-Level Metrics Hierarchy

Enhanced flexibility in test execution with a new metrics hierarchy system:

Behavior-Level Metrics: Define default metrics at the behavior level
Test-Level Overrides: Override metrics for specific tests when needed
Execution-Time Configuration: Dynamically configure metrics at test execution time
Metric Source Selection: UI support for choosing which metric configuration to use
Improved Test Flexibility: Adapt evaluation criteria without modifying test definitions
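
The precedence implied by the three levels can be sketched as follows. This is a minimal illustration, not the Rhesis implementation: the helper name `resolve_metrics`, the metric names, and the list-of-names representation are all hypothetical, and the sketch assumes that a more specific level replaces (rather than merges with) a more general one.

```python
from typing import List, Optional, Sequence


def resolve_metrics(
    behavior_metrics: Sequence[str],
    test_metrics: Optional[Sequence[str]] = None,
    execution_metrics: Optional[Sequence[str]] = None,
) -> List[str]:
    """Pick the most specific metric set that was provided.

    Hypothetical helper: assumed precedence is execution > test > behavior.
    """
    if execution_metrics is not None:
        return list(execution_metrics)
    if test_metrics is not None:
        return list(test_metrics)
    return list(behavior_metrics)


# Illustrative metric names only.
defaults = ["answer_relevancy", "toxicity"]
print(resolve_metrics(defaults))                                    # behavior defaults
print(resolve_metrics(defaults, test_metrics=["faithfulness"]))     # test-level override
print(resolve_metrics(defaults, ["faithfulness"], ["goal_check"]))  # execution-time override
```

Under these assumptions, `None` means "no override at this level", so an explicitly empty list at a more specific level would still take precedence.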

Enhanced MCP Integrations

Expanded Model Context Protocol (MCP) support with new integrations:

Atlassian Stdio: Connect Jira and Confluence for issue tracking and documentation
GitHub Repository Retrieval: Access GitHub repositories and code context
Observability Integration: OpenTelemetry tracing with MCP support in SDK

Backend Highlights

Added Garak LLM vulnerability scanner integration with Redis caching for probe enumeration
Implemented 3-level metrics hierarchy (behavior → test → execution) allowing execution-time metric overrides
Added MCP Jira/Confluence (Atlassian Stdio), GitHub repository retrieval, and observability integrations
Upgraded FastAPI/Starlette and security dependencies for improved stability
Optimized Docker image with CPU-only PyTorch, reducing image size significantly

Frontend Highlights

Added 3-level metrics hierarchy UI for test execution with metric source selection
Integrated Garak LLM vulnerability scanner import UI for test sets
Added MCP Atlassian, GitHub, and observability support in the UI
Added context and expected response fields to test run detail view
Improved test execution configuration interface

SDK Highlights

Added Model entity with provider auto-resolution and Project entity with integration tests
Implemented batch processing framework and embedders support
Added Vertex AI support for embeddings and generation
Added Garak detector metric integration for security testing
Implemented MCP observability with OpenTelemetry tracing
Implemented continuous slow retry mode for connector resilience
Enhanced test execution with better error handling and recovery

[0.6.0] - 2026-01-15

Platform Release

This release includes the following component versions:

Backend 0.6.0
Frontend 0.6.0
SDK 0.6.0
Polyphemus 0.2.4

Summary

This major release introduces two significant features: Enhanced Tracing for comprehensive observability and a New Interactive Glossary with 50+ AI testing terms. The SDK tracing provides deep insights into your application under test, revealing its inner workings and execution flow with async support and smart serialization, while the glossary makes it easier for teams to understand and adopt AI testing best practices.

Important: As of version 0.6.0, all production services for the cloud version of Rhesis are now hosted in European data centers, ensuring data sovereignty and compliance with European data protection regulations.

Featured Capabilities

Enhanced Tracing

The SDK now includes comprehensive tracing capabilities that provide deep observability into your application under test (the target) and its inner workings:

Application Insights: Gain visibility into your target application’s execution flow, function calls, and internal state
Execution Flow Visualization: See how your application processes requests, including all intermediate steps and decision points
Async Support: Full support for asynchronous operations with proper trace propagation across async boundaries
Smart Serialization: Intelligent handling of complex objects and data structures in traces, making internal state visible
Improved I/O Display: Enhanced visualization of inputs and outputs at each step of your application’s execution
OpenTelemetry Integration: Basic telemetry support for standard observability tools
Advanced Filtering: Filter traces by status, duration, and custom attributes in the frontend
Dependency Injection: New bind parameter in endpoint decorators for cleaner code organization

New Interactive Glossary

The platform now includes a comprehensive glossary at docs.rhesis.ai/glossary with 50+ carefully curated terms covering:

Configuration Concepts: Organization, Project, Endpoint, Model, API Tokens
Testing Fundamentals: Single-turn tests, Multi-turn tests, Metrics, Behaviors, Test Sets
Advanced Testing: Penelope agent, RAG systems, Chain-of-Thought, Prompt Injection
Quality Metrics: Precision/Recall, F1 Score, Confusion Matrix, Confidence Scores
Development Tools: SDK integration, MCP protocol, Test Generation
Results Analysis: Baseline comparison, Regression testing, A/B testing

Each glossary entry includes:

Clear definitions and extended explanations
Practical examples and use cases
Code snippets demonstrating implementation
Related terms for deeper exploration
Links to relevant documentation sections

Backend Highlights

Infrastructure: All production services now hosted in European data centers for enhanced data sovereignty and GDPR compliance
Enhanced connectivity with MCP including GitHub support and multi-transport capabilities
Improved observability with comprehensive OpenTelemetry integration for tracing, visualization, and filtering
Added Chatbot Intent Recognition functionality
Streamlined organization onboarding with integrated test execution

Frontend Highlights

Enhanced SDK tracing with improved visualization and filtering in the UI
Improved MCP connection stability and added new GitHub MCP provider
Integrated test execution into organization onboarding process
Added interactive glossary page with search and filtering

SDK Highlights

Added dependency injection support with bind parameter in endpoint decorators
Enhanced SDK tracing with async support, smart serialization, and improved I/O display
Introduced OpenTelemetry integration for basic telemetry

Polyphemus Highlights

Added bind parameter to endpoint decorator for dependency injection
Introduced Bucket Model for improved data management

[0.5.4] - 2025-12-18

Platform Release

This release includes the following component versions:

Backend 0.5.4
Frontend 0.5.4
SDK 0.5.2
Polyphemus 0.2.3

Summary

This release adds a new Polyphemus provider with schema support across the platform. The documentation receives enhancements including comprehensive guides and improved SDK metrics documentation. SDK improvements focus on enhanced MCP error handling and improved metric creation capabilities with support for categories and threshold operators.

Backend Highlights

Added new Polyphemus provider with schema definition support
Enhanced provider configuration for Polyphemus integration

Frontend Highlights

Added new Polyphemus provider with schema support
Documentation improvements including enhanced guides and SDK metrics
Dependency updates for Next.js (16.0.7 → 16.0.10) and Nodemailer (6.10.1 → 7.0.11)

SDK Highlights

Added new Polyphemus provider with schema support
Improved MCP error handling and usability
Enhanced metric creation with support for categories and threshold operators
Improved generation prompts using research-backed Chain-of-Thought techniques
Hotfix for MCP compatibility issues with npx and bunx

[0.5.3] - 2025-12-11

Platform Release

This release includes the following component versions:

Backend 0.5.3
Frontend 0.5.3
Polyphemus 0.2.2

Summary

This release enhances MCP authentication and error handling, adds unique constraints to database fields, and improves multi-turn test support. The frontend receives UX improvements for the trial drawer and manual test writer, while Polyphemus fixes rate limiting behavior for unauthenticated requests.

Backend Highlights

Improved MCP authentication and error handling, enhancing usability
Added unique constraint to nano_id columns in the database
Enhanced testing capabilities with endpoint connection testing and multi-turn test support
Notifications now separate execution status from test results in emails

Frontend Highlights

Improved trial drawer with multi-turn support and UX enhancements
Added multi-turn test support in manual test writer
Fixed authentication errors and improved UX for MCP (Model Context Protocol)
Resolved cursor focus loss in title and description fields

Polyphemus Highlights

Fixed rate limiting to occur after authentication, preventing unintended rate limits on unauthenticated requests

[0.5.2] - 2025-12-08

Platform Release

This release includes the following component versions:

Backend 0.5.2
Frontend 0.5.2

Summary

This release adds a Test Connection Tool for easier configuration and troubleshooting, removes permission restrictions from entity routes, and addresses security vulnerabilities. The frontend includes updates for React and Next.js to address CVE-2025-55182, plus support for tags in test runs and enhanced metrics capabilities.

Backend Highlights

Added a Test Connection Tool for easier configuration and troubleshooting
Removed permission restrictions from entity routes
Fixed: Activities API response now returns valid fields

Frontend Highlights

Security: Updated React and Next.js to address CVE-2025-55182
Test Runs: Added support for tags
Metrics: Added support for categories and threshold operators
Added Test Connection Tool

[0.5.1] - 2025-12-04

Platform Release

This release includes the following component versions:

Backend 0.5.1
Frontend 0.5.1
Polyphemus 0.2.1

Summary

This release modernizes the dashboard with MUI X charts and activity timeline, improves connector output mapping with message field support, and adds OpenRouter provider support. The frontend receives grid state persistence and improved MCP dialogs, while Polyphemus adds user verification status.

Backend Highlights

Modernized dashboard with MUI X charts and activity timeline
Improved connector output mapping with message field support
Added support for OpenRouter provider
Increased exporter timeout from 10 to 30 seconds

Frontend Highlights

Modernized dashboard with MUI X charts and activity timeline
Improved MCP import and tool selector dialogs
Added grid state persistence to localStorage
Backend execution RPC fixes and UI improvements

Polyphemus Highlights

Added “Is Verified” status to user profiles
Users can now be marked as verified

[0.5.0] - 2025-11-27

Platform Release

This release includes the following component versions:

Backend 0.5.0
Frontend 0.5.0
SDK 0.5.0

Summary

This release introduces comprehensive multi-turn test support via the Penelope execution agent, a bidirectional SDK connector with intelligent auto-mapping, and enhanced behavior-metrics management. MCP server integration with Notion improves context for test generation, while the new interactive onboarding system guides users through initial setup.

Featured Capabilities

Multi-Turn Testing with Penelope

Penelope autonomously executes complex multi-turn conversations against your LLM applications.

SDK Connector Integration

The @endpoint decorator integrates your LLM application in under a minute.

Behaviors and Metrics Structure

The two-layer testing structure enables whole-team collaboration on quality.

MCP Integration with Notion

Connect your Notion workspace directly to your testing pipeline.

Backend Highlights

Added comprehensive multi-turn test support including creation, listing, execution, and preview generation
Implemented bidirectional SDK connector with intelligent auto-mapping
Added Tool Source Type for MCP server integration
Added in-place test execution without worker infrastructure
Enhanced synthesizers for improved test generation
Added database persistence for onboarding progress
Refactored Base Entity for improved maintainability
Updated MCP Tool Database for enhanced tool management
Added endpoint to list available models for providers

Frontend Highlights

Implemented interactive onboarding tour system
Added behaviors page with refactored metrics UI
Implemented multi-turn conversation preview in test generation flow
Added bidirectional SDK connector with intelligent auto-mapping
Implemented Tool Configuration Frontend
Redesigned test results page with improved filters and client-side search
Redesigned knowledge detail page for design system consistency
Reorganized navigation with sections and external links
Upgraded to Next.js 16 and MUI v7
Added models list for providers

SDK Highlights

Added bidirectional SDK connector with intelligent auto-mapping
Added comprehensive multi-turn test support
Added Google Cloud integration for Polyphemus
Added functionality to list available models for providers
Improved synthesizers functionality
Refactored base entity structure
Updated MCP Tool Database functionality

[0.4.3] - 2025-11-17

Platform Release

This release includes the following component versions:

Backend 0.4.3
Frontend 0.4.3
SDK 0.4.2

Summary

This release focuses on improving multi-turn conversation handling with centralized conversation tracking and fixes critical deployment issues in the frontend Docker image.

Backend Highlights

Added centralized conversation tracking for improved multi-turn conversation handling
Enhanced conversation state management across test executions

Frontend Highlights

Fixed Docker image build failure during local deployment
Resolved file ownership issue (chown bug) affecting containerized deployments

SDK Highlights

Added support for custom HTTP headers in API requests
Improved error handling for network requests with more descriptive error messages
Updated internal retry mechanism for failed API calls
Fixed date parsing issues in certain locales
Resolved bug causing occasional crashes when handling large data responses

[0.4.2] - 2025-11-13

Platform Release

This release includes the following component versions:

Backend 0.4.2
Frontend 0.4.2
SDK 0.4.1

Summary

This release makes it easier than ever to get started with Rhesis through zero-configuration Docker Compose setup. Spin up the entire platform with a single command! The release also introduces multi-turn test support with conversational metrics, enhanced MCP integration with Notion, and improved local development experience.

Backend Highlights

Added support for multi-turn tests with configuration, execution, and conversational metrics
Improved local development setup with zero-configuration Docker Compose
Introduced generic MCP integration endpoints and user model configuration
Added scenarios, tags, and comments infrastructure for sources
Enhanced command-line interface for easier platform management

Frontend Highlights

Implemented multi-turn test support with configuration UI and goal display
Enhanced test set management with test type display and filtering
Improved local development with Docker Compose and auto-login feature
Integrated conversational metrics for multi-turn test evaluation

SDK Highlights

Added LangChain integration and Penelope language model support
Introduced Conversational Metrics with Goal Achievement Judge and DeepEval integration
Enhanced MCP Agent with autonomous ReAct loop and improved error handling
Added structured output support for tool calling via Pydantic schemas
Improved Vertex AI provider reliability

[0.4.1] - 2025-10-30

Platform Release

This release includes the following component versions:

Backend 0.4.1
Frontend 0.4.1
SDK 0.4.0

Summary

This release introduces comprehensive OpenTelemetry telemetry, enhanced test generation with iteration context, and improved source tracking. The platform now uses “Sources” terminology throughout, replacing “Documents” for consistency. Key improvements include soft deletion with cascade-aware restoration, API key authentication with rate limiting, and enhanced metrics integration with Ragas and DeepEval.

Backend Highlights

Added comprehensive OpenTelemetry telemetry system for monitoring and analytics
Enhanced test generation with iteration context support and source ID tracking
Integrated SDK metrics with simplified evaluation and database migration
Implemented cascade-aware restoration for soft-deleted entities
Added API key authentication with user-based rate limiting

Frontend Highlights

Replaced “Documents” terminology with “Sources” throughout the application
Enhanced test generation UI with improved backend support and source context display
Implemented OpenTelemetry for enhanced monitoring
Added support for additional file formats (.pptx, .xlsx, .html, .htm, .zip)
Improved test results display with error status icons and execution time for failed runs

SDK Highlights

Added Cohere and Vertex AI LLM providers with Ollama integration
Enhanced AI-based test generation with iteration context support
Improved metrics integration with Ragas and DeepEval (updated to v3.6.7)
Added support for both plain and OpenAI-wrapped JSON schemas
Refactored metrics for improved organization and maintainability
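
Accepting both schema shapes can be sketched as a small normalization step. This is an illustration, not the SDK's code: the helper name `unwrap_schema` is hypothetical, and it assumes that "OpenAI-wrapped" refers to the envelope used by OpenAI's structured-output `response_format`, where the actual schema sits under `json_schema.schema`.

```python
from typing import Any, Dict


def unwrap_schema(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Return a plain JSON schema, unwrapping an OpenAI-style envelope if present.

    Hypothetical helper; the envelope layout is an assumption stated in the lead-in.
    """
    envelope = schema.get("json_schema")
    if schema.get("type") == "json_schema" and isinstance(envelope, dict):
        inner = envelope.get("schema")
        if isinstance(inner, dict):
            return inner
    return schema  # already a plain schema


plain = {"type": "object", "properties": {"score": {"type": "number"}}}
wrapped = {
    "type": "json_schema",
    "json_schema": {"name": "evaluation", "schema": plain, "strict": True},
}

# Both inputs normalize to the same plain schema.
print(unwrap_schema(plain) == unwrap_schema(wrapped))
```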

[0.4.0] - 2025-10-16

Platform Release

This release includes the following component versions:

Backend 0.4.0
Frontend 0.4.0
SDK 0.3.1

Summary

This release focuses on user-defined LLM providers, enhanced source handling, and soft delete functionality. The Knowledge section receives significant improvements with dynamic source types and hybrid storage. User settings management is centralized, and the recycle bin provides comprehensive soft-deleted item recovery.

Backend Highlights

Added support for user-defined LLM providers and model configuration
Implemented soft delete functionality with recycle bin management
Enhanced source handling with dynamic source types and hybrid cloud/local storage
Added user settings API endpoints for managing default models
Implemented encryption for sensitive data in database fields

Frontend Highlights

Enhanced Knowledge section with source upload, preview, and OData filtering
Redesigned Test Runs detail page with modern dashboard interface
Improved Models management with edit modal and connection testing
Added advanced filtering for test results
Standardized UI consistency using theme values

SDK Highlights

Added support for user-defined LLM provider generation and execution
Enhanced DocumentExtractor with BytesIO support
Added model parameter support to synthesizer factory
Updated ParaphrasingSynthesizer for improved LLM selection

[0.3.0] - 2025-10-02

Platform Release

This release includes the following component versions:

Backend 0.3.0
Frontend 0.3.0
SDK 0.3.0

Summary

This release introduces persistent storage for documents, robust organization-level data isolation, and comprehensive task management with email notifications. The frontend receives a complete rebranding with the new Rhesis AI visual identity.

Backend Highlights

Added persistent storage for documents with new StorageService
Implemented robust organization-level data isolation and access control
Enhanced comment and task management with email notifications
Introduced new endpoint for generating test configurations
Fixed critical cross-tenant data access vulnerabilities

Frontend Highlights

Complete rebranding: New Rhesis AI brand identity with updated color palette and logos
Implemented comprehensive frontend testing infrastructure
Enhanced task management with editable titles and improved UI consistency
Improved UI/UX across dashboards, metrics pages, and data grids
Added pre-commit hooks for code quality

SDK Highlights

Added functionality to push and pull metrics (categorical and numeric)
Introduced configuration options for metrics with enum support
Refactored metric classes for improved structure and reusability
Added metrics endpoint to SDK client

[0.2.4] - 2025-09-18

Platform Release

This release includes the following component versions:

Backend 0.2.4
Frontend 0.2.4
SDK 0.2.4

Summary

This release introduces comprehensive task management functionality and integrates DocumentSynthesizer for automated document-based test generation. Enhanced metadata tracking and email notifications improve collaboration workflows.

Backend Highlights

Added task management with statuses, priorities, assignments, and email notifications
Integrated DocumentSynthesizer for automated document-based test generation
Enhanced test set attributes with document sources and metadata tracking
Improved database session handling and route refactoring

Frontend Highlights

Added “Source Documents” section to test detail and Test Set Details pages
Test sets now display document name and description
Project updates work without requiring page reload
Added send button to comment text box

SDK Highlights

Rewrote benchmarking framework with improved model handling
Introduced Document dataclass and DocumentSynthesizer for text extraction
Added new LLM providers including Ollama
Refactored metrics and moved them from backend to SDK

[0.2.3] - 2025-09-04

Platform Release

This release includes the following component versions:

Backend 0.2.3
Frontend 0.2.3
SDK 0.2.3

Summary

This release adds collaboration features with comments support, introduces test run statistics, and enhances LLM service integration with schema support.

Backend Highlights

Added test run stats endpoint with performance improvements
Implemented comment support with CRUD operations and emoji reactions
Introduced LLM service integration with schema support
Improved environment variable handling for deployment flexibility

Frontend Highlights

Added comments feature for collaboration on tests, test sets, and test runs
Improved metrics creation and editing workflow with visual feedback
Enhanced test run details with dynamic charts
Fixed tooltip visibility issues and improved datagrid performance

SDK Highlights

Renamed and reorganized LLM provider components for clarity
Added support for JSON schemas in LLM requests for structured responses
Introduced API key handling for LLM providers
Updated linting process to use uvx

[0.2.2] - 2025-08-22

Platform Release

This release includes the following component versions:

Backend 0.2.2
Frontend 0.2.2
SDK 0.2.2

Summary

This release adds document content extraction, enhances Docker configuration, and improves security with Redis authentication. Support for additional document formats (.docx, .pptx, .xlsx) is introduced.

Backend Highlights

Added document content extraction endpoint
Added document support to test set generation endpoint
Implemented Redis authentication for enhanced security
Improved Docker configuration and startup scripts
Added unit tests for backend components

Frontend Highlights

Improved document upload experience with automatic metadata generation
Enhanced project creation and management
Refactored form validation and UI elements
Updated Docker configuration for production mode

SDK Highlights

Migrated document extraction from docling to markitdown
Added support for docx, pptx, and xlsx formats
Improved code style with automated linting and formatting
Removed support for .url and .youtube file extensions

[0.2.1] - 2025-08-08

Platform Release

This release includes the following component versions:

Backend 0.2.1
Frontend 0.2.1
SDK 0.2.1
Polyphemus 0.1.0

Summary

This release introduces Test Results functionality and document upload capabilities. Polyphemus, the LLM inference and benchmarking service, makes its initial release.

Backend Highlights

Added support for filtering test sets related to runs
Added document upload functionality via /documents/upload endpoint
Enhanced test generation with optional documents parameter
Added test result statistics support and “last login” functionality

Frontend Highlights

Introduced Test Results functionality for viewing and analyzing outcomes
Added interfaces for handling test results statistics
Fixed infinite loading issues for test sets

SDK Highlights

Added get_field_names_from_schema method to BaseEntity class
Updated default base URL for API endpoint
Improved documentation

Polyphemus Highlights

Initial release of LLM inference and benchmarking service
FastAPI-based REST API with Dolphin 3.0 Llama 3.1 8B model support
Modular benchmarking suite and OWASP-based security test sets

[0.2.0] - 2025-07-25

Platform Release

This release includes the following component versions:

Backend 0.2.0
Frontend 0.2.0
SDK 0.2.0

Summary

This release enhances team collaboration with improved invitation security, implements email notifications for test completion, and introduces sequential test execution with Redis-based task orchestration.

Backend Highlights

Enhanced team invitation with improved security, validation, and rate limiting
Implemented email-based notification system for test execution results
Improved test execution framework with sequential execution and Redis orchestration
Fixed issues related to OData filtering, JWT expiration, and score calculation

Frontend Highlights

Added version information display
Introduced new team invitation flow with enhanced security and validation
Improved session management with server logout upon expiration
Numerous bug fixes and UI improvements across components

SDK Highlights

Added support for .txt files to DocumentExtractor
Introduced documents parameter to PromptSynthesizer
Added functionality for custom behaviors informed by prompts

[0.1.0] - 2025-05-15

Platform Release

First release of the Rhesis main repository, including all components. Note that the SDK was previously developed separately and is now at version 0.1.8 internally, but is included in this repository-wide v0.1.0 release.

Backend

Core API for test management
Database models and schemas with SQLAlchemy
Authentication system with JWT and Auth0
CRUD operations for main entities
API documentation with Swagger/OpenAPI
PostgreSQL integration with row-level security
Error handling and logging

Frontend

Next.js 15 with App Router
Material UI v6 component library
Authentication with NextAuth.js
Protected routes and middleware
Dashboard and test management interface
Test visualization and monitoring
Dark/light theme support
Responsive design

SDK

Test set management and generation capabilities
Prompt synthesizers for test case generation
Paraphrasing capabilities
LLM service integration
CLI scaffolding
Documentation with Sphinx

Infrastructure

Docker containerization for all services
CI/CD pipeline setup
Development environment configuration
Repository structure for monorepo management

Note

The SDK was previously developed and released (up to v0.1.8) in a separate repository
After this initial release, each component follows its own versioning lifecycle
Component-specific tags use the format: <component>-vX.Y.Z
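
A tag in the `<component>-vX.Y.Z` format can be split into its parts with a short parser. The sketch below is illustrative only: the names `parse_tag` and `ComponentTag` are hypothetical, and it assumes component names are lowercase and that X, Y, and Z are plain integers with no pre-release suffix.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Assumed shape: lowercase component name, then "-v", then three integers.
TAG_PATTERN = re.compile(
    r"^(?P<component>[a-z][a-z0-9-]*)-v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)$"
)


class ComponentTag(NamedTuple):
    component: str
    version: Tuple[int, int, int]


def parse_tag(tag: str) -> Optional[ComponentTag]:
    """Parse '<component>-vX.Y.Z'; return None if the tag does not match."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        return None
    version = (int(match["major"]), int(match["minor"]), int(match["patch"]))
    return ComponentTag(match["component"], version)


print(parse_tag("sdk-v0.6.2"))
print(parse_tag("v0.6.2"))  # no component prefix, so this does not match
```

Storing the version as a tuple of integers lets tags for the same component be compared and sorted numerically rather than as strings.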