Rhesis Changelog

All notable changes to the Rhesis project are documented in this file.

This is the aggregated changelog for the entire Rhesis repository. Detailed component-specific changes are recorded in each component's own changelog.

[0.6.2] - 2026-01-29

Platform Release

This release includes the following component versions:

  • Backend 0.6.1
  • Frontend 0.6.2
  • SDK 0.6.2

Summary

This release introduces Garak LLM vulnerability scanner integration for comprehensive security testing of LLM applications, a 3-level metrics hierarchy for flexible test execution configuration, and enhanced MCP integrations including Jira/Confluence, GitHub, and observability support. The platform also receives performance optimizations and infrastructure improvements across all components.

Garak LLM Vulnerability Scanner

The platform now integrates Garak, one of the most popular vulnerability scanners for LLM and agentic applications:

  • 65+ Security Test Cases: Import Garak’s extensive probe library directly into Rhesis for testing prompt injection, jailbreaks, toxic outputs, data leakage, and other vulnerabilities
  • No Command-Line Required: Import test cases, execute them, and inspect results entirely through the UI—no terminal commands or JSON parsing needed
  • Team Collaboration: Review and discuss security test results with your team in one centralized platform
  • Redis-Cached Probe Enumeration: Efficient probe management with caching for improved performance
  • Detector Metrics: Built-in support for Garak detector metrics in the SDK
  • CI/CD Integration: Run security scans as part of your automated testing pipeline

3-Level Metrics Hierarchy

Enhanced flexibility in test execution with a new metrics hierarchy system:

  • Behavior-Level Metrics: Define default metrics at the behavior level
  • Test-Level Overrides: Override metrics for specific tests when needed
  • Execution-Time Configuration: Dynamically configure metrics at test execution time
  • Metric Source Selection: UI support for choosing which metric configuration to use
  • Improved Test Flexibility: Adapt evaluation criteria without modifying test definitions
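
The way the three levels could combine is sketched below. This is a minimal illustration, not the Rhesis implementation: the function name `resolve_metrics`, its parameters, and the precedence rule (execution-time configuration wins over test-level overrides, which win over behavior-level defaults) are assumptions inferred from the descriptions above.

```python
from typing import List, Optional, Sequence, Tuple


def resolve_metrics(
    behavior_metrics: Sequence[str],
    test_metrics: Optional[Sequence[str]] = None,
    execution_metrics: Optional[Sequence[str]] = None,
) -> Tuple[str, List[str]]:
    """Return (source, metrics) from the most specific level that is set.

    Hypothetical helper: assumes execution > test > behavior precedence.
    """
    if execution_metrics is not None:
        return "execution", list(execution_metrics)
    if test_metrics is not None:
        return "test", list(test_metrics)
    # Fall back to the defaults defined on the behavior.
    return "behavior", list(behavior_metrics)


# Behavior defaults apply unless a more specific level is provided.
source, metrics = resolve_metrics(
    behavior_metrics=["answer_relevancy"],
    test_metrics=["toxicity"],
)
print(source, metrics)
```

Returning the source alongside the metrics mirrors the idea of metric source selection: a caller can show which level supplied the configuration.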

Enhanced MCP Integrations

Expanded Model Context Protocol (MCP) support with new integrations:

  • Atlassian Stdio: Connect Jira and Confluence for issue tracking and documentation
  • GitHub Repository Retrieval: Access GitHub repositories and code context
  • Observability Integration: OpenTelemetry tracing with MCP support in SDK

Backend Highlights

  • Added Garak LLM vulnerability scanner integration with Redis caching for probe enumeration
  • Implemented 3-level metrics hierarchy (behavior → test → execution) allowing execution-time metric overrides
  • Added MCP Jira/Confluence (Atlassian Stdio), GitHub repository retrieval, and observability integrations
  • Upgraded FastAPI/Starlette and security dependencies for improved stability
  • Optimized Docker image with CPU-only PyTorch, reducing image size significantly

Frontend Highlights

  • Added 3-level metrics hierarchy UI for test execution with metric source selection
  • Integrated Garak LLM vulnerability scanner import UI for test sets
  • Added MCP Atlassian, GitHub, and observability support in the UI
  • Added context and expected response fields to test run detail view
  • Improved test execution configuration interface

SDK Highlights

  • Added Model entity with provider auto-resolution and Project entity with integration tests
  • Implemented batch processing framework and embedders support
  • Added Vertex AI support for embeddings and generation
  • Added Garak detector metric integration for security testing
  • Implemented MCP observability with OpenTelemetry tracing
  • Implemented continuous slow retry mode for connector resilience
  • Enhanced test execution with better error handling and recovery

[0.6.0] - 2026-01-15

Platform Release

This release includes the following component versions:

  • Backend 0.6.0
  • Frontend 0.6.0
  • SDK 0.6.0
  • Polyphemus 0.2.4

Summary

This major release introduces two significant features: Enhanced Tracing for comprehensive observability and a New Interactive Glossary with 50+ AI testing terms. The SDK tracing provides deep insights into your application under test, revealing its inner workings and execution flow with async support and smart serialization, while the glossary makes it easier for teams to understand and adopt AI testing best practices.

Important: As of version 0.6.0, all production services for the cloud version of Rhesis are now hosted in European data centers, ensuring data sovereignty and compliance with European data protection regulations.

Enhanced Tracing

The SDK now includes comprehensive tracing capabilities that provide deep observability into your application under test (the target) and its inner workings:

  • Application Insights: Gain visibility into your target application’s execution flow, function calls, and internal state
  • Execution Flow Visualization: See how your application processes requests, including all intermediate steps and decision points
  • Async Support: Full support for asynchronous operations with proper trace propagation across async boundaries
  • Smart Serialization: Intelligent handling of complex objects and data structures in traces, making internal state visible
  • Improved I/O Display: Enhanced visualization of inputs and outputs at each step of your application’s execution
  • OpenTelemetry Integration: Basic telemetry support for standard observability tools
  • Advanced Filtering: Filter traces by status, duration, and custom attributes in the frontend
  • Dependency Injection: New bind parameter in endpoint decorators for cleaner code organization

New Interactive Glossary

The platform now includes a comprehensive glossary at docs.rhesis.ai/glossary with 50+ carefully curated terms covering:

  • Configuration Concepts: Organization, Project, Endpoint, Model, API Tokens
  • Testing Fundamentals: Single-turn tests, Multi-turn tests, Metrics, Behaviors, Test Sets
  • Advanced Testing: Penelope agent, RAG systems, Chain-of-Thought, Prompt Injection
  • Quality Metrics: Precision/Recall, F1 Score, Confusion Matrix, Confidence Scores
  • Development Tools: SDK integration, MCP protocol, Test Generation
  • Results Analysis: Baseline comparison, Regression testing, A/B testing

Each glossary entry includes:

  • Clear definitions and extended explanations
  • Practical examples and use cases
  • Code snippets demonstrating implementation
  • Related terms for deeper exploration
  • Links to relevant documentation sections

Backend Highlights

  • Infrastructure: All production services now hosted in European data centers for enhanced data sovereignty and GDPR compliance
  • Enhanced connectivity with MCP including GitHub support and multi-transport capabilities
  • Improved observability with comprehensive OpenTelemetry integration for tracing, visualization, and filtering
  • Added Chatbot Intent Recognition functionality
  • Streamlined organization onboarding with integrated test execution

Frontend Highlights

  • Enhanced SDK tracing with improved visualization and filtering in the UI
  • Improved MCP connection stability and added new GitHub MCP provider
  • Integrated test execution into organization onboarding process
  • Added interactive glossary page with search and filtering

SDK Highlights

  • Added dependency injection support with bind parameter in endpoint decorators
  • Enhanced SDK tracing with async support, smart serialization, and improved I/O display
  • Introduced OpenTelemetry integration for basic telemetry

Polyphemus Highlights

  • Added bind parameter to endpoint decorator for dependency injection
  • Introduced Bucket Model for improved data management

[0.5.4] - 2025-12-18

Platform Release

This release includes the following component versions:

  • Backend 0.5.4
  • Frontend 0.5.4
  • SDK 0.5.2
  • Polyphemus 0.2.3

Summary

This release adds a new Polyphemus provider with schema support across the platform. The documentation receives enhancements including comprehensive guides and improved SDK metrics documentation. SDK improvements focus on enhanced MCP error handling and improved metric creation capabilities with support for categories and threshold operators.

Backend Highlights

  • Added new Polyphemus provider with schema definition support
  • Enhanced provider configuration for Polyphemus integration

Frontend Highlights

  • Added new Polyphemus provider with schema support
  • Documentation improvements including enhanced guides and SDK metrics
  • Dependency updates for Next.js (16.0.7 → 16.0.10) and Nodemailer (6.10.1 → 7.0.11)

SDK Highlights

  • Added new Polyphemus provider with schema support
  • Improved MCP error handling and usability
  • Enhanced metric creation with support for categories and threshold operators
  • Improved generation prompts using research-backed Chain-of-Thought techniques
  • Hotfix for MCP compatibility issues with npx and bunx
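
To illustrate what a threshold operator on a metric means in practice, here is a minimal sketch. The operator symbols, the `THRESHOLD_OPERATORS` table, and the `passes_threshold` function are hypothetical and are not taken from the Rhesis SDK; the sketch only shows the general idea of comparing a numeric score against a threshold with a configurable comparison.

```python
import operator
from typing import Callable, Dict

# Hypothetical mapping from operator symbols to comparison functions.
THRESHOLD_OPERATORS: Dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


def passes_threshold(score: float, threshold: float, op: str = ">=") -> bool:
    """Return True if `score <op> threshold` holds."""
    try:
        compare = THRESHOLD_OPERATORS[op]
    except KeyError:
        raise ValueError(f"Unsupported threshold operator: {op!r}") from None
    return compare(score, threshold)


# A quality score usually passes when it is high, a toxicity score when it is low.
print(passes_threshold(0.82, 0.7, ">="))
print(passes_threshold(0.30, 0.1, "<="))
```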

[0.5.3] - 2025-12-11

Platform Release

This release includes the following component versions:

  • Backend 0.5.3
  • Frontend 0.5.3
  • Polyphemus 0.2.2

Summary

This release enhances MCP authentication and error handling, adds unique constraints to database fields, and improves multi-turn test support. The frontend receives UX improvements for the trial drawer and manual test writer, while Polyphemus fixes rate limiting behavior for unauthenticated requests.

Backend Highlights

  • Improved MCP authentication and error handling, enhancing usability
  • Added unique constraint to nano_id columns in the database
  • Enhanced testing capabilities with endpoint connection testing and multi-turn test support
  • Notifications now separate execution status from test results in emails

Frontend Highlights

  • Improved trial drawer with multi-turn support and UX enhancements
  • Added multi-turn test support in manual test writer
  • Fixed authentication errors and improved UX for MCP (Model Context Protocol)
  • Resolved cursor focus loss in title and description fields

Polyphemus Highlights

  • Fixed rate limiting to occur after authentication, preventing unintended rate limits on unauthenticated requests
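
The ordering described in this fix can be sketched as follows. This is an illustrative model only, not Polyphemus code: `RateLimiter`, `VALID_TOKENS`, and `handle_request` are hypothetical names, and the in-memory counter stands in for whatever store the service actually uses. The point is that the limiter is consulted only after a request has been authenticated, so rejected requests never consume quota.

```python
from typing import Dict, Optional


class RateLimiter:
    """Tiny in-memory per-user request counter (illustration only)."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.counts: Dict[str, int] = {}

    def allow(self, user: str) -> bool:
        count = self.counts.get(user, 0)
        if count >= self.limit:
            return False
        self.counts[user] = count + 1
        return True


# Hypothetical token store mapping API tokens to users.
VALID_TOKENS = {"token-alice": "alice"}


def handle_request(token: Optional[str], limiter: RateLimiter) -> int:
    """Return an HTTP-style status code for a request."""
    user = VALID_TOKENS.get(token) if token is not None else None
    if user is None:
        return 401  # Rejected before the rate limiter is touched.
    if not limiter.allow(user):
        return 429  # Authenticated, but over the per-user limit.
    return 200


limiter = RateLimiter(limit=2)
print([handle_request("bad-token", limiter) for _ in range(3)])
print([handle_request("token-alice", limiter) for _ in range(3)])
```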

[0.5.2] - 2025-12-08

Platform Release

This release includes the following component versions:

  • Backend 0.5.2
  • Frontend 0.5.2

Summary

This release adds a Test Connection Tool for easier configuration and troubleshooting, removes permission restrictions from entity routes, and addresses security vulnerabilities. The frontend includes updates for React and Next.js to address CVE-2025-55182, plus support for tags in test runs and enhanced metrics capabilities.

Backend Highlights

  • Added a Test Connection Tool for easier configuration and troubleshooting
  • Removed permission restrictions from entity routes
  • Fixed: Activities API response now returns valid fields

Frontend Highlights

  • Security: Updated React and Next.js to address CVE-2025-55182
  • Test Runs: Added support for tags
  • Metrics: Added support for categories and threshold operators
  • Added Test Connection Tool

[0.5.1] - 2025-12-04

Platform Release

This release includes the following component versions:

  • Backend 0.5.1
  • Frontend 0.5.1
  • Polyphemus 0.2.1

Summary

This release modernizes the dashboard with MUI X charts and activity timeline, improves connector output mapping with message field support, and adds OpenRouter provider support. The frontend receives grid state persistence and improved MCP dialogs, while Polyphemus adds user verification status.

Backend Highlights

  • Modernized dashboard with MUI X charts and activity timeline
  • Improved connector output mapping with message field support
  • Added support for OpenRouter provider
  • Increased exporter timeout from 10 to 30 seconds

Frontend Highlights

  • Modernized dashboard with MUI X charts and activity timeline
  • Improved MCP import and tool selector dialogs
  • Added grid state persistence to localStorage
  • Backend execution RPC fixes and UI improvements

Polyphemus Highlights

  • Added “Is Verified” status to user profiles
  • Users can now be marked as verified

[0.5.0] - 2025-11-27

Platform Release

This release includes the following component versions:

  • Backend 0.5.0
  • Frontend 0.5.0
  • SDK 0.5.0

Summary

This release introduces comprehensive multi-turn test support via the Penelope execution agent, a bidirectional SDK connector with intelligent auto-mapping, and enhanced behavior-metrics management. MCP server integration with Notion improves context for test generation, while the new interactive onboarding system guides users through initial setup.

Multi-Turn Testing with Penelope

Penelope autonomously executes complex multi-turn conversations against your LLM applications.

SDK Connector Integration

The @endpoint decorator integrates your LLM application in under a minute.

Behaviors and Metrics Structure

The two-layer testing structure enables whole-team collaboration on quality.

MCP Integration with Notion

Connect your Notion workspace directly to your testing pipeline.

Backend Highlights

  • Added comprehensive multi-turn test support including creation, listing, execution, and preview generation
  • Implemented bidirectional SDK connector with intelligent auto-mapping
  • Added Tool Source Type for MCP server integration
  • Added in-place test execution without worker infrastructure
  • Enhanced synthesizers for improved test generation
  • Added database persistence for onboarding progress
  • Refactored Base Entity for improved maintainability
  • Updated MCP Tool Database for enhanced tool management
  • Added endpoint to list available models for providers

Frontend Highlights

  • Implemented interactive onboarding tour system
  • Added behaviors page with refactored metrics UI
  • Implemented multi-turn conversation preview in test generation flow
  • Added bidirectional SDK connector with intelligent auto-mapping
  • Implemented Tool Configuration Frontend
  • Redesigned test results page with improved filters and client-side search
  • Redesigned knowledge detail page for design system consistency
  • Reorganized navigation with sections and external links
  • Upgraded to Next.js 16 and MUI v7
  • Added models list for providers

SDK Highlights

  • Added bidirectional SDK connector with intelligent auto-mapping
  • Added comprehensive multi-turn test support
  • Added Google Cloud integration for Polyphemus
  • Added functionality to list available models for providers
  • Improved synthesizers functionality
  • Refactored base entity structure
  • Updated MCP Tool Database functionality

[0.4.3] - 2025-11-17

Platform Release

This release includes the following component versions:

  • Backend 0.4.3
  • Frontend 0.4.3
  • SDK 0.4.2

Summary

This release focuses on improving multi-turn conversation handling with centralized conversation tracking and fixes critical deployment issues in the frontend Docker image.

Backend Highlights

  • Added centralized conversation tracking for improved multi-turn conversation handling
  • Enhanced conversation state management across test executions

Frontend Highlights

  • Fixed Docker image build failure during local deployment
  • Resolved file ownership issue (chown bug) affecting containerized deployments

SDK Highlights

  • Added support for custom HTTP headers in API requests
  • Improved error handling for network requests with more descriptive error messages
  • Updated internal retry mechanism for failed API calls
  • Fixed date parsing issues in certain locales
  • Resolved bug causing occasional crashes when handling large data responses

[0.4.2] - 2025-11-13

Platform Release

This release includes the following component versions:

  • Backend 0.4.2
  • Frontend 0.4.2
  • SDK 0.4.1

Summary

This release makes it easier than ever to get started with Rhesis through zero-configuration Docker Compose setup. Spin up the entire platform with a single command! The release also introduces multi-turn test support with conversational metrics, enhanced MCP integration with Notion, and improved local development experience.

Backend Highlights

  • Added support for multi-turn tests with configuration, execution, and conversational metrics
  • Improved local development setup with zero-configuration Docker Compose
  • Introduced generic MCP integration endpoints and user model configuration
  • Added scenarios, tags, and comments infrastructure for sources
  • Enhanced command-line interface for easier platform management

Frontend Highlights

  • Implemented multi-turn test support with configuration UI and goal display
  • Enhanced test set management with test type display and filtering
  • Improved local development with Docker Compose and auto-login feature
  • Integrated conversational metrics for multi-turn test evaluation

SDK Highlights

  • Added LangChain integration and Penelope language model support
  • Introduced Conversational Metrics with Goal Achievement Judge and DeepEval integration
  • Enhanced MCP Agent with autonomous ReAct loop and improved error handling
  • Added structured output support for tool calling via Pydantic schemas
  • Improved Vertex AI provider reliability

[0.4.1] - 2025-10-30

Platform Release

This release includes the following component versions:

  • Backend 0.4.1
  • Frontend 0.4.1
  • SDK 0.4.0

Summary

This release introduces comprehensive OpenTelemetry-based telemetry, enhanced test generation with iteration context, and improved source tracking. The platform now uses “Sources” terminology throughout, replacing “Documents” for consistency. Key improvements include soft deletion with cascade-aware restoration, API key authentication with rate limiting, and enhanced metrics integration with Ragas and DeepEval.

Backend Highlights

  • Added comprehensive OpenTelemetry-based telemetry system for monitoring and analytics
  • Enhanced test generation with iteration context support and source ID tracking
  • Integrated SDK metrics with simplified evaluation and database migration
  • Implemented cascade-aware restoration for soft-deleted entities
  • Added API key authentication with user-based rate limiting

Frontend Highlights

  • Replaced “Documents” terminology with “Sources” throughout the application
  • Enhanced test generation UI with improved backend support and source context display
  • Implemented OpenTelemetry for enhanced monitoring
  • Added support for additional file formats (.pptx, .xlsx, .html, .htm, .zip)
  • Improved test results display with error status icons and execution time for failed runs

SDK Highlights

  • Added Cohere and Vertex AI LLM providers with Ollama integration
  • Enhanced AI-based test generation with iteration context support
  • Improved metrics integration with Ragas and DeepEval (updated to v3.6.7)
  • Added support for both plain and OpenAI-wrapped JSON schemas
  • Refactored metrics for improved organization and maintainability

[0.4.0] - 2025-10-16

Platform Release

This release includes the following component versions:

  • Backend 0.4.0
  • Frontend 0.4.0
  • SDK 0.3.1

Summary

This release focuses on user-defined LLM providers, enhanced source handling, and soft delete functionality. The Knowledge section receives significant improvements with dynamic source types and hybrid storage. User settings management is centralized, and the recycle bin provides comprehensive soft-deleted item recovery.

Backend Highlights

  • Added support for user-defined LLM providers and model configuration
  • Implemented soft delete functionality with recycle bin management
  • Enhanced source handling with dynamic source types and hybrid cloud/local storage
  • Added user settings API endpoints for managing default models
  • Implemented encryption for sensitive data in database fields

Frontend Highlights

  • Enhanced Knowledge section with source upload, preview, and OData filtering
  • Redesigned Test Runs detail page with modern dashboard interface
  • Improved Models management with edit modal and connection testing
  • Added advanced filtering for test results
  • Standardized UI consistency using theme values

SDK Highlights

  • Added support for user-defined LLM provider generation and execution
  • Enhanced DocumentExtractor with BytesIO support
  • Added model parameter support to synthesizer factory
  • Updated ParaphrasingSynthesizer for improved LLM selection

[0.3.0] - 2025-10-02

Platform Release

This release includes the following component versions:

  • Backend 0.3.0
  • Frontend 0.3.0
  • SDK 0.3.0

Summary

This release introduces persistent storage for documents, robust organization-level data isolation, and comprehensive task management with email notifications. The frontend receives a complete rebranding with the new Rhesis AI visual identity.

Backend Highlights

  • Added persistent storage for documents with new StorageService
  • Implemented robust organization-level data isolation and access control
  • Enhanced comment and task management with email notifications
  • Introduced new endpoint for generating test configurations
  • Fixed critical cross-tenant data access vulnerabilities

Frontend Highlights

  • Complete rebranding: New Rhesis AI brand identity with updated color palette and logos
  • Implemented comprehensive frontend testing infrastructure
  • Enhanced task management with editable titles and improved UI consistency
  • Improved UI/UX across dashboards, metrics pages, and data grids
  • Added pre-commit hooks for code quality

SDK Highlights

  • Added functionality to push and pull metrics (categorical and numeric)
  • Introduced configuration options for metrics with enum support
  • Refactored metric classes for improved structure and reusability
  • Added metrics endpoint to SDK client

[0.2.4] - 2025-09-18

Platform Release

This release includes the following component versions:

  • Backend 0.2.4
  • Frontend 0.2.4
  • SDK 0.2.4

Summary

This release introduces comprehensive task management functionality and integrates DocumentSynthesizer for automated document-based test generation. Enhanced metadata tracking and email notifications improve collaboration workflows.

Backend Highlights

  • Added task management with statuses, priorities, assignments, and email notifications
  • Integrated DocumentSynthesizer for automated document-based test generation
  • Enhanced test set attributes with document sources and metadata tracking
  • Improved database session handling and route refactoring

Frontend Highlights

  • Added “Source Documents” section to test detail and Test Set Details pages
  • Test sets now display document name and description
  • Project updates work without requiring page reload
  • Added send button to comment text box

SDK Highlights

  • Rewrote the benchmarking framework with improved model handling
  • Introduced Document dataclass and DocumentSynthesizer for text extraction
  • Added new LLM providers including Ollama
  • Refactored metrics and moved them from backend to SDK

[0.2.3] - 2025-09-04

Platform Release

This release includes the following component versions:

  • Backend 0.2.3
  • Frontend 0.2.3
  • SDK 0.2.3

Summary

This release adds collaboration features with comments support, introduces test run statistics, and enhances LLM service integration with schema support.

Backend Highlights

  • Added test run stats endpoint with performance improvements
  • Implemented comment support with CRUD operations and emoji reactions
  • Introduced LLM service integration with schema support
  • Improved environment variable handling for deployment flexibility

Frontend Highlights

  • Added comments feature for collaboration on tests, test sets, and test runs
  • Improved metrics creation and editing workflow with visual feedback
  • Enhanced test run details with dynamic charts
  • Fixed tooltip visibility issues and improved datagrid performance

SDK Highlights

  • Renamed and reorganized LLM provider components for clarity
  • Added support for JSON schemas in LLM requests for structured responses
  • Introduced API key handling for LLM providers
  • Updated linting process to use uvx

[0.2.2] - 2025-08-22

Platform Release

This release includes the following component versions:

  • Backend 0.2.2
  • Frontend 0.2.2
  • SDK 0.2.2

Summary

This release adds document content extraction, enhances Docker configuration, and improves security with Redis authentication. Support for additional document formats (.docx, .pptx, .xlsx) is introduced.

Backend Highlights

  • Added document content extraction endpoint
  • Added document support to test set generation endpoint
  • Implemented Redis authentication for enhanced security
  • Improved Docker configuration and startup scripts
  • Added unit tests for backend components

Frontend Highlights

  • Improved document upload experience with automatic metadata generation
  • Enhanced project creation and management
  • Refactored form validation and UI elements
  • Updated Docker configuration for production mode

SDK Highlights

  • Migrated document extraction from docling to markitdown
  • Added support for docx, pptx, and xlsx formats
  • Improved code style with automated linting and formatting
  • Removed support for .url and .youtube file extensions

[0.2.1] - 2025-08-08

Platform Release

This release includes the following component versions:

  • Backend 0.2.1
  • Frontend 0.2.1
  • SDK 0.2.1
  • Polyphemus 0.1.0

Summary

This release introduces Test Results functionality and document upload capabilities. Polyphemus, the LLM inference and benchmarking service, makes its initial release.

Backend Highlights

  • Added support for filtering test sets related to runs
  • Added document upload functionality via /documents/upload endpoint
  • Enhanced test generation with optional documents parameter
  • Added test result statistics support and “last login” functionality

Frontend Highlights

  • Introduced Test Results functionality for viewing and analyzing outcomes
  • Added interfaces for handling test results statistics
  • Fixed infinite loading issues for test sets

SDK Highlights

  • Added get_field_names_from_schema method to BaseEntity class
  • Updated default base URL for API endpoint
  • Improved documentation

Polyphemus Highlights

  • Initial release of LLM inference and benchmarking service
  • FastAPI-based REST API with Dolphin 3.0 Llama 3.1 8B model support
  • Modular benchmarking suite and OWASP-based security test sets

[0.2.0] - 2025-07-25

Platform Release

This release includes the following component versions:

  • Backend 0.2.0
  • Frontend 0.2.0
  • SDK 0.2.0

Summary

This release enhances team collaboration with improved invitation security, implements email notifications for test completion, and introduces sequential test execution with Redis-based task orchestration.

Backend Highlights

  • Enhanced team invitation with improved security, validation, and rate limiting
  • Implemented email-based notification system for test execution results
  • Improved test execution framework with sequential execution and Redis orchestration
  • Fixed issues related to OData filtering, JWT expiration, and score calculation

Frontend Highlights

  • Added version information display
  • Introduced new team invitation flow with enhanced security and validation
  • Improved session management with server logout upon expiration
  • Numerous bug fixes and UI improvements across components

SDK Highlights

  • Added support for .txt files to DocumentExtractor
  • Introduced documents parameter to PromptSynthesizer
  • Added functionality for custom behaviors informed by prompts

[0.1.0] - 2025-05-15

Platform Release

First release of the Rhesis main repository, including all components. Note that the SDK was previously developed separately and is now at version 0.1.8 internally, but is included in this repository-wide v0.1.0 release.

Backend

  • Core API for test management
  • Database models and schemas with SQLAlchemy
  • Authentication system with JWT and Auth0
  • CRUD operations for main entities
  • API documentation with Swagger/OpenAPI
  • PostgreSQL integration with row-level security
  • Error handling and logging

Frontend

  • Next.js 15 with App Router
  • Material UI v6 component library
  • Authentication with NextAuth.js
  • Protected routes and middleware
  • Dashboard and test management interface
  • Test visualization and monitoring
  • Dark/light theme support
  • Responsive design

SDK

  • Test set management and generation capabilities
  • Prompt synthesizers for test case generation
  • Paraphrasing capabilities
  • LLM service integration
  • CLI scaffolding
  • Documentation with Sphinx

Infrastructure

  • Docker containerization for all services
  • CI/CD pipeline setup
  • Development environment configuration
  • Repository structure for monorepo management

Note

  • The SDK was previously developed and released (up to v0.1.8) in a separate repository
  • After this initial release, each component follows its own versioning lifecycle
  • Component-specific tags use the format: <component>-vX.Y.Z
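
As a small illustration of the tag format above, the following sketch parses a component tag into its parts. The function `parse_component_tag` and the `ComponentTag` type are hypothetical helpers, and the pattern assumes lowercase component names such as `sdk` or `backend`, which the changelog does not state explicitly.

```python
import re
from typing import NamedTuple, Tuple

# Assumes lowercase component names; the format itself is <component>-vX.Y.Z.
TAG_PATTERN = re.compile(
    r"^(?P<component>[a-z][a-z0-9-]*)-v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)$"
)


class ComponentTag(NamedTuple):
    component: str
    version: Tuple[int, int, int]


def parse_component_tag(tag: str) -> ComponentTag:
    """Split a tag like 'sdk-v0.6.2' into a component name and version tuple."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"Not a component tag: {tag!r}")
    version = (int(match["major"]), int(match["minor"]), int(match["patch"]))
    return ComponentTag(match["component"], version)


# Version tuples compare numerically, so 0.10.0 sorts after 0.6.2.
tags = ["sdk-v0.10.0", "sdk-v0.6.2", "sdk-v0.6.0"]
latest = max(parse_component_tag(t) for t in tags)
print(latest.component, latest.version)
```

Keeping the version as a tuple of integers avoids the string-comparison pitfall where "0.10.0" would sort before "0.6.2".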