SWE-bench Lite Leaderboard
| Rank | System | Score (%) | Date | Status |
|---|---|---|---|---|
| 1 | SGAgent + Claude 4 Sonnet (Ours) | 60.67 | 2025-12-20 | Competitive |
| 2 | ExpeRepair-v1.0 + Claude 4 Sonnet | 60.33 | 2025-06-25 | Active |
| 3 | Refact.ai Agent | 60.00 | 2025-04-25 | Active |
| 4 | SWE-agent + Claude 4 Sonnet | 56.67 | 2025-05-26 | Active |
| 5 | Isea + Claude 3.5 Sonnet (Ours) | 51.33 | 2025-09-10 | Competitive |
| 6 | ExpeRepair-v1.0 | 48.33 | 2025-06-13 | Active |
| 7 | SWE-agent + Claude 3.7 Sonnet | 48.00 | 2025-02-26 | Active |
| 8 | DARS Agent | 47.00 | 2025-02-05 | Active |
System Overview
SGAgent is a multi-agent issue-fixing system that achieves a 60.67% success rate on SWE-bench Lite with Claude 4 Sonnet, placing first on the leaderboard above. The system combines a code knowledge graph (optionally backed by Neo4j), specialized LLM agents, and patch generation with verification to automatically locate, analyze, and fix software issues.
Multi-Agent Architecture
Knowledge Graph
Code structure & relationships
Localizer Agent
Identifies ≤5 issue locations
Suggester Agent
Proposes repair strategies
Fixer Agent
Generates 1 patch per round (optional multi-candidate)
Core Components
State Management
Sophisticated state graph orchestrating agent interactions and maintaining conversation context throughout the debugging process.
Neo4j Knowledge Graph
Optional graph backend that, when enabled, provides global code structure navigation and relationship analysis.
Advanced Tool Framework
Rich tool suite for code analysis, search, and manipulation, based on static analysis with optional Neo4j integration.
Dynamic Summarization
Intelligent context management that summarizes long conversations to maintain focus and prevent token limit issues.
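A count-based summarization trigger could look like the minimal sketch below. The names `maybe_compact`, `keep_last`, and `stub_summarize` are hypothetical and not taken from SGAgent; the default of 16 mirrors the `CONTEXT_THRESHOLD` value listed under Model Configuration, and the stub stands in for an LLM summarization call.

```python
from typing import Callable, List


def maybe_compact(
    messages: List[str],
    summarize: Callable[[List[str]], str],
    threshold: int = 16,  # message count that triggers summarization
    keep_last: int = 4,   # hypothetical: recent messages kept verbatim
) -> List[str]:
    """Replace older messages with one summary once the history grows too long."""
    if len(messages) <= threshold:
        return messages
    older, recent = messages[:-keep_last], messages[-keep_last:]
    return [f"[summary] {summarize(older)}"] + recent


def stub_summarize(msgs: List[str]) -> str:
    # Stand-in for an LLM summarization call.
    return f"{len(msgs)} earlier messages condensed"


history = [f"msg {i}" for i in range(20)]
compacted = maybe_compact(history, stub_summarize)
```

Keeping the most recent messages verbatim is one possible design choice: the agent retains the exact text of its latest tool results while older turns are compressed.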
Localizer / Suggester / Fixer Output Example
Localizer output:
{
  "locations": [
    {
      "path": "/root/temp_container/astropy__astropy-12907/astropy/modeling/separable.py",
      "start_line": 245,
      "end_line": 245
    }
  ],
  "reasons": [
    "The bug is in the _cstack function at line 245. When the right operand is an ndarray..."
  ]
}
Suggester output:
{
  "suggestions": [
    {
      "title": "Fix _cstack function to properly handle nested CompoundModels",
      "rationale": [
        "The current _cstack function incorrectly handles nested compound models by setting all values to 1 instead of preserving the actual relationship matrix",
        "When the right operand is an array (from a recursive call), it should properly copy the matrix layout",
        "This fix ensures that nested compound models preserve their separability characteristics"
      ],
      "confidence": 0.8,
      "impact_area": "astropy.modeling.separable._cstack",
      "actions": [
        {
          "path": "/root/temp_container/astropy__astropy-12907/astropy/modeling/separable.py",
          "operation": "replace",
          "start_line": 244,
          "end_line": 245,
          "symbol": null,
          "patch_preview": "        cright = np.zeros((noutp, right.shape[1]))\n        cright[-right.shape[0]:, -right.shape[1]:] = right"
        }
      ],
      "risks": [
        "May affect other separability calculations if the change is too broad"
      ],
      "tests": [
        "Test nested compound models with separability_matrix function",
        "Verify that simple compound models still work correctly",
        "Run the existing test suite for separability module"
      ],
      "references": [
        "astropy.modeling.separable._cstack",
        "astropy.modeling.separable.separability_matrix",
        "astropy.modeling.core.CompoundModel"
      ]
    }
  ]
}
Fixer output (patch):
diff --git a/astropy/modeling/separable.py b/astropy/modeling/separable.py
@@ -242,7 +242,7 @@ def _cstack(left, right):
         cright = _coord_matrix(right, 'right', noutp)
     else:
         cright = np.zeros((noutp, right.shape[1]))
-        cright[-right.shape[0]:, -right.shape[1]:] = 1
+        cright[-right.shape[0]:, -right.shape[1]:] = right
 
     return np.hstack([cleft, cright])
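Before the Localizer result is handed to the Suggester, it could be validated along the following lines. This is a sketch under assumptions: the `Location` dataclass and `parse_localizer_output` are hypothetical names, the field names follow the JSON example above, and the shortened path is illustrative.

```python
import json
from dataclasses import dataclass
from typing import List

MAX_LOCATIONS = 5  # the Localizer reports at most five locations


@dataclass(frozen=True)
class Location:
    path: str
    start_line: int
    end_line: int


def parse_localizer_output(raw: str) -> List[Location]:
    """Parse the Localizer's JSON and reject oversized or malformed results."""
    data = json.loads(raw)
    locations = [Location(**item) for item in data["locations"]]
    if len(locations) > MAX_LOCATIONS:
        raise ValueError(
            f"expected at most {MAX_LOCATIONS} locations, got {len(locations)}"
        )
    for loc in locations:
        if loc.start_line < 1 or loc.end_line < loc.start_line:
            raise ValueError(f"invalid line range in {loc.path}")
    return locations


raw = json.dumps({
    "locations": [
        {"path": "astropy/modeling/separable.py", "start_line": 245, "end_line": 245}
    ],
    "reasons": ["The bug is in the _cstack function at line 245."],
})
locations = parse_localizer_output(raw)
```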
Comprehensive Tool Suite
Code Structure Analysis Tools
Method & Class Analysis Tools
Variable & Import Analysis Tools
Content Search Tools
File System & Editing Tools
Helper & Thinking Tools
Complete System Pipeline
Repository Input
SWE-bench Project
Code Index / Neo4j Graph (Optional)
Static code index by default, with optional Neo4j knowledge graph when enabled
Problem Statement
Bug Description Input
Localizer Agent
Identifies ≤5 Locations
Suggester Agent
Receives Locations + Proposes Strategies
Fixer Agent
Patch Implementation
Default: Single Patch
1 Patch / Round → Direct Output → Final Solution (Verification Only)
Optional: Multi-Candidate
Multi-Temp Gen → 40 Candidates → 4-Level Filtering → Optimal Patch (Best of 40)
Complete Workflow
4-Phase Pipeline
Phase 1: Repository Preprocessing
Code Index & Optional Neo4j Graph: Parse and index the repository into an internal code index. When Neo4j is configured, the same information is also stored in a knowledge graph for richer global queries.
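For a Python repository, the static indexing step could be approximated with the standard-library `ast` module. The sketch below is illustrative only: `index_source` and the `"<module>"` key are hypothetical, and SGAgent's actual index format is not described here.

```python
import ast
from typing import Dict, List


def index_source(source: str) -> Dict[str, List[str]]:
    """Map each class to its method names; module-level functions go under '<module>'."""
    index: Dict[str, List[str]] = {"<module>": []}
    function_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.parse(source).body:
        if isinstance(node, function_types):
            index["<module>"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            index[node.name] = [
                child.name for child in node.body if isinstance(child, function_types)
            ]
    return index


sample = '''
class CompoundModel:
    def evaluate(self): ...
    def inverse(self): ...

def _cstack(left, right): ...
'''
index = index_source(sample)
```

The same traversal could emit Class and Method nodes for a graph backend instead of a dictionary.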
Phase 2: Issue Location Analysis
Input: Problem Statement from SWE-bench
Localizer Agent: Analyzes problem description, navigates knowledge graph, identifies up to 5 suspicious issue locations
Suggester Agent: Receives identified locations, collects contextual information, proposes coordinated repair strategies
Output: Issue locations + comprehensive repair suggestions
Phase 3: Patch Generation
Fixer Agent: Implements coordinated patches for identified locations
Generation Strategy:
- Default: Generates 1 high-precision patch per round using deterministic sampling (T=0.0)
- Optional Multi-Candidate: Can be configured to generate 40 total variants (10 per round) with diverse temperature settings for difficult issues
- Multi-location coordination for interconnected fixes
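The multi-candidate budget can be written down as a sampling plan. The sketch assumes four rounds (inferred from 40 total at 10 per round) and, following the pseudocode later in this page, one deterministic sample plus nine creative samples per round; `sampling_plan` is a hypothetical helper.

```python
from typing import List


def sampling_plan(
    rounds: int = 4,          # assumption: 40 candidates / 10 per round
    per_round: int = 10,
    precise_t: float = 0.0,   # deterministic sample
    creative_t: float = 0.8,  # diverse samples
) -> List[List[float]]:
    """Return one temperature list per round: one precise sample, the rest creative."""
    return [[precise_t] + [creative_t] * (per_round - 1) for _ in range(rounds)]


plan = sampling_plan()
total = sum(len(temperatures) for temperatures in plan)
```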
Phase 4: Verification & Selection
Verification Strategy:
- Single Patch Mode: Direct verification against reproduction script and regression tests.
- Multi-Candidate Mode (Filtering Hierarchy), applied in order:
  - Regression Test Pass Rate: Select patches with maximum passing tests
  - Reproduction Test Pass Rate: Prioritize patches that pass original reproduction tests
  - Normalized Patch Diversity: Choose most frequent normalized patterns
  - Patch Size Optimization: Prefer patches with larger meaningful changes
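The filtering hierarchy amounts to a lexicographic comparison, where each criterion only breaks ties left by the previous one. The sketch below assumes a hypothetical `Candidate` record, a simple whitespace-based `normalize`, and character count as the size measure; none of these are confirmed details of SGAgent.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    diff: str
    regression_passed: int    # number of regression tests passed
    reproduction_passed: int  # number of reproduction tests passed


def normalize(diff: str) -> str:
    # Hypothetical normalization: strip whitespace and drop blank lines.
    return "\n".join(line.strip() for line in diff.splitlines() if line.strip())


def select_patch(candidates: List[Candidate]) -> Candidate:
    """Pick the best candidate by the four criteria, compared in order."""
    frequency = Counter(normalize(c.diff) for c in candidates)
    return max(
        candidates,
        key=lambda c: (
            c.regression_passed,
            c.reproduction_passed,
            frequency[normalize(c.diff)],
            len(normalize(c.diff)),
        ),
    )


pool = [
    Candidate("+ x = right", 10, 1),
    Candidate("+ x = right  ", 10, 1),
    Candidate("+ x = 1", 10, 0),
    Candidate("+ x = right.copy()", 9, 1),
]
best = select_patch(pool)
```

In this pool the first two candidates tie on both test criteria and share a normalized form, so the frequency criterion favors them over the remaining patches.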
Intelligent State Management
Dynamic Routing
Conditional edges route between agents based on current state and message content, enabling adaptive workflow management.
Context Summarization
Automatic conversation summarization when message count exceeds thresholds, maintaining essential context while preventing token overflow.
Error Recovery
Robust error handling with JSON parsing fallbacks and tool execution error management.
API Statistics
Comprehensive tracking of API calls, token usage, and performance metrics for optimization and analysis.
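A JSON parsing fallback of the kind mentioned under Error Recovery could be sketched as follows. The function name and the two-stage strategy (strict parse, then the outermost brace-delimited span) are assumptions for illustration, not SGAgent's documented behavior.

```python
import json
from typing import Any, Optional


def parse_json_with_fallback(text: str) -> Optional[Any]:
    """Try a strict parse first, then the span between the first '{' and last '}'."""
    candidates = [text]
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        candidates.append(text[start:end + 1])
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    return None  # caller decides whether to retry or re-prompt


reply = 'Here are the locations:\n{"locations": [], "reasons": []}\nLet me know.'
parsed = parse_json_with_fallback(reply)
```

Returning `None` instead of raising lets a routing step send the agent back for another attempt.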
Technical Implementation
Knowledge Graph Schema
Nodes: Class, Method, Variable, Test
Relationships:
• BELONGS_TO: Method/Variable → Class
• CALLS: Method → Method
• HAS_METHOD: Class → Method
• HAS_VARIABLE: Class → Variable
• INHERITS: Class → Class
• REFERENCES: Method → Variable/Class
• TESTED: Method → Test
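To make the schema concrete, here is a tiny in-memory stand-in for the graph (not the Neo4j backend). `CodeGraph` and its methods are hypothetical, and the edges added at the end are illustrative rather than extracted from astropy.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple


class CodeGraph:
    """Directed, typed edges indexed in both directions for quick lookups."""

    def __init__(self) -> None:
        self._out: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)
        self._in: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add(self, source: str, rel: str, target: str) -> None:
        self._out[(source, rel)].add(target)
        self._in[(target, rel)].add(source)

    def targets(self, source: str, rel: str) -> List[str]:
        """Nodes reached from `source` via `rel` (e.g. callees for CALLS)."""
        return sorted(self._out.get((source, rel), set()))

    def sources(self, target: str, rel: str) -> List[str]:
        """Nodes pointing at `target` via `rel` (e.g. callers for CALLS)."""
        return sorted(self._in.get((target, rel), set()))


graph = CodeGraph()
graph.add("separability_matrix", "CALLS", "_separable")  # illustrative edges
graph.add("_separable", "CALLS", "_cstack")
graph.add("_cstack", "TESTED", "test_cstack")
callers = graph.sources("_cstack", "CALLS")
```

A Localizer tool built on such a structure could walk CALLS edges backwards from a suspicious method to find its entry points.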
Interactive Neo4j Knowledge Graph Visualization
Shows real relationships: Classes (pink), Methods (blue), Variables (orange) with CALLS, BELONGS_TO, HAS_METHOD edges
Complete Pipeline Implementation
## Phase 1: System Initialization
INITIALIZE multi_agent_system
SET precise_llm = LLM(model=CLAUDE_SONNET, temperature=0.0)
SET creative_llm = LLM(model=CLAUDE_SONNET, temperature=0.8)
CREATE Agent_Localizer(tools=NEO4J_TOOLS + FILE_TOOLS)
CREATE Agent_Suggester(tools=NEO4J_TOOLS + FILE_TOOLS)
CREATE Agent_Fixer(tools=NEO4J_TOOLS + FILE_TOOLS)
## Phase 2: Multi-Agent Workflow Execution
INITIALIZE workflow_graph = StateGraph(AgentState)
ADD_NODES(Localizer, Suggester, Fixer, ToolNodes, Summarizer)
ADD_CONDITIONAL_EDGES(routing_logic)
COMPILE workflow_graph
EXECUTE workflow_graph.stream(initial_state)
→ Localizer identifies ≤5 issue locations
→ Suggester analyzes context and proposes repair strategies
→ Fixer generates coordinated patches
## Phase 3: Multi-Variant Patch Generation
precise_patches = CREATE_EMPTY_MAP()
variant_patches = CREATE_EMPTY_MAP()
FOR EACH issue_location IN identified_locations:
    BUILD context_prompt(location, surrounding_code, imports, suggestions)
    // Generate precise patch variant
    precise_response = precise_llm.INVOKE(context_prompt)
    precise_patches[location_id] = EXTRACT_CODE(precise_response)
    // Generate diverse patch variants
    diverse_patches = []
    FOR variant_num = 1 TO 9:
        variant_response = creative_llm.INVOKE(context_prompt)
        variant_code = EXTRACT_CODE(variant_response)
        diverse_patches.APPEND(variant_code)
    END FOR
    variant_patches[location_id] = diverse_patches
END FOR
## Phase 4: Atomic Multi-File Patch Application
FUNCTION apply_patches_and_generate_diff(patch_collection):
    file_modifications = CREATE_EMPTY_MAP()
    // Group patches by target files
    FOR EACH location, patch_code IN patch_collection:
        target_file = GET_FILE_PATH(location)
        line_range = GET_LINE_RANGE(location)
        file_modifications[target_file].ADD(line_range, patch_code)
    END FOR
    // Snapshot the target files so they can be restored after the diff is taken
    original_state = SNAPSHOT_FILES(KEYS(file_modifications))
    // Apply modifications in reverse line order so earlier line numbers stay valid
    FOR EACH file IN file_modifications:
        content = READ_FILE(file)
        modifications = SORT_REVERSE_BY_LINE_NUMBER(file_modifications[file])
        FOR EACH modification IN modifications:
            content = REPLACE_LINES(content, modification.range, modification.code)
        END FOR
        WRITE_FILE(file, content)
    END FOR
    diff_output = EXECUTE_GIT_DIFF(repository_root)
    RESTORE_ORIGINAL_FILES(original_state)
    RETURN diff_output
END FUNCTION
## Phase 5: Comprehensive Results Export
all_patch_variants = INITIALIZE_COLLECTION()
diff_collection = INITIALIZE_COLLECTION()
all_patch_variants["precise_patches"] = precise_patches
diff_collection["precise_patches"] = apply_patches_and_generate_diff(precise_patches)
FOR variant_index = 1 TO 9:
    variant_set = EXTRACT_VARIANT(variant_patches, variant_index)
    variant_diff = apply_patches_and_generate_diff(variant_set)
    all_patch_variants[f"variant_{variant_index}"] = variant_set
    diff_collection[f"variant_{variant_index}"] = variant_diff
END FOR
final_results = {
    "patch_variants": all_patch_variants,
    "git_diffs": diff_collection,
    "metadata": execution_statistics
}
EXPORT_JSON(final_results, output_directory)
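The reverse-order application in Phase 4 matters because a replacement may change the number of lines in a file. A minimal runnable sketch of that step, assuming 1-indexed inclusive ranges that do not overlap (`apply_modifications` and the `Modification` alias are hypothetical names):

```python
from typing import List, Tuple

# (start_line, end_line, replacement_lines), 1-indexed and inclusive
Modification = Tuple[int, int, List[str]]


def apply_modifications(lines: List[str], mods: List[Modification]) -> List[str]:
    """Apply non-overlapping line-range replacements from the bottom of the file up.

    Starting with the highest start line keeps the line numbers of the
    remaining, earlier ranges valid even when a replacement changes length.
    """
    result = list(lines)  # leave the caller's list untouched
    for start, end, replacement in sorted(mods, key=lambda m: m[0], reverse=True):
        result[start - 1:end] = replacement
    return result


original = ["a", "b", "c", "d", "e"]
mods = [(2, 2, ["B1", "B2"]), (4, 5, ["DE"])]
patched = apply_modifications(original, mods)
```

Applying the same two modifications top-down would shift lines 4 and 5 after the first replacement grows the file, so the second range would hit the wrong lines.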
Key Technical Innovations
🔍 CKGRetriever Integration
Custom Neo4j retriever with singleton pattern ensuring efficient database connections and query optimization.
🎛️ Dynamic Temperature Control
Variable temperature settings (0.0 for precision, 0.8 for creativity) optimizing patch generation diversity.
📏 Intelligent Truncation
Smart output truncation preventing token overflow while preserving essential information integrity.
🔧 Process Management
Sophisticated patch processing with line number management and context preservation.
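One plausible form of output truncation keeps the head and tail of a long tool result and drops the middle. The function below is a sketch under that assumption; `truncate_middle`, its default limit, and the marker text are hypothetical.

```python
def truncate_middle(
    text: str,
    max_chars: int = 2000,
    marker: str = "\n[truncated]\n",
) -> str:
    """Keep the head and tail of long output and replace the middle with a marker."""
    if len(text) <= max_chars:
        return text
    budget = max_chars - len(marker)
    if budget <= 0:
        return text[:max_chars]
    head = budget // 2 + budget % 2  # head gets the extra character if odd
    tail = budget // 2
    return text[:head] + marker + (text[-tail:] if tail else "")


out = truncate_middle("x" * 50 + "END", max_chars=30, marker="[cut]")
```

Keeping the tail is useful for test logs, where the failure summary usually appears at the end.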
Model Configuration
PRIMARY_MODEL = ADVANCED_LLM_BACKEND
TEMPERATURE_PRECISE = 0.0 // Deterministic responses
TEMPERATURE_CREATIVE = 0.8 // Diverse solution generation
CONTEXT_THRESHOLD = 16 // Message count for summarization trigger
TOKEN_OPTIMIZATION = ENABLED // Intelligent content compression
# Performance Monitoring System
ENABLE api_statistics_collection()
TRACK prompt_content, response_content
MONITOR token_usage(prompt_tokens, completion_tokens, total_tokens)
LOG execution_timestamps
EXPORT performance_metrics TO json_format
IMPLEMENT real_time_analytics_dashboard()
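The monitoring steps above could be backed by a small accumulator such as the one sketched here. `CallRecord` and `ApiStats` are hypothetical names; the token fields follow the `prompt_tokens`, `completion_tokens`, and `total_tokens` counters named in the pseudocode.

```python
import json
import time
from dataclasses import dataclass, field
from typing import List


@dataclass
class CallRecord:
    timestamp: float
    prompt_tokens: int
    completion_tokens: int

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


@dataclass
class ApiStats:
    calls: List[CallRecord] = field(default_factory=list)

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        """Log one API call with its execution timestamp."""
        self.calls.append(CallRecord(time.time(), prompt_tokens, completion_tokens))

    def to_json(self) -> str:
        """Export aggregate metrics in JSON format."""
        return json.dumps({
            "num_calls": len(self.calls),
            "prompt_tokens": sum(c.prompt_tokens for c in self.calls),
            "completion_tokens": sum(c.completion_tokens for c in self.calls),
            "total_tokens": sum(c.total_tokens for c in self.calls),
        })


stats = ApiStats()
stats.record(prompt_tokens=1200, completion_tokens=300)
stats.record(prompt_tokens=800, completion_tokens=150)
summary = json.loads(stats.to_json())
```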
Core Agent State Definition
DEFINE AgentState EXTENDS MessagesState:
    // Core workflow state
    conversation_history: MessageSequence
    issue_locations: List[LocationDescriptor]
    repair_suggestions: StrategicAnalysis
    generated_patches: PatchCollection
    // Agent coordination flags
    locator_ready: Boolean
    suggester_ready: Boolean
    fixer_ready: Boolean
    // Context management
    conversation_summary: CompressedContext
    current_agent: AgentIdentifier
    next_agent: AgentIdentifier
    execution_metrics: PerformanceCounters
    // Problem context
    problem_statement: ProblemDescription
    project_context: ProjectMetadata
    failed_attempts: List[FailureRecord]
END DEFINE
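A trimmed-down version of this state, together with a routing function driven by the coordination flags, might look like the sketch below. It uses a plain `TypedDict` rather than a graph framework's state class, and the routing rule and node names are assumptions for illustration.

```python
from typing import List, TypedDict


class AgentState(TypedDict, total=False):
    messages: List[str]
    issue_locations: List[dict]
    repair_suggestions: List[dict]
    locator_ready: bool
    suggester_ready: bool
    fixer_ready: bool


def route(state: AgentState) -> str:
    """Return the next node name based on which agents have finished."""
    if not state.get("locator_ready", False):
        return "localizer"
    if not state.get("suggester_ready", False):
        return "suggester"
    if not state.get("fixer_ready", False):
        return "fixer"
    return "end"


next_node = route({"locator_ready": True})
```

A function of this shape is what a conditional edge would call after each agent step to decide where control goes next.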