AI Coding | 2026-02-24 | OpenAI Blog

OpenAI Drops SWE-bench Verified Due to Contamination Issues

OpenAI has announced it will cease using the SWE-bench Verified benchmark to evaluate its models, citing growing concerns over data contamination and flawed measurement. The company identified issues where problems from the benchmark's test set may have leaked into public training data, artificially inflating model performance. This 'contamination' problem is a major challenge in AI benchmarking. If a model has been indirectly trained on test questions, it may memorize solutions rather than dem
