Everything About: Evaluation

kasi viswanath vandanapu

Apr 1

SQL Comparison Library Architecture

#sql #ai #evaluation #llm

14 min read

Tebogo Tseka

Mar 31

Building an LLM Judge That Doesn't Lie to You

#ai #evaluation #testing #machinelearning

8 min read

kasi viswanath vandanapu

Mar 30

Build a Production‑Ready SQL Evaluation Engine for LLMs

#sql #llm #evaluation #python

5 min read

Tebogo Tseka

Mar 30

Beyond Text: How We Built an Evaluation Framework for Multi-File AI Outputs

#ai #evaluation #testing #webdev

8 min read

Alina Trofimova

Mar 19

Evaluating Vendor Offerings: A Structured Approach to Identify High-Quality, Compatible Tools at Conferences

#devops #kubecon #evaluation #kubernetes

13 min read

Ultra Dune

Mar 17

EVAL #006: LLM Evaluation Tools — RAGAS vs DeepEval vs Braintrust vs LangSmith vs Arize Phoenix

#llm #evaluation #ai #machinelearning

10 min read

Ritwika Kancharla

Mar 3

Building an LLM Evaluation Framework That Actually Works

#evaluation #llm #ai

7 min read

Lamhot Siagian

Feb 22

Evals Aren’t a One-Time Report: Build a Living Test Suite That Ships With Every Release.

#llm #ai #evaluation

6 min read

Jamie Gray

Mar 23

How I Approach Evaluation When Building AI Features

#ai #machinelearning #testing #evaluation

6 min read

HK Lee

Mar 6

LLM Evaluation and Testing: How to Build an Eval Pipeline That Actually Catches Failures Before Production

#ai #llm #evaluation

14 min read

Lamhot Siagian

Feb 22

If you don't red-team your LLM app, your users will

#ai #llm #evaluation #security

7 min read

mgbec for AWS Community Builders

Jan 25

Go Ahead and Judge Me- Agent Evaluators in AWS AgentCore

#evaluation #agents #amazonbedrock

6 min read

Priyam

Jan 6

Why Image Hallucination Is More Dangerous Than Text Hallucination

#evaluation #ai #machinelearning #futureagi

1 min read

DEV Community

# evaluation