Conversation

@danbev (Member) commented Dec 22, 2025

This commit adds support for saving logits to files in the evaluation callback example. Two files are stored: a binary file and a text file for manual inspection.

Two options have been added to this example:
```console
----- example-specific params -----

--save-logits                           save final logits to files for verification (default: false)
--logits-output-dir PATH                directory for saving logits output files (default: data)
```
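For illustration, an invocation could look like the following (the model path and prompt are placeholders; the binary name assumes the example's default llama-eval-callback build target):

```console
llama-eval-callback -m model.gguf -p "Hello world" --save-logits --logits-output-dir data
```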

The motivation for this change (and follow-up changes) is to replace llama-logits in examples/model-conversion, which stores logits so they can be compared with the original model's logits.

Future commits will add more of the features currently in llama-logits, such as printing the prompt and token ids, and will extend this example to store the token ids alongside the logits so that both can be compared as part of the verification process.
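As a rough sketch of the two-file dump described above (not the actual implementation; the file names and the text layout are assumptions for illustration):

```cpp
// Minimal sketch of writing one logits vector to a binary file (raw float32,
// suitable for exact comparison) and a text file (for manual inspection).
// The file names logits.bin / logits.txt are assumed, not taken from the PR.
#include <cstdio>
#include <string>
#include <vector>

static bool save_logits(const std::vector<float> & logits, const std::string & dir) {
    // Binary file: raw float32 values in vocab order.
    const std::string bin_path = dir + "/logits.bin";
    FILE * f_bin = std::fopen(bin_path.c_str(), "wb");
    if (!f_bin) {
        return false;
    }
    std::fwrite(logits.data(), sizeof(float), logits.size(), f_bin);
    std::fclose(f_bin);

    // Text file: one "index: value" line per logit, for eyeballing.
    const std::string txt_path = dir + "/logits.txt";
    FILE * f_txt = std::fopen(txt_path.c_str(), "w");
    if (!f_txt) {
        return false;
    }
    for (size_t i = 0; i < logits.size(); ++i) {
        std::fprintf(f_txt, "%zu: %.6f\n", i, logits[i]);
    }
    std::fclose(f_txt);
    return true;
}
```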
@pwilkin (Collaborator) commented Dec 22, 2025

I didn't commit it, but I had a patch that did this for llama-completion for --n-predict == 1.

I think that, now that llama-cli is separate, we should actually deprecate llama-eval-callback and move that functionality to llama-completion, since it also permits dumping generation for more than one token, which is often needed for proper debugging.

@pwilkin (Collaborator) commented Dec 22, 2025

Also, I'll prioritize #17914 since it's relevant to this as well.

@danbev (Member, Author) commented Dec 22, 2025

> I think that, now that llama-cli is separate, we should actually deprecate llama-eval-callback and move that functionality to llama-completion

My feeling is that llama-completion is already doing a lot and having something separate and more focused would be nice for things like model verification.

But obviously if the majority think we should remove eval-callback then we should. I'll leave this open for a bit to allow others to chime in.

@pwilkin (Collaborator) commented Dec 22, 2025

@danbev I mean, I feel like llama-completion has been relieved of most of its duties and is now mostly a debugging / special-processing tool.

What I mean is that if we want some debugging functionality, then I feel it really must address mechanisms such as chunking and the autoregressive pass, which are already supported in llama-completion. Thus, it would be easier to add dump functionality to llama-completion than to add multi-token / chunking support to eval-callback. But I really do feel we need both in one tool; otherwise it's just too much overhead to maintain proper debugging support in both.

@ngxson (Collaborator) commented Dec 22, 2025

I think the main problem with llama-completion is that there can be some not-very-visible logic under the hood, like the chat template or context shifting, that can potentially affect the end result if used incorrectly.

What I'm thinking is that we can re-group eval-callback and llama-tokenize into a new example called llama-debug. IMO it can be quite useful, because tokenizer and logits matching are the two main use cases when we work with partners on day-0 support. llama-debug would be pure completion and would not contain any chat template logic.
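As a sketch of the logits-matching side of this workflow, a hypothetical standalone checker (not existing code) could compare two raw float32 dumps, e.g. one from llama.cpp and one from the original model; the file layout matches the binary dump sketched above and the tolerance is an arbitrary example value:

```cpp
// Hypothetical checker: load two raw float32 logits dumps and report the
// largest absolute difference between corresponding entries.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

static std::vector<float> load_logits(const char * path) {
    std::vector<float> logits;
    FILE * f = std::fopen(path, "rb");
    if (!f) {
        return logits;
    }
    std::fseek(f, 0, SEEK_END);
    const long size = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);
    if (size <= 0) {
        std::fclose(f);
        return logits;
    }
    logits.resize(size / sizeof(float));
    if (std::fread(logits.data(), sizeof(float), logits.size(), f) != logits.size()) {
        logits.clear(); // short read: treat as failure
    }
    std::fclose(f);
    return logits;
}

int main(int argc, char ** argv) {
    if (argc != 3) {
        std::fprintf(stderr, "usage: %s <logits-a.bin> <logits-b.bin>\n", argv[0]);
        return 1;
    }
    const std::vector<float> a = load_logits(argv[1]);
    const std::vector<float> b = load_logits(argv[2]);
    if (a.empty() || a.size() != b.size()) {
        std::fprintf(stderr, "read error or size mismatch (%zu vs %zu)\n", a.size(), b.size());
        return 1;
    }
    float max_diff = 0.0f;
    for (size_t i = 0; i < a.size(); ++i) {
        max_diff = std::max(max_diff, std::fabs(a[i] - b[i]));
    }
    std::printf("max abs diff: %f\n", max_diff);
    // 1e-3 is an arbitrary example tolerance, not a project-defined threshold.
    return max_diff < 1e-3f ? 0 : 1;
}
```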

@danbev (Member, Author) commented Dec 22, 2025

> What I'm thinking is that we can re-group eval-callback and llama-tokenize into a new example called llama-debug.

I like the sound of this and I think this would be useful for model verification.

@pwilkin I see your point here and perhaps we should have some similar functionality, or a subset, for llama-completion as well.

Let's leave this open over the holidays to get some more input and then proceed.
