Conversation

@danbev (Member) commented Dec 22, 2025

This commit adds support for saving logits to files in the evaluation callback example. Two files are stored: a binary file and a text file for manual inspection.

Two options have been added to this example:
```console
----- example-specific params -----

--save-logits                           save final logits to files for verification (default: false)
--logits-output-dir PATH                directory for saving logits output files (default: data)
```
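For illustration, an invocation could look like the following (the model path and prompt are placeholders; the binary name assumes the example's default llama-eval-callback build target):

```console
llama-eval-callback -m model.gguf -p "Hello world" --save-logits --logits-output-dir data
```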

The motivation for this change (and follow-up changes) is to replace llama-logits in examples/model-conversion, which stores logits so they can be compared with the original model's logits.

Future commits will add more of the features currently in llama-logits, such as printing the prompt and token ids, and will extend this example to store the token ids alongside the logits so that both can be compared as part of the verification process.
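As a rough sketch of the two-file dump described above (not the actual implementation; the file names and the text layout are assumptions for illustration):

```cpp
// Minimal sketch of writing one logits vector to a binary file (raw float32,
// suitable for exact comparison) and a text file (for manual inspection).
// The file names logits.bin / logits.txt are assumed, not taken from the PR.
#include <cstdio>
#include <string>
#include <vector>

static bool save_logits(const std::vector<float> & logits, const std::string & dir) {
    // Binary file: raw float32 values in vocab order.
    const std::string bin_path = dir + "/logits.bin";
    FILE * f_bin = std::fopen(bin_path.c_str(), "wb");
    if (!f_bin) {
        return false;
    }
    std::fwrite(logits.data(), sizeof(float), logits.size(), f_bin);
    std::fclose(f_bin);

    // Text file: one "index: value" line per logit, for eyeballing.
    const std::string txt_path = dir + "/logits.txt";
    FILE * f_txt = std::fopen(txt_path.c_str(), "w");
    if (!f_txt) {
        return false;
    }
    for (size_t i = 0; i < logits.size(); ++i) {
        std::fprintf(f_txt, "%zu: %.6f\n", i, logits[i]);
    }
    std::fclose(f_txt);
    return true;
}
```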
@pwilkin (Collaborator) commented Dec 22, 2025

I didn't commit it, but I had a patch that did this for llama-completion for --n-predict == 1.

I think that, now that llama-cli is separate, we should actually deprecate llama-eval-callback and move that functionality to llama-completion, since it also permits dumping generation for more than one token, which is often needed for proper debugging.

@pwilkin (Collaborator) commented Dec 22, 2025

Also, I'll prioritize #17914 since it's relevant to this as well.

@danbev (Member, Author) commented Dec 22, 2025

> I think that, now that llama-cli is separate, we should actually deprecate llama-eval-callback and move that functionality to llama-completion

My feeling is that llama-completion is already doing a lot and having something separate and more focused would be nice for things like model verification.

But obviously if the majority think we should remove eval-callback then we should. I'll leave this open for a bit to allow others to chime in.

@pwilkin (Collaborator) commented Dec 22, 2025

@danbev I mean, I feel like llama-completion has been relieved of most of its duties and is now mostly a debugging / special-processing tool.

What I mean is that if we want some debugging functionality, then I feel it really must address mechanisms such as chunking and the autoregressive pass, which are already supported in llama-completion. Thus, it would be easier to add dump functionality to llama-completion than to add multi-token / chunking support to eval-callback. But I really do feel we need both in one tool; otherwise it's just too much overhead to maintain proper debugging support in both.

@ngxson (Collaborator) commented Dec 22, 2025

I think the main problem with llama-completion is that there can be some not-very-visible logic under the hood, like the chat template or context shifting, that can potentially affect the end result if used incorrectly.

What I'm thinking is that we can re-group eval-callback and llama-tokenize into a new example called llama-debug. IMO it can be quite useful, because tokenizer and logits matching are the two main use cases when we work with partners on day-0 support. llama-debug would be pure completion and would not contain any chat template logic.
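As a sketch of the logits-matching side of this workflow, a hypothetical standalone checker (not existing code) could compare two raw float32 dumps, e.g. one from llama.cpp and one from the original model; the file layout matches the binary dump sketched above and the tolerance is an arbitrary example value:

```cpp
// Hypothetical checker: load two raw float32 logits dumps and report the
// largest absolute difference between corresponding entries.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

static std::vector<float> load_logits(const char * path) {
    std::vector<float> logits;
    FILE * f = std::fopen(path, "rb");
    if (!f) {
        return logits;
    }
    std::fseek(f, 0, SEEK_END);
    const long size = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);
    if (size <= 0) {
        std::fclose(f);
        return logits;
    }
    logits.resize(size / sizeof(float));
    if (std::fread(logits.data(), sizeof(float), logits.size(), f) != logits.size()) {
        logits.clear(); // short read: treat as failure
    }
    std::fclose(f);
    return logits;
}

int main(int argc, char ** argv) {
    if (argc != 3) {
        std::fprintf(stderr, "usage: %s <logits-a.bin> <logits-b.bin>\n", argv[0]);
        return 1;
    }
    const std::vector<float> a = load_logits(argv[1]);
    const std::vector<float> b = load_logits(argv[2]);
    if (a.empty() || a.size() != b.size()) {
        std::fprintf(stderr, "read error or size mismatch (%zu vs %zu)\n", a.size(), b.size());
        return 1;
    }
    float max_diff = 0.0f;
    for (size_t i = 0; i < a.size(); ++i) {
        max_diff = std::max(max_diff, std::fabs(a[i] - b[i]));
    }
    std::printf("max abs diff: %f\n", max_diff);
    // 1e-3 is an arbitrary example tolerance, not a project-defined threshold.
    return max_diff < 1e-3f ? 0 : 1;
}
```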

@danbev (Member, Author) commented Dec 22, 2025

> What I'm thinking is that we can re-group eval-callback and llama-tokenize into a new example called llama-debug.

I like the sound of this and I think this would be useful for model verification.

@pwilkin I see your point here and perhaps we should have some similar functionality, or a subset, for llama-completion as well.

Let's leave this open over the holidays to get some more input and then proceed.
