Prevent crash if TTFT >300sec, boosted to 90 days #18279
Conversation
the fix is correct but the comment is wrong: TTFT refers to the time it takes from sending the request to the first token being generated. It is made up of:
- time to read the HTTP body
- time to parse the request and do any pre-processing, like tokenization
- time waiting for the LLM inference to start, stop, etc.
- time to send back the HTTP response

set_read_timeout is simply the timeout for reading the HTTP body; it has nothing to do with the LLM.
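To make that scope concrete, here is a minimal sketch using cpp-httplib (the HTTP library llama-server is built on); the endpoint, timeout value, and handler are illustrative, not llama.cpp's actual code:

```cpp
#include "httplib.h"

int main() {
    httplib::Server srv;

    // Applies only while the server is reading the HTTP request body
    // from the socket; it does not bound anything the handler does.
    srv.set_read_timeout(300, 0); // seconds, microseconds

    srv.Post("/completion", [](const httplib::Request &, httplib::Response & res) {
        // Tokenization and LLM inference happen here, *after* the body
        // has been fully read, so set_read_timeout does not cover them.
        res.set_content("{}", "application/json");
    });

    srv.listen("127.0.0.1", 8080);
}
```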
this won't work as-is because there is also a timeout imposed by the model instance in server-http.cpp
instead, please set set_read_timeout and set_write_timeout according to the input params (do NOT hard-code them to 90 days):
llama.cpp/tools/server/server-http.cpp, lines 105 to 106 in 147a521:
```cpp
srv->set_read_timeout (params.timeout_read);
srv->set_write_timeout(params.timeout_write);
```
Edit: for the client CLI, set_write_timeout should be set to params.timeout_read (reversed relative to the server), as sketched below.
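A sketch of the suggested change (the `params` fields come from the snippet above; the `srv`/`cli` variables and the read-side assignment for the client are assumptions, not part of the review):

```cpp
// server (tools/server/server-http.cpp already does this):
srv->set_read_timeout (params.timeout_read);
srv->set_write_timeout(params.timeout_write);

// client CLI: roles are reversed relative to the server, so the
// client's write timeout takes the server-read parameter ...
cli->set_write_timeout(params.timeout_read);
// ... and, by the same logic (assumed, not stated in the review),
// its read timeout takes the server-write parameter.
cli->set_read_timeout (params.timeout_write);
```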
ngxson left a comment:
keep the code vertically aligned
@ngxson There ya go! :) A 90-day timeout is ugly whether hardcoded or on the command line. It is a band-aid. That timeout is for communications; it shouldn't be waiting for the LLM, but it is. If the LLM doesn't send its first response token before that timeout, the request fails.
Also, model unload won't complete until cli->send() finishes. I think model unload needs to try harder.
Yeah, that sux. Sry 'bout that. It looked ok before it got to GitHub. Spacing fixed.
force-pushed from d49cd26 to 5c037f5
--- Testing ---
llama-server has a hardcoded HTTP timeout of 300 seconds (the library default). Time to First Token (TTFT) frequently exceeds this on consumer hardware or when processing attached files. It is basically impossible to load any reasonable amount of code for analysis with CPU-only hardware and a model large enough to be useful.
When this happens, the connection is dropped and the WebUI gets out of sync.
This PR extends the timeout to 90 days, effectively hiding the problem and allowing llama-server to actually be used.
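For illustration, a minimal sketch of the band-aid approach, assuming cpp-httplib's timeout setters (the helper name and client variable are hypothetical; this is not the exact diff):

```cpp
#include "httplib.h"

// 90 days, expressed in seconds.
constexpr time_t LONG_TIMEOUT_SEC = 90 * 24 * 60 * 60;

// Raise both timeouts far beyond cpp-httplib's 300 s default so a slow
// Time to First Token no longer drops the connection.
static void apply_long_timeouts(httplib::Client & cli) {
    cli.set_read_timeout (LONG_TIMEOUT_SEC, 0);
    cli.set_write_timeout(LONG_TIMEOUT_SEC, 0);
}
```

Per the review above, a cleaner version would take these values from the existing timeout parameters rather than a hard-coded constant.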
I would call this a stopgap, but it is urgently needed.
Note: Manually dropping a model from the WebUI can get stuck if the model is busy. Perhaps llama-server's model-dropping code should be made more forceful, with or without this simple fix in place.