Conversation

@ngxson
Collaborator

@ngxson ngxson commented Dec 23, 2025

Ref discussion: #18261 (comment)

@ServeurpersoCom I think we need to add a test with INI preset at some point

Example preset for this PR:

[THUDM/glm-edge-v-5b-gguf:Q4_K_M]
no-mmap = 0
temp = 123.000
autoload = 1
unsafe-allow-api-override = no-mmap,c

And API request:

{
        "model": "THUDM/glm-edge-v-5b-gguf:Q4_K_M",
        "overrides": {"c": "512"}
}

Returns:

{
    "success": true,
    "args": [
        "........../build/bin/llama-server",
        "--host",
        "127.0.0.1",
        "--mmap",
        "--port",
        "65054",
        "--temp",
        "123.000",
        "--alias",
        "THUDM/glm-edge-v-5b-gguf:Q4_K_M",
        "--ctx-size",
        "512",
        "--hf-repo",
        "THUDM/glm-edge-v-5b-gguf:Q4_K_M"
    ]
}

@ServeurpersoCom
Collaborator

Interesting! If I understand correctly, you're implementing a filter that lets the router accept cold-start param overrides via the API body (like ctx, n-gpu-layers), with the whitelist in the INI controlling which params can be touched. This gives fine-grained control that can be refined later. Smart approach to transfer the risk to the admin's explicit whitelist!
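For instance, sketching on top of the preset above (whether n-gpu-layers is accepted as a whitelist entry is an assumption on my part; only c and no-mmap appear in the example):

[THUDM/glm-edge-v-5b-gguf:Q4_K_M]
unsafe-allow-api-override = c,n-gpu-layers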

@ServeurpersoCom
Collaborator

ServeurpersoCom commented Dec 23, 2025

I was initially thinking more about hot-reloading the INI by the router to keep control while having the INI modifications handled by the admin. But you're right that since there's already an API, we're already working in two directions at once. We can improve the INI whitelist later: add threshold validation, constraints, etc.

In my case, if I expose my WebUI to the internet, I need to open the entire API. So anything admin-related (memory allocation, etc.) requires authentication. I had initially thought about a simple /admin endpoint prefix. With all-or-nothing access, I'll have to filter at the reverse proxy level, which is not admin-friendly.

Ultimately, all we need is an open "user" level, as currently exists, and a password-protected "admin" level for everything that changes depending on available resources such as memory, etc.

@ngxson
Collaborator Author

ngxson commented Dec 23, 2025

I was initially thinking more about hot-reloading the INI by the router to keep control while having the INI modifications handled by the admin.

I think allowing INI reloading can be useful, but it will be quite complicated due to some edge cases, as I specified in #18261 (comment)

We can improve the INI whitelist later, add threshold validation, constraints, etc

IMO this will also be quite over-complicated because:

  • In most cases, a wrong param will simply make the subprocess exit immediately
  • Some params require model info to validate (which requires loading the model)

Re. the /admin endpoint, it is also quite risky in terms of security. It's best to avoid any privileged logic in our code base, because there are usually a lot of small issues and edge cases when dealing with privileged access.

Instead, I would transfer the risk to the end user by allowing them to hot-reload the INI. Users can develop their own admin panel that modifies the INI file and then triggers the hot-reload - they are then responsible for any security risks that may arise in their admin code.

@ngxson
Collaborator Author

ngxson commented Dec 23, 2025

A bit more context on why we want to transfer the risk whenever possible: we do have a security advisory program, but many bug reports are AI-generated and they nitpick functionalities like this.

In the context of this PR, we simply transfer the risk of allowing a param override to the user. So if they explicitly write something like this:

unsafe-allow-api-override = chat-template-file

And someone uses it via the API with chat-template-file = /etc/passwd

Then it's their responsibility, as they explicitly wrote the word unsafe themselves.
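Spelled out in the request format from the example at the top of this thread, that override would look like:

{
    "model": "THUDM/glm-edge-v-5b-gguf:Q4_K_M",
    "overrides": {"chat-template-file": "/etc/passwd"}
}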

@ServeurpersoCom
Collaborator

ServeurpersoCom commented Dec 23, 2025

If we code our own backend/WebUI, we don't need the API overrides; we write the INI directly. This feature is more for exposing the llama.cpp API directly to external clients, right?
Or if we're building an API for LAN only, it shouldn't be necessary for the default llama.cpp WebUI to function, right?
Everything that is HTTP should, in principle, be considered accessible from an untrusted environment! But with an explicit "unsafe-" prefix it's OK for me (I'll just never enable it on my server, except temporarily to test the PR).

Also, a separate endpoint/API for "unsafe" items would make it easier for the community to develop admin panels. Mixing everything together isn't ideal; unsafe items should remain localhost-only (--jinja /etc/passwd).

@ngxson
Collaborator Author

ngxson commented Dec 23, 2025

The whole router feature is never meant to be used in an untrusted network. A classical DDoS attack vector is just repeatedly sending load and unload requests.

Instead, this unsafe-allow-api-override is meant to be used by downstream application code, where the application can ask to (re-)launch the model with a different config - for example, a modified context length or GPU layer count, like how LM Studio allows doing that from its UI.
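As a rough sketch, such an app could send a request like the one below; only keys listed under unsafe-allow-api-override in the preset would be accepted, and the n-gpu-layers key here is an assumption (only c appears in the example above):

{
    "model": "THUDM/glm-edge-v-5b-gguf:Q4_K_M",
    "overrides": {"c": "4096", "n-gpu-layers": "20"}
}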

@ServeurpersoCom
Collaborator

The whole router feature is never meant to be used in an untrusted network. A classical DDoS attack vector is just repeatedly sending load and unload requests.

Yes, it's a classic DoS; it doesn't even need to be distributed (DDoS), and it's easy to filter / rate limit.

@ngxson
Collaborator Author

ngxson commented Dec 23, 2025

If we code our own backend/WebUI, we don't need the API overrides; we write the INI directly.

yeah I agree that this unsafe-allow-api-override overlaps with INI hot-reload.

main issue is that INI hot-reload has some edge cases, so we need to list them out before working on the feature.

tbh I don't know how useful the current PR really is; it's mainly meant to address the fact that a lot of people asked for extra_args back. But it's probably fine to keep this as a draft, to see if people actually need it or not.

@ServeurpersoCom
Collaborator

The most important thing is to discuss and test in practice, and we will naturally find the best compromises!

@Chrisischris

This looks like what I need! Quick question to confirm: if I use --models-dir with a minimal preset file that just has a global [*] section with unsafe-allow-api-override = c, would that allow context size overrides for all dynamically discovered models?

My use case is loading models with different context sizes without disrupting others already warm in memory, which is why reloading the INI file each time doesn't suffice.
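In other words, a preset file containing just the following (assuming the global [*] section applies to models discovered via --models-dir, which is exactly what I'd like to confirm):

[*]
unsafe-allow-api-override = c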
