Reasoning context #22
Merged
…el tools and timeouts

- ARBITERAI_ENABLE_LLAMA CMake option (default ON): consumers using only remote OpenAI-compatible servers can build without the llama.cpp dependency; local-model APIs return NotImplemented when disabled
- Schema sanitization for llama.cpp GBNF: handle boolean/array/opaque object types, strip $schema/additionalProperties, repair initializer-list array-instead-of-object schemas, normalise null properties, recurse into items
- buildRequest: request-level tools take priority over session tools; plumb per-request timeout_ms and low_speed_time_s
- Telemetry and storage manager additions

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
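The buildRequest bullet above can be sketched as follows. This is a minimal illustration, not the project's code: `ToolSpec`, `SessionConfig`, `RequestOptions`, `BuiltRequest` and `buildRequestSketch` are hypothetical stand-ins, and treating an empty request-level tool list as "fall back to the session tools" is an assumption.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real request/session types.
struct ToolSpec {
    std::string name;
};

struct SessionConfig {
    std::vector<ToolSpec> tools;  // tools configured for the whole session
};

struct RequestOptions {
    std::vector<ToolSpec> tools;          // assumption: empty means "none given"
    std::optional<int> timeout_ms;        // per-request overall timeout
    std::optional<int> low_speed_time_s;  // per-request low-speed window
};

struct BuiltRequest {
    std::vector<ToolSpec> tools;
    std::optional<int> timeout_ms;
    std::optional<int> low_speed_time_s;
};

// Request-level tools win over session tools; timeouts are passed through
// unchanged and stay unset when the caller did not provide them.
BuiltRequest buildRequestSketch(const SessionConfig& session,
                                const RequestOptions& request) {
    BuiltRequest out;
    out.tools = request.tools.empty() ? session.tools : request.tools;
    out.timeout_ms = request.timeout_ms;
    out.low_speed_time_s = request.low_speed_time_s;
    return out;
}
```

Keeping the timeouts as `std::optional` lets the transport layer distinguish "not set" from an explicit value, so a default can still apply further down.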
Adds an optional cache_prompt flag at both the request level (CompletionRequest) and the session level (ChatConfig). When set, the openai provider includes "cache_prompt" in the request body so that a llama.cpp server reuses its KV cache for the common prompt prefix (system prompt + tool schemas), saving most of the prefill cost on every call. The field is sent only when explicitly set, since OpenAI itself rejects unrecognised arguments.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
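A minimal sketch of the "only sent when explicitly set" behaviour, using `std::optional<bool>` to tell "unset" apart from `false`. `buildBodySketch` is a hypothetical helper; letting the request-level value override the session-level one is an assumption, and real code would build the body with a JSON library instead of string concatenation (the model name is not escaped here).

```cpp
#include <optional>
#include <string>

// Hypothetical helper: builds a tiny JSON body by hand for illustration.
std::string buildBodySketch(const std::string& model,
                            std::optional<bool> requestCachePrompt,
                            std::optional<bool> sessionCachePrompt) {
    std::string body = "{\"model\":\"" + model + "\"";

    // Assumption: a request-level value, when present, overrides the
    // session-level one.
    std::optional<bool> effective =
        requestCachePrompt.has_value() ? requestCachePrompt : sessionCachePrompt;

    // Emit the field only when one of the two levels set it, so servers
    // that reject unknown arguments never see it by default.
    if (effective.has_value()) {
        body += ",\"cache_prompt\":";
        body += *effective ? "true" : "false";
    }

    body += "}";
    return body;
}
```

With both levels unset, the body contains no `cache_prompt` key at all, which is different from sending `"cache_prompt":false`.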
InferenceScheduler is part of the local llama.cpp runtime subsystem (includes llama.h, uses ModelRuntime's LoadedModel); compiling it with the runtime disabled fails. Move it and its tests into the gated block.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
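On the C++ side, gating the local runtime might look like the sketch below. It assumes the CMake option is forwarded to the compiler as a preprocessor definition of the same name, which this page does not confirm; `ErrorCode` and `loadLocalModelSketch` are hypothetical names.

```cpp
#include <string>

// Hypothetical error type; the real library's error codes may differ.
enum class ErrorCode { Ok, NotImplemented };

// Assumption: the build defines ARBITERAI_ENABLE_LLAMA when the CMake
// option of the same name is ON.
ErrorCode loadLocalModelSketch(const std::string& path) {
#ifdef ARBITERAI_ENABLE_LLAMA
    // Placeholder for the real path, which would include llama.h and go
    // through the local runtime.
    (void)path;
    return ErrorCode::Ok;
#else
    // Runtime compiled out: local-model entry points report NotImplemented.
    (void)path;
    return ErrorCode::NotImplemented;
#endif
}
```

Code that includes llama.h, such as a scheduler built on the local runtime, has to sit entirely inside the gated branch, since the header is unavailable when the dependency is not built.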
No description provided.