Rendezllama is a text interface for running a local chatbot based on ggerganov's llama.cpp.
For now, there's just a command-line interface, but the plan is to make a progressive web app that connects with the chatbot running on a home server.
Assuming you have the quantized weights already and can compile C++, you can try the assistant_plain example with a few commands:
# If undefined, assume the 7B model exists in a sibling llama.cpp/ dir.
MODEL="${MODEL:-../llama.cpp/models/7B/ggml-model-q4_0.gguf}"
# Make just creates a bld/ directory and invokes CMake to build there.
make
# Run with specific settings from a file. They can be given as flags too.
./bld/src/chat/chat \
--x_setting example/prompt/assistant_plain/setting.sxpb \
--thread_count 8 \
--model "${MODEL}"
See the example/prompt/ directory for more interesting/whimsical examples.
- Setting file.
  - `--x_setting setting.sxpb` loads settings from `setting.sxpb`.
  - All other options can be set within this file.
- Model files.
  - `--model ggml-model-q4_0.gguf` specifies the model weights, which are usually quantized.
  - See doc/setting/model.md for LoRA files and memory options.
- Prompt files.
  - `--x_priming priming.txt` specifies the priming prompt text file. This is the prompt that never changes.
  - `--x_rolling rolling.txt` specifies the rolling prompt. This is the initial chat dialogue. As the chat continues, older dialogue expires and "rolls" out of context.
    - The protagonist and confidant names are derived automatically from this.
  - See doc/setting/prompt.md for more prompt file & format options.
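The split between a fixed priming prompt and a rolling prompt can be illustrated with a small sketch. This is not Rendezllama's implementation: the function name `build_context` and the `max_chars` budget are hypothetical, and the real program works within a token-based context limit rather than a character count. The sketch only shows the idea that the priming text always stays while the oldest dialogue lines are dropped first.

```python
def build_context(priming, rolling_lines, max_chars):
    """Return priming text plus the newest rolling lines that fit in max_chars.

    Assumption for illustration: the budget is counted in characters.
    """
    budget = max_chars - len(priming)  # the priming prompt is always kept
    kept = []
    # Walk from newest to oldest so the oldest lines are the ones dropped.
    for line in reversed(rolling_lines):
        cost = len(line) + 1  # +1 for the trailing newline
        if cost > budget:
            break
        kept.append(line)
        budget -= cost
    kept.reverse()  # restore chronological order
    return priming + "".join(line + "\n" for line in kept)


priming = "Transcript of a chat between User and Char.\n"
rolling = ["User: Hello!", "Char: Hi there.", "User: How are you?"]
print(build_context(priming, rolling, max_chars=len(priming) + 40))
```

With this budget the oldest line no longer fits, so only the two most recent lines follow the priming text.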
In the chat, most things you type will be prefixed with the protagonist's name and suffixed by the confidant's dialogue line. There are some special inputs and commands that help keep an infinite chat from going off the rails. Remember, the recent chat content is just a rolling prompt concatenated to the end of the priming prompt, so its quality is just as important!
- Interactivity.
  - An empty input lets token generation keep happening.
  - See doc/setting/stdio.md for settings that control I/O behavior and limits.
  - `/tail` or `/tail 10` shows the last 10 lines.
  - `/head` or `/head 10` shows the first 10 lines of the rolling prompt.
  - `/forget 10` removes the first 10 lines of the rolling prompt.
- Characters.
  - `/(protagonist "User")` changes the protagonist's name to "User".
  - `/(confidant "Char")` changes the confidant's name to "Char".
  - See doc/setting/prompt.md#prefix for more ways to control chat line prefixes.
- Editing.
  - A blank space forces token generation to continue on the same line.
  - ` some text` (note blank space in front) adds `some text` to the current line.
  - ` some text ` (note blank spaces in front and back) adds `some text` and forces another token on the same line. Useful when inserting a sentence.
  - `\nsome text` (note the escaped newline in front) adds a new line of dialogue for the confidant that starts with `some text`.
  - `/puts A line of text.` adds a new line of text. Does not echo anything.
  - `/yield` or `/y` adds a new line of dialogue for the confidant.
  - `/yield Char:` or `/y Char:` adds a new line starting with `Char:`.
  - `/gets 64 Char:` is like `/yield` but caps generation at slightly over 64 bytes. Only prints the newly generated text. Always includes a newline at the end.
  - `/r` regenerates the last line of dialogue.
  - `/R` generates text from the current position. Subsequent `/r` commands will only replace the generated text, nothing before it on the line.
  - `/d` deletes up to and including the last chat prefix.
  - `/D` or `/D 0` deletes all text on the current line without consuming a newline. Positive integers delete that many earlier lines in full.
  - `/b` or `/b 1` deletes the last token.
  - `/B` or `/B 1` deletes the last word.
- Sampling.
  - A slash followed by a valid sampling configuration, written as it would appear in `setting.sxpb`, reconfigures the sampling parameters.
    - `/(language ((infer_via sampling) (adjust_thru (()) (temperature 0.9))))` sets the temperature to 0.9.
  - See doc/setting/sampling.md for more ways to control inference.
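To make the `/head`, `/tail`, and `/forget` commands from the Interactivity list more concrete, here is a rough model that treats the rolling prompt as a plain list of lines. The function names mirror the commands, but they are hypothetical helpers written for illustration; the actual program operates on its own internal state, not a Python list.

```python
def head(lines, n=10):
    """Model of /head: the first n lines of the rolling prompt."""
    return lines[:n]


def tail(lines, n=10):
    """Model of /tail: the last n lines."""
    # lines[-0:] would return everything, so handle n == 0 explicitly.
    return lines[-n:] if n > 0 else []


def forget(lines, n):
    """Model of /forget: drop the first n lines, keep the rest."""
    return lines[n:]


# Hypothetical rolling prompt with 15 lines of dialogue.
rolling = [f"line {i}" for i in range(1, 16)]
print(tail(rolling, 3))
print(head(rolling, 2))
rolling = forget(rolling, 10)
print(rolling)
```

In this model, `/forget` is the only one of the three that changes the rolling prompt; `/head` and `/tail` just display part of it.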