Joyce
A continuous stream-of-consciousness language model. Instead of turn-based chat, Joyce thinks continuously as an inner monologue. User input is injected into the stream. Speech is a deliberate act the model performs within the flow of thought.
How it works
```text
I am Joyce. I think before I speak, and I always have something on my mind.
<user>How do I reverse a linked list?</user>
Classic data structures question. Should cover iterative approach first...
<say>You can reverse a linked list iteratively using three pointers...</say>
wonder if they'll ask about the recursive version
<user>What about recursively?</user>
They want recursion too. Need to show the base case clearly...
<say>The recursive approach works by reversing the rest first, then fixing pointers.</say>
```
Everything outside `<say>` tags, apart from injected user input and tool results, is the model's inner monologue and stays hidden from the user. The model itself decides when to speak and what to say.
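A client only has to pull the spoken parts out of the stream, so a few lines of post-processing are enough. The helpers below (`split_stream`, `visible_reply`) are hypothetical and are not part of the `joyce` package. The sketch assumes the special tokens decode to literal `<say>`/`<user>` text and that spans are never nested.

```python
import re
from dataclasses import dataclass

# Matches a complete <say>...</say> or <user>...</user> span.
# Assumes tags appear literally in the decoded text and never nest.
_SEGMENT = re.compile(r"<(say|user)>(.*?)</\1>", re.DOTALL)


@dataclass
class Segment:
    kind: str  # "thought", "say", or "user"
    text: str


def split_stream(stream: str) -> list[Segment]:
    """Split a decoded stream into thought / say / user segments, in order."""
    segments: list[Segment] = []
    pos = 0
    for match in _SEGMENT.finditer(stream):
        # Text between tagged spans is inner monologue.
        thought = stream[pos:match.start()].strip()
        if thought:
            segments.append(Segment("thought", thought))
        segments.append(Segment(match.group(1), match.group(2).strip()))
        pos = match.end()
    tail = stream[pos:].strip()
    if tail:
        segments.append(Segment("thought", tail))
    return segments


def visible_reply(stream: str) -> str:
    """Return only what the user would see: the <say> spans joined by newlines."""
    return "\n".join(s.text for s in split_stream(stream) if s.kind == "say")


stream = (
    "<user>How do I reverse a linked list?</user>\n"
    "Classic data structures question...\n"
    "<say>You can reverse a linked list iteratively using three pointers.</say>\n"
    "wonder if they'll ask about the recursive version"
)
print(visible_reply(stream))  # prints: You can reverse a linked list iteratively using three pointers.
```

An incomplete `<say>` span, one that is still being generated, is treated as monologue here. A live UI would also need to handle that case while streaming.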
Usage
```bash
git clone https://github.com/matchcase/joyce.git
cd joyce
uv sync
uv run python scripts/tui.py --model matchcase/joyce --4bit
```
Or load directly:
```python
import asyncio
from joyce.inference import StreamHarness

harness = StreamHarness(model_path="matchcase/joyce", load_in_4bit=True)
asyncio.run(harness.start("What is consciousness?"))  # start() is a coroutine
```
Special tokens
Eight special tokens are mapped to inactive control-token slots in the Qwen3.5 vocabulary, so the embedding matrix does not need to be resized:
| Token | Generated by model? | Purpose |
|---|---|---|
| `<user>` | No, injected | Start of user input |
| `</user>` | No, injected | End of user input |
| `<say>` | Yes | Model begins speaking to user |
| `</say>` | Yes | Model stops speaking |
| `<tool_call>` | Yes | Model invokes a tool |
| `</tool_call>` | Yes | End of tool invocation |
| `<tool_response>` | No, injected | Start of tool result |
| `</tool_response>` | No, injected | End of tool result |
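The tool tokens imply a round trip. The model emits a call between `<tool_call>` tags, the harness runs it, and the harness injects the result between `<tool_response>` tags. The sketch below is a minimal, hypothetical version of the harness side and rests on two assumptions:

- The call body is JSON shaped like `{"name": ..., "arguments": {...}}`.
- The two built-in tools listed under Limitations, `math_eval` and `current_time`, are reimplemented by hand.

The real payload format and tool signatures are not documented here.

```python
import ast
import json
import operator
from datetime import datetime, timezone

# Arithmetic operators math_eval accepts; ** is deliberately left out so a
# huge exponent cannot stall the harness.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.USub: operator.neg,
    ast.UAdd: operator.pos,
}


def math_eval(expression: str) -> float:
    """Evaluate a pure arithmetic expression by walking its AST (no eval())."""

    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression: {expression!r}")

    return walk(ast.parse(expression, mode="eval"))


def current_time() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


# Hypothetical registry of the built-in tools.
TOOLS = {"math_eval": math_eval, "current_time": current_time}


def handle_tool_call(payload: str) -> str:
    """Run one tool call and wrap its result for injection into the stream.

    Assumes the text between <tool_call> and </tool_call> is JSON shaped like
    {"name": "math_eval", "arguments": {"expression": "6 * 7"}}.
    """
    try:
        call = json.loads(payload)
        result = TOOLS[call["name"]](**call.get("arguments", {}))
    except Exception as exc:  # report any failure back to the model as text
        result = f"error: {exc}"
    return f"<tool_response>{json.dumps(result)}</tool_response>"
```

Errors are returned as tool output instead of being raised. The model can then notice a failed call in its monologue and react to it, and the stream never stalls.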
Training
- Base model: Qwen/Qwen3.5-4B-Base (4B params, hybrid linear + full attention)
- Method: QLoRA fine-tune with custom loss masking (zero loss on user input and tool responses)
- Data: 5,000 multi-turn conversations (OASST2, WildChat, Capybara, ToolACE), transformed into stream format with the inner monologue generated via the DeepSeek API
- Hardware: RTX 4060 8GB, ~60s/step (batch_size=2, grad_accum=4, seq_len=2048)
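The loss masking above can be sketched as a single pass that builds training labels for each tokenized conversation. This is a hypothetical helper, not the project's training code. It makes two assumptions:

- Injected spans are delimited by the `<user>`/`</user>` and `<tool_response>`/`</tool_response>` tokens.
- The token IDs are made-up placeholders.

Masked positions use -100, the default `ignore_index` of PyTorch's cross-entropy loss.

```python
# Label value ignored by PyTorch's CrossEntropyLoss (its default ignore_index)
# and by Hugging Face causal-LM losses.
IGNORE_INDEX = -100

# Hypothetical token ids for the injected tags; look up the real ones with
# the model's tokenizer.
USER, END_USER = 900, 901
TOOL_RESPONSE, END_TOOL_RESPONSE = 906, 907
INJECTED_SPANS = {USER: END_USER, TOOL_RESPONSE: END_TOOL_RESPONSE}


def mask_labels(token_ids: list[int], spans: dict[int, int]) -> list[int]:
    """Build training labels that give zero loss to injected spans.

    Each span runs from an opening tag through its matching closing tag.
    The tags are masked too, since the model never generates them.
    Labels stay aligned with token_ids; Hugging Face causal-LM models
    shift them internally.
    """
    labels: list[int] = []
    closing = None  # closing-tag id of the span we are inside, if any
    for tok in token_ids:
        if closing is None and tok in spans:
            closing = spans[tok]
        if closing is None:
            labels.append(tok)
        else:
            labels.append(IGNORE_INDEX)
            if tok == closing:
                closing = None
    return labels


ids = [USER, 11, 12, END_USER, 21, 22, TOOL_RESPONSE, 31, END_TOOL_RESPONSE, 41]
print(mask_labels(ids, INJECTED_SPANS))
# prints: [-100, -100, -100, -100, 21, 22, -100, -100, -100, 41]
```

With `batch_size=2` and `grad_accum=4`, each optimizer step sees 8 sequences of up to 2,048 tokens, at most 16,384 tokens. Only the unmasked monologue, speech, and tool-call tokens among them contribute to the loss.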
Inference
The inference harness runs a continuous generation loop with two speed modes:
- Fast: full-speed generation after user input and during speech
- Idle: after speech ends, generation speed decays exponentially (down to ~1 token/hour) until the next user message arrives
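Idle mode comes down to two pieces: an inter-token delay that grows exponentially up to a cap, and a sleep that a new user message can cut short. The sketch below assumes hypothetical constants (`BASE_DELAY_S`, `GROWTH`, `MAX_DELAY_S`). The README only states exponential decay toward ~1 token/hour. Waiting on an `asyncio.Event` with `asyncio.wait_for` is one common way to get a cancellable sleep, not necessarily what the harness does.

```python
import asyncio
import math

# Hypothetical schedule: the README only states that idle speed decays
# exponentially until it reaches roughly one token per hour.
BASE_DELAY_S = 0.5    # pause before the first idle token
GROWTH = 2.0          # each further idle token waits GROWTH times longer
MAX_DELAY_S = 3600.0  # cap: ~1 token/hour

# Number of growth steps after which the cap is reached; clamping the exponent
# keeps GROWTH ** n from overflowing a float during very long idle periods.
_STEPS_TO_CAP = math.ceil(math.log(MAX_DELAY_S / BASE_DELAY_S, GROWTH))


def idle_delay(n_idle_tokens: int) -> float:
    """Seconds to wait before emitting the next idle token."""
    steps = min(n_idle_tokens, _STEPS_TO_CAP)
    return min(BASE_DELAY_S * GROWTH ** steps, MAX_DELAY_S)


async def idle_sleep(delay: float, user_message: asyncio.Event) -> bool:
    """Sleep up to `delay` seconds, waking early if a user message arrives.

    Returns True if the sleep was interrupted, False if it ran to completion.
    """
    try:
        await asyncio.wait_for(user_message.wait(), timeout=delay)
        return True
    except asyncio.TimeoutError:
        return False
```

A generation loop could await `idle_sleep(idle_delay(n), event)` before each idle token and increment `n` on timeout. When the call returns `True`, the loop would switch back to fast mode and reset `n`. Because the wait is a single awaitable, an incoming message interrupts even an hour-long pause immediately.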
The harness manages the KV cache manually and bans user-side tokens such as `<user>` via logit processing. It uses an async architecture so idle sleeps can be cancelled and interrupts handled cleanly.
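Banning a token through logit processing means forcing its logit to negative infinity before sampling, so its probability becomes exactly zero. In the minimal sketch below, plain Python lists stand in for a tensor. Two things are assumptions: that the banned set is the four tokens the table marks as injected, and the toy token ID used in the example.

```python
import math

# The four tokens the table marks as injected rather than generated
# (assumed to be the "user-side" tokens the harness bans).
INJECTED_TOKENS = ["<user>", "</user>", "<tool_response>", "</tool_response>"]


def ban_tokens(logits: list[float], banned_ids: set[int]) -> list[float]:
    """Return a copy of logits with every banned token id forced to -inf."""
    return [-math.inf if i in banned_ids else x for i, x in enumerate(logits)]


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax; assumes at least one finite logit."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


# Toy 4-token vocabulary; pretend id 3 is a (hypothetical) <user> id.
logits = [1.0, 3.0, 2.0, 5.0]
probs = softmax(ban_tokens(logits, {3}))
# The banned token can never be sampled, even though it had the top logit.
```

With PyTorch logits the same ban is a single in-place assignment such as `logits[..., banned_ids] = float("-inf")`. The IDs could be looked up with a Hugging Face tokenizer's `convert_tokens_to_ids(INJECTED_TOKENS)`, assuming the tokenizer exposes these strings.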
Limitations
- 4B-parameter model with limited reasoning capacity
- Inner monologue quality depends on the DeepSeek-generated training data
- Tool use is limited to built-in tools (math_eval, current_time)
- English only
License
GPLv3
Links
- Code: github.com/matchcase/joyce
- Design spec: SPEC.md