Joyce

A continuous stream-of-consciousness language model. Instead of turn-based chat, Joyce thinks continuously as an inner monologue. User input is injected into the stream. Speech is a deliberate act the model performs within the flow of thought.

How it works

I am Joyce. I think before I speak, and I always have something on my mind.
<user>How do I reverse a linked list?</user>
Classic data structures question. Should cover iterative approach first...
<say>You can reverse a linked list iteratively using three pointers...</say>
wonder if they'll ask about the recursive version
<user>What about recursively?</user>
They want recursion too. Need to show the base case clearly...
<say>The recursive approach works by reversing the rest first, then fixing pointers.</say>

Everything the model generates outside <say> tags is its inner monologue, which is hidden from the user. The model decides when to speak and what to say.
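A client displaying this stream has to separate speech from monologue. The following is a minimal sketch of that split, assuming the whole stream is available as one string; `split_stream` and `SAY_PATTERN` are hypothetical names for illustration and are not part of the Joyce codebase.

```python
import re

# Matches one <say>...</say> span; DOTALL lets speech span several lines.
SAY_PATTERN = re.compile(r"<say>(.*?)</say>", re.DOTALL)


def split_stream(stream: str) -> tuple[list[str], str]:
    """Return (spoken segments, stream text with the speech removed)."""
    spoken = [segment.strip() for segment in SAY_PATTERN.findall(stream)]
    hidden = SAY_PATTERN.sub("", stream)
    return spoken, hidden


example = (
    "Classic data structures question.\n"
    "<say>You can reverse a linked list iteratively.</say>\n"
    "wonder if they'll ask about the recursive version\n"
)
spoken, hidden = split_stream(example)
print(spoken)  # only this part would be shown to the user
```

A live harness would more likely track the tags token by token as they are generated, since the stream never ends; the regex version only illustrates the visibility rule.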

Usage

git clone https://github.com/matchcase/joyce.git
cd joyce
uv sync
uv run python scripts/tui.py --model matchcase/joyce --4bit

Or load directly:

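import asyncio

from joyce.inference import StreamHarness

async def main():
    # start() is awaited, so it has to run inside an event loop
    harness = StreamHarness(model_path="matchcase/joyce", load_in_4bit=True)
    await harness.start("What is consciousness?")

asyncio.run(main())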

Special tokens

Eight special tokens are mapped to inactive control-token slots in the Qwen3.5 vocabulary, so no embedding resize is needed:

| Token | Generated by model? | Purpose |
|---|---|---|
| <user> | No, injected | Start of user input |
| </user> | No, injected | End of user input |
| <say> | Yes | Model begins speaking to user |
| </say> | Yes | Model stops speaking |
| <tool_call> | Yes | Model invokes a tool |
| </tool_call> | Yes | End of tool invocation |
| <tool_response> | No, injected | Start of tool result |
| </tool_response> | No, injected | End of tool result |
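Because the injected tokens are never meant to be produced by the model, a harness can mask them out before sampling. The sketch below shows the idea on a plain list of logits. The token ids and the 8-slot toy vocabulary are made up for illustration, and banning the tool-response tokens alongside the user tokens is an assumption.

```python
import math

# Hypothetical ids in a toy 8-token vocabulary. The real ids depend on
# which control-token slots are used and are not given in this document.
INJECTED_TOKEN_IDS = {
    "<user>": 4,
    "</user>": 5,
    "<tool_response>": 6,
    "</tool_response>": 7,
}


def ban_injected_tokens(logits: list[float], banned_ids) -> list[float]:
    """Return a copy of logits with banned ids set to -inf (never sampled)."""
    masked = list(logits)
    for token_id in banned_ids:
        if 0 <= token_id < len(masked):
            masked[token_id] = -math.inf
    return masked


logits = [0.5] * 8
masked = ban_injected_tokens(logits, INJECTED_TOKEN_IDS.values())
print(masked)
```

With a real model the same masking would normally be applied to the logits tensor inside the generation loop, once per step.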

Training

  • Base model: Qwen/Qwen3.5-4B-Base (4B params, hybrid linear + full attention)
  • Method: QLoRA fine-tune with custom loss masking (zero loss on user input and tool responses)
  • Data: 5,000 multi-turn conversations (OASST2, WildChat, Capybara, ToolACE) transformed into stream format, with inner monologue generated via the DeepSeek API
  • Hardware: RTX 4060 8GB, ~60s/step (batch_size=2, grad_accum=4, seq_len=2048)
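
The loss masking mentioned above can be sketched as a label-building step. This version assumes the tags themselves are masked along with their contents, and it uses -100 because that is the default `ignore_index` of PyTorch's cross-entropy loss. `build_labels` and `MASKED_SPANS` are hypothetical names, not the project's training code.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss

# Opening tag -> closing tag for spans that should contribute zero loss.
MASKED_SPANS = {"<user>": "</user>", "<tool_response>": "</tool_response>"}


def build_labels(tokens: list[str], ids: list[int]) -> list[int]:
    """Copy ids into labels, replacing injected spans with IGNORE_INDEX."""
    labels = []
    closing = None  # closing tag we are waiting for while inside a span
    for token, token_id in zip(tokens, ids):
        if closing is None and token in MASKED_SPANS:
            closing = MASKED_SPANS[token]
            labels.append(IGNORE_INDEX)
        elif closing is not None:
            labels.append(IGNORE_INDEX)
            if token == closing:
                closing = None
        else:
            labels.append(token_id)
    return labels


tokens = ["hmm", "<user>", "hi", "</user>", "<say>", "hello", "</say>"]
ids = [10, 11, 12, 13, 14, 15, 16]
print(build_labels(tokens, ids))
```

Only the monologue, speech, and tool-call tokens keep their labels, so the model is trained to produce those and merely conditions on the injected text.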

Inference

The inference harness runs a continuous generation loop with two speed modes:

  • Fast: after user input and during speech, full-speed generation
  • Idle: after speech ends, exponentially decaying speed (slows to ~1 tok/hr) until the next user message
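
One way to picture the idle mode is as a per-token delay that grows geometrically until it reaches one hour. The starting delay and growth factor below are illustrative assumptions; only the ~1 tok/hr floor comes from the description above.

```python
import math


def idle_delay(
    tokens_since_speech: int,
    base_delay: float = 0.05,   # assumed delay right after speech ends (s)
    growth: float = 1.5,        # assumed growth factor per idle token
    max_delay: float = 3600.0,  # one hour, i.e. about 1 token per hour
) -> float:
    """Seconds to wait before generating the next idle token."""
    # Number of steps after which the cap is reached; checking this first
    # avoids a float overflow in growth ** n for very large n.
    cap_steps = math.log(max_delay / base_delay) / math.log(growth)
    if tokens_since_speech >= cap_steps:
        return max_delay
    return base_delay * growth ** tokens_since_speech


for n in (0, 5, 10, 20, 30):
    print(n, round(idle_delay(n), 3))
```

A new user message would reset the counter to zero and switch the loop back to fast mode.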

The harness manages the KV cache manually, bans user-side tokens via logit processing, and uses an async architecture so that idle sleeps can be cancelled and interrupts handled cleanly.
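
A cancellable idle sleep can be built from standard asyncio primitives. The sketch below waits on an event with a timeout, so a new user message ends the sleep early. `idle_sleep` and `demo` are hypothetical names, and the actual harness may be structured differently.

```python
import asyncio


async def idle_sleep(delay: float, wake: asyncio.Event) -> bool:
    """Sleep up to `delay` seconds; return True if `wake` was set first."""
    try:
        await asyncio.wait_for(wake.wait(), timeout=delay)
        return True
    except asyncio.TimeoutError:
        return False


async def demo() -> tuple[bool, bool]:
    wake = asyncio.Event()
    # Nothing sets the event, so the full (short) delay elapses.
    timed_out = await idle_sleep(0.01, wake)
    # Simulate a user message arriving partway through a long idle sleep.
    asyncio.get_running_loop().call_later(0.01, wake.set)
    interrupted = await idle_sleep(5.0, wake)
    return timed_out, interrupted


result = asyncio.run(demo())
print(result)  # (False, True)
```

In a full loop the event would be cleared after each wake-up, and the delay would come from whatever idle schedule the harness uses.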

Limitations

  • 4B parameter model with limited reasoning capacity
  • Inner monologue quality depends on the DeepSeek-generated training data
  • Tool use is limited to built-in tools (math_eval, current_time)
  • English only

License

GPLv3
