Joyce

A continuous stream-of-consciousness language model. Instead of turn-based chat, Joyce thinks continuously as an inner monologue. User input is injected into the stream. Speech is a deliberate act the model performs within the flow of thought.

How it works

```text
I am Joyce. I think before I speak, and I always have something on my mind.
<user>How do I reverse a linked list?</user>
Classic data structures question. Should cover iterative approach first...
<say>You can reverse a linked list iteratively using three pointers...</say>
wonder if they'll ask about the recursive version
<user>What about recursively?</user>
They want recursion too. Need to show the base case clearly...
<say>The recursive approach works by reversing the rest first, then fixing pointers.</say>
```

Everything outside `<say>` tags, apart from the injected `<user>` blocks, is the model's inner monologue and is hidden from the user. The model itself decides when to speak and what to say.
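To make the split concrete, here is a minimal post-hoc sketch that separates what the user sees from the hidden monologue in a finished transcript. It assumes the stream is available as plain text with the tags written literally; `split_stream` is a hypothetical helper, not taken from the `joyce` package, and the real harness works token by token while the stream is being generated.

```python
import re

# One complete <say>...</say> span; DOTALL lets speech cross newlines.
SAY_RE = re.compile(r"<say>(.*?)</say>", re.DOTALL)
# Injected <user>...</user> blocks, which are neither speech nor monologue.
USER_RE = re.compile(r"<user>.*?</user>", re.DOTALL)


def split_stream(stream: str) -> tuple[list[str], str]:
    """Return (utterances shown to the user, hidden inner monologue)."""
    speech = [s.strip() for s in SAY_RE.findall(stream)]
    # Remove speech and injected user input; whatever remains is monologue.
    hidden = USER_RE.sub("", SAY_RE.sub("", stream))
    # Drop the blank lines left behind by the removed spans.
    monologue = "\n".join(line.strip() for line in hidden.splitlines() if line.strip())
    return speech, monologue
```

For example, `split_stream("Think.\n<say>Hi there.</say>\nmore thought\n<user>Q?</user>\n<say>A.</say>")` returns `(["Hi there.", "A."], "Think.\nmore thought")`.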

Usage

```shell
git clone https://github.com/matchcase/joyce.git
cd joyce
uv sync
uv run python scripts/tui.py --model matchcase/joyce --4bit
```

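Or load it directly from Python. `start()` is awaited, so it has to run inside an event loop:

```python
import asyncio
from joyce.inference import StreamHarness

async def main():
    harness = StreamHarness(model_path="matchcase/joyce", load_in_4bit=True)
    await harness.start("What is consciousness?")  # start() is a coroutine

asyncio.run(main())  # top-level await only works in notebooks/async REPLs
```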

Special tokens

Eight special tokens are mapped onto unused control-token slots in the Qwen3.5 vocabulary, so the embedding matrix is not resized:

| Token | Generated by model? | Purpose |
|---|---|---|
| `<user>` | No (injected) | Start of user input |
| `</user>` | No (injected) | End of user input |
| `<say>` | Yes | Model begins speaking to the user |
| `</say>` | Yes | Model stops speaking |
| `<tool_call>` | Yes | Model invokes a tool |
| `</tool_call>` | Yes | End of tool invocation |
| `<tool_response>` | No (injected) | Start of tool result |
| `</tool_response>` | No (injected) | End of tool result |
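The "Generated by model?" column implies a ban list: the four tokens the model does not generate are written only by the harness, which the Inference section says it enforces through logit processing. Below is a minimal sketch of that idea, assuming logits arrive as a plain Python list and using hypothetical token ids (the real vocabulary positions are not given here); `SPECIAL_TOKENS`, `BANNED`, and `mask_logits` are illustrative names.

```python
import math

# Mapping from the table: True if the model is allowed to generate the token.
SPECIAL_TOKENS = {
    "<user>": False,
    "</user>": False,
    "<say>": True,
    "</say>": True,
    "<tool_call>": True,
    "</tool_call>": True,
    "<tool_response>": False,
    "</tool_response>": False,
}

# Injected-only tokens: the harness writes these, the model never should.
BANNED = sorted(tok for tok, generated in SPECIAL_TOKENS.items() if not generated)


def mask_logits(logits: list[float], banned_ids: list[int]) -> list[float]:
    """Return a copy of logits with every banned token id set to -inf."""
    masked = list(logits)
    for token_id in banned_ids:
        masked[token_id] = -math.inf
    return masked
```

With hypothetical ids, `mask_logits([0.1, 0.2, 0.3, 0.4], [1, 3])` returns `[0.1, -inf, 0.3, -inf]`, so after softmax the banned tokens have probability zero and can never be sampled.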

Training

Base model: Qwen/Qwen3.5-4B-Base (4B params, hybrid linear + full attention)
Method: QLoRA fine-tune with custom loss masking (zero loss on user input and tool responses)
Data: 5,000 multi-turn conversations from OASST2, WildChat, Capybara, and ToolACE, converted to the stream format, with the inner monologue generated via the DeepSeek API
Hardware: RTX 4060 8GB, ~60s/step (batch_size=2, grad_accum=4, seq_len=2048)
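The loss-masking rule can be illustrated with a minimal sketch that builds training labels from a tokenized stream. It assumes tokens are available as strings alongside their ids, that the injected delimiters are masked together with their contents, and it uses -100, the default `ignore_index` of PyTorch's cross-entropy loss. `INJECTED_SPANS` and `build_labels` are hypothetical names, not the project's training code.

```python
# Delimiter pairs whose contents (and the delimiters themselves) are injected.
INJECTED_SPANS = {"<user>": "</user>", "<tool_response>": "</tool_response>"}
IGNORE_INDEX = -100  # PyTorch cross_entropy skips targets equal to this value


def build_labels(tokens: list[str], token_ids: list[int]) -> list[int]:
    """Copy token_ids into labels, replacing every injected span with IGNORE_INDEX."""
    labels = list(token_ids)
    closing = None  # closing delimiter we are waiting for, if inside a span
    for i, tok in enumerate(tokens):
        if closing is None and tok in INJECTED_SPANS:
            closing = INJECTED_SPANS[tok]
        if closing is not None:
            labels[i] = IGNORE_INDEX
            if tok == closing:
                closing = None  # span ended; later tokens are trained on again
    return labels
```

For `["think", "<user>", "hi", "</user>", "<say>", "ok", "</say>"]` with ids `[1, 2, 3, 4, 5, 6, 7]`, the labels come out as `[1, -100, -100, -100, 5, 6, 7]`: the model learns its monologue and speech but is never trained to produce user text.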

Inference

The inference harness runs a continuous generation loop in one of two speed modes:

Fast: full-speed generation after user input arrives and while the model is speaking
Idle: once speech ends, generation slows down exponentially (to about 1 token per hour) until the next user message
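The idle mode can be sketched as a per-token delay that grows geometrically, which makes the token rate decay exponentially, capped at one token per hour. `IDLE_START`, `GROWTH`, and `token_delay` are hypothetical; the harness's actual constants and decay curve are not documented here.

```python
import math

FAST_DELAY = 0.0     # fast mode: no throttling between tokens
IDLE_START = 0.05    # hypothetical delay (s) before the first idle token
GROWTH = 1.5         # hypothetical factor by which each idle delay grows
MAX_DELAY = 3600.0   # cap at ~1 token per hour, as described above


def token_delay(idle: bool, idle_tokens: int) -> float:
    """Seconds to wait before the next token; idle_tokens counts tokens since speech ended."""
    if not idle:
        return FAST_DELAY
    # delay = IDLE_START * GROWTH**idle_tokens, computed in log space so that
    # very long idle stretches cannot overflow a float before hitting the cap.
    log_delay = math.log(IDLE_START) + idle_tokens * math.log(GROWTH)
    if log_delay >= math.log(MAX_DELAY):
        return MAX_DELAY
    return math.exp(log_delay)
```

With these made-up constants the delay reaches the one-hour cap at 28 idle tokens; a user message would reset the harness to fast mode and the counter to zero.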

The harness manages the KV cache manually, bans user-side tokens through logit processing, and uses an async architecture so that idle sleeps can be cancelled and interrupts are handled cleanly.
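One way to get a cancellable idle sleep with `asyncio` is to wait on an `asyncio.Event` with a timeout: the sleep ends either when the delay elapses or as soon as new user input sets the event. This is a sketch under that assumption; the harness may instead cancel the sleeping task directly, and `idle_sleep` is a hypothetical name.

```python
import asyncio


async def idle_sleep(delay: float, wake: asyncio.Event) -> bool:
    """Wait up to `delay` seconds; return True if `wake` was set first."""
    try:
        await asyncio.wait_for(wake.wait(), timeout=delay)
    except asyncio.TimeoutError:
        return False  # full delay elapsed: generate the next idle token
    wake.clear()  # consume the wake-up so the next idle sleep waits again
    return True  # woken early: switch back to fast mode


async def demo() -> tuple[bool, bool]:
    wake = asyncio.Event()
    # Nobody sets the event, so this short sleep runs to its timeout.
    first = await idle_sleep(0.05, wake)
    # Simulate a user message arriving 0.05 s into a one-hour idle sleep.
    asyncio.get_running_loop().call_later(0.05, wake.set)
    second = await idle_sleep(3600, wake)
    return first, second


print(asyncio.run(demo()))  # (False, True)
```

Waiting on an event rather than calling `asyncio.sleep` means a message that arrives during an hour-long idle gap is handled immediately instead of after the gap.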

Limitations

4B-parameter model with limited reasoning capacity
Inner monologue quality depends on the DeepSeek-generated training data
Tool use is limited to built-in tools (math_eval, current_time)
English only

License

GPLv3
