Abstract
We present a two-layer architecture for studying emergent collaborative behavior in multi-agent LLM systems. The lower layer is a deterministic rule substrate governing agent needs, pairwise bond dynamics, action selection, and a hard cost ceiling. The upper layer is an LLM-powered dialogue and decision module (GPT-4o-mini / Claude Haiku) that produces natural speech but does not drive the emergent social or task statistics.
Reproducible claims in this paper come from the lower layer, which ships as experiments/run.py and runs in ~3 seconds with stdlib Python.
Keywords: Multi-Agent LLMs, Emergent Behavior, GPT-4o-mini, Claude Haiku, Autonomous Agents, Reproducible Rigs
Motivation
Recent work (Park et al., 2023; Wang et al., 2024) shows that LLM-powered agents can develop emergent collaborative patterns when given personalities, contextual awareness, and freedom to act autonomously. The open question is not whether such patterns emerge, but which are artifacts of the specific LLM and which are properties of the agent architecture itself. If every number in a paper depends on which model version was called, results are unverifiable.
Research Questions
- Under a shared rule substrate, does a random initial role mix produce reproducible strong-friendship and tension counts?
- Does self-organized construction emerge without explicit task assignment — and how many unique workers does a single site attract?
- Can the system hold all agents' needs above breach thresholds for the entire session?
- What does the work-vs-social action distribution look like under a need-biased role-weighted policy?
System Architecture
Each of the 10 colonists is an autonomous agent with:
- Role + personality — one of commander / scientist / builder / engineer / miner / medic; traits are used by the LLM layer but not by the rule substrate.
- Needs — energy, social, purpose, each in [0, 100], with per-tick decay rates of 0.020, 0.015, and 0.010 respectively.
- Bonds — a symmetric pair score. Bond gain is scaled by a per-pair affinity multiplier derived from role compatibility plus random jitter; pairs with affinity above the pivot of 0.9 gain bond, pairs below it lose bond.
- Action — one of working / socializing / resting / building / walking, chosen by a need-biased role-weighted selector.
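The rule substrate above can be sketched in a few lines. This is an illustrative reconstruction, not the actual experiments/run.py: the class and function names, the role weights, and the exact bias formula are assumptions; only the decay rates, the action set, and the 0.9 affinity pivot come from the text.

```python
import random

DECAY = {"energy": 0.020, "social": 0.015, "purpose": 0.010}
AFFINITY_PIVOT = 0.9
ACTIONS = ["working", "socializing", "resting", "building", "walking"]

class Agent:
    def __init__(self, role, rng):
        self.role = role
        self.needs = {"energy": 100.0, "social": 100.0, "purpose": 100.0}
        # Illustrative uniform role weights; the real selector's
        # per-role weights are not specified in the paper.
        self.role_weights = {a: 1.0 for a in ACTIONS}
        self.rng = rng

    def tick(self):
        # 1. Deterministic need decay (rates from the paper).
        for need, rate in DECAY.items():
            self.needs[need] = max(0.0, self.needs[need] - rate)
        # 2. Need-biased, role-weighted selection: the lower a need,
        #    the more weight on the action that restores it (assumed mapping).
        bias = {
            "resting":     1.0 + (100 - self.needs["energy"]) / 100,
            "socializing": 1.0 + (100 - self.needs["social"]) / 100,
            "working":     1.0 + (100 - self.needs["purpose"]) / 100,
            "building":    1.0,
            "walking":     1.0,
        }
        weights = [bias[a] * self.role_weights[a] for a in ACTIONS]
        return self.rng.choices(ACTIONS, weights=weights, k=1)[0]

def update_bond(bond, affinity, gain=0.5):
    """Pairs above the affinity pivot gain bond; pairs below it lose bond."""
    return bond + gain * (affinity - AFFINITY_PIVOT)

rng = random.Random(42)
agent = Agent("builder", rng)
action = agent.tick()            # one of the five actions
higher = update_bond(10.0, 1.1)  # affinity above pivot: bond rises
lower = update_bond(10.0, 0.5)   # affinity below pivot: bond falls
```

Because the substrate uses only a seeded `random.Random` and fixed constants, a run is fully determined by its seed, which is what makes the per-seed statistics reproducible.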
Experimental Setup
- Platform: Apple M2 Pro, 32 GB RAM, Python 3.13, stdlib only
- Configuration: 10 agents × 10,000 ticks (≈30 simulated days) × 10 seeds
- Wall-clock: ≈3 seconds per full run
- No network calls, no NumPy, no PyTorch
Key Findings
Social dynamics (mean ± std across 10 seeds)
| Category | Value per session |
|---|---|
| Strong friendships (bond > 50) | 3.5 ± 2.8 |
| Working relationships (15 < bond ≤ 50) | 14.6 ± 2.3 |
| Neutral relationships | 23.3 ± 2.7 |
| Tensions (bond < −15) | 3.6 ± 1.6 |
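The four categories follow fixed thresholds on the pairwise bond score. A minimal sketch of the classification, assuming the thresholds shown in the table (the function name is illustrative, not taken from experiments/run.py):

```python
def classify_bond(bond: float) -> str:
    """Map a symmetric pair bond score to its reporting category."""
    if bond > 50:
        return "strong friendship"
    if bond > 15:
        return "working relationship"
    if bond < -15:
        return "tension"
    return "neutral"  # covers -15 <= bond <= 15
```

As a consistency check: 10 agents yield 45 unordered pairs per session, and the four category means above (3.5 + 14.6 + 23.3 + 3.6) sum to exactly 45.0.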
Task coordination
| Metric | Value |
|---|---|
| Construction sites completed | 3 / 3 every session |
| Unique workers per site (mean) | 7.3 |
| Stable-needs fraction | 100% |
Action distribution
| Action | Share of agent-ticks |
|---|---|
| Working | 49.2% |
| Socializing | 20.0% |
| Resting | 16.7% |
| Walking | 10.7% |
| Building | 3.5% |
Cost envelope
Dialogue events are rate-limited to 100 per session. At a GPT-4o-mini average cost of $7 × 10⁻⁵ per call, the analytic ceiling is $0.007 per session — regardless of prompt content.
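The ceiling is simple arithmetic over the two numbers stated above, which is why it holds regardless of prompt content:

```python
# Analytic per-session cost ceiling: a hard rate limit of 100 dialogue
# events times the stated average per-call cost of $7e-5 (GPT-4o-mini).
CALLS_PER_SESSION = 100
COST_PER_CALL = 7e-5  # USD

ceiling = CALLS_PER_SESSION * COST_PER_CALL
print(f"${ceiling:.3f} per session")
```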
Reproduce
```bash
cd genass/publications/quarto/mars_colony_collaboration
uv run python experiments/run.py
```

Outputs land in experiments/results/ (flat JSON); the paper pulls its numbers from data/simulation_results.json.
Live Demo
The full LLM-enabled system runs in the browser at /future/gaming — watch emergent collaboration play out in 3D.