Partitioned Graph Synthesis
A coverage-optimized query architecture for large knowledge graphs. PGS overcomes the structural coverage ceiling of single-pass retrieval by decomposing graphs into topologically coherent partitions, sweeping each at full fidelity, and synthesizing across all outputs for genuine cross-domain reasoning.
The Problem PGS Solves
Standard RAG retrieves the top-K nodes by embedding similarity and fits them into a single LLM context window. For knowledge graphs with hundreds or thousands of nodes, this has a hard structural limit: nodes semantically distant from the query embedding are never examined, even when they contain directly relevant findings. The system silently omits this content and gives no indication that anything was missed.
PGS addresses three failure modes simultaneously: local scoring misses topologically distant content (embedding similarity misses structurally relevant nodes), context window exhaustion (graphs over ~500 nodes cannot fit in a single pass), and no mechanism to report gaps (standard RAG returns answers derived only from retrieved content, with no channel for what was not found).
The architecture is a four-phase pipeline: Partition the graph using community detection, Route the query to relevant partitions using centroid similarity, Sweep each partition independently with a full-fidelity LLM pass, and Synthesize across all sweep outputs in a final cross-domain reasoning step.
PGS is not a new algorithm. It combines established techniques -- Louvain community detection, cosine similarity routing, parallel LLM inference -- in an architecture oriented toward coverage completeness and explicit gap reporting. The novelty is the structured information contract between sweep and synthesis phases, which transforms disconnected per-partition outputs into material for genuine cross-domain reasoning.
The Four-Phase Pipeline
Phase 0: Partition
The graph is partitioned using the Louvain community detection algorithm -- pure JavaScript, no external dependencies. Cache validity is determined by a four-field brain hash (nodeCount:edgeCount:cycleCount:timestamp). Any change to the graph invalidates the cache and triggers re-partitioning.
The modularity gain formula for moving a node from one community to another:
// Modularity gain for moving node i to community d
gain = wTarget - (ki * sigmaTarget) / m2
netGain = gain - removeGain
// Where:
// wTarget = sum of edge weights between node i and community d
// ki = total strength of node i
// sigmaTarget = total strength of community d
// m2 = 2 * total graph weight
// A node moves only when netGain > 1e-10 (numerical stability)

The algorithm runs up to 20 iterations with randomized node processing order for better convergence, terminating early when a full pass produces no moves.
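The local-move pass can be sketched in plain JavaScript. This is an illustrative reduction, not the package's implementation: community ids are node-name strings, the randomized processing order is omitted, and `strength` and `sigma` are maps maintained by the caller.

```javascript
// Sketch of one Louvain local-move pass.
// graph: { nodes: [...ids], adj: { id: { neighborId: weight } } }
// community: node -> community id; strength: node -> total node strength;
// sigma: community -> total strength of its members; m2: 2 * total graph weight.
function localMovePass(graph, community, strength, sigma, m2) {
  let moved = 0;
  for (const node of graph.nodes) { // (the package randomizes this order)
    const current = community[node];
    // Edge weight from `node` into each neighboring community.
    const linksTo = {};
    for (const [nbr, w] of Object.entries(graph.adj[node] || {})) {
      if (nbr === node) continue;
      const c = community[nbr];
      linksTo[c] = (linksTo[c] || 0) + w;
    }
    // Gain of removing the node from its current community.
    const ki = strength[node];
    sigma[current] -= ki;
    const removeGain = (linksTo[current] || 0) - (ki * sigma[current]) / m2;
    // Pick the neighboring community with the best net modularity gain.
    let best = current;
    let bestNet = 0;
    for (const [c, wTarget] of Object.entries(linksTo)) {
      const net = wTarget - (ki * sigma[c]) / m2 - removeGain;
      if (net > bestNet + 1e-10) { best = c; bestNet = net; }
    }
    sigma[best] += ki;
    if (best !== current) { community[node] = best; moved++; }
  }
  return moved; // zero moves => this pass converged
}
```

On a toy graph of two triangles joined by a weak bridge, repeated passes settle into two communities, matching the termination rule described above.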
Post-Processing
After Louvain convergence, two adjustments are applied. Merge small communities: communities with fewer than 30 nodes (configurable) are merged into their most-connected neighbor by total edge weight, iteratively until no undersized communities remain. Split large communities: communities exceeding 1,800 nodes (configurable) are bisected using a greedy algorithm -- seed two groups, assign each remaining node to the group it shares more edge weight with, with a balance penalty to prevent degenerate splits. This is simple bisection, not hierarchical Louvain.
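The small-community merge can be sketched as follows. This is an illustrative sketch, not the package's implementation; it assumes communities as a map of id to a Set of node ids and the same adjacency shape as above.

```javascript
// Sketch: iteratively merge undersized communities into their
// most-connected neighbor by total cross-community edge weight.
function mergeSmall(communities, adj, minSize) {
  let changed = true;
  while (changed) {
    changed = false;
    for (const [cid, members] of Object.entries(communities)) {
      if (members.size >= minSize || Object.keys(communities).length === 1) continue;
      // Map each node to its owning community.
      const owner = {};
      for (const [c, ms] of Object.entries(communities))
        for (const n of ms) owner[n] = c;
      // Tally edge weight from this community to every other community.
      const linkWeight = {};
      for (const n of members)
        for (const [nbr, w] of Object.entries(adj[n] || {})) {
          const c = owner[nbr];
          if (c !== undefined && c !== cid) linkWeight[c] = (linkWeight[c] || 0) + w;
        }
      const target = Object.keys(linkWeight)
        .sort((x, y) => linkWeight[y] - linkWeight[x])[0];
      if (!target) continue; // isolated community: nothing to merge into
      for (const n of members) communities[target].add(n);
      delete communities[cid];
      changed = true;
      break; // restart after mutating the community map
    }
  }
  return communities;
}
```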
Partition Enrichment
Each partition is enriched with metadata before caching:
- Centroid embedding -- element-wise mean of all 512-dimensional node embeddings within the partition. Nodes without embeddings are excluded.
- Keywords -- term-frequency scoring across node concept texts, counted as document frequency (at most one count per node). Stop words filtered. Top 20 terms stored.
- Adjacent partitions -- cross-partition edges enumerated, top 5 neighbor partitions by shared edge count. Used for "peripheral vision" in sweep prompts.
- Quick summary -- no-LLM heuristic: top keywords plus the highest-weight node text (first 120 characters).
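The centroid computation reduces to an element-wise mean over the nodes that have embeddings. A minimal sketch, assuming node objects with an optional `embedding` array:

```javascript
// Sketch: element-wise mean centroid over node embeddings,
// skipping nodes without an embedding.
function partitionCentroid(nodes, dims = 512) {
  const withEmb = nodes.filter((n) => Array.isArray(n.embedding));
  if (withEmb.length === 0) return null; // no embeddable nodes in partition
  const centroid = new Array(dims).fill(0);
  for (const n of withEmb)
    for (let i = 0; i < dims; i++) centroid[i] += n.embedding[i];
  return centroid.map((v) => v / withEmb.length);
}
```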
Phase 1: Route
Given a query, PGS computes its embedding and scores each partition by cosine similarity to its centroid. Partitions above the relevance threshold (default 0.25) are selected, capped at maxSweepPartitions (default 15).
Broad query bypass: Six regex patterns detect comprehensive or open-ended queries (e.g., "what are the gaps," "comprehensive overview," "full sweep"). When matched, routing is skipped and all partitions are selected up to the max cap. This prevents unfocused queries from routing to zero partitions.
Without embedding provider: When no embedding provider is configured, routing cannot compute cosine similarity. PGS degrades gracefully by sweeping all partitions up to the configured maximum.
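The routing rule, including the no-provider fallback, can be sketched as follows. Illustrative shapes only; `partitions` carry the centroid produced during enrichment.

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
  }
  const denom = Math.sqrt(na) * Math.sqrt(nb);
  return denom === 0 ? 0 : dot / denom;
}

// Sketch: score partitions against the query embedding, keep those above
// the threshold, cap the count; with no embedding, sweep all up to the cap.
function routePartitions(queryEmbedding, partitions, threshold = 0.25, maxSweep = 15) {
  if (!queryEmbedding)
    return partitions.slice(0, maxSweep).map((p) => ({ partition: p, similarity: null }));
  return partitions
    .map((p) => ({ partition: p, similarity: cosine(queryEmbedding, p.centroid) }))
    .filter((r) => r.similarity >= threshold)
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, maxSweep);
}
```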
Phase 2: Sweep
Selected partitions are swept in batches of 5 concurrent LLM calls (configurable). Each batch uses Promise.allSettled for fault tolerance -- a failed sweep does not block or abort others in the batch.
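The batching rule maps directly onto Promise.allSettled. A minimal sketch, where `sweepOne` stands in for the per-partition sweep call:

```javascript
// Sketch: sweep partitions in batches of `concurrency` concurrent calls.
// Promise.allSettled means one failed sweep never aborts its batch;
// rejected sweeps are dropped and synthesis proceeds on the remainder.
async function sweepAll(partitions, sweepOne, concurrency = 5) {
  const results = [];
  for (let i = 0; i < partitions.length; i += concurrency) {
    const batch = partitions.slice(i, i + concurrency);
    const settled = await Promise.allSettled(batch.map((p) => sweepOne(p)));
    for (const s of settled)
      if (s.status === 'fulfilled') results.push(s.value);
  }
  return results;
}
```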
Full-fidelity context: Each sweep receives all nodes in its partition, sorted descending by weight, at full text -- no summarization, no compression, no tiering. A 500,000-character safety cap prevents context overflow for pathologically large partitions, but in practice no real brain has reached this limit.
Adjacent partition context: Summaries and keywords of up to 5 adjacent partitions are appended to the sweep prompt, giving each sweep awareness of neighboring domains without accessing those nodes directly.
Structured response contract: The sweep prompt specifies an exact four-section output structure:
Domain State
2-3 sentences on what this partition covers and its research state relative to the query.
Findings
Key discoveries with Node ID citations for traceability.
Outbound Flags
Specific, characterized connections to other partitions using adjacent partition context.
Absences
What was searched for and NOT found in this partition.
The Absences section is the critical innovation. Rather than silently omitting missing content, each sweep explicitly reports what it searched for and did not find. This creates a machine-readable gap signal that the synthesis phase aggregates.
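A consumer of this contract can recover the four sections with a simple line scan. The parser below is a hypothetical illustration, not the package's actual handling of sweep output; it assumes section names appear as standalone lines.

```javascript
// Sketch: split a sweep response into the four contract sections.
function parseSweepSections(text) {
  const names = ['Domain State', 'Findings', 'Outbound Flags', 'Absences'];
  const sections = {};
  let current = null;
  for (const line of text.split('\n')) {
    const heading = line.trim().replace(/^#+\s*/, ''); // tolerate markdown headings
    if (names.includes(heading)) {
      current = heading;
      sections[current] = [];
    } else if (current) {
      sections[current].push(line);
    }
  }
  for (const k of Object.keys(sections)) sections[k] = sections[k].join('\n').trim();
  return sections;
}
```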
Phase 3: Synthesize
A single LLM call receives all sweep outputs concatenated with partition metadata headers. The synthesis prompt specifies four explicit tasks:
- Cross-Domain Connection Discovery -- chase the outbound flags from the sweep phase. When Partition A flags a connection to Partition B's domain, determine whether the connection is genuine by consulting Partition B's sweep output.
- Absence Detection -- when multiple partitions report absence of the same content, that is high-confidence evidence of a real gap in the knowledge graph. When one partition flags an outbound connection but the target partition reports absence, that identifies a specific research opportunity.
- Convergence Identification -- findings appearing independently in multiple partitions are treated as stronger evidence than single-partition findings. Independent observation of the same phenomenon increases confidence.
- Thesis Formation -- the synthesizer is instructed not to survey findings but to make claims, commit to positions, and rank insights. The goal is a thesis, not a literature review.
Synthesis uses reasoningEffort: 'high' and streams output to the client as it generates.
Session Management
PGS maintains persistent session state, tracking which partitions have been examined across queries. This enables incremental coverage -- a user can progressively exhaust a knowledge graph rather than re-querying from scratch each time.
Three session modes control which partitions are swept:
- full (default) -- route the current query against all partitions, sweep up to the configured maximum. Session history is recorded but does not affect partition selection.
- continue -- skip partitions already searched. Sweep only the unsearched remainder. Falls back to full sweep if all partitions have been covered. This is the primary mode for incremental deep exploration.
- targeted -- re-route the query against only the unsearched partitions. Combines session continuity with relevance routing -- useful when a follow-up query is more focused and should not re-sweep irrelevant partitions from the previous run.
Session tracking accumulates across calls. If a full sweep covers 5 of 11 partitions, a subsequent continue will sweep the remaining 6. After both runs, all 11 partitions have been examined and the session is exhausted.
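The continue-mode rule reduces to a set difference with a full-sweep fallback. A minimal sketch:

```javascript
// Sketch: continue mode sweeps only partitions not yet searched in this
// session; when everything is covered, fall back to a full sweep.
function selectForContinue(allPartitionIds, searchedIds) {
  const searched = new Set(searchedIds);
  const remaining = allPartitionIds.filter((id) => !searched.has(id));
  return remaining.length > 0 ? remaining : allPartitionIds;
}
```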
Configuration Reference
All parameters have defaults. Environment variables override defaults; per-call pgsConfig overrides both.
| Parameter | Default | Description |
|---|---|---|
| maxConcurrentSweeps | 5 | Parallel LLM calls per sweep batch |
| minCommunitySize | 30 | Merge communities smaller than this |
| targetPartitionMax | 1800 | Split communities larger than this via bisection |
| maxSweepPartitions | 15 | Maximum partitions to sweep per query |
| minSweepPartitions | 0 | Minimum partitions to sweep (0 = relevance only) |
| partitionRelevanceThreshold | 0.25 | Cosine similarity threshold for routing |
| sweepMaxTokens | 6000 | Token budget per sweep LLM call |
| synthesisMaxTokens | 16000 | Token budget for synthesis LLM call |
| sweepFraction | null | Fraction of routed partitions to sweep (0.1-1.0); overrides maxSweepPartitions when set |
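The precedence rule (defaults, then environment variables, then per-call pgsConfig) can be sketched as follows; `PGS_MAX_SWEEP_PARTITIONS` is a hypothetical variable name used only for illustration.

```javascript
// Sketch of config resolution: defaults < environment < per-call pgsConfig.
const DEFAULTS = { maxSweepPartitions: 15, partitionRelevanceThreshold: 0.25 };

function resolveConfig(env, pgsConfig = {}) {
  const fromEnv = {};
  if (env.PGS_MAX_SWEEP_PARTITIONS !== undefined)
    fromEnv.maxSweepPartitions = Number(env.PGS_MAX_SWEEP_PARTITIONS);
  // Later spreads win, giving per-call config the final say.
  return { ...DEFAULTS, ...fromEnv, ...pgsConfig };
}
```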
API Reference
Constructor
const { PGSEngine } = require('pgs-engine');
const engine = new PGSEngine({
sweepProvider, // LLM for sweep (called N times in parallel)
synthesisProvider, // LLM for synthesis (called once)
embeddingProvider, // For query routing (optional)
storage, // Session/cache persistence (optional)
config, // Override defaults (optional)
onEvent, // Global event listener (optional)
});

execute(query, graph, options)
Run the full PGS pipeline: partition, route, sweep, synthesize.
const result = await engine.execute(query, graph, {
mode: 'full', // 'full' | 'continue' | 'targeted'
sessionId: 'default', // Session tracking identifier
fullSweep: false, // Bypass routing, sweep everything
sweepFraction: 0.5, // Sweep 50% of routed partitions
onChunk: (text) => {}, // Synthesis token streaming
});
// result.answer — The synthesis output
// result.metadata.pgs — { totalNodes, totalPartitions, sweptPartitions, elapsed, ... }

layeredSearch(query, graph, options)
The gold pattern: run a standard top-K query first, then feed that context into a PGS pass. The standard query tells PGS what is already known; PGS finds what is not.
const result = await engine.layeredSearch(query, graph, {
topK: 20, // Standard query retrieves top-K nodes first
standardProvider, // LLM for standard query
mode: 'full',
});
// result.answer — PGS synthesis (enhanced with standard context)
// result.standardAnswer — What standard retrieval found

partition(graph)
Run Louvain community detection and return partition structure. No LLM calls.
const partitions = engine.partition(graph);
// Array of { id, nodeIds, centroid, keywords, adjacentPartitions, summary }

route(query, graph, partitions)
Score partitions by cosine similarity to the query embedding. Returns partitions above the relevance threshold.
const routed = await engine.route(query, graph, partitions);
// Array of { partition, similarity } sorted by relevance

sweepPartition(query, partition, graph, partitions)
Run a single full-fidelity sweep on one partition. Returns the structured four-section output.
const sweepResult = await engine.sweepPartition(query, routed[0], graph, partitions);
// { content, partitionId, nodeCount, elapsed }

synthesize(query, sweepResults, context)
Single cross-domain reasoning pass over all sweep outputs.
const answer = await engine.synthesize(query, sweepResults, {
onChunk: (text) => process.stdout.write(text),
});
// Returns the synthesis string

Provider Interfaces
PGS is provider-agnostic. Implement three simple interfaces to use any LLM or embedding service.
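As one illustration of the LLMProvider shape, a provider over an OpenAI-compatible chat-completions endpoint might look like this. This is a sketch using Node's global fetch; `makeChatProvider` is not a helper shipped with the package.

```javascript
// Sketch: LLMProvider backed by an OpenAI-compatible /chat/completions
// endpoint. Streaming and reasoningEffort are omitted for brevity.
function makeChatProvider(baseUrl, apiKey, model) {
  return {
    generate: async ({ instructions, input, maxTokens }) => {
      const res = await fetch(`${baseUrl}/chat/completions`, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          Authorization: `Bearer ${apiKey}`,
        },
        body: JSON.stringify({
          model,
          max_tokens: maxTokens,
          messages: [
            { role: 'system', content: instructions },
            { role: 'user', content: input },
          ],
        }),
      });
      if (!res.ok) throw new Error(`LLM call failed: ${res.status}`);
      const data = await res.json();
      return { content: data.choices[0].message.content };
    },
  };
}
```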
LLMProvider (sweep and synthesis)
{
generate: async ({
instructions, // System prompt
input, // User content (graph nodes, query)
maxTokens, // Response token budget
reasoningEffort, // 'low' | 'medium' | 'high' (advisory)
onChunk, // Streaming callback (optional)
}) => ({ content: string })
}

EmbeddingProvider (optional)
{
embed: async (text) => number[] | null, // Returns embedding vector
dimensions: number // Output dimensions (e.g., 512)
}

StorageProvider (optional)
{
read: async (key) => string | null, // Read JSON string by key
write: async (key, data) => void // Write JSON string
}

Provider Examples
// OpenAI
const engine = new PGSEngine({
sweepProvider: makeOpenAIProvider('gpt-4o-mini'),
synthesisProvider: makeOpenAIProvider('gpt-4o'),
embeddingProvider: makeOpenAIEmbeddingProvider(),
});
// Anthropic
const engine = new PGSEngine({
sweepProvider: makeAnthropicProvider('claude-sonnet-4-20250514'),
synthesisProvider: makeAnthropicProvider('claude-opus-4-20250514'),
});
// xAI
const engine = new PGSEngine({
sweepProvider: makeOpenAICompatibleProvider('https://api.x.ai/v1', key, 'grok-3-mini'),
synthesisProvider: makeOpenAICompatibleProvider('https://api.x.ai/v1', key, 'grok-3'),
});

Comparison to Existing Systems
| Feature | Standard RAG | Microsoft GraphRAG | PGS |
|---|---|---|---|
| Retrieval method | Top-K by embedding | Leiden communities + pre-generated reports | Louvain + full-fidelity sweep per query |
| Coverage awareness | None | Partial (community reports) | Full (session tracking with %) |
| Absence reporting | No | No | Yes (structured per partition) |
| Cross-domain discovery | No | Via community summaries | Yes (outbound flags + synthesis chasing) |
| Convergence detection | No | No | Yes (independent findings across partitions) |
| Session continuity | No | No | Yes (continue/targeted modes) |
| Build mode | Batch (offline index) | Batch (offline index) | Online (partition at query time) |
| Community algorithm | N/A | Leiden (hierarchical) | Louvain (single-level + bisection) |
| Entity extraction | N/A | Explicit NER pipeline | Pre-existing (graph built by research agents) |
GraphRAG is designed as a batch pipeline: extract entities from a text corpus, build a graph, run Leiden, generate community reports, store. Queries run against stored community reports, which reflect the graph at index time and can become stale. PGS is a live query architecture: partitions are cached but sweep outputs are generated fresh per query against the current graph state.
PGS vs. HNSW Vector Stores
| Dimension | HNSW Stores (Pinecone, Weaviate) | PGS |
|---|---|---|
| Index type | Approximate nearest-neighbor vector index | Graph-topology-aware community partitioning |
| Retrieval scope | K nearest vectors in embedding space | Full partition content (all nodes, full text) |
| Structural relationships | None (vector proximity only) | 13 semantic edge types |
| Gap reporting | No | Structured absences |
| Coverage tracking | No | Session-based, incremental |
| Query latency | Milliseconds | Seconds to minutes (N sweep LLM calls) |
HNSW stores are read-optimized indexes designed for sub-millisecond retrieval of the K nearest embeddings. PGS trades latency for coverage: it examines entire topological communities at full fidelity rather than individual nearest neighbors. For small graphs or latency-critical applications, HNSW is strictly preferable. For research synthesis, gap analysis, and cross-domain discovery on graphs with 200+ nodes, PGS surfaces what vector similarity alone cannot.
Known Limitations
PGS has real constraints. These are documented honestly because understanding them is essential for using the system effectively.
- Singleton partition problem. Isolated nodes (zero edges or edges only to the dominant cluster) become their own partitions. Each singleton requires a separate sweep LLM call. For targeted queries, routing assigns low scores to most singletons, but broad queries or full sweeps generate many LLM calls that return only "no findings."
- Supercluster tendency. Research graphs are often dominated by one large community. When a single partition contains 98% of the nodes, a focused query degenerates to sweeping that one partition -- equivalent to one large standard RAG call. PGS is most beneficial for graphs with meaningful community structure (multiple research domains, merged runs, or deliberately broad topics).
- No hierarchical Louvain. The implementation runs only Phase 1 of Louvain (node moves). Phase 2 (graph contraction and recursion) is not implemented. For large dense graphs, Phase 1 alone may produce suboptimal community assignments. The bisection fallback is a workaround, not a substitute.
- Unweighted centroids. The centroid embedding is an unweighted mean. High-importance nodes influence the centroid equally to low-importance nodes. A weighted centroid would produce a more representative routing target, but this is not yet implemented.
- Latency. PGS makes N+1 LLM calls (N sweeps + 1 synthesis) versus 1 for standard RAG. A 5-partition sweep on local models takes approximately 78 seconds. Cloud APIs with higher throughput complete in 15-30 seconds for the same partition count.
Get the Code
PGS Engine is open source under MIT license. The package includes a real research brain (586 nodes of conformal field theory research) for exploration and testing without API keys.