
Reweave: Teaching Your Vault to Link Itself

Automatic Knowledge Graph Linking

Manual linking breaks knowledge vaults at scale. Obsidian vaults grow to 500+ notes, but graph connections stagnate as humans fail to link new notes to existing ones. The result: disconnected islands of related knowledge that could reinforce each other.

Reweave solves this with automatic semantic linking. It scans recently modified notes, finds related notes using vector embeddings, and appends wikilinks to every match that scores above a relevance threshold (0.68 cosine similarity).

Result: A knowledge vault that auto-discovers and strengthens connections. The system implements "spreading activation" — finding nodes that should be linked but aren't — turning a flat file collection into a self-reinforcing knowledge graph.

The Algorithm

The core loop is straightforward:

1. Find recently modified markdown files
2. For each file, build a search query from its title and first paragraph
3. Run that query against a semantic search index (cosine similarity)
4. Filter results above a 0.68 similarity threshold
5. Skip files that are already linked
6. Append a `## Related` section with the new wikilinks
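The six steps fit in a single loop. The Bash below is a minimal sketch built on stated assumptions and is not reweave.sh itself. `semantic_search` is a hypothetical stub that prints `score<TAB>target` lines. The query is simplified to the first heading. The one-day modification window is a guess at what "recently modified" means.

```bash
#!/usr/bin/env bash
# Minimal sketch of the reweave core loop (hypothetical, not the real reweave.sh).

THRESHOLD=0.68

# Hypothetical stand-in for the vault's semantic search. Assumed output:
# one "score<TAB>target" line per result. Replace with a real search call.
semantic_search() {
    printf '0.81\tGETTING-STARTED\n0.42\tunrelated-note\n'
}

# Step 2, simplified here to the first heading only.
build_query() {
    grep -m1 '^# ' "$1" | sed 's/^# //'
}

reweave_dir() {
    local dir=$1 file query score target link
    local -a new_links
    # Step 1: markdown files modified within the last day (assumed window).
    while IFS= read -r -d '' file; do
        query=$(build_query "$file")
        new_links=()
        # Step 3: run the query against the index.
        while IFS=$'\t' read -r score target; do
            # Step 4: keep only results strictly above the threshold.
            awk -v s="$score" -v t="$THRESHOLD" 'BEGIN { exit !(s > t) }' || continue
            # Step 5: skip targets this file already links to.
            grep -qF "[[$target]]" "$file" && continue
            new_links+=("$target")
        done < <(semantic_search "$query")
        # Step 6: append a Related section with the new wikilinks.
        if (( ${#new_links[@]} > 0 )); then
            printf '\n## Related\n' >> "$file"
            for link in "${new_links[@]}"; do
                printf '%s\n' "- [[$link]]" >> "$file"
            done
        fi
    done < <(find "$dir" -type f -name '*.md' -mtime -1 -print0)
}
```

A real run would also need the filters and hub detection described below. It would also have to avoid adding a second `## Related` heading to a note that already has one.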

The interesting part is what happens between steps 4 and 6.

The Hub Problem

My first version linked everything to everything. Notes like "README" and "CLAUDE" appeared in the top results for almost every query — they're semantically broad enough to match anything. The vault graph turned into a hub-and-spoke disaster where a few generic notes had dozens of incoming links and the actual interesting connections were buried.

The fix is two-phase processing with hub detection:

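```bash
# Phase 1: run every query once and count how often each target appears.
# build_query and semantic_search are the script's helpers; semantic_search
# is assumed here to print one target per line.
declare -A hub_counts results_for
for file in "${candidates[@]}"; do
    results_for["$file"]=$(semantic_search "$(build_query "$file")")
    while IFS= read -r target; do
        [[ -n $target ]] || continue
        hub_counts["$target"]=$(( ${hub_counts["$target"]:-0} + 1 ))
    done <<< "${results_for["$file"]}"
done

# Phase 2: reuse the cached results, skipping targets that appeared in too many result sets
for file in "${candidates[@]}"; do
    while IFS= read -r target; do
        [[ -n $target ]] || continue
        if (( ${hub_counts["$target"]:-0} >= HUB_THRESHOLD )); then
            continue  # This is a hub: skip it
        fi
        # ...remaining filters, then link "$file" to "$target"
    done <<< "${results_for["$file"]}"
done
```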

The threshold scales with batch size. In a 5-file scan, appearing twice makes you a hub. In a 20-file scan, you need 4 appearances. This prevents over-linking in small batches while still catching genuine hubs in large scans.
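The post gives two data points for the scaling but no formula. One rule that fits both is a fifth of the batch size with a floor of 2. That rule is an assumption, not something taken from reweave.sh, and `hub_threshold` is a hypothetical name:

```bash
# Hypothetical hub threshold: one fifth of the batch size, never below 2.
# Fits the two data points in the text (5 files -> 2, 20 files -> 4).
hub_threshold() {
    local batch_size=$1
    local threshold=$(( batch_size / 5 ))   # integer division floors
    (( threshold < 2 )) && threshold=2
    echo "$threshold"
}

HUB_THRESHOLD=$(hub_threshold 20)   # 4 for a 20-file scan
```

Under integer division, a 12-file scan also gets a threshold of 2, and a 40-file scan gets 8.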

Filters That Earned Their Place

Every filter in the system exists because of a real false positive. Here's the progression:

Generic basenames. Files named README.md, _index.md, and CLAUDE.md exist in every project directory. They match everything semantically because they describe everything. A case-insensitive basename filter catches them all.
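A case-insensitive basename check can be very small. This sketch assumes Bash 4+ for the `${var,,}` lowercase expansion. The function name is hypothetical, and the list holds only the three names mentioned above:

```bash
# Hypothetical generic-basename filter; returns success for names to skip.
is_generic_basename() {
    local name=${1##*/}   # strip the directory part
    name=${name,,}        # lowercase (Bash 4+)
    case "$name" in
        readme.md|_index.md|claude.md) return 0 ;;
        *) return 1 ;;
    esac
}
```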

Template files. The vault has _templates/career/networking-contact.md, which kept appearing as a match for notes about networking. The original path filter, `*/_templates/*`, missed root-relative paths: `_templates/career/...` has no slash before `_templates`, so the pattern never matches it. Matching both `*/_templates/*` and `_templates/*` fixed it.
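The glob behavior is easy to check with a Bash `case` statement. In a `case` pattern, `*` can match the empty string, but the literal `/` in `*/_templates/*` still has to appear before `_templates`. A vault-relative path like `_templates/career/networking-contact.md` therefore falls through. Here is a sketch of the two-pattern fix (the function name is hypothetical):

```bash
# Hypothetical template-path filter; returns success for paths to skip.
# The second pattern catches paths that start at the vault root.
is_template_path() {
    case "$1" in
        */_templates/*|_templates/*) return 0 ;;
        *) return 1 ;;
    esac
}
```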

Non-markdown files. The semantic search index includes Python scripts. vault-ask.py kept appearing as a link target. A simple extension check removes them.

Stub summaries. Files with "(pending)" or "TODO" in their summary aren't useful link targets. They're placeholders that haven't been written yet.
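The extension and stub checks are just as small. In this sketch the function names are hypothetical. The summary is assumed to be a string that arrives with each search result, since this post doesn't show where it actually comes from:

```bash
# Hypothetical result filters; each returns success when the condition holds.

# Link targets must be markdown (drops scripts such as vault-ask.py).
is_markdown_target() {
    [[ ${1,,} == *.md ]]
}

# Placeholder notes whose summary is still "(pending)" or contains TODO.
is_stub_summary() {
    [[ $1 == *"(pending)"* || $1 == *TODO* ]]
}
```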

Git worktrees. The Linesheet project uses git worktrees for parallel development. The worktree directory contains copies of every file, so reweave was modifying the copies alongside the originals. A `*-worktrees/*` exclusion fixed it.
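The post doesn't show where the `*-worktrees/*` exclusion is applied. One option, swapped in here as an illustration, is to prune those directories during file discovery with `find`, so worktree copies never become candidates at all. `find_candidates` is a hypothetical name:

```bash
# Hypothetical candidate discovery that never descends into *-worktrees
# directories, so worktree copies are never modified.
find_candidates() {
    local vault=$1
    find "$vault" -type d -name '*-worktrees' -prune -o \
        -type f -name '*.md' -print
}
```

The same glob would still be needed on search results if the index also covers the worktree copies.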

Well-connected files. Notes with 8+ existing outgoing links are already well-integrated into the graph. Adding more links has diminishing returns and clutters the note.
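Counting existing outgoing links only needs `grep`. This sketch makes two guesses that are not documented behavior. First, wikilinks are the only links that count. Second, it counts distinct targets, treating `[[target|alias]]` and `[[target#heading]]` as links to `target`:

```bash
# Hypothetical outgoing-link check for the "8+ links" rule.
MAX_OUTGOING_LINKS=8

# Count distinct wikilink targets; [[a|alias]] and [[a#heading]] both count as "a".
outgoing_link_count() {
    grep -o '\[\[[^]|#]*' "$1" | sort -u | wc -l
}

# Success when the note is already well connected and should be skipped.
is_well_connected() {
    (( $(outgoing_link_count "$1") >= MAX_OUTGOING_LINKS ))
}
```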

The Search Infrastructure

Reweave depends on a local semantic search index that covers the entire vault — 12,000+ files embedded with qwen3-embedding:0.6b (12GB+, runs on GPU alongside the inference models). Queries take ~164ms. The index lives in SQLite.

```bash
# Build search query from file content
query=$(build_query "$file")

# Run semantic search (cosine similarity, top 10)
results=$(python3 vault-search.py "$query" --mode semantic --json --top 10)
```
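The score compared against 0.68 is cosine similarity: the dot product of the query and note embeddings, divided by the product of their lengths. A toy version over comma-separated vectors shows the arithmetic, using `awk` for the floating-point math. It says nothing about how vault-search.py or the SQLite index actually compute the score:

```bash
# Toy cosine similarity for two comma-separated vectors of equal length.
# Assumes neither vector is all zeros (that would divide by zero).
cosine_similarity() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, ",")
        split(b, y, ",")
        for (i = 1; i <= n; i++) {
            dot += x[i] * y[i]
            na  += x[i] * x[i]
            nb  += y[i] * y[i]
        }
        printf "%.4f\n", dot / (sqrt(na) * sqrt(nb))
    }'
}

# Success when a score is strictly above the 0.68 linking threshold.
above_threshold() {
    awk -v s="$1" 'BEGIN { exit !(s > 0.68) }'
}
```

For example, `cosine_similarity 3,4 4,3` prints `0.9600` (24 / (5 * 5)), which would clear the threshold.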

The build_query function extracts the first heading and first content paragraph, skipping frontmatter and structural elements. This gives the embedding model enough context without overwhelming it with boilerplate.
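The real build_query isn't shown, so here is a rough `awk` sketch of the described behavior. Which lines count as "structural elements" is an assumption: blank lines, extra headings, list items, block quotes, and fenced code.

```bash
# Hypothetical build_query: first heading plus first prose paragraph,
# skipping YAML frontmatter and structural lines.
build_query() {
    awk '
        NR == 1 && $0 == "---" { in_fm = 1; next }          # frontmatter opens
        in_fm { if ($0 == "---") in_fm = 0; next }           # skip until it closes
        /^```/ { in_code = !in_code; if (para != "") exit; next }
        in_code { next }                                     # skip code blocks
        /^#+ / {
            if (heading == "") { heading = $0; sub(/^#+ +/, "", heading) }
            if (para != "") exit
            next
        }
        /^[ \t]*$/ || /^([-*>]|[0-9]+\.) / { if (para != "") exit; next }
        { para = (para == "" ? $0 : para " " $0) }           # join paragraph lines
        END {
            if (heading != "" && para != "") print heading ". " para
            else print heading para
        }
    ' "$1"
}
```

Take a note with YAML frontmatter, a `# Testing Overview` heading, and a checklist before its first prose paragraph. Its query becomes the heading text followed by that paragraph, joined onto one line.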

Quality Results

After five rounds of bug fixes and filter additions, the system produces consistently good links. A full-vault scan of 1,200+ project files found accurate semantic connections:

Source	Target	Why It Works
CODEX-REVIEW.md	2026-02-codex-review	Both are code review documents for the same project
TESTING.md	TESTING_GUIDE	Testing overview linked to its companion guide
README.md	GETTING-STARTED	Project README linked to the user-facing getting started doc

Zero false positives. The hub detection filtered out generic matches, the path exclusions caught structural files, and the similarity threshold ensured genuine semantic relevance.

Integration with the Heartbeat

Reweave runs as a daily heartbeat task at 04:00. It's script-only — no model inference needed. The semantic search handles the intelligence; the script handles the plumbing.

```markdown
### reweave
- Schedule: daily 04:00
- Model: none (script-only)
- Script: reweave.sh
```

It also supports manual runs with path filtering:

```bash
# Scan entire vault (no time filter)
bash reweave.sh --full

# Scan only one project's files
bash reweave.sh --full --path=Projects/Linesheet
```

The --path flag is useful after creating a batch of new notes in a specific area — you can immediately wire them into the local graph without waiting for the daily run.
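The post doesn't show how reweave.sh parses these flags. Below is one plausible shape. The variable names (`FULL_SCAN`, `SCAN_PATH`, `VAULT_ROOT`, `FIND_ARGS`) are hypothetical, and non-`--full` runs use the same assumed one-day window:

```bash
# Hypothetical flag parsing for --full and --path=DIR.
parse_reweave_args() {
    FULL_SCAN=false
    SCAN_PATH=""
    local arg
    for arg in "$@"; do
        case "$arg" in
            --full)   FULL_SCAN=true ;;
            --path=*) SCAN_PATH=${arg#--path=} ;;
            *) echo "unknown option: $arg" >&2; return 2 ;;
        esac
    done
}

# Build find(1) arguments: optional subdirectory, optional time filter.
candidate_find_args() {
    local root=${VAULT_ROOT:-.}
    [[ -n $SCAN_PATH ]] && root="$root/$SCAN_PATH"
    FIND_ARGS=("$root" -type f -name '*.md')
    [[ $FULL_SCAN == true ]] || FIND_ARGS+=(-mtime -1)   # default: recent files only
}
```

`find "${FIND_ARGS[@]}" -print0` would then list the candidate files.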

Graph Health Monitoring

To close the loop, the vault health heartbeat now tracks graph metrics via the Obsidian CLI:

```bash
if timeout 10 obsidian vault >/dev/null 2>&1; then
    orphan_count=$(timeout 15 obsidian orphans | wc -l)
    deadend_count=$(timeout 15 obsidian deadends | wc -l)
    unresolved_count=$(timeout 15 obsidian unresolved total)
fi
```

Orphans are notes with no incoming links. Dead-ends are notes with no outgoing links. If either exceeds 20, the health report suggests running reweave. Currently both are at zero — the combination of manual linking and automated reweave is keeping the graph connected.
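The reporting rule reduces to a comparison against that limit. In this sketch the function name and message wording are hypothetical. It takes the counts that the health check collects:

```bash
# Hypothetical health-report rule: suggest a reweave run when either
# graph metric exceeds the limit of 20.
GRAPH_ALERT_LIMIT=20

graph_advice() {
    local orphans=$1 deadends=$2
    if (( orphans > GRAPH_ALERT_LIMIT || deadends > GRAPH_ALERT_LIMIT )); then
        echo "Graph: $orphans orphans, $deadends dead-ends. Consider running reweave.sh."
    else
        echo "Graph: $orphans orphans, $deadends dead-ends. OK."
    fi
}
```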

One gotcha: `obsidian orphans total` counts every file in the vault, including source code in node_modules/. A 637-note vault reported 255,808 orphans. The fix is piping the markdown-only list through `wc -l` instead of using the total subcommand.

The reweave system turns note-taking from a write-and-forget activity into a self-reinforcing knowledge graph. Every new note automatically discovers its neighbors. Every daily run adds connections that a human could make manually but probably wouldn't get around to. For a vault managing 10+ projects and 12,000+ files, that compound linking is the difference between a flat file system and an actual second brain.


About the Author

Vache Sarkissian