Changelog
Significant changes to Douto, organized in reverse chronological order (newest first). Follows the Keep a Changelog conventions.
Format: each entry lists the date, associated commit(s), and categorized changes (Added, Changed, Fixed, Removed).
v0.1.0 — Initial Setup (2026-02)
The Douto repository was created and populated with the pipeline scripts (migrated from an Obsidian vault) and the knowledge base structure.
2026-02-28 — North Star Definition
Commit: b7930d3
Reference: SEN-368
- AGENTS.md updated with the Jude.md north star: Douto is formally a component of the unified legal research platform (Juca + Leci + Douto + Valter = Jude.md)
- Issue epic SEN-368 defined as the convergence target
2026-02-28 — Knowledge Base Population
Commit: c4e2c5b
- Populated MOCs with real corpus data: 56 books classified across 4 legal domains
  - MOC_CIVIL.md: 35 books, ~9,365 chunks (largest domain by book count)
  - MOC_PROCESSUAL.md: 8 books, ~22,182 chunks
  - MOC_EMPRESARIAL.md: 7 books
  - MOC_CONSUMIDOR.md: placeholder structure (not yet populated)
2026-02-28 — Pipeline Migration
Commit: 8f9c702
Reference: SEN-358
- pipeline/process_books.py: PDF-to-markdown extraction via LlamaParse (supported tiers: agentic, cost_effective, fast)
- pipeline/rechunk_v3.py: intelligent legal chunking with 5 processing passes, 16 section patterns, running-header detection, and footnote grouping
- pipeline/enrich_chunks.py: chunk enrichment via MiniMax M2.5 (5 concurrent threads, structured legal metadata)
- pipeline/embed_doutrina.py: embedding generation using Legal-BERTimbau (768-dim, normalized)
- pipeline/search_doutrina_v2.py: hybrid search (semantic cosine + BM25) with an interactive CLI
- pipeline/requirements.txt: Python dependencies (sentence-transformers, torch, numpy, anthropic, llama-parse)
2026-02-28 — Initial Setup
Commit: ce16dbc
- AGENTS.md: agent identity, responsibilities, boundaries, and git protocol
- knowledge/INDEX_DOUTO.md: skill graph index mapping 8 legal domains
- knowledge/mocs/: MOC directory structure
- knowledge/nodes/.gitkeep: placeholder for future atomic notes
- tools/.gitkeep: placeholder for future auxiliary tools
- .gitignore: excludes node_modules, embeddings, .env, and __pycache__
Documentation Session (2026-02-28)
In a single documentation session, the following strategic documents were created:
- CLAUDE.md: coding guidelines for AI coding agents, aligned with the sens.legal ecosystem conventions (priority order, Python conventions, pipeline rules, knowledge base rules, embedding conventions, git patterns)
- PROJECT_MAP.md: full project diagnostic (directory tree, stack details, architecture, data flow diagrams, gap analysis, recommendations)
- ROADMAP.md: product roadmap with 42 features across 6 milestones, 7 pending decisions, a risk matrix, and a mitigation plan
- PREMORTEM.md: risk analysis covering 6 false premises, 14 technical risks, 5 product risks, 4 execution risks, 7 edge cases, and the IP risk disclosure
- docs/: Starlight documentation site with 22+ pages covering getting started, architecture, features, configuration, development, roadmap, and reference
Pre-History
The pipeline was developed over an undetermined period inside an Obsidian vault before this repository existed. Key facts about the pre-history:
- The 5 Python scripts in pipeline/ predate the repository
- The corpus (~50 books, ~31,500 chunks) was processed before migration
- Hardcoded paths in the scripts reflect the original development environments (native Linux and WSL)
- The enrichment prompt (enrich_prompt.md) was lost during migration and is not in the repository
- Linear issues (SEN-XXX) were used for tracking before the repository had GitHub Issues