Glossary
Terms and concepts you will encounter in the Douto documentation, organized by domain.
Legal Domain Terms
Instituto juridico
A legal concept or institute, the fundamental unit of legal doctrine classification. Examples: exceptio non adimpleti contractus (defense of non-performance), boa-fe objetiva (objective good faith), tutela antecipada (preliminary injunction). In Douto, each chunk is classified by the instituto(s) it discusses. This is the primary metadata field for filtered search and the planned unit for atomic notes.
Doutrina
Legal doctrine: scholarly analysis and interpretation of the law by legal academics and practitioners. Unlike legislation (the law itself) or jurisprudence (court decisions), doutrina represents the academic understanding and theoretical framework of legal concepts. Douto processes doutrina exclusively; case law is handled by Valter/Juca and legislation by Leci.
Ramo do direito
Branch of law: a broad classification of legal domains. Douto organizes its knowledge base by ramo. The currently recognized branches are:
| Ramo | Portuguese | MOC Status |
|---|---|---|
| Civil law | Direito Civil | Active (35 books) |
| Civil procedure | Direito Processual Civil | Active (8 books) |
| Business law | Direito Empresarial | Active (7 books) |
| Consumer law | Direito do Consumidor | Placeholder |
| Tax law | Direito Tributario | Not created |
| Constitutional law | Direito Constitucional | Not created |
| Compliance & governance | Compliance & Governanca | Not created |
| Succession law | Sucessoes & Planejamento Patrimonial | Not created |
Fontes normativas
Statutory sources: references to specific laws, articles, and legal provisions cited in doctrine. Examples: “CC art. 476” (Civil Code, article 476), “CPC art. 300” (Civil Procedure Code, article 300). Extracted during enrichment as a metadata field to enable cross-referencing with the Leci legislation service.
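Douto extracts these references during LLM enrichment. Purely to illustrate the shape of the citations, the sketch below uses a regular expression instead; the pattern, the list of code abbreviations, and the function name `extract_fontes` are assumptions made for this example and are not part of the pipeline.

```python
import re

# Hypothetical pattern: a code abbreviation (CC, CPC, CDC), then "art." and a number.
CITATION_RE = re.compile(r"\b(CC|CPC|CDC)\s+art\.\s*(\d+)")


def extract_fontes(text: str) -> list[str]:
    """Return 'CODE art. N' citations in order of first appearance, without duplicates."""
    seen: list[str] = []
    for code, article in CITATION_RE.findall(text):
        citation = f"{code} art. {article}"
        if citation not in seen:
            seen.append(citation)
    return seen


sample = "A excecao do CC art. 476 convive com a tutela do CPC art. 300 e, de novo, CC art. 476."
print(extract_fontes(sample))
```

A real extractor would need to handle paragraphs, incisos, and statutes cited by number, which is presumably one reason the task is delegated to an LLM.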
Tipo de conteudo
Content type: classification of what a chunk actually contains. Values used in enrichment:
| Value | Meaning |
|---|---|
| definicao | Definition of a legal concept |
| requisitos | Requirements or elements of a legal institute |
| exemplo | Practical example or case illustration |
| jurisprudencia_comentada | Commented court decision |
| critica_doutrinaria | Doctrinal critique or scholarly debate |
Fase processual
Procedural phase: the stage of a legal process or contractual lifecycle. Values: formacao (formation), execucao (execution/performance), extincao (extinction/termination). Used in enrichment metadata to enable phase-specific filtering.
Jurisdicao estrangeira
Foreign jurisdiction: applies when doctrine references legal systems of countries other than Brazil. Relevant because the corpus includes some international comparative law books.
Technical Terms
Chunk
A semantically coherent fragment of a legal book, produced by rechunk_v3.py. Chunks are the atomic unit of the pipeline: they are enriched, embedded, and searched individually. A chunk has YAML frontmatter with metadata and a markdown body. Size range: 1,500-15,000 characters of actual text.
Embedding
A 768-dimensional vector representation of a chunk’s semantic content, generated by the Legal-BERTimbau model. Embeddings capture meaning rather than exact words, enabling semantic search (finding conceptually similar content even when different terminology is used). Vectors are stored normalized, so cosine similarity can be computed as a plain dot product.
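To see why normalized storage is convenient, here is a minimal sketch in plain Python. It assumes "normalized" means scaled to unit L2 length, and it uses toy 3-dimensional vectors in place of real 768-dimensional embeddings; the function names are illustrative only.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_unit(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two unit-length vectors, which is just their dot product."""
    return sum(x * y for x, y in zip(a, b))


# Toy vectors standing in for embeddings (assumption: real ones have 768 dimensions).
query = normalize([3.0, 4.0, 0.0])
chunk_same_direction = normalize([6.0, 8.0, 0.0])
chunk_orthogonal = normalize([0.0, 0.0, 2.0])

print(cosine_unit(query, chunk_same_direction))  # close to 1.0: same direction
print(cosine_unit(query, chunk_orthogonal))      # close to 0.0: unrelated direction
```

Normalizing once at indexing time means no norm has to be recomputed per query.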
Enrichment
The process of classifying chunks with structured metadata using an LLM (currently MiniMax M2.5). Each chunk is analyzed and tagged with instituto, tipo_conteudo, ramo, fase, fontes_normativas, and other fields. This metadata enables filtered search and is the foundation for planned synthesis features.
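As a rough sketch of what filtered search over this metadata could look like, the example below models a chunk with a few of the fields named above and filters on them. The `Chunk` dataclass, the `filter_chunks` helper, and the sample identifiers are hypothetical; the actual storage format and query interface are not described here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Chunk:
    """Hypothetical in-memory view of an enriched chunk (subset of fields)."""
    knowledge_id: str
    instituto: list[str] = field(default_factory=list)
    tipo_conteudo: str = ""
    ramo: str = ""


def filter_chunks(
    chunks: list[Chunk],
    instituto: Optional[str] = None,
    tipo_conteudo: Optional[str] = None,
) -> list[Chunk]:
    """Keep chunks matching every filter that was provided."""
    result = []
    for chunk in chunks:
        if instituto is not None and instituto not in chunk.instituto:
            continue
        if tipo_conteudo is not None and chunk.tipo_conteudo != tipo_conteudo:
            continue
        result.append(chunk)
    return result


corpus = [
    Chunk("c1", ["boa-fe objetiva"], "definicao", "civil"),
    Chunk("c2", ["boa-fe objetiva", "tutela antecipada"], "exemplo", "civil"),
    Chunk("c3", ["tutela antecipada"], "requisitos", "processual_civil"),
]

hits = filter_chunks(corpus, instituto="boa-fe objetiva", tipo_conteudo="definicao")
print([c.knowledge_id for c in hits])
```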
Hybrid search
A search approach that combines two ranking methods:
- Semantic search — cosine similarity on embeddings (captures meaning)
- BM25 — probabilistic keyword ranking (captures exact terms)
The scores are combined with a configurable weight (default: 0.7 semantic, 0.3 BM25). This produces better results than either method alone, especially for legal queries that mix conceptual intent with specific technical terms.
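The weighting can be sketched as follows. How Douto puts the two score types on a common scale is not stated here, so the min-max rescaling below is an assumption, as are the function names and the toy scores; only the 0.7/0.3 default comes from the text above.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_rank(
    semantic: dict[str, float],
    bm25: dict[str, float],
    semantic_weight: float = 0.7,
) -> list[tuple[str, float]]:
    """Combine rescaled semantic and BM25 scores, best result first."""
    sem_n, bm_n = min_max(semantic), min_max(bm25)
    combined = {
        doc_id: semantic_weight * sem_n.get(doc_id, 0.0)
        + (1.0 - semantic_weight) * bm_n.get(doc_id, 0.0)
        for doc_id in set(sem_n) | set(bm_n)
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Toy scores: chunk "a" is closest in meaning, chunk "b" has the best keyword match.
semantic_scores = {"a": 0.9, "b": 0.6, "c": 0.3}
bm25_scores = {"a": 1.0, "b": 8.0, "c": 4.5}

print(hybrid_rank(semantic_scores, bm25_scores))                       # "a" ranks first
print(hybrid_rank(semantic_scores, bm25_scores, semantic_weight=0.3))  # "b" ranks first
```

Shifting the weight toward BM25 favors chunks that contain the exact query terms, which can matter when a query names a specific article or Latin expression.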
MOC (Map of Content)
An index file listing all books within a legal domain, with metadata and processing status. MOCs are the second level of the skill graph hierarchy (INDEX -> MOCs -> Books -> Chunks). Each MOC corresponds to a ramo do direito. File naming convention: MOC_{DOMAIN}.md.
Skill graph
The hierarchical knowledge structure maintained by Douto:

    INDEX_DOUTO.md                        # Root: 8 legal domains
      -> MOC_CIVIL.md                     # Domain index: 35 books
        -> Book directories               # Per-book chunk collections
          -> chunk_001.md                 # Individual enriched chunks
            -> (future) atomic notes      # One per instituto juridico

Navigable via Obsidian’s graph view and wikilinks.
Atomic note
A single-concept knowledge note planned for the knowledge/nodes/ directory: one note per instituto juridico, synthesizing information from all chunks that discuss that concept across all books.
Planned Feature — Atomic notes are on the roadmap (F36, v0.5) but not yet implemented. Decision D03 (auto-generated vs. manually curated) is pending.
Frontmatter
YAML metadata block at the top of markdown files, delimited by --- markers. Contains structured data about the chunk (title, author, legal domain, enrichment status, etc.). Parsed by a custom regex-based parser in the pipeline scripts.
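A regex-based parser of this kind might look roughly like the sketch below. It is not the pipeline's actual parser: the patterns and the name `parse_frontmatter` are assumptions, and it only handles flat `key: value` lines, not nested YAML or lists.

```python
import re

# Frontmatter block: starts at the very top, delimited by lines containing only ---.
FRONTMATTER_RE = re.compile(r"\A---[ \t]*\n(.*?)\n---[ \t]*(?:\n|\Z)", re.DOTALL)
# A single flat "key: value" line.
FIELD_RE = re.compile(r"^([A-Za-z_][\w-]*):\s*(.*)$")


def parse_frontmatter(markdown: str) -> tuple[dict[str, str], str]:
    """Split a markdown document into (metadata dict, body text)."""
    match = FRONTMATTER_RE.match(markdown)
    if match is None:
        return {}, markdown  # no frontmatter: everything is body
    metadata: dict[str, str] = {}
    for line in match.group(1).splitlines():
        found = FIELD_RE.match(line.strip())
        if found:
            key, value = found.groups()
            metadata[key] = value.strip().strip('"')  # drop surrounding double quotes
    return metadata, markdown[match.end():]


sample_doc = (
    "---\n"
    'knowledge_id: "contratos-orlando-gomes-cap05-001"\n'
    "tipo: chunk\n"
    'autor: "Orlando Gomes"\n'
    "---\n"
    "Body text.\n"
)
meta, body = parse_frontmatter(sample_doc)
print(meta["autor"], "|", body.strip())
```

A hand-rolled parser avoids a YAML dependency, at the cost of silently ignoring anything beyond simple scalar fields.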
    ---
    knowledge_id: "contratos-orlando-gomes-cap05-001"
    tipo: chunk
    titulo: "Exceptio non adimpleti contractus"
    livro_titulo: "Contratos"
    autor: "Orlando Gomes"
    area_direito: civil
    status_enriquecimento: completo
    ---

Running header
Repeated text that appears at the top of PDF pages (typically the book title, chapter name, or author name). These are artifacts of PDF layout, not meaningful content. rechunk_v3.py detects them by frequency analysis and filters them out to prevent false chunks.
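One simple form of frequency analysis is sketched below: count how often each first line repeats across pages and treat frequent ones as running headers. The first-line heuristic, the 50% threshold, and the function names are assumptions for illustration; the logic in rechunk_v3.py may differ.

```python
from collections import Counter


def find_running_headers(pages: list[list[str]], min_fraction: float = 0.5) -> set[str]:
    """Return first-line texts that appear on at least min_fraction of all pages."""
    first_lines = [page[0].strip() for page in pages if page]
    counts = Counter(first_lines)
    threshold = min_fraction * len(pages)
    return {line for line, n in counts.items() if line and n >= threshold}


def strip_running_headers(pages: list[list[str]], headers: set[str]) -> list[list[str]]:
    """Drop the first line of each page when it is a detected running header."""
    return [
        [line for i, line in enumerate(page) if not (i == 0 and line.strip() in headers)]
        for page in pages
    ]


# Each page is a list of text lines; "CONTRATOS" repeats at the top of three of four pages.
pages = [
    ["CONTRATOS", "Capitulo 5", "texto"],
    ["CONTRATOS", "mais texto"],
    ["CONTRATOS", "ainda mais"],
    ["Sumario", "x"],
]
headers = find_running_headers(pages)
print(headers)
print(strip_running_headers(pages, headers))
```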
Doctrine Brief
A synthesized summary of multiple authors’ positions on a single instituto juridico. Structured to include consensus views, divergent positions, historical evolution, and practical implications.
Planned Feature — The Doctrine Brief format is proposed as part of the Synthesis Engine (F43, v0.3.5) but not yet implemented.
Ecosystem Terms
sens.legal
The unified legal research platform comprising Douto, Valter, Juca, Leci, and Joseph. Also referred to by the product name Jude.md. Goal: provide Brazilian lawyers with integrated access to case law, legislation, and doctrine through a single interface.
Valter
Backend service for the sens.legal ecosystem. Built with FastAPI, PostgreSQL, Qdrant (vector DB), Neo4j (knowledge graph), and Redis. Handles STJ case law (23,400+ decisions) and exposes 28 MCP tools. Primary consumer of Douto’s doctrine embeddings. Maintained in a separate repository.
Juca
Frontend hub for sens.legal. Built with Next.js. Provides the user interface for lawyers, including the progressive briefing system (4 phases: diagnostic, precedents, risks, delivery). Accesses doctrine data through Valter.
Leci
Legislation service for sens.legal. Built with Next.js, PostgreSQL, and Drizzle ORM. Manages the federal legislation database. Future cross-reference target for Douto (F35: linking doctrinal commentary to specific statutory provisions).
Joseph
Orchestrator agent for sens.legal. Coordinates work across Valter, Juca, Leci, and Douto. Manages cases and workflow.
Jude.md
Product name for the sens.legal unified platform. Juca (jurisprudencia) + Leci (legislacao) + Douto (doutrina) + Valter (backend) = Jude.md. Epic issue: SEN-368.
MCP (Model Context Protocol)
An open protocol for exposing tools to AI models (developed by Anthropic). Douto plans to expose doctrine search as MCP tools (v0.4, F30), enabling Claude Desktop, Claude Code, and other MCP-compatible clients to query doctrine directly.
Acronyms
| Acronym | Full Form | Context |
|---|---|---|
| BM25 | Best Matching 25 | Probabilistic keyword ranking algorithm used in hybrid search |
| BERT | Bidirectional Encoder Representations from Transformers | Architecture behind Legal-BERTimbau |
| STJ | Superior Tribunal de Justica | Brazil’s Superior Court of Justice — primary source for Valter’s case law |
| CPC | Codigo de Processo Civil | Brazilian Civil Procedure Code (Lei 13.105/2015) |
| CC | Codigo Civil | Brazilian Civil Code (Lei 10.406/2002) |
| CDC | Codigo de Defesa do Consumidor | Brazilian Consumer Protection Code (Lei 8.078/1990) |
| ETL | Extract, Transform, Load | Data processing pattern — Douto’s pipeline is an ETL system |
| ADR | Architecture Decision Record | Document recording an architectural decision and its rationale |
| MOC | Map of Content | Index file listing resources within a topic |
| nDCG | Normalized Discounted Cumulative Gain | Search quality metric measuring ranking effectiveness |
| HNSW | Hierarchical Navigable Small World | Approximate nearest neighbor algorithm used by vector databases (e.g., Qdrant) |
| FAISS | Facebook AI Similarity Search | Vector similarity search library by Meta |
| LGPD | Lei Geral de Protecao de Dados | Brazilian General Data Protection Law |
| MCP | Model Context Protocol | Protocol for AI tool exposure (Anthropic) |
| SSE | Server-Sent Events | Unidirectional server-to-client streaming protocol |
| WSL | Windows Subsystem for Linux | Linux compatibility layer on Windows — one of the hardcoded path environments |