
Glossary

Terms and concepts you will encounter in Douto documentation, organized by domain.


A legal concept or institute — the fundamental unit of legal doctrine classification. Examples: exceptio non adimpleti contractus (defense of non-performance), boa-fe objetiva (objective good faith), tutela antecipada (preliminary injunction). In Douto, each chunk is classified by the instituto(s) it discusses. This is the primary metadata field for filtered search and the planned unit for atomic notes.

Legal doctrine — scholarly analysis and interpretation of the law by legal academics and practitioners. Unlike legislation (the law itself) or jurisprudence (court decisions), doutrina represents the academic understanding and theoretical framework of legal concepts. Douto processes doutrina exclusively; case law is handled by Valter/Juca and legislation by Leci.

Branch of law — a broad classification of legal domains. Douto organizes its knowledge base by ramo. The currently recognized branches are:

| Ramo | Portuguese | MOC Status |
| --- | --- | --- |
| Civil law | Direito Civil | Active (35 books) |
| Civil procedure | Direito Processual Civil | Active (8 books) |
| Business law | Direito Empresarial | Active (7 books) |
| Consumer law | Direito do Consumidor | Placeholder |
| Tax law | Direito Tributario | Not created |
| Constitutional law | Direito Constitucional | Not created |
| Compliance & governance | Compliance & Governanca | Not created |
| Succession law | Sucessoes & Planejamento Patrimonial | Not created |

Statutory sources — references to specific laws, articles, and legal provisions cited in doctrine. Examples: “CC art. 476” (Civil Code, article 476), “CPC art. 300” (Civil Procedure Code, article 300). Extracted during enrichment as a metadata field to enable cross-referencing with the Leci legislation service.
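As an illustration of how citation strings such as "CC art. 476" could be turned into structured pairs for cross-referencing, here is a minimal sketch. It is not the enrichment step itself: the regex, the list of recognized codes, and the function name `parse_citations` are assumptions made for this example.

```python
import re

# Assumed pattern: a code abbreviation, then "art.", then an article number.
# Brazilian citations often write thousands with a dot (e.g. "art. 1.022").
CITATION_RE = re.compile(r"\b(CC|CPC|CDC)\s+art\.\s*(\d+(?:\.\d{3})*)", re.IGNORECASE)

def parse_citations(text):
    """Return (code, article_number) pairs found in a piece of text."""
    pairs = []
    for code, article in CITATION_RE.findall(text):
        # Drop the thousands separator before converting to int.
        pairs.append((code.upper(), int(article.replace(".", ""))))
    return pairs

citations = parse_citations("See CC art. 476 and CPC art. 1.022.")
```

Pairs in this shape could serve as lookup keys against a legislation service, though the actual key format used by Leci is not specified here.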

Content type — classification of what a chunk actually contains. Values used in enrichment:

| Value | Meaning |
| --- | --- |
| definicao | Definition of a legal concept |
| requisitos | Requirements or elements of a legal institute |
| exemplo | Practical example or case illustration |
| jurisprudencia_comentada | Commented court decision |
| critica_doutrinaria | Doctrinal critique or scholarly debate |

Procedural phase — the stage of a legal process or contractual lifecycle. Values: formacao (formation), execucao (execution/performance), extincao (extinction/termination). Used in enrichment metadata to enable phase-specific filtering.

Foreign jurisdiction — when doctrine references legal systems from countries other than Brazil. Relevant because the corpus includes some international comparative law books.


A semantically coherent fragment of a legal book, produced by rechunk_v3.py. Chunks are the atomic unit of the pipeline — they are enriched, embedded, and searched individually. A chunk has YAML frontmatter with metadata and a markdown body. Size range: 1,500-15,000 characters of actual text.

A 768-dimensional vector representation of a chunk’s semantic content, generated by the Legal-BERTimbau model. Embeddings capture meaning rather than exact words, enabling semantic search (finding conceptually similar content even when different terminology is used). Stored normalized for cosine similarity computation.
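To see why storing normalized vectors is convenient, consider the following sketch in plain Python. Toy 3-dimensional vectors stand in for the 768-dimensional ones, and the helper names are illustrative rather than taken from the pipeline. Once both vectors have unit length, cosine similarity reduces to a plain dot product.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(vec):
    """Scale a vector to unit length (hypothetical helper)."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def cosine_similarity(a, b):
    """Full cosine formula, valid for vectors of any length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

query = [3.0, 4.0, 0.0]
chunk = [4.0, 3.0, 0.0]
q_unit, c_unit = l2_normalize(query), l2_normalize(chunk)

full = cosine_similarity(query, chunk)  # computed with both norms
fast = dot(q_unit, c_unit)              # same value, no norms needed at query time
```

Normalizing once at indexing time means each comparison at search time is a single dot product.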

The process of classifying chunks with structured metadata using an LLM (currently MiniMax M2.5). Each chunk is analyzed and tagged with instituto, tipo_conteudo, ramo, fase, fontes_normativas, and other fields. This metadata enables filtered search and is the foundation for planned synthesis features.
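The following sketch shows one way the enrichment fields could be represented and used for filtering. The allowed values for tipo_conteudo and fase come from this glossary; the field types (for example, instituto as a list), the optional fase, and the names `ChunkMetadata` and `filter_chunks` are assumptions for illustration.

```python
from dataclasses import dataclass, field

TIPO_CONTEUDO = {
    "definicao", "requisitos", "exemplo",
    "jurisprudencia_comentada", "critica_doutrinaria",
}
FASE = {"formacao", "execucao", "extincao"}

@dataclass
class ChunkMetadata:
    """Hypothetical container for the metadata an enriched chunk carries."""
    instituto: list
    tipo_conteudo: str
    ramo: str
    fase: str = ""  # assumed optional: not every chunk has a phase
    fontes_normativas: list = field(default_factory=list)

    def __post_init__(self):
        # Reject values outside the documented vocabularies.
        if self.tipo_conteudo not in TIPO_CONTEUDO:
            raise ValueError(f"unknown tipo_conteudo: {self.tipo_conteudo}")
        if self.fase and self.fase not in FASE:
            raise ValueError(f"unknown fase: {self.fase}")

def filter_chunks(chunks, instituto=None, fase=None):
    """Keep chunks that match every filter that is not None."""
    return [
        c for c in chunks
        if (instituto is None or instituto in c.instituto)
        and (fase is None or c.fase == fase)
    ]

chunks = [
    ChunkMetadata(["boa-fe objetiva"], "definicao", "civil", "formacao"),
    ChunkMetadata(["exceptio non adimpleti contractus"], "requisitos",
                  "civil", "execucao", ["CC art. 476"]),
]
hits = filter_chunks(chunks, fase="execucao")
```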

A search approach that combines two ranking methods:

  • Semantic search — cosine similarity on embeddings (captures meaning)
  • BM25 — probabilistic keyword ranking (captures exact terms)

The scores are combined with a configurable weight (default: 0.7 semantic, 0.3 BM25). This tends to produce better results than either method alone, especially for legal queries that mix conceptual intent with specific technical terms.
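The weighted combination can be sketched as follows. Only the 0.7/0.3 default comes from the description above. The min-max rescaling step is an assumption: BM25 scores are unbounded while cosine similarities are not, so some normalization is needed before mixing them, but the source does not say which one Douto uses. The function names are illustrative.

```python
def min_max(scores):
    """Rescale a dict of scores to [0, 1] (assumed normalization)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 0.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}

def hybrid_rank(semantic, bm25, semantic_weight=0.7):
    """Merge two score dicts keyed by chunk id into one ranked list."""
    sem, kw = min_max(semantic), min_max(bm25)
    combined = {
        cid: semantic_weight * sem.get(cid, 0.0)
        + (1 - semantic_weight) * kw.get(cid, 0.0)
        for cid in set(sem) | set(kw)
    }
    # Highest combined score first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)

semantic = {"chunk_001": 0.82, "chunk_002": 0.75, "chunk_003": 0.40}
bm25 = {"chunk_001": 2.1, "chunk_002": 9.4, "chunk_003": 0.0}
ranking = hybrid_rank(semantic, bm25)
```

In this toy data, chunk_002 is slightly less similar semantically but matches the query terms far better, so it overtakes chunk_001 once the keyword score is mixed in.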

An index file listing all books within a legal domain, with metadata and processing status. MOCs are the second level of the skill graph hierarchy (INDEX -> MOCs -> Books -> Chunks). Each MOC corresponds to a ramo do direito. File naming convention: MOC_{DOMAIN}.md.

The hierarchical knowledge structure maintained by Douto:

INDEX_DOUTO.md # Root: 8 legal domains
-> MOC_CIVIL.md # Domain index: 35 books
-> Book directories # Per-book chunk collections
-> chunk_001.md # Individual enriched chunks
-> (future) atomic notes # One per instituto juridico

Navigable via Obsidian’s graph view and wikilinks.

A single-concept knowledge note planned for the knowledge/nodes/ directory — one note per instituto juridico, synthesizing information from all chunks that discuss that concept across all books.

Planned Feature — Atomic notes are on the roadmap (F36, v0.5) but not yet implemented. Decision D03 (auto-generated vs. manually curated) is pending.

YAML metadata block at the top of markdown files, delimited by --- markers. Contains structured data about the chunk (title, author, legal domain, enrichment status, etc.). Parsed by a custom regex-based parser in the pipeline scripts.

---
knowledge_id: "contratos-orlando-gomes-cap05-001"
tipo: chunk
titulo: "Exceptio non adimpleti contractus"
livro_titulo: "Contratos"
autor: "Orlando Gomes"
area_direito: civil
status_enriquecimento: completo
---
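A regex-based parser for a block like the one above could look like this sketch. It is not the parser from the pipeline scripts: the patterns and the function name `parse_frontmatter` are assumptions, and it handles only flat `key: value` lines (no lists or nesting).

```python
import re

# Frontmatter: opening '---' line, metadata lines, closing '---' line, then body.
FRONTMATTER_RE = re.compile(r"\A---[ \t]*\n(.*?)\n---[ \t]*\n?(.*)\Z", re.DOTALL)
FIELD_RE = re.compile(r"^([A-Za-z_][\w-]*):\s*(.*)$")

def parse_frontmatter(text):
    """Split a chunk file into (metadata dict, markdown body)."""
    match = FRONTMATTER_RE.match(text)
    if match is None:
        return {}, text  # no frontmatter: everything is body
    raw_meta, body = match.groups()
    meta = {}
    for line in raw_meta.splitlines():
        found = FIELD_RE.match(line.strip())
        if found:
            key, value = found.groups()
            meta[key] = value.strip().strip('"')  # drop surrounding quotes
    return meta, body

sample = (
    "---\n"
    'knowledge_id: "contratos-orlando-gomes-cap05-001"\n'
    "tipo: chunk\n"
    'autor: "Orlando Gomes"\n'
    "---\n"
    "\n"
    "Body text.\n"
)
meta, body = parse_frontmatter(sample)
```

A full YAML library would cover more syntax; a regex approach trades that coverage for having no extra dependency.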

Repeated text that appears at the top of PDF pages (typically the book title, chapter name, or author name). These are artifacts of PDF layout, not meaningful content. rechunk_v3.py detects them by frequency analysis and filters them out to prevent false chunks.
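Frequency-based detection can be sketched as below. This is an illustration of the general idea, not the logic in rechunk_v3.py: the 50% threshold, the choice to inspect only the first non-empty line of each page, and both function names are assumptions.

```python
from collections import Counter

def detect_running_headers(pages, min_fraction=0.5):
    """Return first lines that recur on at least `min_fraction` of pages."""
    first_lines = []
    for page in pages:
        lines = [ln.strip() for ln in page.splitlines() if ln.strip()]
        if lines:
            first_lines.append(lines[0])
    counts = Counter(first_lines)
    # Require at least two occurrences so a single page never defines a header.
    cutoff = max(2, int(min_fraction * len(pages)))
    return {line for line, n in counts.items() if n >= cutoff}

def strip_running_headers(pages, headers):
    """Drop the first non-empty line of a page when it is a known header."""
    cleaned = []
    for page in pages:
        lines = page.splitlines()
        for i, ln in enumerate(lines):
            if ln.strip():
                if ln.strip() in headers:
                    del lines[i]
                break  # only the first non-empty line is inspected
        cleaned.append("\n".join(lines))
    return cleaned

pages = [
    "Contratos\nFirst page text.",
    "Contratos\nSecond page text.",
    "Contratos\nThird page text.",
    "Capitulo 5\nChapter opening text.",
]
headers = detect_running_headers(pages)
cleaned = strip_running_headers(pages, headers)
```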

A synthesized summary of multiple authors’ positions on a single instituto juridico. Structured to include consensus views, divergent positions, historical evolution, and practical implications.

Planned Feature — The Doctrine Brief format is proposed as part of the Synthesis Engine (F43, v0.3.5) but not yet implemented.


The unified legal research platform comprising Douto, Valter, Juca, Leci, and Joseph. Also referred to by the product name Jude.md. Goal: provide Brazilian lawyers with integrated access to case law, legislation, and doctrine through a single interface.

Backend service for the sens.legal ecosystem. Built with FastAPI, PostgreSQL, Qdrant (vector DB), Neo4j (knowledge graph), and Redis. Handles STJ case law (23,400+ decisions) and exposes 28 MCP tools. Primary consumer of Douto’s doctrine embeddings. Repository: separate.

Frontend hub for sens.legal. Built with Next.js. Provides the user interface for lawyers, including the progressive briefing system (4 phases: diagnostic, precedents, risks, delivery). Accesses doctrine data through Valter.

Legislation service for sens.legal. Built with Next.js, PostgreSQL, and Drizzle ORM. Manages the federal legislation database. Future cross-reference target for Douto (F35: linking doctrinal commentary to specific statutory provisions).

Orchestrator agent for sens.legal. Coordinates work across Valter, Juca, Leci, and Douto. Manages cases and workflow.

Product name for the sens.legal unified platform. Juca (jurisprudencia) + Leci (legislacao) + Douto (doutrina) + Valter (backend) = Jude.md. Epic issue: SEN-368.

An open protocol for exposing tools to AI models (developed by Anthropic). Douto plans to expose doctrine search as MCP tools (v0.4, F30), enabling Claude Desktop, Claude Code, and other MCP-compatible clients to query doctrine directly.


| Acronym | Full Form | Context |
| --- | --- | --- |
| BM25 | Best Matching 25 | Probabilistic keyword ranking algorithm used in hybrid search |
| BERT | Bidirectional Encoder Representations from Transformers | Architecture behind Legal-BERTimbau |
| STJ | Superior Tribunal de Justica | Brazil’s Superior Court of Justice; primary source for Valter’s case law |
| CPC | Codigo de Processo Civil | Brazilian Civil Procedure Code (Lei 13.105/2015) |
| CC | Codigo Civil | Brazilian Civil Code (Lei 10.406/2002) |
| CDC | Codigo de Defesa do Consumidor | Brazilian Consumer Protection Code (Lei 8.078/1990) |
| ETL | Extract, Transform, Load | Data processing pattern; Douto’s pipeline is an ETL system |
| ADR | Architecture Decision Record | Document recording an architectural decision and its rationale |
| MOC | Map of Content | Index file listing resources within a topic |
| nDCG | Normalized Discounted Cumulative Gain | Search quality metric measuring ranking effectiveness |
| HNSW | Hierarchical Navigable Small World | Approximate nearest neighbor algorithm used by vector databases (e.g., Qdrant) |
| FAISS | Facebook AI Similarity Search | Vector similarity search library by Meta |
| LGPD | Lei Geral de Protecao de Dados | Brazilian General Data Protection Law |
| MCP | Model Context Protocol | Protocol for AI tool exposure (Anthropic) |
| SSE | Server-Sent Events | Unidirectional server-to-client streaming protocol |
| WSL | Windows Subsystem for Linux | Linux compatibility layer on Windows; one of the hardcoded path environments |
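For readers unfamiliar with nDCG, the following sketch computes it for a single ranked result list. It uses the linear-gain variant (relevance divided by a log2 position discount); other formulations use an exponential gain. The relevance grades and function names are illustrative, and nothing here reflects how Douto evaluates search in practice.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: later ranks contribute less."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )

def ndcg(relevances, k=None):
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    ranked = relevances[:k] if k else relevances
    ideal = sorted(relevances, reverse=True)[:len(ranked)]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0

perfect = ndcg([3, 2, 1])   # most relevant result ranked first
reversed_ = ndcg([1, 2, 3])  # most relevant result ranked last
```

A score of 1.0 means the results are already in the best possible order; lower values indicate relevant results ranked too far down.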