Introduction

Douto is the doctrine knowledge agent of the sens.legal platform. It processes legal textbooks into structured, searchable knowledge that lawyers and AI agents can query in real time.

The Problem

Legal research in Brazil requires consulting multiple authors on the same legal concept. A lawyer researching exceptio non adimpleti contractus might need to compare positions from Orlando Gomes, Caio Mário, and Pontes de Miranda — each in a different book, different chapter, different edition. This manual cross-referencing typically takes 2-4 hours per concept.

Current legal tech platforms (Jusbrasil, Turivius, vLex, LexisNexis) perform search: they return documents that match a query. None of them performs synthesis, that is, aggregating and comparing doctrinal positions across authors.

Douto bridges this gap by transforming raw legal textbooks into structured, classified, searchable knowledge with metadata that enables filtering by legal concept, content type, legal branch, and procedural phase.
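
As a rough illustration of the filtering this metadata enables, the sketch below models chunk metadata as a small dataclass and filters it in memory. The field names (`instituto`, `content_type`, `legal_branch`, `procedural_phase`) and the sample values are assumptions that mirror the four dimensions listed above; they are not Douto's actual schema.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ChunkMeta:
    # Hypothetical field names mirroring the four filter dimensions.
    instituto: str
    content_type: str
    legal_branch: str
    procedural_phase: Optional[str] = None


def filter_chunks(chunks: Iterable[ChunkMeta], **criteria: str) -> list[ChunkMeta]:
    """Keep only chunks whose metadata equals every given criterion."""
    return [
        c for c in chunks
        if all(getattr(c, field) == value for field, value in criteria.items())
    ]


# Illustrative sample data, not taken from the real corpus.
sample = [
    ChunkMeta("exceptio non adimpleti contractus", "definition", "civil"),
    ChunkMeta("boa-fé objetiva", "definition", "civil"),
    ChunkMeta("tutela provisória", "requirements", "processual", "conhecimento"),
]

civil_definitions = filter_chunks(sample, legal_branch="civil", content_type="definition")
print([c.instituto for c in civil_definitions])
```

Criteria are combined with a logical AND, so adding a keyword argument narrows the result set.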

What Douto Does

Douto operates in two complementary modes:

Batch Processing Pipeline

Five Python scripts executed sequentially transform legal PDFs into searchable data:

PDF → process_books.py → rechunk_v3.py → enrich_chunks.py → embed_doutrina.py → search_doutrina_v2.py

Each stage adds structure: the raw PDF becomes markdown, markdown is split into semantically coherent chunks, chunks are classified with legal metadata, the metadata-enriched text is embedded, and the embeddings enable semantic search.
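
The stage order can be sketched as a plain list driving a sequential runner. This is an illustration only: it assumes each script can be invoked with no arguments, which the source does not confirm, and the `build_commands` and `run_pipeline` helpers and the `dry_run` flag are hypothetical.

```python
import subprocess
import sys

# Script names and order come from the pipeline above; descriptions paraphrase it.
STAGES = [
    ("process_books.py", "PDF -> markdown"),
    ("rechunk_v3.py", "markdown -> chunks"),
    ("enrich_chunks.py", "chunks -> classified chunks"),
    ("embed_doutrina.py", "enriched text -> embeddings"),
    ("search_doutrina_v2.py", "embeddings -> semantic search"),
]


def build_commands(stages=STAGES):
    """Return one argv list per stage, in execution order."""
    # Assumption: no CLI arguments; the real scripts may require paths or flags.
    python = sys.executable or "python"
    return [[python, script] for script, _ in stages]


def run_pipeline(dry_run=True):
    """Run the stages one after another, stopping at the first failure."""
    for cmd in build_commands():
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so a failed stage prevents later stages from running.
            subprocess.run(cmd, check=True)


run_pipeline(dry_run=True)
```

Stopping at the first failure matters here because each stage consumes the output of the previous one.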

Navigable Knowledge Base

An Obsidian-style markdown hierarchy organized by legal domain:

INDEX_DOUTO.md (root — 8 legal domains)
  ├── MOC_CIVIL.md (35 books, ~9,365 chunks)
  ├── MOC_PROCESSUAL.md (8 books, ~22,182 chunks)
  ├── MOC_EMPRESARIAL.md (7 books)
  └── nodes/ (atomic notes — planned)

Who Uses Douto

Douto serves three audiences:

Audience	How they use Douto	Available today?
Lawyers	Search doctrine via Juca frontend during case research	Not yet — requires v0.4 integration
AI agents	Query doctrine via MCP/API during briefings and analysis	Not yet — MCP planned for v0.4
Developers	Extend the pipeline, add books, improve the knowledge base	Yes — via CLI

Where Douto Fits

In the sens.legal ecosystem, each agent handles a different pillar of legal knowledge:

Agent	Pillar	Current corpus
Valter	Case law (jurisprudência)	23,400+ STJ decisions
Leci	Legislation (legislação)	Federal laws
Douto	Doctrine (doutrina)	~50 books, ~31,500 chunks
Joseph	Orchestration	Coordinates all agents
Juca	Frontend	Presents results to lawyers

When fully integrated, a lawyer asking Juca about a legal concept will receive a unified view combining case law from Valter, legislation from Leci, and doctrine from Douto.

Core Concepts

These terms appear throughout the documentation:

Term	Definition
Chunk	A semantically coherent fragment of a legal book (200-1,000 tokens), produced by `rechunk_v3.py`, with YAML frontmatter metadata
Instituto jurídico	A legal concept or institute — e.g., exceptio non adimpleti contractus, boa-fé objetiva. The fundamental unit of classification.
Enrichment	The process of classifying chunks with structured metadata using an LLM (currently MiniMax M2.5)
Embedding	A 768-dimensional vector representing a chunk’s semantic content, generated by Legal-BERTimbau
MOC	Map of Content — an index file listing all books within a legal domain
Skill graph	The hierarchical knowledge structure: INDEX → MOCs → Chunks → Atomic Notes (atomic notes are planned, not yet available)
Hybrid search	Combination of semantic search (cosine similarity) and BM25 (keyword matching) with weighted scoring

For full definitions, see the Glossary.
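
To make the hybrid search definition concrete, here is a minimal sketch that blends cosine similarity with a textbook Okapi BM25 score. The weight `alpha`, the BM25 parameters `k1` and `b`, the min-max normalization, and the function names are all assumptions for illustration; the source does not state how Douto weights or normalizes the two signals. Toy 3-dimensional vectors stand in for the 768-dimensional embeddings.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document against the query terms."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


def min_max(xs):
    """Rescale scores to [0, 1] so the two signals are comparable."""
    lo, hi = min(xs), max(xs)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]


def hybrid_rank(query_vec, query_terms, chunk_vecs, chunk_tokens, alpha=0.7):
    """Return chunk indices sorted by alpha * semantic + (1 - alpha) * keyword."""
    sem = min_max([cosine(query_vec, v) for v in chunk_vecs])
    kw = min_max(bm25_scores(query_terms, chunk_tokens))
    combined = [alpha * s + (1 - alpha) * k for s, k in zip(sem, kw)]
    return sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)


# Illustrative toy data, not taken from the real corpus.
chunk_vecs = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
chunk_tokens = [
    ["boa-fé", "objetiva", "contrato"],
    ["exceção", "contrato", "não", "cumprido"],
    ["prazo", "recurso"],
]
ranking = hybrid_rank([1.0, 0.0, 0.0], ["contrato", "cumprido"], chunk_vecs, chunk_tokens)
print(ranking)
```

Raising `alpha` favors semantic closeness; lowering it favors exact keyword overlap, which can help when a query names a specific instituto jurídico verbatim.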

What Douto Does NOT Do

AGENTS.md defines the following boundaries:

Does not manage cases — that’s Joseph (orchestrator)
Does not search case law — that’s Valter (23,400+ STJ decisions)
Does not search legislation — that’s Leci (federal laws)
Does not manage infrastructure — that’s Valter (FastAPI backend)
Does not serve a web interface — that’s Juca (Next.js frontend)