
Douto — Legal Doctrine Knowledge Agent

Douto is the legal doctrine knowledge agent for the sens.legal ecosystem. It transforms legal textbooks into searchable, structured, AI-ready knowledge through a five-stage Python pipeline and maintains a navigable skill graph organized by legal domain.

Douto’s pipeline processes legal textbooks from PDF to searchable embeddings:

  • PDF Extraction: converts legal PDFs to structured markdown via LlamaParse
  • Intelligent Chunking: splits documents using legal-domain heuristics (footnote grouping, law article preservation, running header detection)
  • LLM Enrichment: classifies each chunk with structured metadata: instituto jurídico (legal institute), tipo de conteúdo (content type), ramo do direito (branch of law), fontes normativas (normative sources)
  • Semantic Embeddings: generates 768-dimensional vectors using Legal-BERTimbau with metadata-enhanced text composition
  • Hybrid Search: combines semantic search (cosine similarity) with BM25 keyword search and metadata filtering
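
To make the last stage concrete, here is a minimal sketch of how hybrid ranking could work: filter by metadata, score the remaining chunks by cosine similarity and by BM25, normalize both, and blend them. It is not Douto's actual implementation. The function names, the `alpha` weight, the `ramo_do_direito` metadata key, the whitespace tokenizer, and the 3-dimensional toy vectors (the real corpus uses 768 dimensions) are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document against the query."""
    n = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n if n else 0.0
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        score = 0.0
        for term in query_tokens:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def min_max(values):
    """Rescale scores to [0, 1] so the two signals are comparable."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def hybrid_search(query_text, query_vec, chunks, alpha=0.5, filters=None, top_k=3):
    """Return chunk ids ranked by alpha * semantic + (1 - alpha) * BM25."""
    # Metadata filtering: keep only chunks matching every requested field.
    wanted = filters or {}
    candidates = [
        c for c in chunks
        if all(c["metadata"].get(k) == v for k, v in wanted.items())
    ]
    if not candidates:
        return []
    semantic = min_max([cosine(query_vec, c["vector"]) for c in candidates])
    keyword = min_max(bm25_scores(
        query_text.lower().split(),
        [c["text"].lower().split() for c in candidates],
    ))
    fused = [alpha * s + (1 - alpha) * k for s, k in zip(semantic, keyword)]
    ranked = sorted(zip(candidates, fused), key=lambda pair: pair[1], reverse=True)
    return [c["id"] for c, _ in ranked[:top_k]]


# Hypothetical toy corpus; real chunks carry 768-dimensional vectors.
CHUNKS = [
    {"id": "c1", "text": "a posse direta do bem imóvel",
     "vector": [0.9, 0.1, 0.0], "metadata": {"ramo_do_direito": "civil"}},
    {"id": "c2", "text": "recurso de apelação no processo civil",
     "vector": [0.1, 0.9, 0.0], "metadata": {"ramo_do_direito": "processual"}},
    {"id": "c3", "text": "usucapião exige posse prolongada",
     "vector": [0.8, 0.2, 0.1], "metadata": {"ramo_do_direito": "civil"}},
    {"id": "c4", "text": "contrato de compra e venda",
     "vector": [0.2, 0.3, 0.9], "metadata": {"ramo_do_direito": "civil"}},
]

print(hybrid_search("posse usucapião", [1.0, 0.0, 0.0], CHUNKS,
                    filters={"ramo_do_direito": "civil"}))
```

Filtering before scoring keeps both normalizations relative to the chunks that can actually be returned. A weighted sum is only one way to fuse the two rankings; reciprocal rank fusion is a common alternative when the score scales are hard to compare.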

| Metric | Value |
| --- | --- |
| Books processed | ~50 |
| Chunks in corpus | ~31,500 |
| Legal domains covered | 3 active (Civil, Processual, Empresarial) + 5 planned |
| Embedding dimensions | 768 (Legal-BERTimbau) |
| Search modes | Semantic, BM25, Hybrid |
| Test coverage | 0% |
| Pipeline scripts | 5 |
| Version | v0.1.0 (pre-release) |

| Section | Description |
| --- | --- |
| Introduction | What Douto is, why it exists, and who uses it |
| Quickstart | Run a search in under 5 minutes |
| Architecture | How the pipeline and knowledge base work |
| Features | Complete feature inventory with status |
| Roadmap | Where Douto is going: milestones v0.2 through v1.0 |
| Glossary | Legal and technical terminology |

Douto is one of five components in the sens.legal unified legal research platform:

    graph LR
        subgraph "sens.legal"
            JU["Juca<br/>Frontend Hub<br/>Next.js"]
            VA["Valter<br/>Case Law + Backend<br/>FastAPI + Neo4j"]
            LE["Leci<br/>Legislation<br/>Next.js + PG"]
            DO["Douto<br/>Legal Doctrine<br/>Python Pipeline"]
        end
        USER["Lawyer"] --> JU
        JU --> VA
        JU --> LE
        JU --> DO
        VA <-.->|"embeddings,<br/>knowledge graph"| DO

| Agent | Role | Stack |
| --- | --- | --- |
| Valter | Case law backend: 23,400+ STJ decisions, 28 MCP tools | FastAPI, PostgreSQL, Qdrant, Neo4j, Redis |
| Juca | Frontend hub: user interface for lawyers | Next.js 16, block system, briefing progressivo |
| Leci | Legislation: federal law database | Next.js 15, PostgreSQL, Drizzle |
| Joseph | Orchestrator: coordinates agents | |
| Douto | Legal doctrine: this project | Python 3, LlamaParse, Legal-BERTimbau |