# Douto — Legal Doctrine Knowledge Agent

> Douto processes legal books (PDF → chunks → embeddings) and maintains a navigable skill graph (INDEX → MOCs → atomic notes) to feed the sens.legal ecosystem. Stack: Python 3 + LlamaParse + MiniMax M2.5 + Legal-BERTimbau.

Important notes:

- Douto is a batch processing pipeline, not a web service — it produces JSON artifacts consumed by other agents (Valter, Juca)
- The corpus contains ~50 legal books (~31,500 enriched chunks) covering Brazilian civil law, procedural law, business law, and consumer law
- The enrichment prompt file (enrich_prompt.md) is currently missing from the repository — this is a known critical issue
- Embeddings use Legal-BERTimbau (768-dim), a Portuguese legal domain model, with metadata-enriched text composition

## Home

- [Douto — Legal Doctrine Knowledge Agent](https://douto.sens.legal): Documentation for Douto, the legal doctrine processing pipeline and knowledge base of the sens.legal ecosystem.

## Getting Started

- [Introduction](https://douto.sens.legal/getting-started/introduction): What Douto is, what problem it solves, and why it exists within the sens.legal ecosystem.
- [Quickstart](https://douto.sens.legal/getting-started/quickstart): Run a doctrine search in under 5 minutes with Douto's pre-built corpus.
- [Installation](https://douto.sens.legal/getting-started/installation): Complete setup guide for running the full Douto pipeline from PDF extraction to search.

## Architecture

- [Architecture Overview](https://douto.sens.legal/architecture/overview): How Douto's batch processing pipeline and markdown knowledge graph work together.
- [Technology Stack](https://douto.sens.legal/architecture/stack): Complete technology stack with versions, justifications, and dependency map.
- [Architecture Decision Records](https://douto.sens.legal/architecture/decisions): Key architectural decisions in Douto — context, rationale, trade-offs, and pending questions.
- [Architecture Diagrams](https://douto.sens.legal/architecture/diagrams): Visual diagrams of Douto's architecture, data flow, and ecosystem position in Mermaid.

## Features

- [Features](https://douto.sens.legal/features): Complete feature inventory for Douto — implemented, in progress, planned, and proposed — with links to detailed pages.
- [PDF Extraction](https://douto.sens.legal/features/pipeline/pdf-extraction): How process_books.py converts legal PDFs to structured markdown using LlamaParse, with chapter splitting and YAML frontmatter generation.
- [Intelligent Chunking v3](https://douto.sens.legal/features/pipeline/intelligent-chunking): How rechunk_v3.py splits legal markdown into semantically coherent chunks using a 5-pass algorithm with 14 section patterns, footnote aggregation, and domain-specific heuristics.
- [Chunk Enrichment](https://douto.sens.legal/features/pipeline/enrichment): How enrich_chunks.py classifies legal chunks using MiniMax M2.5 to add 13 structured metadata fields via concurrent LLM inference.
- [Embedding Generation](https://douto.sens.legal/features/pipeline/embeddings): How embed_doutrina.py generates 768-dimensional Legal-BERTimbau embeddings with a metadata-enriched text composition strategy for semantic search.
- [Hybrid Search](https://douto.sens.legal/features/pipeline/hybrid-search): How search_doutrina_v2.py combines semantic search, BM25, and metadata filters across multiple legal areas for doctrine retrieval.
- [Skill Graph](https://douto.sens.legal/features/knowledge-base/skill-graph): How INDEX_DOUTO.md organizes 8 legal domains into a navigable knowledge hierarchy using Obsidian-style wikilinks and structured frontmatter.
- [Maps of Content (MOCs)](https://douto.sens.legal/features/knowledge-base/mocs): How MOC files catalog legal books by domain with structured metadata, processing status, and corpus statistics across 8 legal domains.
- [Atomic Notes](https://douto.sens.legal/features/knowledge-base/atomic-notes): Planned atomic knowledge notes for the nodes/ directory — one per legal concept, auto-generated or curated from enriched chunks.

## Configuration

- [Environment Variables](https://douto.sens.legal/configuration/environment): Complete reference for all environment variables used across the Douto pipeline.
- [Settings & Configuration](https://douto.sens.legal/configuration/settings): Hardcoded settings, tunable parameters, and configuration constants across the Douto pipeline.
- [External Integrations](https://douto.sens.legal/configuration/integrations): Setup and configuration for LlamaParse, MiniMax M2.5, HuggingFace, and the sens.legal ecosystem.

## Development

- [Development Setup](https://douto.sens.legal/development/setup): How to set up a development environment for contributing to Douto.
- [Coding Conventions](https://douto.sens.legal/development/conventions): Coding standards, naming patterns, and architectural conventions for Douto development.
- [Testing](https://douto.sens.legal/development/testing): Current testing status and the planned testing strategy for Douto.
- [Contributing Guide](https://douto.sens.legal/development/contributing): How to contribute to Douto — from reporting issues to submitting pull requests.

## Roadmap

- [Roadmap](https://douto.sens.legal/roadmap): Douto's product roadmap — vision, current priorities, planned milestones, and top risks.
- [Milestones](https://douto.sens.legal/roadmap/milestones): Detailed milestone definitions with features, acceptance criteria, prerequisites, and estimates.
- [Changelog](https://douto.sens.legal/roadmap/changelog): History of significant changes, releases, and milestones in Douto.

## Reference

- [Glossary](https://douto.sens.legal/reference/glossary): Definitions of legal, technical, and ecosystem terms used throughout the Douto documentation.
- [FAQ](https://douto.sens.legal/reference/faq): Frequently asked questions about Douto — for developers, lawyers, and stakeholders.
- [Troubleshooting](https://douto.sens.legal/reference/troubleshooting): Common problems and solutions when running the Douto pipeline.

## Meta

- [Content Map](https://douto.sens.legal/content-map): Master index of all documentation files — what each contains, where information comes from, and writing priority.