
Quickstart

This guide gets you running a doctrine search against Douto’s pre-built corpus in under 5 minutes. For full pipeline setup (including PDF extraction and enrichment), see Installation.

You will need:

  • Python 3.10 or later
  • pip
  • ~4 GB RAM (for loading embeddings into memory)
  • Pre-built corpus files (JSON) — available from the project maintainer

Clone the repository and install the dependencies:

git clone https://github.com/sensdiego/douto.git
cd douto
pip install -r pipeline/requirements.txt

Point Douto to the directory containing the pre-built corpus files:

export DATA_PATH="/path/to/your/corpus/data"

This directory must contain:

File                               Description
embeddings_doutrina.json           768-dim embedding vectors for contract law chunks
search_corpus_doutrina.json        Metadata for each chunk (title, author, instituto, etc.)
bm25_index_doutrina.json           Tokenized documents for BM25 keyword search
embeddings_processo_civil.json     Embeddings for civil procedure chunks
search_corpus_processo_civil.json  Metadata for civil procedure chunks
bm25_index_processo_civil.json     BM25 index for civil procedure
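
Before running a search, it can help to confirm that the directory really holds all six files listed above. The following is a minimal sketch, not part of Douto: the helper name `missing_corpus_files` is hypothetical, and it only checks that each file exists, not that its contents are valid.

```python
import os
from pathlib import Path

# File names copied from the list of required corpus files.
REQUIRED_FILES = [
    "embeddings_doutrina.json",
    "search_corpus_doutrina.json",
    "bm25_index_doutrina.json",
    "embeddings_processo_civil.json",
    "search_corpus_processo_civil.json",
    "bm25_index_processo_civil.json",
]


def missing_corpus_files(data_path):
    """Return the required file names that are absent from data_path.

    Hypothetical helper for illustration; Douto does not ship it.
    """
    root = Path(data_path)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    data_path = os.environ.get("DATA_PATH", "")
    if not data_path:
        print("DATA_PATH is not set")
    else:
        missing = missing_corpus_files(data_path)
        print("missing:", ", ".join(missing) if missing else "none")
```

Run it in the same shell where you exported DATA_PATH; an empty result means every expected file is in place.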

Run a search from the command line:

# Search contract law for "exceptio non adimpleti contractus"
python3 pipeline/search_doutrina_v2.py "exceptio non adimpleti contractus" --area contratos
# Search civil procedure for "tutela antecipada requisitos"
python3 pipeline/search_doutrina_v2.py "tutela antecipada requisitos" --area processo_civil
# Search all areas and show a text preview of each chunk (--verbose)
python3 pipeline/search_doutrina_v2.py "boa-fé objetiva" --area all --verbose

To start an interactive session:

python3 pipeline/search_doutrina_v2.py --interativo

In interactive mode, you get a REPL with these commands:

Command                             Description
/area contratos|processo_civil|all  Switch search area
/filtro instituto=X tipo=Y fase=Z   Set metadata filters
/verbose                            Toggle text preview of chunks
/top N                              Change number of results (default: 5)
/bm25                               Switch to keyword-only search
/sem                                Switch to semantic-only search
/hybrid                             Switch to hybrid search (default)
/quit                               Exit
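
To make the command syntax concrete, here is a rough sketch of how a session's settings could change as commands are typed. It mirrors the commands listed above and is not taken from Douto's source: `apply_command` and `DEFAULT_STATE` are hypothetical names, the starting area and verbose setting are assumptions, and so is the choice that `/filtro` replaces earlier filters rather than merging with them.

```python
# Assumed starting settings. Only top=5 and mode="hybrid" are documented defaults.
DEFAULT_STATE = {
    "area": "all",
    "filters": {},
    "verbose": False,
    "top": 5,
    "mode": "hybrid",
    "running": True,
}


def apply_command(state, line):
    """Return a new settings dict after applying one REPL command line.

    Hypothetical helper: unknown or malformed commands leave settings unchanged.
    """
    parts = line.strip().split()
    if not parts:
        return state
    name, args = parts[0], parts[1:]
    # Copy the state (and its nested filters) so the input is never mutated.
    new = dict(state, filters=dict(state["filters"]))
    if name == "/area" and args and args[0] in ("contratos", "processo_civil", "all"):
        new["area"] = args[0]
    elif name == "/filtro":
        # Each argument is expected in key=value form, e.g. instituto=exceptio.
        new["filters"] = dict(a.split("=", 1) for a in args if "=" in a)
    elif name == "/verbose":
        new["verbose"] = not state["verbose"]
    elif name == "/top" and args and args[0].isdigit():
        new["top"] = int(args[0])
    elif name in ("/bm25", "/sem", "/hybrid"):
        new["mode"] = name[1:]
    elif name == "/quit":
        new["running"] = False
    return new


state = apply_command(DEFAULT_STATE, "/filtro instituto=exceptio tipo=definicao")
state = apply_command(state, "/top 10")
print(state["filters"], state["top"], state["mode"])
```

Because the sketch splits on whitespace, a filter value that contains spaces would not survive; how the real REPL handles that case is not covered here.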

Narrow results with metadata filters:

# Search for a specific legal concept (instituto)
python3 pipeline/search_doutrina_v2.py "contrato bilateral" --instituto "exceptio" --area contratos
# Search for definitions only
python3 pipeline/search_doutrina_v2.py "boa-fé objetiva" --tipo "definicao" --verbose
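
If you want to drive these searches from a script, one option is to assemble the same command line programmatically. The sketch below uses only the flags shown in this guide (`--area`, `--instituto`, `--tipo`, `--verbose`); the function name `build_search_command` is hypothetical, and the script path assumes you run from the repository root.

```python
import shlex


def build_search_command(query, area=None, instituto=None, tipo=None, verbose=False):
    """Assemble the argument list for pipeline/search_doutrina_v2.py.

    Hypothetical helper: it only builds the list and does not run anything.
    """
    cmd = ["python3", "pipeline/search_doutrina_v2.py", query]
    if area:
        cmd += ["--area", area]
    if instituto:
        cmd += ["--instituto", instituto]
    if tipo:
        cmd += ["--tipo", tipo]
    if verbose:
        cmd.append("--verbose")
    return cmd


cmd = build_search_command("contrato bilateral", area="contratos", instituto="exceptio")
# shlex.join quotes arguments so the printed line is safe to paste into a shell.
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with multi-word queries.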

A typical search result looks like this:

1. [0.847] 📗 Da Exceção do Contrato Não Cumprido
📖 Contratos — Orlando Gomes (chunk 26/89) [contratos]
🏷️ exceptio non adimpleti contractus, contrato bilateral | definição, requisitos

Element     Meaning
1.          Rank position
[0.847]     Relevance score (0-1, higher is better)
📗 / 📘     Area: 📗 = contratos, 📘 = processo_civil
First line  Chunk title (section heading from the book)
📖          Book title, author, chunk position within the book
🏷️          Instituto tags and content type tags from enrichment

With --verbose, the actual chunk text (first 300 characters) is also shown.
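
If you capture this output in a script, the first line of each result can be split into its parts with a regular expression. This is a sketch based solely on the sample result shown above; the actual output format may vary between versions, and `parse_result_header` is a hypothetical name.

```python
import re

# Icon-to-area mapping taken from the result legend.
AREA_BY_ICON = {"📗": "contratos", "📘": "processo_civil"}

# Matches lines shaped like: "1. [0.847] 📗 Da Exceção do Contrato Não Cumprido"
HEADER_RE = re.compile(r"^\s*(\d+)\.\s+\[(\d*\.?\d+)\]\s+(\S+)\s+(.+?)\s*$")


def parse_result_header(line):
    """Split a result's first line into rank, score, area, and title.

    Returns None when the line does not look like a result header.
    """
    match = HEADER_RE.match(line)
    if match is None:
        return None
    rank, score, icon, title = match.groups()
    return {
        "rank": int(rank),
        "score": float(score),
        "area": AREA_BY_ICON.get(icon, "unknown"),
        "title": title,
    }


print(parse_result_header("1. [0.847] 📗 Da Exceção do Contrato Não Cumprido"))
```

Lines that do not match, such as the 📖 and 🏷️ lines, return None and can be handled separately.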