
Contributing Guide

Guidelines for contributing code, knowledge base content, or documentation to Douto.

  1. Read the introduction — understand what Douto does and its position in the sens.legal ecosystem. See the project overview.
  2. Set up your environment — follow the Development Setup guide.
  3. Review the conventions — read the Coding Conventions page.
  4. Check the roadmap — see current priorities to find high-impact work.

Code: bug fixes, optimizations, and new features for the 5 pipeline scripts. This is where most engineering work happens.

High-impact areas right now:

  • Standardizing environment variable usage across all scripts (F22, P0)
  • Extracting shared functions to pipeline/utils.py (F23, P1)
  • Pinning dependency versions in requirements.txt (F24, P1)

Knowledge base:

  • Populate empty MOCs (MOC_CONSUMIDOR, MOC_TRIBUTARIO, MOC_CONSTITUCIONAL, MOC_COMPLIANCE, MOC_SUCESSOES)
  • Catalog new books into existing MOCs
  • Create atomic notes (when the nodes/ system is implemented)

Documentation:

  • Improve these docs
  • Add inline code comments
  • Update README sections

File issues with clear reproduction steps. Include:

  • Which script and command you ran
  • The error message or unexpected behavior
  • Your environment (OS, Python version, dependency versions)
  • The values of relevant environment variables (redact API keys)
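
The environment details listed above can be gathered with a short script. The following is a minimal sketch, not part of the Douto codebase: the function names, the secret-name markers, and the example package and variable names are all assumptions chosen for illustration.

```python
import os
import platform
import sys
from importlib import metadata

# Hypothetical: substrings that mark an environment variable as a credential.
SECRET_MARKERS = ("KEY", "TOKEN", "SECRET", "PASSWORD")


def redact(name: str, value: str) -> str:
    """Hide the value of any variable whose name looks like a credential."""
    if any(marker in name.upper() for marker in SECRET_MARKERS):
        return "<redacted>"
    return value


def package_version(package: str) -> str:
    """Return the installed version of a package, or a placeholder if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(packages: list[str], env_names: list[str]) -> str:
    """Build a plain-text block suitable for pasting into an issue."""
    lines = [
        f"OS: {platform.platform()}",
        f"Python: {sys.version.split()[0]}",
    ]
    for package in packages:
        lines.append(f"{package}: {package_version(package)}")
    for name in env_names:
        value = os.environ.get(name)
        shown = "<unset>" if value is None else redact(name, value)
        lines.append(f"{name}={shown}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder names: substitute the packages and variables that are
    # relevant to the script you ran.
    print(environment_report(["numpy"], ["EXAMPLE_DATA_DIR", "EXAMPLE_API_KEY"]))
```

Name-based redaction is only a heuristic, so read the output once before pasting it into a public issue.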

Check the existing issues and roadmap for work to pick up. If you’re fixing a new bug or proposing a feature, open an issue first to discuss the approach.

Sync with main before you start:

git fetch origin
git checkout main
git pull origin main

Then create a branch named for the type of change:

# Feature branch
git checkout -b feat/SEN-XXX-short-description
# Bug fix branch
git checkout -b fix/SEN-XXX-short-description
# Documentation branch
git checkout -b docs/short-description
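
The three branch name shapes above can be checked mechanically. This is a sketch under assumptions: the patterns are inferred from the examples only, and the ticket format (letters or digits after `SEN-`) and the lowercase, hyphen-separated description are guesses rather than documented rules.

```python
import re

# Patterns inferred from the branch examples; not an official specification.
BRANCH_PATTERNS = (
    # feat/ and fix/ branches carry a ticket id followed by a description.
    re.compile(r"(feat|fix)/SEN-[A-Za-z0-9]+-[a-z0-9]+(-[a-z0-9]+)*"),
    # docs/ branches carry only a description.
    re.compile(r"docs/[a-z0-9]+(-[a-z0-9]+)*"),
)


def is_valid_branch(name: str) -> bool:
    """Return True if the whole branch name matches one of the patterns."""
    return any(pattern.fullmatch(name) for pattern in BRANCH_PATTERNS)
```

The current branch name can be obtained with `git branch --show-current` and passed to `is_valid_branch`.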

Follow the Coding Conventions. Key checkpoints:

  • No hardcoded absolute paths
  • Type hints on public functions
  • --dry-run support if the script modifies data
  • Specific exception handling (no broad except Exception)
  • Structured logging for important events
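
The checkpoints above can be seen together in one small script. The sketch below is illustrative only and does not reproduce any existing pipeline script: the environment variable `EXAMPLE_DATA_DIR`, the logger name, and the whitespace-normalizing task are hypothetical stand-ins.

```python
from __future__ import annotations

import argparse
import json
import logging
import os
from pathlib import Path

logger = logging.getLogger("pipeline.example")  # hypothetical logger name


def resolve_data_dir(env_var: str = "EXAMPLE_DATA_DIR") -> Path:
    """Read the data directory from the environment instead of hardcoding it."""
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"{env_var} is not set")
    return Path(value)


def log_event(event: str, **fields: object) -> None:
    """Emit one JSON object per event so logs stay machine-readable."""
    logger.info(json.dumps({"event": event, **fields}))


def normalize_file(path: Path, dry_run: bool) -> bool:
    """Strip trailing whitespace per line; return True if the file would change."""
    try:
        original = path.read_text(encoding="utf-8")
    except (FileNotFoundError, UnicodeDecodeError) as exc:
        # Catch only the failures we expect and can report meaningfully.
        log_event("skipped", file=str(path), error=type(exc).__name__)
        return False
    cleaned = "".join(line.rstrip() + "\n" for line in original.splitlines())
    changed = cleaned != original
    if changed and not dry_run:
        path.write_text(cleaned, encoding="utf-8")
    log_event("processed", file=str(path), changed=changed, dry_run=dry_run)
    return changed


def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(description="Illustrative script skeleton")
    parser.add_argument("--dry-run", action="store_true",
                        help="report changes without writing them")
    parser.add_argument("--limit", type=int, default=None,
                        help="process at most N files")
    args = parser.parse_args(argv)

    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")
    files = sorted(resolve_data_dir().glob("*.md"))[: args.limit]
    changed = sum(normalize_file(path, args.dry_run) for path in files)
    log_event("done", files=len(files), changed=changed)
    return 0
```

A real script would end with the usual `if __name__ == "__main__":` guard calling `main()`. Keeping the write behind the `dry_run` flag means the same code path is exercised in both modes, so a dry run is a faithful preview.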

Since there are no automated tests yet, verify manually:

# For pipeline changes: test with a small subset
python3 pipeline/rechunk_v3.py --dry-run --limit 5
# For search changes: run a known query and check results
python3 pipeline/search_doutrina_v2.py "boa-fe objetiva" --area contratos

When tests exist (v0.3+):

make test
make lint

Use Conventional Commits format:

git add pipeline/rechunk_v3.py
git commit -m "feat: add bibliography detection to rechunker -- SEN-XXX"

| Prefix | When to use |
| --- | --- |
| feat: | New functionality |
| fix: | Bug fix |
| docs: | Documentation changes |
| refactor: | Code restructuring without behavior change |
| test: | Adding or updating tests |
| chore: | Build, dependencies, tooling |
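
A commit subject in this format can be validated before committing. The following is a rough sketch rather than a full Conventional Commits parser: it covers only the six prefixes listed here and the ` -- SEN-XXX` suffix seen in the example commit. Treating the ticket suffix as optional is an assumption, and the function name is hypothetical.

```python
from __future__ import annotations

import re

# Prefixes documented in this guide.
PREFIXES = ("feat", "fix", "docs", "refactor", "test", "chore")

# "<prefix>: <description>" with an optional " -- SEN-<ticket>" suffix.
# Assumption: the ticket is optional and consists of word characters.
COMMIT_RE = re.compile(
    rf"(?P<prefix>{'|'.join(PREFIXES)}): "
    r"(?P<description>\S.*?)"
    r"(?: -- (?P<ticket>SEN-\w+))?"
)


def parse_commit_subject(message: str) -> dict | None:
    """Return the parts of the subject line, or None if it is malformed."""
    lines = message.splitlines()
    if not lines:
        return None
    match = COMMIT_RE.fullmatch(lines[0])
    return match.groupdict() if match else None
```

Only the first line of the message is inspected, so a longer commit body does not affect the result.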

Push the branch:

git push -u origin feat/SEN-XXX-short-description

Then open a pull request on GitHub targeting main.

  • Focused scope — one feature or fix per PR. Don’t mix unrelated changes.
  • Clear description — explain what changed and why. Link to the issue or roadmap feature.
  • No breaking changes — unless discussed and approved in the issue.
  • Tests — add tests for new functionality (when the test framework exists).
  • Documentation — update docs if user-facing behavior changes.
  • No secrets — double-check that .env files, API keys, or large data files are not included.
  • No formatting noise — don’t include unrelated whitespace or style changes.
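
The "no secrets" check above lends itself to a quick pre-push review. This is a minimal sketch under assumptions: the size threshold, the key-file suffixes, and the function names are hypothetical, and the caller is expected to supply staged paths with their sizes.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Hypothetical thresholds and patterns; adjust to the project's needs.
MAX_BYTES = 5 * 1024 * 1024
KEY_SUFFIXES = (".pem", ".key")


def is_env_file(path: str) -> bool:
    """Match .env and variants such as .env.local in any directory."""
    name = PurePosixPath(path).name
    return name == ".env" or name.startswith(".env.")


def review_staged(files: dict[str, int]) -> list[str]:
    """Map of path -> size in bytes; return one message per risky file."""
    problems = []
    for path, size in sorted(files.items()):
        if is_env_file(path):
            problems.append(f"{path}: looks like an environment file")
        elif PurePosixPath(path).suffix in KEY_SUFFIXES:
            problems.append(f"{path}: looks like a private key")
        elif size > MAX_BYTES:
            problems.append(f"{path}: {size} bytes exceeds {MAX_BYTES}")
    return problems
```

The staged paths can be listed with `git diff --cached --name-only`. A check like this catches filenames only, so it does not replace reading the diff for keys pasted into source files.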

PR title format:

feat: short description of what this PR does
fix: what was broken and how it's fixed
docs: what documentation was updated

PR description template:

## What
Brief description of the change.
## Why
Link to issue, roadmap feature, or explain the motivation.
## How
Key implementation decisions. What alternatives were considered.
## Testing
How you verified the change works (manual steps or test commands).

Special guidelines for changes to the knowledge/ directory:

To add a book to an existing MOC:

  1. Open the relevant MOC file (e.g., knowledge/mocs/MOC_CIVIL.md)
  2. Add the book entry with all required metadata:

### Titulo do Livro
- **Autor:** Nome do Autor
- **Editora:** Editora
- **Edicao:** Xa edicao, ANO
- **Chunks:** (pending processing)
- **Status:** catalogado

  3. Verify the book’s legal domain matches the MOC
  4. Do not update INDEX_DOUTO.md unless adding a new domain

To add a new legal domain:

  1. Create knowledge/mocs/MOC_{DOMAIN}.md with required frontmatter
  2. Add the domain to knowledge/INDEX_DOUTO.md with a wikilink
  3. Update the documentation

For all knowledge base files:

  • Use wikilinks ([[target]]) for all internal references
  • Follow the frontmatter schema defined in Conventions
  • Set status_enriquecimento correctly: never leave it as "pendente" after enrichment
  • Include book metadata: title, author, edition, publisher
  • Verify links resolve correctly (open in Obsidian to check)
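
Link resolution can also be spot-checked from the command line before opening Obsidian. The sketch below is an approximation, not a reimplementation of Obsidian's resolver: it assumes a wikilink resolves when a Markdown file with the same name exists anywhere under the given root, and it ignores embeds (`![[...]]`). The function names are hypothetical.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches [[target]], [[target|alias]] and [[target#heading]]; the negative
# lookbehind skips embeds such as ![[image.png]].
WIKILINK_RE = re.compile(r"(?<!!)\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def extract_targets(text: str) -> list[str]:
    """Return the link targets found in a note, without aliases or headings."""
    return [match.group(1).strip() for match in WIKILINK_RE.finditer(text)]


def unresolved_links(root: Path) -> dict[str, list[str]]:
    """Map each note (path relative to root) to its targets with no matching file."""
    paths = sorted(root.rglob("*.md"))
    known = {path.stem for path in paths}
    missing: dict[str, list[str]] = {}
    for path in paths:
        targets = extract_targets(path.read_text(encoding="utf-8"))
        # Compare by the last path component so [[folder/Note]] matches Note.md.
        broken = sorted({t for t in targets if Path(t).name not in known})
        if broken:
            missing[path.relative_to(root).as_posix()] = broken
    return missing
```

Calling `unresolved_links(Path("knowledge"))` from the repository root would list candidate broken links; treat the result as a hint and confirm in Obsidian.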

For now, track issues as follows:

  • File issues on GitHub with descriptive labels
  • Reference Linear ticket numbers in commits and PRs when applicable

Suggested labels:

| Label | Color | Use for |
| --- | --- | --- |
| tech-debt | red | Hardcoded paths, missing tests, duplicated code |
| pipeline | blue | Changes to pipeline scripts |
| knowledge-base | green | MOC updates, new books, atomic notes |
| integration | purple | sens.legal ecosystem integration |
| testing | yellow | Test infrastructure and coverage |
| documentation | gray | Docs improvements |

Douto is the exclusive property of Diego Sens (@sensdiego). All contributions are made under the project’s license terms.

When commits involve AI assistance, use the mandatory format:

Co-Authored-By (execucao): Claude Opus 4.6 <noreply@anthropic.com>

The term (execucao) indicates AI assisted with implementation. Conception, architecture, product decisions, and intellectual property remain with the author.