Contributing Guide
Contributing to Douto
Guidelines for contributing code, knowledge base content, or documentation to Douto.
Before You Start
- Read the introduction: understand what Douto does and its position in the sens.legal ecosystem. See the project overview.
- Set up your environment: follow the Development Setup guide.
- Review the conventions: read the Coding Conventions page.
- Check the roadmap: see current priorities to find high-impact work.
Types of Contributions
1. Pipeline Improvements
Bug fixes, optimizations, and new features for the five pipeline scripts. This is where most engineering work happens.
High-impact areas right now:
- Standardizing environment variable usage across all scripts (F22, P0)
- Extracting shared functions to `pipeline/utils.py` (F23, P1)
- Pinning dependency versions in `requirements.txt` (F24, P1)
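
As a rough illustration of the first two items, shared environment-variable helpers could live in one module and be imported by every script. This is a minimal sketch, not Douto's actual code: the function names `env_path` and `env_int` and the variable names in the usage note are hypothetical.

```python
import os
from pathlib import Path


def env_path(name: str, default: str) -> Path:
    """Read a directory path from an environment variable, with a fallback."""
    raw = os.environ.get(name, default)
    return Path(raw).expanduser()


def env_int(name: str, default: int) -> int:
    """Read an integer setting, failing loudly on a malformed value."""
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        return int(raw)
    except ValueError as exc:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from exc
```

A script would then call something like `env_path("DOUTO_DATA_DIR", "data")` (a made-up variable name) instead of reading `os.environ` directly, so defaults and error messages stay consistent across scripts.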
2. Tests (Highest Impact)
3. Knowledge Base Content
- Populate empty MOCs (MOC_CONSUMIDOR, MOC_TRIBUTARIO, MOC_CONSTITUCIONAL, MOC_COMPLIANCE, MOC_SUCESSOES)
- Catalog new books into existing MOCs
- Create atomic notes (when the `nodes/` system is implemented)
4. Documentation
- Improve these docs
- Add inline code comments
- Update README sections
5. Bug Reports
File issues with clear reproduction steps. Include:
- Which script and command you ran
- The error message or unexpected behavior
- Your environment (OS, Python version, dependency versions)
- The values of relevant environment variables (redact API keys)
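
One way to gather the environment details listed above is a small helper script. The sketch below is an assumption about how that might look, not an existing Douto tool: the package and variable names in the `__main__` block are placeholders, and the redaction rule (hide anything whose name contains KEY, TOKEN, SECRET, or PASSWORD) is a simple heuristic.

```python
import os
import platform
import sys
from importlib import metadata

SECRET_MARKERS = ("KEY", "TOKEN", "SECRET", "PASSWORD")


def redact(name: str, value: str) -> str:
    """Hide the value of any variable whose name looks like a credential."""
    if any(marker in name.upper() for marker in SECRET_MARKERS):
        return "<redacted>"
    return value


def package_version(package: str) -> str:
    """Return the installed version of a package, or a placeholder if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(packages: list[str], env_names: list[str]) -> str:
    """Build a plain-text block suitable for pasting into an issue."""
    lines = [
        f"OS: {platform.platform()}",
        f"Python: {sys.version.split()[0]}",
    ]
    lines += [f"{pkg}: {package_version(pkg)}" for pkg in packages]
    for name in env_names:
        value = os.environ.get(name)
        shown = "<unset>" if value is None else redact(name, value)
        lines.append(f"{name}={shown}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder names; substitute the packages and variables the script uses.
    print(environment_report(["numpy"], ["EXAMPLE_API_KEY", "EXAMPLE_DATA_DIR"]))
```

Review the output before posting it: a name-based heuristic will miss secrets stored under unusual variable names.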
Contribution Workflow
Step 1: Find or Create an Issue
Check the existing issues and roadmap for work to pick up. If you’re fixing a new bug or proposing a feature, open an issue first to discuss the approach.
Step 2: Create a Branch

    git fetch origin
    git checkout main
    git pull origin main

    # Feature branch
    git checkout -b feat/SEN-XXX-short-description

    # Bug fix branch
    git checkout -b fix/SEN-XXX-short-description

    # Documentation branch
    git checkout -b docs/short-description

Step 3: Make Changes
Follow the Coding Conventions. Key checkpoints:
- No hardcoded absolute paths
- Type hints on public functions
- `--dry-run` support if the script modifies data
- Specific exception handling (no broad `except Exception`)
- Structured logging for important events
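
The checkpoints above can be sketched in a single script skeleton. This is an illustrative example under stated assumptions, not one of the real pipeline scripts: the JSON input and output, the logger name `douto.example`, and all function names are hypothetical, and the logging setup is only one simple way to get consistent, leveled log lines.

```python
import argparse
import json
import logging
from pathlib import Path

logger = logging.getLogger("douto.example")


def load_records(path: Path) -> list[dict]:
    """Load a JSON list of records, handling only the errors we expect."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        logger.error("input file missing: %s", path)
        raise
    except json.JSONDecodeError as exc:
        logger.error("invalid JSON in %s: %s", path, exc)
        raise


def write_records(records: list[dict], path: Path, dry_run: bool) -> int:
    """Write records unless dry_run is set; return the record count either way."""
    if dry_run:
        logger.info("dry run: would write %d records to %s", len(records), path)
        return len(records)
    path.write_text(json.dumps(records, ensure_ascii=False), encoding="utf-8")
    logger.info("wrote %d records to %s", len(records), path)
    return len(records)


def build_parser() -> argparse.ArgumentParser:
    """Paths come from the command line, so nothing is hardcoded."""
    parser = argparse.ArgumentParser(description="Hypothetical pipeline step")
    parser.add_argument("input", type=Path)
    parser.add_argument("output", type=Path)
    parser.add_argument("--dry-run", action="store_true",
                        help="report what would change without writing")
    return parser


def main(argv: list[str]) -> int:
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
    )
    args = build_parser().parse_args(argv)
    records = load_records(args.input)
    write_records(records, args.output, dry_run=args.dry_run)
    return 0
```

Calling `main(["in.json", "out.json", "--dry-run"])` logs what would be written and leaves `out.json` untouched; in a real script, `main(sys.argv[1:])` would be the entry point.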
Step 4: Test Your Changes
Since there are no automated tests yet, verify manually:

    # For pipeline changes: test with a small subset
    python3 pipeline/rechunk_v3.py --dry-run --limit 5

    # For search changes: run a known query and check results
    python3 pipeline/search_doutrina_v2.py "boa-fe objetiva" --area contratos

When tests exist (v0.3+):

    make test
    make lint

Step 5: Commit
Use Conventional Commits format:

    git add pipeline/rechunk_v3.py
    git commit -m "feat: add bibliography detection to rechunker -- SEN-XXX"

| Prefix | When to use |
|---|---|
| `feat:` | New functionality |
| `fix:` | Bug fix |
| `docs:` | Documentation changes |
| `refactor:` | Code restructuring without behavior change |
| `test:` | Adding or updating tests |
| `chore:` | Build, dependencies, tooling |
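
To make the prefix rule concrete, here is a small checker for the first line of a commit message. It is a sketch that covers only the six prefixes in the table, followed by a colon and a space. The Conventional Commits specification also allows optional scopes such as `feat(parser):`, which this example deliberately does not accept. The names `ALLOWED_PREFIXES` and `is_valid_subject` are made up for illustration.

```python
import re

ALLOWED_PREFIXES = ("feat", "fix", "docs", "refactor", "test", "chore")

# "<prefix>: <description>", for example "fix: handle empty chunks"
COMMIT_PATTERN = re.compile(rf"^({'|'.join(ALLOWED_PREFIXES)}): \S.*$")


def is_valid_subject(message: str) -> bool:
    """Return True when the first line of a commit message uses an allowed prefix."""
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    return COMMIT_PATTERN.match(first_line) is not None
```

A function like this could back a local `commit-msg` hook, though whether the project wants such a hook is a separate decision.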
Step 6: Push and Open a PR

    git push -u origin feat/SEN-XXX-short-description

Then open a pull request on GitHub targeting `main`.
Pull Request Guidelines
PR Checklist
- Focused scope: one feature or fix per PR. Don’t mix unrelated changes.
- Clear description: explain what changed and why. Link to the issue or roadmap feature.
- No breaking changes unless discussed and approved in the issue.
- Tests: add tests for new functionality (when the test framework exists).
- Documentation: update docs if user-facing behavior changes.
- No secrets: double-check that `.env` files, API keys, or large data files are not included.
- No formatting noise: don’t include unrelated whitespace or style changes.
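
Part of the "No secrets" check can be automated. The sketch below flags staged paths that look like `.env` files or exceed a size limit. It is a hypothetical helper: the 5 MiB threshold and the exemption for `.env.example` and `.env.sample` are assumptions, and it does not scan file contents for API keys.

```python
from pathlib import PurePosixPath

MAX_BYTES = 5 * 1024 * 1024  # assumed threshold; pick what suits the repo


def is_env_file(path: str) -> bool:
    """Match .env and variants such as .env.local, but not example templates."""
    name = PurePosixPath(path).name
    if name in (".env.example", ".env.sample"):
        return False
    return name == ".env" or name.startswith(".env.")


def risky_files(staged: dict[str, int], max_bytes: int = MAX_BYTES) -> list[str]:
    """Return staged paths that look like secrets files or exceed the size limit.

    `staged` maps each path to its size in bytes.
    """
    return sorted(
        path for path, size in staged.items()
        if is_env_file(path) or size > max_bytes
    )
```

The input mapping could be built from the output of `git diff --cached --name-only` plus a file-size lookup; an empty result means nothing obvious was caught, not that the PR is free of secrets.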
PR Title Format

    feat: short description of what this PR does
    fix: what was broken and how it's fixed
    docs: what documentation was updated

PR Body Template

    ## What
    Brief description of the change.

    ## Why
    Link to issue, roadmap feature, or explain the motivation.

    ## How
    Key implementation decisions. What alternatives were considered.

    ## Testing
    How you verified the change works (manual steps or test commands).

Knowledge Base Contributions
Special guidelines for changes to the `knowledge/` directory:
Adding a New Book to a MOC
- Open the relevant MOC file (e.g., `knowledge/mocs/MOC_CIVIL.md`)
- Add the book entry with all required metadata:

      ### Titulo do Livro
      - **Autor:** Nome do Autor
      - **Editora:** Editora
      - **Edicao:** Xa edicao, ANO
      - **Chunks:** (pending processing)
      - **Status:** catalogado

- Verify the book’s legal domain matches the MOC
- Do not update `INDEX_DOUTO.md` unless adding a new domain
Adding a New Domain/MOC
- Create `knowledge/mocs/MOC_{DOMAIN}.md` with required frontmatter
- Add the domain to `knowledge/INDEX_DOUTO.md` with a wikilink
- Update the documentation

For any knowledge base change:
- Use wikilinks (`[[target]]`) for all internal references
- Follow the frontmatter schema defined in Conventions
- Set `status_enriquecimento` correctly: never leave it as `"pendente"` after enrichment
- Include book metadata: title, author, edition, publisher
- Verify links resolve correctly (open in Obsidian to check)
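
Opening the vault in Obsidian remains the reference check, but a rough script can catch missing link targets before that. The sketch below assumes that a wikilink target resolves to a `.md` file with the same base name somewhere under the vault directory; that simplifies Obsidian's real resolution rules and ignores path-qualified targets. The function names are hypothetical.

```python
import re
from pathlib import Path

# Matches [[target]], [[target|alias]] and [[target#heading]]; captures the target only.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def extract_targets(markdown: str) -> list[str]:
    """Return every wikilink target in the text, in order of appearance."""
    return [m.group(1).strip() for m in WIKILINK.finditer(markdown)]


def broken_links(note: Path, vault: Path) -> list[str]:
    """Return wikilink targets in `note` with no matching .md file under `vault`."""
    known = {p.stem for p in vault.rglob("*.md")}
    text = note.read_text(encoding="utf-8")
    return [target for target in extract_targets(text) if target not in known]
```

For example, `broken_links(Path("knowledge/INDEX_DOUTO.md"), Path("knowledge"))` would list any domain linked from the index that has no MOC file yet.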
Issue Tracking
For now:
- File issues on GitHub with descriptive labels
- Reference Linear ticket numbers in commits and PRs when applicable
Suggested labels:
| Label | Color | Use for |
|---|---|---|
| `tech-debt` | red | Hardcoded paths, missing tests, duplicated code |
| `pipeline` | blue | Changes to pipeline scripts |
| `knowledge-base` | green | MOC updates, new books, atomic notes |
| `integration` | purple | sens.legal ecosystem integration |
| `testing` | yellow | Test infrastructure and coverage |
| `documentation` | gray | Docs improvements |
Ownership & Attribution
Douto is the exclusive property of Diego Sens (@sensdiego). All contributions are made under the project’s license terms.
When commits involve AI assistance, use the mandatory format:

    Co-Authored-By (execucao): Claude Opus 4.6 <noreply@anthropic.com>

The term `(execucao)`, Portuguese for "execution", indicates that AI assisted with implementation. Conception, architecture, product decisions, and intellectual property remain with the author.