feat: add directory parsing support to OpenViking (#194)

Merged: MaojiaSheng merged 4 commits into volcengine:main from shaoeric:main on Feb 16, 2026

@shaoeric
Contributor

Description

  • Implemented DirectoryParser to handle local directories with mixed document types.
  • Enhanced the add_resource function to support directory imports, with options to include or exclude files and to ignore specific directories.
  • Updated client and service layers to forward additional parsing options.
  • Added unit tests for DirectoryParser to ensure correct functionality and error handling.
  • Improved user feedback with rich table summaries for processed, failed, unsupported, and skipped files during directory imports.
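The include, exclude, and ignore-directory options described above can be illustrated with a minimal sketch. This is not the DirectoryParser code from this PR: the function name `scan_directory`, the `SUPPORTED_SUFFIXES` set, and the exact matching rules (glob patterns matched against file names, ignored directories given relative to the root) are assumptions made for illustration only.

```python
import fnmatch
import os
from pathlib import Path

# Hypothetical list of parseable extensions; the real parser's list is not shown in this PR.
SUPPORTED_SUFFIXES = {".md", ".txt", ".pdf", ".py"}


def scan_directory(root, include=None, exclude=None, ignore_dirs=None):
    """Walk `root` and sort files into processed / unsupported / skipped buckets."""
    root = Path(root)
    include = list(include or [])
    exclude = list(exclude or [])
    # Ignored directories are compared as POSIX-style paths relative to root.
    ignored = {Path(d).as_posix().strip("/") for d in (ignore_dirs or [])}
    result = {"processed": [], "unsupported": [], "skipped": []}

    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = Path(dirpath).relative_to(root)
        # Prune ignored directories in place so os.walk never descends into them.
        dirnames[:] = [
            d for d in sorted(dirnames)
            if (rel_dir / d).as_posix() not in ignored
        ]
        for name in sorted(filenames):
            rel = (rel_dir / name).as_posix()
            if include and not any(fnmatch.fnmatch(name, p) for p in include):
                result["skipped"].append(rel)      # not selected by --include
            elif any(fnmatch.fnmatch(name, p) for p in exclude):
                result["skipped"].append(rel)      # rejected by --exclude
            elif Path(name).suffix.lower() not in SUPPORTED_SUFFIXES:
                result["unsupported"].append(rel)  # no parser for this type
            else:
                result["processed"].append(rel)
    return result
```

In this sketch, files under an ignored directory are never visited, so they appear in none of the buckets; the actual implementation may report them differently.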

Related Issue

issue: #80
discussion: #83

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have tested this on the following platforms:
    • Linux
    • macOS
    • Windows

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Screenshots (if applicable)

demo directory
demo_dir.tar.gz

(openviking) amax@amax:~/projects/OpenViking$ tree -a demo_dir
demo_dir
├── assets
│   ├── 图片.png
│   └── README.md
├── code
│   ├── 1.py
│   ├── README.md
│   └── src
│       └── a.py
├── .ignore
│   └── ignore.txt
├── README.md
└── resource
    ├── 1503.pdf
    ├── b
    │   └── c.txt
    └── README.md

6 directories, 10 files

add script:

cd examples/query/
python add.py path/to/demo_dir --data mydata_dir --exclude "*.png" --ignore-dirs "code/src"
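As a rough illustration of what these two flags would be expected to filter out of the demo tree, the sketch below applies them to the file list from the `tree` output. The helper `survives` and its matching rules are assumptions, not the project's code; in particular, whether the real parser also skips hidden directories such as `.ignore` is not stated here, so the sketch keeps them.

```python
import fnmatch
from pathlib import PurePosixPath

# Relative paths copied from the demo_dir tree listing above.
DEMO_FILES = [
    "assets/图片.png",
    "assets/README.md",
    "code/1.py",
    "code/README.md",
    "code/src/a.py",
    ".ignore/ignore.txt",
    "README.md",
    "resource/1503.pdf",
    "resource/b/c.txt",
    "resource/README.md",
]


def survives(rel_path, exclude, ignore_dirs):
    """Return True if a file passes the (assumed) exclude / ignore-dirs rules."""
    path = PurePosixPath(rel_path)
    # Every ancestor directory of the file, e.g. {"code/src", "code", "."}.
    ancestors = {p.as_posix() for p in path.parents}
    if any(d.strip("/") in ancestors for d in ignore_dirs):
        return False
    # Exclude patterns are matched against the file name only.
    return not any(fnmatch.fnmatch(path.name, pat) for pat in exclude)


kept = [f for f in DEMO_FILES if survives(f, ["*.png"], ["code/src"])]
dropped = [f for f in DEMO_FILES if f not in kept]
print("kept:", kept)
print("dropped:", dropped)
```

Under these assumptions, `assets/图片.png` and `code/src/a.py` are dropped and the other eight files remain candidates for parsing.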

The resulting file structure and results are below:
mydata_dir.tar.gz

query script:

python query.py "告诉我论文一共多少个参考文献" --data mydata_dir

(The query asks: "Tell me how many references the paper has in total.")

The answer returned was correct.

@MaojiaSheng
Collaborator

great job

MaojiaSheng requested a review from qin-ctx on February 16, 2026 at 07:27
MaojiaSheng merged commit 780e36a into volcengine:main on Feb 16, 2026
5 checks passed
github-project-automation bot moved this from Backlog to Done in the OpenViking project on Feb 16, 2026