# DataBridge Core
[License](https://github.com/databridge-org/databridge-core/tree/main?tab=License-1-ov-file#readme)
[PyPI](https://pypi.org/project/databridge-client/)
[Discord](https://discord.gg/BwMtv3Zaju)

**Note:** DataBridge is planning to launch a hosted service soon. If you are interested, please sign up for the [waitlist](https://docs.google.com/forms/d/1gFoUKzECICugInLkRlAlgwrkRVorfNywAgkmcjmVGkE/edit).
DataBridge is a document processing and retrieval system for building document-based applications. It ingests documents, splits them into chunks, embeds them, and exposes semantic search and context-aware completions over an HTTP API.
## Documentation
For detailed information about installation, usage, and development:
- [Installation Guide](https://databridge.gitbook.io/databridge-docs/getting-started/installation)
- [Quick Start Guide](https://databridge.gitbook.io/databridge-docs/getting-started/quickstart)
- [API Reference](https://databridge.gitbook.io/databridge-docs/api-reference/overview)
## Core Features
- 🔍 **Semantic Search & Retrieval**
  - Intelligent chunk-based document splitting
  - Two-stage ranking with vector similarity and neural reranking
  - Advanced filtering and metadata support
  - Configurable similarity thresholds and result limits
- 📄 **Document Processing**
  - Support for PDFs, Word documents, text files, and more
  - Intelligent text extraction with structure preservation
  - Video content parsing with transcription and metadata extraction
  - Automatic chunk generation and embedding
  - Metadata and access control management
- 🔌 **Extensible Architecture**
  - Modular design with swappable components
  - Support for custom parsers and embedding models
  - Flexible storage backends (S3, local, etc.)
  - Vector store integrations (PostgreSQL with pgvector)
- 🔐 **Security & Access Control**
  - Fine-grained document access control
  - Reader/Writer/Admin permission levels
  - JWT-based authentication
  - API key management
- 💻 **Deployment Options**
  - Full local deployment support with Ollama for embeddings
  - Cloud deployment with managed services
  - Hybrid deployment options
  - Docker container support
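
The two-stage ranking listed under Semantic Search & Retrieval can be illustrated with a small, self-contained sketch. It is not DataBridge's implementation: `two_stage_search`, its parameters, and the toy `rerank_score` callable are hypothetical names chosen for illustration, and a real deployment would use an embedding model and a neural reranker in their place.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def two_stage_search(
    query_text: str,
    query_vec: Vector,
    chunks: List[Tuple[str, Vector]],           # (chunk text, chunk embedding)
    rerank_score: Callable[[str, str], float],  # hypothetical stand-in for a neural reranker
    k: int = 4,                                 # result limit
    candidates: int = 20,                       # shortlist size passed to stage 2
    min_similarity: float = 0.0,                # similarity threshold
) -> List[str]:
    # Stage 1: cheap vector similarity over every chunk, filtered by the threshold.
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    shortlist = sorted(
        (item for item in scored if item[0] >= min_similarity),
        key=lambda item: item[0],
        reverse=True,
    )[:candidates]
    # Stage 2: the more expensive reranker reorders only the shortlist.
    reranked = sorted(
        shortlist,
        key=lambda item: rerank_score(query_text, item[1]),
        reverse=True,
    )
    return [text for _, text in reranked[:k]]


def word_overlap(query: str, text: str) -> float:
    """Toy reranker: counts words shared by the query and the chunk."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))
```

The similarity threshold and result limit map onto `min_similarity` and `k`; the shortlist size controls how much work the second stage does.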
## Key Endpoints
- **Document Operations**
  - `POST /ingest/text`: Ingest text content
  - `POST /ingest/file`: Ingest file (PDF, DOCX, video, etc.)
  - `GET /documents`: List all documents
  - `GET /documents/{doc_id}`: Get document details
  - `DELETE /documents/{doc_id}`: Delete a document
- **Search & Retrieval**
  - `POST /retrieve/chunks`: Search document chunks
  - `POST /retrieve/docs`: Search complete documents
  - `POST /query`: Generate completions using context
  - `GET /documents/{doc_id}/chunks`: Get document chunks
- **System Operations**
  - `GET /health`: System health check
  - `GET /usage/stats`: Get usage statistics
  - `GET /usage/recent`: Get recent operations
  - `POST /api-keys`: Generate API keys
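
As a rough illustration of calling these endpoints, the sketch below builds requests with Python's standard library. Only the HTTP methods and paths come from the list above. The base URL, the `Bearer` authorization header, and the JSON field names (`content`, `metadata`, `query`, `k`) are assumptions made for this example; consult the API Reference for the actual request schemas.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://localhost:8000"  # assumption: address of a local deployment


def build_request(
    method: str,
    path: str,
    token: Optional[str] = None,
    payload: Optional[Dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a JSON request for one of the endpoints."""
    headers: Dict[str, str] = {}
    if token is not None:
        # Assumption: the JWT is passed as a bearer token.
        headers["Authorization"] = f"Bearer {token}"
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def send(request: urllib.request.Request) -> Any:
    """Send a request and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical payloads: the field names below are illustrative, not confirmed.
ingest = build_request(
    "POST",
    "/ingest/text",
    token="YOUR_TOKEN",
    payload={"content": "DataBridge stores and retrieves documents.", "metadata": {"source": "readme"}},
)
search = build_request(
    "POST",
    "/retrieve/chunks",
    token="YOUR_TOKEN",
    payload={"query": "How are documents stored?", "k": 4},
)
health = build_request("GET", "/health")
```

Passing any of these objects to `send(...)` performs the call against a running server; building the request separately makes it easy to inspect the method, URL, and body first.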
## License
This project is licensed under the MIT License; see the [LICENSE](LICENSE) file for details.
## Contributing
We welcome contributions! Please open an issue or submit a pull request.
---
Built with ❤️ by DataBridge