# DataBridge Core
[![License](https://img.shields.io/badge/license-MIT-blue)](https://github.com/databridge-org/databridge-core/tree/main?tab=License-1-ov-file#readme)
[![PyPI - Version](https://img.shields.io/pypi/v/databridge-client)](https://pypi.org/project/databridge-client/)
[![Discord](https://img.shields.io/discord/1336524712817332276?logo=discord&label=discord)](https://discord.gg/BwMtv3Zaju)

![DataBridge Demo](db_atf_demo_hq.gif)

**Note:** DataBridge plans to launch a hosted service soon. If you are interested, please sign up for the [waitlist](https://docs.google.com/forms/d/1gFoUKzECICugInLkRlAlgwrkRVorfNywAgkmcjmVGkE/edit).

DataBridge is a document processing and retrieval system for building document-based applications. It provides a foundation for semantic search, document processing, and AI-powered document interactions.
## Documentation
For detailed information about installation, usage, and development:
- [Installation Guide](https://databridge.gitbook.io/databridge-docs/getting-started/installation)
- [Quick Start Guide](https://databridge.gitbook.io/databridge-docs/getting-started/quickstart)
- [API Reference](https://databridge.gitbook.io/databridge-docs/api-reference/overview)
## Core Features
- 🔍 **Semantic Search & Retrieval**
  - Intelligent chunk-based document splitting
  - Two-stage ranking with vector similarity and neural reranking
  - Advanced filtering and metadata support
  - Configurable similarity thresholds and result limits
- 📄 **Document Processing**
  - Support for PDFs, Word documents, text files, and more
  - Intelligent text extraction with structure preservation
  - Video content parsing with transcription and metadata extraction
  - Automatic chunk generation and embedding
  - Metadata and access control management
- 🔌 **Extensible Architecture**
  - Modular design with swappable components
  - Support for custom parsers and embedding models
  - Flexible storage backends (S3, local, etc.)
  - Vector store integrations (PostgreSQL with pgvector)
- 🔐 **Security & Access Control**
  - Fine-grained document access control
  - Reader/Writer/Admin permission levels
  - JWT-based authentication
  - API key management
- 💻 **Deployment Options**
  - Full local deployment support with Ollama for embeddings
  - Cloud deployment with managed services
  - Hybrid deployment options
  - Docker container support
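
To make the retrieval features above concrete, here is a minimal, self-contained sketch of chunk-based splitting followed by two-stage ranking with a similarity threshold and a result limit. It is not DataBridge's implementation: the function names, the word-window chunking policy, the bag-of-words "embedding", and the term-overlap "reranker" are all hypothetical stand-ins for a real embedding model and neural reranker.

```python
import math
from collections import Counter


def split_into_chunks(text, chunk_size=8, overlap=2):
    """Split text into overlapping word windows (illustrative policy only)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, last_start, step)]


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words term count."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank_score(query, chunk):
    """Toy stand-in for a neural reranker: fraction of query terms in the chunk."""
    query_terms = set(query.lower().split())
    chunk_terms = set(chunk.lower().split())
    return len(query_terms & chunk_terms) / len(query_terms) if query_terms else 0.0


def retrieve(query, chunks, k=3, min_score=0.1, candidates=10):
    """Stage 1: shortlist by vector similarity. Stage 2: rerank the shortlist."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in chunks]
    # Apply the similarity threshold, then keep the best `candidates` chunks.
    shortlist = sorted(
        (pair for pair in scored if pair[0] >= min_score),
        key=lambda pair: pair[0],
        reverse=True,
    )[:candidates]
    # Rerank the shortlist with the second-stage scorer and apply the result limit.
    reranked = sorted(shortlist, key=lambda pair: rerank_score(query, pair[1]), reverse=True)
    return [chunk for _, chunk in reranked[:k]]
```

The first stage is cheap and runs over every chunk; the second stage is assumed to be more expensive, so it only sees the shortlist. `min_score` and `k` play the role of the configurable similarity threshold and result limit.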
## Key Endpoints
- **Document Operations**
  - `POST /ingest/text`: Ingest text content
  - `POST /ingest/file`: Ingest file (PDF, DOCX, video, etc.)
  - `GET /documents`: List all documents
  - `GET /documents/{doc_id}`: Get document details
  - `DELETE /documents/{doc_id}`: Delete a document
- **Search & Retrieval**
  - `POST /retrieve/chunks`: Search document chunks
  - `POST /retrieve/docs`: Search complete documents
  - `POST /query`: Generate completions using context
  - `GET /documents/{doc_id}/chunks`: Get document chunks
- **System Operations**
  - `GET /health`: System health check
  - `GET /usage/stats`: Get usage statistics
  - `GET /usage/recent`: Get recent operations
  - `POST /api-keys`: Generate API keys
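
As a rough illustration of calling these endpoints over HTTP, the sketch below builds requests with Python's standard library. Only the methods and paths come from the list above. The base URL, the `Authorization: Bearer` header format, and the JSON field names (`content`, `metadata`, `query`, `k`) are assumptions made for illustration; check the API Reference for the actual request schema.

```python
import json
import urllib.request

# Assumption: a locally running server; adjust to your deployment.
BASE_URL = "http://localhost:8000"


def build_request(method, path, token, payload=None):
    """Build (but do not send) an authenticated JSON request for an endpoint."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    # Assumption: the JWT is passed as a Bearer token.
    headers = {"Authorization": f"Bearer {token}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def send(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical payloads: the field names below are illustrative, not confirmed.
ingest = build_request(
    "POST", "/ingest/text", "YOUR_JWT",
    {"content": "DataBridge stores documents as chunks.", "metadata": {"source": "readme"}},
)
search = build_request("POST", "/retrieve/chunks", "YOUR_JWT", {"query": "chunks", "k": 3})
health = build_request("GET", "/health", "YOUR_JWT")

# With a server running, each request could then be sent, for example:
# print(send(health))
```

Building the request separately from sending it keeps the example inspectable without a running server.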
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Contributing
We welcome contributions! Please open an issue or submit a pull request.

---

Built with ❤️ by DataBridge