Models & Datasets

Open-source foundation models and high-quality datasets for the community.

Language Model

1.2M

15k

OmniCortex-7B

A highly efficient 7B-parameter model optimized for reasoning and coding tasks. Outperforms Llama-2-13B on standard benchmarks.

NLP, Coding, Reasoning

Multimodal

850k

12k

OmniVision-Pro

State-of-the-art vision-language model capable of detailed image captioning, visual question answering (VQA), and object detection.

Vision, Multimodal

Audio Generation

500k

Cortex-Audio-2

High-fidelity text-to-audio generation model supporting sound effects, music, and speech in 40+ languages.

Audio, TTS, Music

Dataset

2.5M

18k

Omni-Instruct-Dataset

A curated dataset of 5M high-quality instruction-following pairs, filtered for safety and educational value.

Dataset, SFT, RLHF

Code Model

300k

CodeCortex-34B

Specialized coding model trained on 1T tokens of code across 50+ programming languages. Supports fill-in-the-middle (FIM) completion and long context.

Code, Python, JavaScript
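For readers unfamiliar with FIM (fill-in-the-middle), the sketch below shows the general idea: the code before and after a gap is packed into one prompt, and the model generates the missing middle. The sentinel strings and the helper names `build_fim_prompt` and `splice_completion` are hypothetical placeholders chosen for illustration. This listing does not state which tokens or prompt layout CodeCortex-34B actually uses, so check the model's own documentation before relying on any of this.

```python
# Hypothetical sentinel strings; the real tokens for CodeCortex-34B are not
# given in this listing and must be taken from the model's own documentation.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Arrange the code around a gap in prefix-suffix-middle order.

    The model is expected to generate the missing middle after FIM_MIDDLE.
    """
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def splice_completion(prefix: str, completion: str, suffix: str) -> str:
    """Insert the generated middle back between the prefix and the suffix."""
    return prefix + completion + suffix


if __name__ == "__main__":
    before = "def add(a, b):\n    "
    after = "\n\nprint(add(1, 2))\n"

    # The prompt that would be sent to a FIM-capable model.
    print(build_fim_prompt(before, after))

    # A completion such as "return a + b" would be spliced back like this.
    print(splice_completion(before, "return a + b", after))
```

The prefix-suffix-middle ordering used here is one common layout. Some models expect the suffix first, so the ordering is an assumption as well.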

Benchmark

100k

Safety-Bench-v2

Comprehensive safety evaluation suite for testing LLM robustness against jailbreaks, bias, and toxicity.

Safety, Eval

Explore our full library on Hugging Face

Access all our models, datasets, and demos directly on the Hugging Face Hub. Join the community discussion and contribute.