Models & Datasets

Open-source foundation models and high-quality datasets for the community.

Language Model
1.2M
15k
OmniCortex-7B

A highly efficient 7B-parameter model optimized for reasoning and coding tasks. Outperforms Llama-2-13B on standard benchmarks.

NLP, Coding, Reasoning
Multimodal
850k
12k
OmniVision-Pro

State-of-the-art vision-language model capable of detailed image captioning, visual QA, and object detection.

Vision, Multimodal
Audio Generation
500k
8k
Cortex-Audio-2

High-fidelity text-to-audio generation model supporting sound effects, music, and speech in 40+ languages.

Audio, TTS, Music
Dataset
2.5M
18k
Omni-Instruct-Dataset

A curated dataset of 5M high-quality instruction-following pairs, filtered for safety and educational value.

Dataset, SFT, RLHF
Code Model
300k
5k
CodeCortex-34B

Specialized coding model trained on 1T tokens of code across 50+ programming languages. Supports fill-in-the-middle (FIM) completion and long-context inputs.

Code, Python, JavaScript
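
For readers unfamiliar with FIM, here is a minimal sketch of how a fill-in-the-middle prompt is typically assembled in prefix-suffix-middle order. The sentinel strings and the helper names `build_fim_prompt` and `splice_completion` are hypothetical; this page does not state which special tokens CodeCortex-34B uses, so check the model card before relying on them.

```python
# Hypothetical sentinel strings (assumption): the real special tokens
# for CodeCortex-34B are not given on this page.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Arrange the code around a gap in prefix-suffix-middle order.

    The model is expected to generate the missing middle after the
    final sentinel.
    """
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def splice_completion(prefix: str, completion: str, suffix: str) -> str:
    """Insert the generated middle back between prefix and suffix."""
    return prefix + completion + suffix


# Code before and after the gap the model should fill.
prefix = "def add(a, b):\n    return "
suffix = "\n\nprint(add(1, 2))\n"

prompt = build_fim_prompt(prefix, suffix)

# Stand-in for a model response; a real call would send `prompt` to the model.
completion = "a + b"
filled = splice_completion(prefix, completion, suffix)
```

The prompt would be sent to the model as ordinary text, and the generated tokens after the final sentinel are spliced back into the gap.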
Benchmark
100k
3k
Safety-Bench-v2

Comprehensive safety evaluation suite for testing LLM robustness against jailbreaks, bias, and toxicity.

Safety, Eval

Explore our full library on Hugging Face

Access all our models, datasets, and demos directly on the Hugging Face Hub. Join the community discussion and contribute.