📋 Dataset Details & Samples
Text
Standard chat messages and channel posts.
Videos
Circle videos and round message files.
Voices
Voice messages and audio recordings.
Photos
Compressed and high-res image assets.
Code
Source code repositories and metadata.
📊 Text Statistics
Total messages
200,000,000,000
Total words
6,500,000,000,000
Unique tokens
450,000,000,000
Languages
100+
Top languages
RU, EN, FA, ZH, AR
Time range
2015 - 2024
📝 Sample Record JSON
{
  "message_id": "0089005807967876799",
  "user_id_masked": "13838278652382397942",
  "group_id_masked": "2046477359139353561",
  "message_type": "COMMENT",
  "message_date": "2017-01-24 00:10:58",
  "message_text": "I am happy to run it on our servers, if you have some instructions? Is it just swapping -ltcmalloc with he?",
  "reactions": "[]",
  "discussion_top_message_id": "92",
  "dissucssion_reply_to_id": "93",
  "message_public_url": "https://t.me/clickhouse_en/93"
}
💬 Text Data Stream
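A minimal sketch of how a record from this stream might be loaded, assuming every record has the same shape as the sample record above. The helper name `parse_record` is hypothetical, and the timestamp format is inferred from the one sample value; the time zone is not stated in the source. Field names are used exactly as they appear in the sample, including the spelling `dissucssion_reply_to_id`.

```python
import json
from datetime import datetime

# Sample record copied from the page; keys are kept verbatim.
RAW_RECORD = """
{
  "message_id": "0089005807967876799",
  "user_id_masked": "13838278652382397942",
  "group_id_masked": "2046477359139353561",
  "message_type": "COMMENT",
  "message_date": "2017-01-24 00:10:58",
  "message_text": "I am happy to run it on our servers, if you have some instructions? Is it just swapping -ltcmalloc with he?",
  "reactions": "[]",
  "discussion_top_message_id": "92",
  "dissucssion_reply_to_id": "93",
  "message_public_url": "https://t.me/clickhouse_en/93"
}
"""


def parse_record(raw: str) -> dict:
    """Decode one record (hypothetical helper, not part of the dataset tooling)."""
    record = json.loads(raw)
    # "reactions" holds a JSON-encoded string, so it needs a second decode.
    record["reactions"] = json.loads(record["reactions"])
    # Assumption: timestamps follow "YYYY-MM-DD HH:MM:SS", as in the sample.
    record["message_date"] = datetime.strptime(
        record["message_date"], "%Y-%m-%d %H:%M:%S"
    )
    # IDs are deliberately left as strings (see the note below).
    return record


record = parse_record(RAW_RECORD)
print(record["message_type"], record["message_date"].isoformat())
print(record["reactions"])
```

Leaving the ID fields as strings is a deliberate choice in this sketch: `message_id` in the sample carries a leading zero that an integer conversion would drop, and the sample `user_id_masked` value is larger than the signed 64-bit maximum, so it would overflow in systems that store IDs as signed 64-bit integers.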
🌋 Longest Reply Chain
Chronological deep-dive into the longest identified conversation thread.