Every day, 15 billion messages pass through the Telegram bot ecosystem—not in theory, but right now, just as you’re reading this. Several million requests are being processed somewhere. There are more than ten million active bots, serving upwards of one billion users. These aren’t just autoresponders with a couple of commands.
When I launched my first bot on a $5 VPS about five years ago, it coped fine with 200 users. At 300, memory usage became a pain, and by 500 users, the bot froze. Classic mistakes: synchronous code, polling every second, using SQLite without indexes. Writing a bot isn’t just about handling a couple of commands—the moment it goes into production, things get complicated.
A real bot architecture faces everything from concurrent database access, race conditions, and memory leaks to Telegram’s API rate limits—roughly 30 messages per second, max. I’ve learned the hard way: try a mass mailing to 100,000 users and Telegram will ban your bot for flooding. And that’s just the start—scaling, fault tolerance, and production monitoring come next.
Python is double-edged here. Rapid development is possible thanks to asyncio (since 3.7) and thousands of libraries. But then there’s dynamic typing, a garbage collector that drags at the worst moment, and the GIL—so you have to know the traps and squeeze out performance to handle tens of thousands of requests per second. I’ve kept pure Python bots running on a single server for over half a million users.
How I Run Telegram Bots
A bot is the middleman between my code and Telegram’s servers. There’s no persistent connection—it would be too expensive for such huge infrastructure. Instead, it’s classic HTTPS. Every bot is created through BotFather. I get a token like this:
```
1234567890:ABCdefGHIjklMNOpqrsTUVwxyz-1234567
```
First segment: the bot ID. Second: the secret key. Guard it—anyone with the token can control your bot.
Every API request passes this token directly in the URL:
```
https://api.telegram.org/bot<TOKEN>/methodName
```
It’s HTTPS only—otherwise you’re leaking the token. Telegram takes parameters as JSON (POST), or as form-data for file uploads (photos, music). When sending media, a file_id is only reusable by the same bot. Files live on Telegram’s CDN and are accessed through dedicated endpoints.
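To make that concrete, here’s a minimal sketch of a raw sendMessage call over aiohttp—the URL and JSON shape follow the Bot API convention above; the token is a placeholder:
```python
import aiohttp

TOKEN = "1234567890:ABC..."  # placeholder - load from env, never hardcode

async def send_message(chat_id: int, text: str) -> dict:
    # One HTTPS POST with JSON parameters - that's the whole protocol
    url = f"https://api.telegram.org/bot{TOKEN}/sendMessage"
    async with aiohttp.ClientSession() as session:
        async with session.post(url, json={"chat_id": chat_id, "text": text}) as resp:
            return await resp.json()
```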
Each user message becomes an Update object—a big JSON with metadata (chat ID, user ID, timestamp, message type, etc). These can be kilobytes if there’s embedded media.
There are two ways to pick up updates: getUpdates (polling) and webhooks.
- Polling: The bot keeps asking “anything new?” every second. Long polling keeps the connection open for up to 60s. If the bot’s slow or crashes, updates pile up—Telegram will only hold 100, then starts dropping older updates. I once found my processing bot lost dozens of orders overnight after I had updated the code. Telegram quietly dropped them from the queue.
- Webhook: I give Telegram my endpoint (e.g., `yourserver.com`). Telegram posts updates directly, with no delay. But requirements are strict—a CA-signed SSL cert, public IP, specific ports (443, 80, 88, 8443), no tunnels. With webhooks, Telegram expects an HTTP 200 OK response within *60 seconds*. If not, Telegram retries—so duplicate messages can appear. My approach: receive the update, immediately return 200 OK, and queue a background task (Celery, aioredis). Never process heavy stuff synchronously inside the webhook handler—see the sketch right after this list.
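A minimal sketch of that ack-then-queue pattern with aiohttp’s web server—here an in-process `asyncio.Queue` stands in for Celery/Redis, and `process_update` is a placeholder for your real handler:
```python
import asyncio
from aiohttp import web

update_queue: asyncio.Queue = asyncio.Queue()

async def webhook_handler(request: web.Request) -> web.Response:
    update = await request.json()
    await update_queue.put(update)   # hand off to a background worker
    return web.Response(status=200)  # ACK immediately so Telegram never retries

async def worker():
    while True:
        update = await update_queue.get()
        try:
            await process_update(update)  # placeholder for real processing
        finally:
            update_queue.task_done()
```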
API rate limits are harsh and only partly documented.
- Official: ~30 messages/sec per chat
- Global bot limit: 20–30 RPS for `sendMessage`, `sendPhoto`, `editMessageText`, etc.
- Go over: `429 Too Many Requests` with a Retry-After (handled in the sketch after this list).
- Methods like `getUpdates`, `getMe`, `getChat` are rarely blocked.
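Here’s how I’d honor that Retry-After—a sketch that assumes the `parameters.retry_after` field the Bot API returns on 429:
```python
import asyncio
import aiohttp

async def call_api(session: aiohttp.ClientSession, url: str, payload: dict) -> dict:
    # On 429, sleep exactly as long as Telegram asks, then retry
    while True:
        async with session.post(url, json=payload) as resp:
            if resp.status == 429:
                body = await resp.json()
                retry_after = body.get("parameters", {}).get("retry_after", 1)
                await asyncio.sleep(retry_after)
                continue
            return await resp.json()
```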
File API: Up to 20MB can be downloaded via getFile. Construct a download URL:
```
https://api.telegram.org/file/bot<TOKEN>/<file_path>
```
Big files often come split. You can upload a file once, get a file_id, and then reuse it. The bandwidth savings are huge, especially for stickers or frequently used images.
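Putting both steps together—a sketch of the getFile dance (TOKEN is the same placeholder as before):
```python
import aiohttp

async def download_file(session: aiohttp.ClientSession, file_id: str) -> bytes:
    # Step 1: resolve the file_id into a file_path on Telegram's CDN
    async with session.get(
        f"https://api.telegram.org/bot{TOKEN}/getFile",
        params={"file_id": file_id},
    ) as resp:
        file_path = (await resp.json())["result"]["file_path"]
    # Step 2: fetch the bytes from the file endpoint (the 20MB limit applies)
    async with session.get(
        f"https://api.telegram.org/file/bot{TOKEN}/{file_path}"
    ) as resp:
        return await resp.read()
```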
Inline mode: If someone writes `@yourbot query`, Telegram sends inline_query Update to the bot. It must reply within 30s with answerInlineQuery. Results are cached, so no real-time updates. I’ve seen people hack in dynamic product search with unique IDs, but cache invalidation gets tricky.
Callback queries: the data payload maxes out at 64 bytes. For anything beyond trivial commands, I store only IDs in the payload and keep the actual state in the database.
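A sketch of that ID-only style in aiogram v3—`order_service` is a hypothetical service layer, not part of aiogram:
```python
from aiogram import F, Router
from aiogram.types import CallbackQuery, InlineKeyboardButton, InlineKeyboardMarkup

router = Router()

def order_keyboard(order_id: int) -> InlineKeyboardMarkup:
    # Only a short ID goes into the 64-byte payload; full state stays in the DB
    return InlineKeyboardMarkup(inline_keyboard=[[
        InlineKeyboardButton(text="Confirm", callback_data=f"order:confirm:{order_id}"),
        InlineKeyboardButton(text="Cancel", callback_data=f"order:cancel:{order_id}"),
    ]])

@router.callback_query(F.data.startswith("order:"))
async def on_order_action(query: CallbackQuery):
    _, action, order_id = query.data.split(":")
    # Everything heavy lives server-side, keyed by the ID from the payload
    await order_service.apply_action(int(order_id), action)  # hypothetical call
    await query.answer("Done")
```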
Telegram doesn’t guarantee ordering of webhook Updates. Sometimes Update #102 arrives before #101, especially when they’re handled by different Telegram servers. If order matters, I sort by `update_id` or timestamp—it adds latency, but keeps things from breaking.
Webhook vs Polling
Long polling wastes resources. Even with zero users, polling every second is 86,400 HTTP requests/day—traffic, CPU burned on JSON parsing, connections opened and closed. The timeout parameter on getUpdates lets you hold the connection open for up to a minute for near-realtime delivery; when a message arrives, the server responds instantly. Polling works everywhere (localhost, VPS, dynamic IP, no SSL needed). Deployment’s simple—good for MVPs and internal bots.
Scaling fails with polling. Go horizontal—launch three instances—and you get a race: all of them call getUpdates, and Telegram distributes updates between them unpredictably. The same user’s state lands in different processes. The only workaround is distributed locks in Redis (hacky). Webhooks solve scaling: Telegram posts updates to one endpoint, so I put nginx or HAProxy in front to load-balance the POSTs across my bot servers. Order isn’t guaranteed, so I synchronize state via Redis or the DB.
Latency: Webhook is near-instant (50–150ms), polling has 0.5s jitter or more—critical for real-time apps like quizzes. I had to switch to webhook on a quiz bot; responses were processed too slowly via polling.
SSL for webhook is tedious if not self-hosting. Let's Encrypt helps but requires auto-renew, watching for expiration, etc. Telegram accepts only CA-certified certs now.
Hardcoded ports: 443, 80, 88, 8443—can’t pick port 3000 for dev. Have to use a reverse proxy.
A webhook crash is worse than polling—if the webhook fails for over 60 seconds, updates are lost forever. Polling and webhook can be mixed: polling for dev, webhook for production.
Cost: Polling on a $5 VPS copes with 50,000 users if optimized. Webhook setup needs domain, SSL, reverse proxy. Not expensive, but more work.
Async and asyncio in Practice
Synchronous code kills scaling.
Example:
Code:
```python
import aiohttp
import requests

# Wrong - blocks the event loop for the whole request
async def get_user_data(user_id):
    result = requests.get(f"https://api.example.com/users/{user_id}")
    return result.json()

# Correct - asynchronous HTTP client yields to the loop while waiting
async def get_user_data(user_id):
    async with aiohttp.ClientSession() as session:
        async with session.get(f"https://api.example.com/users/{user_id}") as resp:
            return await resp.json()
```
requests.get blocks for the full request; aiohttp lets the loop handle other tasks. With 100 requests at 500ms each, the first approach takes 50s, the async version roughly 500ms.
Database: psycopg2 is sync and blocking. For async: asyncpg, Motor for MongoDB, SQLAlchemy (since 1.4) supports async with AsyncSession/engine. Connection pooling is vital—asyncpg does this out-of-the-box:
Code:
```python
db_pool = await asyncpg.create_pool(DATABASE_URL)
```
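And a typical usage sketch—borrow a connection, run a query, and the pool takes it back automatically:
```python
async def get_balance(user_id: int) -> int:
    # acquire() hands out a pooled connection and returns it on exit
    async with db_pool.acquire() as conn:
        return await conn.fetchval("SELECT balance FROM users WHERE id = $1", user_id)
```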
CPU-bound tasks freeze the loop. I use executors:
```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

process_pool = ProcessPoolExecutor()

async def process_image(image_data):
    loop = asyncio.get_running_loop()
    # A process pool sidesteps the GIL for CPU-bound work;
    # passing None would use threads, which don't help here
    return await loop.run_in_executor(process_pool, heavy_image_processing, image_data)
```
Backpressure: I use `asyncio.Queue(maxsize)` to prevent memory leaks, shunting heavy ops to Celery via Redis.
Debugging async: I log key points and thread correlation IDs through the processing chain. Deadlocks happen—wrap every wait on an external resource in an `asyncio.wait_for()` timeout.
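A sketch of that timeout habit—`external_api` and `logger` are placeholders for whatever client and logging setup you actually run:
```python
import asyncio

async def fetch_profile(user_id: int):
    try:
        # Every await on an external resource gets a hard deadline
        return await asyncio.wait_for(external_api.get_profile(user_id), timeout=5.0)
    except asyncio.TimeoutError:
        logger.warning("profile fetch timed out, user_id=%s", user_id)
        return None
```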
Graceful shutdown: Catch SIGTERM, stop accepting new Updates, wait for active tasks via `asyncio.gather()`, close pools, stop event loop.
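A sketch of that shutdown sequence—`run_bot()` and `db_pool` stand in for your actual entry point and pools:
```python
import asyncio
import signal

async def main():
    stop = asyncio.Event()
    loop = asyncio.get_running_loop()
    for sig in (signal.SIGTERM, signal.SIGINT):
        loop.add_signal_handler(sig, stop.set)  # signal -> event (Unix only)

    bot_task = asyncio.create_task(run_bot())  # your polling/webhook entry point
    await stop.wait()                          # block until SIGTERM/SIGINT

    bot_task.cancel()                                       # stop taking new updates
    await asyncio.gather(bot_task, return_exceptions=True)  # drain in-flight work
    await db_pool.close()                                   # close pools last

asyncio.run(main())
```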
User State Handling
Managing user state is the #1 headache. Conversations are never just single messages—a user fills a form, answers several questions, and the bot needs context. The first instinct is a global dictionary:
Code:
```python
user_states = {}
```
It works locally, but in production it’s a disaster: per-process memory, lost on restart, race conditions behind a load balancer, desynchronized states.
Redis solves persistence and sharing. Key: `user_state:{user_id}`, value: JSON:
Code:
```python
import aioredis
import json

class StateManager:
    def __init__(self, redis_url):
        self.redis = aioredis.from_url(redis_url)

    async def get_state(self, user_id: int):
        data = await self.redis.get(f"user_state:{user_id}")
        return json.loads(data) if data else None

    async def set_state(self, user_id: int, state: str, data: dict = None):
        state_data = {"state": state, "data": data or {}}
        await self.redis.setex(f"user_state:{user_id}", 3600, json.dumps(state_data))

    async def clear_state(self, user_id: int):
        await self.redis.delete(f"user_state:{user_id}")
```
A TTL is critical. Without it, Redis fills with abandoned states—I’ve seen 8GB of memory used for 5,000 actives, holding everyone who had ever hit /start. The setex above expires states after an hour.
Race conditions: two messages arrive 100ms apart, both handlers read and write Redis almost simultaneously, and the second overwrites the first. Use Redis WATCH/MULTI or Lua scripts for atomicity, or Redis Streams for logging—all tradeoffs.
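A sketch of the WATCH/MULTI route, assuming aioredis 2.x (the redis-py-based release)—`mutate` is any function that edits the state dict in place:
```python
import json
from aioredis.exceptions import WatchError

async def update_state_atomically(redis, user_id: int, mutate):
    key = f"user_state:{user_id}"
    async with redis.pipeline(transaction=True) as pipe:
        while True:
            try:
                await pipe.watch(key)      # abort the commit if the key changes
                raw = await pipe.get(key)  # immediate-mode read while watching
                state = json.loads(raw) if raw else {}
                mutate(state)              # apply the change locally
                pipe.multi()               # switch to buffered transaction mode
                pipe.setex(key, 3600, json.dumps(state))
                await pipe.execute()       # commits, or raises WatchError
                return state
            except WatchError:
                continue                   # lost the race - reread and retry
```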
FSM formalizes state:
```python
from aiogram.fsm.state import State, StatesGroup

class OrderForm(StatesGroup):
    waiting_for_product = State()
    waiting_for_quantity = State()
    waiting_for_address = State()
    confirming = State()
```
Handlers:
Code:
```python
@router.message(OrderForm.waiting_for_quantity)
async def process_quantity(message: Message, state: FSMContext):
    if not message.text.isdigit():
        await message.answer("Enter a number")
        return
    await state.update_data(quantity=int(message.text))
    await state.set_state(OrderForm.waiting_for_address)
    await message.answer("Enter the shipping address")
```
State data is stored separately from the state itself—merge new data with existing (`state.update_data()`), get everything as a dictionary (`state.get_data()`).
Dialog and Business Logic Layer
FSMs control dialogue, but shouldn’t pack business logic. A classic anti-pattern is putting validations/calculations/database ops into handlers:
Code:
```python
# Bad
@router.message(OrderForm.waiting_for_promo)
async def process_promo(message: Message, state: FSMContext):
    promo = message.text.upper().strip()
    if len(promo) < 4 or len(promo) > 20:
        await message.answer("Promo code must be between 4 and 20 chars")
        return
    # DB, business logic tangled up...
```
Proper separation:
Code:
```python
@router.message(OrderForm.waiting_for_promo)
async def process_promo(message: Message, state: FSMContext):
    promo_code = message.text.strip()
    # Validation and DB work live in the service layer, not the handler
    result = await promo_service.validate_and_apply(promo_code)
    if not result.is_valid:
        await message.answer(result.error_message)
        return
    await state.update_data(promo_code=promo_code, discount=result.discount_amount)
    await state.set_state(OrderForm.confirming)
    await message.answer(f"Promo applied! Discount: {result.discount_amount}%")
```
Business services don’t depend on Telegram—they’re pure Python classes. Easily unit-testable and reusable.
Dependency Injection (DI)
From Aiogram v3.x, DI is standard:
Code:
```python
router = Router()

async def setup_dependencies(dp: Dispatcher):
    db_pool = await asyncpg.create_pool(DATABASE_URL)
    dp["db_pool"] = db_pool
    dp["promo_service"] = PromoService(db_pool)
    dp["order_service"] = OrderService(db_pool)  # injected into handlers by name

@router.message(Command("start"))
async def cmd_start(message: Message, order_service: OrderService, state: FSMContext):
    # aiogram matches the argument name to the dispatcher key
    orders = await order_service.get_user_orders(message.from_user.id)
```
DI beats globals/singletons—great for testing/mocks and running different configs.
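The testing payoff in a sketch—because the handler is a plain coroutine, you can call it directly with mocks (`unittest.mock`; `cmd_start` is the handler above):
```python
import asyncio
from unittest.mock import AsyncMock

async def test_cmd_start():
    fake_orders = AsyncMock()
    fake_orders.get_user_orders.return_value = []
    message = AsyncMock()
    message.from_user.id = 42
    # No dispatcher, no network - inject the fake by argument name
    await cmd_start(message, order_service=fake_orders, state=AsyncMock())
    fake_orders.get_user_orders.assert_awaited_once_with(42)

asyncio.run(test_cmd_start())
```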
Context Actions and Fallbacks
Universal handlers are a must (e.g., `/cancel` in any state):
Code:
```python
@router.message(Command("cancel"))
async def cmd_cancel(message: Message, state: FSMContext):
    current_state = await state.get_state()
    if current_state is None:
        await message.answer("Nothing to undo")
        return
    await state.clear()
    await message.answer("Dialog canceled. Start again with /start")
```
Fallbacks catch and explain incorrect user input:
```python
@router.message(OrderForm.waiting_for_quantity)
async def invalid_quantity(message: Message):
    await message.answer("Please enter the quantity as a number. For example: 5")
```
Middleware: Making Architecture Flexible
Middleware runs before/after the handler and is perfect for cross-cutting concerns (auth, logging, rate limiting, etc):
Code:
```python
from aiogram import BaseMiddleware

class AuthMiddleware(BaseMiddleware):
    async def __call__(self, handler, event, data):
        user_id = event.from_user.id
        if not await self.check_access(user_id):
            await event.answer("You don't have access to this function")
            return
        return await handler(event, data)

    async def check_access(self, user_id: int) -> bool:
        return user_id in ALLOWED_USERS
```
Middleware order is key—rate limiting first, then logging, then auth. Get the order wrong and the bot can collapse under a flood before the cheap checks ever run.
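In aiogram v3 the order is simply registration order—the first registered runs outermost. A sketch (`RateLimitMiddleware` and `LoggingMiddleware` are hypothetical classes built like `AuthMiddleware` above):
```python
dp.message.middleware(RateLimitMiddleware())  # cheapest check runs first under a flood
dp.message.middleware(LoggingMiddleware())    # log only traffic that survived the limit
dp.message.middleware(AuthMiddleware())       # auth may hit the DB - keep it last
```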
User loading example:
Code:
```python
class UserLoaderMiddleware(BaseMiddleware):
    async def __call__(self, handler, event, data):
        user = await db.get_user(event.from_user.id)
        data["db_user"] = user
        return await handler(event, data)

@router.message(Command("profile"))
async def show_profile(message: Message, db_user: User):
    await message.answer(f"Your balance: {db_user.balance}")
```
Error Handling:
Code:
```python
class ErrorHandlerMiddleware(BaseMiddleware):
    async def __call__(self, handler, event, data):
        try:
            return await handler(event, data)
        except ValidationError as e:
            await event.answer(f"Validation error: {e}")
            logger.warning(f"Validation error user {event.from_user.id}: {e}")
        except DatabaseError as e:
            await event.answer("Technical problems, try again later")
            logger.error(f"Database error: {e}", exc_info=True)
        except Exception as e:
            await event.answer("Something went wrong")
            logger.critical(f"Unhandled exception: {e}", exc_info=True)
```
Middleware can be attached per-router for admin/fallbacks, saving the main chain from bloat.
A final note:
- These practices let me run bots at scale, handling everything from rate-limits and async IO to robust state/middleware design.
- Every code snippet and architectural insight here comes straight from my own production experience. Feel free to use, copy, or adapt for your own projects.