Elderly Care Monitoring: AI Event Detection & Alerts

Context & Challenge

Continuous monitoring of elderly people living alone has for decades relied on either invasive methods (video surveillance) or passive devices (panic buttons, fitness bracelets). Both have fundamental operational limitations. Cameras invade privacy, require continuous video storage, and depend on operator attention. Buttons and bracelets rely on conscious user action at the moment of crisis – often impossible during a fall, loss of consciousness, or disorientation. The market has long needed an autonomous system that continuously analyses the acoustic environment and motion patterns, distinguishes normal activity from real threats, and delivers verified alerts to relatives or caregivers without manual intervention.

The project’s requirements were technically strict and operationally precise: build a system that accepts per‑second motion data and minute‑long audio fragments from a wearable device, processes them in real time, applies multiple specialised neural models for anomaly detection and context verification, and upon incident confirmation raises an alert with escalation to Telegram and, if needed, a voice call. At the same time, the system must preserve user privacy, remain stable under continuous data flow, avoid resource overload, and provide relatives with an objective picture of daily activity without demanding constant attention.

This is not a prototype or research setup. It is a production system designed for long‑term operation where the cost of error is measured in human well‑being, and false positives destroy trust at the most critical moment. Architecture, processing pipelines, memory management, message routing, and user interfaces were built around one principle: engineering discipline over technological hype.

Risks & Architectural Constraints

Designing such a system begins not with model selection but with acknowledging the risks that can destroy its operational value. Five fundamental constraints were identified, each requiring a dedicated architectural solution.

False Positives & Alert Fatigue

Fall detection based solely on accelerometer or simple audio thresholds inevitably produces high noise levels. A bag drop, a chair falling, a loud cough, or TV background noise can be misinterpreted as an incident. If the system sends notifications for every such event, alert fatigue sets in. Within a week, the user stops reacting to alerts, and a real fall goes ignored. Reliability is determined not by alert frequency but by verification accuracy.

Latency & Blocking Synchronous Calls

Neural models for speech recognition, distress analysis, and voice verification are computationally heavy. If audio is fed into a model synchronously inside an HTTP request, the thread blocks until inference completes. With minute‑by‑minute streams from dozens of devices, this leads to cascading queue build-up, timeouts, and eventually a complete API stall. The system must be asynchronous by design, with clear separation of data ingestion, routing, and computation.
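The separation of ingestion from computation can be illustrated with a minimal sketch. It uses an in-process asyncio.Queue as a stand-in for the message broker, and every name in it (slow_inference, ingest, worker, demo) is hypothetical; the sleep only simulates inference time.

```python
import asyncio
import time


async def slow_inference(fragment: bytes) -> str:
    """Stand-in for a heavy model call; the sleep simulates inference time."""
    await asyncio.sleep(0.05)
    return f"processed {len(fragment)} bytes"


async def ingest(queue: asyncio.Queue, device_id: str, fragment: bytes) -> dict:
    """Accept a fragment and return immediately; no inference happens here."""
    await queue.put((device_id, fragment))
    return {"status": "accepted", "device": device_id}


async def worker(queue: asyncio.Queue, results: list) -> None:
    """Drain the queue independently of the ingestion path."""
    while True:
        device_id, fragment = await queue.get()
        results.append((device_id, await slow_inference(fragment)))
        queue.task_done()


async def demo(n_devices: int = 5) -> tuple[float, list]:
    queue: asyncio.Queue = asyncio.Queue()
    results: list = []
    task = asyncio.create_task(worker(queue, results))
    started = time.perf_counter()
    for i in range(n_devices):
        await ingest(queue, f"dev-{i}", b"\x00" * 1024)
    ingest_seconds = time.perf_counter() - started  # time spent on the ingestion path only
    await queue.join()  # wait until the worker has finished the backlog
    task.cancel()
    return ingest_seconds, results
```

Calling asyncio.run(demo()) returns the time spent accepting fragments and the list of processed results. The ingestion time stays small no matter how slow the simulated model is, which is the property the paragraph above asks for.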

GPU Overload & Memory Management

Models like Qwen2‑Audio or Whisper consume 4–8 GB of video memory when loaded without quantization. When messages queue up continuously and workers run inference in parallel, the GPU quickly runs out of memory (OOM) and processes crash. The solution requires lazy model loading, LRU caching, automatic unloading of unused weights, 4‑bit quantization, and explicit PyTorch cache clearing after each processing cycle.
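A minimal sketch of lazy loading with LRU eviction might look as follows. ModelCache, free_gpu_cache, and the loader callables are hypothetical names, not the project's code; the only PyTorch calls used are torch.cuda.is_available() and torch.cuda.empty_cache(), and the sketch still runs when PyTorch is not installed.

```python
from collections import OrderedDict
from typing import Any, Callable


def free_gpu_cache() -> None:
    """Release cached CUDA blocks if PyTorch with CUDA is present; otherwise do nothing."""
    try:
        import torch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


class ModelCache:
    """Keeps at most `capacity` models resident, evicting the least recently used."""

    def __init__(self, loaders: dict[str, Callable[[], Any]], capacity: int = 1):
        self._loaders = loaders
        self._capacity = capacity
        self._models: OrderedDict = OrderedDict()

    def get(self, name: str) -> Any:
        if name in self._models:
            self._models.move_to_end(name)  # mark as most recently used
            return self._models[name]
        while len(self._models) >= self._capacity:
            # Drop the cache's reference to the least recently used model.
            # Memory is only reclaimed if no other code still holds the object.
            self._models.popitem(last=False)
            free_gpu_cache()
        model = self._loaders[name]()  # lazy: weights are loaded on first use only
        self._models[name] = model
        return model

    def resident(self) -> list[str]:
        """Names of loaded models, least recently used first."""
        return list(self._models)
```

A worker would call cache.get("whisper") before each inference. With capacity set to what the GPU can hold, a rarely used model is unloaded before a new one is brought in, rather than both competing for memory.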

Privacy & Compliance

Continuously sending raw audio to the server introduces leak risks, complicates compliance with 152‑FZ (Russia's federal law on personal data), and creates psychological discomfort. The architecture must ensure that raw audio never leaves the device perimeter unless necessary. Only transcripts, vector embeddings, aggregated metrics, and (for confirmed incidents) references to recordings are transmitted. All inferences are logged with timestamps for audit, but full audio files are not stored in the clear.

Network Instability & Context Loss

Wearable devices operate under variable Wi‑Fi quality, intermittent disconnections, and limited battery life. The system must handle dropped packets, restore state after disconnection, store commands for delivery on next ping, and preserve multi‑stage verification context. This requires explicit state management in Redis, timeout monitoring, and deterministic command routing.

Architecture & Solution

The system is built on an event‑driven pipeline with clear separation of responsibilities between devices, API gateway, asynchronous workers, data storage, and user interface.

Minute Sync & Telemetry Collection

The device operates on a strict timer: every 60 seconds it connects to the /alive endpoint, sends the last minute’s motion array and audio fragment, then polls the server for commands. This deterministic heartbeat allows predictable load planning and device status tracking.

Upon receiving the request, FastAPI atomically updates device status in PostgreSQL (is_online, last_sync, last_movement), stores the motion matrix in a compact string format, retrieves pending commands from a Redis queue, and returns them in the response. If no commands are pending, the device continues its loop. If a command exists (e.g., special_ask or restart), the device executes it and confirms completion on the next ping. This design prevents command loss during brief disconnections – commands remain in Redis until acknowledged.
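The "commands remain until acknowledged" behaviour can be sketched with an in-memory stand-in for the Redis queue. CommandStore and its methods are hypothetical, and the shape of the ping payload (a list of acknowledged command ids) is an assumption rather than the project's actual protocol.

```python
import itertools


class CommandStore:
    """In-memory stand-in for the Redis command queue (hypothetical structure)."""

    def __init__(self) -> None:
        self._pending: dict[str, dict[int, str]] = {}
        self._ids = itertools.count(1)

    def enqueue(self, serial_number: str, command: str) -> int:
        """Queue a command for a device and return its id."""
        command_id = next(self._ids)
        self._pending.setdefault(serial_number, {})[command_id] = command
        return command_id

    def handle_ping(self, serial_number: str, acked_ids: list[int]) -> list[dict]:
        """Drop acknowledged commands, then return everything still pending.

        A command is removed only when the device reports it as done, so a
        response lost to a disconnection is simply repeated on the next ping.
        """
        pending = self._pending.get(serial_number, {})
        for command_id in acked_ids:
            pending.pop(command_id, None)
        return [{"id": cid, "command": cmd} for cid, cmd in pending.items()]
```

Because delivery repeats until acknowledged, the device may see the same command twice. Carrying an id with each command lets it skip one it has already executed.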

Asynchronous Routing via RabbitMQ

After audio preprocessing (resampling to 16 kHz, noise reduction, normalisation), data is not processed synchronously. It is published to RabbitMQ across four isolated channels:

fall_queue – fall detection via AST classifier;
dla_queue – daily activity analysis (eating, hygiene, sleep, speech);
set_queue – speech recognition, trigger word detection, voice verification;
fall_confirm_queue – distress analysis using Qwen2‑Audio to confirm a real incident.

Queue isolation prevents blocking the critical path, allows independent worker scaling, and enables different durability and retry policies per pipeline.
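The routing step can be sketched without a live broker by injecting the publish function. The queue names come from the list above; QueuePolicy, its placeholder values, the message fields, and the choice to send a fresh fragment to three queues while reserving fall_confirm_queue for the follow-up response are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class QueuePolicy:
    durable: bool      # should messages survive a broker restart?
    max_retries: int   # how many times a failed message is retried


# Queue names come from the text; every policy value is a placeholder.
POLICIES = {
    "fall_queue": QueuePolicy(durable=True, max_retries=3),
    "dla_queue": QueuePolicy(durable=False, max_retries=0),
    "set_queue": QueuePolicy(durable=True, max_retries=1),
    "fall_confirm_queue": QueuePolicy(durable=True, max_retries=3),
}

# Assumption: a fresh fragment fans out to these three queues, while
# fall_confirm_queue only receives the user's reply to special_ask.
DEFAULT_FANOUT = ("fall_queue", "dla_queue", "set_queue")


def route_fragment(
    publish: Callable[[str, bytes], None],
    serial_number: str,
    audio_ref: str,
    queues: Iterable[str] = DEFAULT_FANOUT,
) -> list[str]:
    """Publish one message per target queue and return the queues used."""
    body = json.dumps(
        {"serial_number": serial_number, "audio_ref": audio_ref}
    ).encode("utf-8")
    routed = []
    for queue in queues:
        if queue not in POLICIES:
            raise ValueError(f"unknown queue: {queue}")
        publish(queue, body)
        routed.append(queue)
    return routed
```

In a deployment, publish would wrap the broker client's publish call and apply the matching policy. Keeping it as a parameter makes the routing logic testable on its own.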

Multi‑Stage Fall Verification

Fall detection is never a single model call. It proceeds through several stages, each filtering noise and raising confidence.

Stage 1 (AST classification): audio is fed into an Audio Spectrogram Transformer. If confidence is below the threshold (0.15), the event is marked as background.
Stage 2 (motion correlation): the motion matrix is analysed. If it shows no motion, the event is treated as noise.
Stage 3 (special_ask): the device is commanded to ask the user for confirmation, and the user responds by voice.
Stage 4 (distress analysis): the response is processed by Qwen2‑Audio (4‑bit) to detect pain or panic.
Stage 5 (resident verification): the voice is encoded into an ECAPA‑TDNN embedding and compared with the stored embedding via pgvector cosine similarity.

Only when all factors align is an entry created in alerts, a Telegram notification sent, and (if configured) a voice call initiated.
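The staged decision can be expressed as a short function that stops at the first stage that fails. This is a sketch under stated assumptions: the 0.15 threshold is from the text, while VOICE_MATCH_THRESHOLD, the Evidence fields, and the returned labels are hypothetical. Cosine similarity is computed in plain Python here; in the described system the comparison runs in pgvector, whose `<=>` operator returns cosine distance (1 minus similarity).

```python
import math
from dataclasses import dataclass
from typing import Optional

AST_THRESHOLD = 0.15          # from the text
VOICE_MATCH_THRESHOLD = 0.7   # hypothetical value, for illustration only


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Evidence:
    ast_confidence: float
    motion_detected: bool
    # The next two stay None until the special_ask response has been analysed.
    distress_detected: Optional[bool] = None
    voice_embedding: Optional[list[float]] = None


def verify_fall(evidence: Evidence, stored_embedding: list[float]) -> str:
    """Walk the stages in order and return the first verdict that applies."""
    if evidence.ast_confidence < AST_THRESHOLD:
        return "background"                      # stage 1
    if not evidence.motion_detected:
        return "noise"                           # stage 2
    if evidence.distress_detected is None or evidence.voice_embedding is None:
        return "awaiting_response"               # stage 3 still in progress
    if not evidence.distress_detected:
        return "no_distress"                     # stage 4
    similarity = cosine_similarity(evidence.voice_embedding, stored_embedding)
    if similarity < VOICE_MATCH_THRESHOLD:
        return "unknown_speaker"                 # stage 5
    return "alert"
```

What happens when the user never answers the special_ask prompt is outside this sketch; the function simply keeps reporting awaiting_response, and a timeout policy would have to decide the outcome.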

DLA Aggregation & Reports

Events from dla_queue are aggregated into events_mart with key (serial_number, date). Once per day a personalised report is generated, interpreting activity against age norms (e.g., “Activity is below average for this age – consider increasing physical activity”). Reports are delivered via Telegram bot with configurable time and detail level.
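The aggregation by (serial_number, date) can be sketched in a few lines. The event fields (serial_number, timestamp, activity) and the function name are assumptions; the sketch counts one event per detection and keeps the result in memory rather than writing to events_mart.

```python
from collections import Counter, defaultdict
from datetime import date, datetime


def aggregate_events(events: list[dict]) -> dict[tuple[str, date], Counter]:
    """Group activity events by (serial_number, date) and count each activity."""
    mart: defaultdict = defaultdict(Counter)
    for event in events:
        day = datetime.fromisoformat(event["timestamp"]).date()
        mart[(event["serial_number"], day)][event["activity"]] += 1
    return dict(mart)
```

If each event corresponds to one minute-long fragment, the counts approximate minutes spent per activity, which is the kind of figure a daily report can compare against a norm.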

Privacy by Design

Raw audio does not leave the device perimeter unless necessary. Otherwise only transcripts, embeddings, and aggregated metrics are sent to the server. Full recordings are created exclusively for confirmed incidents. All inferences are logged for audit, but complete audio files are not stored in plaintext.

Configuration Management & Hot Reload

Thresholds, STT models, trigger words, and timeouts are stored in the algorithm_config table with support for profiles (default, high_sensitivity, low_sensitivity). They are loaded dynamically via ConfigManager. Changes apply without service restart.
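One way to get changes applied without a restart is a time-based refresh with profile fallback. The sketch below is an assumption about how such a ConfigManager could work, not the project's implementation: the fetch callable stands in for a query against algorithm_config, and the ttl_seconds and clock parameters are hypothetical.

```python
import time
from typing import Any, Callable


class ConfigManager:
    """Caches configuration and re-reads it once the cached copy is stale."""

    def __init__(
        self,
        fetch: Callable[[], dict[str, dict[str, Any]]],
        ttl_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # returns {profile: {key: value}}
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: dict[str, dict[str, Any]] = {}
        self._loaded_at = None

    def _refresh_if_stale(self) -> None:
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._cache = self._fetch()
            self._loaded_at = now

    def get(self, key: str, profile: str = "default") -> Any:
        """Return the profile's value, falling back to the default profile."""
        self._refresh_if_stale()
        profile_values = self._cache.get(profile, {})
        if key in profile_values:
            return profile_values[key]
        return self._cache["default"][key]
```

With this design a change made in the table becomes visible to workers within ttl_seconds, and a profile such as high_sensitivity only needs to store the keys it overrides.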

Outcomes & Operational Maturity

Stable response with privacy preserved: incident confirmation takes under 30 seconds, and multi‑factor verification sharply reduces false positives.
Objective picture of daily activity: daily AI‑generated reports with age‑norm comparison help detect negative trends before a crisis.
Resilience under load: lazy model loading, LRU caching, 4‑bit quantization, and explicit memory control prevent OOM.
Manageability & audit: dynamic configuration, logging of all alerts and inferences, full‑cycle Telegram bot for administration.

Technology Stack

FastAPI – asynchronous HTTP gateway
RabbitMQ – reliable message broker
Redis – state cache and command queue
PostgreSQL + pgvector – event and embedding storage
PyTorch + SpeechBrain + Whisper + AST + Qwen2‑Audio (4‑bit) + ECAPA‑TDNN – model stack
Pyrogram – Telegram bot
Docker Compose – orchestration
Prefect + YandexGPT – ETL and summarisation

Engineering Conclusion

AI does not always need to “understand meaning”. Sometimes it is enough to detect patterns in data (audio, motion), trigger predefined verification scenarios, and raise an alert only when multiple independent factors agree. This is more reliable, cheaper, and simpler for compliance. The system described here does not try to replace human judgment. It builds an engineering perimeter that filters noise, preserves privacy, aggregates metrics, and delivers only confirmed signals. Multi‑stage verification, asynchronous routing, memory management, and dynamic configuration turn a “smart box” into an operationally mature system that can be trusted. Calm, precise, long‑term.