In a time of rapid digital transformation, demographic crowd analysis has become a mainstay of smart-city applications, especially in crowd management, crisis forecasting, and sustainable urban planning. Traditional crowd analysis models lack integration, accumulate errors because detection and classification are performed separately, perform poorly in dense crowds or inadequate lighting, and are difficult to run in real time on edge devices (Elbishlawi et al., 2022; Gao et al., 2024). The research problem is the absence of a unified model that can operate efficiently in real time while maintaining high accuracy and specificity (Bai et al., 2022; Li et al., 2024). This research aims to design an intelligent, field-deployable system that monitors and classifies individuals during the Arbaeen (fortieth-day) pilgrimage and predicts crowd behavior, within a framework that preserves data privacy through federated learning. The system is evaluated using a digital twin in a Simulink environment.

Despite rapid progress in computer vision over the past decade, traditional systems still cannot meet the requirements of practical operation in high-density environments or non-ideal conditions, owing to three fundamental gaps:
− Structural separation: treating detection and classification as separate tasks leads to an error accumulation of up to 42%.
− Contextual fragility: classification accuracy drops to 64% when faces are not visible or image quality deteriorates.
− Accuracy-latency trade-off: the unresolved trade-off between accuracy and inference time hinders real-time operation in resource-limited edge environments [10].