Artificial intelligence-collaborative folk music composition system based on gesture recognition: A real-time interactive framework integrating computer vision and folk music generation
Artificial intelligence (AI) and gesture recognition offer new creative possibilities, yet culturally sensitive, real-time systems for gestural folk music composition remain largely undeveloped. This study develops an AI-collaborative folk music composition system that integrates computer vision-based gesture recognition with specialized folk music generation algorithms. The resulting real-time interactive framework supports traditional music composition while preserving characteristic musical features across multiple folk traditions. The system employs a four-layer architecture comprising gesture acquisition, computer vision processing, interpretation, and generation layers. A dataset of 1,643 folk music compositions representing English, American, Irish, and Chinese traditional music was curated from established repositories (the Nottingham Dataset, the Irish Traditional Corpus, and self-recorded materials), supplemented by 6,127 successfully tracked gesture samples collected from 47 participants across 12 folk music gesture categories. The evaluation framework assessed gesture recognition accuracy, cultural authenticity preservation, real-time performance, and collaborative effectiveness through extensive experimental validation. The system achieved 88.9% gesture recognition accuracy with 23.4 ms processing latency, while maintaining end-to-end response times of 86.8–91.6 ms during collaborative sessions. Cultural authenticity scores ranged from 7.6 to 8.3 across regional folk styles, with a user satisfaction rating of 7.8 and a 28% improvement in musical coherence over baseline approaches. The framework supports up to eight concurrent users while meeting the sub-100 ms real-time performance requirement.
The integrated system demonstrates effective coordination between the gesture recognition and folk music generation subsystems, validating the architectural design and optimization strategies for culturally sensitive AI applications across diverse folk traditions. The validated framework provides a foundation for educational, performance, and cultural preservation applications, and contributes methodological insights for multimodal human–AI interaction systems and culturally aware creative technologies in traditional music contexts.
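The four-layer pipeline and the sub-100 ms end-to-end latency requirement described above can be sketched in code. The following is a minimal illustrative sketch, not the authors' implementation: all function names, the gesture label, and the musical parameters are assumptions standing in for the real acquisition, vision, interpretation, and generation components.

```python
import time

# Hypothetical sketch of the four-layer pipeline from the abstract:
# gesture acquisition -> computer vision processing -> interpretation
# -> generation, checked against a sub-100 ms end-to-end budget.

LATENCY_BUDGET_MS = 100.0  # real-time requirement stated in the abstract

def acquire_gesture(frame):
    # Layer 1: stand-in for camera capture / landmark extraction.
    return {"landmarks": frame}

def cv_process(sample):
    # Layer 2: stand-in for vision-based gesture classification.
    return {"gesture": "upbeat", **sample}

def interpret(gesture):
    # Layer 3: map the recognized gesture onto musical parameters
    # (hypothetical parameter names for illustration).
    return {"tempo_delta": 4, "style": "irish_reel"}

def generate(params):
    # Layer 4: stand-in for the folk music generation algorithm.
    return ["D4", "F#4", "A4"]

def run_pipeline(frame):
    """Run one frame through all four layers and time the round trip."""
    start = time.perf_counter()
    sample = acquire_gesture(frame)
    gesture = cv_process(sample)
    params = interpret(gesture)
    notes = generate(params)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return notes, elapsed_ms, elapsed_ms <= LATENCY_BUDGET_MS

notes, elapsed_ms, within_budget = run_pipeline(frame=[0.1, 0.2, 0.3])
```

In a real deployment each layer would carry the latencies reported in the abstract (e.g., 23.4 ms for gesture processing), so the budget check gates whether a collaborative session stays responsive for all concurrent users.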
