Artificial intelligence-collaborative folk music composition system based on gesture recognition: A real-time interactive framework integrating computer vision and folk music generation
Artificial intelligence (AI) and gesture recognition offer new creative possibilities, yet culturally sensitive, real-time systems for gestural folk music composition remain largely undeveloped. This study develops an AI-collaborative folk music composition system that integrates computer vision-based gesture recognition with specialized folk music generation algorithms. The resulting real-time interactive framework supports traditional music composition while preserving characteristic musical features across multiple folk traditions. The system employs a four-layer architecture comprising gesture acquisition, computer vision processing, interpretation, and generation layers. A dataset of 1,643 folk music compositions representing English, American, Irish, and Chinese traditional music was curated from established repositories (the Nottingham Dataset, the Irish Traditional Corpus, and self-recorded materials), supplemented by 6,127 successfully tracked gesture samples collected from 47 participants across 12 folk music gesture categories. The evaluation framework assessed gesture recognition accuracy, cultural authenticity preservation, real-time performance, and collaborative effectiveness through extensive experimental validation. The system achieved 88.9% gesture recognition accuracy with 23.4 ms processing latency, while maintaining end-to-end response times of 86.8–91.6 ms during collaborative sessions. Cultural authenticity scores ranged from 7.6 to 8.3 across regional folk styles, with a user satisfaction rating of 7.8 and a 28% improvement in musical coherence over baseline approaches. The framework supports up to eight concurrent users while meeting the sub-100 ms real-time performance requirement.
The integrated system demonstrates effective coordination between the gesture recognition and folk music generation subsystems, validating the architectural design and optimization strategies for culturally sensitive AI applications across diverse folk traditions. The validated framework provides a foundation for educational, performance, and cultural preservation applications, and contributes methodological insights for multimodal human–AI interaction systems and culturally aware creative technologies in traditional music contexts.
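The four-layer pipeline and the sub-100 ms end-to-end latency requirement described above can be sketched in code. The following is a minimal illustrative sketch, not the authors' implementation: all function names, the gesture label, and the musical parameters are assumptions standing in for the real acquisition, vision, interpretation, and generation components.

```python
import time

# Hypothetical sketch of the four-layer pipeline from the abstract:
# gesture acquisition -> computer vision processing -> interpretation
# -> generation, checked against a sub-100 ms end-to-end budget.

LATENCY_BUDGET_MS = 100.0  # real-time requirement stated in the abstract

def acquire_gesture(frame):
    # Layer 1: stand-in for camera capture / landmark extraction.
    return {"landmarks": frame}

def cv_process(sample):
    # Layer 2: stand-in for vision-based gesture classification.
    return {"gesture": "upbeat", **sample}

def interpret(gesture):
    # Layer 3: map the recognized gesture onto musical parameters
    # (hypothetical parameter names for illustration).
    return {"tempo_delta": 4, "style": "irish_reel"}

def generate(params):
    # Layer 4: stand-in for the folk music generation algorithm.
    return ["D4", "F#4", "A4"]

def run_pipeline(frame):
    """Run one frame through all four layers and time the round trip."""
    start = time.perf_counter()
    sample = acquire_gesture(frame)
    gesture = cv_process(sample)
    params = interpret(gesture)
    notes = generate(params)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return notes, elapsed_ms, elapsed_ms <= LATENCY_BUDGET_MS

notes, elapsed_ms, within_budget = run_pipeline(frame=[0.1, 0.2, 0.3])
```

In a real deployment each layer would carry the latencies reported in the abstract (e.g., 23.4 ms for gesture processing), so the budget check gates whether a collaborative session stays responsive for all concurrent users.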
