AccScience Publishing / IJOSI / Volume 9 / Issue 5 / DOI: 10.6977/IJoSI.202510_9(5).0003
Cite this article
4
Download
17
Citations
43
Views
Journal Browser
Volume | Year
Issue
Search
News and Announcements
View All
ARTICLE

A graphics processing unit-based parallel simplified swarm optimization algorithm for enhanced performance and precision

Wenbo Zhu1 Shang-Ke Huang2 Wei-Chang Yeh2,3* Zhenyao Liu4 Chia-Ling Huang5
Show Less
1 School of Mechatronical Engineering and Automation, Foshan University, Foshan, Guangdong, China
2 Integration and Collaboration Laboratory, Department of Industrial Engineering and Engineering Management, College of Engineering, National Tsing Hua University, Hsinchu, Taiwan
3 Department of Industrial and Systems Engineering, College of Electrical Engineering and Computer Science, Chung Yuan Christian University, Taoyuan, Taiwan
4 School of Economics and Management, Taizhou University, Taizhou, Jiangsu, China
5 Department of International Logistics and Transportation Management, School of Transportation and Tourism, Kainan University, Taoyuan, Taiwan
Submitted: 13 January 2025 | Revised: 2 August 2025 | Accepted: 4 August 2025 | Published: 16 October 2025
© 2025 by the Publisher. Licensee AccScience Publishing, USA. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution -Noncommercial 4.0 International License (CC BY-NC 4.0) ( https://creativecommons.org/licenses/by-nc/4.0/ )
Abstract

Graphics processing units (GPUs) have emerged as powerful platforms for parallel computing, enabling personal computers to solve complex optimization tasks effectively. Although swarm intelligence algorithms naturally lend themselves to parallelization, a GPU-based implementation of the simplified swarm optimization (SSO) algorithm has not been reported in the literature. This paper introduces a compute CUDA-SSO algorithm on the CUDA platform, with a time complexity analysis of O (Ngen × Nsol × Nvar), where Ngen is the number of iterations, Nsol is the population size (i.e., number of fitness function evaluations), and Nvar represents the required pairwise comparisons. By eliminating resource preemption of personal best and global best updates, CUDA-SSO significantly reduces the overall complexity and prevents concurrency conflicts. Numerical experiments demonstrate that the proposed approach achieves an order-of-magnitude improvement in run time with superior solution precision relative to central processing unit-based SSO, making it a compelling methodology for large-scale, data-parallel optimization tasks.

Keywords
Compute Unified Device Architecture
Graphics Processing Unit
Parallelism
Simplified Swarm Optimization
Swarm Intelligence Algorithms
Funding
This research was supported in part by the Natural Science Foundation of China (Grant No. 62106048) and the Ministry of Science and Technology (MOST), China, under grants MOST 107-2221-E-007-072-MY3, MOST 110-2221-E-007-107-MY3, and NSTC 113-2221-E-007-117-MY3.
References

Abbasi, M., Rafiee, M., Khosravi, M.R., Jolfaei, A., Menon, V.G., & Koushyar, J. M. (2020). An efficient parallel genetic algorithm solution for vehicle routing problem in cloud implementation of the intelligent transportation systems. Journal of Cloud Computing, 9(6), 6. https://doi.org/10.1186/s13677-020-0157-4

 

AlZubi, S., Shehab, M., Al-Ayyoub, M., Jararweh, Y., & Gupta, B. (2020). Parallel implementation for 3D medical volume fuzzy segmentation. Pattern Recognition Letters, 130, 312–318. https://doi.org/10.1016/j.patrec.2018.07.026

 

Chung, Y.Y., & Wahid, N. (2012). A hybrid network intrusion detection system using simplified swarm optimization (SSO). Applied Soft Computing, 12(9), 3014–3022. https://doi.org/10.1016/j.asoc.2012.04.020

 

Corley, H.W., Rosenberger, J., Yeh, W.C., & Sung, T.K. (2006). The cosine simplex algorithm. The International Journal of Advanced Manufacturing Technology, 27, 1047–1050. https://doi.org/10.1007/s00170-004-2278-1

 

Corral, J.M.R., Civit-Masot, J., Luna-Perejón, F., Díaz-Cano, I., Morgado-Estévez, A., & Domínguez-Morales, M. (2024). Energy efficiency in edge TPU vs. Embedded GPU for computer-aided medical imaging segmentation and classification. Engineering Applications of Artificial Intelligence, 127, 107298. https://doi.org/10.1016/j.engappai.2023.107298

 

Friedman, J.H. (1994). An overview of predictive learning and function approximation. In: From Statistics to Neural Networks. Vol. 136. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-79119-2_1

 

Gordon, V.S., & Whitley, D. (1993). Serial and parallel genetic algorithms as function optimizers. In: Proceedings of the 5th International Conference on Genetic Algorithms (ICGA). Urbana-Champaign, IL, USA.

 

Hachaj, T., & Piekarczyk, M. (2023). High-level hessian-based image processing with the frangi neuron. Electronics, 12(19), 4159. https://doi.org/10.3390/electronics12194159

 

Hadley, G. (1964). A Nonlinear and Dynamics Programming. Addison-Wesley Professional, Reading, MA, USA.

 

Hager, G., Zeiser, T., & Wellein, G. (2008). Data Access Optimizations for Highly Threaded Multi-Core Cpus with Multiple Memory Controllers. In: Proceedings of 2008 IEEE International Symposium on Parallel and Distributed Processing, p1–7. https://doi.org/10.1109/IPDPS.2008.4536341

 

Hussain, M.M., Hattori, H., & Fujimoto, N. (2016). A CUDA implementation of the standard particle swarm optimization. In: Proceedings of 2016 18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC) IEEE, Timisoara, Romania. p24–27. https://doi.org/10.1109/SYNASC.2016.043

 

Krömer, P., Platoš, J., Snášel, V., & Abraham, A. (2011). A comparison of many-threaded differential evolution and genetic algorithms on CUDA. In: Proceedings of 2011 Third World Congress on Nature and Biologically Inspired Computing: IEEE, Salamanca, Spain, p19–21. https://doi.org/10.1109/NaBIC.2011.6089641

 

Lee, W.C., Chuang, M.C., & Yeh, W.C. (2012). Uniform parallel-machine scheduling to minimize makespan with position-based learning curves. Computers and Industrial Engineering, 63(4), 813–818. https://doi.org/10.1016/j.cie.2012.05.003

 

Li, W., & Zhang, Z. (2011). A Cuda-Based Multi-Channel Particle Swarm Algorithm. In: Proceedings of 2011 International Conference on Control, Automation and Systems Engineering (CASE): IEEE, Singapore. https://doi.org/10.1109/ICCASE.2011.5997786

 

Luo, C., Sun, B., Yang, K., Lu, T., & Yeh, W.C. (2019). Thermal infrared and visible sequences fusion tracking based on a hybrid tracking framework with adaptive weighting scheme. Infrared Physics and Technology, 99, 265–276. https://doi.org/10.1016/j.infrared.2019.04.017

 

Mittal, S., & Vetter, J.S. (2014). A survey of methods for analyzing and improving GPU energy efficiency. ACM Computing Surveys (CSUR), 47(2), 1–23. https://doi.org/10.1145/263634

 

Mortezazadeh, M., Wang, L.L., Albettar, M., & Yang, S. (2022). CityFED - city fast fluid dynamics for Urban microclimate simulations on graphics processing units. Urban Climate, 41, 101063. https://doi.org/10.1016/j.uclim.2021.101063

 

Mussi, L., Daolio, F., & Cagnoni, S. (2011). Evaluation of parallel particle swarm optimization algorithms within the CUDA™ architecture. Information Sciences, 181(20), 4642–4657. https://doi.org/10.1016/j.ins.2010.08.045

 

Navarro, C.A., Hitschfeld-Kahler, N., & Mateu, L. (2014). A survey on parallel computing and its applications in data-parallel problems using GPU architectures. Communications in Computational Physics, 15(2), 285–329. https://doi.org/10.4208/cicp.110113.010813a

 

NVIDIA Corporation. (2012). CUDA C Best Practices Guide. Ver. 4.1. [Technical Report]. United States: NVIDIA Corporation.

 

NVIDIA. (n.d.). CUDA C Programming Guide. NVIDIA Documentation. Available from: https://docs.nvidia.com/cuda/cuda-c-programming-guide [Last accessed on 2024 Nov 01].

 

Pllana, S., & Xhafa, F. (2017). Programming Multicore and Many-Core Computing Systems. John Wiley and Sons, United States. https://doi.org/10.1002/9781119332015

 

Ruder, S. (2016). An Overview of Gradient Descent Optimization Algorithms. [arXiv Preprint].

 

Tan, Y., & Ding, K. (2015). A survey on GPU-based implementation of swarm intelligence algorithms. IEEE Transactions on Cybernetics, 46(9), 2028–2041. https://doi.org/10.1109/TCYB.2015.2460261

 

Tsutsui, S., & Fujimoto, N. (2009). Solving quadratic assignment problems by genetic algorithms with GPU computation: A case study. In: Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers. Montreal, Québec, Canada, 2009. p8–12. https://doi.org/10.1145/1570256.157035

 

Wolpert, D.H., & Macready, W.G. (1995). No Free Lunch Theorems for Search. Technical Report SFI-TR-95-02-010. Santa Fe Institute, United States.

 

Wong, M.L. (2009). Parallel multi-objective evolutionary algorithms on graphics processing units. In: Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers. Montreal, QC, Canada. https://doi.org/10.1145/1570256.1570354

 

Yeh, W.C. (2009). A two-stage discrete particle swarmoptimization for the problem of multiple multi-level redundancy allocation in series systems. Expert Systems with Applications, 36(5), 9192–9200. https://doi.org/10.1016/j.eswa.2008.12.024

 

Yeh, W.C. (2012). Simplified swarm optimization in disassembly sequencing problems with learning effects. Computers and Operations Research, 39(9), 2168–2177. https://doi.org/10.1016/j.cor.2011.10.027

 

Yeh, W.C. (2013). New parameter-free simplified swarm optimization for artificial neural network training and its application in the prediction of time series. IEEE Transactions on Neural Networks and Learning Systems, 24(4), 661–665. https://doi.org/10.1109/TNNLS.2012.2232678

 

Yeh, W.C. (2014). Orthogonal simplified swarm optimization for the series-parallel redundancy allocation problem with a mix of components. Knowledge Based Systems, 64, 1–12. https://doi.org/10.1016/j.knosys.2014.03.011

 

Yeh, W.C. (2015). An improved simplified swarm optimization. Knowledge Based Systems, 82, 60–69. https://doi.org/10.1016/j.knosys.2015.02.022

 

Yeh, W.C. (2017). A new exact solution algorithm for a novel generalized redundancy allocation problem. Information Sciences, 408, 182–197. https://doi.org/10.1016/j.ins.2017.04.019

 

Yeh, W.C., & Wei, S.C. (2012). Economic-based resource allocation for reliable grid-computing service based on Grid Bank. Future Generation Computer Systems, 28(7), 989–1002. https://doi.org/10.1016/j.future.2012.03.005

 

Yildirim, E., Arslan, E., Kim, J., & Kosar, T. (2015). Application-level optimization of big data transfers through pipelining, parallelism and concurrency. IEEE Transactions on Cloud Computing, 4(1), 63–75. https://doi.org/10.1109/TCC.2015.2415804

 

Zhu, H., Guo, Y., Wu, J., Gu, J., & Eguchi, K. (2011). Paralleling euclidean particle swarm optimization in CUDA. In: Proceedings of 2011 4th International Conference on Intelligent Networks and Intelligent Systems: IEEE, Kuming, China. https://doi.org/10.1109/ICINIS.2011.35

 

Zhu, W. (2011). Massively parallel differential evolution-pattern search optimization with graphics hardware acceleration: An investigation on bound constrained optimization problems. Journal of Global Optimization, 50(3), 417–437. https://doi.org/10.1007/s10898-010-9590-0

Conflict of interest
The authors declare that there are no conflicts of interest.
Share
Back to top
International Journal of Systematic Innovation, Electronic ISSN: 2077-8767 Print ISSN: 2077-7973, Published by AccScience Publishing