As online shopping and same-day delivery become the norm, the pressure on logistics companies to optimize “last-mile” delivery—getting packages from distribution centers to customers’ doorsteps—has never been higher.
A new study by Aigerim Bogyrbayeva, Bissenbay Dauletbayev, and Meraryslan Meraliyev from SDU University, Kazakhstan, proposes a cutting-edge approach using reinforcement learning (RL) to dramatically improve efficiency in drone-assisted vehicle routing (Bogyrbayeva et al., 2025).
Rethinking Delivery Logistics
Last-mile delivery has traditionally relied on trucks, vans, or even bicycles. However, increased traffic congestion and rising emissions are making conventional approaches less sustainable.
According to the researchers, urban delivery vehicles are expected to increase by 36% between 2019 and 2030, adding six million tonnes of CO₂ emissions to already congested cities. Drones, with their ability to bypass traffic and reach hard-to-access areas, offer a promising solution.
But coordinating multiple drones alongside traditional vehicles presents a complex computational problem known as the Vehicle Routing Problem with Drones (VRPD). This challenge involves determining optimal routes, launch and retrieval points, and balancing drone battery limitations.
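To make the problem concrete, here is a minimal, illustrative sketch of a VRPD instance in Python. All names and fields are hypothetical stand-ins for the paper's formal model, not code from the study:

```python
from dataclasses import dataclass
from math import dist

# Hypothetical, minimal representation of a VRPD instance; field names
# are illustrative, not taken from the paper.
@dataclass
class VRPDInstance:
    depot: tuple          # (x, y) of the distribution center
    customers: list       # list of (x, y) customer locations
    num_trucks: int
    num_drones: int
    drone_range: float    # maximum distance a drone can fly per sortie

    def truck_route_cost(self, route):
        """Total travel distance for a truck route: depot -> customers -> depot."""
        points = [self.depot] + [self.customers[i] for i in route] + [self.depot]
        return sum(dist(a, b) for a, b in zip(points, points[1:]))

instance = VRPDInstance(
    depot=(0.0, 0.0),
    customers=[(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    num_trucks=1,
    num_drones=2,
    drone_range=2.0,
)
print(round(instance.truck_route_cost([0, 1, 2]), 3))  # perimeter of unit square: 4.0
```

Even this toy version hints at the combinatorial explosion: every customer can be assigned to any truck or drone, and every drone sortie needs a feasible launch and retrieval point.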
An Intelligent Approach Using Reinforcement Learning
To tackle this problem, the study formulates VRPD as a Markov Decision Process (MDP) and trains a reinforcement learning model with a neural network architecture combining an attention encoder and a recurrent decoder.
This design enables the system to dynamically decide which vehicles and drones should visit specific customers and where they should rendezvous, maximizing efficiency while minimizing delivery times.
Unlike previous models, which often assumed unlimited drone battery life or single-drone scenarios, this RL-based approach accounts for multiple drones, multiple trucks, and real-world battery constraints.
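One way to picture the MDP view is that the state tracks each vehicle's position and each drone's remaining battery, and infeasible actions are masked out before the policy chooses. The sketch below is an interpretation of that idea, not the authors' implementation:

```python
from math import dist

def feasible_actions(agent, state, customers):
    """Return indices of unvisited customers this agent can still serve."""
    pos = state["positions"][agent]
    unvisited = [i for i in range(len(customers)) if i not in state["visited"]]
    if state["is_drone"][agent]:
        battery = state["battery"][agent]
        # a drone may only target customers reachable with its remaining charge
        return [i for i in unvisited if dist(pos, customers[i]) <= battery]
    return unvisited  # trucks have no range constraint in this sketch

customers = [(1.0, 0.0), (3.0, 0.0), (0.5, 0.5)]
state = {
    "positions": {"truck0": (0.0, 0.0), "drone0": (0.0, 0.0)},
    "is_drone": {"truck0": False, "drone0": True},
    "battery": {"drone0": 1.5},
    "visited": set(),
}
print(feasible_actions("drone0", state, customers))  # [0, 2] (customer 1 is out of range)
print(feasible_actions("truck0", state, customers))  # [0, 1, 2]
```

In the actual model, the attention encoder and recurrent decoder would score the feasible actions at each step; here the masking step alone shows how battery limits shape the decision space.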
The RL model was trained and tested on benchmark datasets simulating urban delivery environments. The researchers employed three decoding strategies—greedy selection, sampling multiple solutions, and an ensemble of models saved at different training intervals—to extract high-quality routes.
Computational resources included high-end GPUs to ensure rapid training and evaluation.
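The three decoding strategies can be sketched in miniature. Here the trained policy is replaced by fixed probability tables, so the code only illustrates the selection logic, not the study's network:

```python
import random

def greedy(probs):
    """Pick the highest-probability action."""
    return max(range(len(probs)), key=lambda i: probs[i])

def sample_best(probs, cost_fn, n_samples=16, seed=0):
    """Draw several stochastic choices and keep the cheapest one found."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_samples):
        choice = rng.choices(range(len(probs)), weights=probs)[0]
        if best is None or cost_fn(choice) < cost_fn(best):
            best = choice
    return best

def ensemble(prob_tables):
    """Average action probabilities across model snapshots, then act greedily."""
    n = len(prob_tables[0])
    avg = [sum(t[i] for t in prob_tables) / len(prob_tables) for i in range(n)]
    return greedy(avg)

probs = [0.2, 0.5, 0.3]
print(greedy(probs))                                 # 1
print(ensemble([[0.6, 0.2, 0.2], [0.1, 0.1, 0.8]]))  # 2 (averages: 0.35, 0.15, 0.5)
```

Sampling trades extra compute for solution quality, while the ensemble hedges against any single checkpoint's blind spots; both ideas carry over directly to sequential route construction.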
Striking Results
The study’s computational experiments reveal impressive results. For small-scale delivery scenarios (8–10 customer nodes), the RL sampling model achieved near-optimal solutions while operating up to 70 times faster than traditional solvers.
Even as problem sizes increased to 20–50 nodes, the ensemble RL model consistently delivered high-quality routes with minimal cost gaps compared to benchmark heuristics.
The research also evaluated drones with limited flying ranges, a critical factor in real-world operations. When drones could cover only 60% of the maximum distance between delivery points, the RL system still adapted effectively, keeping solution quality within 2–3% of optimal results while maintaining significant speed advantages.
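As a small illustration of that limited-range setting, one can cap the drone's flight limit at 60% of the largest pairwise distance among delivery points (an interpretation of the experimental setup, not the authors' exact code):

```python
from itertools import combinations
from math import dist

points = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0)]  # toy delivery locations
# largest distance between any two delivery points
max_pairwise = max(dist(a, b) for a, b in combinations(points, 2))
drone_limit = 0.6 * max_pairwise  # the restricted flying range
print(max_pairwise, round(drone_limit, 2))  # 5.0 3.0
```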
Ablation studies further highlighted the importance of modeling drone-truck group interactions: the system’s ability to understand coordinated behaviors between multiple vehicles proved essential for minimizing overall delivery costs.
Visualizations of routing solutions show the RL model’s intuitive clustering of delivery nodes, balancing workload between drones and trucks even without explicitly programming the behavior. Interestingly, adding more drones did not always improve delivery times in smaller scenarios, underscoring the nuanced optimization the RL model achieves.
Broader Implications for Logistics and Sustainability
This research carries significant implications for both logistics efficiency and environmental sustainability. Faster and more adaptable delivery routes reduce fuel consumption for trucks and allow drones to complement traditional vehicles in areas where road congestion or poor infrastructure is a challenge.
By optimizing last-mile delivery, logistics providers can cut emissions, reduce traffic congestion, and respond more flexibly to time-sensitive deliveries, such as medical supplies.
Moreover, the speed of RL-based routing—generating high-quality solutions in seconds—offers a clear advantage for dynamic delivery systems where new orders and real-time traffic conditions must be incorporated.
This adaptability could transform urban logistics, making drone-assisted deliveries more practical and scalable.
Looking Ahead
While the study demonstrates strong performance in benchmark settings, further research is needed to fully realize the potential of RL-based delivery systems.
Future directions include modeling complex recharging schedules for drones, testing the system with real-world traffic and weather conditions, and exploring multi-objective optimization balancing cost, time, and environmental impact.
In conclusion, the study by Bogyrbayeva and colleagues marks a significant step toward fully integrated, AI-driven logistics systems.
By leveraging reinforcement learning, drones and trucks can operate in harmony, delivering packages faster, more efficiently, and with a smaller carbon footprint—paving the way for smarter, greener cities.
Reference:
Bogyrbayeva, A., Dauletbayev, B., & Meraliyev, M. (2025). Reinforcement Learning for Efficient Drone-Assisted Vehicle Routing. Applied Sciences, 15(4), 2007.