SKYSCENES Dataset Could Lead to Safe, Reliable Autonomous Flying Vehicles

Is it a building or a street? How tall is the building? Are power lines located nearby?

These are details autonomous flying vehicles need to know to function safely. However, few aerial image datasets exist that can adequately train the computer vision algorithms that would pilot these vehicles.

That’s why Georgia Tech researchers created a new benchmark dataset of computer-generated aerial images.

Judy Hoffman, an assistant professor in Georgia Tech’s School of Interactive Computing, worked with students in her lab to create SKYSCENES. The dataset contains over 33,000 aerial images of cities curated from a computer simulation program.

Hoffman said sufficient training datasets could unlock the potential of autonomous flying vehicles. Constructing those datasets is a challenge the computer vision research community has been working for years to overcome.

“You can’t crowdsource it the same way you would standard internet images,” Hoffman said. “Trying to collect it manually would be very slow and expensive, akin to what the self-driving industry does driving vehicles around, but now you’re talking about drones flying around.

“We must fix those problems to have models that work reliably and safely for flying vehicles.”

Many existing datasets aren’t annotated well enough for algorithms to distinguish objects in an image. For example, an algorithm may not be able to tell the surface of a building apart from the surface of a street.

Working with Hoffman, Ph.D. student Sahil Khose tried a new approach: constructing a synthetic image dataset from a ground-view, open-source simulator known as CARLA.

Ph.D. student Sahil Khose worked with Assistant Professor Judy Hoffman to curate SKYSCENES, a new benchmark dataset that provides well-annotated aerial images of cities that computer vision algorithms can use to operate autonomous flying vehicles. Photos by Kevin Beasley/College of Computing.

CARLA was originally designed to provide ground-view simulation for self-driving vehicles. It creates an open-world virtual reality that allows users to drive around in computer-generated cities.

Khose and his collaborators adjusted CARLA’s interface to support aerial views that mimic what one might see from unmanned aerial vehicles (UAVs).
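CARLA exposes its sensors through a Python API, so a viewpoint change like this can be scripted directly. The sketch below is a minimal illustration of the idea, not the team’s actual pipeline: it spawns a downward-facing RGB camera at a fixed altitude and writes frames to disk. The host, port, resolution, altitude, and output path are all assumptions.

```python
import carla

# Connect to a running CARLA server (default host/port are assumptions).
client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()

# An RGB camera blueprint from CARLA's standard sensor library.
blueprint_library = world.get_blueprint_library()
camera_bp = blueprint_library.find("sensor.camera.rgb")
camera_bp.set_attribute("image_size_x", "2048")
camera_bp.set_attribute("image_size_y", "1024")

# Place the camera 100 m above a point of interest, pitched straight
# down to approximate a UAV's nadir view. Coordinates are illustrative.
transform = carla.Transform(
    carla.Location(x=0.0, y=0.0, z=100.0),
    carla.Rotation(pitch=-90.0),
)
camera = world.spawn_actor(camera_bp, transform)

# Save each rendered frame to disk as it arrives.
camera.listen(lambda image: image.save_to_disk(f"out/{image.frame:06d}.png"))
```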

What's the Forecast?

The team also created new virtual scenarios to mimic the real world by accounting for changes in weather, time of day, altitude, and population per city. Unless those details are incorporated into the training data, the algorithms will struggle to recognize objects in the frame consistently.

“CARLA’s flexibility offers a wide range of environmental configurations, and we take several important considerations into account while curating SKYSCENES images from CARLA,” Khose said. “Those include strategies for obtaining diverse synthetic data, embedding real-world irregularities, avoiding correlated images, addressing skewed class representations, and reproducing precise viewpoints.”
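In CARLA’s Python API, weather and sun position are world-level settings, which makes that kind of environmental sweep straightforward to script. A minimal sketch, assuming the connected `world` object from the camera example above; the specific values are illustrative, not SKYSCENES’ actual configurations:

```python
import carla

# Built-in presets cover common combinations of weather and time of day.
world.set_weather(carla.WeatherParameters.WetCloudySunset)

# Conditions can also be composed field by field; values are illustrative.
dusk_drizzle = carla.WeatherParameters(
    cloudiness=60.0,              # percent cloud cover
    precipitation=30.0,           # rain intensity
    precipitation_deposits=20.0,  # puddles left on the ground
    sun_altitude_angle=10.0,      # low sun approximates dusk
)
world.set_weather(dusk_drizzle)
```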

SKYSCENES is not the largest dataset of aerial images to be released, but a paper co-authored by Khose shows that models trained on it outperform models trained on existing alternatives.

Khose said models trained on this dataset generalize strongly to real-world scenarios, and integrating real-world data further enhances their performance. The dataset also offers controlled variability, which is essential for training models to handle a variety of tasks.

“This dataset drives advancements in multi-view learning, domain adaptation, and multimodal approaches, with major implications for applications like urban planning, disaster response, and autonomous drone navigation,” Khose said. “We hope to bridge the gap for synthetic-to-real adaptation and generalization for aerial images.”

Seeing the Whole Picture

For algorithms, generalization is the ability to perform tasks on new data that extends beyond the specific examples on which they were trained.

“If you have 200 images, and you train a model on those images, they’ll do well at recognizing what you want them to recognize in that closed-world initial setting,” Hoffman said. “But if we were to take aerial vehicles and fly them around cities at various times of the day or in other weather conditions, they would start to fail.”

That’s why Khose designed algorithms to enhance the quality of the curated images.

“These images are captured from 100 meters above ground, which means the objects appear small and are challenging to recognize,” he said. “We focused on developing algorithms specifically designed to address this.”

Those algorithms improve machine learning models’ ability to recognize small objects, boosting their performance when navigating new environments.
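The article doesn’t detail those algorithms, but one common way to help segmentation models with small objects in large aerial frames is tiled inference: running the model on overlapping crops so small objects retain enough pixels, then averaging the overlapping predictions. A generic PyTorch sketch, with every name and default assumed rather than taken from the paper:

```python
import torch

def tiled_inference(model, image, num_classes, tile=512, stride=384):
    """Average per-pixel class logits over overlapping tiles of a large
    (C, H, W) aerial frame. Assumes H and W are at least `tile`."""
    _, h, w = image.shape
    logits = torch.zeros(num_classes, h, w)
    counts = torch.zeros(1, h, w)
    # Tile origins, with the last tile snapped to the image edge so
    # every pixel is covered at least once.
    tops = list(range(0, h - tile, stride)) + [h - tile]
    lefts = list(range(0, w - tile, stride)) + [w - tile]
    for top in tops:
        for left in lefts:
            patch = image[:, top:top + tile, left:left + tile]
            with torch.no_grad():
                # Model maps a (1, C, tile, tile) crop to per-pixel logits.
                out = model(patch.unsqueeze(0)).squeeze(0)
            logits[:, top:top + tile, left:left + tile] += out
            counts[:, top:top + tile, left:left + tile] += 1
    # Average overlapping logits, then pick the top class per pixel.
    return (logits / counts).argmax(dim=0)  # (H, W) predicted class map
```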

“Our annotations help the models capture a more comprehensive understanding of the entire scene — where the roads are, where the buildings are, and know they are buildings and not just an obstacle in the way,” Hoffman said. “It gives a richer set of information when planning a flight.

“To work safely, many autonomous flight plans might require a map given to them beforehand. If you have successful vision systems that understand exactly what the obstacles in the real world are, you could navigate in previously unseen environments.”

For more information about Georgia Tech Research at ECCV 2024, click here.