How generative AI could help make construction sites safer

To combat the shortcuts and risk-taking, Lorenzo is working on a tool for the San Francisco–based company DroneDeploy, which sells software that creates daily digital models of work progress from videos and images, known in the trade as “reality capture.” The tool, called Safety AI, analyzes each day’s reality capture imagery and flags conditions that violate Occupational Safety and Health Administration (OSHA) rules, with what he claims is 95% accuracy.

That means that for any safety risk the software flags, there is 95% certainty that the flag is accurate and relates to a specific OSHA regulation. Launched in October 2024, it’s now being deployed on hundreds of construction sites in the US, Lorenzo says, and versions specific to the building regulations in countries including Canada, the UK, South Korea, and Australia have also been deployed.
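Read this way, the 95% figure is what machine-learning practitioners call precision: the share of raised flags that turn out to be real violations. The short Python sketch below works through that arithmetic. The counts and the function name are illustrative assumptions, not DroneDeploy data or code.

```python
def precision(confirmed_flags: int, rejected_flags: int) -> float:
    """Share of raised flags that were real violations."""
    total_flags = confirmed_flags + rejected_flags
    if total_flags == 0:
        raise ValueError("no flags were raised")
    return confirmed_flags / total_flags


# Hypothetical counts, for illustration only.
confirmed = 190  # flags an inspector confirmed as real violations
rejected = 10    # flags an inspector dismissed as false alarms

print(f"precision = {precision(confirmed, rejected):.0%}")
```

Precision covers only the flags that were raised. It says nothing about violations the software never flagged, which is one reason a human inspector stays in the loop.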

Safety AI is one of multiple AI construction safety tools that have emerged in recent years, from Silicon Valley to Hong Kong to Jerusalem. Many of these rely on teams of human “clickers,” often in low-wage countries, to manually draw bounding boxes around images of key objects like ladders, in order to label large volumes of data to train an algorithm.

Lorenzo says Safety AI is the first of these tools to use generative AI to flag safety violations, meaning its algorithm can do more than recognize objects such as ladders or hard hats. The software can “reason” about what is going on in an image of a site and draw a conclusion about whether there is an OSHA violation. This is a more advanced form of analysis than the object detection that is the current industry standard, Lorenzo claims. But as the 95% success rate suggests, Safety AI is not a flawless and all-knowing intelligence. It requires an experienced safety inspector as an overseer.

A visual language model in the real world

Robots and AI tend to thrive in controlled, largely static environments, like factory floors or shipping terminals. But construction sites are, by definition, changing a little bit every day.

Lorenzo thinks he’s built a better way to monitor sites, using a type of generative AI called a visual language model, or VLM. A VLM is an LLM with a vision encoder, allowing it to “see” images of the world and analyze what is going on in the scene.

Using years of reality capture imagery gathered from customers, with their explicit permission, Lorenzo’s team has assembled what he calls a “golden data set” encompassing tens of thousands of images of OSHA violations. Having carefully stockpiled this specific data for years, he is not worried that a tech giant, even a billion-dollar one, will be able to “copy and crush” him.

To help train the model, Lorenzo has a smaller team of construction safety pros ask strategic questions of the AI. The trainers input test scenes from the golden data set to the VLM and ask questions that guide the model through the process of breaking down the scene and analyzing it step by step the way an experienced human would. If the VLM doesn’t generate the correct response—for example, it misses a violation or registers a false positive—the human trainers go back and tweak the prompts or inputs. Lorenzo says that rather than simply learning to recognize objects, the VLM is taught “how to think in a certain way,” which means it can draw subtle conclusions about what is happening in an image.
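The review loop described above can be sketched in a few lines of Python. This is a rough illustration, not DroneDeploy's method: text descriptions stand in for images, and `toy_model`, `Scene`, `evaluate`, and the hazard terms are all hypothetical names invented for the example. The toy model only flags hazards the prompt tells it to look for, which is enough to show how a missed violation leads the trainers to revise the prompt.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Scene:
    description: str     # text stand-in for a reality capture image
    has_violation: bool  # label assigned by a safety professional


# Hypothetical hazard phrases the toy model knows how to spot.
HAZARD_TERMS = ("unsecured ladder", "missing guardrail")


def toy_model(prompt: str, scene_text: str) -> bool:
    """Stand-in for a VLM: flags a scene only for hazards named in the prompt."""
    active = [term for term in HAZARD_TERMS if term in prompt.lower()]
    return any(term in scene_text.lower() for term in active)


def evaluate(
    model: Callable[[str, str], bool], prompt: str, scenes: List[Scene]
) -> Tuple[List[Scene], List[Scene]]:
    """Return (missed violations, false positives) for one prompt."""
    missed, false_positives = [], []
    for scene in scenes:
        flagged = model(prompt, scene.description)
        if scene.has_violation and not flagged:
            missed.append(scene)
        elif flagged and not scene.has_violation:
            false_positives.append(scene)
    return missed, false_positives


scenes = [
    Scene("worker on unsecured ladder", True),
    Scene("open edge with missing guardrail", True),
    Scene("worker on secured ladder with spotter", False),
]

# The first prompt misses the guardrail scene, so the trainers revise it.
prompt_v1 = "Check for an unsecured ladder."
prompt_v2 = "Check for an unsecured ladder and a missing guardrail."

missed_v1, false_v1 = evaluate(toy_model, prompt_v1, scenes)
missed_v2, false_v2 = evaluate(toy_model, prompt_v2, scenes)

print(len(missed_v1), len(false_v1))
print(len(missed_v2), len(false_v2))
```

In practice the model would be a real VLM and the scenes real images, but the shape of the loop is the same: run labeled scenes through the model, collect the misses and false alarms, adjust the prompts, and run again.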
