Fusing information from human observations can help robots overcome sensing limitations in collaborative tasks. However, an uncertainty-aware fusion framework requires a grounded likelihood that represents the uncertainty of human inputs. This paper presents a Feature Pyramid Likelihood Grounding Network (FP-LGN) that grounds spatial language by learning relevant map image features and their relationships with spatial relation semantics. The model is trained as a probability estimator through three-stage curriculum learning to capture the aleatoric uncertainty in human language. Results showed that FP-LGN matched expert-designed rules in mean negative log-likelihood (NLL) while exhibiting greater robustness, as indicated by a lower standard deviation. Collaborative sensing experiments demonstrated that the grounded likelihood enabled uncertainty-aware fusion of heterogeneous human language observations and robot sensor measurements, yielding significant improvements in human–robot collaborative task performance.
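
As a rough illustration of the fusion step described above (not the paper's implementation), the following minimal Python sketch combines a stand-in spatial-language likelihood with a Gaussian sensor likelihood over a discretized map via Bayes' rule. The grid size, the Gaussian forms, and all function names are illustrative assumptions; a hypothetical placeholder marks where FP-LGN's predicted likelihood p(z_lang | x) would enter.

import numpy as np

# Minimal sketch of uncertainty-aware Bayesian fusion over a discretized
# 2-D map of candidate target locations. The language likelihood below is
# a hypothetical placeholder, not the FP-LGN model itself.

def gaussian_likelihood(grid_xy, center, sigma=1.0):
    """Unnormalized Gaussian likelihood p(z | x) evaluated at each grid cell."""
    d2 = np.sum((grid_xy - center) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / sigma**2)

def fuse(prior, *likelihoods):
    """Bayes' rule: posterior ∝ prior × Π_i p(z_i | x), renormalized over the grid."""
    posterior = prior.copy()
    for lik in likelihoods:
        posterior *= lik
    return posterior / posterior.sum()

# 20 × 20 grid over a 10 m × 10 m map.
xs, ys = np.meshgrid(np.linspace(0, 10, 20), np.linspace(0, 10, 20))
grid_xy = np.stack([xs, ys], axis=-1)

prior = np.full((20, 20), 1.0 / 400)  # uniform prior over the 400 cells

# Robot sensor measurement, e.g. a noisy position estimate near (6, 4).
sensor_lik = gaussian_likelihood(grid_xy, np.array([6.0, 4.0]), sigma=1.0)

# Placeholder for a grounded spatial-language likelihood such as
# p("near the table" | x); FP-LGN would predict this from the map image.
language_lik = gaussian_likelihood(grid_xy, np.array([5.0, 5.0]), sigma=2.0)

posterior = fuse(prior, sensor_lik, language_lik)
print(np.unravel_index(posterior.argmax(), posterior.shape))  # most probable cell

The multiplicative form assumes the heterogeneous observations are conditionally independent given the target state, which is the standard assumption behind this kind of Bayesian fusion.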
@inproceedings{sitdhipol2025fplgn,
title = {Spatial language likelihood grounding network for {B}ayesian fusion of human-robot observations},
author = {Supawich Sitdhipol and Waritwong Sukprasongdee and Ekapol Chuangsuwanich and Rina Tse},
booktitle = {Proceedings of the {IEEE} International Conference on Systems, Man, and Cybernetics ({SMC})},
address = {Vienna, Austria},
month = {Oct.},
year = {2025}
}