Proanimer: My personal website.

Paper Writing Analysis

First, a recommended account worth following on X: @ScholarshipfPhd

Communication-Efficient Collaborative Perception via Information Filling with Codebook

intro

Collaborative perception aims to enhance the perceptual ability of each individual agent by facilitating the exchange of complementary perceptual information among multiple agents.

It fundamentally overcomes the occlusion and long-range issues in single-agent perception.

At the forefront of autonomous systems, collaborative perception shows significant potential in numerous real-world applications, particularly vehicle-to-everything (V2X) communication-aided autonomous driving.

In this emerging field, a central challenge lies in optimizing the trade-off between perception performance and communication cost inherent in agents sharing perceptual data [10, 12, 16, 19, 20, 30, 41, 42]. Given the inevitable practical constraints of communication systems, efficient utilization of communication resources is a prerequisite for collaborative perception.
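The paper's title points at its core trick for this trade-off: instead of transmitting raw feature vectors, agents share a common codebook and transmit only the indices of the nearest code vectors. The sketch below is illustrative only (a plain nearest-neighbor quantizer with made-up names), not the paper's actual method:

```python
import numpy as np

def quantize(features, codebook):
    # features: (N, D) per-location feature vectors
    # codebook: (K, D) code vectors shared by all agents in advance
    # Returns (N,) integer indices, which are far cheaper to transmit
    # than the D-dimensional features themselves.
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def dequantize(indices, codebook):
    # Receiver side: reconstruct approximate features by codebook lookup.
    return codebook[indices]
```

With K codes, each location costs only log2(K) bits on the wire instead of D floats, which is the sense in which such a scheme trades a little accuracy for a large drop in communication cost.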

What2comm: Towards Communication-efficient Collaborative Perception via Feature Decoupling

intro

Precise environmental perception is essential for ensuring the driving safety of Autonomous Vehicles (AVs).

Benefiting from advances in deep learning-based technologies, numerous studies are devoted to optimizing the accuracy of in-vehicle vision applications, including object detection and instance segmentation.

However, this kind of single-agent perception paradigm is inevitably restricted by several natural conditions, such as occlusion, limited detection range, and severe weather, making it more challenging to achieve robust vehicular perception. Recently, multi-agent collaborative perception has developed as a promising solution to overcome the above physical limitations.

This novel perception system facilitates information sharing among on-road agents via Vehicle-to-Everything (V2X) communication, leading to a more holistic perception of surrounding driving scenarios. Based on the emerging collaborative perception datasets, existing efforts seek a trade-off between performance and bandwidth via seminal communication and collaboration mechanisms. Despite recent advancements, challenges remain due to various collaboration noises, including transmission delay, localization errors, and agent heterogeneity.

As for communication mechanisms, current feature compression-based methods ignore the spatial heterogeneity of feature maps. In addition, spatial filtering-based algorithms rely on trained confidence maps, which may only focus on high-confidence areas and fail to extract the complementary information among agents. Meanwhile, it is difficult for existing strategies to handle the data discrepancies caused by agent heterogeneity in sensor type and installation height. Figures 1(a)&(b) demonstrate the collaborative perception scenario involving two agents and the fused point cloud. Intuitively, the orange box denotes the exclusive perception region of the collaborator, which can serve as complementary information for the ego vehicle to expand the detection range and complete the occluded areas. The overlapping perception range of the green box maintains the common semantic information and is beneficial to bridge the data distribution gap. Overall, considering both exclusive and common information can facilitate efficient and pragmatic communication patterns.
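The spatial filtering baseline being criticized here is easy to picture: threshold or top-k a confidence map and transmit only features at the selected cells. A minimal sketch (function and variable names are my own, not from the paper):

```python
import numpy as np

def confidence_filter(confidence_map, budget):
    # confidence_map: (H, W) detection confidence per spatial cell.
    # Keep only the `budget` highest-confidence cells; features at the
    # remaining cells are simply not transmitted.
    # This is exactly where the criticism bites: selection is driven by
    # the sender's own confidence, so regions that would be *complementary*
    # to the receiver (low confidence for the sender, unseen by the
    # receiver) can be dropped.
    flat = confidence_map.ravel()
    keep = np.argsort(flat)[::-1][:budget]
    mask = np.zeros(flat.shape, dtype=bool)
    mask[keep] = True
    return mask.reshape(confidence_map.shape)
```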

Moreover, temporal asynchrony caused by transmission delay potentially degrades collaboration performance. Figures 1(c)&(d) show the fused point clouds in the time-synchronous and time-asynchronous cases, respectively. The moving vehicles inside the green circles produce fusion errors between the two sides' point clouds due to the time delay, resulting in position misalignment and false detection results. However, the existing single-frame perception pattern restricts delay compensation methods and leads to performance bottlenecks. Also, the localization errors of agents may cause feature misalignment and harm the detection precision of per-agent/location message fusion efforts.

Motivated by the above observations, we propose What2comm, a unified communication-efficient multi-agent collaborative perception framework to address the existing challenges in an end-to-end manner. From Figure 2, What2comm contains three core components: i) a Decoupling-based Communication Mechanism (DCM), which captures the exclusive and common representations among distinct agents via feature disentanglement to determine what messages to communicate. DCM provides a communication-efficient information sharing pattern through feature specificity and consistency supervision; ii) a Spatio-Temporal Collaboration Module (STCM), which aggregates perceptually complementary information from exclusive feature maps shared by collaborators and ego-centered temporal semantics. STCM mitigates feature misalignment due to transmission delay and localization errors by joint spatio-temporal modeling; iii) a Common-Aware Fusion (CAF) strategy, which extracts high-dimensional information from collaborator-shared common representations to eliminate the data distribution gap across agents. Benefiting from the above customized communication and collaboration components, What2comm takes a solid step toward a communication-efficient and noise-robust collaborative perception system. We evaluate the performance of …
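The exclusive/common split that drives all three components can be sketched with a crude, hand-rolled stand-in for the learned disentanglement (the thresholding and fusion rules below are my own simplification, not What2comm's actual DCM/STCM/CAF):

```python
import numpy as np

def decouple(ego_feat, collab_feat, thresh=0.5):
    # Stand-in for DCM: on (C, H, W) BEV feature maps, cells where only
    # the collaborator is strongly activated are "exclusive" (new coverage
    # for the ego), cells where both are activated are "common".
    ego_act = np.abs(ego_feat).mean(axis=0) > thresh
    col_act = np.abs(collab_feat).mean(axis=0) > thresh
    exclusive = col_act & ~ego_act
    common = col_act & ego_act
    return exclusive, common

def fuse(ego_feat, collab_feat, exclusive, common):
    # Stand-in for STCM/CAF: copy exclusive collaborator features
    # (expanding coverage), average common ones (bridging the
    # cross-agent distribution gap).
    out = ego_feat.copy()
    out[:, exclusive] = collab_feat[:, exclusive]
    out[:, common] = 0.5 * (ego_feat[:, common] + collab_feat[:, common])
    return out
```

The point of the sketch is the message structure: only the exclusive and common slices of the collaborator's map need to cross the wire, which is what makes the decoupled pattern communication-efficient.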

How2comm: Communication-Efficient and Collaboration-Pragmatic Multi-Agent Perception

