Learning Hierarchical Orthogonal Prototypes for Generalized Few-Shot 3D Point Cloud Segmentation

Yifei Zhao, Fanyu Zhao, Zhongyuan Zhang, Shengtang Wu, Yixuan Lin, Yinsheng Li
Accepted by ICME 2026

Abstract

Generalized few-shot 3D point cloud segmentation aims to adapt to novel classes from only a few annotations while maintaining strong performance on base classes, but this remains challenging due to the inherent stability–plasticity trade-off: adapting to novel classes can interfere with shared representations and cause base-class forgetting. We present HOP3D, a unified framework that learns hierarchical orthogonal prototypes with an entropy-based few-shot regularizer to enable robust novel-class adaptation without degrading base-class performance. HOP3D introduces hierarchical orthogonalization that decouples base and novel learning at both the gradient and representation levels, effectively mitigating base–novel interference. To further enhance adaptation under sparse supervision, we incorporate an entropy-based regularizer that leverages predictive uncertainty to refine prototype learning and promote balanced predictions. Extensive experiments on ScanNet200 and ScanNet++ demonstrate that HOP3D consistently outperforms state-of-the-art baselines under both 1-shot and 5-shot settings. The code will be publicly released upon acceptance.

Method

We propose HOP3D, a unified framework for generalized few-shot 3D point cloud segmentation that reduces base–novel interference and improves few-shot adaptation robustness. HOP3D integrates HOP-Net (hierarchical orthogonalization) and HOP-Ent (entropy-based few-shot regularizer).

  • HOP-Grad: projects Phase-2 novel-class gradients onto the orthogonal complement of base-task gradient directions to mitigate base-class forgetting.
  • HOP-Rep: learns orthogonal prototype subspaces that decompose base and novel representations and yield a more stable decision geometry.
  • HOP-Ent: applies dual-entropy regularization to encourage confident yet balanced novel-class predictions under sparse supervision.
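The gradient-level decoupling in HOP-Grad can be sketched in a few lines. The paper's exact formulation is not reproduced here; this is a minimal single-direction version (the function name `project_orthogonal` is ours), assuming gradients are flattened into vectors:

```python
def project_orthogonal(g_novel, g_base):
    """Project a novel-task gradient onto the orthogonal complement of a
    base-task gradient direction: g' = g_n - (<g_n, g_b> / ||g_b||^2) g_b.
    Gradients are given as flat lists of floats."""
    dot = sum(a * b for a, b in zip(g_novel, g_base))
    norm_sq = sum(b * b for b in g_base)
    if norm_sq == 0.0:  # no base direction to protect against
        return list(g_novel)
    scale = dot / norm_sq
    # The result has zero component along g_base, so a step along it
    # leaves the base-task loss unchanged to first order.
    return [a - scale * b for a, b in zip(g_novel, g_base)]
```

For example, `project_orthogonal([1.0, 1.0], [1.0, 0.0])` removes the component along the base direction and returns `[0.0, 1.0]`.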
Overview of HOP3D: two-phase training with hierarchical orthogonalization (HOP-Grad + HOP-Rep) and entropy-guided refinement (HOP-Ent).
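The dual-entropy idea behind HOP-Ent can be illustrated with a common form of such regularizers (the paper's exact objective is not given here, so this is an assumption): minimize the mean per-point prediction entropy to encourage confidence, while maximizing the entropy of the batch-marginal prediction to encourage class balance.

```python
import math

def dual_entropy_loss(probs, lam=1.0):
    """Hypothetical dual-entropy regularizer over per-point class
    probabilities `probs` (list of rows, each row summing to 1).
    Low mean per-point entropy  -> confident predictions;
    high batch-marginal entropy -> balanced class usage."""
    eps = 1e-12
    n, k = len(probs), len(probs[0])
    # Confidence term: average entropy of each point's prediction.
    point_ent = sum(-sum(p * math.log(p + eps) for p in row)
                    for row in probs) / n
    # Balance term: entropy of the batch-averaged (marginal) prediction.
    marginal = [sum(row[c] for row in probs) / n for c in range(k)]
    marginal_ent = -sum(p * math.log(p + eps) for p in marginal)
    return point_ent - lam * marginal_ent  # to be minimized
```

A set of confident predictions spread across classes scores lower (better) than uniformly uncertain ones, which is the behavior the regularizer rewards.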

Qualitative Results

Qualitative comparison on ScanNet200: compared with GFS-VL, HOP3D produces more consistent base/novel predictions and reduces typical confusions.

Experimental Results

Metrics: mean IoU on base classes (B), novel classes (N), and all classes (A), plus the harmonic mean of B and N (HM). Results are reported for the 1-shot and 5-shot settings on ScanNet200 and ScanNet++.
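The HM column is computed from B and N; it is high only when both are high, so it exposes base-class forgetting that the all-class average (A) can mask:

```python
def harmonic_mean(miou_base, miou_novel):
    """HM = 2*B*N / (B + N), the harmonic mean of base and novel mIoU."""
    if miou_base + miou_novel == 0:
        return 0.0
    return 2 * miou_base * miou_novel / (miou_base + miou_novel)
```

For instance, HOP3D's 5-shot ScanNet200 entry (B = 67.36, N = 34.38) yields HM ≈ 45.52, matching the table up to rounding.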

ScanNet200

Method       | 5-shot: B / N / A / HM        | 1-shot: B / N / A / HM
Fully Sup.   | 68.70 / 39.32 / 45.51 / 50.02 | 68.70 / 39.32 / 45.51 / 50.02
PIFS         | 28.78 /  3.82 /  9.07 /  6.71 | 17.84 /  2.87 /  6.02 /  4.88
attMPTI      | 37.13 /  4.99 / 11.76 /  8.79 | 54.84 /  3.28 / 14.14 /  6.17
COSeg        | 57.67 /  5.21 / 16.25 /  9.54 | 47.03 /  4.03 / 13.09 /  7.42
GW           | 59.28 /  8.30 / 19.03 / 14.55 | 55.23 /  6.47 / 16.74 / 11.56
GFS-VL       | 67.17 / 31.18 / 38.76 / 42.59 | 67.25 / 28.89 / 36.97 / 40.42
HOP3D (ours) | 67.36 / 34.38 / 41.32 / 45.52 | 68.45 / 31.80 / 39.52 / 43.42

ScanNet++

Method       | 5-shot: B / N / A / HM        | 1-shot: B / N / A / HM
Fully Sup.   | 65.45 / 37.24 / 48.53 / 47.47 | 65.45 / 37.24 / 48.53 / 47.47
PIFS         | 39.98 /  5.74 / 19.44 / 10.03 | 36.66 /  4.95 / 17.63 /  8.71
attMPTI      | 55.89 /  4.19 / 24.87 /  7.78 | 53.16 /  3.55 / 23.40 /  6.66
COSeg        | 59.34 /  6.96 / 27.91 / 12.45 | 58.49 /  6.24 / 27.14 / 11.26
GW           | 51.35 / 11.03 / 27.16 / 18.15 | 46.71 /  6.63 / 22.66 / 11.59
GFS-VL       | 60.49 / 21.40 / 37.04 / 31.61 | 60.02 / 17.90 / 34.75 / 27.56
HOP3D (ours) | 62.40 / 23.70 / 39.18 / 34.34 | 61.72 / 19.23 / 36.23 / 29.32

Key Findings

  • Consistent novel gains: HOP3D improves mIoU-N and HM over the strongest baseline (GFS-VL) on both datasets under 1-shot and 5-shot.
  • Base retention: HOP3D maintains strong base-class performance while improving novel recognition.
  • Entropy regularization helps: HOP-Ent improves confidence and class balance during Phase-2 adaptation.

Ablation & Additional Analysis

Prototype cosine-similarity matrices of ℓ2-normalized prototypes (Phase 1/2, with/without HOP-Net).
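A HOP-Rep-style training signal consistent with these similarity matrices would penalize cosine similarity between base and novel prototypes after ℓ2-normalization. The exact loss in the paper is not reproduced here; this is a hypothetical squared-cosine penalty over the base–novel block:

```python
def cross_orthogonality_penalty(base_protos, novel_protos):
    """Hypothetical HOP-Rep-style penalty: sum of squared cosine
    similarities between every (base, novel) prototype pair.
    Prototypes are lists of floats; the penalty is 0 exactly when the
    two groups are mutually orthogonal."""
    def normalize(v):
        n = sum(x * x for x in v) ** 0.5
        return [x / n for x in v] if n > 0 else v
    base = [normalize(p) for p in base_protos]
    novel = [normalize(p) for p in novel_protos]
    return sum(sum(x * y for x, y in zip(pb, pn)) ** 2
               for pb in base for pn in novel)
```

Driving this penalty toward zero pushes the off-diagonal base–novel block of the cosine-similarity matrix toward zero, the pattern the visualization compares with and without HOP-Net.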
HOP-Ent analysis: confidence distribution and class-frequency distribution.
HOP-Net ablation: impact of the orthogonality weight λ_orth and the adaptation ratio (AR).
Qualitative ablation: HOP3D corrects typical base–novel confusions compared with variants without HOP-Net/HOP-Ent.

BibTeX

@inproceedings{HOP3D2026,
  title={Learning Hierarchical Orthogonal Prototypes for Generalized Few-Shot 3D Point Cloud Segmentation},
  author={Zhao, Yifei and Zhao, Fanyu and Zhang, Zhongyuan and Wu, Shengtang and Lin, Yixuan and Li, Yinsheng},
  booktitle={IEEE International Conference on Multimedia \& Expo (ICME)},
  year={2026},
  url={https://arxiv.org/abs/2603.19788}
}