Learning Hierarchical Orthogonal Prototypes for Generalized Few-Shot 3D Point Cloud Segmentation

Yifei Zhao, Fanyu Zhao, Zhongyuan Zhang, Shengtang Wu, Yixuan Lin, Yinsheng Li
Accepted by ICME 2026

Abstract

Generalized few-shot 3D point cloud segmentation aims to adapt to novel classes from only a few annotations while maintaining strong performance on base classes, but this remains challenging due to the inherent stability–plasticity trade-off: adapting to novel classes can interfere with shared representations and cause base-class forgetting. We present HOP3D, a unified framework that learns hierarchical orthogonal prototypes with an entropy-based few-shot regularizer to enable robust novel-class adaptation without degrading base-class performance. HOP3D introduces hierarchical orthogonalization that decouples base and novel learning at both the gradient and representation levels, effectively mitigating base–novel interference. To further enhance adaptation under sparse supervision, we incorporate an entropy-based regularizer that leverages predictive uncertainty to refine prototype learning and promote balanced predictions. Extensive experiments on ScanNet200 and ScanNet++ demonstrate that HOP3D consistently outperforms state-of-the-art baselines under both 1-shot and 5-shot settings. The code will be publicly released upon acceptance.

Method

We propose HOP3D, a unified framework for generalized few-shot 3D point cloud segmentation that reduces base–novel interference and improves few-shot adaptation robustness. HOP3D integrates HOP-Net (hierarchical orthogonalization) and HOP-Ent (entropy-based few-shot regularizer).

  • HOP-Grad: projects Phase-2 novel-class gradients onto the orthogonal complement of base-task gradient directions to mitigate base-class forgetting.
  • HOP-Rep: learns orthogonal prototype subspaces that decompose base and novel representations and yield a more stable decision geometry.
  • HOP-Ent: applies dual-entropy regularization to encourage confident yet balanced novel-class predictions under sparse supervision.
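The gradient-level decoupling in HOP-Grad can be sketched in a few lines. The paper's exact formulation is not reproduced here; this is a minimal single-direction version (the function name `project_orthogonal` is ours), assuming gradients are flattened into vectors:

```python
def project_orthogonal(g_novel, g_base):
    """Project a novel-task gradient onto the orthogonal complement of a
    base-task gradient direction: g' = g_n - (<g_n, g_b> / ||g_b||^2) g_b.
    Gradients are given as flat lists of floats."""
    dot = sum(a * b for a, b in zip(g_novel, g_base))
    norm_sq = sum(b * b for b in g_base)
    if norm_sq == 0.0:  # no base direction to protect against
        return list(g_novel)
    scale = dot / norm_sq
    # The result has zero component along g_base, so a step along it
    # leaves the base-task loss unchanged to first order.
    return [a - scale * b for a, b in zip(g_novel, g_base)]
```

For example, `project_orthogonal([1.0, 1.0], [1.0, 0.0])` removes the component along the base direction and returns `[0.0, 1.0]`.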
Overview of HOP3D: two-phase training with hierarchical orthogonalization (HOP-Grad + HOP-Rep) and entropy-guided refinement (HOP-Ent).
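The dual-entropy idea behind HOP-Ent can be illustrated with a common form of such regularizers (the paper's exact objective is not given here, so this is an assumption): minimize the mean per-point prediction entropy to encourage confidence, while maximizing the entropy of the batch-marginal prediction to encourage class balance.

```python
import math

def dual_entropy_loss(probs, lam=1.0):
    """Hypothetical dual-entropy regularizer over per-point class
    probabilities `probs` (list of rows, each row summing to 1).
    Low mean per-point entropy  -> confident predictions;
    high batch-marginal entropy -> balanced class usage."""
    eps = 1e-12
    n, k = len(probs), len(probs[0])
    # Confidence term: average entropy of each point's prediction.
    point_ent = sum(-sum(p * math.log(p + eps) for p in row)
                    for row in probs) / n
    # Balance term: entropy of the batch-averaged (marginal) prediction.
    marginal = [sum(row[c] for row in probs) / n for c in range(k)]
    marginal_ent = -sum(p * math.log(p + eps) for p in marginal)
    return point_ent - lam * marginal_ent  # to be minimized
```

A set of confident predictions spread across classes scores lower (better) than uniformly uncertain ones, which is the behavior the regularizer rewards.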

Qualitative Results

Qualitative comparison on ScanNet200: compared with GFS-VL, HOP3D produces more consistent base/novel predictions and reduces typical confusions.

Experimental Results

Metrics: mean IoU on base classes (B), novel classes (N), and all classes (A), plus the harmonic mean of B and N (HM). Results are reported for the 1-shot and 5-shot settings on ScanNet200 and ScanNet++.
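The HM column is computed from B and N; it is high only when both are high, so it exposes base-class forgetting that the all-class average (A) can mask:

```python
def harmonic_mean(miou_base, miou_novel):
    """HM = 2*B*N / (B + N), the harmonic mean of base and novel mIoU."""
    if miou_base + miou_novel == 0:
        return 0.0
    return 2 * miou_base * miou_novel / (miou_base + miou_novel)
```

For instance, HOP3D's 5-shot ScanNet200 entry (B = 67.36, N = 34.38) yields HM ≈ 45.52, matching the table up to rounding.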

ScanNet200

Method       | 5-shot: B / N / A / HM        | 1-shot: B / N / A / HM
Fully Sup.   | 68.70 / 39.32 / 45.51 / 50.02 | 68.70 / 39.32 / 45.51 / 50.02
PIFS         | 28.78 /  3.82 /  9.07 /  6.71 | 17.84 /  2.87 /  6.02 /  4.88
attMPTI      | 37.13 /  4.99 / 11.76 /  8.79 | 54.84 /  3.28 / 14.14 /  6.17
COSeg        | 57.67 /  5.21 / 16.25 /  9.54 | 47.03 /  4.03 / 13.09 /  7.42
GW           | 59.28 /  8.30 / 19.03 / 14.55 | 55.23 /  6.47 / 16.74 / 11.56
GFS-VL       | 67.17 / 31.18 / 38.76 / 42.59 | 67.25 / 28.89 / 36.97 / 40.42
HOP3D (ours) | 67.36 / 34.38 / 41.32 / 45.52 | 68.45 / 31.80 / 39.52 / 43.42

ScanNet++

Method       | 5-shot: B / N / A / HM        | 1-shot: B / N / A / HM
Fully Sup.   | 65.45 / 37.24 / 48.53 / 47.47 | 65.45 / 37.24 / 48.53 / 47.47
PIFS         | 39.98 /  5.74 / 19.44 / 10.03 | 36.66 /  4.95 / 17.63 /  8.71
attMPTI      | 55.89 /  4.19 / 24.87 /  7.78 | 53.16 /  3.55 / 23.40 /  6.66
COSeg        | 59.34 /  6.96 / 27.91 / 12.45 | 58.49 /  6.24 / 27.14 / 11.26
GW           | 51.35 / 11.03 / 27.16 / 18.15 | 46.71 /  6.63 / 22.66 / 11.59
GFS-VL       | 60.49 / 21.40 / 37.04 / 31.61 | 60.02 / 17.90 / 34.75 / 27.56
HOP3D (ours) | 62.40 / 23.70 / 39.18 / 34.34 | 61.72 / 19.23 / 36.23 / 29.32

Key Findings

  • Consistent novel gains: HOP3D improves mIoU-N and HM over the strongest baseline (GFS-VL) on both datasets under 1-shot and 5-shot.
  • Base retention: HOP3D maintains strong base-class performance while improving novel recognition.
  • Entropy regularization helps: HOP-Ent improves confidence and class balance during Phase-2 adaptation.

Ablation & Additional Analysis

Prototype cosine-similarity matrices of ℓ2-normalized prototypes (Phase 1/2, with/without HOP-Net).
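A HOP-Rep-style training signal consistent with these similarity matrices would penalize cosine similarity between base and novel prototypes after ℓ2-normalization. The exact loss in the paper is not reproduced here; this is a hypothetical squared-cosine penalty over the base–novel block:

```python
def cross_orthogonality_penalty(base_protos, novel_protos):
    """Hypothetical HOP-Rep-style penalty: sum of squared cosine
    similarities between every (base, novel) prototype pair.
    Prototypes are lists of floats; the penalty is 0 exactly when the
    two groups are mutually orthogonal."""
    def normalize(v):
        n = sum(x * x for x in v) ** 0.5
        return [x / n for x in v] if n > 0 else v
    base = [normalize(p) for p in base_protos]
    novel = [normalize(p) for p in novel_protos]
    return sum(sum(x * y for x, y in zip(pb, pn)) ** 2
               for pb in base for pn in novel)
```

Driving this penalty toward zero pushes the off-diagonal base–novel block of the cosine-similarity matrix toward zero, the pattern the visualization compares with and without HOP-Net.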
HOP-Ent analysis: confidence distribution and class-frequency distribution.
HOP-Net ablation: impact of the orthogonality weight λ_orth and the adaptation ratio (AR).
Qualitative ablation: HOP3D corrects typical base–novel confusions compared with variants without HOP-Net/HOP-Ent.

BibTeX

@inproceedings{HOP3D2026,
  title={Learning Hierarchical Orthogonal Prototypes for Generalized Few-Shot 3D Point Cloud Segmentation},
  author={Zhao, Yifei and Zhao, Fanyu and Zhang, Zhongyuan and Wu, Shengtang and Lin, Yixuan and Li, Yinsheng},
  booktitle={IEEE International Conference on Multimedia \& Expo (ICME)},
  year={2026},
  url={https://arxiv.org/abs/2603.19788}
}