HATNet: Hierarchical Attention Transformer With RS-CLIP Patch Tokens for Remote Sensing Image Captioning

Remote sensing image captioning (RSIC) aims to generate natural language descriptions of critical visual content in overhead-view remote sensing images. However, existing methods often produce descriptions with significant omissions and misidentifications of scene types or object counts, particularl...

Bibliographic details
Main Authors: Zhongle Ren, Jianhua Meng, Cheng Zhang, Shulin Gao, Biao Hou, Weibin Li, Hao Zhu, Licheng Jiao
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Online access: https://ieeexplore.ieee.org/document/11214231/