HaSPeR: An Image Repository for Hand Shadow Puppet Recognition
Under review at IEEE Transactions on Artificial Intelligence (IEEE TAI), 2024
Citation (IEEE format): S. R. Raiyan, Z. Z. Amio, and S. Ahmed, “HaSPeR: An Image Repository for Hand Shadow Puppet Recognition,” arXiv preprint arXiv:2408.10360, 2024.
BibTeX:
@misc{raiyan2024hasper,
title={HaSPeR: An Image Repository for Hand Shadow Puppet Recognition},
author={Syed Rifat Raiyan and Zibran Zarif Amio and Sabbir Ahmed},
year={2024},
eprint={2408.10360},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Authors: Syed Rifat Raiyan, Zibran Zarif Amio, Sabbir Ahmed.
Abstract: Hand shadow puppetry, also known as shadowgraphy or ombromanie, is a form of theatrical art and storytelling in which hand shadows are projected onto flat surfaces to create illusions of living creatures. Skilled performers create these silhouettes through hand positioning, finger movements, and dexterous gestures that resemble the shadows of animals and objects. Due to a lack of practitioners and a seismic shift in people’s entertainment standards, this art form is on the verge of extinction. To facilitate its preservation and disseminate it to a wider audience, we introduce HaSPeR, a novel dataset consisting of 15,000 images of hand shadow puppets across 15 classes, extracted from clips of both professional and amateur hand shadow puppeteers. We provide a detailed statistical analysis of the dataset and employ a range of pretrained image classification models to establish baselines. Our findings show a substantial performance superiority of skip-connected convolutional models over attention-based transformer architectures. We also find that lightweight models suited for mobile applications and embedded devices, such as MobileNetV2, perform comparatively well. We surmise that such low-latency architectures can be useful in developing ombromanie teaching tools, and we create a prototype application to explore this conjecture. Focusing on the best-performing model, ResNet34, we conduct comprehensive feature-spatial, explainability, and error analyses to gain insights into its decision-making process. To the best of our knowledge, this is the first documented dataset and research endeavor to preserve this dying art for future generations using computer vision approaches. Our code and data will be publicly available.
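As a concrete illustration of the transfer-learning baselines described in the abstract, below is a minimal sketch of fine-tuning an ImageNet-pretrained ResNet34 for the 15 HaSPeR classes. It assumes PyTorch/torchvision; the optimizer, learning rate, and train_step helper are illustrative choices, not the authors' released training code or reported hyperparameters.

# Minimal fine-tuning sketch (hypothetical, not the paper's official code):
# adapt a torchvision ResNet34 pretrained on ImageNet to HaSPeR's 15 classes.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 15  # hand shadow puppet classes in HaSPeR

# Load ImageNet-pretrained weights and replace the final classification head.
model = models.resnet34(weights=models.ResNet34_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # illustrative values

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """Run one optimization step on a mini-batch of puppet images."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

The same pattern applies to the lighter MobileNetV2 baseline mentioned above (models.mobilenet_v2, replacing its classifier head), which is what would make an on-device ombromanie teaching tool practical.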