File(s) under permanent embargo
Learning temporal information from spatial information using CapsNets for human action recognition
Conference contribution
posted on 2019-01-01, 00:00, authored by A. M. Algamdi, V. Sanchez, Chang-Tsun Li

Capsule Networks (CapsNets) were recently introduced to overcome some of the shortcomings of traditional Convolutional Neural Networks (CNNs). CapsNets replace the neurons of CNNs with vectors in order to retain spatial relationships among features. In this paper, we propose a CapsNet architecture that employs individual video frames for human action recognition without explicitly extracting motion information. We also propose weight pooling, which reduces computational complexity and improves classification accuracy by appropriately removing some of the extracted features. We show how the capsules of the proposed architecture can encode temporal information by using the spatial features extracted from several video frames. Compared with a traditional CNN of the same complexity, the proposed CapsNet improves action recognition performance by 12.11% and 22.29% on the KTH and UCF-sports datasets, respectively.
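For readers unfamiliar with the capsule representation the abstract relies on, the sketch below illustrates the standard CapsNet building block (Sabour et al., 2017): each capsule is a vector whose length encodes detection probability and whose orientation encodes pose, normalized by the "squash" nonlinearity. This is a minimal illustrative example only; the paper's weight pooling and temporal-encoding details are not described in this record, and the capsule count and dimensions below are arbitrary assumptions.

```python
# Minimal sketch of the capsule representation used by CapsNets.
# Capsule lengths act as class-presence probabilities; orientations
# encode feature pose. Shapes here are illustrative, not the paper's.
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Squash nonlinearity: v = (|s|^2 / (1 + |s|^2)) * s / |s|.
    Shrinks each capsule vector so its norm lies in [0, 1) while
    preserving its direction."""
    norm_sq = np.sum(s ** 2, axis=axis, keepdims=True)
    norm = np.sqrt(norm_sq + eps)
    return (norm_sq / (1.0 + norm_sq)) * (s / norm)

# Toy example: 10 output capsules of dimension 8, e.g. one per action class.
caps = np.random.randn(10, 8)
v = squash(caps)
lengths = np.linalg.norm(v, axis=-1)  # each length is in [0, 1)
print(lengths)
```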
History
Event: IEEE Signal Processing Society. Conference (44th : 2019 : Brighton, Eng.)
Series: IEEE Signal Processing Society Conference
Pagination: 3867-3871
Publisher: Institute of Electrical and Electronics Engineers
Location: Brighton, Eng.
Place of publication: Piscataway, N.J.
Publisher DOI:
Start date: 2019-05-12
End date: 2019-05-17
ISSN: 1520-6149
ISBN-13: 9781479981311
Language: eng
Publication classification: E1 Full written paper - refereed
Copyright notice: 2019, IEEE
Editor/Contributor(s): [Unknown]
Title of proceedings: ICASSP 2019 : Proceedings of the 2019 44th IEEE International Conference on Acoustics, Speech and Signal Processing