Encoding Feature Maps of CNNs for Action Recognition
Xiaojiang Peng, Cordelia Schmid
LEAR-Inria, Grenoble, France

Summary of LEAR Submission
  • Improved DT and Fisher vector
  • CNN features from very deep ConvNets
  • Key component: encoding CNN feature maps

Improved DT and Fisher vector
Overview: Input video, IDT (HOG/HOF/MBH), Fisher vector, Power+Intra+L2-norm
Details:
  • Set videos to be at most 320p wide
  • Preprocess IDT features by PCA-Whiten with a factor of 2
  • Perform power+intra+L2 normalization for FVs
Intra-normalization:
  • Good way to suppress bursty visual elements
  • Perform L2 normalization for each FV block independently (mean and variance components are separated) [1, 2]

[1] Arandjelovic, R., Zisserman, A.: All about VLAD. CVPR, 2013.
[2] X. Peng, Y. Qiao: Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice. arXiv:1405.4506, 2014.

CNN features (1): Very deep ConvNets (VGG19) fine-tuning
  • Training data: all frames on UCF101 and every 5 frames on Thumos 2015 validation set (256x256x3)
  • Data augmentation: cropping and flipping at the four corners and the center; inputs are 224x224x3
  • Batch size: 16 (our GPU memory limitation)
  • Dropout: 0.5 for fc6 and fc7
  • Max iterations: 20k

CNN features (2): Feature extraction
  • Rescale frames to 224x224x3 and feed-forward them
  • Keep conv5_4 maps and fc6 activations

CNN features (3): Conv5_4 feature maps are local features
[Figure: a video frame f_i, its Conv5 feature maps (w x h x 512), and the resulting local features for f_i]
  • Each pixel (pink square in the middle image) in the Conv5 feature map is actually a feature for the corresponding patch in the original frame.
  • We obtain w*h 512-D features for frame f_i.

CNN features (4): Encoding feature maps
  • Preprocess Conv5_4 local features by PCA-Whiten with a factor of 2
  • Construct a codebook of size 256 using k-means
  • Apply VLAD encoding and power+intra+L2 normalization for video representations
[Figure: pipeline with the boxes Input video, Video frames, IDT features, Fisher vector, Power + Intra + L2 norm, VGG19 Conv5, VLAD, Fusion & SVM]

Experiments (1): Train/test setup
  • Tr1: train on UCF101, test on the validation set and report the mAP
  • Tr2: 10 train/test folds on the validation set; report the mean and the standard deviation of mAP
Baseline: IDT+FV

Method                  Tr1      Tr2
IDT (HOG/HOF/MBH)+FV    52.23%   69.19±0.8%

Experiments (2): Evaluation of CNN features and pooling methods
Conclusions:
  • Conv5_4 is better than fc6 without fine-tuning
  • The original CNN model is trained to abstract concepts for object classification rather than action recognition
  • Encoding Conv5_4 feature maps significantly outperforms the other pooling methods

Table 1. Evaluation of Conv5_4 and fc6 without fine-tuning

          Tr1                                  Tr2
          Avg-pooling   Max-pooling   VLAD     VLAD
Conv5_4   46.02%        34.3%         56.95%   68.7±1.1%
fc6       39.38%        28.38%        -        -

Experiments (3): Evaluation of CNN features and pooling methods
Conclusions:
  • Fine-tuning does improve performance
  • Conv5+VLAD gives a large improvement when using "Tr1", but only a small one when using "Tr2"
  • The difference between "Tr1" and "Tr2" suggests that the appearance of UCF101 and of the validation set is very different

Table 2. Evaluation of Conv5_4 and fc6 with fine-tuning

          Tr1                     Tr2
          Avg-pooling   VLAD      Avg-pooling   VLAD
Conv5_4   57.47%        63.87%    69.32±1.1%    74.36±1.3%
fc6       47.07%        -         72.32±1.1%    -

Experiments (4): Feature combinations

Index   Method                               Tr1      Tr2
1       IDT (HOG+HOF+MBH)                    52.23%   69.19±0.8%
2       Conv5-VLAD                           63.87%   74.36±1.3%
3       Conv5-avg                            57.47%   69.32±1.1%
4       fc6-avg                              47.07%   72.32±1.1%
5       IDT+Conv5-VLAD                       65.11%   76.21±1.0%
6       IDT+Conv5-avg                        62.95%   75.38±0.8%
7       IDT+fc6-avg                          58.59%   76.1±1.0%
8       IDT+Conv5-VLAD+Conv5-avg             66.17%   77.69±0.9%
9       IDT+Conv5-VLAD+fc6-avg               64.84%   79.36±1.0%
10      IDT+Conv5-VLAD+Conv5-avg+fc6-avg     66.64%   79.52±1.1%
11      IDT+Conv5-VLAD+Conv5-avg+softmax     -        87.45±0.8%

Experiments (4): Conclusions
  • CNN features complement the IDT features
  • All independent CNN-based methods outperform IDT+FV when using "Tr2"
  • Conv5-avg and fc6-avg complement the IDT and Conv5-VLAD features; see rows 5 vs. 8 vs. 10 in the table on the previous slide
  • Conv5-avg and fc6-avg capture global information while Conv5-VLAD does not

Test results

Run    Method (+ IDT-FV)                          Tr1      Tr2          Post-pro   Test mAP
Run1   Conv5-VLAD+Conv5-avg (6FPS)                66.17%   77.69±0.9%   -          68.13%
Run2   Conv5-VLAD+Conv5-avg (6FPS)                66.17%   77.69±0.9%   +          68.11%
Run3   Conv5-VLAD+Conv5-avg+softmax (1FPS)        -        87.45±0.8%   +          53.95%
Run4   Conv5-VLAD+Conv5-avg (1FPS)                -        -            +          67.38%
Run5   Conv5-VLAD+Conv5-avg+fc6-avg (6FPS)        66.64%   79.52±1.1%   +          67.93%

Conclusions
  • Combining fc6 features doesn't improve test results
  • Combining softmax scores leads to overfitting
  • The train/test setup is important, since different setups can lead to different observations

Code available: http://lear.inrialpes.fr/software
  • Improved dense trajectories
  • Fisher vector encoding
  • VGG19 fine-tuning model (coming soon)

Statistics on validation set (1)
The three easiest classes: UnevenBars, RockClimbingIndoor, Skiing

Statistics on validation set (2)
The three easiest classes: CricketShot, Haircut, Basketball

Easiest classes
Snapshots from
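The power+intra+L2 normalization named on the "Improved DT and Fisher vector" slide can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `power_intra_l2`, the exponent `alpha=0.5`, and the assumption that the blocks are contiguous and equally sized are all ours; the slides only name the three steps.

```python
import numpy as np

def power_intra_l2(vec, n_blocks, alpha=0.5, eps=1e-12):
    """Power-normalize, L2-normalize each block, then L2-normalize the whole vector.

    Assumptions (not from the slides): alpha = 0.5 and contiguous,
    equally sized blocks, e.g. one block per codeword or per FV component.
    """
    v = np.asarray(vec, dtype=np.float64)
    # Power normalization: signed power, sign(x) * |x|**alpha.
    v = np.sign(v) * np.abs(v) ** alpha
    # Intra-normalization: every block is scaled to unit L2 norm independently,
    # so a single "bursty" block cannot dominate the representation.
    blocks = v.reshape(n_blocks, -1)
    blocks = blocks / (np.linalg.norm(blocks, axis=1, keepdims=True) + eps)
    # Global L2 normalization of the concatenated blocks.
    out = blocks.reshape(-1)
    return out / (np.linalg.norm(out) + eps)

if __name__ == "__main__":
    toy = [4.0, 0.0, 0.0, 9.0, -16.0, 0.0]   # three blocks of two values
    print(power_intra_l2(toy, n_blocks=3))
```

After intra-normalization every non-empty block contributes the same energy, which is the burstiness suppression the slide refers to.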
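To make the "CNN features (3)" slide concrete, the sketch below turns one Conv5 feature map into w*h local descriptors and applies the average and max pooling compared in Table 1. It assumes a channel-first `(C, h, w)` array layout and pools over all positions of all frames; both are our assumptions, and `maps_to_local_features` and `pool_video` are hypothetical names. For a 224x224 input, VGG19's conv5_4 output is 14x14x512, so each frame yields 196 descriptors of dimension 512.

```python
import numpy as np

def maps_to_local_features(fmap):
    """Turn a (C, h, w) feature map into h*w local descriptors of dimension C."""
    c, h, w = fmap.shape
    # Row i*w + j of the result is the C-dimensional column at position (i, j).
    return fmap.reshape(c, h * w).T

def pool_video(frame_maps, mode="avg"):
    """Pool the local descriptors of all frames into one C-dimensional vector."""
    feats = np.concatenate([maps_to_local_features(m) for m in frame_maps], axis=0)
    if mode == "avg":
        return feats.mean(axis=0)
    if mode == "max":
        return feats.max(axis=0)
    raise ValueError(f"unknown pooling mode: {mode!r}")

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = [rng.random((512, 14, 14)) for _ in range(3)]   # toy stand-in for conv5_4 maps
    print(maps_to_local_features(frames[0]).shape)           # (196, 512)
    print(pool_video(frames, "avg").shape)                   # (512,)
```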
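The three steps on the "CNN features (4)" slide (PCA-whitening, a k-means codebook, VLAD encoding) could look roughly like the NumPy sketch below. It is an illustration under stated assumptions, not the released LEAR code: we read "a factor of 2" as halving the descriptor dimension, use a plain Lloyd's k-means, and run on small toy data instead of 512-D descriptors with 256 codewords. All function names are hypothetical.

```python
import numpy as np

def fit_pca_whiten(X, out_dim, eps=1e-8):
    """Return (mean, projection) such that (x - mean) @ projection is whitened."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)          # eigenvalues in ascending order
    top = np.argsort(eigval)[::-1][:out_dim]      # indices of the largest out_dim
    proj = eigvec[:, top] / np.sqrt(eigval[top] + eps)
    return mean, proj

def nearest_center(X, centers):
    """Index of the closest center for every row of X (dense, fine for toy sizes)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, D) codebook."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        assign = nearest_center(X, centers)
        for j in range(k):
            members = X[assign == j]
            if len(members):                      # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def vlad(X, centers):
    """Sum the residuals to the nearest center; returns a flat (k * D,) vector."""
    assign = nearest_center(X, centers)
    out = np.zeros_like(centers)
    for j in range(len(centers)):
        members = X[assign == j]
        if len(members):
            out[j] = (members - centers[j]).sum(axis=0)
    return out.reshape(-1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 16))                # toy local descriptors
    mean, proj = fit_pca_whiten(X, out_dim=8)     # 16 -> 8, i.e. "factor of 2" as we read it
    Z = (X - mean) @ proj
    codebook = kmeans(Z, k=4)
    print(vlad(Z, codebook).shape)                # (32,)
```

The resulting vector has one D-dimensional block per codeword, which is the block structure that intra-normalization then operates on.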
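The "Tr2" protocol on the "Experiments (1)" slide reports the mean and standard deviation of mAP over 10 folds. The sketch below shows one common way to compute these numbers; the exact AP definition of the THUMOS evaluation and the choice between population and sample standard deviation are not stated in the slides, so both are assumptions here, as are the function names.

```python
import numpy as np

def average_precision(scores, labels):
    """AP for one class: mean of the precision at the rank of each positive."""
    order = np.argsort(-np.asarray(scores, dtype=np.float64), kind="stable")
    hits = np.asarray(labels)[order].astype(bool)
    if not hits.any():
        return 0.0
    precision = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(precision[hits].mean())

def mean_ap(score_matrix, label_matrix):
    """mAP: average of the per-class APs; columns are classes, rows are videos."""
    n_classes = score_matrix.shape[1]
    aps = [average_precision(score_matrix[:, c], label_matrix[:, c])
           for c in range(n_classes)]
    return float(np.mean(aps))

def summarize_folds(fold_maps):
    """Mean and (population) standard deviation of per-fold mAP values."""
    m = np.asarray(fold_maps, dtype=np.float64)
    return float(m.mean()), float(m.std())

if __name__ == "__main__":
    scores = np.array([[0.9, 0.1], [0.8, 0.7], [0.7, 0.2], [0.6, 0.9]])
    labels = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])
    print(mean_ap(scores, labels))
    print(summarize_folds([0.68, 0.70, 0.69]))    # made-up fold values, for illustration only
```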
