
Factor Group-Sparse Regularization for Efficient Low-Rank Matrix Recovery

Jicong Fan, Lijun Ding, Yudong Chen, Madeleine Udell
Cornell University, Ithaca, NY 14850

Abstract

This paper develops a new class of nonconvex regularizers for low-rank matrix recovery. Many regularizers are motivated as convex relaxations of the matrix rank function. Our new factor group-sparse regularizers are motivated as a relaxation of the number of nonzero columns in a factorization of the matrix. These nonconvex regularizers are sharper than the nuclear norm; indeed, we show they are related to Schatten-$p$ norms with arbitrarily small $0 < p \le 1$. Moreover, these factor group-sparse regularizers can be written in a factored form that enables efficient and effective nonconvex optimization; notably, the method does not use the singular value decomposition. We provide generalization error bounds for low-rank matrix completion which show improved upper bounds for Schatten-$p$ norm regularization as $p$ decreases. Compared to the max norm and the factored formulation of the nuclear norm, factor group-sparse regularizers are more efficient, accurate, and robust to the initial guess of rank. Experiments show promising performance of factor group-sparse regularization for low-rank matrix completion and robust principal component analysis.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

1 Introduction

Low-rank matrices appear throughout the sciences and engineering, in fields as diverse as computer science, biology, and economics [1]. One canonical low-rank matrix recovery problem is low-rank matrix completion (LRMC) [2,3,4,5,6,7,8,9,10], which aims to recover a low-rank matrix from a few of its entries. LRMC has been used to impute missing data, make recommendations, discover latent structure, perform image inpainting, and perform classification [11,12,1]. Another important low-rank recovery problem is robust principal component analysis (RPCA) [13,14,15,16,17], which aims to recover a low-rank matrix from sparse but arbitrary corruptions. RPCA is often used for denoising and image/video processing [18].

LRMC. Take LRMC as an example. Suppose $M \in \mathbb{R}^{m \times n}$ is a low-rank matrix with $\mathrm{rank}(M) = r \ll \min(m,n)$. We wish to recover $M$ from a few observed entries. Let $\Omega \subseteq [m] \times [n]$ index the observed entries, and suppose $\mathrm{card}(\Omega)$, the number of observations, is sufficiently large. A natural idea is to recover the missing entries by solving

$$\min_{X} \ \mathrm{rank}(X), \quad \text{subject to} \quad P_\Omega(X) = P_\Omega(M), \tag{1}$$

where the operator $P_\Omega: \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n}$ acts on any $X \in \mathbb{R}^{m \times n}$ as $(P_\Omega(X))_{ij} = X_{ij}$ if $(i,j) \in \Omega$ and $0$ if $(i,j) \notin \Omega$.
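For concreteness, $P_\Omega$ is just an elementwise mask. A minimal NumPy sketch (ours, not from the paper), where the boolean array `mask` encodes $\Omega$:

```python
import numpy as np

def P_Omega(X, mask):
    """Project X onto the observed entries: keep X[i, j] where mask[i, j] is True, zero elsewhere."""
    return np.where(mask, X, 0.0)

# Example: a rank-1 matrix observed on roughly 50% of its entries.
rng = np.random.default_rng(0)
M = np.outer(rng.standard_normal(5), rng.standard_normal(4))  # rank-1, 5 x 4
mask = rng.random(M.shape) < 0.5                              # Omega as a boolean mask
M_obs = P_Omega(M, mask)                                      # the data available to a solver
```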

However, since the direct rank minimization problem (1) is NP-hard, a standard approach is to replace the rank with a tractable surrogate $R(X)$ and solve

$$\min_{X} \ R(X), \quad \text{subject to} \quad P_\Omega(X) = P_\Omega(M). \tag{2}$$

Below we review typical choices of $R(X)$ to provide context for our factor group-sparse regularizers.

Nuclear and Schatten-p norms. One popular convex surrogate for the rank function is the nuclear norm (also called the trace norm), defined as the sum of singular values:

$$\|X\|_* := \sum_{i=1}^{\min(m,n)} \sigma_i(X), \tag{3}$$

where $\sigma_i(X)$ denotes the $i$-th largest singular value of $X \in \mathbb{R}^{m \times n}$. Variants of the nuclear norm, including the truncated nuclear norm [19] and the weighted nuclear norm [20], sometimes perform better empirically on imputation tasks. The Schatten-$p$ norms¹ with $0 < p \le 1$ [21,22,23] form another important class of rank surrogates:

$$\|X\|_{S_p} := \Big( \sum_{i=1}^{\min(m,n)} \sigma_i^p(X) \Big)^{1/p}. \tag{4}$$

¹Note that formally $\|\cdot\|_{S_p}$ with $0 < p < 1$ is a quasi-norm, not a norm; abusing terminology, we still use the term "norm" in this paper.
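Both surrogates are easy to evaluate from an SVD. A small illustrative NumPy sketch (ours):

```python
import numpy as np

def schatten_p(X, p):
    """Schatten-p (quasi-)norm: (sum_i sigma_i^p)^(1/p); p = 1 recovers the nuclear norm."""
    sigma = np.linalg.svd(X, compute_uv=False)
    sigma = sigma[sigma > 1e-10 * sigma[0]]  # drop numerically zero singular values
    return np.sum(sigma ** p) ** (1.0 / p)

X = np.outer([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])  # rank-1 example
print(schatten_p(X, 1.0))            # nuclear norm ||X||_*
print(schatten_p(X, 0.5) ** 0.5)     # ||X||_{S_1/2}^{1/2}
print(schatten_p(X, 0.01) ** 0.01)   # sum_i sigma_i^p -> rank(X) = 1 as p -> 0
```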

For $p = 1$, we have $\|X\|^1_{S_1} = \|X\|_*$, the nuclear norm. For $0 < p < 1$, $\|X\|^p_{S_p}$ is a nonconvex surrogate for $\mathrm{rank}(X)$. In the extreme case $p = 0$, $\|X\|^0_{S_0} = \mathrm{rank}(X)$, which is exactly what we wish to minimize. Thus $\|X\|^p_{S_p}$ with $0 < p \le 1$ interpolates between the rank function and the nuclear norm. Instead of (1), we hope to solve

$$\min_{X} \ \|X\|^p_{S_p}, \quad \text{subject to} \quad P_\Omega(X) = P_\Omega(M), \tag{5}$$

with $0 < p \le 1$. Smaller values of $p$ ($0 < p < 1$) are better approximations of the rank function and may lead to better recovery performance for LRMC and RPCA. However, for $0 < p < 1$ the problem (5) is nonconvex, and it is not generally possible to guarantee that we find a globally optimal solution. Even worse, common algorithms for minimizing the nuclear norm and the Schatten-$p$ norm cannot scale to large matrices, because they compute a singular value decomposition (SVD) in every iteration of the optimization [2, 3, 24].

Factor regularizations. A few SVD-free methods have been developed to recover large low-rank matrices. For example, the work in [25, 26] uses the well-known fact that

$$\|X\|_* = \min_{AB = X} \|A\|_F \|B\|_F = \min_{AB = X} \tfrac{1}{2} \big( \|A\|_F^2 + \|B\|_F^2 \big), \tag{6}$$

where $A \in \mathbb{R}^{m \times d}$, $B \in \mathbb{R}^{d \times n}$, and $d \ge \mathrm{rank}(X)$. For LRMC they suggest solving

$$\min_{A,B} \ \tfrac{1}{2} \|P_\Omega(M - AB)\|_F^2 + \tfrac{\lambda}{2} \big( \|A\|_F^2 + \|B\|_F^2 \big). \tag{7}$$

In this paper, we use the name factored nuclear norm (F-nuclear norm for short) for the variational characterization of the nuclear norm as $\min_{AB=X} \tfrac{1}{2}(\|A\|_F^2 + \|B\|_F^2)$ in (6). This expression matches the nuclear norm when $d$ is chosen large enough. Srebro and Salakhutdinov [27] proposed a weighted F-nuclear norm; the corresponding formulation of matrix completion is similar to (7). Note that to solve (7) we must first choose the value of $d$. We require $d \ge r := \mathrm{rank}(M)$ to be able to recover (or even represent) $M$. Any $d \ge r$ gives the same solution $AB$ to (7). However, as $d$ increases from $r$, the difficulty of optimizing the objective increases. Indeed, we observe in our experiments that the recovery error is larger for large $d$ using standard algorithms, particularly when the proportion of observed entries is low. In practice, it is difficult to guess $r$, and generally a very large $d$ is required. The methods of [28] and [29] estimate $r$ dynamically.
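To make the factored approach concrete, here is a minimal gradient-descent sketch for (7) (ours, not the algorithm of [25, 26]; the step size, λ, and iteration count are arbitrary illustrative choices):

```python
import numpy as np

def factored_completion(M_obs, mask, d, lam=1e-2, lr=1e-3, iters=2000, seed=0):
    """Minimize 1/2 ||P_Omega(M - AB)||_F^2 + lam/2 (||A||_F^2 + ||B||_F^2) by gradient descent."""
    rng = np.random.default_rng(seed)
    m, n = M_obs.shape
    A = 0.1 * rng.standard_normal((m, d))
    B = 0.1 * rng.standard_normal((d, n))
    for _ in range(iters):
        R = np.where(mask, A @ B - M_obs, 0.0)  # residual on observed entries only
        A_grad = R @ B.T + lam * A              # gradient of the objective in A
        B_grad = A.T @ R + lam * B              # gradient of the objective in B
        A -= lr * A_grad
        B -= lr * B_grad
    return A @ B
```

The choice of $d$ enters as the inner dimension of the factors; a larger $d$ makes this landscape harder to optimize, which is the sensitivity to the initial rank discussed above.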

Another SVD-free surrogate for the rank is the max norm, proposed by Srebro and Shraibman [30]:

$$\|X\|_{\max} = \min_{AB = X} \big( \max_i \|a_i\| \big) \big( \max_j \|b_j\| \big), \tag{8}$$

where $a_i$ and $b_j$ denote the $i$-th row of $A$ and the $j$-th row of $B^T$, respectively. Lee et al. [31] proposed several efficient algorithms to solve optimization problems with the max norm. Foygel and Srebro [5] provided recovery guarantees for LRMC using the max norm as a regularizer. Another, very different approach uses implicit regularization. Gunasekar et al. [32] show that for a full-dimensional factorization without any regularization, gradient descent with a small enough step size, initialized close enough to the origin, converges to the minimum nuclear norm solution. However, convergence slows as the initial point and step size converge to zero, making this method impractical.

Shang et al. [33] provided the following characterization of the Schatten-1/2 norm:

$$\|X\|_{S_{1/2}} = \min_{AB = X} \|A\|_* \|B\|_* = \min_{AB = X} \Big( \frac{\|A\|_* + \|B\|_*}{2} \Big)^2. \tag{9}$$

Hence instead of directly minimizing $\|X\|^{1/2}_{S_{1/2}}$, one can minimize $\|A\|_* + \|B\|_*$, which is much easier when $r \le d \ll \min(m,n)$. But again, this method and its extension $\|A\|_* + \tfrac{1}{2}\|B\|_F^2$ proposed in [34] require $d \ge r$, and the computational cost increases with larger $d$. Figure 1(d) shows these approaches are nearly as expensive as directly minimizing $\|X\|^p_{S_p}$ when $d$ is large. We call the regularizers $\min_{AB=X}(\|A\|_* + \|B\|_*)$ and $\min_{AB=X}(\|A\|_* + \tfrac{1}{2}\|B\|_F^2)$ the Bi-nuclear norm and the F2+nuclear norm, respectively.
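Identity (9) can be verified numerically by factoring through the square root of the singular values, which attains the minimum. A quick check we added (not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((15, 4)) @ rng.standard_normal((4, 12))  # rank-4 matrix

U, s, Vt = np.linalg.svd(X, full_matrices=False)
A = U * np.sqrt(s)              # A = U S^{1/2}
B = np.sqrt(s)[:, None] * Vt    # B = S^{1/2} V^T, so A @ B = X
nuc = lambda Z: np.linalg.svd(Z, compute_uv=False).sum()  # nuclear norm

# Both quantities equal ||X||_{S_1/2} = (sum_i sqrt(sigma_i))^2.
print(nuc(A) * nuc(B), np.sum(np.sqrt(s)) ** 2, np.allclose(A @ B, X))
```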

Our methods and contributions. In this paper, we propose a new class of factor group-sparse regularizers (FGSR) as a surrogate for the rank of $X$. To derive our regularizers, we introduce the factorization $AB = X$ and seek to minimize the number of nonzero columns of $A$ or $B^T$. Each factor group-sparse regularizer is formed by taking a convex relaxation of the number of nonzero columns. These regularizers are convex functions of the factors $A$ and $B$, but capture the nonconvex Schatten-$p$ (quasi-)norms of $X$ through the nonconvex factorization constraint $X = AB$.

We show that these regularizers match arbitrarily sharp Schatten-$p$ norms with $0 < p \le 1$; Theorem 2 below makes this precise.

Theorem 2. Let $\lambda > 0$, and choose $q \in \{1, \tfrac{1}{2}, \tfrac{1}{4}, \dots\}$. For any matrix $X \in \mathbb{R}^{m \times n}$ with $\mathrm{rank}(X) = r \le d \le \min(m,n)$, we have

$$\min_{X = \sum_{j=1}^{d} a_j b_j^T} \ \sum_{j=1}^{d} \Big( \frac{1}{q} \|a_j\|^q + \lambda \|b_j\| \Big) = (1 + 1/q)\, \lambda^{q/(q+1)} \sum_{j=1}^{r} \sigma_j^{q/(q+1)}(X), \tag{15}$$

$$\min_{X = \sum_{j=1}^{d} a_j b_j^T} \ \sum_{j=1}^{d} \Big( \frac{1}{q} \|a_j\|^q + \frac{\lambda}{2} \|b_j\|^2 \Big) = (1/2 + 1/q)\, \lambda^{q/(q+2)} \sum_{j=1}^{r} \sigma_j^{2q/(2+q)}(X). \tag{16}$$

By choosing an appropriate $q$, these representations give arbitrarily tight approximations to the rank, since $\|X\|^p_{S_p} \to \mathrm{rank}(X)$ as $p \to 0$. For example, use (16) in Theorem 2 with $q = \tfrac{1}{4}$ to see

$$\min_{\sum_{j=1}^{d} a_j b_j^T = X} \ \sum_{j=1}^{d} \Big( 4 \|a_j\|^{1/4} + \frac{\lambda}{2} \|b_j\|^2 \Big) = 4.5\, \lambda^{1/9} \sum_{i=1}^{d} \sigma_i^{2/9}(X) = 4.5\, \lambda^{1/9} \|X\|^{2/9}_{S_{2/9}}. \tag{17}$$

Equality holds in equation (15) when $A = \lambda^{1/(q+1)} U_X S_X^{1/(q+1)}$ and $B = \lambda^{-1/(q+1)} S_X^{q/(q+1)} V_X^T$; in equation (16), when $A = \lambda^{1/(q+2)} U_X S_X^{2/(q+2)}$ and $B = \lambda^{-1/(q+2)} S_X^{q/(q+2)} V_X^T$ (where $X = U_X S_X V_X^T$ denotes the SVD of $X$).
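Theorem 2 is easy to sanity-check numerically: the closed-form factors above attain the stated minimum. A small check of (16) (ours, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r, q, lam = 30, 20, 5, 0.5, 0.7
X = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-r matrix

U, s, Vt = np.linalg.svd(X, full_matrices=False)
# Optimal factors for (16): A = lam^{1/(q+2)} U S^{2/(q+2)}, B = lam^{-1/(q+2)} S^{q/(q+2)} V^T.
A = lam ** (1 / (q + 2)) * U * s ** (2 / (q + 2))
B = lam ** (-1 / (q + 2)) * (s ** (q / (q + 2)))[:, None] * Vt

col = np.linalg.norm(A, axis=0)  # ||a_j||, column norms of A
row = np.linalg.norm(B, axis=1)  # ||b_j||, row norms of B
lhs = np.sum(col ** q / q + lam / 2 * row ** 2)
rhs = (0.5 + 1 / q) * lam ** (q / (q + 2)) * np.sum(s ** (2 * q / (2 + q)))
print(lhs, rhs, np.allclose(lhs, rhs), np.allclose(A @ B, X))
```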

4 Application to low-rank matrix completion

As an application, we model noiseless matrix completion using FGSR as a surrogate for the rank:

$$\min_{X} \ \mathrm{FGSR}(X), \quad \text{subject to} \quad P_\Omega(X) = P_\Omega(M). \tag{18}$$

Take FGSR$_{2/3}$ as an example. We rewrite (18) as

$$\min_{X, A, B} \ \|A\|_{2,1} + \frac{\lambda}{2} \|B\|_F^2, \quad \text{subject to} \quad X = AB, \ P_\Omega(X) = P_\Omega(M). \tag{19}$$

This problem is separable in the three blocks of unknowns $X$, $A$, and $B$. We propose to use the Alternating Direction Method of Multipliers (ADMM) [35,36,37] with linearization to solve this problem, as the ADMM subproblem for $A$ has no closed-form solution. Details are in the supplement.

Now consider an application to noisy matrix completion. Suppose we observe $P_\Omega(M_e)$ with $M_e = M + E$, where $E$ represents measurement noise. Model the problem using FGSR$_{2/3}$ as

$$\min_{A,B} \ \|A\|_{2,1} + \frac{\lambda}{2} \|B\|_F^2 + \frac{\gamma}{2} \|P_\Omega(M_e - AB)\|_F^2. \tag{20}$$

We can still solve the problem via linearized ADMM. However, proximal alternating linearized minimization (PALM) [38, 39] gives a more efficient method. Details are in the supplement.
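A key reason these factored problems are cheap is that the group-sparse penalty $\|A\|_{2,1}$ (the sum of column norms) has a closed-form proximal operator: column-wise soft-thresholding. A minimal sketch of this prox (our illustration; the full linearized-ADMM and PALM derivations are in the paper's supplement):

```python
import numpy as np

def prox_group_l21(A, t):
    """Prox of t * ||A||_{2,1}: shrink each column's norm by t, zeroing small columns."""
    norms = np.linalg.norm(A, axis=0, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return A * scale

# In a PALM-style iteration one takes a gradient step on the smooth loss, then
# applies this prox to A; columns whose norm falls below t are set exactly to
# zero, which is how the effective factor rank shrinks during optimization.
```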

Motivated by Theorem 2, we can also model noisy matrix completion with a sharper rank surrogate:

$$\min_{A,B} \ \frac{1}{2} \|P_\Omega(M_e - AB)\|_F^2 + \lambda \Big( \frac{1}{q} \|A\|^q_{2,q} + \frac{1}{2} \|B\|_F^2 \Big), \tag{21}$$

where $q \in \{1, \tfrac{1}{2}, \tfrac{1}{4}, \dots\}$ and $\|A\|_{2,q} := \big( \sum_{j=1}^{d} \|a_j\|^q \big)^{1/q}$. For $q > 0$, consider a solution $(\hat A, \hat B)$ to (21). Let $\|\hat A \hat B\|^p_{S_p} = R^p$, where $p = 2q/(2+q)$ as in Theorem 2. Then use Theorem 2 to see that the following problem has the same solution:

$$\min_{\|X\|^p_{S_p} \le R^p,\ \mathrm{rank}(X) \le d} \ \|P_\Omega(M_e - X)\|_F^2. \tag{22}$$

Therefore, we may solve (21) using the methods described above to find a solution to (22) efficiently.

5 Generalization error bounds

In this section, we provide generalization error bounds for the solution $\hat M$ of (22).

5.1 Bound with optimal solution

Without loss of generality, we may assume $\|M\|_\infty \le \alpha/\sqrt{mn}$ for some constant $\alpha$. Hence it is reasonable to assume that $\epsilon = \epsilon_0/\sqrt{mn}$ for some constant $\epsilon_0$. The following theorem provides a generalization error bound for the solution of (22).

Theorem 3. Suppose $\|M\|^p_{S_p} \le R^p$, $\hat M$ is the optimal solution of (22), and $|\Omega| \ge \frac{32}{3}\, n \log^2 n$. Denote $\nu := \max\{\|M\|_\infty, \|\hat M\|_\infty\}$. Then there exist numerical constants $c_1$ and $c_2$ such that the following inequality holds with probability at least $1 - 5n^{-2}$:

$$\frac{1}{mn} \|M - \hat M\|_F^2 \le \max\left\{ c_1 \nu^2\, \frac{n \log n}{|\Omega|},\ \big( (5.5\nu + 10) R \big)^p \Big( (43\epsilon_0 + c_2)^2\, \frac{n \log n}{|\Omega|} \Big)^{1 - p/2} \right\}. \tag{23}$$

When $|\Omega|$ is sufficiently large, the second term in the brace of (23) is the dominant term, which decreases as $p$ decreases. A more complicated but more informative bound can be found in the supplement (inequality (24)). In sum, Theorem 3 shows it is possible to reduce the matrix completion error by using a smaller $p$ in (22) or a smaller $q$ in (21).

5.2 Bound with arbitrary A and B

Since (21) and (22) are nonconvex problems, it is difficult to guarantee that an optimization method has found a globally optimal solution. The following theorem provides a generalization bound for any feasible point $(\hat A, \hat B)$ of (21).

Theorem 4. Suppose $M_e = M + E$. For any $\hat A$ and $\hat B$, let $\hat M = \hat A \hat B$ and let $\hat d$ be the number of nonzero columns of $\hat A$. Define $\nu := \max\{\|M\|_\infty, \|\hat M\|_\infty\}$. Then there exists a numerical constant $C_0$ such that with probability at least $1 - 2\exp(-n)$, the following inequality holds:

$$\frac{\|M - \hat M\|_F}{\sqrt{mn}} \le \frac{\|P_\Omega(M_e - \hat M)\|_F}{\sqrt{|\Omega|}} + \frac{\|E\|_F}{\sqrt{mn}} + C_0 \nu \Big( \frac{n \hat d \log n}{|\Omega|} \Big)^{1/4}.$$

Theorem 4 indicates that if the training error $\|P_\Omega(M_e - \hat A \hat B)\|_F$ and the number $\hat d$ of nonzero columns of $\hat A$ are small, then the matrix completion error is small. In particular, if $E = 0$ and $P_\Omega(M_e - \hat A \hat B) = 0$, the matrix completion error is upper-bounded by $C_0 \nu (n \hat d \log n / |\Omega|)^{1/4}$. We hope that a smaller $q$ in (21) can lead to a smaller training error and a smaller $\hat d$, such that the upper bound on the matrix completion error is smaller. Indeed, in our experiments, we find that smaller $q$ often leads to smaller matrix completion error, but the improvement saturates quickly as $q$ decreases. We find $q = 1$ or $\tfrac{1}{2}$ (corresponding to a Schatten-$p$ norm with $p = \tfrac{2}{3}$ or $\tfrac{2}{5}$) is enough to provide high matrix completion accuracy and to outperform the max norm and nuclear norm.

6 Application to robust PCA

Suppose a fraction of the entries of a matrix are corrupted at random locations. Formally, we observe

$$M_e = M + E, \tag{24}$$

where $M$ is a low-rank matrix and $E$ is a sparse corruption matrix whose nonzero entries may be arbitrary. Robust principal component analysis (RPCA) asks to recover $M$ from $M_e$; a by-now classic approach uses nuclear norm minimization [13]. We propose to use FGSR instead, and solve

$$\min_{A,B,E} \ \frac{1}{q} \|A\|^q_{2,q} + \frac{\lambda}{2} \|B\|_F^2 + \gamma \|E\|_1, \quad \text{subject to} \quad M_e = AB + E, \tag{25}$$

where $q \in \{1, \tfrac{1}{2}, \tfrac{1}{4}, \dots\}$. An optimization algorithm is detailed in the supplement.
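The sparse term is what makes this splitting convenient: with the other blocks fixed, the update for $E$ reduces to elementwise soft-thresholding, the prox of the $\ell_1$ norm. A minimal sketch (ours; the actual algorithm, with its multiplier updates and parameter choices, is in the supplement):

```python
import numpy as np

def soft_threshold(X, t):
    """Elementwise prox of t * ||E||_1: shrink each entry toward zero by t."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

# With A and B fixed, the E-update of an ADMM-style splitting for (25)
# amounts (up to dual terms) to soft-thresholding the residual M_e - A @ B:
# small residuals are zeroed out, and large ones are kept as detected corruptions.
```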

7 Numerical results

7.1 Matrix completion

Baseline methods. We compare the FGSR regularizers with the nuclear norm, truncated nuclear norm [19], weighted nuclear norm [20], F-nuclear norm, max norm [31], Riemannian pursuit [29], Schatten-$p$ norm, Bi-nuclear norm [33], and F2+nuclear norm [34]. We choose the parameters of all methods to ensure they perform as well as possible. Details about the optimization, parameters, and evaluation metrics are in the supplement. All experiments report the average of ten trials.

Noiseless synthetic data. We generate random matrices of size $500 \times 500$ with rank $50$. More details about the experiment are in the supplement. In Figure 1(a), the factored methods all use factors of size $d = 1.5r$. We see that the Schatten-$p$ norm ($p = \tfrac{2}{3}, \tfrac{1}{2}, \tfrac{1}{4}$), Bi-nuclear norm, F2+nuclear norm, FGSR$_{2/3}$, and FGSR$_{1/2}$ have similar performance and outperform the other methods when the missing rate (the proportion of unobserved entries) is high.

In particular, the F-nuclear norm outperforms the nuclear norm because the bound $d$ on the rank is binding. In Figures 1(b) and (c), in which the missing rates are high, the max norm and F-nuclear norm are sensitive to the initial rank $d$, while the F2+nuclear norm, Bi-nuclear norm, FGSR$_{2/3}$, and FGSR$_{1/2}$ always have nearly zero recovery error. Interestingly, the max norm and F-nuclear norm are robust to the initial rank when the missing rate is much lower than 0.6 in this experiment. In Figure 1(d), we compare the computational time at a missing rate of 0.7; for a fair comparison, the optimization algorithms of all methods were stopped when the relative change of the recovered matrix was less than $10^{-5}$ or the number of iterations reached 1000. The computational costs of the nuclear norm, truncated nuclear norm, weighted nuclear norm, and Schatten-$\tfrac{1}{2}$ norm are especially large, as they require computing an SVD in every iteration. The computational costs of the max norm, F-nuclear norm, F2+nuclear norm, and Bi-nuclear norm increase quickly as the initial rank $d$ increases. In contrast, our FGSR$_{2/3}$ and FGSR$_{1/2}$ are very efficient even when the initial rank is large, because they are SVD-free and able to reduce the size of the factors during the course of optimization. While Riemannian pursuit is a bit faster than FGSR, FGSR has lower error. Note that the Riemannian pursuit code mixes C and MATLAB, while all the other methods are written in pure MATLAB, explaining (part of) its speed advantage.
