
Multiway Clustering via Tensor Block Models

Miaoyan Wang (University of Wisconsin-Madison), Yuchen Zeng (University of Wisconsin-Madison)

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Abstract

We consider the problem of identifying multiway block structure from a large noisy tensor. Such problems arise frequently in applications such as genomics, recommendation systems, topic modeling, and sensor network localization. We propose a tensor block model, develop a unified least-square estimation procedure, and obtain theoretical accuracy guarantees for multiway clustering. The statistical convergence of the estimator is established, and we show that the associated clustering procedure achieves partition consistency. A sparse regularization is further developed for identifying important blocks with elevated means. The proposal handles a broad range of data types, including binary, continuous, and hybrid observations. Through simulation and application to two real datasets, we demonstrate that our approach outperforms previous methods.

1 Introduction

Higher-order tensors have recently attracted increased attention in data-intensive fields such as neuroscience [1], social networks [2], computer vision [3], and genomics [4, 5]. In many applications, the data tensors are expected to have underlying block structure. One example is multi-tissue expression data [4], in which genome-wide expression profiles are collected from different tissues in a number of individuals. There may be groups of genes similarly expressed in subsets of tissues and individuals; mathematically, this implies an underlying three-way block structure in the data tensor. In a different context, block structure may emerge in a binary-valued tensor. Examples include multilayer network data [2], with the nodes representing individuals and the layers representing multiple types of relations. Here a planted block represents a community of individuals that are highly connected within a class of relationships.

[Figure 1: Examples of the tensor block model (TBM). Each panel shows an input tensor and the corresponding output. (a) Our TBM method is used for multiway clustering and for revealing the underlying checkerbox structure in a noisy tensor. (b) The sparse TBM method is used for detecting sub-tensors with elevated means.]

This paper presents a new method and the associated theory for tensors with block structure. We develop a unified least-square estimation procedure for identifying multiway block structure. The proposal applies to a broad range of data types, including binary, continuous, and hybrid observations. We establish a high-probability error bound for the resulting estimator, and show that the procedure enjoys consistency guarantees on block structure recovery as the dimension of the data tensor grows. Furthermore, we develop a sparse extension of the tensor block model for block selection.

Figure 1 shows two immediate examples of our method. When the data tensor possesses a checkerbox pattern modulo some unknown reordering of entries, our method amounts to multiway clustering that simultaneously clusters each mode of the tensor (Figure 1a). When the data tensor has no full checkerbox structure but contains a small number of sub-tensors with elevated means, we develop a sparse version of our method to detect these sub-tensors of interest (Figure 1b).

Related work. Our work is closely related to, but also clearly distinct from, low-rank tensor decomposition. A number of methods have been developed for low-rank tensor estimation, including CANDECOMP/PARAFAC (CP) decomposition [6] and Tucker decomposition [7]. The CP model decomposes a tensor into a sum of rank-1 tensors, whereas the Tucker model decomposes a tensor into a core tensor multiplied by orthogonal matrices in each mode. In this paper we investigate an alternative block structure assumption, which has yet to be studied for higher-order tensors. Note that a block structure automatically implies low-rankness. However, as we will show in Section 4, a direct application of low-rank estimation to the current setting results in an inferior estimator. Therefore, a full exploitation of the block structure is necessary; this is the focus of the current paper.

Our work is also connected to biclustering [8] and its higher-order extensions [9, 10]. Existing multiway clustering methods [9, 10, 5, 11] typically take a two-step procedure, first estimating a low-dimensional representation of the data tensor and then applying clustering algorithms to the tensor factors. In contrast, our tensor block model takes a single shot to perform estimation and clustering simultaneously. This approach achieves higher accuracy and improved interpretability. Moreover, earlier solutions to multiway clustering [12, 9] focus on algorithm effectiveness, leaving the statistical optimality of the estimators unaddressed. Very recently, Chi et al. [13] provided an attempt to study the statistical properties of the tensor block model. We will show that our estimator obtains a faster convergence rate than theirs, and the power is further boosted with a sparse regularization.

2 Preliminaries

We begin by reviewing a few basic facts about tensors [14]. We use $\mathcal{Y} = \llbracket y_{i_1,\ldots,i_K} \rrbracket \in \mathbb{R}^{d_1\times\cdots\times d_K}$ to denote an order-$K$, $(d_1,\ldots,d_K)$-dimensional tensor. The multilinear multiplication of a tensor $\mathcal{Y} \in \mathbb{R}^{d_1\times\cdots\times d_K}$ by matrices $M_k = \llbracket m^{(k)}_{j_k,i_k} \rrbracket \in \mathbb{R}^{s_k\times d_k}$ is defined as

$$\mathcal{Y} \times_1 M_1 \times_2 \cdots \times_K M_K = \Big\llbracket \sum_{i_1,\ldots,i_K} y_{i_1,\ldots,i_K}\, m^{(1)}_{j_1,i_1}\cdots m^{(K)}_{j_K,i_K} \Big\rrbracket,$$

which results in an order-$K$, $(s_1,\ldots,s_K)$-dimensional tensor. For any two tensors $\mathcal{Y} = \llbracket y_{i_1,\ldots,i_K}\rrbracket$ and $\mathcal{Y}' = \llbracket y'_{i_1,\ldots,i_K}\rrbracket$ of identical order and dimensions, their inner product is defined as $\langle \mathcal{Y}, \mathcal{Y}'\rangle = \sum_{i_1,\ldots,i_K} y_{i_1,\ldots,i_K}\, y'_{i_1,\ldots,i_K}$. The Frobenius norm of a tensor $\mathcal{Y}$ is defined as $\|\mathcal{Y}\|_F = \langle \mathcal{Y},\mathcal{Y}\rangle^{1/2}$; it is the Euclidean norm of $\mathcal{Y}$ regarded as a $\prod_k d_k$-dimensional vector. The maximum norm of $\mathcal{Y}$ is defined as $\|\mathcal{Y}\|_{\max} = \max_{i_1,\ldots,i_K} |y_{i_1,\ldots,i_K}|$. An order-$(K-1)$ slice of $\mathcal{Y}$ is a sub-tensor of $\mathcal{Y}$ obtained by holding the index in one mode fixed while letting the other indices vary.

A clustering of $d$ objects is a partition of the index set $[d] := \{1,2,\ldots,d\}$ into $R$ disjoint non-empty subsets. We refer to the number of clusters, $R$, as the clustering size. Equivalently, the clustering (or partition) can be represented using a "membership matrix". A membership matrix $M \in \{0,1\}^{d\times R}$ is an incidence matrix whose $(i,r)$-entry equals 1 if and only if element $i$ belongs to cluster $r$, and 0 otherwise. Throughout the paper, we use the terms "clustering", "partition", and "membership matrix" interchangeably. For a higher-order tensor, the concept of index partition applies to each of the modes. A block is a sub-tensor induced by the index partitions along each of the $K$ modes. We use the term "cluster" to refer to the marginal partition on mode $k$, and reserve the term "block" for the multiway partition of the tensor.
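To fix ideas, here is a minimal NumPy sketch (ours, not part of the paper) of the multilinear product and of the membership-matrix encoding above; the function names are our own.

```python
import numpy as np

def multilinear_multiply(Y, mats):
    """Compute Y x_1 M_1 x_2 ... x_K M_K, where mats[k] has shape (s_k, d_k)."""
    for k, M in enumerate(mats):
        # Contract mode k of Y with the second axis of M, then move the new
        # axis (of length s_k) back to position k.
        Y = np.moveaxis(np.tensordot(Y, M, axes=(k, 1)), -1, k)
    return Y

def membership_matrix(labels, R):
    """d x R incidence matrix: entry (i, r) is 1 iff element i is in cluster r."""
    M = np.zeros((len(labels), R), dtype=int)
    M[np.arange(len(labels)), labels] = 1
    return M
```

Note that passing membership matrices of shape $(d_k, R_k)$ expands a core tensor of block means into the full $d_1\times\cdots\times d_K$ blockwise-constant mean tensor, which is exactly how the model in Section 3 below is built.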

3 Tensor block model

Let $\mathcal{Y} = \llbracket y_{i_1,\ldots,i_K}\rrbracket \in \mathbb{R}^{d_1\times\cdots\times d_K}$ denote an order-$K$, $(d_1,\ldots,d_K)$-dimensional data tensor. The main assumption of the tensor block model (TBM) is that the observed data tensor $\mathcal{Y}$ is a noisy realization of an underlying tensor that exhibits a checkerbox structure (see Figure 1a). Specifically, suppose that the $k$-th mode of the tensor consists of $R_k$ clusters. If the tensor entry $y_{i_1,\ldots,i_K}$ belongs to the block determined by the $r_k$-th cluster in mode $k$, for $r_k \in [R_k]$, then we assume that

$$y_{i_1,\ldots,i_K} = c_{r_1,\ldots,r_K} + \varepsilon_{i_1,\ldots,i_K}, \quad \text{for } (i_1,\ldots,i_K) \in [d_1]\times\cdots\times[d_K], \qquad (1)$$

where $c_{r_1,\ldots,r_K}$ is the mean of the tensor block indexed by $(r_1,\ldots,r_K)$, and the $\varepsilon_{i_1,\ldots,i_K}$'s are independent, mean-zero noise terms to be specified later. Our goal is to (i) find the clustering along each of the modes, and (ii) estimate the block means $c_{r_1,\ldots,r_K}$, such that a corresponding blockwise-constant checkerbox structure emerges in the data tensor.

The tensor block model (1) falls into a general class of non-overlapping, constant-mean clustering models [15], in that each tensor entry belongs to exactly one block with a common mean. The TBM can be equivalently expressed as a special tensor Tucker model,

$$\mathcal{Y} = \mathcal{C} \times_1 M_1 \times_2 \cdots \times_K M_K + \mathcal{E}, \qquad (2)$$

where $\mathcal{C} = \llbracket c_{r_1,\ldots,r_K}\rrbracket \in \mathbb{R}^{R_1\times\cdots\times R_K}$ is a core tensor consisting of block means, $M_k \in \{0,1\}^{d_k\times R_k}$ is a membership matrix indicating the block allocations along mode $k$ for $k \in [K]$, and $\mathcal{E} = \llbracket \varepsilon_{i_1,\ldots,i_K}\rrbracket$ is the noise tensor. We view the TBM (2) as a super-sparse Tucker model, in the sense that each row of $M_k$ contains a single 1 among massive 0's.

We make a general assumption on the noise tensor $\mathcal{E}$. The noise terms $\varepsilon_{i_1,\ldots,i_K}$ are assumed to be independent, mean-zero, $\sigma$-subgaussian, where $\sigma \ge 0$ is the subgaussianity parameter. More precisely,

$$\mathbb{E}\, e^{\lambda \varepsilon_{i_1,\ldots,i_K}} \le e^{\sigma^2\lambda^2/2}, \quad \text{for all } (i_1,\ldots,i_K) \in [d_1]\times\cdots\times[d_K] \text{ and all } \lambda \in \mathbb{R}. \qquad (3)$$

The assumption (3) incorporates common situations such as Gaussian noise, Bernoulli noise, and noise with bounded support. In particular, we consider two important examples of the TBM:

Example 1 (Gaussian tensor block model). Let $\mathcal{Y}$ be a continuous-valued tensor. The Gaussian tensor block model (GTBM), $y_{i_1,\ldots,i_K} \sim N(c_{r_1,\ldots,r_K}, \sigma^2)$ independently, is a special case of model (1), with the subgaussianity parameter $\sigma^2$ equal to the error variance. The GTBM serves as the foundation for many tensor clustering algorithms [12, 4, 13].

Example 2 (Stochastic tensor block model). Let $\mathcal{Y}$ be a binary-valued tensor. The stochastic tensor block model (STBM), $y_{i_1,\ldots,i_K} \sim \text{Bernoulli}(c_{r_1,\ldots,r_K})$ independently, is a special case of model (1), with the subgaussianity parameter $\sigma^2$ equal to 1/4. The STBM can be viewed as an extension, to higher-order tensors, of the popular stochastic block model [16, 17] for matrix-based network analysis. In the field of community detection, multi-layer stochastic models have also been developed for multi-relational network data analysis [18, 19].

More generally, our model also applies to hybrid error distributions, in which different types of distributions are allowed for different portions of the tensor. This scenario may happen, for example, when the data tensor $\mathcal{Y}$ represents concatenated measurements from multiple data sources.
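For concreteness, the short sketch below simulates from model (2). It is our illustration rather than the authors' code; in particular, the logistic squashing used to turn arbitrary Gaussian block means into Bernoulli probabilities in the STBM case is our own choice.

```python
import numpy as np

def simulate_tbm(dims, ranks, sigma=0.5, kind="gaussian", seed=0):
    """Simulate Y = C x_1 M_1 ... x_K M_K + E with random cluster labels."""
    rng = np.random.default_rng(seed)
    C = rng.normal(size=ranks)                     # core tensor of block means
    labels = [rng.integers(R, size=d) for d, R in zip(dims, ranks)]
    Theta = C[np.ix_(*labels)]                     # blockwise-constant mean tensor
    if kind == "gaussian":                         # Example 1: GTBM
        Y = Theta + rng.normal(scale=sigma, size=dims)
    else:                                          # Example 2: STBM (binary data);
        p = 1.0 / (1.0 + np.exp(-Theta))           # logistic squashing is our choice
        Y = rng.binomial(1, p)
    return Y, Theta, labels

# An order-3 example with a planted 3 x 4 x 2 checkerbox structure.
Y, Theta, labels = simulate_tbm((40, 50, 60), (3, 4, 2))
```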

Before we discuss the estimation, we present the identifiability of the TBM.

Assumption 1 (Irreducible core). The core tensor $\mathcal{C}$ is called irreducible if it cannot be written as a block tensor with the number of mode-$k$ clusters smaller than $R_k$, for any $k \in [K]$.

In the matrix case ($K = 2$), irreducibility is equivalent to saying that $\mathcal{C}$ has no two identical rows and no two identical columns. In the higher-order case, the assumption requires that no two order-$(K-1)$ slices of $\mathcal{C}$ along the same mode are identical. Note that irreducibility is a weaker assumption than full-rankness.

Proposition 1 (Identifiability). Consider a Gaussian or Bernoulli TBM (1). Under Assumption 1, the factor matrices $M_k$'s are identifiable up to permutations of cluster labels.

The identifiability property of the TBM is stronger than that of classical factor models [20, 21]. In the Tucker decomposition [22, 14] and many other factor analyses [20, 21], the factors are identifiable only up to orthogonal rotations. Those models recover only the (column) space spanned by $M_k$, but not the individual factors. In contrast, our model does not suffer from rotational invariance, and as we show in Section 4, every individual factor is consistently estimated in high dimensions. This benefits the interpretation of factors in the tensor block model.

We propose a least-square approach for estimating the TBM. Let $\Theta = \mathcal{C} \times_1 M_1 \times_2 \cdots \times_K M_K$ denote the mean signal tensor with block structure. The mean tensor is assumed to belong to the following parameter space:

$$\mathcal{P}_{R_1,\ldots,R_K} = \Big\{ \Theta \in \mathbb{R}^{d_1\times\cdots\times d_K} : \Theta = \mathcal{C}\times_1 M_1 \times_2\cdots\times_K M_K, \text{ with some membership matrices } M_k \text{ and a core tensor } \mathcal{C}\in\mathbb{R}^{R_1\times\cdots\times R_K}\Big\}.$$

In the following theoretical analysis, we assume the clustering size $R = (R_1,\ldots,R_K)$ is known and simply write $\mathcal{P}$ for short. The adaptation to unknown $R$ will be addressed in Section 5.2. The least-square estimator for the TBM (1) is

$$\hat\Theta = \operatorname*{arg\,min}_{\Theta\in\mathcal{P}} \big\{ -2\langle \mathcal{Y}, \Theta\rangle + \|\Theta\|_F^2 \big\}. \qquad (4)$$

The objective equals (ignoring constants) the sum of squares $\|\mathcal{Y} - \Theta\|_F^2$, hence the name of our estimator.
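To make the "ignoring constants" step explicit, expand the square (a one-line check, supplied by us):

$$\|\mathcal{Y}-\Theta\|_F^2 = \|\mathcal{Y}\|_F^2 - 2\langle \mathcal{Y},\Theta\rangle + \|\Theta\|_F^2.$$

Since $\|\mathcal{Y}\|_F^2$ does not depend on $\Theta$, minimizing the objective in (4) over $\mathcal{P}$ is equivalent to minimizing $\|\mathcal{Y}-\Theta\|_F^2$.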

4 Statistical convergence

In this section, we establish the convergence rate of the least-square estimator (4) under two measures of accuracy. The first is the mean squared error (MSE):

$$\mathrm{MSE}(\Theta_{\text{true}}, \hat\Theta) = \frac{1}{\prod_k d_k}\,\|\Theta_{\text{true}} - \hat\Theta\|_F^2,$$

where $\Theta_{\text{true}}, \hat\Theta \in \mathcal{P}$ are the true and estimated mean tensors, respectively. While the loss function corresponds to the likelihood for the Gaussian tensor model, the same assertion does not hold for other types of distributions, such as the stochastic tensor block model. We will show that, with very high probability, a simple least-square estimator achieves a fast convergence rate in a general class of block tensor models.

Theorem 1 (Convergence rate of MSE). Let $\hat\Theta$ be the least-square estimator of $\Theta_{\text{true}}$ under model (1). There exist two constants $C_1, C_2 > 0$ such that

$$\mathrm{MSE}(\Theta_{\text{true}}, \hat\Theta) \le \frac{C_1\sigma^2}{\prod_k d_k}\left(\prod_k R_k + \sum_k d_k \log R_k\right) \qquad (5)$$

holds with probability at least $1 - \exp\big(-C_2\big(\prod_k R_k + \sum_k d_k\log R_k\big)\big)$, uniformly over $\Theta_{\text{true}} \in \mathcal{P}$ and all error distributions satisfying (3).

The convergence rate of MSE in (5) consists of two parts. The first part, $\prod_k R_k$, is the number of parameters in the core tensor $\mathcal{C}$, while the second part, $\sum_k d_k\log R_k$, reflects the complexity of estimating the $M_k$'s. It is the price one has to pay for not knowing the locations of the blocks.

We compare our bound with the existing literature. The Tucker tensor decomposition has a minimax convergence rate proportional to $\sum_k d_k R'_k$ [22], where $R'_k$ is the multilinear rank in mode $k$. Applying Tucker decomposition to the TBM yields the rate $\sum_k d_k R_k$, because the mode-$k$ rank is bounded by the number of mode-$k$ clusters. Now, as both the dimension $d_{\min} = \min_k d_k$ and the clustering size $R_{\min} = \min_k R_k$ tend to infinity, we have $\prod_k R_k + \sum_k d_k \log R_k \ll \sum_k d_k R_k$. Therefore, by fully exploiting the block structure, we obtain a better convergence rate than previously possible.

Recently, [13] proposed a convex relaxation for estimating the TBM. In the special case when the tensor dimensions are equal in every mode, $d_1 = \cdots = d_K = d$, their estimator has a convergence rate of order $O(d^{-1})$ for all $K \ge 2$. As seen from (5), our estimator attains a much better convergence rate, $O(d^{-(K-1)})$, which is especially favorable as the order $K$ increases.

The bound (5) generalizes previous results on structured matrix estimation in network analysis [23, 16]. Earlier work [16] suggests the following heuristic on the sample complexity in the matrix case:

$$\frac{(\text{number of parameters}) + \log(\text{complexity of models})}{\text{number of samples}}. \qquad (6)$$

Our result supports this important principle for general $K \ge 2$. Note that, in the TBM, the sample size is the total number of entries $\prod_k d_k$, the number of parameters is $\prod_k R_k$, and the combinatoric complexity for estimating the block structure is of order $\prod_k R_k^{d_k}$.
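Indeed, plugging these three quantities into (6) reproduces the rate in (5); spelling out the arithmetic (ours):

$$\frac{\prod_k R_k + \log\big(\prod_k R_k^{d_k}\big)}{\prod_k d_k} = \frac{\prod_k R_k + \sum_k d_k\log R_k}{\prod_k d_k}.$$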

Next we study the consistency of the partition. To define the misclassification rate (MCR), we need some additional notation. Let $M_k = \llbracket m^{(k)}_{i,r}\rrbracket$ and $\hat M_k = \llbracket \hat m^{(k)}_{i,r'}\rrbracket$ be two mode-$k$ membership matrices, and let $D^{(k)} = \llbracket D^{(k)}_{r,r'}\rrbracket$ be the mode-$k$ confusion matrix with entries

$$D^{(k)}_{r,r'} = \frac{1}{d_k}\sum_{i=1}^{d_k} \mathbb{1}\big\{m^{(k)}_{i,r} = \hat m^{(k)}_{i,r'} = 1\big\}, \quad \text{for } r, r' \in [R_k].$$

Note that the row and column sums of $D^{(k)}$ give the proportions of indices in each cluster defined by $M_k$ and $\hat M_k$, respectively. We restrict ourselves to non-degenerate clusterings; that is, the row and column sums of $D^{(k)}$ are bounded below by a constant $\gamma > 0$. With a slight abuse of notation, we still use $\mathcal{P} = \mathcal{P}(\gamma)$ to denote the parameter space with this non-degeneracy assumption, and the least-square estimator (4) is interpreted with this constraint imposed. We define the mode-$k$ misclassification rate (MCR) as

$$\mathrm{MCR}(M_k, \hat M_k) = \max_{r\in[R_k],\; a\ne a'\in[R_k]} \min\big\{ D^{(k)}_{a,r},\; D^{(k)}_{a',r}\big\}.$$

In other words, the MCR is the element-wise maximum of the confusion matrix after removing the largest entry from each column; for each column this picks out the second-largest entry. Under the non-degeneracy assumption, $\mathrm{MCR} = 0$ if and only if the confusion matrix $D^{(k)}$ is a permutation of a diagonal matrix; that is, the estimated partition matches the true partition up to permutations of cluster labels.
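The confusion matrix and MCR are straightforward to compute; the sketch below (ours, not from the paper) works directly with label vectors rather than membership matrices and assumes $R_k \ge 2$.

```python
import numpy as np

def confusion_matrix(z_true, z_hat, R):
    """D[r, s] = fraction of indices i with z_true[i] = r and z_hat[i] = s."""
    D = np.zeros((R, R))
    for r, s in zip(z_true, z_hat):
        D[r, s] += 1.0 / len(z_true)
    return D

def mcr(z_true, z_hat, R):
    """Largest second-largest column entry of the confusion matrix (needs R >= 2)."""
    D = confusion_matrix(z_true, z_hat, R)
    return np.sort(D, axis=0)[-2, :].max()
```

As a sanity check, `mcr` returns 0 exactly when `z_hat` is a relabeling of `z_true`, since each column of the confusion matrix then has a single nonzero entry.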

Theorem 2 (Convergence rate of MCR). Consider a tensor block model (2) with sub-Gaussian parameter $\sigma$. Define the minimal gap between blocks $\Delta_{\min} = \min_k \Delta^{(k)}$, where

$$\Delta^{(k)} = \min_{r_k \ne r'_k}\; \max_{r_1,\ldots,r_{k-1},r_{k+1},\ldots,r_K}\big(c_{r_1,\ldots,r_k,\ldots,r_K} - c_{r_1,\ldots,r'_k,\ldots,r_K}\big)^2.$$

Let $M_{k,\text{true}}$ be the true mode-$k$ membership and $\hat M_k$ be the estimator from (4). Then, for any $\epsilon \in (0,1)$,

$$P\big(\mathrm{MCR}(\hat M_k, M_{k,\text{true}}) \ge \epsilon\big) \le 2^{1+\sum_k d_k}\exp\left(-\frac{C\,\epsilon^2\gamma^2\,\Delta_{\min}^{3K-2}\prod_{k=1}^K d_k}{\sigma^2\,\|\mathcal{C}\|_{\max}^2}\right),$$

where $C > 0$ is a positive constant and $\gamma > 0$ is the lower bound on the cluster proportions.

The above theorem shows that our estimator consistently recovers the block structure as the dimension of the data tensor grows. The block-mean gap $\Delta_{\min}$ plays the role of the eigen-separation in the classical tensor Tucker decomposition [22]. Table 1 summarizes the comparison of various tensor methods in the special case when $d_1 = \cdots = d_K = d$ and $R_1 = \cdots = R_K = R$.

Method             Recovery error (MSE)    Clustering error (MCR)                                                              Block detection (see Section 6)
Tucker [22]        $dR$                    --                                                                                  No
CoCo [13]          $d^{K-1}$               --                                                                                  No
TBM (this paper)   $d\log R$               $\|\mathcal{C}\|_{\max}\sqrt{\log d}\,/\,\big(\Delta_{\min}^{(3K-2)/2}\, d^{(K-1)/2}\big)$   Yes

Table 1: Comparison of various tensor decomposition methods.

5 Numerical implementation

5.1 Alternating optimization

We introduce an alternating optimization scheme for solving (4). Estimating $\Theta$ consists of finding both the core tensor $\mathcal{C}$ and the membership matrices $M_k$. The optimization (4) can be written as

$$(\hat{\mathcal{C}}, \{\hat M_k\}) = \operatorname*{arg\,min}_{\mathcal{C}\in\mathbb{R}^{R_1\times\cdots\times R_K},\ \text{membership matrices } M_k} f(\mathcal{C}, \{M_k\}), \quad \text{where } f(\mathcal{C},\{M_k\}) = \|\mathcal{Y} - \mathcal{C}\times_1 M_1\times_2\cdots\times_K M_K\|_F^2.$$

The decision variables consist of $K+1$ blocks of variables: one for the core tensor $\mathcal{C}$ and $K$ for the membership matrices $M_k$. We notice that, if any $K$ out of the $K+1$ blocks of variables are known, then the last block of variables can be solved explicitly. This observation suggests that we can iteratively update one block of variables at a time while keeping the others fixed.
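The sketch below is our illustration of this alternating scheme under the stated observation, not the authors' released implementation: given the memberships, the optimal core is the tensor of blockwise means; given the core and the other modes' memberships, each mode-$k$ slice is reassigned to its best-fitting cluster. Names and initialization are ours.

```python
import numpy as np

def membership(z, R):
    """d x R binary membership matrix from a label vector z."""
    M = np.zeros((len(z), R))
    M[np.arange(len(z)), z] = 1.0
    return M

def mode_product(T, A, k):
    """Multiply mode k of tensor T by the matrix A (shape: out x in)."""
    return np.moveaxis(np.tensordot(T, A, axes=(k, 1)), -1, k)

def fit_tbm(Y, ranks, n_sweeps=30, seed=0):
    """Alternating least squares for the tensor block model objective f."""
    rng = np.random.default_rng(seed)
    dims, K = Y.shape, Y.ndim
    labels = [rng.integers(R, size=d) for d, R in zip(dims, ranks)]
    Ms = [membership(z, R) for z, R in zip(labels, ranks)]
    for _ in range(n_sweeps):
        # Core update: blockwise means (the exact minimizer given memberships).
        S = Y
        for k in range(K):
            S = mode_product(S, Ms[k].T, k)          # blockwise sums
        counts = np.ones(ranks)
        for k in range(K):
            shape = [1] * K
            shape[k] = ranks[k]
            counts = counts * Ms[k].sum(axis=0).reshape(shape)
        C = S / np.maximum(counts, 1)                # guard against empty blocks
        # Membership updates: reassign the indices of one mode at a time.
        for k in range(K):
            B = C
            for j in range(K):
                if j != k:
                    B = mode_product(B, Ms[j], j)    # expand every mode except k
            Yk = np.moveaxis(Y, k, 0).reshape(dims[k], -1)
            Bk = np.moveaxis(B, k, 0).reshape(ranks[k], -1)
            # Squared distance from each mode-k slice to each cluster profile.
            dist = ((Yk ** 2).sum(1)[:, None]
                    - 2 * Yk @ Bk.T
                    + (Bk ** 2).sum(1)[None, :])
            labels[k] = dist.argmin(axis=1)
            Ms[k] = membership(labels[k], ranks[k])
    return C, labels
```

Each update solves its block of variables exactly, so the objective $f$ is non-increasing across sweeps; since the problem is non-convex, multiple random initializations are advisable.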
