A hyperbolic penalty method to solve structured convex minimization problems

This paper presents a decomposition algorithm based on the smooth hyperbolic penalty, which leads to a scheme suitable for parallel computations. The proposed algorithm can be seen as a separable version of the earlier hyperbolic penalty method; its main idea combines a penalty-type scheme with a resource allocation approach in order to decompose large-scale separable constrained minimization programs.


Introduction
There has been considerable recent interest in solving large-scale optimization problems. Such problems arise in many fields, for instance, multistage stochastic optimization, distributed model predictive control, transportation, telecommunication models, networks, and deep learning. More recently, the availability of large and fast computers suited to parallel computation has encouraged many researchers to tackle larger and larger models. It remains a challenging area, where many real-world engineering problems with millions of decision variables are still waiting for bright ideas (classical or non-classical) to cope with them. Basic references are Bertsekas and Tsitsiklis (1989) and Lasdon (1970), where the main motivations for decomposing appear separately, mainly:
• splitting into many weakly coupled subsystems;
• partitioning variable and/or constraint sets to isolate easier subproblems, and decentralizing global optimal decisions among local decision levels;
• parallelizing or distributing computations on a specific parallel computing device.
Depending on the motivation, the way to split or partition the model may rely on different structural properties of a given problem. It is well known that the classical coordination functions are in general non-smooth and, therefore, hard to optimize, which makes the price to pay for decomposition too high for computational purposes. The computational cost is not the only drawback: the non-smoothness of the coordination function almost always introduces non-unique solutions in the subproblems, which reduces the possibility of decentralizing them completely. These are the reasons why penalization ideas and regularization techniques such as the proximal point method look attractive. Besides the possibility of treating non-smooth convex coordination problems efficiently, the introduction of quadratic penalty terms in the subproblems, for instance, can enforce unique solutions and thus guarantee decentralized procedures. However, classical primal quadratic penalty methods (Fiacco and McCormick, 1990; Fletcher, 1975; Kort and Bertsekas, 1976; Polyak, 2001), when applied to large separable constrained models, lose the separability that allows decomposition of the subproblems' penalized objective: the cross-products generated by the quadratic terms in the penalized potential function destroy it. The same diagnosis applies to many other penalty schemes, in particular to the hyperbolic penalty method introduced and studied by Xavier (2001) and used in Melo et al. (2011; 2012) and Evirgen (2017).
We are motivated by the effective results obtained in Melo et al. (2011) for solving some difficult mathematical programs with complementarity constraints (MPCC), and by those obtained in Evirgen (2017), where a nonlinear dynamic system was constructed using the hyperbolic penalty function for a certain class of inequality-constrained optimization problems. We aim to use this intriguing hyperbolic penalty function to develop a decomposition scheme favorable to parallel computations. Polyak (2001) proposed a nonlinear re-scaling algorithm based on the log-sigmoid functional and obtained some nice results and properties. This further motivated us to revive the use of the hyperbolic functional (Xavier, 2001), which is twice continuously differentiable and combines features of both exterior and interior penalty methods. Our approach consists in mixing the hyperbolic penalty algorithmic scheme with some recent decomposition algorithms developed by Hamdi et al. (1997), Hamdi and Mahey (2000), and Hamdi (2005a, 2005b). This class of decomposition methods, known as separable augmented Lagrangian algorithms (SALA), can be derived from the resource-directive subproblems associated with the coupling constraints. A complete review of decomposition methods for convex and non-convex minimization problems can be found in Hamdi and Mishra (2011).
The paper builds upon the idea originally discussed by Hamdi et al. (1997), where the authors proposed a decomposable scheme that overcame the non-separability of the obtained augmented Lagrangian function. Here, however, we limit ourselves to a primal method based on the hyperbolic penalty functional rather than on augmented Lagrangians, with separable penalized subproblems.
This idea adds to the large number of publications in the literature aiming to build separable subproblems with sound theoretical properties. We limit ourselves to smooth convex cases, where some direct strategies have been proposed to exploit the inner structure of the penalty function and turn it into a separable one (see, for instance, the survey by Hamdi and Mishra (2011) and the references therein).
The paper is organized as follows: in the next section, we present the hyperbolic penalty method of Xavier, followed by the model formulation and the proposed decomposition method. Section 4 contains the convergence analysis of the new algorithm. The last sections present a numerical study, some algorithmic issues, and suggestions to extend the proposed algorithm.

Hyperbolic penalty method
Let $f$ be a convex real-valued function and let $g(x) = (g_1(x), \dots, g_m(x))^\top$ be a vector of finite concave real-valued functions on $\Re^n$, and consider the convex programming problem:

$(P) \quad \min \{ f(x) \ : \ g_i(x) \ge 0, \ i = 1, \dots, m \}.$

The hyperbolic penalty function, originally defined through an angle parameter in Xavier (2001), may be put in the more convenient form

$P(y, \lambda, \tau) = -\lambda y + \sqrt{\lambda^2 y^2 + \tau^2},$

where $\lambda \ge 0$ and $\tau \ge 0$. The graph of $P(\cdot, \lambda, \tau)$, as shown in Fig. 1, is a branch of a hyperbola with two asymptotes: a slant one forming an angle $-\arctan(2\lambda)$ with the $y$-axis (the line $P = -2\lambda y$) and a horizontal one (the line $P = 0$). Also, the graph has $\tau$ as its vertical intercept, since $P(0, \lambda, \tau) = \tau$.
Here are some properties of the function ( , , ) that will be used in this paper. All these properties are proved in Xavier (2001).
• $P(y, \lambda, \tau)$ is:
 - a convex, decreasing function of $y$ for $\lambda > 0$ and $\tau > 0$;
 - a convex, non-increasing function of $y$ for $\lambda \ge 0$ and $\tau = 0$;
 - a convex function identically equal to $\tau$ for $\lambda = 0$.
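As a quick illustration of these properties, here is a minimal Python sketch of the penalty function and its limiting behavior; the function name hyp_penalty and the test values are our illustrative assumptions, not from the paper.

```python
import numpy as np

def hyp_penalty(y, lam, tau):
    """Hyperbolic penalty P(y, lam, tau) = -lam*y + sqrt(lam^2 y^2 + tau^2).

    For lam, tau > 0 it is smooth, strictly convex and decreasing in y:
    it tends to 0 as y -> +inf (feasible side) and behaves like -2*lam*y
    as y -> -inf (infeasible side)."""
    y = np.asarray(y, dtype=float)
    return -lam * y + np.sqrt((lam * y) ** 2 + tau ** 2)

# Sanity checks of the properties listed above (illustrative values).
lam, tau = 2.0, 0.5
print(hyp_penalty(0.0, lam, tau))      # equals tau at y = 0
print(hyp_penalty(100.0, lam, tau))    # ~0 deep inside the feasible region
print(hyp_penalty(-10.0, lam, tau))    # ~2*lam*10 = 40 on the infeasible side
print(hyp_penalty(5.0, 0.0, tau))      # identically tau when lam = 0
```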

Model formulation
In this section, we build a new hyperbolic decomposition algorithm (HDA) to solve large-scale convex inequality-constrained programs with separable structure. We are concerned here with block-separable nonlinear constrained optimization problems of the form

$(P) \quad \min \Big\{ \sum_{i=1}^{m} f_i(x_i) \ : \ \sum_{i=1}^{m} g_i(x_i) \ge 0 \Big\},$

where the convex functions $f_i$ and the concave functions $g_i$ are defined from $\Re^{n_i} \to \Re$ for $i = 1, \dots, m$, with $\sum_{i=1}^{m} n_i = n$. The above constraint is usually referred to as a coupling constraint. Throughout this work, all the functions $f_i$, $g_i$ are $C^2$ and we make the following assumptions.

Assumptions 1:
A1: The optimal set $X^*$ of $(P)$ is nonempty and bounded.
A2: Slater's condition holds, i.e., there exists $\bar{x} \in \Re^n$ such that $\sum_{i=1}^{m} g_i(\bar{x}_i) > 0$.

New HDA
To build our new decomposition algorithm, we follow the well-known resource allocation scheme. Since we aim to develop an algorithm to decompose large-scale, structured optimization models, we drew on the iterative approach studied and developed by Hamdi et al. (1997), Hamdi (2005b), and Hamdi and Mishra (2011), the separable augmented Lagrangian algorithm (SALA). To this end, allocation variables $y_i$ are added in such a way as to obtain the equivalent problem†

$(P_y) \quad \min \Big\{ \sum_{i=1}^{m} f_i(x_i) \ : \ g_i(x_i) + y_i \ge 0, \ i = 1, \dots, m, \ \sum_{i=1}^{m} y_i = 0 \Big\}.$

Indeed, if $g_i(x_i) + y_i \ge 0$ with $\sum_{i=1}^{m} y_i = 0$, then summing gives $\sum_{i=1}^{m} g_i(x_i) \ge 0$; conversely, any feasible $x$ of $(P)$ yields a feasible pair by taking $y_i = -g_i(x_i) + \frac{1}{m}\sum_{j=1}^{m} g_j(x_j)$. We then apply the hyperbolic penalty method introduced previously with partial elimination of the constraints: only the inequality constraints are penalized, while the allocation constraint $\sum_{i=1}^{m} y_i = 0$ is kept explicitly. In other words, for $\lambda > 0$, $\tau > 0$, the potential function related to the problem $(P_y)$ is defined as follows:

$\Phi(x, y, \lambda, \tau) = \sum_{i=1}^{m} f_i(x_i) + \sum_{i=1}^{m} P(g_i(x_i) + y_i, \lambda, \tau), \quad (1)$

† If $(x^*, y^*)$ is an optimal solution to $(P_y)$ then $x^*$ is an optimal solution to $(P)$.
where $P(y, \lambda, \tau) = -\lambda y + \sqrt{\lambda^2 y^2 + \tau^2}$. It is clear that the functional $\Phi$ is separable, i.e.,

$\Phi(x, y, \lambda, \tau) = \sum_{i=1}^{m} \Phi_i(x_i, y_i, \lambda, \tau), \qquad \Phi_i(x_i, y_i, \lambda, \tau) = f_i(x_i) + P(g_i(x_i) + y_i, \lambda, \tau).$

Thus, the hyperbolic penalty algorithm can be applied as follows: for given $\lambda^k, \tau^k > 0$, solve the minimization problem

$(x^{k+1}, y^{k+1}) \in \arg\min \Big\{ \Phi(x, y, \lambda^k, \tau^k) \ : \ \sum_{i=1}^{m} y_i = 0 \Big\}, \quad (2)$

and then update the parameters $\lambda^k$ and $\tau^k$ (the update scheme is detailed in the algorithmic considerations section: $\lambda$ is increased while the iterate is infeasible, then $\tau$ is decreased). The minimization in Eq. 2 is done by alternating the minimization with respect to $x$ with the one with respect to the allocation variable $y$; i.e., we fix $y = y^k$ and find

$x^{k+1} \in \arg\min_x \Phi(x, y^k, \lambda^k, \tau^k). \quad (3)$

This minimization splits into $m$ independent low-dimensional subproblems:

$x_i^{k+1} \in \arg\min_{x_i} \{ f_i(x_i) + P(g_i(x_i) + y_i^k, \lambda^k, \tau^k) \}, \quad i = 1, \dots, m,$

and we then fix $x = x^{k+1}$ to solve for $y^{k+1}$:

$y^{k+1} \in \arg\min \Big\{ \sum_{i=1}^{m} P(g_i(x_i^{k+1}) + y_i, \lambda^k, \tau^k) \ : \ \sum_{i=1}^{m} y_i = 0 \Big\}. \quad (4)$

It is not hard to solve the minimization with respect to $y$ explicitly, as shown in the following lemma, which also gives an important insight about the allocation variables.
Lemma 1: The solution $y^{k+1}$ of Eq. 4 is given explicitly by

$y_i^{k+1} = -g_i(x_i^{k+1}) + \frac{1}{m} \sum_{j=1}^{m} g_j(x_j^{k+1}), \quad i = 1, \dots, m.$

Proof: Writing the classical Lagrangian of Eq. 4 with multiplier $\theta \in \Re$ for the constraint $\sum_{i=1}^{m} y_i = 0$, the optimality conditions give, for every $i$,

$-\lambda + \frac{\lambda^2 \big(g_i(x_i^{k+1}) + y_i^{k+1}\big)}{\sqrt{\lambda^2 \big(g_i(x_i^{k+1}) + y_i^{k+1}\big)^2 + \tau^2}} + \theta = 0, \quad (6)$

so that, after some direct calculations,

$\lambda - \theta = \frac{\lambda^2 \big(g_i(x_i^{k+1}) + y_i^{k+1}\big)}{\sqrt{\lambda^2 \big(g_i(x_i^{k+1}) + y_i^{k+1}\big)^2 + \tau^2}},$

which means that $\lambda - \theta$ does not depend on $i$ and has the same sign as $g_i(x_i^{k+1}) + y_i^{k+1}$ for any $i = 1, \dots, m$. Now, according to Eq. 6, the quantity $g_i(x_i^{k+1}) + y_i^{k+1}$ solves the same scalar equation for every $i$; since the left-hand side of Eq. 6 is strictly increasing in this quantity, we obtain

$g_i(x_i^{k+1}) + y_i^{k+1} = c \ \text{ for all } \ i = 1, \dots, m, \quad (7)$

for some constant $c$. Summing both sides of Eq. 7 over $i$ and using $\sum_{i=1}^{m} y_i^{k+1} = 0$, we reach, after some obvious calculations,

$c = \frac{1}{m} \sum_{j=1}^{m} g_j(x_j^{k+1}), \quad (9)$

and by plugging Eq. 9 directly into Eq. 7 we obtain

$y_i^{k+1} = -g_i(x_i^{k+1}) + \frac{1}{m} \sum_{j=1}^{m} g_j(x_j^{k+1}), \quad i = 1, \dots, m. \qquad \square$
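To make the alternating scheme concrete, here is a minimal Python sketch of the HDA loop on a toy instance (quadratic $f_i$, affine $g_i$). All names, the toy data, and the use of scipy.optimize.minimize for the $x_i$-subproblems are our illustrative assumptions, not the paper's implementation (which uses MATLAB's fminunc).

```python
import numpy as np
from scipy.optimize import minimize

def hyp_penalty(s, lam, tau):
    # Hyperbolic penalty P(s, lam, tau) = -lam*s + sqrt(lam^2 s^2 + tau^2).
    return -lam * s + np.sqrt((lam * s) ** 2 + tau ** 2)

# Toy block-separable instance: f_i(x_i) = 0.5*||x_i - c_i||^2,
# g_i(x_i) = b_i - a_i @ x_i (affine, hence concave); coupling: sum_i g_i >= 0.
m, ni = 4, 3
rng = np.random.default_rng(0)
c = rng.normal(size=(m, ni)); a = rng.normal(size=(m, ni)); b = rng.normal(size=m)
f = [lambda xi, i=i: 0.5 * np.sum((xi - c[i]) ** 2) for i in range(m)]
g = [lambda xi, i=i: b[i] - a[i] @ xi for i in range(m)]

lam, tau = 1.0, 1.0
x = np.zeros((m, ni))
y = np.zeros(m)                       # allocation variables, sum(y) == 0

for k in range(50):
    # Step 1: m independent x_i-subproblems (parallelizable across blocks).
    for i in range(m):
        obj = lambda xi, i=i: f[i](xi) + hyp_penalty(g[i](xi) + y[i], lam, tau)
        x[i] = minimize(obj, x[i], method="BFGS").x
    # Step 2: closed-form allocation update from Lemma 1.
    gvals = np.array([g[i](x[i]) for i in range(m)])
    y = -gvals + gvals.mean()
    # Step 3: Xavier-type parameter update (see the algorithmic considerations
    # section): raise lam while infeasible, otherwise shrink tau.
    if gvals.sum() < 0:
        lam *= 10.0
    else:
        tau *= 0.1

print("coupling residual:", np.array([g[i](x[i]) for i in range(m)]).sum())
```

Step 1 is the part that a parallel implementation would distribute, since the $m$ subproblems share no variables once $y^k$ is fixed.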

Properties and convergence analysis
For our analysis, we need some assumptions.

Assumptions 2:
A3: There is a pair $(\lambda^0, \tau^0)$ such that the penalized function $\Phi(\cdot, \cdot, \lambda^0, \tau^0)$ attains its minimum on $\{(x, y) : \sum_{i=1}^{m} y_i = 0\}$.
A4: There is a value $\alpha > 0$ such that the level set $\Omega_\alpha$ is bounded.
A5: The derivatives of the functions $f_i$ and $g_i$, $i = 1, \dots, m$, are bounded on the set $\Omega_\alpha$.
The following theorem shows the existence of the minimum of the function $\Phi(x, y, \lambda, \tau)$, and consequently of the functions $\Phi_i$, for all $\lambda \ge \bar{\lambda}$ and $0 \le \tau \le \tau^0$, with $\sum_{i=1}^{m} y_i = 0$.
Proof: The proof is deferred to Appendix A of the paper.
The next theorem is similar to the result given in Xavier (2001); it shows conditional convergence of a sequence of feasible minimum points.
Theorem 1: If conditions (A1)-(A5) are satisfied, and if $\lim_{k \to \infty} \tau_k = 0$ and $\lambda \ge \bar{\lambda}$, then the sequence $\{x^k\}$ has a convergent subsequence, and the limit of any such subsequence is an optimal point.

Proof: For any $\lambda \ge \bar{\lambda}$ the point $x^k$ is feasible, and then for any point $x^* \in X^*$ we have

$f(x^*) \le f(x^k). \quad (10)$

On the other hand, $(x^k, y^k)$ is a minimum point of $\Phi(\cdot, \cdot, \lambda, \tau_k)$. Then, with $y^*$ the allocation associated with $x^*$ (so that $g_i(x_i^*) + y_i^* \ge 0$),

$f(x^k) + \sum_{i=1}^{m} P\big(g_i(x_i^k) + y_i^k, \lambda, \tau_k\big) \le f(x^*) + \sum_{i=1}^{m} P\big(g_i(x_i^*) + y_i^*, \lambda, \tau_k\big) \le f(x^*) + m \tau_k, \quad (11)$

since $P(s, \lambda, \tau) \le \tau$ whenever $s \ge 0$. Since $\lim_{k \to \infty} \tau_k = 0$ and $P \ge 0$, Eqs. 10 and 11 give $\lim_{k \to \infty} f(x^k) = f(x^*)$. Since $\Omega_\alpha$ is a compact set, there exists a subsequence of $\{x^k\}$ that converges to some $\tilde{x} \in X^*$.
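The mechanism of the proof can be condensed into the following sandwich inequality, where the bound $P(s, \lambda, \tau) \le \tau$ for $s \ge 0$ is the property established in Appendix A; the display below is our summary of the argument, not a formula taken from the paper.

```latex
% Sandwich argument behind Theorem 1: for a feasible x^k and any x* in X*,
\[
  f(x^*) \;\le\; f(x^k)
  \;\le\; f(x^k) + \sum_{i=1}^{m} P\big(g_i(x_i^k)+y_i^k,\lambda,\tau_k\big)
  \;\le\; f(x^*) + m\,\tau_k ,
\]
% so letting \tau_k -> 0 squeezes f(x^k) onto the optimal value f(x*).
```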

Numerical study
This section is devoted to some numerical tests in which we study the numerical behavior of the HDA. The study addresses the feasibility, optimality, and stability of the method with respect to the parameters involved. Furthermore, an extension of the HDA, the proximal hyperbolic decomposition algorithm (PHDA), described below, is tested. The study is completed by a brief comparison involving HDA, PHDA, and the well-known CVX tool for convex programming developed by M.C. Grant and S.P. Boyd from Stanford University (Grant and Boyd, 2020). CVX is a MATLAB-based modeling system for constructing and solving convex programs (CPs). CVX supports a number of standard problem types, including linear and quadratic programs, and is mainly based on primal-dual interior-point techniques.
To solve the unconstrained minimization problems in Step 2 of Algorithms HDA and PHDA, we used the "fminunc" function in MATLAB (Coleman et al., 1999), amending its default settings to apply the quasi-Newton method. CVX, in turn, employs a primal-dual interior-point algorithm (Grant and Boyd, 2014; 2020).

Algorithmic considerations
In practice, the penalty parameters $\lambda$, $\tau$ play a fundamental role in the behavior and efficiency of the proposed algorithms of type HDA, and in general of any algorithm based on penalization and/or on augmented (modified) Lagrangians. These parameters can be used to reach some accepted feasibility of the iterates. In order to avoid the case where $\lambda$ becomes too large, we have fixed an upper bound $\lambda_{\max}$. We have used convenient stopping criteria, similar to those used by Breitfeld and Shanno (1996). This two-parameter penalization depends on the positive parameters $\lambda$ and $\tau$ and proceeds as follows:
• In the initial phase of the process, $\lambda$ increases, causing a significant increase of the penalty at infeasible points, while a reduction in penalty is observed for points inside the feasible region. This way, the search is directed toward the feasible region, since the goal is to minimize the penalty.
• From the moment a feasible point is obtained, the second parameter $\tau$ decreases.
Accordingly, we propose to update $\lambda$ and $\tau$ according to the following scheme: $\lambda^{k+1} = \rho \lambda^k$ while the current iterate is infeasible, and $\tau^{k+1} = \beta \tau^k$ once feasibility is reached, where $\rho > 1$ and $0 < \beta < 1$.
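A minimal Python sketch of this two-phase update follows; the parameter names rho, beta, and lam_max mirror the description above, and the feasibility test on the summed coupling constraint is our assumption.

```python
def update_parameters(lam, tau, coupling_residual, rho=10.0, beta=0.1,
                      lam_max=1e8):
    """Two-phase hyperbolic penalty parameter update described above.

    Phase 1: while the iterate is infeasible (negative coupling residual),
    increase lam (rho > 1) to push the search toward the feasible region.
    Phase 2: once a feasible point is reached, shrink tau (0 < beta < 1)
    so the penalty fades and the iterates can approach optimality."""
    if coupling_residual < 0:          # infeasible point
        lam = min(rho * lam, lam_max)  # capped so lam does not blow up
    else:                              # feasible point reached
        tau = beta * tau
    return lam, tau
```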

HDA Performances
To evaluate the performance of implementable versions of HDA, Tables 1-2 present some partial results at each iteration with the corresponding data and parameters; we have gathered the feasibility and the variations of the objective function at each iteration. Table 3 gives the number of iterations, the feasibility, the variations of the objective function, and the computation time for different sizes of the model $P(n, m)$. Furthermore, Table 2 presents the changes of one penalty parameter at each iteration, the other being fixed at 10. In Tables 4-8, we study the influence of the penalty parameters $\lambda$ and $\tau$. We used different initial values of these parameters, and some conclusions can be drawn from these tables. In general, the cases $\tau^0 = 0.01, 0.5, 1$ behave better than $\tau^0 = 10$ or more, but one has to be careful because sometimes, when the penalty parameter is very large, feasibility is reached rapidly while optimality does not follow at the same speed (Table 7).

Remark 1: We observed that when $\lambda^0$ was chosen less than 1, the method may rapidly reach feasible points but get stuck far from the optimal value, or it may diverge. For instance, for P(500,5) with $\tau^0 = 100$, HDA diverges when $\lambda^0 = 0.1$ and $\lambda^0 = 0.7$. In addition, for $\lambda^0 = 0.9$, the algorithm stopped after reaching a feasible point with $f_{obj} = -111.981$, while the optimal objective obtained by the software CVX was $f^*_{CVX} = -112.08$. But when $\lambda^0 = 1$, CVX beats HDA slightly in time, and we get $f^*_{obj} = -112.08 = f^*_{CVX}$ in 1.80012 s, where CVX needs 1.59872 s. The same conclusions were drawn for the case (P(50,50), $\tau^0 = 10$) (Table 6). In Tables 8 and 9, we tested problems $P(n, m, p)$ for dimensions ranging from 100 to 12500 variables. The obtained results show the efficiency of HDA with respect to the CPU time needed to reach convergence.
One may observe that the values of the initial penalty parameter do not heavily influence the CPU time, except when the number of variables exceeds 5000: there, we observed that initial values greater than or equal to 100 make HDA faster than in the other tested cases (Tables 8 and 9). In Tables 10 and 11, we show that we may sometimes face the influence of other internal parameters, for instance the factors $(\rho, \beta)$ used to increase $\lambda$ and decrease $\tau$, respectively. According to many tests, feasibility is not altered, but the obtained objective function values may be unstable. This is due to fast feasibility, which stops the penalization effect of the algorithm and may cause slow convergence or produce some jumps around the optimal solution. Table 12 shows the CPU time needed to solve large-scale problems with up to $10^6$ variables.

Extension and comparison
In order to increase the stability of HDA and enrich it with some nice properties, we propose here to mix the proximal point technique (Rockafellar, 1976) with our HDA.
In other words, we add a quadratic term $\frac{1}{2\gamma} \|x - x^k\|^2$, with proximal parameter $\gamma > 0$, to the hyperbolic penalty function. It is then an easy exercise to extend the convergence analysis of Section 4 to our proximal hyperbolic decomposition algorithm. As was done for HDA, to evaluate the performance of implementable versions of PHDA, Tables 13 and 14 present some partial results per iteration. In Tables 13-18, we compare the three algorithms on the same set of problems with different dimensions, focusing on the role of the different parameters that produce good results.
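As an illustration, the PHDA block subproblem only changes the objective passed to the inner solver; the following Python fragment, mirroring the HDA sketch above (the names and the scipy call are our assumptions), shows the proximal modification.

```python
import numpy as np
from scipy.optimize import minimize

def phda_block_step(f_i, g_i, x_i_prev, y_i, lam, tau, gamma=1.0):
    """One PHDA x_i-subproblem: the HDA objective plus a proximal term
    (1/(2*gamma)) * ||x_i - x_i^k||^2 centered at the previous iterate."""
    def hyp_penalty(s):
        return -lam * s + np.sqrt((lam * s) ** 2 + tau ** 2)
    def obj(x_i):
        prox = np.sum((x_i - x_i_prev) ** 2) / (2.0 * gamma)
        return f_i(x_i) + hyp_penalty(g_i(x_i) + y_i) + prox
    return minimize(obj, x_i_prev, method="BFGS").x

# A small gamma strengthens the proximal pull (more stability, smaller moves);
# a large gamma recovers plain HDA behavior, consistent with the tuning issue
# noted in Remark 2 below.
```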

Remark 2:
• For the above problems, CVX ran into memory problems and sometimes reached only feasible solutions. For large dimensions, CVX generally did not give good results; but for dimensions below 5000, CVX is very efficient and beats HDA and its proximal version, as shown in Table 17.
• PHDA's efficiency is close to that of HDA, but by tuning the proximal parameter $\gamma$, we observed that PHDA is faster than HDA for solving problems $P(n, m, p)$, as shown in Table 18. Curiously, the opposite happened for problems $P(n, m)$ (Table 14). In general, we expected more stability around the optimal solution, but we believe that the hyperbolic penalty term dominates the calculations, so the proximal term is unable to play its role as it should. This situation is related to the proximal parameter $\gamma$, which needs careful tuning in order to avoid perturbations.

Conclusion
In this paper, we were interested in the development of the hyperbolic decomposition algorithm (HDA), favorable to parallel computations. The main challenge was to overcome the non-separability of the obtained hyperbolic penalty function. The algorithm converges under some classical conditions thanks to the nice properties of the hyperbolic functional. We ended the paper with a numerical study, where we discussed the behavior of HDA and the influence of its parameters. In addition, we proposed a regularized version of HDA, the PHDA, and compared both to the very strong CVX solver (Grant and Boyd, 2014). Since HDA is mainly based on penalization, the influence of the penalty parameters is well known and is shown in the numerical part. HDA can be made faster if programmed on parallel processing machines.

Appendix A (fragment): $P(g_i(x_i) + y_i, \lambda, \tau) < P(g_i(x_i) + y_i, 0, \tau) = \tau$ for $\lambda > 0$. For $\tau = 0$, $P(g_i(x_i) + y_i, \lambda, 0) = 0 = \tau$, since $g_i(x_i) + y_i > 0$. Hence, …