An Iterative SISO Improved Complex Sphere Detection and Decoder for Turbo-MIMO Systems

Abstract: An iterative soft-input soft-output (SISO) improved complex sphere detection and decoder algorithm is proposed for signal detection in Turbo-MIMO systems. It forms a candidate point set Θ in terms of an accumulated cost function, based on a search arc constrained by the received signals. Any candidate subset whose lower cost bound is not smaller than the upper bound is then fathomed and dropped from further consideration. Meanwhile, once a new feasible candidate point turns up, the search resumes from the path closest to completion to generate the set Θ of optimal candidate vectors, with the aim of determining the extrinsic information for a Turbo coded bit with maximum likelihood. Bridged by de-multiplexing and multiplexing, a SISO improved complex sphere detector is concatenated with a SISO Log maximum a posteriori Turbo decoder, as if a principal Turbo detector embedded a subordinate Turbo decoder, and the two iteratively exchange their soft-decision detection and decoding information. As a result, the proposed algorithm converges rapidly, which lowers the computational complexity. The transfer curves that relate the input mutual information to the output mutual information are obtained through simulations, from which an asymptotic interval for the input SNR threshold at which the proposed scheme converges is observed. Finally, an upper bound on the diversity is obtained from intuitive deduction and theoretical analysis. The simulation results also show that the proposed scheme strongly suppresses multi-stream interference and that its performance is close to that of the iterative soft-input soft-output list complex sphere detection and decoder algorithm, but with a shorter time delay.


I. INTRODUCTION
Multiple-input multiple-output (MIMO) systems achieve high-data-rate transmission and substantial gains in channel capacity in a rich-scattering environment [1-2], which makes very high spectral efficiency available for emerging wireless standards such as IEEE 802.11n/ac [3]. Moreover, if the architecture is combined with traditional channel coding techniques, it not only increases the system's reliability but also provides a coding gain that improves the system's performance [4].
Therefore, to obtain a channel capacity close to the Shannon limit, a new kind of MIMO system based on bit-interleaved coded modulation, known as the Turbo-MIMO system, was investigated by Sellathurai and Haykin [5]. Endowed with the turbo learning principle [6], these iterative receivers perform detection and decoding by mutually exchanging soft bit information, which lets them approach near-optimal performance in a computationally feasible manner [5][7][8][9][10][11].
The efficiency of a Turbo-MIMO system can be evaluated in three respects [6]: (i) whether it achieves the maximum possible diversity order; (ii) whether it obtains spatial multiplexing approaching the total number of degrees of freedom provided by the channel; and (iii) whether the receiver can exploit a detector (inner decoder) with both good performance and reduced complexity.
Although the maximum a posteriori probability (MAP) inner decoder achieves superior performance, its complexity grows exponentially with the product of the number of transmit antennas and the modulation order, which in practice precludes its implementation.
On the other hand, performance degradation is inherent in most iterative receivers that choose a minimum mean square error (MMSE) based algorithm [9][10][11] for the inner decoder, because it makes the extrinsic information transferred to the decoder (outer decoder) less reliable. Consequently, the design of near-optimal inner decoders with more reliable extrinsic estimates [6], an open research topic, has attracted the most attention.
Taking into account the characteristics of the extrinsic information estimates, almost all soft-input soft-output (SISO) receivers based on the sphere decoder [12][13][14] can achieve full diversity [15]. However, few of them address all three aspects. Among those that do is the list complex sphere decoder (LCSPD) [12], which stems from the idea of a sphere of a given radius centered on the vector of actually received noisy signals. A small radius, which admits only a small subset of candidate vectors out of the entire set of possibly transmitted signal points, with ever-improving quality of the extrinsic information estimates, would make the LCSPD computationally efficient. However, only a sphere of relatively large radius and list size ensures that the LCSPD provides near-optimal detection, so its complexity is not noticeably lower than that of the MAP detection algorithm.
With recent advances in sphere decoders, a growing number of techniques have been applied to modify the search strategies of this kind of suboptimal detector, such as approximate Schnorr-Euchner enumeration [16], retaining the best K nodes [17][18], fixed-complexity variants [19], single tree search [20], imposing a norm constraint on the admissible solution [21], an adaptive tree-traversal control scheme combined with a reliability-dependent log-likelihood ratio correction and an iteration-based hybrid node enumeration [22], and differential sphere detection that visits the MPSK constellation points in a zigzag fashion [23].
Previously, it was shown that when the improved complex sphere decoding (ICSPD) [24] finds a candidate solution for the entire vector, it resumes the next search from the path that is closest to completion, rather than restarting the search from the root node. In this way, it has a lower complexity than the LCSPD [12][25]. Therefore, we extend the list idea to the ICSPD and propose an iterative soft-input soft-output improved complex sphere detection and decoder (ISISOICSPDD) to form a novel Turbo-MIMO system.
First, based on the branch-and-bound tree, an acyclic graph, the SISO ICSPD partitions the solution space spanned by the signal points progressively more finely into a collection of subsets, with the set of all possible transmitted signal points as the root node and each transmitted signal point as a terminal node.
Then, it calculates the accumulated cost of every nonterminal node and, following a "smallest-first" rule, decides the path that connects the root to a terminal node to constitute a feasible candidate signal point, saving computation by pruning the nodes/subsets of the tree that have no chance of containing a feasible candidate signal point. Thereafter, if the accumulated cost of a candidate signal point is smaller than a prescribed value, a new feasible candidate point for the transmitted signal has been obtained, and it is put into the candidate point set Θ provided that Θ is not yet full. Otherwise, the new point is compared with the point in Θ that has the largest accumulated cost, and the latter is replaced if the former has a smaller accumulated cost. This guarantees that the candidate points in Θ are reliable enough for the SISO ICSPD to produce high-fidelity extrinsic information estimates.
Finally, the proposed iterative SISO detection and decoding architecture exchanges soft-decision information between the detector and the decoder, which shortens the detection time and lowers the complexity.
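The tree search described above can be illustrated with a generic smallest-first (best-first) branch-and-bound detector. The following is a minimal sketch, not the authors' ICSPD: it assumes a QR decomposition of the channel matrix, a square or tall channel, and a squared-radius pruning test, and the names `best_first_detect` and `radius_sq` are hypothetical.

```python
import heapq
import numpy as np

def best_first_detect(y, H, symbols, radius_sq=np.inf):
    """Return (cost, x_hat) minimizing ||Q^H y - R x||^2 inside the sphere.

    Illustrative only: a plain best-first tree search, not the ICSPD itself.
    """
    n_t = H.shape[1]
    Q, R = np.linalg.qr(H)              # H = Q R with R upper triangular
    z = Q.conj().T @ y                  # rotated received vector
    heap = [(0.0, 0, ())]               # (accumulated cost, tie-breaker, fixed symbols)
    tie = 1
    while heap:
        cost, _, partial = heapq.heappop(heap)   # expand the smallest cost first
        k = n_t - 1 - len(partial)               # antenna index decided next
        if k < 0:
            # Costs never decrease along a path, so the first complete
            # path popped from the heap has the smallest total cost.
            return cost, np.array(partial[::-1])
        tail = np.array(partial[::-1], dtype=complex)   # x_{k+1}, ..., x_{n_t-1}
        residual = z[k] - np.dot(R[k, k + 1:], tail)
        for s in symbols:
            child_cost = cost + abs(residual - R[k, k] * s) ** 2
            if child_cost <= radius_sq:          # prune nodes outside the sphere
                heapq.heappush(heap, (child_cost, tie, partial + (s,)))
                tie += 1
    return np.inf, None                 # no point inside the given radius

# Toy usage with QPSK and a random 3x3 channel (all values are made up).
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
rng = np.random.default_rng(3)
H = (rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))) / np.sqrt(2)
x = rng.choice(qpsk, size=3)
y = H @ x + 0.1 * (rng.standard_normal(3) + 1j * rng.standard_normal(3))
cost, x_hat = best_first_detect(y, H, qpsk)
```

Because the accumulated cost can only grow as more antennas are fixed, a subtree whose partial cost already exceeds the bound cannot contain a better point, which is what justifies the pruning step.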
Furthermore, based on density evolution [26], Gaussian approximation [27], geometric interpretation [28], and visualization methods [29], a large number of techniques have been developed for discussing the convergence behavior of sub-optimal iterative turbo decoders. Little work, however, has analyzed the dynamic evolution of the statistical characteristics of iterative detection and decoding.
In this paper, we extend the ideas in [27][29] to our simulations and observe that the histograms of the conditional probability density functions of the output extrinsic information of the proposed scheme can be approximated by a Gaussian distribution. Moreover, since the measured bit error rate (BER) of the proposed scheme can be related to the measured signal-to-noise ratio (SNR) of the output extrinsic information through the Q function [30], the dynamic convergence behavior, which exhibits three different phases, can be explained as follows. Firstly, in the relatively low input SNR interval, the SNR of the output extrinsic information hardly changes as the detector and decoder exchange soft-decision information.
Secondly, as the input SNR increases, the SNR of the output extrinsic information grows larger and larger, but it remains bounded away from infinity as the iterations of the proposed scheme are carried out.
Thirdly, at a certain input SNR or, more accurately, an input SNR threshold, the SNR of the output extrinsic information approaches infinity as the number of iterations goes to infinity.
However, transfer characteristics based on the distribution of the SNR of the input/output extrinsic information depend on the complex signal constellation, whereas mutual information transfer characteristics have been verified to be very robust owing to their inherent entropy property [31][29]. Thus, by simulation, we find that the input SNR threshold at which the proposed algorithm converges can be confined to an asymptotic interval determined by the transfer curves that relate the input mutual information to the output mutual information.
At the end of this paper, a more important result, namely an upper bound on the diversity, is extrapolated from the simulations and verified by theoretical analysis.
Throughout this paper we adopt the following notational conventions. Boldface capital and lower-case letters denote matrices and vectors, respectively. Furthermore, $(\cdot)^T$, $(\cdot)^H$, $(\cdot)^*$ and $(\cdot)^{-1}$ represent the transpose, Hermitian transpose, complex conjugation and inverse, respectively.
The rest of the paper is organized as follows. Section II describes the Turbo-MIMO system model with an iterative SISO detector concatenated with a SISO Log MAP Turbo decoder. In Section III, the ISISOICSPDD algorithm is proposed. In Section IV, the BER performance of the proposed algorithm is simulated, and an asymptotic interval for the input SNR threshold is obtained through simulations and analyzed according to the transfer curves. This is followed by a discussion of the diversity order of the proposed scheme. Finally, Section V concludes the paper.

II. SYSTEM MODEL
Figure 1 presents the configuration of a Turbo-MIMO system with a receiver based on an iterative SISO ICSPD detector concatenated with a SISO Log MAP Turbo decoder. In this system, there are $N_t$ transmit antennas and $N_r$ receive antennas. Let $M_b$ be the number of symbols in the complex constellation $C$. The user's information bits $\mathbf{u}$ are first encoded into the Turbo code stream by the outer encoder, which is then bit-interleaved using an offline designed pseudo-random interleaver referred to as Π. After de-multiplexing, the high-speed Turbo code stream is converted into low-speed streams that are independently mapped into the transmitted constellation symbol vector $\mathbf{x}$, whose entries are chosen from $C$, where $x_m$ is the symbol transmitted from the $m$th transmit antenna during the $i$th interval. The received $N_r \times 1$ vector can be represented by

$$\mathbf{Y} = \mathbf{H}\mathbf{x} + \mathbf{n} \qquad (1)$$

where $\mathbf{H}$ denotes the $N_r \times N_t$ channel matrix, the elements of which are independent and identically distributed (i.i.d.) with the zero-mean unit-variance complex Gaussian distribution, and $\mathbf{n}$ is an $N_r \times 1$ complex white Gaussian noise vector with zero mean and covariance $2\sigma^2\mathbf{I}$, where $\mathbf{I}$ is the $N_r \times N_r$ identity matrix.
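One channel use of the model in Eq. (1) can be simulated in a few lines. This is a minimal sketch under stated assumptions: the noise is taken to have variance $\sigma^2$ per real dimension, QPSK is used as the example constellation, and the function name `simulate_mimo` and all numerical values are hypothetical.

```python
import numpy as np

def simulate_mimo(n_t, n_r, constellation, sigma2, rng):
    """Draw one channel use of Y = H x + n and return (H, x, Y)."""
    # i.i.d. zero-mean unit-variance complex Gaussian channel entries
    H = (rng.standard_normal((n_r, n_t))
         + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
    # one constellation symbol per transmit antenna
    x = rng.choice(constellation, size=n_t)
    # complex white noise, variance sigma2 per real dimension (assumed)
    n = np.sqrt(sigma2) * (rng.standard_normal(n_r)
                           + 1j * rng.standard_normal(n_r))
    return H, x, H @ x + n

qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
rng = np.random.default_rng(0)
H, x, y = simulate_mimo(4, 4, qpsk, 0.05, rng)
```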
To obtain the most likely information vector, we exploit the statistical features of all components of the received signals. In doing so, the multi-stream interference from other transmit antennas is mitigated, and different kinds of detectors are obtained.
The MAP algorithm selects the most likely encoded bit with respect to all combinations of transmitted signals by maximizing the log-likelihood ratio (LLR) [32] given in Eq. (2), which is evaluated over the sets of all possible Turbo coded bit vectors in which the $l$th bit of the constellation symbol for the $m$th transmit antenna is fixed. The Log MAP algorithm must calculate the metrics expressed in Eq. (2) over the code sequences, each of the code constraint length, that correspond to a sequence of modulated symbols to be transmitted during the $i$th interval, and select the vector that yields the largest metric. This is a considerably complicated NP-complete problem when the product of $N_t$ and the modulation order is large.
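To make the exponential cost of exhaustive detection concrete, the sketch below computes a max-log approximation of the bit LLRs by enumerating every symbol vector. It is an illustration under assumptions rather than the paper's Eq. (2): no a priori information is used, the noise variance is $\sigma^2$ per real dimension, a positive LLR favors bit 1, and the Gray labeling and the name `maxlog_llr_exhaustive` are hypothetical.

```python
import itertools
import numpy as np

def maxlog_llr_exhaustive(y, H, symbols, bit_labels, sigma2):
    """Max-log LLR of every coded bit by enumerating all symbol vectors.

    All a priori LLRs are assumed to be zero in this sketch.
    """
    n_t = H.shape[1]
    bits_per_sym = bit_labels.shape[1]
    # best[p, b]: smallest cost among vectors whose p-th bit equals b
    best = np.full((n_t * bits_per_sym, 2), np.inf)
    for idx in itertools.product(range(len(symbols)), repeat=n_t):
        x = symbols[list(idx)]
        cost = np.sum(np.abs(y - H @ x) ** 2) / (2 * sigma2)
        bits = bit_labels[list(idx)].ravel()
        for pos, b in enumerate(bits):
            best[pos, b] = min(best[pos, b], cost)
    # max-log LLR: log P(bit=1 | y) - log P(bit=0 | y)
    return best[:, 0] - best[:, 1]

qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
labels = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # assumed Gray labels
H = np.eye(2, dtype=complex)
sent = [1, 2]                                          # symbol index per antenna
y = H @ qpsk[sent]                                     # noiseless toy observation
llr = maxlog_llr_exhaustive(y, H, qpsk, labels, sigma2=0.5)
```

The outer loop visits `len(symbols) ** n_t` vectors, so the work grows exponentially with the number of transmit antennas, which is the motivation for restricting the search to a candidate list.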
Although its BER is considerably low, its computational complexity is unacceptable in practice. To reduce the computational complexity, an alternative approach is adopted. By employing the complex sphere decoder technique, the LCSPD searches vectors composed of the constellation points confined to a sphere of radius $C$ centered at the received signal. This approach aims to implement suboptimal detection, with a complexity that depends on the size of the candidate vector set searched per information bit. However, it remains a rather complicated NP-complete problem when the candidate vector set is large, since the search must repeatedly restart from the root node. This motivates the present paper: to devise suboptimal SISO sphere decoders that, based on the LLR function, have both lower computational complexity and faster convergence.

III. NOVEL ITERATIVE SISO DETECTION AND DECODER ALGORITHM
The proposed scheme for Turbo-MIMO systems, named the iterative SISO improved complex sphere detection and decoder algorithm, is depicted on the right-hand side of Figure 1. It is composed of an iterative SISO ICSPD detector (inner decoder) and a SISO Log MAP Turbo decoder (outer decoder), which mitigate the multi-stream interference and obtain the desired information bits by exchanging information between the inner and outer decoders.

A. Iterative SISO ICSPD detector (inner decoder)
The iterative SISO ICSPD, based on a progressively finer partition of a candidate vector/point set, gives rise to an acyclic graph known as the branch-and-bound tree, the nodes of which constitute a collection of subsets of the candidate point set. Its crucial idea is to save computation by discarding the nodes/subsets of the tree that have no chance of containing an optimal candidate point. The outline of the SISO ICSPD is shown in Figure 2. When it is applied to detect signals from multiple transmit antennas, its novelties can be stated as follows.

1) Forming candidate points set:
Firstly, the sufficient statistic is obtained to select the most likely information bit carried by the transmitted signals. Following [24], it takes the form of $\mathbf{F}\mathbf{b}$ plus a noise term $\mathbf{v}$, where $\mathbf{v}$ is complex white Gaussian noise with zero mean.
Secondly, all candidates (that is, the finite set of all feasible solutions) $b_{k,\mathrm{cand}}$ for $b_k$, defined as the collection of constellation points lying on the same concentric ring inside a sphere of initial radius $C_0$ centered at the received signal, are determined by

2) Branch-and-bound exploiting an optimal feasible candidate set:
The SISO ICSPD initializes a node list of candidates for $b_k$, the symbol transmitted from the $k$th antenna, and a scalar $C_0$, which is equal to the minimal accumulated cost over the feasible vectors found so far. With the set $b_{k,\mathrm{cand}}$ as its root node, the algorithm calculates the accumulated cost, in which $\mathbf{e}$ denotes a vector of matching length whose entries are all 1. If the accumulated cost of a candidate vector $\hat{\mathbf{b}}$ for $\mathbf{b}$ is smaller than $C_0$, the algorithm has found a new feasible candidate vector for $\mathbf{b}$, which is placed in the set Θ as long as Θ does not yet hold $N_{\mathrm{cand}}$ feasible candidate vectors. Otherwise, the new vector is compared with the vector in Θ that has the largest accumulated cost, and the latter is replaced if the former has the smaller accumulated cost. Moreover, the best candidate vectors in Θ found so far are optimal.
However, it should be noted that if the number of elements in Θ, denoted by |Θ|, is smaller than $N_{\mathrm{cand}}$ for the current radius, the algorithm enlarges the current radius by a factor of 1.2 and restarts the search.
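The bookkeeping for the set Θ and the radius enlargement can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the class `CandidateList`, the function `search_with_growing_radius`, the `search` callable and the toy point list are all hypothetical, and the cost is compared directly with the radius for simplicity.

```python
import heapq

class CandidateList:
    """Keep at most n_cand candidate vectors with the smallest costs."""

    def __init__(self, n_cand):
        self.n_cand = n_cand
        self._heap = []    # max-heap on cost, stored as negated costs
        self._count = 0    # tie-breaker so vectors are never compared

    def offer(self, cost, vector):
        """Insert a candidate; return True if it was kept."""
        entry = (-cost, self._count, tuple(vector))
        self._count += 1
        if len(self._heap) < self.n_cand:
            heapq.heappush(self._heap, entry)
            return True
        if cost < -self._heap[0][0]:
            heapq.heapreplace(self._heap, entry)   # drop the largest-cost member
            return True
        return False

    def is_full(self):
        return len(self._heap) >= self.n_cand

    def items(self):
        """Return (cost, vector) pairs ordered by increasing cost."""
        return sorted(((-c, v) for c, _, v in self._heap), key=lambda cv: cv[0])

def search_with_growing_radius(search, n_cand, radius, growth=1.2, max_restarts=50):
    """Restart the search with a larger radius until the list is full.

    `search(radius)` is a hypothetical callable yielding (cost, vector)
    pairs found inside the given radius.
    """
    for _ in range(max_restarts):
        theta = CandidateList(n_cand)
        for cost, vec in search(radius):
            theta.offer(cost, vec)
        if theta.is_full():
            break
        radius *= growth
    return theta, radius

# Toy usage: four made-up points with fixed costs.
points = [(0.5, (0, 1)), (1.0, (1, 1)), (2.0, (1, 0)), (3.0, (0, 0))]

def toy_search(radius):
    return [(c, v) for c, v in points if c <= radius]

theta, radius = search_with_growing_radius(toy_search, n_cand=3, radius=1.0)
```

A max-heap keyed on cost gives constant-time access to the largest-cost member, which is the only element the replacement rule ever needs to inspect.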
3) Educing the global optimal extrinsic information for outer decoder:
Hereafter, based on the prior information, the global optimal extrinsic information for a Turbo coded bit will be maximized by

B. SISO Log MAP Turbo decoder (outer decoder)
The extrinsic information obtained from the inner decoder is de-interleaved and fed into the outer decoder, which is made up of a pair of component decoders separated by an interleaver and a de-interleaver, referred to as $\alpha$ and $\alpha^{-1}$, respectively. The configuration of the outer decoder, characterized by a feedback structure in which the APP LLRs of the extrinsic information are exchanged iteratively among the components, is depicted in Figure 3.
http://www.ijSciences.com Volume 8 - January 2019 (01)

1) Generating extrinsic information for Decoder 2:
Once the soft bit stream coming from the inner decoder is de-multiplexed into the soft systematic bit stream and the soft parity bit streams, the APP LLR of the extrinsic information for the systematic bit $s$ is given by Eq. (13), where $[m, l_s] = L-1, L-2, \ldots, 0, 1$, and the remaining term is the APP LLR associated with the $[m, l_s]$th information bit in the $q$th iteration, which can be defined in a manner similar to Eq. (10)-Eq. (14). The iterative process is usually terminated after a predetermined number of iterations, when the soft-output value stabilizes and changes little between successive iterations. Furthermore, Eq. (9)-Eq. (15) can be generalized in a relatively straightforward way to the soft parity bit streams. Then, we multiplex the soft systematic bit stream and the soft parity bit streams in the same way as in Turbo encoding to obtain the extrinsic information of the encoded bits, which is passed to the next iteration as the a priori LLR of the iterative SISO ICSPD detector.
As the number of iterations increases, the contribution of the extrinsic information from the inner and outer decoders to the performance of the receiver diminishes. Eventually the benefit vanishes; that is, the receiver converges.

IV. SIMULATION RESULTS AND DISCUSSION
In this section, a simulation study is carried out to show the performance of the proposed scheme. We use two receiver schemes, namely the ISISOICSPDD and the iterative SISO list complex sphere detection and decoder (ISISOLCSPDD). The simulation results are described in three aspects, i.e., the performance of the ISISOICSPDD over a number of iterations in the detector/decoder loop (outer iteration), the transfer curve of the proposed scheme, and the bit error rate (BER), followed by a discussion of the diversity order of the proposed scheme.
Consider a Turbo-MIMO system with $N_t$ transmit antennas and $N_r$ receive antennas operating over a Rayleigh fading channel. The transmitters employ quadrature phase-shift keying (QPSK), 16-ary quadrature amplitude modulation (16-QAM) and 64-QAM with Gray mapping. A rate $R=1/2$ parallel concatenated outer channel Turbo code [35] of memory 2 with (recursive) feedback polynomial $G_r = 1 + D + D^2$ and feedforward polynomial $G = 1 + D^2$ is used.
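The component code with feedback polynomial $1 + D + D^2$ and feedforward polynomial $1 + D^2$ can be sketched as a two-register recursive systematic encoder. The following is a minimal illustration under assumptions: the alternating puncturing pattern used to reach rate 1/2 and the absence of trellis termination are not stated in the text, and the names `rsc_encode` and `turbo_encode` are hypothetical.

```python
def rsc_encode(bits):
    """Parity sequence of the RSC code (1 + D^2) / (1 + D + D^2)."""
    s1 = s2 = 0                    # shift register contents a[k-1], a[k-2]
    parity = []
    for u in bits:
        a = u ^ s1 ^ s2            # feedback polynomial 1 + D + D^2
        parity.append(a ^ s2)      # feedforward polynomial 1 + D^2
        s1, s2 = a, s1
    return parity

def turbo_encode(bits, perm):
    """Rate-1/2 parallel concatenation (assumed alternating puncturing)."""
    p1 = rsc_encode(bits)
    p2 = rsc_encode([bits[i] for i in perm])   # second encoder sees permuted bits
    out = []
    for k, u in enumerate(bits):
        # keep parity 1 on even positions and parity 2 on odd positions
        out += [u, p1[k] if k % 2 == 0 else p2[k]]
    return out

impulse_response = rsc_encode([1, 0, 0, 0, 0, 0, 0])
coded = turbo_encode([1, 0, 1, 1], perm=[2, 0, 3, 1])
```

Because of the feedback, a single 1 at the input produces a parity sequence that never dies out, which is the property that makes recursive component codes effective in parallel concatenation.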
In Figure 3, the pseudo-random interleaver ($\alpha$) rearranges the order of an information sequence in a one-to-one deterministic way before the second component code of the turbo coding scheme is applied. In Figure 1 and Figure 3, the pseudo-random interleaver (Π) is used not only to de-correlate the fading channel and maximize the diversity order of the system but also to eliminate the correlation in the sequence of Turbo coded bits, which is crucial for the proposed algorithm.
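A pseudo-random interleaver and its inverse can be built from a seeded permutation. This is a minimal sketch: the seed and the helper name `make_interleaver` are hypothetical, and the length 9216 is used only as an example block size.

```python
import numpy as np

def make_interleaver(length, seed):
    """Return a fixed pseudo-random permutation and its inverse."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(length)
    inv = np.empty(length, dtype=int)
    inv[perm] = np.arange(length)      # inv undoes perm
    return perm, inv

perm, inv = make_interleaver(9216, seed=7)
data = np.arange(9216)
interleaved = data[perm]               # interleave
restored = interleaved[inv]            # de-interleave
```

Fixing the seed makes the mapping deterministic, so the transmitter and the receiver can share the same offline designed pattern.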
The packet length of the information bits to be processed is 9216, which is also the interleaver size of the turbo code. The number of iterations of the Turbo decoder (inner iteration) is limited to 8. Figure 4 and Figure 5 depict the BERs of the ISISOICSPDD versus SNR over a range of iterations, with the transmitters using 16-QAM for $N_t=N_r=4$ and QPSK for $N_t=N_r=8$, respectively.

A. The performance of ISISOICSPDD over a number of iterations in the detector/decoder loop
For graphical clarity, not all simulation results are shown. An iteration count of one means that no soft information is exchanged between the iterative SISO ICSPD detector and the SISO Log MAP Turbo decoder. From Figure 4 and Figure 5, we can make the following observations.
Initially, in the relatively low input SNR interval, the BER of the proposed algorithm is almost insensitive to the number of detection and decoding iterations executed.
Gradually, as the input SNR increases, the BERs of successive iterations become distinguishable. Then, at a certain input SNR or, more accurately, an input SNR threshold, the BER curves enter the waterfall region with its characteristic sharp drop, where the proposed algorithm converges to zero BER within a finite number of iterations. Figure 5 also shows that, although more iterations lead to a lower bit error rate, the BER of the proposed algorithm does not decrease noticeably when the number of iterations is increased from 1 to 4, whereas the gap between 4 and 5 iterations widens. Hence, we fix the number of iterations to 5 in all the following simulations as a fundamental tradeoff between the BER and the computational complexity of the proposed algorithm.

B. The transfer curve of ISISOICSPDD
The histograms of the conditional probability density functions of the output extrinsic information, a detailed analysis of which is beyond the scope of this paper, can be approximated by a Gaussian density function, which is consistent with the results in [27]. Therefore, the input SNR threshold at which the proposed algorithm converges can be bounded by an asymptotic interval determined by the transfer curves in Figure 6 and Figure 7. When the input SNR is 7.6 or 7.0 dB, the transfer curves lie above the straight line, so that the ISISOICSPDD, with ever more reliable extrinsic estimates, converges to a reliable solution. Thus, the asymptotic interval for the input SNR threshold of the ISISOICSPDD with 16-QAM and $N_t=N_r=4$ is [6.0, 7.0] dB. In the same way, we can determine that the asymptotic interval for the input SNR threshold of the ISISOICSPDD with 16-QAM and $N_t=N_r=8$ is [2.6, 2.7] dB.

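Each point on a mutual information transfer curve requires measuring the mutual information between the coded bits and their LLRs. The sketch below uses the common sample-average estimator, which assumes symmetric and consistent LLRs and the convention that a positive LLR favors bit 1. The Gaussian LLR generator and the names `mutual_information` and `gaussian_llrs` are hypothetical and the data are synthetic, not the paper's simulation results.

```python
import numpy as np

def mutual_information(llr, bits):
    """Estimate I(bit; LLR) as 1 - E[log2(1 + exp(-x * L))], x = +/-1."""
    x = 2 * np.asarray(bits) - 1
    # logaddexp(0, t) = log(1 + exp(t)), evaluated without overflow
    return 1.0 - np.mean(np.logaddexp(0.0, -x * np.asarray(llr))) / np.log(2.0)

def gaussian_llrs(bits, mean, rng):
    """Synthetic consistent Gaussian LLRs: mean = +/-mean, variance = 2*mean."""
    x = 2 * np.asarray(bits) - 1
    return mean * x + np.sqrt(2.0 * mean) * rng.standard_normal(len(bits))

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, 20000)
mi_low = mutual_information(gaussian_llrs(bits, 0.5, rng), bits)
mi_high = mutual_information(gaussian_llrs(bits, 4.0, rng), bits)
```

Feeding the detector a priori LLRs with a known input mutual information and measuring the output with this estimator, for a fixed input SNR, yields one point of the transfer curve.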
C. The bit error rate
Figure 8 and Figure 9 show the BERs of the ISISOICSPDD (solid line) and the ISISOLCSPDD (dashed line) versus SNR, with $N_t=N_r=4$ and $N_t=N_r=8$, respectively. They reveal that the proposed algorithm converges to an approximately global optimum after 5 iterations and performs close to the ISISOLCSPDD. At the same time, its computational complexity is much lower than that of the ISISOLCSPDD [24][25], which makes its detection delay much shorter; this advantage is especially significant when the number of transmit antennas is larger. It is attributed to the depth-first branch-and-bound search strategy incorporated in the proposed algorithm, which resumes the next search from the path that is closest to completion rather than restarting the search from the root node.
From the definition of the diversity gain as the negative asymptotic slope of $\log P_e$ versus $\log \mathrm{SNR}$, where $P_e$ is the bit error rate [15], we obtain the diversity gain of the proposed scheme, listed in Table 1, when it is terminated at the 5th iteration. Eq. (17) holds when the signal difference matrix is full rank over all pairs of distinct signal matrices $\mathbf{X}$ and $\hat{\mathbf{X}}$. Eq. (17) demonstrates that the diversity order of the proposed algorithm is $N_t \times N_r$. These conclusions also coincide with the results provided by [34][35] for Turbo-MIMO systems.
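The diversity gain can be read off a measured BER curve as the negative slope of $\log_{10} P_e$ against $\log_{10} \mathrm{SNR}$ in the high-SNR region. The following is a minimal sketch using a least-squares line fit. The BER values are synthetic, generated with a known slope of 4 purely for illustration, and the function name `diversity_from_ber` is hypothetical.

```python
import numpy as np

def diversity_from_ber(snr_db, ber):
    """Estimate the diversity gain as -d(log10 BER) / d(log10 SNR)."""
    log_snr = np.asarray(snr_db, dtype=float) / 10.0   # log10 of the linear SNR
    slope, _ = np.polyfit(log_snr, np.log10(ber), 1)   # least-squares line fit
    return -slope

# Synthetic high-SNR points following BER = 0.5 * SNR^(-4) (not measured data).
snr_db = np.array([10.0, 12.0, 14.0, 16.0])
ber = 0.5 * (10.0 ** (snr_db / 10.0)) ** (-4.0)
d_hat = diversity_from_ber(snr_db, ber)
```

On measured curves only the points beyond the waterfall region should be used, since the definition is asymptotic in the SNR.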

V. CONCLUSION
The ISISOICSPDD is proposed for Turbo-BLAST systems. The original candidate set, treated as the root of a tree, is divided repeatedly into subsets/nodes until no further division is possible. For each subset of the tree, the proposed algorithm computes a lower bound on the optimal cost with Eq. (6) and a feasible candidate vector restricted by Eq. (7), which saves computation by truncating the nodes of the tree that cannot contain an optimal solution. In this way, the proposed scheme shortens the delay of the iterations between the inner decoder and the outer decoder to a certain degree, which in turn lowers the complexity of the proposed algorithm. Simulation results show that the performance of the proposed algorithm is close to that of the ISISOLCSPDD. In addition, a performance gain is obtained over the traditional method without iterative detection. Furthermore, based on the input-output transfer curve, an asymptotic interval for the input SNR threshold at which the ISISOICSPDD converges is derived by running a series of simulations of the proposed algorithm for a number of fixed input SNR values, followed by an upper bound on the diversity obtained from intuitive deduction and theoretical analysis. Meanwhile, the technique for obtaining the asymptotic interval of the input SNR threshold and the diversity order can be applied in a relatively straightforward way to analyze all kinds of iterative SISO detection and decoding schemes for Turbo-MIMO systems.