Mutual Information and Nonadditive Entropies: The Case of Tsallis Entropy

The mutual information of two random variables can be easily obtained from their Shannon entropies. However, when nonadditive entropies are involved, the calculation of the mutual information is more complex. Here we review the basic facts about mutual information based on the Shannon entropy. Then we analyse the case of the generalized, nonadditive Tsallis entropy.


Introduction
In many engineering and telecommunication applications, it is often desirable to increase or decrease the dependency between two random variables. This dependency is measured by the mutual information. When the Shannon entropy is used, the mutual information can be easily decomposed into a signed sum of entropies [1]. The Shannon entropy is additive: for two independent subsystems X and Y, the entropy of their union is S(X,Y)=S(X)+S(Y). When nonadditive entropies are involved, finding the mutual information is not so simple. Moreover, the term "mutual entropy" is often preferred to "mutual information" [2,3]. The mutual entropy contains conditional entropies, which must be carefully defined when entropies are nonadditive [3]. Among the nonadditive entropies is the Tsallis entropy, a generalization of the standard Boltzmann-Gibbs entropy introduced in 1988 as a basis for generalizing standard statistical mechanics [4,5]. Because its entropic index can be used as a tuning parameter, this entropy is used in several applications, in particular in image processing and image registration [6]. Here we discuss the basic issues concerning the mutual information when the Tsallis entropy is involved.

Mutual information
The mutual information of two random variables X and Y measures their mutual dependence. Intuitively, it is the amount of information that X and Y share: it measures how much knowing one of these variables reduces the uncertainty about the other [7][8][9]. The examples in [9] illustrate this quantity. If X and Y are independent, knowing X does not give any information about Y and vice versa: the mutual information is zero. If Y=f(X) or X=f(Y), where f is a deterministic function, all the information conveyed by X (or Y) is shared with Y (or X): the mutual information is then the uncertainty contained in Y (or X) alone, which is measured by the entropy of Y (or X) [9].

The physical meaning of the mutual information I(X;Y) as "the reduction of the uncertainty of X due to knowledge of Y" (or vice versa) [9] can be depicted in a Venn diagram (Figure 1). In this diagram, the single-variable entropies H(X) and H(Y) are represented by two overlapping sets, the two-variable entropy H(X,Y) is represented by the union of these sets, and the mutual information common to X and Y is represented by their intersection. Note that H(X)=I(X;X), so the entropy is the "self-information". Also note that conditioning of entropies in Venn diagrams is indicated by set subtraction, so that, for example, the set representing H(X|Y) results from subtracting the set representing H(Y) from the set representing H(X).

The mutual information is given by [10]:

$$ I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y) \tag{1} $$

with the following properties: I(X;Y)=I(Y;X) and I(X;X)=H(X). Note that H(Y|X) is the conditional entropy [11]. Let us consider the joint entropy H(X,Y) of the combined system determined by the two random variables X and Y. We need H(X,Y) "bits of information" to describe its exact state [12]. If we first learn the value of X, we have gained H(X) bits of information. "Once X is known, we only need H(X,Y)−H(X) bits to describe the state of the whole system" [12]. This quantity is exactly H(Y|X), which gives the chain rule of conditional entropy:

$$ H(Y|X) = H(X,Y) - H(X) $$

If we are using the Shannon entropy S:

$$ I(X;Y) = S(X) + S(Y) - S(X,Y) $$

If X and Y are independent, we have S(X,Y)=S(X)+S(Y), and therefore I(X;Y)=0.

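The two limiting cases discussed in this section (independent variables, and Y a deterministic function of X) can be checked numerically. The following is a minimal sketch, assuming the standard decomposition I(X;Y)=H(X)+H(Y)−H(X,Y) with entropies in bits; the function names and the two toy distributions are illustrative choices and do not come from the cited references.

```python
import math
from collections import defaultdict

def shannon(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint pmf given as {(x, y): p}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p  # marginal of X
        py[y] += p  # marginal of Y
    return shannon(px.values()) + shannon(py.values()) - shannon(joint.values())

# Illustrative case 1: two independent fair bits.
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}

# Illustrative case 2: X uniform on {0, 1, 2, 3} and Y = X mod 2.
deterministic = {(x, x % 2): 0.25 for x in range(4)}

mi_independent = mutual_information(independent)
mi_deterministic = mutual_information(deterministic)
h_y = shannon([0.5, 0.5])  # entropy of Y in case 2

print(mi_independent, mi_deterministic, h_y)
```

In the first case the mutual information vanishes; in the second it coincides with the entropy of Y, as expected when Y=f(X).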
Using Tsallis entropy
Besides the Shannon entropy, there are generalized entropies and, among them, the nonadditive entropies. How can we generalize the mutual information in this case? In [3], the so-called Tsallis mutual entropy is preferred, defined as:

$$ \mathrm{MT}(X;Y) = T(X) - T(X|Y) \tag{2} $$

In (2), T refers to the Tsallis entropy. According to [4], T(X,Y)=T(X)+T(Y|X) and T(Y,X)=T(Y)+T(X|Y). Let us recall that the Tsallis entropy T and the Rényi entropy R [13] are linked by the following equation:

$$ R = \frac{\ln\left[1 + (1-q)\,T\right]}{1-q} \tag{3} $$

Here q is the entropic index. As q approaches 1, the Tsallis entropy becomes the Shannon entropy. Let us try defining the Tsallis mutual entropy as:

$$ \mathrm{MT}(X;Y) = T(X) + T(Y) - T(X,Y) \tag{4} $$

If X and Y are independent, the mutual information must be equal to zero. However, from (4), we find:

$$ \mathrm{MT}(X;Y) = -(1-q)\,T(X)\,T(Y) \tag{5} $$

This happens because the generalized additivity for independent subsystems is:

$$ T(X,Y) = T(X) + T(Y) + (1-q)\,T(X)\,T(Y) \tag{6} $$
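The relations used in this section can be verified on a small example. The sketch below assumes the usual form of the Tsallis entropy, T = (1 − Σ p^q)/(q − 1), the Rényi entropy R = ln(Σ p^q)/(1 − q), the link R = ln[1 + (1 − q)T]/(1 − q), and the generalized additivity T(X,Y) = T(X) + T(Y) + (1 − q)T(X)T(Y) for independent variables. The distributions and the value q = 2 are arbitrary illustrative choices.

```python
import math

def tsallis(probs, q):
    """Tsallis entropy; falls back to the Shannon entropy (in nats) at q = 1."""
    probs = [p for p in probs if p > 0]
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)

def renyi(probs, q):
    """Renyi entropy in nats, for q != 1."""
    return math.log(sum(p ** q for p in probs if p > 0)) / (1.0 - q)

# Two independent variables: the joint pmf is the product of the marginals.
px = [0.7, 0.3]
py = [0.2, 0.5, 0.3]
joint = [a * b for a in px for b in py]
q = 2.0

t_x, t_y, t_xy = tsallis(px, q), tsallis(py, q), tsallis(joint, q)

# Generalized additivity for independent subsystems.
pseudo_additive = t_x + t_y + (1.0 - q) * t_x * t_y

# Renyi entropy recovered from the Tsallis entropy.
renyi_from_tsallis = math.log(1.0 + (1.0 - q) * t_x) / (1.0 - q)

# Tsallis entropy close to q = 1 compared with the Shannon entropy.
near_shannon = tsallis(px, 1.0 + 1e-6)
shannon_nats = tsallis(px, 1.0)

print(t_xy, pseudo_additive)
print(renyi(px, q), renyi_from_tsallis)
print(near_shannon, shannon_nats)
```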
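As a result, for an entropic index different from 1, (4) gives a result different from zero for independent variables. Therefore, (4) is not suitable for representing the mutual information I(X;Y). In his paper, Tsallis also discusses the problem of correlated systems [4]. He used the Rényi entropy for correlated systems:

$$ \Gamma = R(X) + R(Y) - R(X,Y) \tag{7} $$

Since the Rényi entropy is additive, if X and Y are independent, Γ is equal to zero. Let us note that it is the function Γ that seems to work as the mutual information. However, for the nonadditive Tsallis conditional entropy, a quite simple formula was given in [14]:

$$ T(Y|X) = \frac{T(X,Y) - T(X)}{1 + (1-q)\,T(X)} \tag{8} $$

We could define the mutual entropy, as for the Shannon entropy, in the following manner:

$$ \mathrm{MT}(X;Y) = T(X) - T(X|Y) = T(X) - \frac{T(X,Y) - T(Y)}{1 + (1-q)\,T(Y)} \tag{9a} $$

$$ \mathrm{MT}(Y;X) = T(Y) - T(Y|X) = T(Y) - \frac{T(X,Y) - T(X)}{1 + (1-q)\,T(X)} \tag{9b} $$

For independent variables X and Y, using (9a) for instance, together with the generalized additivity T(X,Y) = T(X) + T(Y) + (1-q)T(X)T(Y):

$$ \mathrm{MT}(X;Y) = T(X) - \frac{T(X) + (1-q)\,T(X)\,T(Y)}{1 + (1-q)\,T(Y)} = T(X) - T(X) = 0 \tag{10} $$

The same holds for (9b).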
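The behaviour for independent variables can be compared numerically for the three candidates discussed above. This is only a sketch under stated assumptions: the conditional entropy is taken in the form T(Y|X) = [T(X,Y) − T(X)]/[1 + (1 − q)T(X)], the correlation function in the form Γ = R(X) + R(Y) − R(X,Y), and the helper names, the distributions and q = 1.5 are hypothetical choices made for illustration.

```python
import math

def tsallis(probs, q):
    """Tsallis entropy T_q = (1 - sum p^q) / (q - 1), for q != 1."""
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)

def renyi(probs, q):
    """Renyi entropy in nats, for q != 1."""
    return math.log(sum(p ** q for p in probs if p > 0)) / (1.0 - q)

def tsallis_conditional(t_joint, t_given, q):
    """Assumed conditional form: (T(joint) - T(given)) / (1 + (1 - q) T(given))."""
    return (t_joint - t_given) / (1.0 + (1.0 - q) * t_given)

# Independent variables: the joint pmf is the product of the marginals.
px = [0.6, 0.4]
py = [0.1, 0.3, 0.6]
joint = [a * b for a in px for b in py]
q = 1.5

t_x, t_y, t_xy = tsallis(px, q), tsallis(py, q), tsallis(joint, q)

# Plain signed sum of Tsallis entropies: does not vanish for q != 1.
naive = t_x + t_y - t_xy

# Signed sum of Renyi entropies: vanishes because the Renyi entropy is additive.
gamma = renyi(px, q) + renyi(py, q) - renyi(joint, q)

# Definitions based on the conditional entropy: both vanish.
mt_a = t_x - tsallis_conditional(t_xy, t_y, q)  # T(X) - T(X|Y)
mt_b = t_y - tsallis_conditional(t_xy, t_x, q)  # T(Y) - T(Y|X)

print(naive, gamma, mt_a, mt_b)
```

Only the plain signed sum of Tsallis entropies fails the independence test; it equals (q − 1)T(X)T(Y).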

Figure 2: Venn diagram when Y is completely dependent on X.

The problem of symmetry
Let us note that I(X;Y) must be symmetric, that is, I(X;Y)=I(Y;X), "otherwise it would not be mutual information" [15]. Do (9a) and (9b) give the same mutual information? Let us consider the case when Y is completely dependent on X, as depicted in Figure 2. In this case H(X,Y)=H(X), so we have T(X,Y)=T(X), and the mutual information is T(Y). In the following discussion, X is larger than Y, so that T(X) is larger than T(Y). Let us consider (9a) and (9b), writing the conditional entropies as T(X|Y) = [T(X,Y) − T(Y)]/[1 + (1 − q)T(Y)] and T(Y|X) = [T(X,Y) − T(X)]/[1 + (1 − q)T(X)]:

$$ T(X) - T(X|Y) = T(X) - \frac{T(X) - T(Y)}{1 + (1-q)\,T(Y)} = T(Y)\,\frac{1 + (1-q)\,T(X)}{1 + (1-q)\,T(Y)} $$

$$ T(Y) - T(Y|X) = T(Y) - \frac{T(X) - T(X)}{1 + (1-q)\,T(X)} = T(Y) $$

Therefore, (9a) and (9b) do not fulfil the required symmetry. In fact, Figure 2 makes it clear that the situation is not symmetric.

To solve the case of Figure 2, we could modify these mutual Tsallis entropies so that:

We can see that:

So we have a mutual information which is symmetric. If X and Y are independent, MT(X;Y)=0. For completely dependent variables with T(X,Y)=T(X), we have MT(X;Y)=T(Y). In the case where X coincides with Y, we have T(X)=T(Y) and MT(X;Y)=T(X).

However, we can also have the case when X is totally inside Y (Figure 3). Let us then write the joint entropy as T(Y,X), to mark this situation. The mutual information must be equal to T(X). We obtain this with a symmetric entropy when we define the mutual entropy as:

Then, when T(Y,X)=T(Y), from (15) we obtain MT(X;Y)=T(X). In the same manner as proposed in [4], we can define the correlation term in the mutual Tsallis entropy:

This can be an expression of the mutual information for the Tsallis entropy which properly meets the fundamental requirements of computation. We have discussed this approach because it can be easily applied to another nonadditive entropy, the Kaniadakis entropy, to determine its mutual information [16]. This problem will be addressed in a following paper.
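The lack of symmetry of the two conditional-entropy definitions discussed in this section, T(X) − T(X|Y) and T(Y) − T(Y|X), can be illustrated with a small numerical sketch. It assumes the conditional entropy in the form T(Y|X) = [T(X,Y) − T(X)]/[1 + (1 − q)T(X)]; the toy distribution (X uniform on four values, Y = X mod 2), the helper names and q = 2 are hypothetical choices made only for illustration.

```python
import math
from collections import defaultdict

def tsallis(probs, q):
    """Tsallis entropy T_q = (1 - sum p^q) / (q - 1), for q != 1."""
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)

def conditional(t_joint, t_given, q):
    """Assumed conditional form: (T(joint) - T(given)) / (1 + (1 - q) T(given))."""
    return (t_joint - t_given) / (1.0 + (1.0 - q) * t_given)

# Y is completely dependent on X: X uniform on {0, 1, 2, 3}, Y = X mod 2.
joint = {(x, x % 2): 0.25 for x in range(4)}
px, py = defaultdict(float), defaultdict(float)
for (x, y), p in joint.items():
    px[x] += p  # marginal of X
    py[y] += p  # marginal of Y

q = 2.0
t_x = tsallis(px.values(), q)
t_y = tsallis(py.values(), q)
t_xy = tsallis(joint.values(), q)  # equals T(X), since Y adds no uncertainty

mt_a = t_x - conditional(t_xy, t_y, q)  # T(X) - T(X|Y)
mt_b = t_y - conditional(t_xy, t_x, q)  # T(Y) - T(Y|X)

# Closed form of the first expression when T(X,Y) = T(X).
closed_form = t_y * (1.0 + (1.0 - q) * t_x) / (1.0 + (1.0 - q) * t_y)

print(t_x, t_y, t_xy)
print(mt_a, mt_b, closed_form)
```

The second expression returns T(Y), while the first differs from it by the factor [1 + (1 − q)T(X)]/[1 + (1 − q)T(Y)], which is equal to 1 only when q = 1 or T(X) = T(Y).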