Using Clustering and Controlled Regression to Analyze the Orientation of Rod-Like Objects

This work proposes the use of the R language for the analysis of the orientation of rod-like objects. First, the objects are separated from the background by a threshold-based segmentation. DBSCAN, a density-based clustering algorithm, is then applied to the thresholded image. After the clustering, a linear regression is used to determine the orientation of each rod-like object. The distribution of the orientations is displayed by means of a radar diagram.


Introduction
A part of image processing is devoted to the analysis of oriented textures. The analysis consists, as stated in [1], of the extraction of an orientation field. In [1], for instance, an algorithm was given to represent a flow-like texture, using an oriented filter (the gradient of Gaussian) and performing manipulations on the resulting gradient vector field. The sparse visualization proposed in [1] can be replaced by a visualization of dense vector fields, using algorithms such as the Thick Oriented Stream-Line (TOSL) algorithm [2][3][4], or the Line Integral Convolution (LIC) and fastLIC algorithms [5][6][7].
Besides the use of vector fields, the orientation of objects in digital images can be determined using the tensors of inertia that reflect the mean shape of the grains [8][9][10][11]. An example of the use of the tensor of inertia to determine the orientation of rod-like objects is given in [12]. In the given reference, the orientation is determined for each object detected after a segmentation of the image [13], based on the thresholding of the grey tones of the pixels. After measuring the directions of the axes of the objects in the image frame, the distribution of the orientation can be given.
Here, our aim is to propose a method for determining the distribution of the orientation, based on a linear regression implemented in the R language. R is a programming language and free software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing. It is widely used among statisticians. In this framework, we consider sets of rod-like objects displayed in an image frame. The method can be applied to images like the one shown in Fig. 1, where a certain number of linear or quasi-linear shapes represent the content to be extracted and analyzed. The input is the grey-scale version of the image itself, while the output of the process is a radar chart showing the counts of objects for each orientation $\theta \in (-\pi/2, \pi/2)$ taken by at least one of them.

Method
A greyscale image of width $w$ and height $h$ can be represented by a set of triples $\hat{I} = \{(x, y, g) : x \in \mathbb{N}_w \wedge y \in \mathbb{N}_h \wedge g \in [0, 255]\}$. As a first step to detect objects, under the hypothesis that a set of rod-like ones exists, we extract the set of points of the image satisfying a threshold constraint $c$, with $c := g < t$ or $c := g > t$ for a given threshold $t$: $\hat{P} = \{(x, y) : (x, y, g) \in \hat{I} \wedge c\} \subseteq \mathbb{N}_w \times \mathbb{N}_h$. This set is suitable for a clustering partition; we use the DBSCAN algorithm [14], which involves the two parameters $\varepsilon$ and $MinPts$, so as to generate a set of subsets $\hat{P}_k$ of $\hat{P}$ connected by point density. Each $\hat{P}_k$ is a rod-like graphical object. For each of them, we can use a linear model, choosing between two options to compute the fitted values: $\hat{y} = \beta_1 x + \beta_0$ and $\hat{x} = \bar{\beta}_1 y + \bar{\beta}_0$.
We use the first one when the rod-like object is roughly "more horizontal than vertical" and the second one in the other cases, implementing a controlled version of linear regression. This allows us to compute the orientation $\hat{\theta}_k$ of the k-th rod. A basic grouping of the $\hat{\theta}_k$ values into the classes $\Theta_j$ yields the couples $(\theta_j, n_j)$, where $n_j = \#\{\hat{\theta}_k : \hat{\theta}_k \in \Theta_j\}$ and $\theta_j$ is the representative (mean) value of the class. Such couples can be depicted graphically by means of a radar chart, in which only the right side is used. We implemented the whole algorithm in R, using the packages jpeg, reshape2, fpc and fmsb. An example of processing follows, with an outline of each phase.
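Before moving to the example, the orientation step can be written out explicitly. The expression below is a sketch rather than the authors' own formula. It assumes the two fitted lines are written as $\hat{y} = \beta_1 x + \beta_0$ and $\hat{x} = \bar{\beta}_1 y + \bar{\beta}_0$, and that the orientation is measured from the horizontal axis; the assignment of a perfectly vertical rod to $+\pi/2$ is also an assumption.

```latex
\[
\hat{\theta}_k =
\begin{cases}
\arctan \beta_1, & \text{if } \hat{y} = \beta_1 x + \beta_0 \text{ is fitted},\\[4pt]
\arctan\left(1/\bar{\beta}_1\right), & \text{if } \hat{x} = \bar{\beta}_1 y + \bar{\beta}_0 \text{ is fitted and } \bar{\beta}_1 \neq 0,\\[4pt]
\pi/2, & \text{if } \hat{x} = \bar{\beta}_1 y + \bar{\beta}_0 \text{ is fitted and } \bar{\beta}_1 = 0.
\end{cases}
\]
```

Switching models in this way keeps nearly vertical rods well conditioned: their slope diverges when y is regressed on x, whereas it stays close to zero when x is regressed on y.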

Reading the image
First, we need to read the image into an appropriate R data structure. The plots shown here are given only to illustrate the steps of the procedure. Let us consider, for instance, the image given in Figure 1, in which the objects in the frame are rod-like.
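The reading step can be sketched as follows, using readJPEG() and writeJPEG() from the jpeg package named above. This is a minimal illustration, not the authors' code: the helper name read_grey, the plain averaging of colour channels and the synthetic test image are all assumptions made for the example.

```r
library(jpeg)  # readJPEG(), writeJPEG()

# Hypothetical helper: read a JPEG file and return a matrix of grey levels
# in the range 0..255 (rows = image rows, columns = image columns).
read_grey <- function(path) {
  img <- readJPEG(path)                       # numeric array, values in [0, 1]
  if (length(dim(img)) == 3) {
    # Colour image: plain average of the first three channels
    # (an assumed weighting, chosen for simplicity).
    img <- rowMeans(img[, , 1:3], dims = 2)
  }
  round(img * 255)
}

# Self-contained demo: write a small synthetic image, then read it back.
demo <- matrix(1, nrow = 40, ncol = 60)       # white background
demo[20:21, 10:50] <- 0                       # one dark horizontal rod
path <- tempfile(fileext = ".jpg")
writeJPEG(demo, target = path, quality = 1)

grey <- read_grey(path)
dim(grey)                                     # rows and columns of the image
```

With a real image, the call would simply be read_grey("rods.jpg"), where the file name is a placeholder.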

Getting ready for segmentation
The image is converted (molten) to a data frame. A threshold is used to select the points to analyze. Rows are assigned to x and columns to y; y needs to be reversed so that the plot looks exactly like the original image. A boolean parameter selects whether black or white points are considered for processing. The result is shown in Figure 2.

Using DBSCAN to get the superpixels
DBSCAN is a density-based clustering algorithm that can be used to segment the given image. Two parameters must be set: the maximum reachability distance and the minimum number of surrounding points for reachability. No optimization of these parameters has been implemented so far. The result of the segmentation of Fig. 2 is given in Fig. 3, where the superpixels are represented with different colors. Of course, it is possible to choose the colors according to specific features of the superpixels, for instance their length.

Linearizing the superpixels
A regular linear model algorithm, with optimization of the relationship (y ~ x or x ~ y), is used to perform the linearization. The graphical part of this step is not strictly necessary, but it helps to visualize the slopes. The slope values and their counts are grouped into a histogram, which is not displayed. The linearized superpixels are shown in Figure 4.
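The three steps above (thresholding, clustering and controlled regression) can be sketched together on a small synthetic image. This is an illustration under stated assumptions, not the authors' implementation. The matrix is transposed before melting so that rows map to x. The threshold, eps and MinPts values are hand-picked for this toy image. The rule used to pick y ~ x or x ~ y (comparing the x and y extents of a cluster) is one possible reading of "more horizontal than vertical". The names img, threshold, dark and rod_angle are hypothetical.

```r
library(reshape2)  # melt()
library(fpc)       # dbscan()

# Synthetic grey-scale image standing in for the real one: white background
# (255) with three dark rods (0), each two pixels thick.
img <- matrix(255, nrow = 60, ncol = 80)
img[15:16, 10:40] <- 0                          # horizontal rod
img[20:50, 60:61] <- 0                          # vertical rod
for (i in 0:25) img[35 + i, (10:11) + i] <- 0   # diagonal rod, descending to the right

threshold <- 128   # assumed grey-level threshold
dark <- TRUE       # TRUE: keep points darker than the threshold

# Melt the transposed matrix so that x runs along the image width, then flip
# y so that a scatter plot of the points looks like the image.
pts <- melt(t(img), varnames = c("x", "y"), value.name = "grey")
pts$y <- nrow(img) + 1 - pts$y
keep <- if (dark) pts$grey < threshold else pts$grey > threshold
pts <- pts[keep, c("x", "y")]

# Density-based clustering of the selected points.
cl <- fpc::dbscan(as.matrix(pts), eps = 1.5, MinPts = 3)
pts$cluster <- cl$cluster                       # 0 marks noise points

# Controlled regression: regress y on x for "more horizontal" clusters and
# x on y otherwise, then convert the fitted slope to an angle in radians.
rod_angle <- function(d) {
  if (diff(range(d$x)) >= diff(range(d$y))) {
    atan(unname(coef(lm(y ~ x, data = d))[2]))
  } else {
    # A slope of (numerically) zero corresponds to a vertical rod, +/- pi/2.
    atan(1 / unname(coef(lm(x ~ y, data = d))[2]))
  }
}

rods <- split(pts, pts$cluster)
rods <- rods[names(rods) != "0"]                # drop noise, if any
theta <- vapply(rods, rod_angle, numeric(1))
round(theta * 180 / pi, 1)                      # orientation of each rod in degrees
```

On real images, eps and MinPts have to be tuned by hand, in line with the remark that no optimization of these parameters has been implemented.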

Plotting the results
The counts from the histogram are converted to a radar (spider) chart, in order to associate them with the corresponding orientation (from $-\pi/2$ to $\pi/2$). The chart is given in Figure 5.
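A possible way to build such a chart with radarchart() from the fmsb package is sketched below. The orientation values are invented for illustration, and the 15-degree class width is an assumption. The layout trick is also an assumption rather than the authors' documented approach: it pads the left half of the chart with empty axes so that only the right side carries data, relying on fmsb placing the first axis at the top and proceeding counterclockwise.

```r
library(fmsb)  # radarchart()

# Hypothetical orientations in degrees, one per rod (illustrative values only).
theta_deg <- c(-80, -44.9, -40, 0, 3, 10, 47, 52, 88, 90)

step <- 15                                  # assumed class width in degrees
centres <- seq(-90, 90, by = step)          # 13 class centres for the right half
cls <- round(theta_deg / step) * step       # nearest class centre for each rod
counts <- as.integer(table(factor(cls, levels = centres)))

# Axis order: +90 at the top, empty axes across the left half, then -90 at
# the bottom and the remaining classes up the right side.
n_pad <- length(centres) - 2
values <- c(counts[length(counts)], rep(0, n_pad), counts[-length(counts)])
labels <- c("90", rep("", n_pad), as.character(centres[-length(centres)]))

# fmsb expects the maximum in row 1, the minimum in row 2, the data in row 3.
chart_df <- as.data.frame(rbind(rep(max(values), length(values)),
                                rep(0, length(values)),
                                values))
radarchart(chart_df, vlabels = labels, axistype = 0, seg = 4,
           pcol = "black", pfcol = "grey80")
```

Rounding to the nearest class centre keeps -90 and +90 as separate end classes; merging them, since they describe the same vertical orientation, would be an equally defensible choice.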

Discussion
The method proposed above, based on the use of the R language and related libraries, can have several applications. For instance, as in the case of the image shown in Figure 1, we could analyze the distribution of rod-like bacteria or cells. Like any automated image analysis software, such as that discussed in [15], the method can perform an automated enumeration of bacterial cells and provide quantitative estimates of bacterial length. However, the use of the method is not limited to the field of biological imaging. We can apply it in all cases where a rose of directions can give information on certain features in the image. For instance, we can determine the distribution of fibres, from the microscopic to the macroscopic scale, without using Fourier analysis [16,17], or we can study the orientation of cracks, from the microcracks of craquelure to the huge crevasses of glaciers [18], and so on. Future work will be devoted to illustrating the use of the method in the case of satellite imagery.