Comparative study of linear mixed-effects and artificial neural network models for longitudinal unbalanced growth data of Madras Red sheep

Veterinary World, EISSN: 2231-0916. Available at www.veterinaryworld.org/Vol.7/Feb-2014/2.pdf


Abstract
The present study was conducted to compare the predictive ability of artificial neural network (ANN) models developed using multilayer perceptron (MLP) and radial basis function (RBF) architectures with that of a linear mixed-effects model for the longitudinal growth data of Madras Red sheep.
Repeated monthly body weight measurements from birth to 24 months of age of 1424 sheep were used for the analysis. The linear mixed-effects model was developed by progressively fitting unconditional linear growth, unconditional quadratic growth and conditional quadratic growth models, followed by conditional quadratic growth models accommodating different error variance-covariance structures. The time-invariant covariates gender of lamb, season of birth and dam's weight at lambing were also used for the analysis. The best model was identified using the Akaike Information Criterion. Subsequently, ANN models using MLP and RBF architectures were developed for the same data, and the predictive ability of the two modeling procedures was compared using different evaluation criteria.
The conditional quadratic model with heterogeneous autoregressive of order 1 (AR(1)) covariance structure, fitted using the mixed model approach, was found to be good, with the covariates gender of lamb and dam's weight at lambing showing marked influence on all the growth parameters. Season of birth was found to be significant only for growth rate and not for the average birth weight. Between the two ANN architectures, MLP performed better than RBF, and the ANN model based on the MLP architecture was also better than the best linear mixed model identified in this study.
In this study, the potential of ANN as an alternative modeling technique was evaluated for the purpose of predicting unbalanced longitudinal growth data of Madras Red sheep. As the ANN model with MLP architecture showed better predictive ability, ANN models can be considered as an alternative tool by animal breeders to model longitudinal animal growth data.
Keywords: artificial neural network, linear mixed-effects models, longitudinal data, sheep growth curves.

Introduction
Sheep are efficient converters of unutilized poor quality grass and crop residues into meat and skin. Growth is a trait of economic importance in sheep, as sheep rearing is an important livelihood for a large number of small and marginal farmers in India. Information about growth model parameters is very useful for selection studies. Growth in farm animals has been investigated for many years [1,2]. Madras Red sheep, a native breed of Tamilnadu state of India distributed in the northern parts of the state, is known for its valuable meat and skin quality [3]. The growth performance of Madras Red sheep in farmers' flocks was studied previously [4,5].
In animal growth studies, body weight measurements, which are good indicators of growth rate, are often taken on the same animal at various ages (time points), such as weekly or monthly, resulting in longitudinal growth data. Such data, collected on a group of animals, have many advantages as they can provide vital information about individual changes. That is, by collecting data over many time points, one can separate changes over time within individual animals from differences between animals, yielding valuable information on the animals. The main goal here is to characterize the way the outcome changes over time and to identify the predictors of that change. Recent developments in longitudinal data analysis have been discussed in detail in previous studies [6,7].
Longitudinal growth data pose challenges for statistical analysis because the responses are correlated (not independent), and responses that are closer in time are more correlated than responses that are farther apart. In addition, the variance of repeated measures often changes, typically increasing steadily with time (heteroscedasticity). These correlation and variation patterns combine to produce a complicated covariance structure of the repeated measurements, which needs to be modeled suitably for drawing correct inferences. Under these circumstances, the standard ANOVA, MANOVA and regression models are unsuitable for the analysis of longitudinal growth data, as the usual statistical assumptions required are not met.
The random coefficient model [8] using the linear mixed model approach, which allows the growth parameters of each animal to be treated as random effects in the model, is an attractive way to describe longitudinal data. Individual animal-level time-invariant covariates can be added to the model, and the error covariance structure of the repeated measurements can also be specified in the model to determine their effects on the response.
The artificial neural network (ANN), which has been shown to be a powerful modeling tool in a wide range of applications, is a type of computational model that uses a set of interconnected processing elements, known as neurons or nodes, and behaves like a biological brain with millions of neurons working in parallel, each trying to solve a tiny bit of a complex problem. ANN can easily recognize patterns in data. The major advantages of ANN are that it can be fitted to any kind of data set, it does not require any model assumptions, and there is no need to specify a particular model form, since the model is adaptively formed based on the features presented by the data [9]. Detailed information on ANN can be found in earlier publications [10,11].
There is currently a large body of literature in which the ANN technique has been used successfully and has in many cases proved superior to various statistical methods in predicting cross-sectional or time series data. However, there is a shortage of information regarding the use of ANN in analyzing longitudinal or repeated measures data. A neural network designed for longitudinal data on the progression of Alzheimer's disease, called the mixed effects neural network (MENN), was proposed by Tandon et al. [12]. The capability of ANN to model simulated repeated measures data was studied recently [13].
Because there is hardly any work evaluating ANN models for analyzing longitudinal animal growth data, this work was carried out to compare the performance of linear mixed model (LMM) and ANN models for predicting longitudinal growth data of Madras Red sheep.
Data Description:
Data pertaining to monthly body weight measurements from birth to 24 months of age of 1424 sheep were used for the analysis. The number of measurements differed between animals (unbalanced data). Animals with at least 3 body weight measurements were considered for analysis. The time-invariant covariates, which do not change across time, were also used for the analysis: gender of lamb (coded 0 for female and 1 for male), season of birth (coded 0 for the main season, Oct.-Mar., and 1 for the off season, Apr.-Sep.) and dam's weight at lambing (coded 0 for <30 kg and 1 for ≥30 kg). The time variable (t), which indicates the month of measurement of body weight, took values from 0 to 24. All births in this study were single and no twinning was recorded.
When setting up data for analysis, two formats are commonly used, namely wide and long. For the mixed model analysis, data have to be in long format (person-period format), wherein each animal has one record for every time point at which a measurement was made, with the time of measurement in one column and the body weight measurement in another column. Covariates that do not change have their values repeated across the rows in additional columns.
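The long (person-period) layout described above can be illustrated with a minimal sketch in plain Python. The animal identifiers, weights and field names below are invented for illustration and are not taken from the study data; the only rules borrowed from the text are the 0/1 covariate coding and the requirement of at least 3 measurements per animal.

```python
# Hypothetical wide-format records (invented for illustration): one entry per
# animal, with a dict mapping month -> body weight in kg. None marks a month
# at which the animal was not weighed, which is what makes the data unbalanced.
wide_records = [
    {"animal": "A1", "gender": 1, "season": 0, "dam_wt": 1,
     "weights": {0: 2.6, 1: 4.9, 2: None, 3: 8.8}},
    {"animal": "A2", "gender": 0, "season": 1, "dam_wt": 0,
     "weights": {0: 2.3, 1: None, 2: 6.0, 3: None}},
    {"animal": "A3", "gender": 0, "season": 0, "dam_wt": 1,
     "weights": {0: 2.4, 1: 4.1, 2: 5.9, 3: 7.5}},
]


def wide_to_long(records, min_measurements=3):
    """Return one row per (animal, month) with covariates repeated on each row.

    Animals with fewer than `min_measurements` observed weights are dropped.
    """
    long_rows = []
    for rec in records:
        observed = [(t, w) for t, w in sorted(rec["weights"].items())
                    if w is not None]
        if len(observed) < min_measurements:
            continue  # too few measurements for this animal
        for t, w in observed:
            long_rows.append({
                "animal": rec["animal"], "t": t, "weight": w,
                "gender": rec["gender"], "season": rec["season"],
                "dam_wt": rec["dam_wt"],
            })
    return long_rows


long_rows = wide_to_long(wide_records)
for row in long_rows:
    print(row)
```

In this toy input, animal A2 has only two observed weights and is therefore excluded, while A1 and A3 contribute three and four rows respectively.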
Linear mixed model:
Since each animal grows along a different trajectory, the regression coefficients can be treated as random variables to account for possible variation between animals. This type of model is called a random coefficient model. Here the time variable ($t$) is treated as a continuous predictor and the means across time follow a particular shape [14]. Individual animals are assumed to follow the same curve shape but are allowed to vary in the parameters that describe this curve (random effects). These models also indicate the degree of between-subject variation that exists in the population of subjects.
The general form of the mixed-effects model proposed by [8] is
$$y = X\beta + Zu + \varepsilon,$$
where $y$ is the response vector, $\beta$ is the fixed effects vector, $u$ is the random effects vector with mean 0 and variance-covariance matrix $G$, $\varepsilon$ is the random error vector with mean 0 and variance-covariance matrix $R$, $X$ is the design matrix corresponding to the fixed effects relating $y$ to $\beta$, and $Z$ is the design matrix corresponding to the random effects relating $y$ to $u$. Thus $y \sim N(X\beta, V)$, $u \sim \mathrm{MVN}(0, G)$ and $\varepsilon \sim \mathrm{MVN}(0, R)$, with $u$ independent of $\varepsilon$, so that $E(y) = X\beta$ and $V = \mathrm{Var}(y) = ZGZ' + R$. The usual structure of the $G$ and $R$ matrices is diagonal, $G = \sigma_u^2 I_a$ and $R = \sigma_\varepsilon^2 I_N$, where $N$ is the total number of observations and $a$ is the number of levels of $u$. The solutions derived from the mixed model equations are
$$\hat{\beta} = (X'V^{-1}X)^{-1}X'V^{-1}y \quad \text{and} \quad \hat{u} = GZ'V^{-1}(y - X\hat{\beta}).$$
The estimators $\hat{\beta}$ are known as Best Linear Unbiased Estimators (BLUE), and the predictors $\hat{u}$ are known as Best Linear Unbiased Predictors (BLUP).
The mixed-effects model controls the covariance structure directly and provides valid standard errors. The covariance specification is important because the test statistic for the fixed effects is a function of the covariance structure of the residuals. Different residual structures can result in different regression estimates along with different standard errors. Thus it is important to identify the best error structure. In this work, we approached growth modeling through the random coefficient modeling framework by building models progressively, starting from the unconditional linear and unconditional quadratic models, followed by the conditional growth model and models accommodating different variance-covariance structures.
The unconditional linear growth model is the baseline model that examines individual variation in intercept and growth rates and is given by
$$y_{it} = (\beta_0 + u_{0i}) + (\beta_1 + u_{1i})\,t + \varepsilon_{it},$$
where $y_{it}$ denotes the body weight of the $i$-th animal at month $t$, $\beta_0$ is the mean birth weight and $u_{0i}$ is the individual animal deviation from this mean. Similarly, $\beta_1$ is the mean growth rate and $u_{1i}$ is the deviation from this mean.
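In the next stage, as the individual growth trajectories are usually non-linear over time, a quadratic model was considered, which is given by
$$y_{it} = (\beta_0 + u_{0i}) + (\beta_1 + u_{1i})\,t + (\beta_2 + u_{2i})\,t^2 + \varepsilon_{it},$$
where $\beta_2$ is the mean quadratic slope and $u_{2i}$ is the deviation of the $i$-th animal from this mean.
Inclusion of predictors in the model results in conditional growth models. The model below includes gender ($g_i$, coded 0 for female and 1 for male) as one time-invariant covariate:
$$y_{it} = (\beta_0 + \gamma_0 g_i + u_{0i}) + (\beta_1 + \gamma_1 g_i + u_{1i})\,t + (\beta_2 + \gamma_2 g_i + u_{2i})\,t^2 + \varepsilon_{it},$$
where $\gamma_0$, $\gamma_1$ and $\gamma_2$ are the effects of gender on the intercept, linear slope and quadratic slope, respectively. Subsequently, this model was expanded to add the other covariates, season of birth and dam's weight at lambing, to evaluate their effect on the response.
To identify the most appropriate within-animal error covariance structure, the autoregressive of order 1 [AR(1)], Toeplitz (TOEP), heterogeneous AR(1), heterogeneous Toeplitz and first-order ante-dependence [ANTE(1)] covariance structures were specified in the model one by one and their performance was evaluated.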
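To make the difference between the homogeneous and heterogeneous AR(1) structures concrete, the following sketch builds both covariance matrices using their standard textbook definitions. The function names and the numerical values are illustrative assumptions, not estimates from this study.

```python
def ar1_cov(n, sigma2, rho):
    """Homogeneous AR(1): Cov(e_i, e_j) = sigma2 * rho**|i - j| (2 parameters)."""
    return [[sigma2 * rho ** abs(i - j) for j in range(n)] for i in range(n)]


def arh1_cov(sigmas, rho):
    """Heterogeneous AR(1): Cov(e_i, e_j) = sigma_i * sigma_j * rho**|i - j|.

    Each time point has its own standard deviation, so n time points
    require n + 1 covariance parameters.
    """
    n = len(sigmas)
    return [[sigmas[i] * sigmas[j] * rho ** abs(i - j) for j in range(n)]
            for i in range(n)]


# Illustrative values only: standard deviations increasing with age.
for row in arh1_cov([1.0, 2.0, 3.0], rho=0.5):
    print(row)
```

Both structures let the correlation decay with the distance between time points; only the heterogeneous version also lets the variance change over time, which matches the heteroscedasticity described for growth data.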
Because the true error structure is usually unknown, some goodness-of-fit criterion is necessary to select the best error structure [15]. Graphical diagnostics and subject matter knowledge can be of some help in the selection. However, the best model can be identified based on the Akaike information criterion (AIC) [16], given by
$$\mathrm{AIC} = -2 \log L + 2q,$$
where $L$ is the likelihood and $q$ is the number of covariance parameters. AIC measures the relative fit of competing models with different covariance patterns, and the model with the lowest AIC is selected as the best.
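As a small illustration of this selection rule, the sketch below computes AIC from a -2 log-likelihood value and picks the model with the smallest score. The AIC values for Model-a, Model-b and Model-c are the ones reported in the results of this paper; the function and variable names are assumptions made for the example.

```python
def aic(neg2_loglik, q):
    """AIC = -2 log L + 2q, where q is the number of covariance parameters."""
    return neg2_loglik + 2 * q


# AIC values reported in the text for the first three models.
reported_aic = {"Model-a": 80627, "Model-b": 73262, "Model-c": 72771}

# The preferred model is the one with the lowest AIC.
best = min(reported_aic, key=reported_aic.get)
print(best, reported_aic[best])
```

Because AIC only ranks models relative to one another, the absolute values carry no meaning on their own; what matters is the difference between competing models fitted to the same data.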
ANN is a computer algorithm developed based on the way information is processed in the nervous system. A typical neural network is composed of 3 layers of neurons or nodes, each of which is connected to the neurons in the next layer (Fig. 1). The layers are described as input, hidden and output layers, consisting of $N_I$, $N_H$ and $N_O$ processing nodes, respectively, where $N_I$ and $N_O$ correspond to the number of independent (input) and dependent (output) variables, respectively. Hidden layers in a neural network are known as feature detectors. The number of hidden nodes ($N_H$) in a hidden layer is an adjustable parameter which is usually determined based on the performance of the network.
Each node in the input (hidden) layer is linked to all the nodes in the hidden (output) layer using weighted connections. In addition to the input and hidden nodes, the network architecture also possesses a bias node (with fixed output of +1) in its input and hidden layers. These bias nodes are also connected to all the nodes in the subsequent layer and provide additional adjustable parameters (weights) for model fitting. Depending on the structure of the network, the connecting neuron weights are adjusted in order to fit a series of inputs to another series of known outputs. When the weight of a particular neuron is updated, the neuron is said to be learning. The process by which neurons are made to learn using a learning algorithm is called training. The goal of the learning or training process is to find the set of optimum weight values that will cause the output from the neural network to match the actual target values as closely as possible. The two most widely used ANN architectures, which were chosen for evaluation here, are the multi-layered perceptron (MLP) using the back propagation algorithm and the Radial Basis Function (RBF).
The MLP is a feed-forward ANN based on the backpropagation algorithm [17]. Here, input values in the first layer are weighted and passed on to the hidden layer. Neurons in the hidden layer produce outputs by applying an activation function to the sum of the weighted input values plus a bias value. The output layer produces the desired results by applying the output layer activation function to the weighted sum of outputs from the hidden layer plus the bias term. The output of the $j$-th neuron in the hidden layer is given by
$$h_j = f\Big(\sum_{i} w_{ij}\,x_i + b_j\Big),$$
where $f$ is the activation function used in the hidden layer, $w_{ij}$ are the synaptic weights between the input and the hidden layer, $x_i$ is the input value from the $i$-th input node and $b_j$ is the bias term (the weight on the bias node, whose output is fixed at 1). Similarly, the output of the $k$-th neuron in the output layer is given by
$$o_k = g\Big(\sum_{j} v_{jk}\,h_j + c_k\Big),$$
where $g$ is the activation function used in the output layer, $v_{jk}$ are the synaptic weights between the hidden layer and the output layer, $h_j$ is the output value from the $j$-th hidden node and $c_k$ is the bias term.
Activation functions are mathematical formulae that introduce a degree of nonlinearity that is important for most ANN applications. Logistic and hyperbolic tangent activation functions are usually used as they are continuously differentiable, an important feature in neural network theory. These functions also prevent the output from reaching very large values, which can 'paralyze' neural networks and thereby inhibit training.
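The forward pass described above can be sketched in a few lines of plain Python for a network with one hidden layer and a single output. The sketch assumes a hyperbolic tangent hidden activation and an identity output activation; the weights shown are arbitrary illustrative numbers, not fitted values, and the function name is hypothetical.

```python
import math


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer MLP with a single output node.

    x        : list of input values
    w_hidden : one list of input weights per hidden neuron
    b_hidden : bias weight of each hidden neuron (bias node output is 1)
    w_out    : weight from each hidden neuron to the output node
    b_out    : bias weight of the output node
    """
    # Hidden layer: tanh of the weighted input sum plus bias.
    hidden = [math.tanh(sum(w * xi for w, xi in zip(w_row, x)) + b)
              for w_row, b in zip(w_hidden, b_hidden)]
    # Output layer: identity activation on the weighted hidden sum plus bias.
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Illustrative call: inputs could be month, gender, season and dam weight code.
y_hat = mlp_forward(
    x=[6.0, 1.0, 0.0, 1.0],
    w_hidden=[[0.1, 0.2, -0.1, 0.05], [-0.05, 0.1, 0.2, 0.1]],
    b_hidden=[0.0, 0.1],
    w_out=[5.0, 3.0],
    b_out=2.0,
)
print(y_hat)
```

Because tanh is bounded between -1 and 1, each hidden output stays in a fixed range regardless of the input scale, which is the saturation-limiting behaviour the text refers to.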
RBF neural networks can perform nonlinear tasks in the same way as MLP networks. Using RBF, any nonlinear function can be modeled with the help of a single hidden layer, which removes some design decisions regarding the number of hidden layers to be used. Thus, RBF is a three-layer feed-forward network trained using the k-means algorithm. The input layer is simply a fan-out layer and does no processing. The hidden layer performs nonlinear mapping using a Gaussian transfer function. The argument of the transfer function is the Euclidean distance between the input vector and the center of the radial function. The final layer performs a simple weighted sum with a linear output.
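A minimal sketch of the RBF forward pass follows, assuming the common Gaussian form exp(-d^2 / (2 s^2)) with a separate width s per hidden unit. The centers, widths and weights are illustrative placeholders; in practice the centers would come from a clustering step such as k-means, which is not shown here.

```python
import math


def rbf_forward(x, centers, widths, weights, bias):
    """Forward pass of an RBF network with Gaussian hidden units.

    Each hidden unit responds to the squared Euclidean distance between the
    input vector x and its center; the output is a linear weighted sum.
    """
    activations = []
    for center, width in zip(centers, widths):
        dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        activations.append(math.exp(-dist2 / (2.0 * width ** 2)))
    return sum(w * a for w, a in zip(weights, activations)) + bias


# Illustrative call with two hidden units in a two-dimensional input space.
print(rbf_forward([1.0, 0.0], centers=[[1.0, 0.0], [5.0, 1.0]],
                  widths=[1.0, 2.0], weights=[3.0, 10.0], bias=0.5))
```

A hidden unit gives its maximum response of 1 when the input coincides with its center and decays toward 0 as the input moves away, so each unit contributes only locally to the fitted surface.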
Training a network is an iterative process using an algorithm in which the network is given the inputs along with the desired outputs. The back propagation algorithm is basically a gradient descent method in which the network parameters are determined by minimizing the error function
$$E = \frac{1}{2}\sum_{k}(d_k - o_k)^2,$$
where $d_k$ is the actual (desired) output and $o_k$ is the output from the network. Back propagation refers to the process by which derivatives of the network error with respect to the network weights are fed back to the network and used to adjust the weights, so that the error decreases with each iteration and the model improves to produce the desired outputs (within a reasonable margin of error). All the connection weights are readjusted based on the chosen learning rate parameter value to produce a smaller error.
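The weight update can be sketched for a single training case as follows. This is a plain gradient descent step, assuming the error E = 0.5 (d - o)^2, a scalar input, tanh hidden units and an identity output; it is meant to illustrate the principle and is not the optimizer used by any particular software package. All names and numbers are illustrative.

```python
import math


def forward(x, w1, b1, w2, b2):
    """Return hidden activations and network output for scalar input x."""
    h = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    o = sum(w * hj for w, hj in zip(w2, h)) + b2
    return h, o


def error(x, d, w1, b1, w2, b2):
    """Squared-error loss E = 0.5 * (d - o)^2 for one training case."""
    _, o = forward(x, w1, b1, w2, b2)
    return 0.5 * (d - o) ** 2


def backprop_step(x, d, w1, b1, w2, b2, lr=0.05):
    """One gradient descent update of all weights for one training case."""
    h, o = forward(x, w1, b1, w2, b2)
    delta_o = o - d  # dE/do
    # Propagate the error back through tanh: d tanh(z)/dz = 1 - tanh(z)^2.
    delta_h = [delta_o * w * (1.0 - hj ** 2) for w, hj in zip(w2, h)]
    new_w2 = [w - lr * delta_o * hj for w, hj in zip(w2, h)]
    new_b2 = b2 - lr * delta_o
    new_w1 = [w - lr * dh * x for w, dh in zip(w1, delta_h)]
    new_b1 = [b - lr * dh for b, dh in zip(b1, delta_h)]
    return new_w1, new_b1, new_w2, new_b2


params = ([0.5, -0.3], [0.0, 0.0], [0.2, 0.4], 0.0)
before = error(1.0, 1.0, *params)
after = error(1.0, 1.0, *backprop_step(1.0, 1.0, *params))
print(before, after)
```

Repeating such steps over all training cases for many cycles is what the text calls training; the learning rate `lr` controls how far the weights move along the negative gradient in each step.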
The building of a back-propagation network involves the specification of several parameters, namely the number of hidden layers, number of neurons in each hidden layer, learning rule, activation function, learning rate, momentum parameter, random number seed, error minimization algorithm and the number of learning cycles. For the purpose of building an ANN model with validation capabilities, the sample data set was divided into 2 subsets: a training set (70%), used to develop the neural model, and a testing set (30%), used to test the behavior of the model with unseen data. The accuracy of the model was measured at each phase.
The evaluation criteria used to assess the ANN models were $R^2$, Mean Absolute Error (MAE), Mean Square Error (MSE) and Mean Absolute Percentage Error (MAPE), given by
$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}, \quad \mathrm{MAE} = \frac{1}{n}\sum_i |y_i - \hat{y}_i|, \quad \mathrm{MSE} = \frac{1}{n-p}\sum_i (y_i - \hat{y}_i)^2, \quad \mathrm{MAPE} = \frac{100}{n}\sum_i \left|\frac{y_i - \hat{y}_i}{y_i}\right|,$$
where $y_i$ and $\hat{y}_i$ are the observed and estimated values, $\bar{y}$ is the mean of the observed values, and $n$ and $p$ are the number of time points and the number of parameters in a model, respectively. The model that provides higher $R^2$ and smaller MAE, MSE and MAPE was considered better for prediction.
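The 70/30 partition and the four evaluation criteria can be sketched as below. Several details are assumptions made for the example rather than facts from the study: the split is done at random over rows with a fixed seed, and MSE divides the residual sum of squares by n - p, with p defaulting to 0.

```python
import random


def train_test_split(rows, train_frac=0.7, seed=0):
    """Randomly partition rows into training and testing subsets."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def evaluation_metrics(observed, predicted, p=0):
    """Return R2, MAE, MSE and MAPE for observed versus predicted values."""
    n = len(observed)
    resid = [y - yh for y, yh in zip(observed, predicted)]
    y_bar = sum(observed) / n
    ss_res = sum(r * r for r in resid)
    ss_tot = sum((y - y_bar) ** 2 for y in observed)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": sum(abs(r) for r in resid) / n,
        "MSE": ss_res / (n - p),  # assumed divisor: n - p
        "MAPE": 100.0 / n * sum(abs(r / y) for r, y in zip(resid, observed)),
    }


train, test = train_test_split(range(10))
print(len(train), len(test))
print(evaluation_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 3.0, 3.5]))
```

MAPE is only defined when no observed value is zero, which holds for body weights. For repeated measures data one may prefer to split by animal rather than by row so that no animal appears in both subsets; the source does not state which was done.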
For comparing ANN with the linear mixed model (LMM), a ratio computed from the residual sums of squares of the two competing models, proposed by H. Theil [18] and known as Theil's U statistic, was used. It can be used to measure the efficiency of a prediction model. A U value less than 1 indicates that the error obtained with ANN is lower than that obtained with LMM.
The statistical analysis was carried out using the statistical software PASW Statistics for Windows, Version 18.0 [19]. The parameters in the linear mixed models were estimated using the restricted maximum likelihood (REML) method. The details are given in [20]. The ANN models were fitted using MLP and RBF architectures.
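The comparison ratio can be sketched as follows. The exact formula is not shown in the text, so this sketch assumes one common form, the square root of the ratio of the two residual sums of squares; under either the square-root or the plain ratio form, a value below 1 means the ANN residuals are smaller. The function names and data are illustrative.

```python
import math


def residual_ss(observed, predicted):
    """Residual sum of squares between observed and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(observed, predicted))


def theil_u(observed, pred_ann, pred_lmm):
    """Assumed form: sqrt(RSS of ANN / RSS of LMM). U < 1 favours ANN."""
    return math.sqrt(residual_ss(observed, pred_ann)
                     / residual_ss(observed, pred_lmm))


# Invented toy values: the ANN predictions are closer to the observations.
print(theil_u([1.0, 2.0, 3.0], [1.0, 2.0, 3.5], [1.0, 2.0, 4.0]))
```

In the toy call, the ANN residual sum of squares is 0.25 and the LMM one is 1.0, so the ratio is below 1 and the ANN predictions are the closer of the two.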
Graphical methods can be used to obtain some insights into the structure of the data. The individual animal growth pattern (up to 12 months of age), depicted using a profile plot or spaghetti plot, is shown in Fig. 2, which clearly indicates that the growth patterns of different animals follow different trajectories. The animals vary not only in their initial status (birth weight) but also in their growth rate. The variance across time is also observed to be non-constant and was found to increase with time.
The various models specified before, namely the unconditional linear growth model (Model-a), unconditional quadratic growth model (Model-b), conditional quadratic growth model (Model-c) and conditional quadratic growth models with within-animal covariance structures AR1, TOEP, ARH1, TOEPH and ANTE1 (Model-d (i)-(v)), were fitted. The parameter estimates along with standard errors (SE) and significance levels are given in Table-1. The summary of the performance of the various models is given in Table-2.
From the results of Model-a (Table-1), the estimate of the intercept (average birth weight) is observed to be 5.27 kg, which is slightly unrealistic, and the slope (average change per month) is 1.26 kg. Both estimates were found to be highly significant. Inspection of the random effects indicates that the estimates of the variance of the intercept, slope and error, which are 2.637, 0.166 and 3.974, respectively, were statistically highly significant (Wald Z test), confirming that the initial status (birth weight) and linear growth rates were not constant across animals. The AIC for this model was found to be 80627 and the number of parameters to be estimated was 6.
From the results of the unconditional quadratic model (Model-b in Table-1), we can see that the residual variance went down from 3.974 to 2.340 (41% reduction), suggesting a better fit. The variances of the intercept, slope and quadratic slope were also found to be significant, suggesting that the variability in these parameters could be explained by between-animal predictors. The AIC reduced to 73262 and the number of parameters estimated was 10.
The results of the conditional quadratic growth model with the predictors gender, season of birth and dam's weight at lambing added are given under Model-c in Table-1. The estimates of the intercept, slope and quadratic slope are found to be significant, and so are their variance estimates. Further, the AIC reduced to 72771 and the number of parameters to be estimated was 19.
To describe how the error is distributed, the within-animal error covariance structures discussed above were introduced one by one into Model-c, as specifying an appropriate covariance structure in the model improves the predictions. The summary of the performance of all the competing models is given in Table-2.
When choosing the best model for the data, we need to focus on a smaller AIC value and a smaller number of parameters to be estimated (parsimony). Striking a balance between these two aspects, the conditional quadratic model with heterogeneous AR1 error covariance structure [Model-d (iii) in Table-2] was found to be more appropriate for describing the growth profile of Madras Red sheep. The predicted values found using this model were very close to the observed values. The estimates of the intercept, slope and quadratic slope of Model-d (iii) were found to be significant and more realistic (Table-1).
The covariates gender and dam's weight at lambing were significantly associated with birth weight (intercept), while the covariate season of birth was not found to be significantly associated with birth weight, even though we expect the animals born during the off season to be slightly heavier at birth. All three covariates, gender, season of birth and dam's weight at lambing, are significantly associated with the linear growth rate. Gender and season of birth are significantly associated with the quadratic slope. However, dam's weight at lambing is not associated with the quadratic slope. This indicates that gender and dam's weight at lambing are important in predicting the initial status and the growth trajectory.
Using these estimates from Table-1, we can construct prediction equations for the different combinations. Thus, the prediction equation for female lambs born in the main season with dam's weight <30 kg is $y = 2.251 + 1.702\,t - 0.033\,t^2$. Similarly, for male lambs born in the off season with dam's weight ≥30 kg, the equation is $y = (2.251 + 0.199 + 0.018 + 0.24) + (1.702 + 0.317 + 0.310 + 0.106)\,t - (0.033 + 0.004 + 0.016 + 0.002)\,t^2$. That is, $y = 2.708 + 2.435\,t - 0.055\,t^2$.
Similarly, the prediction equation for the other combinations can also be derived and effectively used.
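The derivation of the prediction equations for other covariate combinations can be sketched as below, using the coefficients quoted in the text. Which increment belongs to which covariate is an assumption here: the three increments in each bracket are taken to follow the order gender, season of birth, dam's weight at lambing, as the covariates are listed in the text. The function names are hypothetical.

```python
# Baseline coefficients (female, main season, dam's weight < 30 kg), as quoted.
BASE = (2.251, 1.702, -0.033)  # intercept, linear slope, quadratic slope

# Assumed mapping of the quoted increments to covariates (order of listing).
EFFECTS = {
    "male": (0.199, 0.317, -0.004),
    "off_season": (0.018, 0.310, -0.016),
    "heavy_dam": (0.240, 0.106, -0.002),
}


def coefficients(male=0, off_season=0, heavy_dam=0):
    """Return (intercept, slope, quadratic slope) for a covariate combination."""
    flags = {"male": male, "off_season": off_season, "heavy_dam": heavy_dam}
    coefs = list(BASE)
    for name, flag in flags.items():
        if flag:
            coefs = [c + e for c, e in zip(coefs, EFFECTS[name])]
    return tuple(coefs)


def predict_weight(t, male=0, off_season=0, heavy_dam=0):
    """Predicted body weight (kg) at month t from the quadratic equation."""
    b0, b1, b2 = coefficients(male, off_season, heavy_dam)
    return b0 + b1 * t + b2 * t ** 2


print([round(c, 3) for c in coefficients(1, 1, 1)])
print(round(predict_weight(12), 3))
```

With all three flags set, the coefficients reproduce the second equation given in the text, which serves as a check that the increments were combined in the same way.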
Thus the steps outlined above for building a growth model may seem to require a lot of analysis, yet they need to be carried out to identify the parsimonious model that fits the data well.
The same longitudinal growth data of sheep were used for developing ANN models using the feed-forward architectures MLP and RBF, with back propagation training using the Levenberg-Marquardt method of error function optimization. In both MLP and RBF, one hidden layer was used and the number of nodes in the hidden layer was varied to identify the best combination. In MLP, the activation functions used in the hidden layer and output layer were the hyperbolic tangent and identity functions, respectively. In RBF, the activation function used was exponential in the hidden layer and identity in the output layer. Between the two ANN architectures, MLP performed better, with higher $R^2$ and lower values of MAE, MSE and MAPE.
It can also be seen from Table-3 that the ANN model using MLP is better than the LMM with heterogeneous AR1 structure, scoring better on all the performance indicators. The performance of ANN over LMM as measured by Theil's U is 0.9264, indicating that ANN gives a slightly improved fit compared to LMM for these data.
In this study, the potential of ANN as an alternative modeling technique to the conventional and more statistically appropriate linear mixed modeling was evaluated for the purpose of predicting unbalanced longitudinal growth data of Madras Red sheep. The conditional quadratic model with heterogeneous AR(1) covariance structure fitted using the mixed model approach was found to be good, with the covariates gender of lamb and dam's weight at lambing showing marked influence on the growth parameters. Season of birth was found to be significant only for growth rate and not for the average birth weight. Between the two ANN architectures, MLP performed better than RBF, and MLP was also better than the best linear mixed model identified in this study. Thus, ANN models provide another promising tool to analyze longitudinal growth data, which can be exploited by animal breeders to bring about overall improvement in productivity.
RG, PD, PK and CK were involved in the design of the
Figure-1. Artificial neural network

Table-2. Summary of the performance of the various linear mixed models