A Novel Spatiotemporal Longitudinal Methodology for Predicting Obesity Using Near Infrared Spectroscopy (NIRS) Cerebral Functional Activity Data

Globally, there has been a dramatic increase in obesity, with prevalence in males and females expected to increase to 18 and 21%, respectively (NCD Risk Factor Collaboration, Lancet 387(10026):1377–96, 2016). However, there are hardly any data-analytic, calorie-based cognitive studies, especially ones using non-invasive near infrared spectroscopy (NIRS) data, that predict obesity using predictive data mining. Obesity is linked with neurodegenerative diseases, diabetes, and cardiovascular diseases. Thus, understanding, predicting, preventing, and managing obesity have the potential to save the lives of millions. Behavioral studies suggest that overeating in obese individuals is triggered by exaggerated brain reward center (BRC) activity in response to high-calorie food stimuli (Shefer et al., Neurosci Biobehav Rev 37(10):2489–503, 2013). In this paper, details of a novel research methodology are presented for a 24-month longitudinal study using a 44-channel NIRS device with the subjects in a natural environment. The proposed methodology uses visual stimuli of low/high-calorie food items under fasting and satiated conditions for three types of subjects. The experimental pipeline consists of a block design, a longitudinal plan, data smoothing, BRC activation mapping, stereotactic normalization, generation of paired t-test maps under fasting and non-fasting conditions and, subsequently, Naïve Bayes modeling to generate obesity prediction maps for the control subjects. The simulated results consist of Bayesian prediction maps generated from layers of paired t-test cerebral activity maps for the four BRC functional regions considered, for three types of subjects: obese, control, and control subjects fed a high-calorie diet. We have demonstrated how cerebral functional activity data in response to visual food stimuli can be used to predict obesity in the non-obese, thus offering a non-invasive preventive measure.


INTRODUCTION
The United States will be unable to extricate itself from the obesity epidemic with current clinical practices alone [3]; new preventive and remedial measures are needed. The U.S. Preventive Services Task Force has recommended obesity screening by physicians, which, pursuant to the Affordable Care Act (ACA), is covered by Medicare and many private health insurers. This obesity screening has the potential to yield as much as $44 billion in long-term federal savings [4]. The proposed work uses the neural activity of obese and control subjects in response to visual stimuli to predict obesity, thus supporting obesity screening and allowing time to take preventive measures.
At times our actions are motivated by ambiguous feelings that are hard to put into words, or by unexplained triggers resulting in a craving for high-cholesterol or high-sugar foods, which in extreme cases can amount to an eating disorder. Traditionally, these involuntary feelings, thoughts and unconscious physical movements were considered difficult to explain scientifically, but they have gradually started to be explained owing to the availability of sensitive equipment such as NIRS (Near Infrared Spectroscopy), fMRI (functional Magnetic Resonance Imaging), PET (Positron Emission Tomography) and EEG (Electroencephalography). This paper proposes coupling such measurements with data mining techniques to identify correlated neural activity for neuroscience and cognitive science studies.
Obesity is associated with neurodegenerative diseases such as Alzheimer's disease [5]. Results from neurobiological and epidemiological studies have proposed a reciprocal link between Alzheimer's disease and depression [6]. Over the last decade, a number of studies have reported alterations and reductions in the whole-brain volume of overweight/obese individuals [7,8,9]. The brain reward circuit consists of several brain structures, such as the Ventral Tegmental Area (VTA), the nucleus accumbens, and the prefrontal cortex. When triggered by a gratifying stimulus (e.g. food, sex, video games), information travels from the VTA to the nucleus accumbens and then up to the prefrontal cortex, which governs reasoning, self-control and decision-making. The objective of the proposed research methodology is the development of BRC activation maps in response to food stimuli, in particular to study the neurobiological mechanisms of obese, normal (control) and norbese subjects (normal subjects fed high-calorie foods) under different levels of hunger and satiety with different types of visual stimuli.
Near-Infrared Spectroscopy (NIRS) is a non-invasive technique used to study the oxygenation and deoxygenation of hemoglobin (Hb) in the cerebral cortex. Although MRI (Magnetic Resonance Imaging) provides cross-sectional structural images of the brain, the subject is required to lie still to avoid blurring of the images; in taste-based experiments, subjects can only taste a drop of liquid to minimize head movement, which limits the scope of the study. The temporal resolution of fMRI for functional imaging is limited to several seconds by the slow hemodynamic response, while that of NIRS is under a second. Furthermore, in fMRI-based studies, patients suffering from claustrophobia may be given medication that makes them sleepy [10], which affects the results. In contrast, in the proposed NIRS-based research methodology, brain activity will be monitored in a natural environment that permits movement, as discussed in section 1.2. Research has shown that, over time, the brain of an obese individual also becomes functionally modified [2], eventually culminating in cognitive impairment. Therefore, we propose a longitudinal study to analyze the spatio-temporal aspects of obesity and weight gain, using NIRS data to create predictive spatial brain activation maps of the control and norbese subjects. The longitudinal study experiments are proposed to be performed over a period of 24 months. After de-noising, filtering and smoothing of the data, statistical brain activation maps will be generated. Naïve Bayesian modeling will then be used to discover spatio-temporal patterns of cerebral functional activity of the obese, norbese and control subjects. Finally, these results will be used to predict the potential for obesity among the norbese.
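The statistical activation-mapping step compares each channel's response under fasting and satiated conditions within the same subjects. A minimal sketch of a per-channel paired t-statistic, t = mean(d) / (s_d / sqrt(n)), is given here, assuming a hypothetical data layout of one summary value (e.g. mean oxy-Hb change to food pictures) per subject, channel and condition. The function name and the toy numbers are illustrative only, not the authors' implementation or study data.

```python
import numpy as np

def paired_t_map(fasting, satiated):
    """Per-channel paired t-statistics.

    fasting, satiated: arrays of shape (n_subjects, n_channels); each entry is
    a hypothetical per-subject summary (e.g. mean oxy-Hb change) for a channel.
    """
    fasting = np.asarray(fasting, dtype=float)
    satiated = np.asarray(satiated, dtype=float)
    d = fasting - satiated                  # within-subject differences
    n = d.shape[0]
    mean_d = d.mean(axis=0)
    sd_d = d.std(axis=0, ddof=1)            # sample standard deviation
    return mean_d / (sd_d / np.sqrt(n))     # one t value per channel

# Toy example: 5 subjects x 3 channels (synthetic numbers)
fasting = np.array([[1.2, 0.5, 0.9],
                    [1.0, 0.4, 1.1],
                    [1.4, 0.6, 0.8],
                    [1.1, 0.5, 1.0],
                    [1.3, 0.4, 0.9]])
diffs = np.array([[0.4, 0.0, 0.1],
                  [0.5, 0.1, 0.2],
                  [0.3, -0.1, 0.0],
                  [0.4, 0.0, 0.1],
                  [0.6, 0.0, 0.1]])
satiated = fasting - diffs
t_map = paired_t_map(fasting, satiated)
```

If SciPy is available, `scipy.stats.ttest_rel(fasting, satiated, axis=0)` should return the same statistics together with p-values; a multiple-comparison correction across the 44 channels would normally be needed before thresholding the map.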
2. BACKGROUND
2.1 Obesity and the brain
Recent advances in the neuro and medical sciences have contradicted the historical concept that obesity is simply a matter of caloric intake and expenditure. Obesity has been found to be a complex neurological process involving neurohormonal and even neurotransmitter dysregulation [11,12,13,14,15,16,17,18]. Today, obesity is considered one of the most widespread epidemics. Worldwide data show that the prevalence of obesity increased rapidly during the last decade. Obesity tends to be more common in developed countries and in upper-class societies with a sedentary lifestyle. According to OECD Health Statistics 2004 (Fig-1), the USA has the highest obesity rate, as a percentage of population, among the countries reported. Although the prevalence of both Parkinson's and Alzheimer's diseases continues to rise, the available treatment strategies remain ineffective against the increase in global neurodegenerative risk factors such as obesity. While the impact of obesity on disorders such as diabetes and coronary heart disease has been well characterized, the detrimental effects of obesity on the nervous system remain to be established.
In clinical terms, obesity is measured using body mass index (BMI), waist circumference, or waist-to-hip ratio [19]. BMI is a simple index obtained by dividing a person's weight by the square of his/her height (kg/m²). According to the WHO (World Health Organization), a BMI of 25 kg/m² or above is classified as overweight, while a BMI of 30 kg/m² or above indicates obesity. Fig-2 [7] shows the relationship between obesity and serious health risks for men and women.
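The BMI definition and the WHO cut-offs quoted above can be expressed as a short sketch. The underweight band (BMI below 18.5 kg/m²) is the standard WHO threshold and is added here only for completeness; the function names are illustrative.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kg divided by the square of height in m."""
    return weight_kg / height_m ** 2

def who_category(bmi_value: float) -> str:
    """Adult WHO classification (>= 25 overweight, >= 30 obese)."""
    if bmi_value >= 30:
        return "obese"
    if bmi_value >= 25:
        return "overweight"
    if bmi_value >= 18.5:
        return "normal"
    return "underweight"

value = bmi(95.0, 1.75)          # 95 kg, 1.75 m
print(round(value, 1), who_category(value))
```

For this example, 95 / 1.75² is about 31.0 kg/m², which falls in the obese range.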
Figure-2: Link between BMI, diabetes and coronary heart disease [7]
From Fig-2 (a, b), it can be observed that the risk of most serious health conditions, i.e. type-2 diabetes, hypertension and cardiovascular diseases, rises with obesity, and these conditions are a major cause of death.
Weight loss, even of 5-10% of overall body weight, can be beneficial in many ways [20], such as:
- Reduced risk factors and indicators of overweight- or obesity-associated diseases.
- Lower blood pressure, which benefits patients diagnosed with hypertension.
- Lower glucose readings, which benefits diabetics.
- Lower cholesterol and fat levels, which helps patients diagnosed with cholesterol disorders, arterial clots and myocardial infarction.
- Fewer premature deaths due to obesity-related diseases.
- Improved appearance and morale.
2.2 NIRS vs. fMRI
fMRI has been used increasingly over the past decade in a number of applications. The crucial drawbacks of fMRI-based methodologies, however, are the expense and non-portability of the equipment. A broad review [21] comparing the particular features of NIRS and fMRI concluded that NIRS has immense potential for neurological and psychiatric applications owing to its ease of use, portability and, most importantly, robustness to motion artifacts. EEG, for its part, is currently limited by inadequate spatial resolution and a low signal-to-noise ratio in many applications; NIRS can provide relatively better quality in these respects [22].
Currently, fMRI is the primary means of performing neural imaging for cognitive studies. However, the global obesity pandemic is pushing hospitals to use extremely large MRI scanners of the kind typically used for animals, resulting in a loss of dignity for patients [23]. This is because conventional MRI scanners cannot accommodate severely obese individuals. According to clinicians, the problem is not just an obese person's weight; the patient's abdominal span may also be too large to fit in the operational space of the MRI scanner, as shown in Fig-3.

Figure-3: Regular-aperture MRI scanner unsuitable for obese patients
The proposed NIRS-based methodology endeavors to address the intricate relationship between brain activation and obesity in calorie-related cognitive responses. Moreover, the experiments will be conducted in an open, less restraining and natural environment, as shown in Fig-4, with a full-head NIRS cap supporting 44 channels. NIRS is reported to have higher temporal resolution than fMRI [24,25,26]. Temporal information is heavily blurred in fMRI [27] because of the slow hemodynamic response.

The BOLD (Blood Oxygen Level Dependent) response usually has a width of about 3 s, with its peak appearing about 5-6 s after the onset of a short neural stimulus, which is much slower than the underlying neural activity. Although NIRS signals have a considerably weaker SNR (signal-to-noise ratio), they are often strongly correlated with fMRI measurements [28].
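To illustrate why the hemodynamic response blurs timing, the sketch that follows models a BOLD-like response with a double-gamma function. The shape parameters (6 for the main lobe, 16 for the undershoot, undershoot ratio 1/6) are the widely used SPM-style canonical defaults and are an assumption here, not part of the proposed methodology; with them the response peaks roughly 5 s after a brief stimulus, and a 1 s event is smeared over many seconds.

```python
import math
import numpy as np

def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density evaluated at times t (t >= 0)."""
    t = np.asarray(t, dtype=float)
    return t ** (shape - 1) * np.exp(-t / scale) / (math.gamma(shape) * scale ** shape)

def double_gamma_hrf(t, peak_shape=6.0, undershoot_shape=16.0, ratio=1.0 / 6.0):
    """Positive gamma lobe minus a scaled, later gamma undershoot.
    Default parameters are assumed SPM-style canonical values."""
    return gamma_pdf(t, peak_shape) - ratio * gamma_pdf(t, undershoot_shape)

dt = 0.1                                  # seconds per sample
t = np.arange(0, 32, dt)                  # 32 s window
hrf = double_gamma_hrf(t)
peak_time = t[np.argmax(hrf)]             # roughly 5 s after onset

# A 1 s neural event convolved with the HRF: the response is delayed and spread out
stimulus = np.zeros_like(t)
stimulus[: int(1.0 / dt)] = 1.0
bold = np.convolve(stimulus, hrf)[: len(t)] * dt
bold_peak_time = t[np.argmax(bold)]
```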
Traditional neuroimaging techniques restrict movement and make it difficult to study processes that require oral, upper-limb, or lower-limb motor execution. Functional near-infrared spectroscopy (fNIRS) measures brain oxygenation and permits movement during data acquisition. Note that in MRI-based experiments, i) the subject, lying supine, can hardly drink anything except a few drops of liquid, ii) if the subject has to be taken out of the MRI setup to drink normally, it will be impossible to restore the prior head position, and iii) the stress of claustrophobic conditions affects the experimental results. In our proposed NIRS-based methodology, the subject can take part in the experiment in a flexible and relaxed natural environment and consume the whole drink rather than just a few drops. NIRS has further advantages over fMRI and positron emission tomography, such as higher temporal resolution, device portability, relative ease of use and low cost. However, compared with fMRI, NIRS has its own shortcomings, such as shallow penetration depth and lower spatial resolution, resulting in some uncertainty about the region being probed.

NIRS vs. EEG
EEG has comparatively limited spatial resolution, which restricts its use for evaluating region-specific brain activity. At the other extreme, high spatial resolution can be achieved using fMRI, but at the expense of both restraining the subject and compromising temporal resolution (e.g., [29], [30]). Hence, NIRS is a likely alternative, offering a middle ground between EEG and fMRI in terms of temporal and spatial resolution as well as freedom of movement (e.g., [31], [32]). Furthermore, NIRS has been shown to discriminate between two levels of workload generally better than EEG [33].
Recently, EEG-based studies have reported recognition rates only in the mid-50% range for two-way classification [30], which is significantly below NIRS recognition rates in the mid-to-high 60% range [34] for similar paradigms. Hence, NIRS could be a useful alternative for gauging affect-related activity, e.g., examining whether affect ratings predict localized cerebral responses to high- and low-calorie foods [35]. More specifically, for NIRS-based affect-related studies (e.g., [36][37][38][39][40][41]), the results are notably consistent across several efforts and across a varied set of contexts (i.e., moral decision-making, threat, working-memory activities etc.), demonstrating detection rates notably better than chance. However, because our work, conceptually similar to that of [42], is centered around frontally located optodes in the proximity of the primary facial muscles, the recordings may be affected to some degree by the corresponding muscle artifacts in addition to brain activity.
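Whether an accuracy is "better than chance" depends on how many trials it was measured over. A minimal sketch of a one-sided exact binomial test against the 50% chance level of a two-class problem follows; the 100-trial figure is a hypothetical choice for illustration, not a number from the cited studies.

```python
from math import comb

def binomial_p_value(correct: int, trials: int, chance: float = 0.5) -> float:
    """One-sided exact binomial test: P(X >= correct) if the classifier were guessing."""
    return sum(comb(trials, k) * chance ** k * (1 - chance) ** (trials - k)
               for k in range(correct, trials + 1))

# Hypothetical two-class evaluation over 100 test trials
p_mid_50 = binomial_p_value(55, 100)   # 55% accuracy, about 0.18: not clearly above chance
p_mid_60 = binomial_p_value(65, 100)   # 65% accuracy, about 0.002: well above chance
```

With fewer trials the same percentages would be less convincing, so reported accuracies are best read together with the trial count.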

Obesity and cognitive dysfunction
Cognitive aging is a normal process: in older adults, structural and functional changes lead to a consequent decline in cognitive ability. An increasing body of research shows that mid-life obesity is a predictor of mild cognitive impairment in old age [43,44], as shown in Fig-5. Even when accounting for cognitive aging, studies have revealed an inverse relationship between BMI and global cognitive functioning [45,46,47]. Studies have shown that, compared with non-obese subjects, obese subjects recalled fewer words in a word-list learning test and took longer to complete the Digit Symbol Substitution Test (DSST) [48]. Although it is difficult to draw definite comparisons across studies of different cognitive domains, deficits in specific domains such as executive function (prefrontal cortex) and short-term memory (temporal cortex) have frequently been documented in obese persons relative to their non-obese counterparts [48,49,50,51]. However, none of these studies is based on cerebral activity data for monitoring and analyzing the association between obesity and the cognitive functioning of obese and non-obese (normal-weight) individuals. Our proposed framework therefore endeavors to address this research gap.
2.5 Brain reward system and NIRS
The brain reward system consists of a group of brain structures and neural pathways that regulate and control reward-related cognition, including reinforcement learning (e.g., positive reinforcement), motivational salience (i.e., "wanting", craving or desire), and pleasure (i.e., hedonic "liking") [52]. Terms commonly used to describe behavior related to the "wanting" component of reward include anticipatory behavior, preparatory behavior, appetitive behavior, instrumental behavior, and seeking [53]. Terms typically used to describe behavior related to the "liking" component of reward include consummatory behavior and taking behavior [53]. Fig-6 shows the BRC together with the brain's surface anatomy and functions. Fig-6(a) [54,55] shows the brain reward system, consisting of the ventral tegmental area (VTA), ventral striatum (primarily the nucleus accumbens, but also the olfactory tubercle), dorsal striatum (i.e., caudate nucleus and putamen), substantia nigra (i.e., the pars compacta and pars reticulata), hippocampus, anterior cingulate cortex, prefrontal cortex, insular cortex, hypothalamus (mainly the orexinergic nucleus in the lateral hypothalamus), thalamus (multiple nuclei), subthalamic nucleus, globus pallidus (external and internal), ventral pallidum, amygdala, parabrachial nucleus, and the remainder of the extended amygdala. Studies using fMRI, PET and SPECT show that patients with anorexia nervosa (an eating disorder characterized by abnormally low body weight) who are exposed to food exhibit activation of the temporal regions [56][57][58][59]. Fig-6(b) shows the brain's surface anatomy and functions. The frontal lobe handles higher mental functions, such as planning, thinking, problem solving, emotional expression, judgement, behavioral control and creativity. The occipital lobe is responsible for visual functions, such as coordination of eye movements, perception, image recognition, association and visual memory.
The temporal lobe is an association area responsible for functions such as short-term memory, equilibrium and emotion. Specifically, we consider the BRC to be functionally represented by the frontal lobe and the two temporal lobes (left and right) [60, 61]; we have also considered the occipital lobe because it deals with image recognition and visual memory. We have thus considered four feature maps, one for each of these four functional lobes. However, most BRC structures (e.g., striatum, hippocampus) are deeply embedded in the brain and, in adults, lie well outside the reach of the near-infrared light used in NIRS. The analysis depth can be increased by waiting longer to receive and record the returning NIRS light, but at the cost of temporal resolution, which is a hallmark of NIRS. Nevertheless, studies [62] combining fMRI and NIRS have quantitatively shown a high correlation between physically deep neural activity recorded using fMRI and relatively shallow responses recorded using NIRS. Therefore, as part of the proposed research methodology, we propose to record the functional cerebral/neural activity in the frontal and temporal lobes (as shown in Fig-3) using NIRS. We will also consider the occipital lobe, which deals with image recognition and visual memory, with respect to the subjects viewing pictures of food and non-food items, as discussed in section-3.5.
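Reducing channel-level results to the four lobe-level feature maps described above amounts to grouping channels by lobe and summarizing each group. In the sketch that follows, the channel-to-lobe assignment for the 44-channel cap is entirely hypothetical (the real one depends on the cap layout and on stereotactic registration), and averaging is only one possible summary.

```python
import numpy as np

# HYPOTHETICAL assignment of the 44 NIRS channels to the four lobes of interest.
LOBE_CHANNELS = {
    "frontal": list(range(0, 16)),
    "left_temporal": list(range(16, 26)),
    "right_temporal": list(range(26, 36)),
    "occipital": list(range(36, 44)),
}

def lobe_features(channel_values, lobe_channels=LOBE_CHANNELS):
    """Average channel-level values (e.g. paired t-statistics) within each lobe."""
    channel_values = np.asarray(channel_values, dtype=float)
    return {lobe: float(channel_values[idx].mean())
            for lobe, idx in lobe_channels.items()}

values = np.zeros(44)
values[36:44] = 2.0                 # pretend only occipital channels responded
features = lobe_features(values)
```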

LITERATURE REVIEW
There is a sizeable body of knowledge on using fMRI for obesity centric studies [63][64][65][66][67], however, in view of the NIRS centric scope of this paper, we will limit the focus mostly to NIRS-based studies.
In [68], the generic architecture of CLARION (Connectionist Learning with Adaptive Rule Induction Online) is explained; it provides a framework for implementing different cognitive models, from immediate memory through social behavior, categorization and decision making to motivational processes. This forms the cornerstone of our work, because impulsive personality disorders (or borderline personality disorders) are linked with impulsive eating pathology (e.g., binge-purging/eating type, binge eating disorder, bulimia nervosa) [69]. CLARION provides a generic cognitive architecture with a wide variety of fairly comprehensive, computationally specified models of different psychological processes. CLARION is composed of a number of subsystems; the subsystem of interest here is the motivational subsystem (MS). The MS is about why a person does what he/she does. It is further divided into a set of basic motives, called primary drives, which are universal across individuals. Variation among individuals could be explained (partly) by differences across this set of drives (different food cues in our case) in terms of drive strength (cerebral activation in our study) in different situations (fasting vs. non-fasting in our study) for different individuals (different subject types in our case). Primary drives are vital to an individual and are most probably largely built-in (hard-wired). Examples of low-level primary drives are food, sex and water. Studies [69] have shown high dopamine levels in the BRC of rats following exposure to food, sweets, and sex. Finally, human imaging studies have reported activation of the BRC in response to food, drugs, money, and romantic love.
Decision-making is often motivated by immediate gains as well as deferred rewards. Current research suggests that obesity is linked to noticeable alterations in learning and decision-making. This difference may also underlie the preference for immediately consumable, highly palatable, calorie-rich but unhealthy foods. Subsequent weight gain can be explained by such poor food-related inter-temporal decision-making [70]. Behavioral studies have established that delayed rewards are discounted according to a decreasing hyperbolic function. In [71], it was shown that a neural network model of the hippocampal CA3 region contains a mechanism that could explain discounting in downstream reward-prediction systems (e.g., basal ganglia). Simulations showed that this "predictive similarity" decreases as stimuli become separated in time, at a rate consistent with hyperbolic decline. It was also found that the decline in predicted discounted reward was accompanied by increased functional connectivity between the prefrontal cortex and the temporal lobe. In the proposed work, we intend to study and predict this spatiotemporal relationship among the referenced brain regions for obese and control subjects using a NIRS device.
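A common formalization of the decreasing hyperbolic function mentioned above is Mazur's form V = A / (1 + kD), where A is the reward amount, D the delay and k a discount rate. The sketch that follows compares it with exponential discounting and shows the preference reversal that hyperbolic (but not exponential) discounting produces: a smaller, sooner reward wins when it is immediate but loses when both options are pushed into the future. The value of k and the reward amounts are hypothetical.

```python
import math

def hyperbolic_value(amount: float, delay: float, k: float) -> float:
    """Hyperbolic discounting: V = A / (1 + k * D)."""
    return amount / (1.0 + k * delay)

def exponential_value(amount: float, delay: float, k: float) -> float:
    """Exponential discounting, for comparison: V = A * exp(-k * D)."""
    return amount * math.exp(-k * delay)

def prefers_smaller_sooner(value_fn, small, large, delay, extra_delay, k):
    """True if `small` at `delay` is valued above `large` at `delay + extra_delay`."""
    return value_fn(small, delay, k) > value_fn(large, delay + extra_delay, k)

k = 0.5  # hypothetical discount rate per unit delay
now_choice = prefers_smaller_sooner(hyperbolic_value, 5, 10, 0, 10, k)      # True
future_choice = prefers_smaller_sooner(hyperbolic_value, 5, 10, 20, 10, k)  # False: reversal
```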
In [72], it was investigated, in a cognitive setup, how the two brain hemispheres are engaged spatiotemporally while subjects associate different colors with different concepts. An experimental setup comprising a multi-channel near-infrared spectroscopy (NIRS) device was used to gauge variations in brain hemoglobin concentration (OxyHb and DeOxyHb) during a concept-color association task conducted in a block-design paradigm. In the proposed framework, we also use a NIRS device, but instead of showing colors and text, we present visual food stimuli in the form of low- and high-calorie food pictures and non-food items to the subjects. As in [72], channel-wise activation data will be recorded, but for three categories of subjects (obese, control and norbese) under fasting and non-fasting conditions; no such categorization of subjects was done in [72]. Furthermore, [72] did not consider a longitudinal study, which is the basis of the proposed work. Finally, instead of performing clustering to detect channels with analogous spatiotemporal activity, predictive data mining will be used to identify subjects with potential for obesity. Thus, the proposed work opens an innovative application area for cerebral neural activity analysis: predicting obesity by gauging and modeling cerebral neural activity in the functional reward-center regions of the brain.
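In a block-design paradigm, each channel's signal is typically cut into epochs time-locked to the onsets of the stimulus blocks (e.g. food-picture blocks), baseline-corrected and averaged. A minimal sketch of such block averaging follows; the 20 s block and rest durations, the 10 Hz sampling rate and the idealized boxcar "signal" are all hypothetical choices for illustration.

```python
import numpy as np

def block_average(signal, onsets_s, fs, pre_s, post_s):
    """Average epochs time-locked to block onsets, subtracting each epoch's
    mean over the pre-onset baseline window."""
    signal = np.asarray(signal, dtype=float)
    pre, post = int(round(pre_s * fs)), int(round(post_s * fs))
    if pre <= 0:
        raise ValueError("a positive pre-onset baseline is required")
    epochs = []
    for onset in onsets_s:
        i = int(round(onset * fs))
        if i - pre < 0 or i + post > len(signal):
            continue                      # skip blocks that run off the recording
        epoch = signal[i - pre:i + post]
        epochs.append(epoch - epoch[:pre].mean())
    if not epochs:
        raise ValueError("no complete epochs in the recording")
    return np.mean(epochs, axis=0)

fs = 10                                   # hypothetical sampling rate (Hz)
onsets = [20, 60, 100, 140]               # hypothetical 20 s stimulus blocks, 20 s rest
signal = np.zeros(180 * fs)               # 180 s recording
for onset in onsets:
    signal[onset * fs:(onset + 20) * fs] = 1.0   # idealized response during each block
avg = block_average(signal, onsets, fs, pre_s=5, post_s=20)
```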
In [73], a technique is presented for the objective assessment of taste based on brain activity. The system was able to assess the subject's liking from changes in cerebral blood flow, focusing on activity in the frontal lobe. The subject's brain activity was measured using NIRS while drinking a beverage and subsequently verified against supporting data, thus establishing a relationship between physiological preferences and brain activity.
Frontal lobe dysfunction has been associated with the pathophysiology of eating disorders (ED). Prior research examined neural aspects of ED through neuroimaging studies applying symptom-related stimuli, such as body-image distortion and food; however, the results were inconsistent because of variations in task, stimulus and experimental design. To better understand frontal lobe dysfunction and its association with clinical indicators in ED, the NIRS study [74] examined frontal lobe functioning while subjects performed a cognitive task instead of a symptom-related task. The regional hemodynamic changes were considerably smaller in the ED group than in the healthy group in the bilateral orbitofrontal and right fronto-temporal areas. These changes were negatively associated with EAT-26 dieting scores in the right fronto-temporal regions, with binge-eating scores in the left orbitofrontal regions, and with eating restriction.
In [75], the researchers used NIRS to investigate hemodynamic changes in the brain in response to Motor Imagery (MI) and Motor Execution (ME) of swallowing. Prior studies have provided evidence that MI and ME of limb movements produce similar brain activation patterns, suggesting the potential value of MI for motor rehabilitation. Identifying the brain activation correlates of swallowing MI could therefore be useful for dysphagia treatment. In the study [75], 14 healthy subjects actively swallowed water (ME) and mentally imagined swallowing water (MI) in a random order while changes in oxy-Hb and deoxy-Hb concentration were recorded. The strongest NIRS signal changes were observed in the inferior frontal gyrus for both MI and ME.
In [76], the researchers reviewed the use of functional near infrared spectroscopy (fNIRS) to study human cortical processing of taste. fNIRS has some inherent shortcomings: it is essentially two-dimensional, restricted to the lateral cortical surface, and has a spatial resolution of 10-30 mm. The review showed how these technical obstacles have been overcome, for example through the availability of mapping software for standard brain coordinate systems, and concluded that fNIRS is a potential mediator between psychology and neuroscience.
In [77], the hypothesis is discussed that there are different postnatal sensitive periods during the development of sensory functions. It has been demonstrated through the human brain mapping approach (HBA) that the healthy postnatal and subsequent development of any sensory function matches the morphological and functional development of the CNS (Central Nervous System). However, it was still unknown [77] whether prior exposure to certain odors and tastes has a stronger effect on the subsequent perception of olfactory and/or gustatory stimuli.
The research in [78] was based on the hypothesis that a hyperactive reward system partly mediates the enhanced motivational influence of food in obese individuals. The researchers used fMRI to study activation of the reward system and related brain areas in response to photographs of high- and low-calorie foods in 12 obese and 12 normal-weight female subjects, and compared the results. They found that photographs of high-calorie foods produced greater activation in the associated brain areas in the obese women than in the normal-weight women.
In another study [79], fMRI was used to observe cerebral responses in 13 healthy adult females while they viewed color pictures of high-calorie foods, low-calorie foods, and inedible dining-related items such as cutlery. Both food categories were associated with bilateral activation of the amygdala and ventromedial prefrontal cortex, with high-calorie foods producing a greater effect.
In [80], the researchers explored whether the same neural circuits are engaged during the anticipatory and consummatory phases of food reward, using food odors. The experiments revealed that the amygdala and mediodorsal thalamus respond preferentially to food odors that predict the immediate availability of the corresponding drink, as opposed to odors that predict the availability of a tasteless solution.
Motor imagery (MI) is one of the standard paradigms of brain-computer interfaces (BCI); one such study using EEG is reported in [81]. In MI, users generate induced neural activity by imagining motor movements; this differs from our work, in which obese and control subjects are presented with visual stimuli that do not involve MI. In [82], an EEG study is presented for drowsiness detection, analyzing the relationship between drowsiness and napping based on physiological signals during a short daytime nap; in our work, by contrast, subjects respond to visual stimuli while fully awake. Regularization is used to prevent overfitting in electroencephalogram (EEG) classification for brain-computer interfaces (BCIs); in [83], sparse Bayesian classification with regularization is used to avoid overfitting. Although our work also uses Bayesian inference, it is not concerned with curve fitting. In [84], a study of steady-state visually evoked potentials (SSVEP) using EEG is considered. SSVEPs are natural responses to visual stimuli presented at specific frequencies: the brain produces electrical activity at the same frequency as the visual stimulus (or its multiples). Unlike [84], in our work obese and control subjects are presented with high/low-calorie visual stimuli under fasting or satiated conditions without reference to any stimulus frequency.
A brief summary of fNIRS work similar to ours, in terms of food stimuli or food intake with functional lobe activation monitoring, is given in Table-2.
Based on the literature review, it can be concluded that most prior taste-related cerebral functional activity work is MRI-based, and the reported NIRS-based work is neither obesity-related nor longitudinal, nor does it involve predictive data mining. Thus, the proposed research methodology is novel and will add to the published body of knowledge.

Hypothesis
We hypothesize that the reward-system activation brain maps of the norbese (generated as per the proposed methodology), used in conjunction with the corresponding maps of the obese (along with their medical and psychiatric evaluations), can be used to predict weight gain in the norbese. In addition, the level of cerebral functional activation to high/low-calorie liquids and food (and non-food) stimuli could differentiate subjects who will later be successful or unsuccessful in losing extra weight and/or keeping the lost weight off. Figure-7 shows the proposed research methodology at the macro level. Figure-7 will now be explained very briefly, and in the subsequent sub-sections each operational block shown in the figure will be discussed.

Figure-7: NIRS experiment data collected from detectors, de-noised, filtered, paired t-testing done, stereotactic normalization done, activation maps generated and integrated with medical/hematology and questionnaire data for predictive data mining, and prediction results mapped to the control subject

An important component of the proposed research methodology is the subject. How many subjects should be considered to form a representative sample? What should their age group and gender be? What margin of error is acceptable, what should the confidence level be, and how much variance in the results is expected? Based on the answers to these questions, the control parameters of the NIRS device are set and the experimental design, including supporting data collection, is carried out; this is a key part of the research methodology. Once the experiment is designed and streamlined after trial runs, the actual experiments will be performed and the neural activation data of the subjects captured using a multichannel NIRS device.
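The margin-of-error, confidence-level and variance questions above can be turned into a sample size with the standard formula for estimating a mean, n = (z·σ / E)², rounded up. The sketch that follows assumes, purely for illustration, a BMI standard deviation of 4 kg/m² and a desired margin of 1 kg/m²; testing the paired fasting/satiated hypothesis would instead call for a power analysis, which this sketch does not perform.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma: float, margin: float, confidence: float = 0.95) -> int:
    """Subjects needed to estimate a mean to within +/- margin at the given confidence."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # two-sided critical value (1.96 for 95%)
    return math.ceil((z * sigma / margin) ** 2)

# Hypothetical inputs: BMI standard deviation 4 kg/m^2, margin of error 1 kg/m^2
n_needed = sample_size_for_mean(sigma=4.0, margin=1.0)          # 62 subjects
n_tighter = sample_size_for_mean(sigma=4.0, margin=0.5)         # halving the margin: 246
```

Halving the margin of error roughly quadruples the required number of subjects, which is why these choices have to be settled before the experimental design.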

High-level methodology
Data recorded from an operational NIRS system are inherently noisy, containing components corresponding to heartbeat, blood pressure, breathing, etc.; thus, prior to use, the data need to be cleansed and de-noised. De-noising can be done by filtering out undesired frequency components, for example with band-pass filters. The NIRS data resulting from the experiments will not be used in isolation from the subjects' hematology (RBC, WBC, etc.) and medical data, for which laboratory tests are performed. Furthermore, the eating and related habits of the subjects also need to be considered, which requires responses to questionnaires such as the EAT (Eating Attitudes Test).
Once all the required data have been collected and transformed, they are integrated for predictive data mining. The data collected during the first year of the longitudinal experiments will be used to short-list those norbese subjects who are predicted to have a potential for obesity, and these predictions will be validated against the actual results during the second year of the longitudinal study.

Supporting data collection
Prior to the fNIRS experiments, important medical and chemical profiling of the subjects will be done, such as blood glucose, blood cholesterol and blood pressure (BP). The NIRS experiment will be followed by presenting questionnaires to the subjects (MCQs, Likert-scale questions and open-ended questions), or by using standard instruments such as the mini-mental state examination (MMSE), which are extensively used to assess the cognitive deficiencies relevant to diagnosis and prediction; the corresponding results will be used in conjunction with the NIRS results. These data will be used by psychiatrists in subsequent interviews to better understand subject behavior and to correlate the responses with the NIRS results. Fig-8 shows sample Chemistry, Hematology and EAT data collected as part of trial runs of the proposed methodology. These data will be further evaluated by professional psychiatrists and physicians.

Calorie consumption and stimulus presentation
To measure brain activation under calorie consumption, we will use the standard calorie values of the drinks/liquids [89], as shown in Table-1. Brain activity in response to calories will be observed with the NIRS device after consumption of liquids with low, medium and high calorie levels based on published data. The experiments involve presenting visual stimuli consisting of HCF, LCF and NF items to norbese, control and obese subjects in a fasted state, then repeating the experiments after breaking the fast with a low-calorie drink and, in another set of experiments, after breaking the fast with a high-calorie drink. Fig-9 shows sample public-domain pictures of some of the HCF, LCF and NF items considered for use in the proposed research methodology. The design of the experiment is presented in Fig-10. Each set will be composed of X sec of task (viewing the pictures) followed by Y sec of rest (viewing a blank computer screen). By arranging three sets for each level in pseudo-random order, a total of Z sets of experiments are proposed to be conducted. The values of X, Y and Z are to be determined as part of the research methodology, taking into account the cerebral blood flow response time.
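The set structure above (X sec task, Y sec rest, three sets per level in pseudo-random order) can be sketched as a small schedule generator. This is a minimal sketch assuming that "level" means the stimulus type (HCF, LCF, NF); the 30 s task and rest durations are hypothetical stand-ins for the undetermined X and Y, and the seed only makes the pseudo-random order reproducible.

```python
import random

STIMULUS_TYPES = ("HCF", "LCF", "NF")  # high-calorie food, low-calorie food, non-food


def build_block_schedule(task_sec, rest_sec, sets_per_type=3, seed=None):
    """Return a pseudo-randomly ordered list of (stimulus, task_sec, rest_sec) sets.

    Each stimulus type appears `sets_per_type` times; every set is a task
    period (viewing pictures) followed by a rest period (blank screen).
    """
    rng = random.Random(seed)
    order = [stim for stim in STIMULUS_TYPES for _ in range(sets_per_type)]
    rng.shuffle(order)
    return [(stim, task_sec, rest_sec) for stim in order]


def total_duration(schedule):
    """Total session length in seconds (task plus rest for every set)."""
    return sum(task + rest for _, task, rest in schedule)


# Hypothetical X = 30 s and Y = 30 s; the text leaves X, Y and Z open.
schedule = build_block_schedule(task_sec=30, rest_sec=30, seed=1)
for stim, task, rest in schedule:
    print(f"{stim}: {task} s task, {rest} s rest")
print("sets:", len(schedule), "duration (s):", total_duration(schedule))
```

Fixing the seed per session allows the exact presentation order to be logged and reproduced when the NIRS recordings are later aligned with the stimuli.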

Block design paradigm
From the viewpoint of stimulus presentation and liquid consumption (as in our case), there are two main methods for designing NIRS experiments [90]. The first is block design, a traditional approach to experiment design that is familiar to statisticians. The second is typically called event-related design, with foundations in functional neurology studies, i.e., event-related potentials (ERP). Mixed or hybrid designs that combine features of both block and event-related designs are also possible. Traditionally, NIRS experiments use simple block designs, in which phases of rest (idle time) alternate with phases of task, or phases of different tasks alternate, e.g., low-calorie, high-calorie and non-food visual stimuli in this or any other particular order. Such repetition is necessary to gather sufficient data for a feasible statistical analysis.
The block design paradigm, also known as the "boxcar design" because of its "on-off" nature (shown as "up-down" in Fig-11), is suitable for a variety of statistical analyses. Block design experiments are simple to execute, with the stimulus (food pictures in our case) shown for a fixed duration. A simple example of such an experiment is drinking a measured volume of liquid before the task blocks, which alternate with rest blocks during which no picture is shown.
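The "on-off" boxcar can be written as a 1/0 vector sampled at the NIRS rate, which is the form in which a block design typically enters a statistical model. A minimal sketch, assuming the 26 ms sampling interval quoted later in this section and hypothetical 26 s task and rest blocks (chosen only because they divide evenly into 26 ms samples); in practice the boxcar is often also convolved with a hemodynamic response function, which is not shown here.

```python
import numpy as np

SAMPLING_INTERVAL_S = 0.026  # default NIRS sampling interval quoted in the text


def boxcar(task_sec, rest_sec, n_blocks, dt=SAMPLING_INTERVAL_S):
    """Build an on/off (1/0) boxcar: a task block then a rest block, repeated."""
    n_task = int(round(task_sec / dt))  # samples during stimulus presentation
    n_rest = int(round(rest_sec / dt))  # samples during the blank-screen rest
    one_block = np.concatenate([np.ones(n_task), np.zeros(n_rest)])
    return np.tile(one_block, n_blocks)


# Hypothetical 26 s task / 26 s rest, repeated three times.
reg = boxcar(task_sec=26, rest_sec=26, n_blocks=3)
print(reg.size, int(reg.sum()))
```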

Data de-noising and Filtering
Data in its raw form (e.g., from an operational system or a NIRS device) may not always be suitable for analysis, particularly for predictive data mining; hence the ETL (Extract, Transform, Load) phase is used, in which the data are cleansed and transformed to bring them into their most usable state. Data preparation (including transformation and cleansing) is thus a crucial task in the overall process of knowledge discovery, because different predictive data mining techniques perform differently depending on the preprocessing and transformation methods used.
NIRS sampling is done at the default interval of 26 msec (about 38.5 Hz), with the optodes placed on the skull as per the international 10-20 reference system [91]. Although noise is known to exist in the NIRS signal, some of it cannot be removed because almost no information about it is available; furthermore, what counts as "noise" depends to a large extent on the modeler's definition. Some of the noise encountered in the NIRS signal stems from the heartbeat (about 0.7 to 1.5 Hz), blood pressure or Mayer waves (about 0.1 Hz) and breathing (between 0.13 and 0.33 Hz). These intrinsic frequencies appear in the NIRS signal as noise [92]. To deal with such noise, filters are used to reduce its strength. NIRS device software tools provide several types of filters that selectively pass some frequency components and hold back others, as per the restrictions imposed by the analyst. Fig-12(a) shows the NIRS signal with extensive noise; here the oxygenated hemoglobin (oxy-Hb or HbO) signal is shown in red, deoxygenated hemoglobin (deoxy-Hb or HbR) in blue and their sum (HbT) in green. Owing to the heavy noise it is almost impossible to visually differentiate the three signals; they become comprehensible after selective filtering, as shown in Fig-12(b, c), with the corresponding de-noised results shown in Fig-12(d). Furthermore, without de-noising the corresponding files are very large, requiring extensive storage space and processing time.
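The effect of band-pass filtering on the physiological components listed above can be illustrated on a synthetic channel. This is a sketch, not the device software's filter: it uses an idealized FFT band-pass (zeroing every frequency bin outside the pass band) with NumPy only, whereas NIRS tools typically apply Butterworth or FIR filters (SciPy offers `scipy.signal.butter` with `sosfiltfilt` for that). The 0.01 to 0.08 Hz pass band, the 0.05 Hz "task" response and all amplitudes are hypothetical; the pass band must contain the task frequency set by the chosen X and Y.

```python
import numpy as np

FS_HZ = 1.0 / 0.026  # 26 ms sampling interval -> about 38.46 Hz


def fft_bandpass(signal, fs, low_hz, high_hz):
    """Idealized band-pass: zero every FFT bin outside [low_hz, high_hz]."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=signal.size)


# Synthetic 5-minute channel: a slow task response plus the physiological
# components listed in the text (amplitudes are made up for illustration).
t = np.arange(0, 300, 1.0 / FS_HZ)
task = np.sin(2 * np.pi * 0.05 * t)             # hypothetical task-locked response
cardiac = 0.8 * np.sin(2 * np.pi * 1.0 * t)     # heartbeat, ~0.7-1.5 Hz
mayer = 0.3 * np.sin(2 * np.pi * 0.1 * t)       # Mayer wave, ~0.1 Hz
breathing = 0.5 * np.sin(2 * np.pi * 0.25 * t)  # respiration, ~0.13-0.33 Hz
raw = task + cardiac + mayer + breathing

clean = fft_bandpass(raw, FS_HZ, low_hz=0.01, high_hz=0.08)
print("corr(raw, task):", round(float(np.corrcoef(raw, task)[0, 1]), 2))
print("corr(clean, task):", round(float(np.corrcoef(clean, task)[0, 1]), 2))
```

A brick-wall filter like this is convenient for illustration but can ring on real, non-stationary recordings, which is one reason smoother filter designs are preferred in practice.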

Brain activation mapping
As with the blood-oxygenation-level-dependent (BOLD) signal in fMRI, the hemodynamic (HbO/HbR) signal measured by NIRS is typically used to detect regional cerebral activity in response to a stimulus presentation, e.g., food pictures in our case. In a block design approach, the stimuli (i.e., the paradigm, in fNIRS terminology) are intended to detect activated and non-activated cerebral regions with high certainty. However, distorting noise introduced during NIRS data capture, together with interference from subject breathing, heartbeat and normal brain activity, makes this detection difficult. The classical method for topographic NIRS data analysis addresses this problem using a paired t-test, whose objective is to determine whether the concentration change between two states (in our case, "fasting" vs. "without fasting") is statistically significant. Many researchers currently use the t-test methodology [93][94][95], since it is simple and can therefore deliver a speedy assessment of the task. One of the most commonly used tools of this kind is HomER, a Matlab-based program [96].
Let $x_i$ be the HbO value with fasting and $y_i$ the HbO value without fasting for record $i$. To test the null hypothesis that the true means match, we calculate the t-score

$$t = \frac{\bar{d}}{\mathrm{SE}(\bar{d})}, \qquad \mathrm{SE}(\bar{d}) = \frac{s_d}{\sqrt{n}},$$

where $\bar{d}$ is the mean difference ($d_i = y_i - x_i$), $s_d$ the standard deviation of the differences, SE the standard error and $n$ the number of records. Under the null hypothesis, this statistic follows a t-distribution with $n - 1$ degrees of freedom. Comparing the t value with the $t_{n-1}$ distribution (e.g., by means of t-distribution tables) provides the p-value for the paired t-test.
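The paired t-score defined above can be computed for every NIRS channel at once, giving the raw values of a t-test map. A minimal sketch with NumPy, assuming the data are arranged as records (rows) by channels (columns); the HbO numbers are hypothetical. It returns t-values only; `scipy.stats.ttest_rel(non_fasting, fasting)` computes the same statistic (for y − x) together with p-values if SciPy is available.

```python
import numpy as np


def paired_t_map(fasting, non_fasting):
    """Channel-wise paired t-test.

    fasting, non_fasting: arrays of shape (n_records, n_channels) holding HbO
    values (x and y in the text) for the same records in both conditions.
    Returns (t-value per channel, degrees of freedom n - 1).
    A channel whose differences are all identical has s_d = 0 and yields inf/nan.
    """
    x = np.asarray(fasting, dtype=float)
    y = np.asarray(non_fasting, dtype=float)
    d = y - x                       # d_i = y_i - x_i
    n = d.shape[0]
    d_bar = d.mean(axis=0)          # mean difference per channel
    s_d = d.std(axis=0, ddof=1)     # sample standard deviation of the differences
    se = s_d / np.sqrt(n)           # standard error of the mean difference
    return d_bar / se, n - 1


# Hypothetical HbO values: 5 records (rows) x 2 channels (columns).
x = np.array([[1, 10], [2, 10], [3, 12], [4, 11], [5, 13]])  # with fasting
y = np.array([[2, 12], [3, 11], [5, 14], [5, 12], [7, 15]])  # without fasting
t_values, dof = paired_t_map(x, y)
print(t_values.round(3), dof)
```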
To map t-test values to the brain, anatomical normalization needs to be performed as per the Montreal Neurological Institute (MNI) standard template [97]; this is achieved by means of an affine transformation so that the test maps are globally aligned. Furthermore, the cortical region covered by each NIRS channel needs to be assessed using the Brodmann Area (BA) image and the Automated Anatomical Labeling (AAL) atlas [98].
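An affine normalization maps head coordinates into template space with a single 4×4 homogeneous matrix. The sketch below, assuming NumPy, shows both halves: estimating that matrix by least squares from corresponding landmarks (at least four non-coplanar points) and applying it to channel positions. The landmark and channel coordinates and the "true" matrix are hypothetical; real MNI registration of NIRS optodes is normally done with dedicated tools and the 10-20 landmarks.

```python
import numpy as np


def apply_affine(points_xyz, affine):
    """Map (n, 3) coordinates through a 4x4 homogeneous affine matrix."""
    pts = np.asarray(points_xyz, dtype=float)
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])  # (n, 4)
    return (homog @ np.asarray(affine, dtype=float).T)[:, :3]


def fit_affine(src_xyz, dst_xyz):
    """Least-squares 4x4 affine taking src landmarks onto dst landmarks.

    Needs at least 4 non-coplanar landmark pairs.
    """
    src = np.asarray(src_xyz, dtype=float)
    dst = np.asarray(dst_xyz, dtype=float)
    homog = np.hstack([src, np.ones((src.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(homog, dst, rcond=None)  # (4, 3)
    affine = np.eye(4)
    affine[:3, :] = coef.T
    return affine


# Hypothetical "true" transform (uniform scaling plus translation, in mm).
A_true = np.array([[1.1, 0.0, 0.0, 0.0],
                   [0.0, 1.1, 0.0, -5.0],
                   [0.0, 0.0, 1.1, 3.0],
                   [0.0, 0.0, 0.0, 1.0]])
landmarks = np.array([[0, 0, 0], [10, 0, 0], [0, 10, 0], [0, 0, 10], [5, 5, 5]], dtype=float)
mni_landmarks = apply_affine(landmarks, A_true)

A_est = fit_affine(landmarks, mni_landmarks)
channel_xyz = np.array([[20.0, 40.0, 30.0]])  # hypothetical channel midpoint
print(apply_affine(channel_xyz, A_est))
```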

Predictive data mining
The predictive aspect of data mining is probably its most mature aspect, offering the highest potential return and the most accurate description [99]. Predictive data mining techniques include (but are not limited to) Naïve Bayesian and weighted Bayesian learning, Support Vector Machines (SVM), Artificial Neural Networks (ANN) and Decision Trees.
In real-life data, the existence of noise (in regression) and class overlap (in classification) makes avoiding over-fitting of the training set the principal modelling challenge. In neurocognition, physiologically driven approaches are commonly employed for data reduction that incorporates spatial dependencies, for example by picking regions of interest (ROIs) in the brain, which in our case is the BRC. Selecting ROIs based on user input and a priori knowledge of the brain morphology linked with a given task [100] substantially reduces the input data [101]. Thus, sparsity is unlikely to be an issue in our case because of the brain's known functional anatomy [60, 61] and the well-defined 10-20 optode pattern [91]. Consequently, over-fitting because of class overlap is unlikely, and we have therefore not considered Sparse Bayesian learning in our work. For similar reasons, SVMs are not considered, as they make unnecessarily liberal use of basis functions: the required number of support vectors typically grows linearly with the size of the training set, so some form of post-processing is frequently necessary to reduce computational complexity [102,103]. Another reason for not considering sparse learning is a limitation of sparse learning systems: they tend to discover a minimal informative subset of channels rather than the complete set of channels implicated in the neural activity of the obese subject, making it necessary to post-process the results using a clustering algorithm [104], with a consequent increase in the computational complexity of the BCI solution.
Data recorded from an operational NIRS system are inherently noisy, containing noise corresponding to heartbeat, blood pressure and breathing, physiologic noise arising from faint brain rhythms, oscillations of metabolism-linked brain physiology, etc. Since C4.5 rules do not scale well for noisy and large datasets [105], they were not considered. Similarly, the main shortcoming of J48 is its run-time complexity, which does not scale well as the depth of the corresponding tree increases [106]. The dependability of Decision Trees relies on data quality, i.e., robustness is not one of their strengths: even a minor alteration or error in the input data can cause significant differences in the corresponding results. Another major restriction of decision trees is that the decisions/results are based on expectations, and a minor change in a rational expectation can cause a major variation in results [107]. Regression was not considered suitable for prediction because the technique only considers linear relationships between dependent and independent variables, which is hardly the case for complex real-life problems and may result in unrealistic outcomes [108]. Markov chain predictions are based on the current state and the subsequent transitions from it, so the current state is required to exactly match a known state in the model. If the current state has never been reached, no transitions from it are known, no future states can be reached, and therefore no prediction can be made.
After comparing Naïve Bayesian learning with the other predictive data mining techniques discussed above, we concluded that Naïve Bayes is a superior prediction technique for our application domain. Naïve Bayes does not suffer from the issues associated with the other techniques discussed, and it can also cater to specialized forms of unsupervised learning. Furthermore, Bayesian techniques provide a formalism for reasoning under uncertainty, in which degrees of belief are represented numerically and subsequently combined according to the rules of probability theory.

Prediction Engine
Applying Naïve Bayes modeling results in probability classes: when presented with an unclassified data item, the Naïve Bayes model assigns it, based on its posterior probability, to one of the possible class categories [109] (obese or non-obese in our case). These outcomes are then color-coded by setting probability class thresholds, and the corresponding cerebral activation prediction map is created. We now present a short statistical formulation of the Naïve Bayesian modeling used in the proposed framework. Bayes' theorem describes how the posterior probability that a hypothesis is true is updated by newly observed data.
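Color-coding by probability class thresholds amounts to binning each channel's posterior probability. A minimal sketch with `numpy.digitize`; the cut-offs (0.5, 0.7, 0.9) and color names are hypothetical, since the text leaves the actual thresholds to be established with clinicians.

```python
import numpy as np

# Hypothetical probability-class cut-offs and colours (not from the study).
CLASS_EDGES = [0.5, 0.7, 0.9]
CLASS_COLOURS = ["none", "light pink", "pink", "dark pink"]


def colour_code(posterior):
    """Assign each channel's posterior P(obese | data) to a colour class.

    np.digitize returns i such that CLASS_EDGES[i-1] <= p < CLASS_EDGES[i].
    """
    idx = np.digitize(np.asarray(posterior, dtype=float), CLASS_EDGES)
    return [CLASS_COLOURS[i] for i in idx]


print(colour_code([0.2, 0.5, 0.75, 0.95]))
```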
Consider the following formulation of Bayes' theorem. Given a class variable $y$ and a corresponding dependent feature vector $x_1$ through $x_n$, Bayes' theorem [110] expresses the following relationship:

$$P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}$$

In our case, $y$ is the cerebral channel response of an obese subject's BRC functional area map (say, the frontal lobe), while the feature vector consists of the corresponding channel values from the functional area maps of the same brain region of non-obese or control subjects for the three types of visual stimuli. Using the naive independence assumption, i.e., that the neural response to NF is independent of the responses to HC and LC visual stimuli,

$$P(x_i \mid y, x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = P(x_i \mid y)$$

for all $i$, this relationship simplifies to

$$P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)}$$

Since $P(x_1, \dots, x_n)$ is constant for the given visual stimuli, the following classification rule can be used:

$$P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y) \quad\Longrightarrow\quad \hat{y} = \arg\max_{y} P(y) \prod_{i=1}^{n} P(x_i \mid y)$$

Subsequently, Maximum A Posteriori (MAP) estimation can be used to estimate $P(y)$ and $P(x_i \mid y)$ from empirical data; the former is then the relative frequency of class $y$ in the training set. Figure-13 briefly explains the Naïve Bayes training and test algorithms w.r.t. obese subjects as used in our work; similar algorithms are used for the norbese. Note that in Fig-13 the state of a channel $i$ is considered true/false based on the cut-off threshold, as discussed in section-4.7.
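Because each channel state is true/false, the classification rule above corresponds to a Bernoulli Naive Bayes model. The sketch below is not the algorithm of Fig-13 (which is not reproduced here) but a minimal NumPy version of the same rule under stated assumptions: features are binary channel states obtained by thresholding t-values at a hypothetical cut-off, labels are 1 = obese and 0 = non-obese, P(y) is the class relative frequency, and P(x_i | y) uses add-one (Laplace) smoothing. The training data are made up. scikit-learn's `sklearn.naive_bayes.BernoulliNB` implements essentially the same model.

```python
import numpy as np


def channel_states(t_values, cutoff):
    """True where a channel's t-value exceeds the (hypothetical) cut-off."""
    return np.asarray(t_values, dtype=float) > cutoff


class BernoulliNaiveBayes:
    """Naive Bayes over binary channel states."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing; 1.0 is the classic add-one estimator

    def fit(self, X, y):
        X = np.asarray(X, dtype=bool)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # P(y): relative frequency of each class in the training set
        self.log_prior_ = np.log(np.array([np.mean(y == c) for c in self.classes_]))
        # P(x_i = True | y), smoothed so that no estimate is exactly 0 or 1
        p = np.array([
            (X[y == c].sum(axis=0) + self.alpha) / ((y == c).sum() + 2 * self.alpha)
            for c in self.classes_
        ])
        self.log_p_ = np.log(p)
        self.log_not_p_ = np.log1p(-p)
        return self

    def joint_log_likelihood(self, X):
        X = np.asarray(X, dtype=bool).astype(float)
        # log P(y) + sum_i log P(x_i | y) for every class (shape: samples x classes)
        return self.log_prior_ + X @ self.log_p_.T + (1 - X) @ self.log_not_p_.T

    def predict(self, X):
        """MAP rule: pick the class with the largest joint log-likelihood."""
        return self.classes_[np.argmax(self.joint_log_likelihood(X), axis=1)]


# Hypothetical training set: 6 subjects x 4 channel states.
X_train = np.array([[1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0],
                    [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1]])
y_train = np.array([1, 1, 1, 0, 0, 0])  # 1 = obese, 0 = non-obese

model = BernoulliNaiveBayes().fit(X_train, y_train)
print(model.predict([[1, 1, 0, 0], [0, 0, 1, 1]]))
```

Working with log-probabilities in `joint_log_likelihood` anticipates the underflow issue discussed under complexity and implementation efficiency.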

Complexity Analysis and Implementation Efficiency
The worst-case time complexity of training a Naive Bayes classifier is O(Np), where N is the number of training trials and p is the number of features; in our case the features come from four functional region maps of the brain, each consisting of 20 NIRS channels (p = 4 × 20 = 80). The worst-case space complexity of a Naive Bayes classifier is O(pqr), where p is the number of features, q is the number of values for each feature (i.e., the NIRS time-series data of each channel) and r is the number of alternative values of the class.
To classify an instance of unknown class, Bayes' theorem is first used to approximate the probability of the instance's membership in the obese class and then the probability of its membership in the non-obese class. The obese-class value is subsequently normalized w.r.t. the sum over both classes to produce an obesity confidence value between 0.0 and 1.0. Note that the denominator of Bayes' theorem can be ignored, as it cancels out in the normalization step. From an implementation point of view, as the number of attributes grows the numerator tends to become very small, since many tiny probabilities are multiplied with each other, and the resulting small number can underflow finite-precision floating-point arithmetic. A possible solution is to convert all probabilities to logarithms and to perform addition in place of multiplication. Another issue is avoiding zero conditional probabilities, which is commonly done with a "Laplace estimator" that adds one to each count so that no estimated probability is exactly zero.
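The underflow problem and its log-space fix can be demonstrated directly. In this sketch the numerators use 80 features (four regions of 20 channels), and all probability values are hypothetical: the direct products underflow to 0.0, so the normalization 0/0 is undefined, while subtracting the larger log-numerator before exponentiating recovers the confidence value. A helper for the add-one (Laplace) estimate is included; the function names are illustrative only.

```python
import math


def obesity_confidence(log_num_obese, log_num_not_obese):
    """Normalize two log-numerators into P(obese) in [0, 1] without underflow."""
    m = max(log_num_obese, log_num_not_obese)  # shift so the larger term is exp(0) = 1
    a = math.exp(log_num_obese - m)
    b = math.exp(log_num_not_obese - m)
    return a / (a + b)


def laplace_estimate(count, total, n_values=2):
    """Add-one (Laplace) estimate of a conditional probability."""
    return (count + 1) / (total + n_values)


# Hypothetical numerators over 80 features (4 regions x 20 channels).
prior = 0.5
probs_obese = [1e-5] * 79 + [3e-5]
probs_not_obese = [1e-5] * 80

direct_obese = math.prod([prior] + probs_obese)    # underflows to 0.0
direct_not = math.prod([prior] + probs_not_obese)  # underflows to 0.0

log_obese = math.log(prior) + sum(math.log(p) for p in probs_obese)
log_not = math.log(prior) + sum(math.log(p) for p in probs_not_obese)
print(direct_obese, direct_not, obesity_confidence(log_obese, log_not))
print(laplace_estimate(0, 10))  # a zero count no longer gives probability 0
```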

Longitudinal Study
Longitudinal studies are used to describe normal growth and aging, to evaluate the effect of risk factors on human health, and to gauge the effectiveness of treatments [111]. A longitudinal study generally results in several, or "repeated", measurements of the same metrics for each subject over a period of time.
The longitudinal study is the key part of the proposed research methodology. Repeating the experiments longitudinally during the second year would allow studying the relationships or changes between cerebral hemodynamic activation, weight, sugar and cholesterol levels, etc. for obese, control and norbese subjects, consequently improving the quality of the prediction results.
The proposed longitudinal experiment will consist of the weekly activities shown in Fig-14, repeated over a period of 24 months. One week of activity will consist of the three categories of subjects (control, norbese and obese) breaking their fast by consuming low-calorie or high-calorie drinks. The fast will be broken in the morning by consuming the liquid. The subjects will be presented with visual stimuli consisting of HCF, LCF and NF items before and after breaking the fast on a particular day. Fig-14 shows the experiment design only for control subjects; the same process will be repeated for obese and norbese subjects. In Fig-14, N is the number of subjects, which is to be established as discussed in section-3.3. The longitudinal experiment plan is based on a single multi-channel NIRS device and 5 subjects in each of the three categories. Based on prior experience, it is practical to scan the brains of 5 subjects in the morning in the fasting condition using one type of visual stimulus, i.e., either HC, LC or NF.
The same subjects will subsequently break the fast with an LC drink and have their brains rescanned the same day with the same visual stimulus as in the morning. The next day, the same 5 subjects will have their brains rescanned under fasting in the morning with the same visual stimulus as before and, after breaking the fast with an HC drink, scanned again the same day using the same type of visual stimulus. Thus, three days are required to scan the brains of one type of subject using the same type of visual stimulus.
Since each scan consists of mapping four regions of the brain, the maps generated for one type of subject (say, 5 obese subjects) using the same stimulus over three days amount to 120 (= 3 × (5 × 4 + 5 × 4)) activation maps. For three types of visual stimulus, nine days will therefore be required, resulting in 360 maps. Scanning all three types of subjects for all three types of visual stimulus, with and without fasting and with two types of drinks, will thus require 27 days, resulting in 1,080 (360 × 3) maps. Considering a 5-day week, one cycle will take 5.4 weeks (27/5). For a 52-week year we expect at most 10 such cycles (52/5.4 ≈ 9.6), resulting in at most 10,800 maps. Thus, there are a total of 7,200 training trials (3,600 obese plus 3,600 norbese) and 3,600 test trials (control).
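The scan-count arithmetic can be checked mechanically from the quantities stated in the plan. The constants below are taken from the text (5 subjects per group, four brain regions, three days per stimulus, three stimuli, three groups, a 5-day working week); the 52-week year is the only added assumption. Recomputing from these inputs gives 27/5 = 5.4 weeks per cycle and about 9.6 cycles per year, i.e., an upper bound of 10 cycles.

```python
import math

SUBJECTS_PER_GROUP = 5
REGIONS = 4              # frontal, two temporal, occipital
DAYS_PER_STIMULUS = 3    # as stated in the text
STIMULI = 3              # HC, LC, NF
GROUPS = 3               # obese, control, norbese
WORKING_DAYS_PER_WEEK = 5
WEEKS_PER_YEAR = 52      # assumption: calendar year

maps_per_day = SUBJECTS_PER_GROUP * REGIONS * 2         # fasting scan + post-drink scan
maps_per_stimulus = DAYS_PER_STIMULUS * maps_per_day    # one group, one stimulus
maps_per_group = STIMULI * maps_per_stimulus            # one group, all stimuli
days_per_cycle = DAYS_PER_STIMULUS * STIMULI * GROUPS
maps_per_cycle = GROUPS * maps_per_group
weeks_per_cycle = days_per_cycle / WORKING_DAYS_PER_WEEK
cycles_per_year = WEEKS_PER_YEAR / weeks_per_cycle
max_cycles = math.ceil(cycles_per_year)                 # upper bound ("at most")
max_maps = max_cycles * maps_per_cycle
training_trials = max_maps * 2 // GROUPS                # obese + norbese
test_trials = max_maps // GROUPS                        # control

print(maps_per_stimulus, maps_per_group, days_per_cycle, maps_per_cycle)
print(round(weeks_per_cycle, 2), round(cycles_per_year, 2), max_cycles)
print(max_maps, training_trials, test_trials)
```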

Cross Validation
As per [83,84], cross-validation (CV) imposes two main restrictions on brain computer interfaces (BCIs): 1) a sizeable volume of supplementary training data is needed from the user to validate parameters, and 2) calibrating the classifier takes a comparatively long time and incurs a comparatively high computational cost. These restrictions, together with the subject set-up time for NIRS recordings, substantially reduce the system's feasibility and may make a user unwilling to use BCIs. Finally, when data-level variation is high, the predictive error can be mostly due to this variation, so vastly different scenarios can have similar levels of CV error. Therefore, CV was not considered suitable.

RESULTS
In this section, simulated prediction results using Naïve Bayesian modeling are presented and discussed. SVM or any other predictive data mining technique could also be used, but that would require formulating and mapping the problem accordingly. Fig-15(a) shows the placement of 6 pairs of optodes on the frontal lobe of the subject; here red squares correspond to the emitters and blue squares to the NIRS detectors. A visual stimulus is presented to the subject under fasting and the HbO and HbR (i.e., oxy and deoxy) values are recorded. The fast is then broken with a liquid of a certain calorie content, the same visual stimulus is presented to the same subject, and HbO and HbR are recorded again. This process is repeated for all the subjects of a given category and the average is taken for each subject. After data cleansing and pre-processing as per Fig-7, a paired t-test is conducted (p < 0.05 was deemed statistically significant), and the resulting t-values are color-coded and spatially mapped to the brain surface after stereotactic normalization; one pair of such maps is shown in Fig-15(b).
Note that the HbT map could also have been used instead of the HbO and HbR maps. Fig-15(b) shows the paired t-test cerebral activation map for the frontal lobe; activation maps are similarly generated for the two temporal lobes and the occipital lobe. For the purpose of prediction, we consider the paired t-test cerebral blood activation maps of the control and obese subjects for the same experiment, i.e., the same visual stimulus, the same type of drink and the same satiation state (fasting or non-fasting). These stereotactically normalized reward surface region activation maps for each region of the control subject are treated as "data layers", as shown in Fig-16. As shown in Figure-4, the full-head NIRS optode cap supports simultaneous data collection from 44 NIRS channels; since the brain reward system is functionally localized, not all 44 channels may be necessary. Therefore, as shown in Fig-15, each BRC functional map consists of 20 NIRS channels, i.e., four sets of maps of 20 channels each.
Four sets of maps will be considered for each of the four brain surface regions, and for each region the maps for the three visual stimuli with and without fasting are used. The obese subject's paired t-test frontal lobe map, generated without visual stimulus (shown with a blue border in Fig-16), is used as the reference for generating the posterior probability cerebral activation map of the control subject in order to predict obesity, as discussed in sections 3.6 and 3.7. The resulting cerebral activation prediction map, obtained with the accepted cut-off threshold for a positive probability of obesity, is shown at the top of the data layers (Fig-16), where the regions of cerebral activity predicted for the control subject w.r.t. the obese subject are shown in pink. Note that the posterior probability cut-off threshold for a positive case of obesity in the prediction map is to be established in consultation with physicians and psychiatrists using the supporting data (Fig-7). Finally, the prediction results for the control subjects are to be verified in the second year of the longitudinal study by comparing them with the actual cerebral activation t-test values of the same control subjects, so as to establish the accuracy of the proposed technique. If the comparison shows close agreement, i.e., high accuracy of the proposed technique, preventive measures need to be taken to avoid obesity in the identified norbese.

CONCLUSIONS
Behavioral studies suggest that overeating in obese individuals is caused by inflated brain activity in response to stimuli related to high-calorie foods [2]. Based on this fairly recent concept of the effect of obesity on the brain, in this paper we have presented details of a proposed novel 24-month longitudinal NIRS-based research methodology that considers obese, control and norbese (normal subjects fed high calorie food) subjects for caloric-centered visual stimulus NIRS time-series data collection, followed by generation of their paired t-test, brain-reward-centric cerebral activation maps. We have shown that Naïve Bayes predictive data mining can be used in a spatiotemporal arrangement to predict which norbese subjects have the potential for obesity. Due to the extensive scope and complexity of the proposed methodology, it would be pragmatic to undertake its implementation as a full-time funded R&D project. The cerebral functional activity data bank generated as a result of the proposed methodology is to be made publicly available; this data bank could be of significant academic and research value. As future work, we propose adapting the NIRS research methodology to an fMRI-based study, i.e., a multi-modal approach, and comparing the fMRI and NIRS results to identify and better understand the strong brain activation regions that mediate motivational and emotional responses to food stimuli in obese subjects, in order to improve the quality of the prediction results.

CONFLICT OF INTEREST
Ahsan Abdullah declares that he has no conflict of interest. Amir Hussain declares that he has no conflict of interest. Imtiaz Hussain declares that he has no conflict of interest.

ETHICAL APPROVAL
This article does not contain any studies with human participants performed by any of the authors.

ACKNOWLEDGEMENT
Professor A. Hussain was supported by the UK Engineering and Physical Sciences Research Council (EPSRC) grant no. EP/M026981/1. We also wish to thank the anonymous reviewers who helped improve the quality of the paper.