MODELING OF THE NUMBER OF TUBERCULOSIS CASES IN INDONESIA

One of the health targets of the Sustainable Development Goals (SDGs) is to end the tuberculosis epidemic by 2030. In 2018, Indonesia had the third-highest number of tuberculosis cases in the world, after India and China. This study aims to model the number of tuberculosis cases in each province of Indonesia according to the characteristics of each region. Geographically Weighted Lasso (GWL) is a method used to overcome the local multicollinearity that appears in the Geographically Weighted Regression (GWR) model. With this method, each region has its own regression model according to its characteristics. Local multicollinearity (VIF > 10) is present in every explanatory variable used. Banten, West Java, South Kalimantan, East Kalimantan, East Nusa Tenggara and Papua are the provinces where all research variables affect the number of tuberculosis cases. The variable with the most significant effect on the number of tuberculosis cases across the regions of Indonesia is the number of health centers. Therefore, to end tuberculosis, the government should increase the number of health centers and improve health services.


BACKGROUND
One of the health targets of the Sustainable Development Goals (SDGs) is to end the tuberculosis epidemic by 2030. This is because tuberculosis is the second-highest cause of death from a single infectious agent, after HIV/AIDS (Fogel, 2015). A World Health Organization (WHO) report on tuberculosis states that in 2018 Indonesia had the third-highest number of tuberculosis cases in the world, after India and China (WHO, 2019).
The distribution of tuberculosis cases across the provinces of Indonesia follows different patterns depending on the characteristics of each region. In 2018, the provinces with the highest numbers of tuberculosis cases were West Java, Central Java and East Java, three provinces that are located close to one another. This indicates regional dependence in the number of tuberculosis cases (Firdaus, 2014).
Previous research used traditional models (linear regression, logistic regression and negative binomial regression) to explain the relationship between tuberculosis prevalence and several factors in general. However, these methods cannot explain spatial heterogeneity (Sun et al., 2015), because they are best suited to cases where the relationship between the predictor variables and the response variable does not depend on location, that is, where it is stationary (Fotheringham et al., 2002). Therefore, research on the prevalence of tuberculosis that takes regional factors into account is needed.
Geographically Weighted Regression (GWR) is one method for handling the spatial heterogeneity caused by differences in location and conditions between regions. Geographically Weighted Lasso (GWL) is used to overcome the local multicollinearity that can appear in the GWR model. By using GWL, each region has its own regression model according to its characteristics.
A previous study by Wheeler (2009) showed that the estimation error of the GWL model is smaller than that of the GWR model. Another study used the same method to model poverty in Java and found that GWL performs better than GWR on spatial data that contain multicollinearity (Setiyorini, 2017). Based on these studies, this research models the number of tuberculosis cases in Indonesia according to the characteristics of each region.

RESEARCH METHOD
The data used in this study are taken from publications of the Indonesian Ministry of Health and Statistics Indonesia. There are seven explanatory variables: the percentage of the poor population, population density, the percentage of households with a per capita floor area < 7.2 m² (Indonesia, 2019a, 2019b, 2019c), the percentage of districts/cities that have a clean and healthy behavior policy, the percentage of slum households, the proportion of the population aged > 10 years who smoke every day and the number of health centers (Indonesian Ministry of Health, 2018, 2019a, 2019b). The observation units are the 34 provinces of Indonesia. The response variable is the number of tuberculosis cases in each province, modeled using the seven explanatory variables.

Spatial Dependence and Spatial Heterogeneity
The characteristics of spatial data are the presence of spatial dependence and spatial heterogeneity (Anselin, 1988). Spatial dependence means that observations located close to one another tend to be similar; it is measured to see whether observations at one location affect observations at other nearby locations. Spatial heterogeneity, meanwhile, refers to differences in characteristics between one region and another (Fotheringham et al., 2002). Spatial dependence was measured using Moran's I coefficient, and the presence of spatial heterogeneity was tested with the Breusch-Pagan test (Anselin, 1988).
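Both diagnostics can be sketched with NumPy as follows. This is a minimal illustration, not the software used in the study. It assumes a user-supplied spatial weight matrix `W` with a zero diagonal (which weights were used for Moran's I is not stated here). It computes the original Breusch-Pagan LM statistic, which is half the explained sum of squares from regressing scaled squared OLS residuals on the regressors. Turning either statistic into a p-value (a standard normal reference for standardized Moran's I, a chi-square with `df` degrees of freedom for Breusch-Pagan) is left out. The names `morans_i` and `breusch_pagan` are hypothetical.

```python
import numpy as np


def morans_i(x: np.ndarray, W: np.ndarray) -> float:
    """Global Moran's I for values x under spatial weights W (zero diagonal)."""
    n = len(x)
    z = x - x.mean()          # deviations from the mean
    s0 = W.sum()              # S0: sum of all spatial weights
    return float((n / s0) * (z @ W @ z) / (z @ z))


def breusch_pagan(y: np.ndarray, X: np.ndarray) -> tuple[float, int]:
    """Original Breusch-Pagan LM statistic and its degrees of freedom.

    1. Fit OLS of y on X (an intercept is added).
    2. Scale the squared residuals by their mean: g_i = e_i^2 / (e'e / n).
    3. Regress g on the same design; the statistic is half the explained
       sum of squares, referred to chi-square with df = number of regressors.
    """
    n = len(y)
    Z = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e = y - Z @ beta
    g = e**2 / (e @ e / n)
    gamma, *_ = np.linalg.lstsq(Z, g, rcond=None)
    ess = np.sum((Z @ gamma - g.mean()) ** 2)
    return float(ess / 2), Z.shape[1] - 1


# Hypothetical example: four locations on a line, each a neighbor of the next.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(morans_i(np.array([1.0, 2.0, 3.0, 4.0]), W))    # ≈ 0.333, positive dependence
print(morans_i(np.array([1.0, -1.0, 1.0, -1.0]), W))  # ≈ -1.0, alternating pattern
```

Positive values of Moran's I indicate that similar values cluster among neighbors. Values near its expectation of -1/(n-1) under no spatial dependence indicate randomness.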

Geographically Weighted Lasso (GWL)
Geographically Weighted Lasso (GWL) is a technique that applies the lasso approach within the GWR model. The GWL model depends on the weighting function used. The weighting function used in this study is the fixed exponential kernel, written as follows (Fotheringham et al., 2002):
$$w_{ij} = \exp\left(-\frac{d_{ij}}{h}\right), \qquad d_{ij} = \sqrt{(u_i - u_j)^2 + (v_i - v_j)^2} \tag{1}$$
where $d_{ij}$ is the Euclidean distance between location $(u_i, v_i)$ and location $(u_j, v_j)$, and $h$ is the bandwidth, which is fixed (the same) at all locations.
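As an illustration, the kernel can be computed for all pairs of locations at once. This sketch assumes the exponential form w_ij = exp(-d_ij / h) and treats the coordinates as planar (u, v) values. Applying Euclidean distance directly to latitude and longitude is only an approximation. `exponential_kernel_weights` is a hypothetical helper name.

```python
import numpy as np


def exponential_kernel_weights(coords: np.ndarray, h: float) -> np.ndarray:
    """Fixed exponential kernel weights w_ij = exp(-d_ij / h) (assumed form).

    coords: (n, 2) array of planar coordinates (u_i, v_i), one row per location.
    h: bandwidth, identical for every location (a "fixed" kernel).
    Returns an (n, n) matrix; row i holds the weights for the local model at i.
    """
    diff = coords[:, None, :] - coords[None, :, :]   # pairwise coordinate differences
    d = np.sqrt(np.sum(diff**2, axis=-1))             # Euclidean distances d_ij
    return np.exp(-d / h)


coords = np.array([[0.0, 0.0], [3.0, 4.0]])   # two locations 5 units apart
W = exponential_kernel_weights(coords, h=5.0)
print(W[0, 1])                                 # exp(-1) ≈ 0.368
```

Because exp(-d/h) never reaches zero, every province contributes to every local model. Its influence decays with distance, and a larger h flattens that decay.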
The choice of the optimum bandwidth affects the accuracy of the parameter estimates. One method that can be used is cross-validation (CV), written as follows:
$$CV(h) = \sum_{i=1}^{n} \left(y_i - \hat{y}_{\neq i}(h)\right)^2 \tag{2}$$
where $\hat{y}_{\neq i}(h)$ is the fitted value of $y_i$ obtained with bandwidth $h$ when observation $i$ is omitted from the calibration. The optimum bandwidth is the one that produces the smallest CV over the iteration process (Fotheringham et al., 2002).
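A leave-one-out implementation of this criterion might look like the sketch below. To keep it short, the local fit is weighted least squares (an ordinary GWR fit) rather than a weighted lasso. A plain grid search over hypothetical candidate bandwidths stands in for the iterative search described above. The exponential kernel form exp(-d/h) is also an assumption, and all function names are hypothetical.

```python
import numpy as np


def exponential_kernel_weights(coords: np.ndarray, h: float) -> np.ndarray:
    """Assumed fixed exponential kernel: w_ij = exp(-d_ij / h)."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.exp(-np.sqrt(np.sum(diff**2, axis=-1)) / h)


def cv_score(y: np.ndarray, X: np.ndarray, coords: np.ndarray, h: float) -> float:
    """CV(h) = sum_i (y_i - yhat_{!=i}(h))^2 using local weighted least squares."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X])      # design matrix with intercept
    W = exponential_kernel_weights(coords, h)
    total = 0.0
    for i in range(n):
        w = W[i].copy()
        w[i] = 0.0                            # leave observation i out of its own fit
        ZtW = Z.T * w                         # Z^T diag(w) via broadcasting
        beta = np.linalg.solve(ZtW @ Z, ZtW @ y)
        total += (y[i] - Z[i] @ beta) ** 2    # squared leave-one-out prediction error
    return float(total)


def select_bandwidth(y, X, coords, candidates):
    """Return the candidate bandwidth with the smallest CV score, plus all scores."""
    scores = {h: cv_score(y, X, coords, h) for h in candidates}
    best = min(scores, key=scores.get)
    return best, scores
```

With real data, `coords` would hold one projected coordinate pair per province, and `candidates` would span bandwidths in the same distance units.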
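One way to detect local multicollinearity is to calculate the local VIF value, formulated as follows (Wheeler, 2007):
$$VIF_k(i) = \frac{1}{1 - R_k^2(i)} \tag{3}$$
where $R_k^2(i)$ is the coefficient of determination at location $i$ from the geographically weighted regression of explanatory variable $x_k$ on the other explanatory variables; a value above 10 indicates multicollinearity. In the lasso, the coefficients are estimated by minimizing the residual sum of squares subject to the constraint $\sum_{k=1}^{p} |\beta_k| \le s$. Equivalently, a penalty $\lambda \sum_{k=1}^{p} |\beta_k|$ is added to the residual sum of squares; hence there is a direct correspondence between the parameters $s$ and $\lambda$ (Tibshirani, 1996). Efron et al. developed an algorithm that can compute lasso solutions, called the LARS (Least Angle Regression) algorithm (Efron, 2004).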
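The two ingredients at a single location i can be sketched as follows: a local VIF computed from a weighted regression of each explanatory variable on the others, and a weighted lasso fit for that location. Two substitutions should be flagged. First, the weighted R² used in `local_vif` is one reasonable reading of the local VIF, not necessarily the exact computation in the study. Second, the text cites the LARS algorithm. This sketch uses cyclic coordinate descent instead, a different algorithm that minimizes the same penalized objective for a fixed penalty `lam`. The value of `lam` is a hypothetical input, and both function names are hypothetical.

```python
import numpy as np


def local_vif(X: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Local VIF_k(i) = 1 / (1 - R_k^2(i)) for every column k at one location i.

    X: (n, p) explanatory variables; w: (n,) geographic weights for location i.
    R_k^2(i) comes from a weighted regression of column k on the other columns.
    """
    n, p = X.shape
    sw = np.sqrt(w)
    vif = np.empty(p)
    for k in range(p):
        target = X[:, k]
        others = np.column_stack([np.ones(n), np.delete(X, k, axis=1)])
        coef, *_ = np.linalg.lstsq(others * sw[:, None], target * sw, rcond=None)
        rss = np.sum(w * (target - others @ coef) ** 2)
        tss = np.sum(w * (target - np.average(target, weights=w)) ** 2)
        vif[k] = tss / rss          # since 1 - R^2 = rss / tss
    return vif


def weighted_lasso(X, y, w, lam, max_iter=10_000, tol=1e-10):
    """Minimize 0.5 * sum_j w_j (y_j - b0 - x_j^T b)^2 + lam * sum_k |b_k|.

    Cyclic coordinate descent; the intercept b0 is not penalized.
    Returns (b0, b).
    """
    x_bar = np.average(X, axis=0, weights=w)
    y_bar = np.average(y, weights=w)
    Xc, yc = X - x_bar, y - y_bar              # weighted centering absorbs b0
    b = np.zeros(X.shape[1])
    col_ss = (w[:, None] * Xc**2).sum(axis=0)  # sum_j w_j x_jk^2 per column
    r = yc.copy()                              # current residual yc - Xc @ b
    for _ in range(max_iter):
        max_step = 0.0
        for k in range(len(b)):
            rho = np.sum(w * Xc[:, k] * (r + Xc[:, k] * b[k]))
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[k]  # soft-threshold
            r -= Xc[:, k] * (new - b[k])       # keep the residual in sync
            max_step = max(max_step, abs(new - b[k]))
            b[k] = new
        if max_step < tol:
            break
    return y_bar - x_bar @ b, b
```

A full GWL would repeat the fit at each of the 34 provinces, passing that province's row of kernel weights as `w`. Coefficients shrunk exactly to zero correspond to variables the lasso drops from that province's local model.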

RESULTS AND DISCUSSION
There were 511,873 tuberculosis cases in Indonesia in 2018. West Java Province had the highest number, with 99,398 cases, followed by Central Java with 67,063 cases and East Java with 56,445 cases. These three provinces are all on the island of Java and are close to one another. Descriptive statistics of the variables used in this study are presented in Table 1.
Table 1 shows that the response variable Y (the number of tuberculosis cases) has the highest standard deviation, which means that the number of tuberculosis cases varies greatly across Indonesia. The lowest standard deviation is for the proportion of the population aged > 10 years who smoke every day, which means that this proportion does not vary much across Indonesia.
The spatial dependence test gives a Moran's I value of 0.7124564 with a p-value of $6.645384 \times 10^{-6}$. At the 5 percent significance level, it can be concluded that there is spatial dependence in the number of tuberculosis cases in Indonesia.
Source: Data processed, 2020
Table 2 shows that VIF > 10 for all explanatory variables, indicating local multicollinearity. One way to overcome local multicollinearity is the Geographically Weighted Lasso (GWL) method. In this model, the optimum bandwidth obtained is 43. The estimated coefficient values are shown in Table 3. The GWL results show that in Banten, West Java, South Kalimantan, East Kalimantan, East Nusa Tenggara and Papua Province, all variables significantly affect the number of tuberculosis cases. Table 4 shows the significant variables affecting the number of tuberculosis cases in the other provinces. Among all variables, the number of health centers is significant in every province, and it positively affects the number of tuberculosis cases in all provinces in Indonesia.
Province
Percentage of the poor population, Population density, Percentage of districts/ cities that have a clean and healthy behavior policy, Percentage of slum households, the proportion of the population aged > 10 years who smoke every day and Number of health centers.
Nanggroe Aceh Darussalam, West Kalimantan, Central Sulawesi and North Maluku.
Percentage of the poor population, Population density, Percentage of districts/ cities that have a clean and healthy behavior policy and Number of health centers.

Bali, Central Kalimantan and West Papua
Percentage of the poor population, Population density, Percentage of districts/ cities that have a clean and healthy behavior policy, the proportion of the population aged > 10 years who smoke every day and Number of health centers.