Measuring the academic success of students with ASICS using polytomous item response theory

Fully Residential Schools (FRS) under the Ministry of Education in Malaysia are a type of school that produces high-performing human capital. The factors that matter most in the academic success of high-performing students deserve attention so that more such students can be produced to fulfil the needs of the nation. This study aims to apply the Academic Success Inventory for College Students (ASICS) in the context of FRS in Malaysia, to measure the quality of the instrument, and to identify the most important factors in academic success. The sample comprised 305 students from three FRSs. The data were collected using the ASICS instrument, which contained 49 items, and were then analyzed based on Polytomous Item Response Theory using the Xcalibre software. Based on the chi-square, p value, and -2 LogLikelihood statistics, Samejima's Graded Response Model was found to fit the data. The unidimensionality and local independence assumptions were tested using exploratory factor analysis and were fulfilled. The overall reliability of the instrument was very satisfactory (α=0.89), and construct validity was also fulfilled, with a value of 0.86. Most items (91.8%) showed good discrimination. The findings also showed that (i) ASICS is a good instrument for measuring academic success among students, as in previous studies, and (ii) high-performing students were found to prioritize aspects such as 'Perceived Instructor Efficacy' and 'Personal Adjustment' in achieving academic success. This means that, to develop an excellent future generation, we require an efficient education sector, especially one which focuses on efficient teachers for the 4.0 Industrial Age.


Introduction
Sekolah Berasrama Penuh (SBP), also known as Fully Residential School (FRS), is a type of school which takes in students who have done extremely well in primary school examinations (UPSR or Penilaian) or secondary school examinations (PT3, PMR, or SRP). Students admitted into the FRS learning system can be considered excellent in both academic and co-curricular aspects. According to Ghani et al. (2013), FRS produces students who have a 'towering personality', a term defined as having excellent skills, knowledge, and moral values. In other words, FRS is a place where high-performing students with excellent academic achievement are grouped together at secondary education level, and they are expected to achieve excellent results in Pentaksiran Tingkatan 3 (PT3) and Sijil Pelajaran Malaysia (SPM).
An institution that takes in students through a selection process will apply a variety of methods to filter the candidates. The methods used in these schools include analyzing previous results of centralized examinations, conducting interviews, and administering one or more instruments which measure the desired characteristics. Typically, the selected students, whose own characteristics further drive their academic abilities, will also highlight the institution's excellence.
Measuring academic achievement in an overall and comprehensive way requires assessing many academic factors. An instrument known as the Academic Success Inventory for College Students (ASICS), developed by Prevatt et al. (2011), is used to measure factors related to academic success. To measure the quality of the instrument and review the factors which have contributed towards the students' success, an analysis based on Polytomous Item Response Theory (Polytomous IRT) as described by Ostini and Nering (2006) was used.

Objectives
This study aims to apply the ASICS instrument in the context of FRS in Malaysia, to measure the quality of the instrument, and to identify important factors in the academic achievement of high-performing students. Briefly, the study objectives are:
i. To assess the reliability and construct validity of the instrument.
ii. To identify the item discrimination parameter.
iii. To assess the theta score pattern across the constructs.

Population and sampling
As shown in Table 1, there are 69 fully residential secondary schools in Malaysia (KPM, 2017a). According to KPM (2017b), the enrolment of students from Form 1 to Form 3 (lower secondary) across Malaysia was 23,238 on 30 June 2017, while the enrolment of students in Form 4 and Form 5 was 15,021. Overall, the enrolment of students at secondary level (Form 1 to Form 5) was 38,259 (KPM, 2017b). Starting in 2014, the Peperiksaan Menengah Rendah (PMR) examination in Form 3 was replaced with Pentaksiran Tingkatan 3 (PT3). Following this change, school rankings based on PT3 results were no longer announced publicly as they had been in previous years. As such, to assess the achievement of a school, this study focuses on the school rankings based on the SPM results (Table 2).
Based on the SPM results for 2015 and 2016 (Table 2), Negeri Sembilan has three FRSs which were ranked in the top 10 of all FRSs in Malaysia: Kolej Tunku Kurshiah, Sekolah Menengah Sains Tuanku Munawir, and Sekolah Menengah Sains Perempuan Seremban. These three schools are located in the Seremban district, Negeri Sembilan. This suggests that these schools share an environment which is conducive to moulding students towards excellent achievement. Additionally, there is another FRS in Seremban, Sekolah Dato' Abdul Razak, which also ranked well in the 2015 SPM results. In Negeri Sembilan, the enrolment of students in fully residential schools was 4724, with 2798 at the lower secondary level and 1926 at the upper secondary level. This represents 12.35 percent (4724 of 38,259) of the overall enrolment of students in FRS. This can be considered a high percentage, as Negeri Sembilan has eight of the 69 FRSs (11.59%) in the whole country.
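The enrolment totals and shares quoted above can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text; the variable names are illustrative.

```python
# Enrolment figures quoted in the text (KPM, 2017b); variable names are illustrative.
lower_secondary_all = 23_238   # Form 1 to Form 3, all FRSs
upper_secondary_all = 15_021   # Form 4 and Form 5, all FRSs
total_all = lower_secondary_all + upper_secondary_all

lower_ns = 2_798               # Negeri Sembilan, lower secondary
upper_ns = 1_926               # Negeri Sembilan, upper secondary
total_ns = lower_ns + upper_ns

enrolment_share = 100 * total_ns / total_all   # share of national FRS enrolment
school_share = 100 * 8 / 69                    # share of FRSs located in the state

print(total_all, total_ns)
print(f"{enrolment_share:.1f}% of enrolment, {school_share:.1f}% of schools")
```

The state's share of enrolment (about 12.3 percent) is slightly larger than its share of schools (about 11.6 percent), which is the comparison the paragraph relies on.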
Because data collection took place at the beginning of the year, it was unsuitable to involve Form 1 students, who had just entered the FRS system. This study also excluded students in the examination classes of the current year (Form 3 and Form 5), as stated by MOE (2018). Therefore, this study focused on Form 2 students only. During the data collection phase, the researchers sought permission from four FRSs in Seremban: Kolej Tunku Kurshiah, Sekolah Menengah Sains Tuanku Munawir, Sekolah Menengah Sains Perempuan Seremban, and Sekolah Dato' Abdul Razak. However, Sekolah Menengah Sains Perempuan Seremban later decided to withdraw from the study. The three schools which agreed to participate were categorized not only as FRS but also as High Performing Schools (HPS).
As stated by KPM (2018), an HPS is defined as a school which has a unique ethos, character, and identity in all aspects of education. Such a school has a high-achieving work culture which enables the holistic and sustainable development of the nation's human capital and is competitive in the international arena. These qualities have made HPS the school of choice among the Malaysian public.
The population of Form 2 students in Kolej Tunku Kurshiah, Sekolah Menengah Sains Tuanku Munawir, and Sekolah Dato' Abdul Razak is shown in Table 3. Due to constraints related to learning activities at the schools, the researchers could not reach every student. The sample acquired was 305 respondents from the population of 389 Form 2 students in the three schools. According to Krejcie and Morgan (1970), the sample required for a population of 389 students at the 95 percent confidence level is 194 students.
However, in the IRT context, a sample whose size is close to that of the actual population is preferred for describing the findings of the study. Furthermore, according to Rivers et al. (2009), by assessing the relationship between the latent trait and item properties, IRT effectively controls for differences in the latent trait. Therefore, no random sampling is required. DeMars (2010) found that a sample size of 300 was required for item calibration with a polytomous IRT model. In fact, Guyer and Thompson (2013) explained that if the sample size is smaller than 300, the chi-square (χ²) fit statistics used in a polytomous IRT model would always yield statistically non-significant p values. If this happens, the analysis results cannot be interpreted meaningfully. Therefore, a sample of 305 is considered adequate for generalizing to the population in this study.
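The Krejcie and Morgan (1970) figure quoted above can be reproduced from their published formula. The sketch below assumes the usual defaults behind their table (chi-square of 3.841 for one degree of freedom, a population proportion of 0.5, and a 5 percent margin of error); the function name is illustrative.

```python
def krejcie_morgan(population, chi_sq=3.841, p=0.5, margin=0.05):
    """Required sample size from the Krejcie and Morgan (1970) formula.

    chi_sq is the chi-square value for 1 degree of freedom at the 95 percent
    confidence level, p the assumed population proportion, and margin the
    accepted margin of error.
    """
    numerator = chi_sq * population * p * (1 - p)
    denominator = margin ** 2 * (population - 1) + chi_sq * p * (1 - p)
    return round(numerator / denominator)

required = krejcie_morgan(389)
print(required)  # 194
```

The 305 respondents obtained therefore exceed the 194 required by this rule, in addition to meeting the sample size of 300 cited for polytomous IRT calibration.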

Instrument
ASICS is an instrument copyrighted in 2011 by Dr. Frances Prevatt. It originally contained 50 items covering 10 factors: General Academic Skills, Career Decidedness, Internal Motivation/Confidence, External Motivation/Future, Lack of Anxiety, Concentration, Socializing, Personal Adjustment, Perceived Instructor Efficacy, and External Motivation/Current. The study conducted by Prevatt et al. (2011) found that the General Academic Skills factor (or construct) had the highest internal consistency, while External Motivation/Current showed the lowest. They also found that Personal Adjustment, General Academic Skills, Internal Motivation/Confidence, Socializing, and Concentration were the subscales most highly predictive of grade point average (GPA).
Permission to use the measure in this study was granted on May 20, 2014. Although Cohen and Swerdlik (1992) suggested that the construction of an instrument involves phases such as planning, construction, testing, and validation, the instrument used in this study was adapted with permission and therefore did not go through the planning and construction phases. However, an expert's confirmation of the items was obtained to ensure that they fit the Malaysian 'climate'.
Following the evaluation by the expert, one item related to 'drink', which referred to an alcoholic drink, was excluded. This was necessary because the expert considered it unsuitable for the culture and environment of the FRS students, who were all under 18 years old and of the Muslim faith. As such, 49 items remained. For each item, the respondents were required to respond on a Likert scale from '1' (strongly disagree) to '7' (strongly agree). The constructs and their items are shown in Table 4. To ensure the quality of the instrument, a pilot study was conducted on Form 2 students in Sekolah Menengah Sains Tapah, Perak. The pilot study was conducted in order to improve the instrument based on the analysis results before it was administered to the respondents in the actual study. The pilot results showed that all the items were of sufficient quality and could be used in the actual study.

Data analysis
The Kaiser-Meyer-Olkin (KMO) test (Table 5), with a value of 0.85, indicates that the sample is sufficient for factor analysis. Bartlett's test showed a significant chi-square value (χ²=10161.02, p<0.05), which meant that factor analysis was appropriate and valid to conduct. Instrument soundness was examined using principal components factor analysis with varimax rotation. Before the data were analysed with IRT-based software, two assumptions had to be fulfilled. Hishamuddin and Siti Eshah (2016) stated that the unidimensionality and local independence assumptions should be tested before conducting an IRT-based analysis. As such, exploratory factor analysis (EFA) was used to test the compatibility of a unidimensional structure with the data and, subsequently, the local independence of the items.
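For readers unfamiliar with Bartlett's test of sphericity, the chi-square statistic is computed from the determinant of the item correlation matrix. The following is a minimal sketch using only the standard library; the 3 by 3 correlation matrix is hypothetical and is not taken from the study's data, and the helper names are illustrative.

```python
import math

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
    return det

def bartlett_sphericity(corr, n_obs):
    """Return (chi-square statistic, degrees of freedom) for Bartlett's test."""
    p = len(corr)
    det = determinant(corr)
    if det <= 0:
        raise ValueError("correlation matrix must be positive definite")
    statistic = -((n_obs - 1) - (2 * p + 5) / 6) * math.log(det)
    df = p * (p - 1) // 2
    return statistic, df

# Hypothetical correlations among three items, with the study's sample size.
corr = [[1.0, 0.5, 0.3],
        [0.5, 1.0, 0.4],
        [0.3, 0.4, 1.0]]
statistic, df = bartlett_sphericity(corr, n_obs=305)
print(round(statistic, 2), df)
```

A large statistic relative to its degrees of freedom indicates that the correlation matrix differs from an identity matrix, which is what makes factor analysis worthwhile. If all 49 items enter the analysis, the test has 49 × 48 / 2 = 1176 degrees of freedom.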
From Fig. 1, we can see that the first eigenvalue was much greater than the others. This suggests that a unidimensional model is reasonable for these data (Hishamuddin and Siti Eshah, 2016). Hambleton et al. (1991) stated that when the unidimensionality assumption is met, local independence is also obtained. Since the unidimensionality assumption for the latent trait measured in this study is considered reasonable, the assumption of local independence is also accepted.
Based on the EFA output, the instrument constructs explained 61.53% of the total variance. Hair et al. (2010) stated that the minimum accepted value of total variance explained in factor analysis is 60 percent. This indicated that the constructs in the study had sufficient construct validity.
The -2 LogLikelihood (-2LL) statistics (Table 6), as proposed by de Ayala (2009), showed that Samejima's Graded Response Model (SGRM) suited the data better than the Graded Rating Scale Model (GRSM). This result is in line with the study by Demirtaşl et al. (2016), which stated that the Graded Response Model (GRM) showed a better model fit with polytomous data. Therefore, the data analysis was conducted based on the SGRM polytomous IRT model.
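To make the chosen model concrete, the sketch below computes response-category probabilities for a single 7-point item under Samejima's graded model. It assumes the plain logistic metric (no 1.7 scaling constant, which calibration software may or may not apply), and the item parameters are hypothetical, not estimates from this study.

```python
import math

def grm_category_probabilities(theta, a, thresholds):
    """Category probabilities under Samejima's graded model.

    theta: respondent's latent trait level
    a: item discrimination
    thresholds: ordered boundary locations b_1 < ... < b_(m-1) for m categories
    """
    # Boundary curves: probability of responding in category k or higher.
    cumulative = [1.0]
    cumulative += [1 / (1 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent boundary curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical 7-point Likert item (six thresholds); values are illustrative only.
probs = grm_category_probabilities(
    theta=0.0, a=1.2, thresholds=[-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
)
for category, prob in enumerate(probs, start=1):
    print(category, round(prob, 3))
```

For a respondent at θ = 0 and thresholds placed symmetrically around zero, the middle category (4) is the most likely response, and the seven probabilities sum to one.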

Instrument reliability
In research, a value of α>0.7 is frequently referred to as the 'cut-off value', the 'minimum value', or a 'good' value for the reliability index. However, Taber (2018) found that values of α≥0.45 have been categorized as 'acceptable' or 'sufficient' evidence of the reliability or internal consistency of an instrument. Griethuijsen et al. (2015), in their study measuring students' interest in science in selected countries, found a few constructs with α below 0.7 or 0.6. The findings of this study (Table 7), however, showed that the instrument reliability (α=0.89) was very good and exceeded the minimum value often used as a reference in other studies.
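As a reminder of what the α coefficient summarizes, the following sketch computes Cronbach's alpha from a respondents-by-items score matrix using the standard formula. The five respondents and three items are made-up values for illustration and are not the study's data.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha; responses is a list of respondents, each a list of item scores."""
    k = len(responses[0])
    item_variances = [variance([r[i] for r in responses]) for i in range(k)]
    total_variance = variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical 7-point Likert responses (rows: respondents, columns: items).
toy_responses = [
    [5, 6, 5],
    [3, 3, 4],
    [7, 6, 7],
    [2, 3, 2],
    [4, 5, 4],
]
alpha = cronbach_alpha(toy_responses)
print(round(alpha, 2))
```

Alpha approaches 1 when the items rise and fall together across respondents, so that the variance of the total score is much larger than the sum of the individual item variances.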

Instrument construct validity
In the context of polytomous IRT, instrument validity can be assessed using item statistics. According to Guyer and Thompson (2013), the chi-square statistic is an overall index showing how well the response data correspond to the chosen IRT model. The chi-square statistic can be used for both dichotomous and polytomous items. For polytomous items, the chi-square value can be used to identify items which misfit. A chi-square p value of less than 0.05 (p<0.05) means that the item does not fit the model. In other words, an item with a chi-square p value of less than 0.05 is an item which does not measure the construct properly. As a rule of thumb, if most of the items (more than 70%) fit the model, then the construct validity is very good. In this study, 42 out of 49 items, or 85.7 percent (0.86), fit the model (Table 8). As such, it can be stated that the ASICS instrument used for this study measured what it was supposed to measure very well.
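The decision rule described above can be sketched as a simple count. The p values below are placeholders that only mirror the reported counts (42 fitting and 7 misfitting items); they are not the item statistics from Table 8, and the function name is illustrative.

```python
def proportion_fitting(p_values, alpha=0.05):
    """Count items whose chi-square p value is at or above alpha (item fits the model)."""
    fitting = sum(1 for p in p_values if p >= alpha)
    return fitting, fitting / len(p_values)

# Placeholder p values mirroring the reported counts, NOT the study's results.
p_values = [0.40] * 42 + [0.01] * 7
n_fit, share = proportion_fitting(p_values)
print(n_fit, f"{share:.1%}")  # 42 85.7%
print("exceeds 70% rule of thumb:", share > 0.70)
```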

Discrimination parameter
The discrimination parameter in the IRT context is also known as the 'a' parameter. According to Baker (2001), the magnitude of discrimination of an item or construct is interpreted from the estimated a parameter. Guyer and Thompson (2013) stated that an item with a higher discrimination parameter value is considered better than an item with a lower value. This study also followed the interpretation of the a parameter used by Mokshein (2018), who applied the Xcalibre software for item calibration. She stated that a discrimination parameter is considered practical when it is equal to or greater than 0.3 (≥0.30, or in the range 0.3 to 1.5). If an item does not discriminate (a<0.3), the probability of obtaining an accurate response does not increase much as the respondents' θ increases (DeMars, 2010).
As shown in Table 9, most items in this study (45 out of 49, or 91.8 percent) discriminated well.
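The effect of the a parameter can be illustrated with a single logistic boundary curve. The sketch below compares how much the response probability rises between θ = -1 and θ = 1 for one value below and one value above the 0.3 cut-off; both a values and the threshold location are illustrative assumptions.

```python
import math

def boundary_probability(theta, a, b=0.0):
    """Logistic boundary curve: probability of responding above threshold b."""
    return 1 / (1 + math.exp(-a * (theta - b)))

gains = {}
for a in (0.2, 1.5):  # illustrative values below and above the 0.3 cut-off
    gains[a] = boundary_probability(1.0, a) - boundary_probability(-1.0, a)
    print(f"a={a}: probability gain from theta=-1 to theta=1 is {gains[a]:.2f}")
```

With a = 0.2 the probability rises by only about 0.10 across two θ units, whereas with a = 1.5 it rises by about 0.64, which is why items with very low a values tell little about differences between respondents.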

Theta score on constructs
Based on the traditional procedures for calculating ASICS scores, Prevatt et al. (2011) found that the important factors in predicting grade point average (GPA), or academic success, were Personal Adjustment, General Academic Skills, Internal Motivation/Confidence, Socializing, and Concentration.
However, the score in this study was based on the theta parameter, or θ estimate, which was obtained from the data analysis using the Xcalibre software. The θ score describes the level, or ability, of the respondents on the measured constructs. Although all the respondents could be categorized as high-performing students, IRT allowed the researchers to further categorize them as high, medium, and low ability students.
According to Thompson (2009), a θ score of 0.0 refers to medium ability on the θ scale. Respondents with a θ score below 0.0 can be considered low ability students, while those with a θ score above 0.0 can be considered high ability students (Guyer and Thompson, 2013). Students categorized as 'low' in the context of this study were in fact high-achieving students; they simply showed lower ability on the studied constructs than the other respondents in this study.
Based on the theta score for every construct, as shown in Fig. 2, 'Perceived Instructor Efficacy' and 'Personal Adjustment' were the two factors chosen by the high-ability students. This meant that the students categorized as having high ability had chosen 'Perceived Instructor Efficacy' and 'Personal Adjustment' as the important factors which helped to propel their academic success.
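The θ-based grouping described above amounts to a sign check on each respondent's estimate. The sketch below shows that rule only; the respondent identifiers and θ values are hypothetical, and the construct-level comparison plotted in Fig. 2 is not reproduced here.

```python
def ability_group(theta):
    """Label a respondent by the sign of the theta estimate (0.0 = medium)."""
    if theta > 0.0:
        return "high"
    if theta < 0.0:
        return "low"
    return "medium"

# Hypothetical theta estimates; not taken from the study's data.
thetas = {"R001": 1.12, "R002": -0.35, "R003": 0.0, "R004": 0.48}
groups = {respondent: ability_group(t) for respondent, t in thetas.items()}
print(groups)
```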

Instrument quality
Prevatt et al. (2011) originally reported the internal consistency of ASICS for each construct. The results of this study are in line with those of Prevatt et al. (2011): the General Academic Skills construct shows the highest internal consistency, while the External Motivation/Current construct shows the lowest.
However, Prevatt et al. (2011) did not report the overall internal consistency of the instrument. This study found that the overall internal consistency of the ASICS instrument as administered here is very high, at α=0.89. The value is interpreted as very high based on the studies by Griethuijsen et al. (2015) and Taber (2018).
The construct validity of the instrument was also found to be very high: about 86 percent of the items (42 of 49) measured what they were supposed to measure, and most items (91.8%) showed good discrimination. Prevatt et al. (2011), using their ASICS scoring procedures, found that Personal Adjustment, General Academic Skills, Internal Motivation/Confidence, Socializing, and Concentration were the factors that contributed most to academic success. Researchers will not necessarily obtain the same factors in other research settings. Using the same traditional scoring procedures introduced by Prevatt et al. (2011), this study found External Motivation/Future and External Motivation/Current to be the highest-scoring ASICS factors, which are therefore considered to play an important role in academic success.

Fig. 2: Respondents' theta according to construct
This shows that the factors contributing to academic success differed between the two studies. Furthermore, these factors give only a global picture of the students' overall performance. That is, they describe the whole group of respondents without further classifying them into high, moderate, and low ability.
In further analyses applying polytomous IRT and the concept of the theta scale, this study also found that the highest ability group of students favoured the Perceived Instructor Efficacy and Personal Adjustment factors, whereas the lowest ability group needed more External Motivation/Future and External Motivation/Current for their academic success. In other words, students at the highest ability level within this high-performing group are more likely to choose Perceived Instructor Efficacy and Personal Adjustment as important factors in their academic success.

Conclusion
The ASICS was found to be a reasonably sound instrument for measuring academic success among students in FRS. The Perceived Instructor Efficacy subscale was the highest-scoring construct among the highest-performing group of students. This study also found that the highest-performing students are personally well adjusted to their high-performance school surroundings, which supports their academic success.
The findings show that the role of effective teachers is very important in students' performance and development. This means that, as teaching and learning scenarios develop, the degree of empowerment given to students in organizing their own studying or managing their own learning should be managed properly. Perceptions of teachers' effectiveness remain important for some students. Another aspect which should be focused on is the role of teachers, who should not remain passive in teaching or become dependent on educational technology, even though we are in the Industrial Revolution 4.0 (IR4.0) era. To develop an excellent future generation, we still require an efficient education sector, especially one which focuses on efficient teachers for the 4.0 Industrial Age.

Recommendations
The recommendations reached through this research are as follows. Polytomous IRT, which is applied worldwide, especially in the psychometric field, is a recommended and worthwhile analysis tool that can and should be used to further analyse the quality of psychometric instruments. In order to develop successful and high-performing future generations, developing educational competencies (teachers) for education in the IR4.0 era is the first thing to do, together with creating and maintaining a conducive environment to support teaching and learning.