This is where the aim of the measure needs to be decided on and stated. The characteristic or construct to be measured, what the measure will be used for, and the target group (population) for the measure also need to be defined. Once this has been clarified, one can decide how the test scores will affect decisions (or what decisions can be made based on test scores). An important decision in the planning stage is whether performance is compared to a criterion or to a group norm. In order to define the content of a measure, one needs to have a defined purpose for the measure.
The construct needs to be operationally defined by undertaking a literature review (research process) of the main theoretical viewpoints on the construct. The purpose of the measure is clearly vital, as it serves as the basis for constructing the measure. In this phase, ‘keying’ is used, where information is gathered about the ‘aspects of the construct on which these groups usually differ’ (An Introduction to Psychological Assessment, Foxcroft and Roodt). For example, items are needed that discriminate between individuals, so as to allow the assessor to view the various ‘risk’ groups.
The format and number of each type of item are decided in the next step of the planning phase. The format of the test will vary according to the construct being measured. There are open-ended items (no limits placed on the test-taker), forced-choice items (like multiple-choice, where careful decisions are involved), sentence-completion items, essay items (which test for logical thinking and organizational abilities) and performance items (generally apparatus is manipulated by the test-taker). As mentioned previously, the type of construct being measured will directly influence the item type, as will practicalities such as time constraints. For example, if there is a limited amount of time, essay-type questions may not be appropriate. While developing the format, the number of items also plays a role, as it too influences the time needed. Generally, many items are discarded in the process of refinement, so in the planning phase most developers create more items than needed. Writing the items is usually a task completed by a team of experts (Foxcroft and Roodt). The purpose of the measure will help to keep the focus on content validity. Existing measures, theories, textbooks etc. provide sources of ideas for test items.
When writing a measure, wording must be clear and concise, so as to allow the test-taker easy comprehension of the requirements. Negative expressions and ambiguity should be avoided, as they create confusion and may invalidate results. The positioning and number of correct answers in multiple-choice items, and the ratio of true to false items, should be appropriate, which is discussed further under content validity. When measuring child-related issues, the stimulus material should be attractive and varied, as children tire easily of mundane, repetitive tasks. Once the items have been developed, they are reviewed and evaluated by a panel of experts (An Introduction to Psychological Assessment, Foxcroft and Roodt). They will judge the items on relevance, appropriateness for the target group, wording of the items, and the nature of the stimuli. Based on their findings, certain items may need to be rewritten or discarded (refer to the previous mention of refinement in the formatting process).

ASSEMBLY AND PRESENTING OF A MEASURE

Items need to be arranged logically. This would include grouping types of items together. Once the measure is in a logical format, the length of the measure needs to be reassessed.
The time taken by the test-takers to read and comprehend instructions plays a major role in this phase. After consideration, either the time allocated for completion of the measure needs to be increased or decreased, or certain items may need to be discarded. The answer protocol should be predetermined, as this allows for ease of administration. A booklet or answer sheet should be developed in such a way that scoring of the measure and reproduction of the booklet or answer sheet are uncomplicated. Administration instructions need to be developed, and should be clear and unambiguous.
Advice is given to pretest the instructions on a sample of people from the target population (An Introduction to Psychological Assessment, Foxcroft and Roodt). Training on administering the measure may need to be provided, as misunderstanding of instructions may lead to poor performance on an item. The measure is now ready to be administered to a large sample of the target population. Benefits of the pre-test phase include feedback from the test-takers on the level of difficulty, ease of comprehension, sequencing of items and the length of the measure.
a. Test-retest reliability: where the same test is administered on two separate occasions to the same group of people. This technique itself is not entirely reliable, as personal and environmental factors may change for the test-takers, and they may recall answers, which invalidates the second set of test results. b. Alternate-form reliability: where two similar tests on the same content are administered. The correlation between the two sets of scores gives the reliability coefficient. The limitations of this form are expense, time and the creation of an equivalent, yet different, test.
According to Foxcroft and Roodt, and the author of this paper, an alternate form is not easy to construct, and this type is therefore not recommended. c. Split-half reliability: one test is administered in one sitting, and its items are divided into two equal halves to obtain two scores. The most common method of splitting a measure is to separate the odd- and even-numbered items and calculate the correlation between the two half-scores. This gives a result based on half a measure, and is assumed to be an ‘underestimation of the reliability coefficient’ (An Introduction to Psychological Assessment, Foxcroft and Roodt). The Spearman-Brown formula is then applied to obtain the corrected reliability coefficient (rtt).
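The odd-even split and the Spearman-Brown correction can be illustrated with a minimal Python sketch. The function names and the score data are hypothetical and are not taken from Foxcroft and Roodt; the sketch assumes each test-taker's item scores are stored as a list, and it uses the standard two-half form of the correction, corrected r = 2r / (1 + r).

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def split_half_reliability(item_scores):
    """item_scores holds one list of item scores per test-taker.

    Returns the half-test correlation and the Spearman-Brown
    corrected coefficient for the full-length measure.
    """
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_totals, even_totals)
    r_full = 2 * r_half / (1 + r_half)  # Spearman-Brown correction
    return r_half, r_full

# Hypothetical data: five test-takers, four items scored 1 (correct) or 0.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
r_half, r_full = split_half_reliability(scores)
print(round(r_half, 3), round(r_full, 3))  # 0.786 0.88
```

In this made-up example the half-test correlation of about 0.79 rises to 0.88 after correction, which shows why the uncorrected figure is treated as an underestimate.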
d. Kuder-Richardson and the Coefficient Alpha: this is based on the responses to all the items in the total measure. A score of ‘1’ is given for correct answers, while ‘0’ is given for incorrect answers. The KR-20 formula is used. This type of reliability can only be applied to performance indicators, as answers on personality scales cannot be correct or incorrect. Cronbach developed the Coefficient Alpha reliability measure for such scales. e. Inter-scorer reliability: two assessment practitioners score the tests separately, and the correlation between the two sets of scores is the inter-scorer reliability coefficient.
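As a rough illustration of item d, the sketch below computes KR-20 for items scored 1 or 0, and Cronbach's coefficient alpha for items with any numeric scoring. The function names and data are hypothetical. Population variance is used throughout as an assumption of this sketch; textbooks differ on this convention, but with it alpha and KR-20 agree for 1/0 items.

```python
from statistics import pvariance

def kr20(item_scores):
    """KR-20 for items scored 1 (correct) or 0 (incorrect).

    item_scores holds one list of item scores per test-taker.
    """
    n = len(item_scores)          # number of test-takers
    k = len(item_scores[0])       # number of items
    pq_sum = 0.0
    for column in zip(*item_scores):
        p = sum(column) / n       # proportion answering the item correctly
        pq_sum += p * (1 - p)
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - pq_sum / total_var)

def cronbach_alpha(item_scores):
    """Coefficient alpha for items with any numeric scoring (e.g. rating scales)."""
    k = len(item_scores[0])
    item_var_sum = sum(pvariance(column) for column in zip(*item_scores))
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical data: five test-takers, four items scored 1 or 0.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(scores), 3))            # 0.8
print(round(cronbach_alpha(scores), 3))  # 0.8
```

Both coefficients come from a single administration of the measure. The inter-scorer coefficient in item e, by contrast, is simply the correlation between the two practitioners' sets of scores.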
Inter-scorer reliability may have a high risk of contamination if the practitioners know either the test-taker or the previous scores, and a ‘halo error’ may occur. Certain measures are affected by time (absence or presence of a time limit), age, gender and ability levels. Correlations between scores need to be computed separately for homogeneous groups to obtain a more reliable coefficient.

Validity determines how well a measure measures the construct. There are three aspects or types of validity: a. Content validity: which determines whether the content of the measure covers a representative sample of the behavior being measured.
Content validity is non-statistical, and a panel of subject experts evaluates the items during the assembly phase (Foxcroft and Roodt). This type of validity is best applied to the evaluation of achievement and occupational measures. Within content validity, face validity is an important aspect for the test-taker. Face validity does not concern what the measure actually measures, but rather whether the measure appears to be valid. b. Criterion-prediction validity: a quantitative procedure that involves the calculation of the correlation coefficient between a predictor and a set criterion.
Housemen and Simi defined two types of criterion-related validity. Concurrent validity: the accuracy with which a measure can identify the current behavior or skills of an individual. Predictive validity: the accuracy with which a measure can predict the future behavior of an individual. The measure will dictate which type of validity is used. Validity generalization, where specific skills are assessed, and meta-analysis (research of literature on a specific topic) are also used. For a measure, cross-validation of scores should be implemented, so as to obtain a realistic sense of validity coefficients.
c. Construct-identification validity: the extent to which a measure ‘measures the theoretical construct or trait it is supposed to measure’ (An Introduction to Psychological Assessment, Foxcroft and Roodt). Correlation with existing tests of the same construct is expected to be high, but not too high, as the new measure would then merely duplicate the previous measures. Factor analysis is used to analyse the correlations between variables. The factorial validity of a measure refers to the factors measured by the measure. It is best to measure small numbers of variables, so as to create specialized measures.
A measure should correlate highly with other similar variables, and should correlate minimally with irrelevant variables. A validity coefficient (r) should be considered significant at the 0.05 and 0.01 levels according to Housemen (An Introduction to Psychological Assessment, Foxcroft and Roodt). There are many variables which affect validity coefficients: the nature of the group, the relationship between the predictor and the criterion, the proportion of validity to a measure’s reliability, criterion contamination, as well as moderator variables (e.g. personality). These therefore allow for error in validity coefficients.
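One common way to check whether a validity coefficient is significant at these levels is to convert r to a t statistic with n - 2 degrees of freedom and compare it with the critical value from a t table. The sketch below assumes this standard test; the source does not say which procedure is intended, and the function name and figures are hypothetical.

```python
from math import sqrt

def t_statistic_for_r(r, n):
    """t statistic for testing whether a correlation r, based on n pairs
    of predictor and criterion scores, differs from zero (df = n - 2)."""
    if n < 3:
        raise ValueError("at least three pairs of scores are needed")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * sqrt((n - 2) / (1 - r ** 2))

# Hypothetical example: a validity coefficient of 0.6 from 27 test-takers.
t = t_statistic_for_r(0.6, 27)
print(round(t, 2))  # 3.75
```

With 25 degrees of freedom, the two-tailed critical values of t are roughly 2.06 at the 0.05 level and 2.79 at the 0.01 level, so a coefficient of 0.6 in this made-up sample would be significant at both levels.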
There is a formula for correcting and estimating standard error, and this allows for acceptance of minor outliers.

Norms are measurements of a group, against which an individual’s score can be evaluated. The choice of norm score for a new measurement will depend on the developer’s preference. The most commonly used norm scales are: a. Mental age scales: where the highest age at and below which all items of a measure are passed is determined, and called the basal age. A child’s mental age combines the basal age, plus any additional months of credit at higher age levels. Chronological age is irrelevant. E.g. reading age is calculated according to mental age. b. Grade equivalents: scores are interpreted in terms of the average scores in a specific area, e.g. of learning, for a grade. A child may have a reading level equivalent to grade 3 while they are currently in grade 7, and their mathematical abilities could differ from their current grade as well. c. Percentiles: scores are divided into quartiles. E.g. Q1 is the first quartile, which represents the lower 25 percent of scores. This is not the lowest 25 marks, but the bottom 25 percent of scores. (I.e.