
SPSS Tutorials: Computing Variables: Recoding Categorical Variables

This tutorial shows how to use Recode into Different Variables and DO IF syntax to change or merge the categories of string or numeric variables in SPSS.

Recoding (Transforming) Variables

Sometimes you will want to transform a variable by combining some of its categories or values together. For example, you may want to change a continuous variable into an ordered categorical variable, or you may want to merge the categories of a nominal variable. In SPSS, this type of transformation is called recoding.

In SPSS, there are three basic options for recoding variables:

  1. Recode into Different Variables
  2. Recode into Same Variables
  3. DO IF syntax

Each of these options allows you to re-categorize an existing variable. Recode into Different Variables and DO IF syntax create a new variable without modifying the original variable, while Recode into Same Variables will permanently overwrite the original variable. In general, it is best to recode a variable into a different variable so that you never alter the original data and can easily access the original data if you need to make different changes later on.

Recode into Different Variables

Recoding into a different variable transforms an original variable into a new variable. That is, the changes do not overwrite the original variable; they are instead applied to a copy of the original variable under a new name.

To recode into different variables, click Transform > Recode into Different Variables.

The Recode into Different Variables window will appear.

The left column lists all of the variables in your dataset. Select the variable you wish to recode by clicking it. Click the arrow in the center to move the selected variable into the center text box, (A).

A Input Variable -> Output Variable: The center text box lists the variable(s) you have selected to recode, as well as the name your new variable(s) will have after the recode. You will define the new name in (B).

B Output Variable: Define the name and label for your recoded variable(s) by typing them in the text fields. Once you are finished, click Change. Now the center text box, (A), will display both the name of the original variable and the name of the new variable (e.g., “Height --> Height_categ”).

C Old and New Values: Click Old and New Values to specify how you wish to recode the values of the selected variable.

D If: The If option allows you to specify the conditions under which your recode will be applied. (We discuss the If option in more detail later in this tutorial.)


Old and New Values

Once you click Old and New Values, a new window will appear where you will specify how to transform the values.

1 Old Value: Specify the type of value you wish to recode (e.g., a specific value, missing data, or a range of values) and the specific value to be recoded (e.g., a value of “1” or a range of “1-5”).

  • Value: Enter a specific numeric code representing an existing category.
  • System-missing: Applies to any system-missing values (.)
  • System- or user-missing: Applies to any system-missing values (.) and special missing value codes defined for the input variable in the Variable View.
  • Range: For use with ordered categories or continuous measurements. Enter the lower and upper boundaries that should be coded. The recoded category will include both endpoints, so data values that are exactly equal to the boundaries will be included in that category.
  • Range, LOWEST through value: For use with ordered categories or continuous measurements. Recode all values less than or equal to some number.
  • Range, value through HIGHEST: For use with ordered categories or continuous measurements. Recode all values greater than or equal to some number.
  • All other values: Applies to any value not explicitly accounted for by the previous recoding rules. If you use this option, it should be applied last.
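To make these matching rules concrete, here is a minimal Python model of the Old Value options. It is a sketch, not SPSS code: the helper names (value, value_range, and so on) are hypothetical, None stands in for the system-missing value, and user-missing codes are not modeled. It also assumes that rules are tried in the order they appear in the Old-->New list, that the first matching rule wins, and that a value matching no rule becomes system-missing in the new variable.

```python
# A Python model of the Old Value rule types (not SPSS code).
# None stands in for the system-missing value; user-missing codes are not modeled.

def value(v):
    """Value: match one specific code."""
    return lambda x: x is not None and x == v

def system_missing():
    """System-missing: match only the system-missing value."""
    return lambda x: x is None

def value_range(low, high):
    """Range: both endpoints are included."""
    return lambda x: x is not None and low <= x <= high

def lowest_through(high):
    """Range, LOWEST through value: x <= high."""
    return lambda x: x is not None and x <= high

def through_highest(low):
    """Range, value through HIGHEST: x >= low."""
    return lambda x: x is not None and x >= low

def all_other_values():
    """All other values: match anything not caught by an earlier rule."""
    return lambda x: True

def recode(x, rules):
    """Apply (old_rule, new_value) pairs in order; the first match wins.
    Values that match no rule become system-missing (None)."""
    for matches, new_value in rules:
        if matches(x):
            return new_value
    return None

# Example: collapse a 1-5 code into two groups.
rules = [
    (system_missing(), None),
    (value_range(1, 2), 1),
    (through_highest(3), 2),
]
print([recode(x, rules) for x in [1, 2, 3, 5, None]])  # [1, 1, 2, 2, None]
```

Because All other values matches everything, placing it anywhere but last would swallow values meant for later rules, which is why the dialog advice above says to apply it last.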

2 New Value: Specify the new value for your variable (i.e., a specific numeric code such as “2,” system-missing, or copy old values).

3 Old --> New: Once you have selected the old and new values for your selected variable in (1) and (2), click Add in area (3), Old-->New. The recode that you have specified now appears in the text field. If you need to change one of the recodes that you have added to the Old-->New section, simply click on the recode you wish to change and make changes to (1) and (2) as necessary.

You will need to repeat these steps for each value that you wish to recode. Once you have specified all of the transformations that you wish to make for the selected variable, click the “Continue” button.

4 Output variables are strings and Convert numeric strings to numbers: These options change the variable type of the new variable.

  • Output variables are strings: The new variable will be a string variable.
  • Convert numeric strings to numbers: This option can only be used when your input variable is a string, and will be grayed out otherwise. If the input variable is a string, but the data values themselves are valid numbers, selecting this option will convert the numeric strings into actual numbers. (If any non-numeric symbols appear in the data values, the conversion will fail, even if the numbers are otherwise valid. This includes dollar signs and percent symbols.)
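The effect of Convert numeric strings to numbers can be approximated in Python: a string that parses cleanly as a number is converted, and anything carrying extra symbols such as "$" or "%" ends up system-missing (None here). This is an illustrative sketch only; Python's float() parser is not SPSS's, and the two may disagree on edge cases such as "1e3" or "nan".

```python
def convert_numeric_string(s):
    """Return s as a number, or None (system-missing) if it is not a plain number.

    Approximation only: uses Python's float() parser, not SPSS's.
    """
    try:
        return float(s)
    except ValueError:
        return None

for raw in ["42", "3.75", "$42", "15%"]:
    print(raw, "->", convert_numeric_string(raw))
```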

The “If” option

Sometimes you may want to recode values for a new variable only when other conditions in your data are satisfied. This means that cases meeting the conditions will be recoded, and cases not meeting the conditions will be assigned a missing value. To specify such conditions, click If to bring up the Recode into Different Variables: If Cases window.

1 The left column displays all of the variables in your dataset. You will use one or more variables to define the conditions under which your recode should be applied to the data.

2 The default setting for a recode is to Include all cases. To specify the conditions under which the recode should be applied, however, you will need to select Include if case satisfies condition. This will allow you to specify the conditions under which the recode will be applied to your data.

3 The center of the window includes a calculator of arithmetic operators, Boolean operators, and numeric characters, which you can use to specify the conditions under which your recode will be applied to the data. There are many kinds of conditions you can specify by selecting a variable (or multiple variables) from the left column, moving them into the center text field, and using the blue buttons to indicate values (e.g., “1”) and operators (e.g., +, *, /). You can also use the options in the Function group list.

4 The Function group box contains common functions which can be used for calculating values for new variables (e.g., mean, logarithm, sine). After selecting a category, you will see function names appear in the Functions and Special Variables box. Double-clicking on a function name will add it to the "Include if case satisfies condition" field.

When you are finished defining the conditions under which your recode will be applied to the data, click Continue.
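The behavior of the If option can be modeled in a few lines of Python: only cases satisfying the condition are recoded, and every other case receives the system-missing value (None here). The variables Height and Age, the condition Age >= 18, and the cut point of 66 are all hypothetical, chosen only to show the mechanics.

```python
# Hypothetical cases; variable names and values are for illustration only.
cases = [
    {"Height": 62, "Age": 19},
    {"Height": 70, "Age": 17},
    {"Height": 74, "Age": 25},
]

def recode_with_if(cases, source, condition, recode_value):
    """Recode `source` only for cases where condition(case) is true;
    all other cases receive the system-missing value (None)."""
    new_values = []
    for case in cases:
        if condition(case):
            new_values.append(recode_value(case[source]))
        else:
            new_values.append(None)
    return new_values

# Include if case satisfies condition: Age >= 18 (hypothetical condition)
height_categ = recode_with_if(
    cases,
    source="Height",
    condition=lambda case: case["Age"] >= 18,
    recode_value=lambda h: 1 if h < 66 else 2,  # hypothetical cut point
)
print(height_categ)  # [1, None, 2]
```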

Note: Recode into Different Variables does not include the ability to add value labels to the new categories, so immediately after recoding, you should add value labels to your new numeric codes.

When you are ready to run the procedure, click OK. Now your new variable will be recoded according to the criteria you specified. You can find your new variable in the last column in Data View or in the last row of Variable View.

Recode into Same Variables

Recoding into the same variable (Transform > Recode into Same Variables) works the same way as described above, except that any changes made will permanently alter the original variable. That is, the original values will be replaced by the recoded values.

In general, it is good practice not to recode into the same variable because it overwrites the original variable. If you ever need to use the variable in its original form (or want to double-check your steps), that information will be lost.

DO IF - ELSE IF Syntax

DO IF-ELSE IF syntax works similarly to the Recode procedures, but allows for more control over specifying numeric ranges. If you want to discretize a numeric variable into more than three categories, or if you want to perform a recoding based on more than one variable, you'll need to use DO IF-ELSE IF syntax. (You could use DO IF-ELSE IF for recoding a categorical variable, but there's no real reason to use it over Recode; the Recode syntax is shorter and more efficient for this situation.)

The DO IF-ELSE IF syntax is:

DO IF (conditional statement).
  COMPUTE (variable assignment statement).
ELSE IF (conditional statement).
  COMPUTE (variable assignment statement).
...
ELSE.
  COMPUTE (variable assignment statement).
END IF.
EXECUTE.

The DO IF and ELSE IF lines tell SPSS to perform the nested computations if certain conditions are true. These conditions are statements (or chains of statements) that evaluate as true or false. For example:

  • x > 2 is a conditional statement that returns true if the value of x is greater than 2, and returns false if the value of x is less than or equal to 2.
  • x > 2 AND x < 10 returns true if x is greater than two and also less than 10 (i.e., 2 < x < 10), and returns false if x is less than or equal to two or if x is greater than or equal to ten (x ≤ 2 or x ≥ 10).
  • The function MISSING(...) returns true if its argument is system-missing or user-missing. If you want to handle the recoding for missing values, you would use the statement DO IF(MISSING(variable)).
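One detail matters for the missing-value rules discussed below: in SPSS, a comparison involving a missing value evaluates to missing rather than to true or false, while MISSING() itself always returns true or false. The Python sketch below models that three-valued logic with None standing in for "missing". The helper names are hypothetical, and the AND rule follows SPSS's documented behavior (false AND missing is false; true AND missing is missing).

```python
def gt(x, y):
    """x > y, or None (missing) if either operand is missing."""
    if x is None or y is None:
        return None
    return x > y

def lt(x, y):
    """x < y, or None (missing) if either operand is missing."""
    if x is None or y is None:
        return None
    return x < y

def and_(a, b):
    """Three-valued AND: any false operand makes the result false."""
    if a is False or b is False:
        return False
    if a is True and b is True:
        return True
    return None

def missing(x):
    """MISSING(x) is never missing itself: it is simply true or false."""
    return x is None

# Columns: x, x > 2, (x > 2 AND x < 10), MISSING(x)
for x in [1, 5, 10, None]:
    print(x, gt(x, 2), and_(gt(x, 2), lt(x, 10)), missing(x))
```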

A list of operators that SPSS recognizes in conditional (or logical) statements is given in the following table. Note that you can use either the letter combinations or the mathematical symbols in your statements. You can also use parentheses to group or distribute the effects of an operator.

Operators for logical statements

Operator   Symbol   Definition
EQ         =        Equal to
NE         ~=       Not equal to
LT         <        Less than
LE         <=       Less than or equal to
GT         >        Greater than
GE         >=       Greater than or equal to
AND        &        Both statements must be true
OR         |        One or both statements must be true
NOT        ~        Negation (must not be true)

The ELSE line tells SPSS to perform its nested computation on all other values not accounted for in the previous conditional statements. ELSE is optional; you don't have to use it, but it is often more convenient than addressing every possible outcome with ELSE IF. If you do use ELSE, it must come at the very end of the block (right before the END IF statement).

When using DO IF, all conditions based on missing values must be included in the DO IF statement; they cannot be included in ELSE IF statements. If missing-value conditions are applied in ELSE IF statements, they are ignored.
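Why must the missing-value test come first? A plausible model is sketched in Python below, assuming SPSS's documented rule that when a DO IF or ELSE IF condition evaluates to missing, control passes to END IF without running any later branch (including ELSE). With the missing check placed in an ELSE IF, a missing score never reaches it, because the earlier comparison is itself missing. The code -9 is a hypothetical placeholder value, and None stands in for system-missing.

```python
def run_do_if(x, branches, else_value=None):
    """Model one case passing through DO IF / ELSE IF ... / ELSE / END IF.

    branches: (condition, value) pairs in order; condition(x) returns
    True, False, or None (missing). Returns the value assigned to the new
    variable, or None (system-missing) if no COMPUTE runs.
    """
    for condition, value in branches:
        result = condition(x)
        if result is None:  # missing condition: skip straight to END IF
            return None
        if result:
            return value
    return else_value  # ELSE branch (None if there is no ELSE)

def below_60(x):
    return None if x is None else x < 60

def is_missing(x):
    return x is None

# Missing test in an ELSE IF: never reached for a missing score.
wrong_order = [(below_60, 1), (is_missing, -9)]
# Missing test in the DO IF: handled before any comparison is evaluated.
right_order = [(is_missing, -9), (below_60, 1)]

print(run_do_if(None, wrong_order, else_value=2))  # None
print(run_do_if(None, right_order, else_value=2))  # -9
```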

The COMPUTE statements are where the new variable(s) are actually calculated or set. Note that if you want to set a variable equal to a missing value in a COMPUTE statement, use the syntax var=$SYSMIS. The term $SYSMIS refers to the system-missing value. (Note that although SPSS displays system-missing values as period characters (.), you cannot use the assignment statement var=.; this will return a syntax error.)

You may encounter this syntax error after executing a DO IF block:

Error # 4095.  Command name: EXECUTE
The transformations program contains an unclosed LOOP, DO IF, or complex file structure.  Use the level-of-control shown to the left of the SPSS Statistics commands to determine the range of LOOPs and DO IFs. Execution of this command stops.

If this happens, you may need to add a hyphen (-) before the COMPUTE statement(s).

Example: Joining Categories

Problem Statement

Class ranks for high schools and colleges are names for the year of study a person is completing: "freshman" (first-year), "sophomore" (second-year), "junior" (third-year), "senior" (fourth-year). Class ranks are also sometimes divided into "underclassmen" (first- and second-year students) and "upperclassmen" (third- and fourth-year students).

In the sample dataset, the variable Rank has the values Freshman (1), Sophomore (2), Junior (3), and Senior (4). Let's use Recode into Different Variables to merge the categories and create a new indicator variable called RankIndicator with the levels Underclassman (1) and Upperclassman (2).

Running the Procedure

We will show three different ways of defining the categories that produce identical results. You only need to use one of these; we show multiple methods to demonstrate that there is flexibility in how you define the groups.

Method 1

Using the Dialog Windows

This method tells SPSS exactly how to map each old value onto a new category.

  1. Click Transform > Recode into Different Variables.
  2. Double-click on variable Rank to move it to the Input Variable -> Output Variable field. In the Output Variable area, give the new variable the name RankIndicator. Define the label as Class Rank (binary), and then click Change.
  3. Click the Old and New Values button.
    1. Handle missing values first: In the Old Value area click System-missing; in the New Value area click System-missing. Then click Add.
    2. Define the underclassmen group (1):
      1. In the Old Value area click Value and enter 1; in the New Value area click Value and enter 1. Then click Add.
      2. In the Old Value area click Value and enter 2; in the New Value area click Value and enter 1. Then click Add.
    3. Define the upperclassmen group (2):
      1. In the Old Value area click Value and enter 3; in the New Value area click Value and enter 2. Then click Add.
      2. In the Old Value area click Value and enter 4; in the New Value area click Value and enter 2. Then click Add.
    4. When finished, click Continue.
  4. Click OK.
Using Syntax
RECODE Rank (SYSMIS=SYSMIS) (1=1) (2=1) (3=2) (4=2) INTO RankIndicator.
VARIABLE LABELS  RankIndicator 'Class Rank (binary)'.
EXECUTE.
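Method 1's value-by-value mapping is essentially a lookup table. The Python sketch below uses a handful of made-up Rank values and None for system-missing; it assumes, as with RECODE ... INTO for a new variable, that any code not listed (such as a stray 5) ends up system-missing.

```python
# Method 1 as a lookup table: 1 = Freshman, 2 = Sophomore, 3 = Junior, 4 = Senior.
RANK_TO_INDICATOR = {1: 1, 2: 1, 3: 2, 4: 2}  # 1 = Underclassman, 2 = Upperclassman

def rank_indicator(rank):
    """Map a Rank code to RankIndicator; missing or unlisted codes give None."""
    return RANK_TO_INDICATOR.get(rank)

ranks = [1, 4, 2, None, 3, 5]  # made-up sample values
print([rank_indicator(r) for r in ranks])  # [1, 2, 1, None, 2, None]
```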

Method 2

This method uses ranges. Note that this method works fine for integers, but will often produce unexpected results when recoding variables that can have one or more nonzero decimal places.

Using the Dialog Windows
  1. Click Transform > Recode into Different Variables.
  2. Double-click on variable Rank to move it to the Input Variable -> Output Variable field. In the Output Variable area, give the new variable the name RankIndicator2. Define the label as Class Rank (binary), and then click Change.
  3. Click the Old and New Values button.
    1. Handle missing values first: In the Old Value area click System-missing; in the New Value area click System-missing. Then click Add.
    2. Define the underclassmen group (1): In the Old Value area click Range and enter 1 in the first box and 2 in the second box. In the New Value area click Value and enter 1. Then click Add.
    3. Define the upperclassmen group (2): In the Old Value area click Range and enter 3 in the first box and 4 in the second box. In the New Value area click Value and enter 2. Then click Add.
    4. When finished, click Continue.
  4. Click OK.
Using Syntax
RECODE Rank (SYSMIS=SYSMIS) (1 thru 2=1) (3 thru 4=2) INTO RankIndicator2.
VARIABLE LABELS  RankIndicator2 'Class Rank (binary)'. 
EXECUTE.
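The caveat about decimal places is easy to see with a small Python model of Method 2's two closed ranges, (1 thru 2) and (3 thru 4). A hypothetical value such as 2.5 falls between the ranges and matches neither; assuming unmatched values become system-missing in the new variable, it is silently lost.

```python
def method2(rank):
    """Model of RECODE Rank (1 thru 2=1) (3 thru 4=2): closed ranges only."""
    if rank is None:
        return None
    if 1 <= rank <= 2:
        return 1
    if 3 <= rank <= 4:
        return 2
    return None  # no rule matched

for rank in [1, 2, 2.5, 3, 4]:
    print(rank, "->", method2(rank))
```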

Method 3

This method uses the "Lowest thru" and "thru Highest" ranges. The "Lowest thru" option acts as "less than or equal to some-number", and the "thru Highest" option acts as "greater than or equal to some-number".

Using the Dialog Windows
  1. Click Transform > Recode into Different Variables.
  2. Double-click on variable Rank to move it to the Input Variable -> Output Variable box. In the Output Variable area, give the new variable the name RankIndicator3. Define the label as Class Rank (binary), and then click Change.
  3. Click the Old and New Values button.
    1. Handle missing values first: In the Old Value area click System-missing; in the New Value area click System-missing. Then click Add.
    2. Define the underclassmen group (1): In the Old Value area click Range, LOWEST through value and enter 2. In the New Value area click Value and enter 1. Then click Add.
    3. Define the upperclassmen group (2): In the Old Value area click Range, value through HIGHEST and enter 3. In the New Value area click Value and enter 2. Then click Add.
    4. When finished, click Continue.
  4. Click OK.
Using Syntax
RECODE Rank (SYSMIS=SYSMIS) (Lowest thru 2=1) (3 thru Highest=2) INTO RankIndicator3.
VARIABLE LABELS  RankIndicator3 'Class Rank (binary)'.
EXECUTE.

After recoding, we should compare the frequencies of the old and new variables. There should be an identical number of missing values; the number of underclassmen should equal the sum of the numbers of freshmen and sophomores; and the number of upperclassmen should equal the sum of the numbers of juniors and seniors.
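The same check can be sketched in Python with collections.Counter, assuming the old and new values sit in two parallel lists. The data here are made up, and None marks a missing value.

```python
from collections import Counter

# Made-up parallel columns: Rank and the recoded RankIndicator (None = missing).
rank = [1, 2, 2, 3, 4, 4, None, 1]
rank_indicator = [1, 1, 1, 2, 2, 2, None, 1]

old = Counter(rank)
new = Counter(rank_indicator)

assert old[None] == new[None], "missing counts differ"
assert new[1] == old[1] + old[2], "underclassmen != freshmen + sophomores"
assert new[2] == old[3] + old[4], "upperclassmen != juniors + seniors"
print("Frequencies agree:", dict(new))
```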

Example: Dichotomizing a Continuous Variable

Problem Statement

One important use of the Recode procedures is dichotomizing or discretizing a continuous variable. Dichotomizing a continuous variable transforms a scale variable into a binary categorical variable by splitting the values into two groups based on a cut point. Discretizing a continuous variable changes a scale variable into an ordinal categorical variable by splitting the values into three or more groups based on multiple cut points.

In the sample dataset, the variable CommuteTime represents the amount of time (in minutes) it takes the respondent to commute to campus. Let's try recoding this variable into three ordinal groups:

  • 1 = Commute is 30 minutes or less (time ≤ 30)
  • 2 = Commute is more than 30 minutes, but less than 60 minutes (30 < time < 60)
  • 3 = Commute is an hour or more (time ≥ 60)

Running the Procedure

  1. Click Transform > Recode into Different Variables.
  2. Double-click on variable CommuteTime to move it to the Input Variable -> Output Variable box. In the Output Variable area, give the new variable the name CommuteLength, then click Change.
  3. Click the Old and New Values button.
    1. Handle missing values first: In the Old Value area click System-missing; in the New Value area click System-missing. Then click Add.
    2. Define group 1 (time ≤ 30): In the Old Value area click Range, LOWEST through value and enter 30; in the New Value area click Value and enter 1. Then click Add.
    3. Define group 3 (time ≥ 60): In the Old Value area click Range, value through HIGHEST and enter 60; in the New Value area click Value and enter 3. Then click Add.
    4. Define group 2 (30 < time < 60): In the Old Value area click All other values; in the New Value area click Value and enter 2. Then click Add.
    5. When finished, click Continue.
  4. Click OK.

To check your work, go to the Variable View tab in the Data Editor window. Right-click on the new CommuteLength variable and select Descriptive Statistics. This will create a quick frequency table or summary statistics for the new variable. Make sure that the new variable has the same number of missing values as the original variable. You will also need to set the value labels for the new variable before doing any analysis using this variable.

Syntax

RECODE CommuteTime (SYSMIS=SYSMIS) (Lowest thru 30=1) (60 thru Highest=3) (ELSE=2) INTO CommuteLength.
EXECUTE.

Discussion

Why didn't we use the "Range" option to specify category 2?

The "Range" option can be used when the recoded group includes both endpoints (i.e., is defined by "greater than or equal to" AND "less than or equal to" statements). However, it cannot be used if one or both of the endpoints are "open", i.e., not included (which happens if a group is defined by a "[strictly] greater than" and/or "[strictly] less than" statement).

Using "All other values" to define group 2 was completely dependent on us correctly accounting for all other possible values first, including the missing values. Had we not first handled the missing values, category 2 would have included all of the cases with 30 < time < 60 and all of the cases with missing values.
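To see why the missing values had to be handled first, here is a Python sketch of the CommuteLength recode as an ordered list of rules, tried left to right with the first match winning. It assumes, following the discussion above, that ELSE also catches the system-missing value (None here); the helper names and commute times are made up for illustration.

```python
def recode(x, rules):
    """Apply (matches, new_value) rules in order; the first match wins."""
    for matches, new_value in rules:
        if matches(x):
            return new_value
    return None

def is_sysmis(x):
    return x is None

def lowest_thru(high):
    return lambda x: x is not None and x <= high

def thru_highest(low):
    return lambda x: x is not None and x >= low

def else_rule(x):
    return True  # ELSE: anything not matched so far, including missing

with_sysmis = [
    (is_sysmis, None),       # (SYSMIS=SYSMIS)
    (lowest_thru(30), 1),    # (Lowest thru 30=1)
    (thru_highest(60), 3),   # (60 thru Highest=3)
    (else_rule, 2),          # (ELSE=2)
]
without_sysmis = with_sysmis[1:]

times = [15, 30, 30.5, 59.99, 60, None]
print([recode(t, with_sysmis) for t in times])     # [1, 1, 2, 2, 3, None]
print([recode(t, without_sysmis) for t in times])  # [1, 1, 2, 2, 3, 2]
```

Dropping the first rule changes only the last result: the missing commute time falls through to ELSE and is counted in group 2.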

Example: Discretizing a Continuous Variable using DO IF Syntax

The above example showed how to discretize a continuous variable into three categories using Recode into Different Variables. Recode into Different Variables was able to correctly account for all possible values in that situation. However, if we want to discretize into four or more categories, Recode into Different Variables isn't equipped to properly define each range. We'll illustrate this with an example, then show how to use DO IF syntax to correctly implement the desired recoding scheme.

Problem Statement

Suppose we have test scores as percentages, and want to convert those percentages to a letter grade. A typical grading scheme in the United States is:

  • Below 60: F (test < 60)
  • 60 to 69: D (60 ≤ test < 70)
  • 70 to 79: C (70 ≤ test < 80)
  • 80 to 89: B (80 ≤ test < 90)
  • 90 or higher: A (test ≥ 90)

Recall that the Range specification in Recode into Different Variables allows us to specify a range of values which includes both endpoints. Given this constraint, how would we achieve a grouping that was intended to have an open endpoint? For the "D" and "C" grades, we could try specifying the ranges as [60, 69.9] -> D and [70, 79.9] -> C. This could work if scores were only recorded to one decimal place, but what would happen to a score with two decimal places, say, 69.99? Imagine a number line:

The numbers 69.91, 69.92, ..., fall in a "gap" between 69.9 and 70.0.

In that instance, the score 69.99 would fall into a "gap" not covered by any recoding rule. In general, your instructions to SPSS should be specified in such a way that all possible outcomes are accounted for, regardless of whether you're using the dialog windows or syntax.
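The gap is easy to reproduce in a short Python sketch that applies the attempted closed ranges [60, 69.9] -> D and [70, 79.9] -> C, next to half-open comparisons of the kind DO IF allows (60 ≤ test < 70 and 70 ≤ test < 80). Only the D and C bands are modeled here.

```python
def closed_ranges(score):
    """Attempted Range-style rules: both endpoints included."""
    if 60 <= score <= 69.9:
        return "D"
    if 70 <= score <= 79.9:
        return "C"
    return None  # not covered by any rule

def half_open(score):
    """DO IF-style rules: lower bound included, upper bound excluded."""
    if 60 <= score < 70:
        return "D"
    if 70 <= score < 80:
        return "C"
    return None

# Columns: score, closed-range result, half-open result
for score in [69.9, 69.99, 70.0, 79.95]:
    print(score, closed_ranges(score), half_open(score))
```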


In the sample dataset, the variable Math represents the subjects' scores (out of 100 points) on a math placement test. Suppose we want to recode these scores into letter grades using the scheme described above. Let's use DO IF syntax to perform this recode and save the results as a new variable, MathGrade.

Running the Procedure

This recode must be made using syntax.

  1. Open a new syntax file (File > New > Syntax).
  2. Enter the following syntax:
    DO IF(MISSING(Math)).
    COMPUTE MathGrade=$SYSMIS.
    ELSE IF (Math < 60).
    COMPUTE MathGrade = 1.
    ELSE IF (Math >= 60 AND Math < 70).
    COMPUTE MathGrade = 2.
    ELSE IF (Math >= 70 AND Math < 80).
    COMPUTE MathGrade = 3.
    ELSE IF (Math >= 80 AND Math < 90).
    COMPUTE MathGrade = 4.
    ELSE IF (Math >= 90).
    COMPUTE MathGrade = 5.
    END IF.
    EXECUTE.
  3. Highlight the syntax, then press the Run Selection (play) button.

NOTE: This syntax has been tested and confirmed to work in SPSS Statistics versions 22, 23, and 29. We have found that it may not work correctly in SPSS Statistics version 20. If you are using version 20, you may need to put hyphens before each COMPUTE statement contained within the DO IF-END IF block.
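For reasoning about branch order, the same grading logic can be written as a Python if/elif chain. It mirrors the DO IF block (missing first, then half-open intervals), with None standing in for system-missing. Because each branch is reached only when the earlier ones were false, the lower bounds (Math >= 60 and so on) are dropped in this sketch. It is an illustration, not a replacement for the SPSS syntax.

```python
def math_grade(math):
    """Return 1=F, 2=D, 3=C, 4=B, 5=A, or None (system-missing)."""
    if math is None:    # DO IF (MISSING(Math))
        return None
    elif math < 60:     # F
        return 1
    elif math < 70:     # D: 60 <= Math < 70
        return 2
    elif math < 80:     # C: 70 <= Math < 80
        return 3
    elif math < 90:     # B: 80 <= Math < 90
        return 4
    else:               # A: Math >= 90
        return 5

print([math_grade(s) for s in [59.99, 60, 69.99, 70, 89.5, 90, None]])
# [1, 2, 2, 3, 4, 5, None]
```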

Output

If the recode was performed successfully, we should see the new variable in the Data Editor window.

If the new variable appears but all of the values are missing, then there is something wrong with your code; you may have forgotten an EXECUTE statement.

We should also check our new variable to make sure that the recode performed as we expected. There should be the same number of missing values that we started with, and each of the original scores should be classified into exactly one of the grade categories. We can check this using Compare Means via the syntax below, or via the menus (Analyze > Compare Means > Means. The dependent variable is Math, and the layer/independent variable is MathGrade):

MEANS TABLES=Math BY MathGrade  /CELLS=COUNT MIN MAX.
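The MEANS table reports, for each MathGrade, the number of cases and the smallest and largest Math score. The Python sketch below computes the same three numbers from made-up (Math, MathGrade) pairs and asserts that each group's minimum and maximum lie inside that grade's band. The BANDS table is a restatement of the grading scheme above, with exclusive upper bounds.

```python
import math

# Made-up (Math, MathGrade) pairs; None marks a missing score.
pairs = [(55, 1), (61.5, 2), (69.99, 2), (74, 3), (88, 4), (90, 5), (97, 5), (None, None)]

# Grade code -> (lowest allowed score, exclusive upper bound).
BANDS = {1: (-math.inf, 60), 2: (60, 70), 3: (70, 80), 4: (80, 90), 5: (90, math.inf)}

summary = {}
for score, grade in pairs:
    if grade is None:
        continue
    summary.setdefault(grade, []).append(score)

for grade, scores in sorted(summary.items()):
    low, high = BANDS[grade]
    n, min_score, max_score = len(scores), min(scores), max(scores)
    assert low <= min_score and max_score < high, f"grade {grade} has out-of-band scores"
    print(grade, n, min_score, max_score)

# Missing values should be carried over one-for-one.
assert sum(s is None for s, _ in pairs) == sum(g is None for _, g in pairs)
```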

Remember that before you perform any further analysis with this variable, you'll want to add value labels showing 1='F', 2='D', and so on.