Friday, February 26, 2010

Downloading Your Existing Dataset

So the professor has approved your study. You're proposing to download an existing dataset from the National Archive of Criminal Justice Data. You're all ready to get started, and... what do you do?

If you did the SPSS download & convert homework, the steps are basically the same. First up, it will be helpful to work on a computer that has SPSS installed. (See point #2 below for links to a free or small-fee SPSS license, which Sydney [thanks, Sydney!] found.)

(1) Download your dataset using the same steps that were outlined in the SPSS homework, following the prompts at the NACJD website. Save the files into a folder with a short path name, for instance C:\Seminar\ (the conversion seems to work better that way, for some reason). The site will likely give you more files than you need (some for SAS software, maybe something ending in ".por"). I believe what you need is the setup/syntax file ending in ".sps", plus the data file ending in ".txt".
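If you'd rather double-check the download folder with a script than by eye, here's a minimal Python sketch (Python isn't part of this class; this is just an illustration). It assumes everything was saved flat into one folder and that the data files end in "-Data.txt", like the example "08381-0002-Data.txt" in step 5. The function name `find_study_files` is made up for this example.

```python
from pathlib import Path


def find_study_files(folder):
    """List the SPSS setup file(s) and raw data file(s) in a download folder.

    Assumes the NACJD files were saved flat into `folder` (e.g. C:\\Seminar)
    and that data files end in "-Data.txt". Everything else (SAS files,
    .por files, and so on) is ignored.
    """
    folder = Path(folder)
    # Setup/syntax files for SPSS end in ".sps"
    setup_files = sorted(p.name for p in folder.glob("*.sps"))
    # Raw data files, named like "08381-0002-Data.txt" (assumed pattern)
    data_files = sorted(p.name for p in folder.glob("*-Data.txt"))
    return setup_files, data_files
```

For example, `find_study_files(r"C:\Seminar")` should list the ".sps" and ".txt" files you care about and skip the rest.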

(2) To complete the conversion of the ".txt" file into an SPSS database, you'll need to be working on a computer that has SPSS. You can either download a free or minor-fee version onto your home PC or laptop, or save your files from NACJD onto a memory stick and bring that to the OCC computer lab.

Here are the links to SPSS downloads, either free or for a minimal fee for a six-month license. Note: I haven't tried either of these, so I can't vouch for them. That said, give them a try.

(3) Open SPSS. Then go to File-Open and click the drop-down arrow so you can open a Syntax file. If the correct directory doesn't come up, find your C:\Seminar\ directory (or whatever you've called it); the SPSS syntax file should be there. Open that up.

(4) The next step is to replace the words "physical-filename" (in DATA LIST FILE="physical-filename" / ) with the full path and name of the data file. NOTE: Make sure you're at the top (not the bottom) of the syntax file. For some reason, the same language (with "physical-filename") also appears at the bottom of the syntax file. That's not the one you want; you want the one at the top.
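The "top one only" rule is easy to get wrong by hand, so here's a minimal Python sketch that replaces only the first "physical-filename" in the syntax file and leaves the copy at the bottom alone. It assumes the placeholder appears exactly as written above; the function names and the output file name are hypothetical.

```python
from pathlib import Path

PLACEHOLDER = "physical-filename"


def fill_data_path(syntax_text, data_path, placeholder=PLACEHOLDER):
    """Swap the FIRST placeholder (the DATA LIST at the top) for data_path.

    The copy of the placeholder near the bottom of the file is left alone.
    """
    if placeholder not in syntax_text:
        raise ValueError(f"{placeholder!r} not found in the syntax text")
    # count=1 -> only the first occurrence is replaced. str.replace does not
    # treat backslashes specially, so Windows paths go in exactly as typed.
    return syntax_text.replace(placeholder, data_path, 1)


def fill_syntax_file(sps_in, sps_out, data_path):
    """Read a setup file, fill in the data path, save under a new name."""
    # latin-1 can decode any byte, so reading never fails on odd characters
    text = Path(sps_in).read_text(encoding="latin-1")
    Path(sps_out).write_text(fill_data_path(text, data_path), encoding="latin-1")
```

For example, `fill_syntax_file(r"C:\Seminar\08381-0002-Setup.sps", r"C:\Seminar\setup-filled.sps", r"C:\Seminar\08381-0002-Data.txt")` writes a new copy you could then open in SPSS (the setup file name here is only a guess at what yours is called). I used plain `str.replace` rather than a regular expression on purpose: a regex replacement would try to interpret the backslashes in the Windows path.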

(5) Minimize SPSS and open up My Computer. Find your C:\Seminar\ folder (or whatever you've called it). Now right-click the ".txt" file (the raw data file that hasn't been converted yet), choose Properties, and note its location ("C:\Seminar\") and its file name (something like "08381-0002-Data.txt"). Then, back in the SPSS syntax file, copy/paste the location and file name in between the quotes to replace "physical-filename", so the line reads something like DATA LIST FILE="C:\Seminar\08381-0002-Data.txt" /. You may have to do this in two steps (first copy/paste the location, "C:\Seminar\", then the name of the .txt file). I know it sounds confusing when I write it out like this, but if you were able to complete the SPSS homework successfully, this is basically the same process. To complete this step, you'll be going back and forth between My Computer and the SPSS syntax file.
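The two-step copy/paste is really just gluing the folder location onto the file name. Here's a minimal Python sketch of that, assuming the location might show up with or without a trailing backslash or stray space. The function names are made up for illustration.

```python
import ntpath


def full_data_path(location, file_name):
    """Combine the Properties "Location" and the file name into one path.

    Handles a location with or without the trailing backslash
    (e.g. "C:\\Seminar" or "C:\\Seminar\\ ") and stray spaces.
    """
    # ntpath uses Windows path rules even when this runs on a Mac or Linux
    return ntpath.join(location.strip(), file_name.strip())


def data_list_line(data_path):
    """Build the opening of the DATA LIST line as it should look afterward."""
    return f'DATA LIST FILE="{data_path}" /'
```

Either way you paste it, the goal is one unbroken path between the quotes, with exactly one backslash between the folder and the file name.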

(6) Once you've got the location and file name pasted into the syntax, click the "Run All" button. An SPSS output window should pop up. In the lower right corner of either the output window or the SPSS database window, you should see the words "transformations pending." Click on "Transformations" or "Transform" and then "Run All" (something like that).

If you DON'T see "transformations pending" in the SPSS or output window (some people do, some people don't), go back into the syntax file and click Transform-Run Pending Transformations (something like that). Either way, the next step is to run the pending transformations, which should load the data into SPSS. (Before you complete the transformations, all you have are the variables in a blank SPSS database skeleton. You need the data, too.) You can do this from the SPSS data window, the output window, or the syntax window.
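To picture the difference between the "skeleton" and the data, here's a minimal Python sketch. As far as I understand it, the NACJD ".txt" file is plain fixed-width text, and the DATA LIST in the setup file tells SPSS which columns hold which variable; until that gets run, you only have the variable list. The variable names and column positions below are made up (your setup file has the real ones), and this only illustrates the idea. It is not how SPSS works internally.

```python
def read_fixed_width(lines, columns):
    """Turn fixed-width text lines into records (one dict per case).

    `columns` maps each variable name to its (start, end) column positions,
    1-based and inclusive, like "V1 1-5" in a DATA LIST. Values stay as text.
    """
    records = []
    for line in lines:
        line = line.rstrip("\r\n")
        record = {}
        for name, (start, end) in columns.items():
            # Shift to Python's 0-based, end-exclusive slicing
            record[name] = line[start - 1:end].strip()
        records.append(record)
    return records


# The "skeleton": variable names and positions, but no data yet (hypothetical)
COLUMNS = {"V1": (1, 5), "V2": (6, 6), "V3": (7, 8)}
```

With only `COLUMNS`, you know what the variables are called but have no cases. Feeding it lines like "00001109" fills in V1 = "00001", V2 = "1", V3 = "09", which is roughly what running the pending transformations gets you in SPSS.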

(7) The SPSS data should now be ready to work in.

(8) Save the database under a new name, and remember where you saved it. (On your memory stick? On your home computer or laptop? Don't save it on the OCC lab computers.) When the class goes to the computer lab in late March, you'll need to bring your data on a memory stick so we can work on it together there. Everyone gets one-on-one data analysis help with Dr. Hassett-Walker in March.

(9) Next, start locating and renaming your variables. The variables will probably have unhelpful names like ACB8xy2, which might really mean "Gender." If so, rename it "gender." I also find it helpful to fill in the labels for my chosen variables, if SPSS didn't do that for you in the conversion.
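In SPSS syntax, renaming and labeling are done with the RENAME VARIABLES and VARIABLE LABELS commands. Outside SPSS, the same idea looks something like this minimal Python sketch, which treats each case as a dictionary. The function name and the ACB8xy2 -> "gender" mapping are only illustrations; check your codebook for what each variable actually means.

```python
def rename_variables(records, renames):
    """Return copies of the records with variables renamed (old -> new).

    Variables not listed in `renames` keep their names. Raises ValueError
    if the renaming would give two variables the same name.
    """
    if records:
        new_names = [renames.get(name, name) for name in records[0]]
        if len(set(new_names)) != len(new_names):
            raise ValueError("renaming would create duplicate variable names")
    # Build new dicts so the original records are left untouched
    return [{renames.get(k, k): v for k, v in rec.items()} for rec in records]


# Hypothetical example: ACB8xy2 is the cryptic name, "gender" the readable one
RENAMES = {"ACB8xy2": "gender"}
# Labels kept separately, keyed by the NEW names (also hypothetical)
LABELS = {"gender": "Respondent gender"}
```

The duplicate-name check is there because two variables ending up with the same name is an easy mistake to make when renaming a lot of them at once.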

(10) Your database may have 892 variables in it (or however many). You want to select up to 15 (fewer, such as eight, ten, or twelve, is fine). Locate your chosen variables and run some frequencies. Look at the results. Any missing data? Any data coded as "did not respond"? Cases like those will need to be filtered out. We may also want to cut out the variables you're not using (i.e., make the database smaller), which will make it run faster.
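In SPSS you'd use FREQUENCIES for the counts (and MISSING VALUES to tell SPSS which codes mean "missing"). Here's a minimal Python sketch of the same two ideas: counting a variable's values while setting aside blanks and "did not respond" codes, and keeping only the variables you chose. The code "9" for "did not respond" is a made-up example; your codebook will say which codes your dataset actually uses.

```python
from collections import Counter


def frequencies(records, variable, missing_codes=()):
    """Count the values of one variable, separating out missing codes.

    Returns (valid_counts, missing_count). Blank values count as missing.
    """
    valid = Counter()
    missing = 0
    for rec in records:
        value = rec.get(variable, "")
        if value == "" or value in missing_codes:
            missing += 1
        else:
            valid[value] += 1
    return valid, missing


def keep_variables(records, keep):
    """Drop every variable not in `keep`, giving a smaller dataset."""
    return [{name: rec[name] for name in keep if name in rec} for rec in records]
```

For example, `frequencies(records, "gender", missing_codes={"9"})` returns the counts for each valid code plus how many cases were blank or coded 9, which is the kind of thing to look for before deciding what to filter out.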

(11) When you've gotten through all this, email me (feel free to send your dataset and frequencies along at that point) and I can talk you through the next steps. We'll also go through those next steps together at the lab in late March.