Econ 424: Computer Methods in Economics

Spring 2008

Session 1: Tuesday & Thursday 8:00am-9:15am
Session 2: Tuesday & Thursday 9:30am-10:45am
Plant Sciences 1129

Instructor:
Amy Knaup (Session 1)
Ginger Zhe Jin (Session 2)

All grades are available online.


Contact and office hours

In any case, the best way to reach us is via e-mail.
 
Session 1: Amy Knaup
Office: 5110 Tydings
Office Hours: Monday 11am-12pm
Phone: 301-405-3526
E-mail: knaup@econ.umd.edu

Session 2: Ginger Z. Jin
Office: 3115 H Tydings
Office Hours: Wednesday 12:30-1:30pm
Phone: 301-405-3484
E-mail: jin@econ.umd.edu
Web: www.glue.umd.edu/~ginger/

Comments and suggestions for the class design are always welcome.



Syllabus
 

Goal
 

As a first step to hi-tech economics, Econ 424 introduces the most basic data handling techniques in economic studies. The ultimate goal is three-fold. At the end of the semester:
To fulfill this goal, all classes, including the mid-term and final, will meet in a computer lab and use two popular statistical software packages -- Excel and SAS. In addition, students will learn how to use the World Wide Web and how to complete computer projects. Through hands-on experience, students are expected to master both packages at the introductory level and apply them to economic issues in the real world.


Prerequisites
 

Eligible students must major in Economics and have completed Econ 305 (Intermediate Macroeconomics Theory and Policy), Econ 306 (Intermediate Microeconomics Theory) and Econ 321 (Economic Statistics). We will devote a couple of classes to introducing Excel and SAS, so experience with either software is not required. However, if you need extra help in getting started, please contact us as soon as possible.


Waiting List Policy
 

Because lab capacity is limited, each session can only accommodate 36 students. If you are number x on the waiting list, you won't get into the class unless x enrolled students drop the class during the semester.


Recommended Textbooks
 

#1 --- Malcolm Getz, "e.stat for Business and Economics" (CD-ROM), published by South-Western, a division of Thomson Learning. ISBN: 0-324-00895-3.
#2 --- Lora D. Delwiche and Susan J. Slaughter, "The Little SAS Book: A Primer," third edition (paperback). ISBN: 1-59047-333-7.
#3 --- Ron Cody & Ray Pass, "SAS Programming by Example", published by SAS Institute Inc. ISBN: 1-55533-681-7.
All books are available at Amazon. We won't use the SAS books until after the midterm.


Evaluation
 

Grades for the course will be based on


Class attendance

Hands-on teaching is much more effective than remote communication by email. If you miss a class, you can download the lecture notes or consult your classmates. If you still have questions after reading the lecture notes, you are welcome to contact us via email or in person. Please don't expect the instructor or the teaching assistant to re-lecture every point covered in the missed class.



Exams

The course involves one mid-term and one final (grading weights in parentheses):

(30) Mid-term (new schedule!): Session 1 (March 27, 8-9:15am), Session 2 (March 27, 9:30-10:45am), open book, on-line in Plant Sciences 1129 (the same room for every class meeting).
Old mid-term examples:
Spring 2003: Exam text, Data, Answer Key
Spring 2002: Exam text, Data, Answer Key
Fall 2001: Exam text, Data, Answer Key
Spring 2001: Exam text and data, Answer Key
(30) Final: cumulative, Session 1 (May 20, 10:30am-12:30pm) Session 2 (May 16, 8am-10am), open book, on-line in Plant Sciences 1129 (the same room for every class meeting).
Practice Final:
Hardcopy handout (.doc)
Excel data set (.xls)
SAS data set 1 (.csv)
SAS data set 2 (.csv)
SAS program (.sas)
Answer Key (Corrected!)
Old final examples:
Fall 2001:
Hardcopy handout (.doc)
Excel data set (.xls)
SAS data set (.sas)
SAS program (.lst)
Answer Key
Spring 2001:
Hardcopy handout (.doc)
Excel data set (.xls)
SAS data set (.csv)
SAS program (.sas)
SAS output (.lst)
Answer Key

Attention: Grades will be posted online as econ424-spring2008-publicgrade.xls. The same link is provided at the beginning of the syllabus. Grades in the file are listed by class ID, and a class ID will be assigned to everyone in the first class. Please remember your class ID so that you can check your grades online at any time.

If you are going to miss the midterm or the final for a legitimate reason (following university policy), please notify the instructor AT LEAST 12 HOURS IN ADVANCE. Any excuse delivered after the exam is invalid and will result in a test score of zero.


If you miss the midterm for a legitimate reason, you have two choices: either take a makeup midterm, or skip the midterm and allow your final to automatically account for 60% of the course grade.




Projects

Each student will complete six projects during the course. They fall into two categories, one requiring you to choose an interesting topic and the other based on a given topic. In either case, the quality of presentation matters: treat the project as though you were giving it to your new employer. It need not be fancy, but it must be clear, to the point, and well explained. Let a classmate critique your project before you hand it in (examples of critiques). You may revise your project in light of your classmate's critique before submitting it.

You are required to submit projects by email to both Ginger Jin and Amy Knaup simultaneously (so that we have a backup in case something goes wrong), with your name and project number in the subject line. Please name the files attached to the email following the convention yourlastname_yourfirstname_project_number.

For example, if your name is Joe Smith, your first project should be named Smith_Joe_project_1.* where * denotes the file's extension. No project report will be fully considered unless it is submitted by the deadline. If a submission is less than two hours late, 20% of the full points for that project are deducted automatically. Delays of more than two hours are unacceptable. If there is a legitimate reason for the delay (following university policy), the grading points will be carried over to your midterm or the final (whichever comes first).

Each project is described below, with grading weights in parentheses:

(5) Project 1: Descriptive Statistics
Develop some original data (not published). Make histograms and compute descriptive statistics for two random variables. Each variable should have 30 or more observations. Make clear the method used to generate the sample. Descriptive statistics should include the minimum, maximum, mean, median, all quartiles, variance, standard deviation, trimmed mean, skewness and kurtosis for each variable. Make a separate histogram for each variable. If you believe the data allow you to compare the two variables, plot a relative frequency polygon for each variable and put the two polygons in one chart. What do you learn from this chart?
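
As a rough cross-check on the Excel worksheet, the statistics listed above can also be sketched in Python using only the standard library. The commuting-time values, the 10% trimming fraction, and the function name describe are assumptions made for illustration. Skewness and kurtosis here use the simple population-moment definitions, so they may differ slightly from the sample-adjusted values that Excel's SKEW and KURT report.

```python
import statistics

def describe(values, trim=0.10):
    """Return descriptive statistics for a list of numbers.

    trim is the fraction dropped from EACH end for the trimmed mean
    (an illustrative choice, not a course requirement).
    """
    data = sorted(values)
    n = len(data)
    mean = statistics.mean(data)
    # Quartile cut points Q1, Q2 (the median) and Q3, using linear
    # interpolation over the full range of the data.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    k = int(n * trim)                       # observations cut per tail
    trimmed = statistics.mean(data[k:n - k])
    # Population central moments, used for skewness and kurtosis.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return {
        "min": data[0], "max": data[-1],
        "mean": mean, "median": statistics.median(data),
        "q1": q1, "q2": q2, "q3": q3,
        "variance": statistics.variance(data),   # sample variance (n - 1)
        "std_dev": statistics.stdev(data),
        "trimmed_mean": trimmed,
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3,
    }

# Hypothetical commuting times in minutes (made-up data, 30 observations).
commute = [5, 10, 10, 15, 15, 15, 20, 20, 25, 30, 30, 35, 40, 45, 60,
           12, 18, 22, 28, 33, 8, 14, 16, 24, 26, 38, 42, 50, 55, 90]
stats = describe(commute)
for name, value in stats.items():
    print(f"{name:16s} {value:10.3f}")
```

A mean noticeably above the median, together with positive skewness, suggests a long right tail, which is the kind of observation the summary report could point out.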

The final report should include (1) one Excel sheet as a work sheet including all the detailed step-by-step calculations, and (2) a second Excel sheet or a .doc file that includes your key results and explains why you are interested in these two variables, what question you have in mind, and what you have learned from the data summary. The second Excel sheet or the .doc file should be clear, concise and to the point, as if you were submitting a summary report to your employer!

Examples: Go to two car dealerships and collect data on the sticker prices of cars. Drive through two areas to collect posted gasoline prices. Use the internet to find prices of comparable products from two sources. Survey students for their daily commuting time and methods.  Survey freshmen and seniors for their monthly expenditure on long distance phone calls.

Project 1 is due at 11:59pm on Feb. 19. Each individual student should submit his/her own original report, including a dataset in Excel format, summary statistics in the Excel file, and a separate Word file (or a second Excel sheet with a text box) describing why you collected this data set. Duplicate submissions are not acceptable.
 

(5) Project 2: Monte Carlo Study
 
This project is designed for you to understand data simulation, the Central Limit Theorem and sample size.

1. Focus on a normal distribution by choosing the mean and the standard deviation at your own discretion. Take this normal distribution as the "population".

2. From the population, draw 100 random samples, each with 30 observations. Calculate the sample mean for each sample. What does the Central Limit Theorem predict for the distribution of the sample mean?

3. Observe the distribution of the sample mean. In one chart, plot a relative frequency polygon for the sample mean and a relative frequency polygon for the raw data you have drawn from the population. How does the distribution of the sample mean compare with the population? Is it similar to what the Central Limit Theorem predicts? Explain it in a summary paragraph.

4. Repeat the exercises above (1, 2, 3) for two different sample sizes, one bigger than 30 and one smaller than 30 (you have the discretion of choosing sample sizes). How does sample size affect your results? Explain.

5. Repeat the exercises above (1, 2, 3, 4) for a uniform distribution ranging from a to b (you have the discretion of choosing a and b). How do the results differ? Explain.
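
The steps above are meant to be carried out in Excel, but the logic of the simulation can be sketched in a few lines of Python. The population parameters (mean 50, standard deviation 10), the uniform range (0 to 100), the sample sizes 10 and 100, and the random seed are all assumptions chosen for illustration. For each population and sample size, the sketch compares the observed standard deviation of 100 sample means with the value sigma / sqrt(n) that the Central Limit Theorem suggests.

```python
import random
import statistics

def sample_means(draw, n_samples=100, sample_size=30):
    """Draw n_samples samples of sample_size each and return their means."""
    return [statistics.mean([draw() for _ in range(sample_size)])
            for _ in range(n_samples)]

rng = random.Random(424)        # fixed seed so the run is repeatable
MU, SIGMA = 50.0, 10.0          # assumed normal population parameters
A, B = 0.0, 100.0               # assumed uniform range [a, b]

# Each entry: (function drawing one observation, population mean, population sd)
populations = {
    "normal": (lambda: rng.gauss(MU, SIGMA), MU, SIGMA),
    "uniform": (lambda: rng.uniform(A, B), (A + B) / 2, (B - A) / 12 ** 0.5),
}

results = {}
for name, (draw, mu, sigma) in populations.items():
    for n in (10, 30, 100):
        means = sample_means(draw, 100, n)
        observed = statistics.stdev(means)   # spread of the 100 sample means
        predicted = sigma / n ** 0.5         # CLT: sd of the sample mean
        results[(name, n)] = (statistics.mean(means), observed, predicted)
        print(f"{name:8s} n={n:3d}  mean of means={results[(name, n)][0]:7.2f}  "
              f"sd observed={observed:5.2f}  sd predicted={predicted:5.2f}")
```

In a run like this the sample means should cluster around the population mean for both distributions, with a spread that shrinks as the sample size grows.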

Project 2 is due at 11:59pm, March 4. Each individual student should submit his/her own Excel file with simulated data and a text box summarizing answers to the questions listed above. Duplicate data or duplicate summaries are not acceptable.
 

(5) Project 3: Regression
 
This project is designed for you to carry out basic data cleaning from a public data set and perform mean estimates, mean comparison, and ordinary least squares regressions on the cleaned data.

On www.census.gov/hhes/www/income/4person.html, the Census Bureau posts the estimated median income for 4-person families by state and year. Please go to the link and complete the following tasks (always use a 95% confidence level, i.e., alpha = 0.05):

1. Before working on a data set, we must understand how the data owner constructs the data. Such information is usually described in the "notes" section below the table(s). The data we are going to work on were drawn from several surveys. Could you tell us the names of the surveys? Who conducted them originally?

2. The data appear as a table on the computer screen, but they are not in Excel format yet. Please copy and paste the data into an Excel sheet, with the following variables:

STATE -- State name
MedFamInc2002 -- Median Income for 4-person Families in Calendar Year 2002
MedFamInc2001 -- Median Income for 4-person Families in Calendar Year 2001
(... continue for each calendar year available on the website)
MedFamInc1974 -- Median Income for 4-person Families in Calendar Year 1974

In this Excel file, take each row of the data as an OBSERVATION, and each column of the data as a VARIABLE. How many observations and how many variables do we have in the data set?

Hint: after you copy and paste a block of text, you need to use "Data - text to columns" to parse the text into multiple columns. Given the data structure on the website, you need to copy and paste multiple times. Each time, you must make sure numbers on the same row correspond to the same STATE. The original data also report median family income for the whole United States. Treat it as an additional "state" when you copy and paste the data, and record its name for the variable "STATE" as "United States."

3. Which state has the highest median family income in calendar year 2004? Which state has the lowest median family income in year 1994? Which state's median family income is the closest to that of the whole United States in 1984? Hint: you can answer these questions by sorting the data.

4. Now delete the row labeled as "United States" and focus on the 51 states (including District of Columbia) in year 2004. For easy view of the table, you may want to hide the columns for other years. Compute the mean of MedFamInc2004 and its confidence interval. Conduct a statistical test of whether the mean of MedFamInc2004 is equal to $62,732. Set your null hypothesis as (mean of MedFamInc2004=$62,732) and your alternative hypothesis as (mean of MedFamInc2004 not equal to $62,732). Hint: is this a one-tail or two-tail test?

5. Compare years 2004 and 2003 for the 51 states. Conduct a statistical test of whether the mean of MedFamInc2004 is equal to the mean of MedFamInc2003. Since we expect the economy to grow from 2003 to 2004, set your null as (mean of MedFamInc2004 = mean of MedFamInc2003) and your alternative hypothesis as (mean of MedFamInc2004 > mean of MedFamInc2003). Hint: is this a one-tail or two-tail test? You are comparing two samples: are they independent or matched pairs?

6. Focus on years 2004 and 1974 for the 51 states. Please draw a scatterplot with MedFamInc2004 on the y-axis and MedFamInc1974 on the x-axis. (Hint: in the Excel chart wizard, choose the chart type "x-y plot".) What do you learn from the graph? Use the Excel function CORREL to calculate the correlation coefficient between MedFamInc2004 and MedFamInc1974.

7. Continuing from task 6, what do you get if you run a regression where the dependent variable is MedFamInc2004 and the independent variable is MedFamInc1974 (including an intercept)? What model does the regression imply? (Write down the regression equation.) How do you interpret the economic meaning of the coefficient of MedFamInc1974? Is the coefficient of MedFamInc1974 significantly different from zero? Use "equal to zero" as your null hypothesis and "not equal to zero" as your alternative hypothesis.
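
The tests in tasks 4 and 5 can be sketched in Python as a check on the Excel calculations. Everything numeric below is an assumption for illustration: the six income values (in thousands of dollars) are made up and are not Census figures, and the critical values 2.571 (two-tail) and 2.015 (one-tail) are t-table values for 5 degrees of freedom at the 5% level. With 51 states the degrees of freedom would be 50, and the critical values would need to be looked up accordingly.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for H0: population mean = mu0."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)   # standard error of the mean
    return (statistics.mean(sample) - mu0) / se, n - 1

def paired_t(after, before):
    """Matched-pairs test: a one-sample t test on the pairwise differences."""
    diffs = [a - b for a, b in zip(after, before)]
    return one_sample_t(diffs, 0.0)

def confidence_interval(sample, t_crit):
    """Two-sided confidence interval for the mean, given a critical t value."""
    m = statistics.mean(sample)
    half = t_crit * statistics.stdev(sample) / math.sqrt(len(sample))
    return m - half, m + half

# Made-up incomes in thousands of dollars for six hypothetical states.
inc2003 = [50.0, 55.0, 60.0, 65.0, 70.0, 75.0]
inc2004 = [51.5, 56.0, 62.0, 66.0, 72.5, 76.0]

T_TWO_TAIL, T_ONE_TAIL = 2.571, 2.015   # t table, df = 5, 5% level

# Task 4 style: two-tail test of H0: mean = 62.732 (thousand dollars).
t1, df1 = one_sample_t(inc2004, 62.732)
low, high = confidence_interval(inc2004, T_TWO_TAIL)
reject_two_tail = abs(t1) > T_TWO_TAIL

# Task 5 style: one-tail matched-pairs test of H1: mean 2004 > mean 2003.
t2, df2 = paired_t(inc2004, inc2003)
reject_one_tail = t2 > T_ONE_TAIL

print(f"one-sample: t={t1:.3f}, df={df1}, 95% CI=({low:.2f}, {high:.2f}), "
      f"reject H0: {reject_two_tail}")
print(f"paired:     t={t2:.3f}, df={df2}, reject H0: {reject_one_tail}")
```

The paired version is used here because the same states are observed in both years, so the two columns are not independent samples.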

Project 3's deadline is 11:59pm, March 31 (new schedule). Each student submits one Excel file as the working sheet and one separate Word file answering all the questions.


(5) Project 4: Data cleaning by SAS (I)

You are expected to complete the following steps:

1. Save the 2001 Washingtonian ratings of the top 100 restaurants in the DC area into two comma-delimited (.csv) files, the first for the sheet "restaurant rating" and the second for the sheet "city-demographics." (Note that you cannot save multiple Excel data sheets into one .csv file.)

2. Read the comma-delimited files into SAS

3. Generate summary statistics by SAS:

Project 4 is due at 11:59pm on April 17. Each student should turn in his/her own answers. Each student's report should include a SAS program, a log file, an output file, and a separate Word file answering the questions. If you edit the output file with CLEAR answers to all questions, you can omit the Word file.


(5) Project 5: Data cleaning by SAS (II)

This project continues from Project 4. Please complete the following steps:

1. Reshape the "restaurant-rating" data. For the rating data, each observation is a restaurant. Because a restaurant may have multiple locations, we want to reshape the data so that each observation is a restaurant-location. To do this:

2. One newspaper article has challenged the authority of the Washingtonian restaurant ratings by arguing that the ratings disproportionately favor West-European cuisine styles (including Modern American) and restaurants located in very rich areas such as Bethesda.

Project 5 is due at 11:59pm on May 1. Each student's report should include a SAS program, a log file, an output file, and a separate Word file answering the questions.
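
The reshaping idea in step 1 (one row per restaurant becomes one row per restaurant-location) can be illustrated outside SAS with a short Python sketch. This is not the SAS solution expected for the project. The restaurant names, the column names location1 to location3, and the function name wide_to_long are all hypothetical, since the actual layout of the rating sheet is not shown here.

```python
import csv
import io

# Hypothetical extract: one row per restaurant, up to three locations.
WIDE_CSV = """restaurant,cuisine,location1,location2,location3
Alpha Grill,American,Bethesda,Arlington,
Beta Bistro,French,Washington,,
Gamma House,Thai,Rockville,Bethesda,Washington
"""

def wide_to_long(rows, location_fields):
    """Yield one record per restaurant-location, skipping blank locations."""
    for row in rows:
        for field in location_fields:
            location = (row.get(field) or "").strip()
            if location:
                # Copy every non-location column, then add a single location.
                record = {k: v for k, v in row.items()
                          if k not in location_fields}
                record["location"] = location
                yield record

reader = csv.DictReader(io.StringIO(WIDE_CSV))
long_rows = list(wide_to_long(reader, ["location1", "location2", "location3"]))
for r in long_rows:
    print(r["restaurant"], "-", r["location"])
```

After reshaping, each restaurant-location row can be matched to area-level information by location, which is what makes a comparison across areas possible.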


(15) Project 6: Create your own analysis
 

This project is a comprehensive review of everything you have learned in this class. You are given the freedom to create your own statistical study, and hopefully you can show it to your mother and future job interviewer! Start from an interesting idea. Get the data from your own hand collection (say the data you submitted in Project 1) or from a published resource (trade magazines, consumer guides, mail-order catalogs, internet shopping sites, and public-use data sets available in the library or on-line). Define two or more variables and explore their relationship by summary statistics, graphs, and regressions. Formulate and test appropriate hypotheses.

Feel free to use Excel, SAS or both to facilitate your analysis.

Project 6 is due at 11:59pm on May 20. The final report should include all the computer files you have used for data processing, and a separate Word file describing your research question, why the data are suitable for this question, how you answer the question, your conclusion and its limitations.


Course Outline

Jan. 29: Introduction

Introduce instructor
Discuss syllabus
Introduce textbooks
Login IDs and other computer clarification
Questionnaire
Manage a small data set collected from the questionnaire

Assign project 1

Readings: Getz e.stat Chapters 1, 2.

Jan. 30: Introduction to Excel
Follow the Excel tutorial from the OIT peer training program.

If you need extra help with Excel, you can sign up for the workshop(s) offered by the OIT peer training program, or check the other links for computer-related resources.

Feb. 5: Data collection and data description
Lecture on data collection.

Role playing

Divide students into six groups, each representing an institution involved in the subprime mortgage financial crisis. The names of the six institutions are:

Each group will discuss for 20 minutes and make a mini presentation on (1) one question about the subprime crisis you would like to answer, (2) the best data set available in your institution for the question, and (3) the methodology you would like to use to answer the question.

Before the class, we will allocate 6 seats to each group. You can choose the group that is most interesting to you, subject to seat availability. First come, first served!

Feb. 7:  Data Description
Follow the story in Getz e.stat Chapter 4
Mean (weighted and unweighted)
Median
Order statistics
Variance and standard deviation
Skewness
Use first-day questionnaire as an example

Readings: All sections in e.stat Chapters 3 and 4 except 4.14 and 4.15.

Feb. 12: Histogram
Histogram
Relative frequency polygon

Readings: e.stat 4.3, 4.4, 4.5.

Feb. 14: Probability
Distinguish population and sample

Flip coins and dice

Theory and practice do not necessarily match; law of large numbers
Marginal and conditional probability, statistical independence, expectation.
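
The gap between theory and practice in coin flipping, and the way it narrows as the number of flips grows, can be sketched with a small simulation. The seed and the flip counts below are arbitrary choices made for illustration.

```python
import random

rng = random.Random(2008)   # fixed seed so the run is repeatable

def heads_share(n_flips):
    """Share of heads in n_flips simulated flips of a fair coin."""
    return sum(rng.random() < 0.5 for _ in range(n_flips)) / n_flips

# The theoretical probability of heads is 0.5; the observed share need
# not equal it, but it tends to get closer as the number of flips grows.
shares = {n: heads_share(n) for n in (10, 100, 10_000, 100_000)}
for n, share in shares.items():
    print(f"{n:7d} flips: share of heads = {share:.4f}")
```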

Readings: Getz e.stat Chapter 5.1-5.6, 6.1-6.7.

Feb. 19: Distribution and simulation
Bernoulli Process (flip coins or roll dice)
Uniform PDF
Normal PDF
Data simulation and the law of large numbers

Readings: Getz e.stat Chapters 7,8. Emphasis: 7.1-7.5, 7.12, 7.14, 7.18-7.20, 8.1-8.3, 8.8-8.13.

Project 1 due. Assign Project 2

Feb. 21: Practice on simulation and the law of large numbers

Feb. 26: Mean estimation


Use Monte Carlo to generate estimates for population mean
t distribution
Confidence interval
Testing
Why sample size matters

Readings: Getz e.stat Chapters 11, 12. Emphasis: 11.1-11.4, 11.6, 11.7, 11.9, 12.1-12.9, 12.10-12.12.

Feb. 28: Hypothesis Testing
Null hypothesis versus alternatives
Type I and Type II errors
One tail test vs. two tail test

Readings: Getz e.stat Chapter 13.1-13.10, 13.13, 13.15-13.18.

March 4: Testing of two samples

Testing equal mean
Independent samples
Matched pairs

Readings: Getz e.stat Chapter 14.1-14.5, 14.8-14.11, 14.14-14.16.

Project 2 due. Assign Project 3

March 6: Regression

Scatter plot
Basic regression theory
R-squared, F test
Standard error of coefficients
Testing
Results Interpretation
Readings: Getz e.stat Chapters 19, 20, 21, 22. Emphasis: all sections in Chapter 19, 21.6-21.7, 21.10, 21.12-21.14, 22.1-22.5, 22.7
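
The quantities in this list (slope, intercept, R-squared, the standard error of the slope and its t statistic) can be computed by hand for a simple regression with one explanatory variable. The sketch below uses made-up x and y values and a hypothetical function name ols. It is meant only to show where the numbers in Excel's regression output come from.

```python
import math

def ols(x, y):
    """Simple OLS of y on x with an intercept.

    Returns (intercept, slope, r_squared, se_slope, t_slope).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residuals and sums of squares.
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in resid)             # unexplained variation
    sst = sum((yi - my) ** 2 for yi in y)        # total variation
    r_squared = 1 - sse / sst
    # Standard error of the slope uses n - 2 degrees of freedom.
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return intercept, slope, r_squared, se_slope, slope / se_slope

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope, r_squared, se_slope, t_slope = ols(x, y)
print(f"y = {intercept:.3f} + {slope:.3f} x")
print(f"R-squared = {r_squared:.4f}, se(slope) = {se_slope:.4f}, t = {t_slope:.2f}")
```

In a regression with a single explanatory variable, the F statistic equals the square of the slope's t statistic, so the two tests lead to the same conclusion.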

March 11: Practice of Regression

March 13-25: Review of data analysis in Excel

March 27: Midterm. Project 3 due on March 31.

April 1: Review of Midterm and Introduction to SAS

Shortcomings of Excel
Introduce SAS programming rationale
Read in data
Assign Project 4

Readings: The Little SAS Book Chapters 1 and 2.

April 3 - April 10: Data manipulation of a single data set

Read in complicated data
Data recoding
Converting dates to values
SAS functions
Use SAS to generate and present sample statistics

Readings: The Little SAS Book Chapters 2,3,4.

April 15 - April 22: Data manipulation of multiple data sets

Sort a data set
Generate a subset of data
Add in new observations to a data set
Merge and update data sets

Readings: Cody & Pass SAS Programming by Example Chapters 3, 4.

Project 4 due on April 17.
Assign Project 5 on April 15.

April 24 - April 29: Hypothesis testing in SAS
May 1 - May 8: Regression and Testing
Generate dummy variables
Imputing missing values
Regression command
Testing linear restrictions
Comprehensive examples

Project 5 due on May 1.
Assign project 6 on May 1.
May 13: Review of Excel and SAS materials, practice final

May 16: Session 2 Final 8am-10am Plant Sciences 1129

May 20: Session 1 Final 10:30am-12:30pm Plant Sciences 1129

Project 6 Due on May 20.

Important Dates
 

Feb. 19 -- Project 1 due
Mar. 4 -- Project 2 due
Mar. 27 -- Midterm
Mar. 31 -- Project 3 due
Apr. 17 -- Project 4 due
May 1 -- Project 5 due
May 16 -- Session 2 Final
May 20 -- Session 1 Final
May 20 -- Project 6 due


Example Files Used in Class

Excel examples

Notes for Excel.ppt

data-summary-formula.doc

data-summary-example.xls

coinflip-dierolling-example.xls

simulation-formula.xls

show-central-limit-theorem.xls

SAS program
Notes for SAS Programming.ppt

sas-class1.sas

sas-class2.sas

sas-example-merge-meancomp.sas

A comprehensive example: reg-cityreg.sas

Final review
final-review.ppt

practice-final.doc

data used in practice final




Links to computer related resources

On-campus computer labs

On-line tutoring for Intermediate Excel - made available by UMD peer training program

MS Excel help

Hands-on Tutor for Windows, MS Office and Internet, a CD-ROM published by the Corporation for Research and Educational Networking (CREN), is available at the UMD information technology library (computer and space science building #1400) for a free on-site review or an on-site purchase for $20.
Microsoft Frequently-asked-questions and highlights for Excel 2000
SAS on-line help
SAS Institute, Inc.
Samples & SAS Notes
UMD on campus SAS help

UCLA web resources for SAS

SAS Topics - Data Management (offered by UCLA)
SAS Topics - Regression (offered by UCLA)


Links to public data sets: on-campus, U.S. domestic and international

Data available on campus


U.S. Domestic Data sets

International Data sets