NAME: Modeling home prices using realtor data
TYPE: Random sample
SIZE: 76 observations, 19 variables

DESCRIPTIVE ABSTRACT:
The data file contains information on 76 single-family homes in Eugene, Oregon during 2005. This dataset is suitable for a complete multiple linear regression analysis of home price data that covers many of the usual regression topics, including interaction and predictor transformations. Whereas realtors use experience and local knowledge to subjectively value a house based on its characteristics (size, amenities, location, etc.) and the prices of similar houses nearby, regression analysis can provide an alternative that more objectively models local house prices using these same data.

SOURCES:
The data were provided by Victoria Whitman, a realtor in Eugene, in 2005. The data were used in a case study in Pardoe (2006).

VARIABLE DESCRIPTIONS:
id = ID number
Price = sale price (thousands of dollars)
Size = floor size (thousands of square feet)
Lot = lot size category (from 1 to 11)
Bath = number of bathrooms (with half-bathrooms counting as 0.1)
Bed = number of bedrooms (between 2 and 6)
BathBed = interaction of Bath times Bed
Year = year built
Age = age (standardized: (Year-1970)/10)
Agesq = Age squared
Garage = garage size (0, 1, 2, or 3 cars)
Status = act (active listing), pen (pending sale), or sld (sold)
Active = indicator for active listing (reference: pending or sold)
Elem = nearest elementary school (edgewood, edison, harris, adams, crest, or parker)
Edison = indicator for Edison Elementary (reference: Edgewood Elementary)
Harris = indicator for Harris Elementary (reference: Edgewood Elementary)
Adams = indicator for Adams Elementary (reference: Edgewood Elementary)
Crest = indicator for Crest Elementary (reference: Edgewood Elementary)
Parker = indicator for Parker Elementary (reference: Edgewood Elementary)

SPECIAL NOTES:
None.

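ILLUSTRATION (DERIVED VARIABLES):
Several of the variables listed above (BathBed, Age, Agesq, Active, and the five school indicators) are functions of the raw columns. The following is a minimal Python sketch of those definitions as stated in the variable descriptions. The function name derive_variables and the example home are hypothetical and are not taken from the data file.

```python
# Sketch: rebuild the derived columns from the raw ones, following the
# variable descriptions. Names and example values are illustrative only.

# Schools that get an indicator column; "edgewood" is the reference level
# and therefore has no indicator of its own.
SCHOOLS = ["edison", "harris", "adams", "crest", "parker"]


def derive_variables(bath, bed, year, status, elem):
    """Return the derived variables for one home as a dict."""
    age = (year - 1970) / 10  # Age is standardized: (Year - 1970) / 10
    derived = {
        "BathBed": bath * bed,                  # interaction of Bath times Bed
        "Age": age,
        "Agesq": age ** 2,                      # Age squared
        "Active": 1 if status == "act" else 0,  # reference: pending or sold
    }
    for school in SCHOOLS:
        # Indicator column names are capitalized, e.g. "Harris".
        derived[school.capitalize()] = 1 if elem == school else 0
    return derived


# Hypothetical home (values invented for illustration, not a data row):
# two full baths plus one half bath (coded 2.1), four bedrooms, built 1990,
# active listing, nearest school Harris.
example = derive_variables(bath=2.1, bed=4, year=1990, status="act", elem="harris")
```

A home near Edgewood Elementary would have all five school indicators equal to 0, which is what makes Edgewood the reference level in a regression that includes these indicators.
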
STORY BEHIND THE DATA:
The data file contains information on 76 single-family homes in Eugene, Oregon during 2005. At the time the data were collected, the data submitter was preparing to place his house on the market and it was important to come up with a reasonable asking price. Whereas realtors use experience and local knowledge to subjectively value a house based on its characteristics (size, amenities, location, etc.) and the prices of similar houses nearby, regression analysis provides an alternative that more objectively models local house prices using these same data. Better still, realtor experience can help guide the modeling process to fine-tune a final predictive model. For example, both realtor experience and regression modeling results suggest the need for a BathBed interaction term and an Age-squared transformation in the model.

PEDAGOGICAL NOTES:
It can be challenging when teaching regression concepts to find interesting real-life datasets that allow analyses that put all the concepts together in one large example. For example, concepts like interaction and predictor transformations are often illustrated through small-scale, unrealistic examples with just one or two predictor variables that make it difficult for students to appreciate how these concepts might be applied in more realistic multi-variable problems. This dataset addresses this challenge by allowing for a complete multiple linear regression analysis of home price data that covers many of the usual regression topics, including interaction and predictor transformations. The statistical ideas discussed range from those suitable for a second college statistics course to those typically found in more advanced linear regression courses.

REFERENCES:
Pardoe, I. (2006). Applied Regression Modeling: A Business Approach. Hoboken, NJ: Wiley.

SUBMITTED BY:
Iain Pardoe
University of Oregon
Lundquist College of Business, 1208 University of Oregon, Eugene, OR 97403, USA.
ipardoe@lcbmail.uoregon.edu