The 9-week course was divided into two sections. First, we went over the fundamentals of performing a comparative market analysis (CMA) using foreclosure listings and multiple listing service (MLS) listings in Tarrant County, TX. Second, we learned how to automate the process using ArcMap's ModelBuilder.
We used exclusively the HUD foreclosure listings in Texas, available for free from the Southwest Alliance of Asset Managers. This is a fantastic resource, as the listings can be batch downloaded in Excel format. Each week, we were able to download a fresh batch of new foreclosure listings for Tarrant County. As an aside, HUD foreclosure listings are freely available for most states. To locate the agency that manages these listings for any state, see the HUD Homes website.
For MLS, we used data from the North Texas Real Estate Information Systems, Inc. (NTREIS). Unfortunately, acquiring batch MLS data is expensive and is available only to realtors. In May I downloaded approximately 17,000 MLS listings of all statuses, including both sold and active properties. We used this data for the entire course.
In its most straightforward sense, a CMA compares the selling prices of recently sold properties (from the MLS) in the same neighborhood as an active foreclosure listing to the foreclosure's list price. The lower the price of the foreclosure compared to the average selling price, the better the deal...on the surface, anyway. There are so many factors to take into consideration, but first let's consider this straightforward CMA.
To accomplish this bare-bones CMA, you first need to geocode both the foreclosure and the MLS listings. You then generate a buffer that designates the neighborhood surrounding the foreclosure property. Next, spatially join the sold MLS points (containing the selling price) to the buffer, making sure to average the selling price field. Finally, subtract the foreclosure list price from the average selling price, and you are all set.
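The arithmetic behind this bare-bones CMA can be sketched in a few lines of plain Python. This is only an illustration of the logic, not the ArcGIS workflow itself; the coordinates, prices, and field names below are made up, and a simple planar distance stands in for the buffer/spatial-join step.

```python
import math

# Hypothetical sample data: coordinates in a projected system (feet), prices in dollars.
foreclosure = {"x": 0.0, "y": 0.0, "list_price": 95000}
mls_sold = [
    {"x": 500.0, "y": 200.0, "sold_price": 120000},
    {"x": -800.0, "y": 300.0, "sold_price": 115000},
    {"x": 4000.0, "y": 4000.0, "sold_price": 250000},  # falls outside the buffer
]

def bare_bones_cma(subject, comps, buffer_radius):
    """Average the sold prices inside the buffer and compare to the list price."""
    inside = [
        c["sold_price"]
        for c in comps
        if math.hypot(c["x"] - subject["x"], c["y"] - subject["y"]) <= buffer_radius
    ]
    if not inside:
        return None  # no comparable sales in the neighborhood
    avg = sum(inside) / len(inside)
    # Positive result means the foreclosure is listed below the neighborhood average.
    return avg - subject["list_price"]

print(bare_bones_cma(foreclosure, mls_sold, 2640))  # half-mile buffer, in feet
```

In ArcGIS terms, the list comprehension is the buffer plus spatial join (with the average merge rule on the selling price field), and the final subtraction is a field calculation on the joined layer.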
This is similar to the CMA reports I have seen from many commercial vendors.
Yes, you are correct. Now, let's take a look at the many, many wrinkles that make this analysis so much fun.
- In our CMA, we want to compare the price per square foot, since the price per square foot often decreases as the size of the property increases.
- This causes a large problem, as many foreclosure listings, including the HUD data we used, do not report the square footage of the property. Yes, this data is available in the MLS, but the MLS only includes properties that have been recently active. The answer is to acquire this data from the local appraisal district, in our case the Tarrant Appraisal District (TAD). We acquired the complete primary real estate account data from TAD, which includes the square footage (living area), but unfortunately there is no reliable field in common with the foreclosure data, so we could not make a tabular relationship. So, we acquired the complete parcel shapefile, tabular joined it to the primary real estate account data, and then geocoded the foreclosure listings directly to the parcel boundaries, with match rates in the upper 90-percent range. Then we spatially joined the parcel shapefile to the geocoded foreclosure shapefile. Whew...we finally obtained the square footage of each foreclosure and calculated the price per square foot.
- The MLS data already included the price per square foot.
- A standard spatial join between the foreclosure buffer and the MLS sales is not appropriate, as not all properties are comparable. Differences in year built, number of bathrooms, etc. can decrease the accuracy of a CMA. For example, consider a foreclosure buffer containing 6 recently sold properties, where 2 were built in 2004 and the other 4 in the 1950s. If the foreclosure property was built in the 1950s, it might not be wise to include the two newer properties in the CMA, as their prices might be substantially higher than the other four.
- This caused a major wrinkle, as I know of no way to exclude such properties from a spatial join using ArcGIS built-in tools. In other words, I know of no way to perform a spatial join filtered by a query based on the values of each feature of the target layer. When planning this class I knew this would be the major stumbling point, so I created a Python script that did just this. After I clean it up a bit this Fall semester, I will post it here and to ArcScripts.
- Demographics, especially crime rates and potentially employment outlook, can play a major role in an investor's decision to invest in a property. After the CMA, it is then necessary to filter, rank, or weight the results by these demographic attributes.
- We used block group data provided by Applied Geographic Solutions (AGS), which the library has been purchasing the last few years. We are shifting to accessing the data via SimplyMap, but this will be a whole other post.
- Defining a neighborhood by a circular buffer is not the most reliable way to define a neighborhood. A better method is to include only those properties in the same subdivision.
- While spotty subdivision data is included in MLS, it is not included at all in most foreclosure listings. The solution again was to turn to TAD. If you geocode both the foreclosure listings and MLS data using a parcel shapefile as the reference, you can spatially join the parcel shapefile back to the two geocoded point layers to obtain the subdivision. Then, you need to perform another query-based spatial join (see above), which to the best of my knowledge is not included within the standard ArcGIS tools. The script that I wrote allows for this as well.
- Yeah, there are other issues, but this is enough for now.
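The query-based spatial join described above can be sketched in plain Python. To be clear, this is not the script I wrote for the class (which operates on ArcGIS layers); it is a minimal illustration of the idea, and the field names, year-built tolerance, and sample values are all hypothetical.

```python
import math

def filtered_spatial_join(targets, join_points, radius, predicate):
    """For each target feature, average price per square foot over join points
    that are both within the radius AND pass an attribute predicate evaluated
    against that target — the per-feature query a stock spatial join lacks."""
    results = []
    for t in targets:
        comps = [
            j for j in join_points
            if math.hypot(j["x"] - t["x"], j["y"] - t["y"]) <= radius
            and predicate(t, j)
        ]
        avg = sum(c["ppsf"] for c in comps) / len(comps) if comps else None
        results.append({**t, "avg_comp_ppsf": avg, "comp_count": len(comps)})
    return results

# Hypothetical data: one 1950s foreclosure with mixed-vintage sales nearby.
foreclosures = [{"x": 0, "y": 0, "year_built": 1954, "ppsf": 55.0}]
mls = [
    {"x": 300, "y": 0, "year_built": 1951, "ppsf": 70.0},
    {"x": 400, "y": 100, "year_built": 1958, "ppsf": 74.0},
    {"x": 200, "y": 300, "year_built": 2004, "ppsf": 110.0},  # excluded by the filter
]

# Only include comps built within 15 years of the subject property.
similar_age = lambda t, j: abs(t["year_built"] - j["year_built"]) <= 15
joined = filtered_spatial_join(foreclosures, mls, 1000, similar_age)
print(joined[0]["avg_comp_ppsf"], joined[0]["comp_count"])
```

The same predicate mechanism handles the subdivision wrinkle: swap in a predicate comparing the subdivision field of the two features instead of year built.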
Using ArcGIS ModelBuilder, this can be automated up to a point, but not as smoothly as I had hoped. The Python script I wrote can be incorporated as a geoprocessing object into a model, so that went fairly smoothly. Two major difficulties arose. First, there are limitations that make it difficult to set parameters for temporary layers that are contingent on other parameters. Second, I just find ModelBuilder clunky and inconsistent in general. While everyone did a fantastic job on the first section, there were varying degrees of success automating the process. I recommended to everyone interested to take the Python Scripting for ArcGIS course I am teaching in the Spring 2008 semester.
Yes, initial development on these special courses can be quite exhausting. I expect to be teaching this course once per year for the foreseeable future, and I plan to hold a library workshop this Fall semester showing everyone how GIS can help investors flip houses. If this topic cannot get folks into the library for a workshop, I do not know what will. (Actually, the four workshops held in the Spring 2007 semester averaged over 20 attendees each.) This workshop will be part of our GIS Day activities, but this deserves its own post as well.