Thursday, October 25, 2007

Stuck in Traffic: Find the Road Segments With the Highest Pollution Levels

Workshop This Afternoon!!

Download Full Workshop Materials
View Zohoshow Presentation
View Handout
View Flyer
View Google Maps Demo

This has been a whirlwind, frantic rush to pull together, but just about everything is ready to go for this afternoon's GIS workshop. Should be a great one...

Dr. Melanie Sattler, Civil & Environmental Engineering professor, will open the workshop with a 30-minute discussion of why air pollution is an important issue in the D/FW Metroplex and of the various campus research efforts to study and model air quality.

I will then lead everyone through a hands-on GIS exercise that will include the following steps:
  • Calculate total pounds of air emissions within 5 miles of everyone's home address (using spatial join)
  • View distribution of emissions across region (using fishnet polygon)
  • Join various measurement data to air monitoring stations (such as ozone and wind direction)
  • Estimate ozone concentrations for the entire region (using IDW)
  • Calculate average ozone concentrations for highway road segments (using spatial join)
How can I expect to guide novice users through such procedures in a reasonable amount of time? Every procedure is automated using ModelBuilder and VBA and is accessible via a custom toolbar.
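As an illustration of the IDW step and the road-segment averaging that follows it, here is a minimal pure-Python sketch. It is not the workshop's ModelBuilder/VBA toolbar. It assumes planar x/y coordinates and a power of 2 (a common IDW default), and it uses every station rather than a search radius or a fixed number of neighbors. For the road segments it samples evenly spaced points along each segment, a stand-in for the spatial join against the interpolated raster. The function names are hypothetical.

```python
import math
from typing import List, Tuple

# (x, y, measured ozone) for each monitoring station, in planar map units.
Station = Tuple[float, float, float]


def idw(x: float, y: float, stations: List[Station], power: float = 2.0) -> float:
    """Inverse distance weighted estimate at (x, y) from all stations."""
    num = 0.0
    den = 0.0
    for sx, sy, value in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # the location is a monitor: use its reading directly
        w = 1.0 / d ** power  # nearer stations get larger weights
        num += w * value
        den += w
    return num / den


def segment_average(x0: float, y0: float, x1: float, y1: float,
                    stations: List[Station], samples: int = 20,
                    power: float = 2.0) -> float:
    """Average IDW estimate at evenly spaced points along a road segment."""
    total = 0.0
    for i in range(samples):
        t = (i + 0.5) / samples  # midpoint of the i-th equal sub-segment
        total += idw(x0 + t * (x1 - x0), y0 + t * (y1 - y0), stations, power)
    return total / samples
```

For example, with monitors reading 40 and 80 at (0, 0) and (10, 0), `idw(5.0, 0.0, stations)` is 60 (up to floating-point rounding) because both weights are equal, while a point closer to the first monitor leans toward 40.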


Data Sources Include:
If you're in the area, drop on by...

Friday, October 05, 2007

Large Table Manipulation: ArcMap vs. Microsoft Access

Yesterday I helped an engineering student display a table with over 500,000 XY coordinates and convert it to a shapefile. The table the student brought was a 2.5 GB CSV file, and it gave me some unexpected trouble.

The conclusion is that the manipulation of large tables is often best done in Microsoft Access. Specifically, converting field types in tables with hundreds of thousands of records is best done in Microsoft Access, not in ArcMap.

If you are interested in the particulars, read on below.

Brought the table into ArcMap, but ArcMap read every field in the table as a text field. Opened the table in ArcMap, and all the values were numerical coordinates. Opened the table in Excel, and of course the table only partially opened, as there were too many records. The portion of the table that did open contained only numerical coordinates, but Excel was also reading the fields as text. No problem. Switched back over to ArcMap and attempted to add two new double fields and use the Field Calculator to copy the text fields into them. ArcMap is unable to add fields to text files, so we had to export the table as a DBF first. This took over 5 minutes, as there was so much data. Every time we tried to add a new double field to the large DBF file, the screen would turn white and hang there indefinitely. We tried this twice, and gave up each time after 5 minutes with no response.

Then I had the idea to import the table into a blank Microsoft Access database, where I could directly change the field types from text to double. Before changing the field types, I opened the data table, and sure enough, every field was surrounded by quotation marks {"} as a text qualifier. THAT was the problem! Switched over to Design View and changed the field types, and that did the trick. ArcMap honors Access field specifications, and we were able to view the event class in no time. Took a bit more patience waiting while we exported the shapefile, but it was nice to see the completion of what should have been a routine process.

Why were there quotation text qualifiers around each field? Well, the student told me the file was created using Excel 2007, which I have not yet used. Perhaps this is standard procedure for Excel 2007? If not, perhaps there is some setting in Excel that will inadvertently place text qualifiers around fields?
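For a file too large for Excel, the same fix can also be scripted. This is an alternative to the Access route described above, not what we did that day. It is a minimal Python sketch that streams the CSV row by row, lets the standard `csv` module strip the quotation-mark text qualifiers, and converts the coordinates to floating-point numbers. The field names `X` and `Y` and the function names are assumptions, so substitute the real column names.

```python
import csv
from typing import Iterable, Iterator, Tuple


def read_xy(lines: Iterable[str], x_field: str = "X",
            y_field: str = "Y") -> Iterator[Tuple[float, float]]:
    """Yield numeric (x, y) pairs from CSV lines whose fields may be quoted.

    csv.DictReader removes the {"} text qualifier (its default quotechar),
    and rows are produced one at a time, so the file is never held in memory.
    """
    for row in csv.DictReader(lines):
        yield float(row[x_field]), float(row[y_field])


def convert(src_path: str, dst_path: str, x_field: str = "X", y_field: str = "Y") -> int:
    """Write an unquoted copy of the X/Y columns as numbers; return the row count."""
    count = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)  # default QUOTE_MINIMAL: plain numbers get no quotes
        writer.writerow([x_field, y_field])
        for x, y in read_xy(src, x_field, y_field):
            writer.writerow([x, y])
            count += 1
    return count
```

Because `csv.writer` only quotes fields that contain a delimiter, quote character, or line break, the output file has no text qualifiers around the coordinates.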

Repeating from above: the conclusion is that the manipulation of large tables is often best done using Microsoft Access.

Wednesday, October 03, 2007

Interesting GIS/Geography Related Dissertations

[updated 10/05 to include links to free dissertation abstracts and 24-page previews, compliments of UMI Dissertation Publishing at ProQuest. See comments for more info. Thanks Mike!]

Here's the latest installment of GIS-related dissertations that have caught my eye. Previous lists include 11/05, 01/06, 01/06, 04/06, 07/06, 10/06, 04/07. (Of course, this is not a comprehensive list...only those I find interesting.)

Sunday, September 30, 2007

Summer 2008: GIS Librarianship Course

The confirmation is only preliminary at this point, but I am preparing to teach a GIS Librarianship course next summer (2008) for the University of Arizona's School of Information Resources and Library Science. This course will be 100% distance education, so I encourage any library student or librarian out there with an interest in GIS to consider enrolling. I will post more details here as the confirmation becomes more official and as the date gets closer.

This is so, so great. For the past few years, I have been teaching GIS and spatial analysis courses for Earth & Enviro Sciences, Marketing, and Real Estate, but it has always been a dream of mine to work with library science students and pass on my excitement and enthusiasm for GIS librarianship. As part of this dream, I have posted practicum opportunities at the UNT School of Library and Information Sciences for the last 4 years. My only bite came over the summer, but the night before the practicum was to start, he/she canceled due to an emergency. Ah well...

Anyway, as I type this I am putting down my first concrete thoughts on how to structure such a course. Here is my first-draft outline of the course:
  • What is GIS and what contributions can a library provide to this subject area?
    • This will be a nice, but brief overview of GIS and GIS librarianship. The final section (below) will consider the role of GIS librarianship in greater detail, but this will serve as a nice introduction.
  • Basics of GIS
    • My thought is that this will take up the first half of the course and will be a condensed version of an Intro to GIS course, with a focus on data types, data acquisition, and data preparation. In essence, the pre-analysis concerns of GIS.
  • Marketing GIS services
  • Managing GIS data
    • This will focus on metadata and the use of various types of metadata catalogs.
  • Data Acquisition
    • This will focus on selecting and processing datasets from data vendors, public data archives, and government organizations.
  • Types of GIS Services Offered by Libraries
    • I think it best to end the course with this topic. At this point everyone will have a deeper understanding of the potential and can actually contribute to the conversation. This might even make a great final project.
I am sure I am missing something(s) here, but I feel pretty good about getting at least this much down. Of course, if anyone has any comments, suggestions, or criticisms (keep those clean, yeah?) feel free to leave them as comments here or email me at: been [[-at-]] uta [[-dot-]] edu.

Academic Cycle: Projection Time!

GIS projects must be due soon in many of our classes, as folks from all over campus are coming into the GIS Lab for assistance with projecting their data. Funny how everyone from different departments and different classes is focused on the same thing. Wonder if this has any relationship to how women's menstrual cycles sync after living close together? Does this mean the various departments are psychically, biologically, or spiritually moving in unison? Ha!

Anyway, by far the most popular question in the lab recently has been confusion about defining and projecting data. Map of Texas showing up in the Pacific? Saw this a few times in the lab yesterday. Always the same cause: an undefined shapefile, most of the time geographically referenced, was projected to some form of Lambert Conformal Conic without its projection being defined first!! ArcMap cannot transform data to a new projection without knowing the data's initial projection. Students from many disciplines are having a tough time understanding when and why it is necessary to define their projection before re-projecting their data.
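To see why the input coordinate system matters, here is a minimal pure-Python sketch of a forward Lambert Conformal Conic projection using Snyder's spherical formulas. The formulas only make sense if the inputs are known to be longitude and latitude in degrees. Assumptions: a sphere of radius 6,371 km instead of an ellipsoid and datum, so results will not match ArcMap exactly, and default parameters assumed to match the Texas Statewide Mapping System (EPSG:3081): standard parallels 27.5°N and 35°N, origin 31°10'N 100°W, and false easting and northing of 1,000,000 m. The `project` wrapper and its `source_crs` argument are hypothetical. They mirror the rule above by refusing to project data whose coordinate system is undefined.

```python
import math
from typing import Optional, Tuple


def lcc_forward(lon: float, lat: float,
                lat1: float = 27.5, lat2: float = 35.0,
                lat0: float = 31.0 + 10.0 / 60.0, lon0: float = -100.0,
                false_e: float = 1_000_000.0, false_n: float = 1_000_000.0,
                radius: float = 6_371_000.0) -> Tuple[float, float]:
    """Spherical Lambert Conformal Conic: degrees in, meters out (Snyder's formulas)."""
    p1, p2, p0, p = (math.radians(v) for v in (lat1, lat2, lat0, lat))

    def t(phi: float) -> float:
        return math.tan(math.pi / 4 + phi / 2)

    if math.isclose(lat1, lat2):
        n = math.sin(p1)  # single standard parallel (tangent cone)
    else:
        n = math.log(math.cos(p1) / math.cos(p2)) / math.log(t(p2) / t(p1))
    f = math.cos(p1) * t(p1) ** n / n
    rho = radius * f / t(p) ** n     # distance from the cone's apex
    rho0 = radius * f / t(p0) ** n   # same, at the latitude of origin
    theta = n * math.radians(lon - lon0)
    return false_e + rho * math.sin(theta), false_n + rho0 - rho * math.cos(theta)


def project(lon: float, lat: float, source_crs: Optional[str]) -> Tuple[float, float]:
    """Refuse to project unless the input coordinate system has been defined."""
    if source_crs is None:
        raise ValueError("Input coordinate system is undefined: define it before projecting.")
    if source_crs != "geographic":
        raise ValueError(f"This sketch only projects from geographic coordinates, not {source_crs!r}.")
    return lcc_forward(lon, lat)
```

The projection's origin (100°W, 31°10'N) lands exactly on the false easting and northing, and a point such as Arlington (about 97.1°W, 32.7°N) lands east and north of it. Feed the same function numbers that are already in feet or meters, and the result is nonsense.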

As stated in the ArcGIS 9.2 Help, the projection definition "records the coordinate system information for the specified input dataset or feature class including any associated projection parameters, datum and spheroid. It creates or modifies the feature class's projection parameters."

So, how can you know when your data's projection needs to be defined? Well, the first clue is that ArcMap will give you a popup letting you know. These are not ads. Read them. The second clue is that your data will not be in the correct position, such as a map of Texas in the middle of the Pacific. However, if all of your data is consistently undefined, then you will only notice this if you add another layer that is properly defined. The tough part is that if a student adds 10 layers whose projections are undefined, and then adds another layer that is properly defined, it might seem as if the defined layer is the one in the incorrect position. Like a democratic vote, undefined wins 10 to 1.

Why is this confusion happening more than it used to? In my opinion, it is the loss of ArcMap's assumed geographic projection definition. ArcMap 9.1 (and previous versions) would detect undefined shapefiles and assume they were geographically referenced (decimal degrees). If the data is indeed geographically referenced, then ArcMap's assumption removes the need to define the file before projecting it. However, 9.2 makes no such assumptions. Undefined data must be defined, whether it is geographically referenced or not. Which is better? I sure do not know, but I do know that this is forcing our students (and faculty) to give a lot more thought to projections and datums than they needed to in the past.
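How exactly 9.1 made that assumption isn't documented here. One plausible version is a simple extent test: if every coordinate falls within longitude/latitude bounds, treat the layer as decimal degrees. The sketch below shows that assumed heuristic, not ArcMap's actual code, and the function name is hypothetical. It also shows why guessing is risky: projected data that happens to have small coordinate values, for example a local grid with a false origin near 0, would pass the same test.

```python
from typing import Tuple

# (xmin, ymin, xmax, ymax) of a layer's coordinates
Extent = Tuple[float, float, float, float]


def looks_geographic(extent: Extent) -> bool:
    """Guess whether an undefined layer is in decimal degrees from its extent alone."""
    xmin, ymin, xmax, ymax = extent
    # Longitudes must fit in [-180, 180] and latitudes in [-90, 90].
    return -180.0 <= xmin <= xmax <= 180.0 and -90.0 <= ymin <= ymax <= 90.0
```

A statewide Texas layer in decimal degrees passes, while the same layer in a State Plane projection (coordinates in the millions of feet) does not.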

Friday, September 28, 2007

First GIS Workshop - Standing Room Only!

Yesterday's workshop was a success on every conceivable level.

We had 30 participants, which, in a 27-PC lab, is about the best we could hope for. Breakdown was 4 faculty, 12 grads, 8 undergrads, 2 staff, and 4 non-affiliates (a City of Arlington intern, a couple of investors, and an alumnus). By the 5-minute mark every seat was taken, and a noticeable number of people dropped by and left when they saw how crowded it was.

Everyone seemed very engaged. Lots of discussions amongst everyone, and the topics invariably led to how the techniques and/or data sources we used could be incorporated into their own research. Almost everyone was able to complete the entire workshop, which culminated in the creation of a Google Map highlighting their results. These Google Maps were generated using our ArcMap2GMap script.

The university Public Affairs Office sent out a nice press release, which was mentioned in the Ft. Worth Star-Telegram. The Shorthorn (university newspaper) printed a fantastic story this morning that featured a number of nice interviews with students.

Let's hope this excitement bleeds over into our next workshop on air pollution in the metroplex, scheduled for October 25. Details here.

Wednesday, September 26, 2007

Sleepin' at the Bowl: Best Locations for New Hotels in Arlington, TX

Workshop Tomorrow!!

Download Full Workshop Materials (coming soon)
View Zohoshow Presentation
View Handout (1st draft)
View Flyer

Last week, I posted the basic details about this semester's 3 workshops. Well, after a round of late night shifts, the new hotel workshop is just about ready to go.

No doubt about it, there is so much data included in this workshop. Everyone will see that I got carried away... Here are highlights of the data sources used; complete details are in the handout.
Here is an overview of how the workshop will go:
  1. Explore the data (attributes and symbology)
    1. All library GIS workshops are designed to educate, intrigue, and entertain both GIS users and those merely interested. This portion, while slightly slow for the GIS users, makes those new to GIS comfortable with the software.
  2. Join hotel data to Tarrant Appraisal District parcels
    1. This will allow us to calculate ratios such as receipts per sq ft, and to calculate the total sq ft of hotel space.
  3. Calculate densities
    1. We will calculate point densities of:
      1. number of units available
      2. number of hotels
      3. receipts per unit
      4. receipts per sq ft
  4. Suitability site selection
    1. reclassify and combine
  5. Narrow suitability raster to identify most suitable locations
  6. Generate Google Map displaying results
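Steps 3 through 5 follow a generic suitability pattern. The sketch below is a hypothetical illustration of it, with the grid stored as a dictionary of cells, not the workshop's ArcMap model. It has four parts: a simple point density (the weighted count of points within a circular radius of each cell center, divided by the circle's area, similar in spirit to Spatial Analyst's Point Density tool), a reclassification into ordinal scores using assumed break values, an additive combination of the scored layers, and a final pick of the highest-scoring cells. Whether a high density should score well or poorly is a modeling choice. Many existing hotels might signal a proven market or stiff competition, so `higher_is_better` is left as a parameter.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple

Point = Tuple[float, float]


def point_density(points: Sequence[Point], weights: Sequence[float],
                  cell_centres: Dict[Hashable, Point], radius: float) -> Dict[Hashable, float]:
    """Weighted count of points within `radius` of each cell centre, per unit area."""
    circle = math.pi * radius ** 2
    density = {}
    for cell, (cx, cy) in cell_centres.items():
        total = sum(w for (px, py), w in zip(points, weights)
                    if math.hypot(px - cx, py - cy) <= radius)
        density[cell] = total / circle
    return density


def reclassify(values: Dict[Hashable, float], breaks: Sequence[float],
               higher_is_better: bool = True) -> Dict[Hashable, int]:
    """Map values to scores 1..len(breaks)+1 using ascending break values."""
    top = len(breaks) + 1
    scores = {}
    for cell, v in values.items():
        score = 1 + sum(v > b for b in breaks)  # how many breaks the value exceeds
        scores[cell] = score if higher_is_better else top + 1 - score
    return scores


def combine(*layers: Dict[Hashable, int]) -> Dict[Hashable, int]:
    """Add the scores of several reclassified layers cell by cell."""
    return {cell: sum(layer[cell] for layer in layers) for cell in layers[0]}


def most_suitable(combined: Dict[Hashable, int]) -> List[Hashable]:
    """Narrow the combined surface to the cells with the highest total score."""
    best = max(combined.values())
    return sorted(cell for cell, s in combined.items() if s == best)
```

In the workshop's terms, each density layer (units available, number of hotels, receipts per unit, receipts per sq ft) would be reclassified, the scores combined, and `most_suitable` would play the role of narrowing the suitability raster.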
It should be a lot of fun. If you are in the area, you should definitely stop by... ;)

What did I not have enough time to incorporate? While we do have the zoning and landuse data in the database, it is not incorporated in the site selection process...mainly due to lack of time. It can, of course, be referenced at any time by adding it as a layer.

Sunday, September 23, 2007

Texas River Information Management System (TRIMS)

Attended a demonstration of the new Texas River Information Management System (TRIMS) in early September at the Ellison Miles Geotechnology Institute (located in Farmers Branch, TX on the Brookhaven College campus). TRIMS went live in late August.

This ArcGIS Server driven web mapping application is a fantastic resource for us here in the Trinity River Basin (North-Central, Central, and into Eastern Texas).

Features from the demonstration invite:
  1. Access to over 30 spatial layers including aerial photography, streams and rivers, reservoirs, 2000 census data, elevation, political districts, USGS topographic maps, groundwater, boundaries, dischargers, water quality sampling sites, and roads
  2. Metadata for each layer
  3. Ability to measure length and area and create graphics
Here is a list of the layers included.

We discussed two primary uses for the interactive website. First, as a tool for landowners to manage natural resources. Second, as an educational tool to be used in the K-12 and college classroom. There is no doubt that the ease of use and thorough content in TRIMS will serve both of these purposes. I shared this resource with our science & engineering librarians, and we have been pushing this resource to our non-GIS savvy students. As all of the data is derived from public resources, we have access to all of the information available through TRIMS in GIS format, or in formats that can be incorporated into a GIS. We are planning a workshop next Spring semester targeting our education faculty and students, and this is one of the key resources I plan on showcasing to them.

One fantastic feature I want to point out is the live stream gauge data. Using the site ID number, clicking on any gauge will create a link directly to the USGS real-time water data. For example, in the map on the left, the gauge at the Clear Fork of the Trinity River in Fort Worth links out to the following real-time water data. This is a great example of what is best about this site: the way it brings together numerous public data sources into one user-friendly interface. Yes, as the Map Explorer points out, some folks will complain that it is not as easy to use as Google mapping products, but this is because of the 30 spatial layers and the great markup ability included in ArcGIS Server.

This is a great resource that I look forward to promoting as often as I can.

Some background info about TRIMS:
As part of Governor Perry’s Trinity River Basin Environmental Restoration Initiative, the Trinity River Authority’s Clean River Program in cooperation with TCEQ has funded the Trinity River Information Management System (TRIMS) through Texas A&M University’s Institute of Renewable Natural Resources.

Coast Guard Site Selection & ModelBuilder

I am teaching the spatial analysis course this semester, and the first two projects everyone is working on are based on a suitability site selection exercise. First, they need to identify the best locations for the development of a new Coast Guard facility on the Gulf Coast. Second, they need to develop an ArcMap Model to automate this suitability process.

Here is their exercise scenario (fictitious, of course):
The U.S. Coast Guard is concerned about the possible damage to the environment and natural habitat along the Texas Gulf Coast if an offshore oil spill were to occur. The Coast Guard has obtained funds to build a new facility whose primary purpose is to protect the environment.
They are required to use at least one Census 2000 attribute by tract, and whichever data sources they deem necessary from the Texas General Land Office. There are some great Gulf Coast environmental and habitat resources at the GLO, including the Environmental Sensitivity Index Shoreline, In-Situ Burn Exclusion Areas, Offshore Oil/Gas Platforms, Priority Protected Habitat Areas, Wildlife Refuges, and much more. For our educational purposes, it is nice that some of the GLO's files, such as the TXDOT Roads/Highways, are undefined shapefiles, which gives everyone the challenge of defining them.

After the initial suitability site selection project, everyone will create an automated Model using ArcMap's ModelBuilder. Standard suitability analysis such as this is so straightforward that it works well in Models. However, I found a bug in the latest ArcGIS 9.2 service pack (SP3). It seems the Classify button is grayed out when the Reclassify tool is incorporated into a Model. Of course, this only occurs when the reclassification scheme is a parameter. I never saw this before and can only assume it is the result of this latest service pack, which was released in August. Searched the user forums, but surprisingly could not find anyone else who had come across this problem. I found a work-around, however, which has thus far worked well enough that I can't really complain. If you also add the reclassification field as a parameter, the Classify button becomes active. It seems the Model is not accepting the reclassification field as specified by the developer and is requiring the user to specify the reclass field. Good enough. Perhaps this is a new safeguard feature and not a bug.

With that said, their presentations for their initial possible site selections are next week. We've got some great students this semester and I am really looking forward to seeing what they came up with.

Friday, September 21, 2007

Web Data Subscriptions: Simply Map & Geolytics On-Line

Our data users are so loving these new online data subscription options available through products such as the Geolytics On-Line Data & Maps and Simply Map. They can hop online, on or off campus, and access the data they need. These products make demographic and business data as easy to access as journal articles.

I had two faculty this past week alone make an effort to let me know how pleased they are that they and their students can access such data so easily online. Previously, we purchased such datasets via CD or downloadable files. Geolytics products were traditionally distributed as CDs to install on stand-alone PCs. Acquisition departments do not enjoy purchasing electronic monographs such as these, and then users needed to come into the library to access this data when the majority of our electronic holdings are accessible from off-campus. We have also been purchasing Applied Geographic Solutions (AGS) marketing data on the block group level from Spatial Insights, but this data was also distributed on CD for a single user. We worked out a solution where we received permission from Geolytics and AGS to serve the CD data using Citrix (and we continue to do so), but this requires the client to install an application. While the installation is small and free, it imposes numerous barriers that are not present for the majority of the library's other databases. Also, students do not normally have permissions to install software in campus labs.

Now, not only can users access their data so easily via the web, but they have choices to generate and download customized maps, delimited files, and even shapefiles containing their selected attributes. Geolytics CDs have always had this ability, but as of this summer, Simply Map now also provides the ability to generate and download shapefiles.

Seems like every month or so, Geolytics sends an email about another title they now provide via the web, and over the summer I talked with the folks at Geographic Research, Inc. about their plans to build add-ons to their base Simply Map package. I hope they compete viciously with each other. That's the best!

Bottom line is the users are very happy, and when they actually take the time to thank me then that makes me happy as well...

By the way, I am sitting all alone here by myself during my first couple hours providing GIS Research Assistance in SL. Sure, it gives me a chance to write up this post, but someone come over and chat a bit, yeah?

GIS Research Assistance in Second Life


Well folks, our library is making the move and has decided to test the possibility of providing research assistance in Second Life (SL). The pilot project will focus exclusively on GIS research assistance, after which we will examine its outcomes and decide how to proceed.

So, starting with the public opening of the new Info International Island tomorrow (Friday, 09/21), anyone needing GIS research assistance can receive live help from Mapz Oh during the times listed below. The hours listed here are in SLT (Second Life Time), which follows U.S. Pacific Time.
  • Sundays 5-8 p.m.
  • Wednesdays 5-8 p.m.
  • Fridays 2-4 p.m.
If you drop by, be sure to fill out our quick and convenient form to let us know how we are doing. Be sure also to pick up your GIS t-shirt and steaming tea mug...

We also created a new SL group, entitled GIS Researchers. The purpose of this group is to (1) provide an in-world network of enthusiasts and experts who can provide research assistance, and (2) to announce any GIS research activities in SL, such as presentations, showcases, discussions, etc.

Bear in mind, however, that things move and evolve so fast in SL that the purpose, direction, and goals we initially have for this new project may change as we see what needs (if any) arise.

Here is a sample from the GIS Research Assistance notes available in-world:

GIS Research - Types of Assistance Provided Here
For this SL venture of GIS Research Assistance, we envision providing assistance with RL geographic information systems research.

Here is an overview of the services we envision providing:

1. Locating geospatial data sources
2. Integrating these sources into your GIS
3. Performing GIS analysis (Beware, however, that we are most adept at using ESRI ArcGIS software)
4. Assistance automating ArcGIS via Models, Python, and VBA
5. Pleasant conversation about all things GIS

GIS Research - Why Offer GIS Assistance via SL?
Three reasons. First, it is my belief that virtual reality will play a major, possibly a dominant, role in the future. As information professionals (librarians), it is our responsibility to learn to interact with and to master these tools. Second, at this point we do not know how many of our students use Second Life. If any GIS users do use Second Life, they will surely use this service and then we can actually know and count the number of users. Third, why not? It is so easy to provide this additional service that we have nothing to lose and everything to gain.

Thanks so, so much to the great librarians at Info International, especially Abbey Zenith, for allowing us to set this up on their space. A great big thanks also to Razitra Artizar for SL enthusiasm and guidance. You guys are the best.

Thursday, September 20, 2007

Sho’ah’s (שואה) Lasting Impact: Mapping Jewish Migration


Sho’ah’s (שואה) Lasting Impact: Mapping Jewish Migration: http://gis.uta.edu/maus/

As we did last year with Mapping the Afghan Population in the US (see background), the Information Literacy Librarian and I teamed up to create a project that will integrate numeric and spatial resources into the freshman English Composition courses.

All sections of Freshman Composition I (app. 60) are reading Maus: A Survivor's Tale, by Art Spiegelman. As of this moment, 17 sections have signed up to bring their class in to use the interactive map we designed. This is a whole lot of students who will get exposed to detailed aspects of GIS and geography. I love it.

The Lesson: GIS & Information Literacy
The lesson, from which we hope to publish results, measures how well interactive GIS and multimedia enhance students' ability to select a research topic/question. The students first read a bio from Holocaust survivor Madeline Deutsch and write a research topic. Then, we add two additional immersive layers of information. First we play a brief multimedia clip of Madeline describing her ordeal and its aftermath. Second, we lead them through an exercise where they explore Madeline's life before, during, and after the war using the interactive map we created.

We met with the first two classes this morning and our impression is that it was very successful. The instructor agreed and seemed very enthusiastic, which is always a good thing.

Other Uses?
Of course. I am currently working with numerous faculty on campus who teach or whose research interests include WW II or the Holocaust. However, it was the certainty of exposing so many freshmen to these types of resources that made this project one of our highest priorities last summer. Perhaps researchers and/or educators outside of our fair university may have a use as well.

Data Sources
Jewish population figures were gathered from the American Jewish Society Yearbook. Big thanks to Texas Christian University Library for loaning them to us. The map features were gathered from historical maps available from the U.S. Holocaust Museum. Historical world shapefiles provided by ThinkQuest.org.

What's Under the Hood?
Technically, the GIS data is stored in an SDE database (Oracle) and is being published as a web service via ArcIMS. Using ASP.NET and the Google Maps API, we integrate the ArcIMS layers with Google Maps and voila...

(Same thanks as last year.)

Updates from the interface used last year with Mapping the Afghan Population:
  1. Drop-down menu for all features selected to activate individual pins
  2. Integrate HTML markup into Google info-windows
  3. 'Loading' animation during ASP.NET postback
  4. Restructured XML database to speed server-side processing
  5. Added ability to zoom to a particular X,Y,Zoom level when a particular background map is selected

Wednesday, September 19, 2007

Fall 2007: Library GIS Workshops

First few weeks of the semester have been a rush, and our first GIS workshop is set for next Thursday. From the Super Bowl, to air quality, to flipping houses, we're hoping to make a big splash with this semester's workshop lineup. (For previous workshops, see Spring 2006, Fall 2006, and Spring 2007.)

Starting last semester, we began co-presenting the workshops with faculty whose research expertise was in the subject area, but not necessarily using GIS. This turned out to be very successful and we are repeating that this semester.

Here is an overview. I will provide links to the workshop materials as they become available...

Title: Sleepin’ at the Bowl: Best Locations for New Hotels for Super Bowl XLV
Time: September 27, 2 - 4pm
Location: Central Library, Room B20 (basement)
Description: Learn how Geographic Information Systems (GIS) can be used to analyze the number of hotels, rooms, and occupancy rates in Arlington. We will then identify the best location for the construction of a new hotel to help handle the influx of football fans. Data sources include hotel data from the Texas Comptroller of Public Accounts and commercial property data from the Tarrant Appraisal District.

Title: Stuck in Traffic: Find the Road Segments With the Highest Pollution Levels (co-presented with Dr. Melanie Sattler)
Time: October 25, 2 - 4pm
Location: Central Library, Room B20 (basement)
Description: Learn how Geographic Information Systems (GIS) can be used to statistically estimate pollution levels within the DFW metroplex and to then identify the street segments that traverse the regions with the highest levels. Data sources include pollution measurements from the Texas Commission on Environmental Quality, the Environmental Protection Agency, and various datasets from the North Central Texas Council of Governments.

Title: Buy & Sell That House: Find Houses to Flip (co-presented with Dr. Andy Hansz)
Time: November 14, 2:30 - 4pm
Location: Central Library, Room B20 (basement)
Description: Learn how Geographic Information Systems (GIS) can be used to analyze the residential property market in Tarrant County to identify ideal investment houses to flip. Data sources include sample Multiple Listing Service (MLS) data from the North Texas Real Estate Information Systems (NTREIS) and foreclosure listings from the Department of Housing and Urban Development (HUD).

Tuesday, August 14, 2007

Area Weighted Join vs. Standard Join

Students from an advanced suitability analysis course this summer needed to create a report that specified the percent of the join layer that intersected the target layer. For example, they needed to calculate the percent of each land use from a landuse shapefile that intersects each zip code in Texas. I wrote a quick script for the class that generated the report they needed, but the implications are astounding to me.

Here is what I mean by astounding. As a test, I calculated the median household income and the total population within a 1-mile radius around each dance club in Arlington, TX comparing the following methods: (1) Total population using ArcMap's standard spatial join tool, (2) Total population using an area-weighted summation, (3) Average median household income using standard spatial join, (4) Average median household income using an area-weighted average.

The following table displays the results.



Field A displays the dance club's name. Fields B, C, and D above display the difference between using the standard ArcMap spatial join tool and a weighted-average spatial join tool when calculating average household income. Fields E, F, and G display the differences when calculating the total population.

The differences in both cases are quite high. I am thinking of myself and all of the students who I have seen naively rely on the standard spatial join tool for these types of calculations. Wow...

Why is there such a difference?

An area-weighted spatial join between two polygons comes in two flavors, depending on whether it is calculating an average or a sum.

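If it is calculating an average, the formula is [area-percent] * [value] + [area-percent] * [value] + ..., where each area-percent is the share of the target feature covered by that join feature. The most important consideration is the percentages of the join features that are within each target feature. For example, in a particular zip code, there might be 3 block groups. Let's further suppose that block group 1 comprises 50%, block group 2 comprises 35%, and block group 3 comprises 15% of the zip code. The zip code's area-weighted average is then 0.50 * [value 1] + 0.35 * [value 2] + 0.15 * [value 3].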

If it is calculating a sum, the most important consideration is the percent of the join feature that actually intersects the target feature. The formula is [% area intersects target feature] * [value] + [% area intersects target feature] * [value] + ..., summed over every join feature that intersects the target. This is why you will see a much larger error when using the standard spatial join tool for summations than for averages. If 2% of a block group intersects a zip code, the standard tool will include the entire population of the block group instead of only 2%.
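Written out as formulas, the two flavors and the standard join look like this. The symbols are introduced here for convenience: for a target feature T and each join feature J_i that intersects it, a_i is the area of the intersection of J_i and T, A_i is the total area of J_i, and v_i is the joined value.

```latex
\bar{v}_T = \frac{\sum_i a_i\, v_i}{\sum_i a_i}
  \qquad \text{(area-weighted average; equals } \textstyle\sum_i \frac{a_i}{\operatorname{area}(T)}\, v_i \text{ when the } J_i \text{ cover } T\text{)}

S_T = \sum_i \frac{a_i}{A_i}\, v_i
  \qquad \text{(area-weighted sum)}

S_T^{\mathrm{std}} = \sum_i v_i
  \qquad \text{(standard join: every intersecting feature counts in full)}
```

In the 2% example, that block group contributes 0.02 v_i to S_T but all of v_i to the standard sum.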

Is This a Perfect Solution?

No. This assumes that the attribute (population, for example) is distributed perfectly evenly within each join feature. It is, however, a huge improvement.

Where Can I Get the Script?

Download it here. Extract the compressed archive and you will see three Python scripts and an ArcGIS toolbox. Open ArcMap or ArcCatalog, ensure ArcToolbox is visible, and add the Spatial Join Tools.tbx (single-click).

Caveat: These scripts are first drafts and have not been tested on any systems other than the ArcInfo Desktop 9.1 & 9.2 systems here at UT Arlington. There is no documentation. Also, the scripts run slowly. Eventually they will be optimized, but for now they are provided as is.

Description of the three tools:
  1. Average Area Weighted: Use this tool to calculate an area-weighted average spatial join between two polygons.
  2. Sum Area Weighted Join: Use this tool to calculate an area-weighted summation spatial join between two polygons.
  3. Percent Area Report: Use this tool to generate a report that specifies the percent of the join layer that intersected the target layer.
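The Percent Area Report boils down to clipping each join polygon to the target and comparing areas. The sketch below is a hypothetical, pure-Python stand-in (it is not the script in the download and does not use the ArcGIS geoprocessor). It uses Sutherland-Hodgman clipping, which assumes the target polygon is convex. A circular buffer, like the 1-mile radius around each dance club, is convex, but zip codes and block groups usually are not, so real data would call for a full polygon overlay such as the ArcGIS Intersect tool. Coordinates are assumed to be projected (planar), and the squares below are made-up test shapes.

```python
def shoelace_area(poly):
    """Absolute area of a simple polygon given as [(x, y), ...]."""
    s = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def clip_to_convex(subject, clip):
    """Sutherland-Hodgman: clip polygon `subject` to CONVEX polygon `clip`."""
    n = len(clip)
    # The signed area's sign tells us whether `clip` winds CCW or CW.
    signed = sum(clip[i][0] * clip[(i + 1) % n][1]
                 - clip[(i + 1) % n][0] * clip[i][1] for i in range(n))
    sign = 1.0 if signed > 0 else -1.0

    def inside(p, a, b):
        # True if p is on the interior side of edge a->b (or on the edge)
        return sign * ((b[0] - a[0]) * (p[1] - a[1])
                       - (b[1] - a[1]) * (p[0] - a[0])) >= 0

    def intersection(p, q, a, b):
        # Where segment p->q crosses the infinite line through a->b
        dx, dy = q[0] - p[0], q[1] - p[1]
        ex, ey = b[0] - a[0], b[1] - a[1]
        t = (ex * (a[1] - p[1]) - ey * (a[0] - p[0])) / (ex * dy - ey * dx)
        return (p[0] + t * dx, p[1] + t * dy)

    output = list(subject)
    for i in range(n):
        a, b = clip[i], clip[(i + 1) % n]
        input_list, output = output, []
        if not input_list:
            break  # nothing left to clip: no overlap
        prev = input_list[-1]
        for cur in input_list:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersection(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersection(prev, cur, a, b))
            prev = cur
    return output


def percent_area_report(join_polys, target):
    """Percent of each join polygon's area that falls inside the target."""
    report = {}
    for name, poly in join_polys.items():
        piece = clip_to_convex(poly, target)
        inside_area = shoelace_area(piece) if len(piece) >= 3 else 0.0
        report[name] = 100.0 * inside_area / shoelace_area(poly)
    return report


# Made-up shapes: a 10 x 10 target and three join polygons
target = [(0, 0), (10, 0), (10, 10), (0, 10)]
join_layer = {
    "A": [(0, 0), (5, 0), (5, 10), (0, 10)],      # fully inside
    "B": [(8, 0), (18, 0), (18, 10), (8, 10)],    # 2 of 10 units wide inside
    "C": [(20, 0), (30, 0), (30, 10), (20, 10)],  # no overlap
}
print(percent_area_report(join_layer, target))  # {'A': 100.0, 'B': 20.0, 'C': 0.0}
```

These percentages are exactly the fraction_inside weights that an area-weighted sum multiplies against each join feature's value.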

Saturday, August 11, 2007

GIS to Select Foreclosure Residential Properties

This summer I taught a graduate real estate course exploring the ways GIS can be used to select foreclosure listings for possible investment. As far as student satisfaction is concerned, this was the most successful real estate course I have taught. The course was very tight and practical, and hopefully everyone will go out and make some big bucks with the skills they learned...well, when the market eventually turns around, eh?

The 9-week course was divided into two sections. First, we went over the fundamentals of performing a comparative market analysis (CMA) using foreclosure listings and multiple listing service (MLS) listings in Tarrant County, TX. Second, we learned how to automate the process using ArcMap's ModelBuilder.

What data did we use?

We used only the HUD foreclosure listings in Texas, available for free from the Southwest Alliance of Asset Managers. This is a fantastic resource, as the listings can be batch downloaded in Excel format. Each week, we were able to download a fresh batch of new foreclosure listings for Tarrant County. As an aside, HUD foreclosure listings are freely available for most states. To locate the agency that manages these listings for any state, see the HUD Homes website.

For MLS, we used data from the North Texas Real Estate Information Systems, Inc. (NTREIS). Unfortunately, batch MLS data is expensive and available only to realtors. In May I downloaded approximately 17,000 MLS listings of all statuses, including both sold and active properties. We used this data for the entire course.

How Can a CMA Help to Locate Foreclosure Properties for Investment?

In its most straightforward sense, a CMA compares the selling prices of properties (from MLS) in the same neighborhood as an active foreclosure listing. The lower the price of the foreclosure compared to the average selling price, the better the deal...on the surface, anyway. There are many factors to take into consideration, but first let's consider this straightforward CMA.

To accomplish this bare-bones CMA, you first need to geocode both the foreclosure and the MLS listings. You then need to generate a buffer that designates the neighborhood surrounding the foreclosure property. Next, spatially join the sold MLS points (containing the selling price) to the buffer, making sure to average the selling price field. Finally, subtract the foreclosure list price from the average selling price, and you are all set.
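The bare-bones CMA above can be sketched in plain Python. This is a hypothetical illustration, not the ModelBuilder model used in class: it assumes both layers are already geocoded to projected coordinates in feet, replaces the buffer polygon with a simple distance test (equivalent for a circular buffer), and uses a made-up half-mile (2,640 ft) radius, made-up field names, and made-up prices.

```python
from math import hypot


def bare_bones_cma(foreclosure, sold, radius_ft=2640.0):
    """Return (average nearby selling price, average minus list price).

    foreclosure: dict with hypothetical keys 'x', 'y', 'list_price'
    sold: list of dicts with hypothetical keys 'x', 'y', 'sold_price'
    radius_ft: buffer radius standing in for the 'neighborhood'
    Returns None if no sales fall inside the buffer.
    """
    # A point is inside a circular buffer if it is within radius_ft of the center.
    nearby = [s["sold_price"] for s in sold
              if hypot(s["x"] - foreclosure["x"],
                       s["y"] - foreclosure["y"]) <= radius_ft]
    if not nearby:
        return None
    avg_price = sum(nearby) / len(nearby)
    return avg_price, avg_price - foreclosure["list_price"]


# Made-up example: two sales inside the buffer, one well outside it
foreclosure = {"x": 0.0, "y": 0.0, "list_price": 90000}
sold = [
    {"x": 100.0, "y": 0.0, "sold_price": 120000},
    {"x": 0.0, "y": 2000.0, "sold_price": 140000},
    {"x": 5000.0, "y": 0.0, "sold_price": 300000},
]
print(bare_bones_cma(foreclosure, sold))  # (130000.0, 40000.0)
```

A positive difference means the foreclosure is listed below the neighborhood average, which is exactly the "on the surface" deal signal, with all the caveats that follow.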

This is similar to the CMA reports I have seen from many commercial vendors.

Hey, This Oversimplified CMA Is Not Very Useful!

Yes, you are correct. Now, let's take a look at the many, many wrinkles that make this analysis so much fun.
  • In our CMA, we want to compare the price per square foot. The cost per square foot often decreases as the size of the property increases.
    • This causes a large problem, as many foreclosure listings, including the HUD data we used, do not report the square footage of the property. Yes, this data is available in the MLS, but the MLS only includes properties that have been recently active. The answer is to acquire this data from the local appraisal district, in our case the Tarrant Appraisal District (TAD). We acquired the complete primary real estate account data from TAD, which includes the square footage (living area), but unfortunately there is no reliable field in common with the foreclosure data, so we could not create a tabular relationship. Instead, we acquired the complete parcel shapefile, tabular joined it to the primary real estate account data, and then geocoded the foreclosure listings directly to the parcel boundaries. Our geocoding accuracy was in the upper 90 percent range. Then, we spatially joined the parcel shapefile to the geocoded foreclosure shapefile. Whew...we finally obtained the square footage of each foreclosure and calculated the price per square foot.
    • The MLS data already included the price per square foot.
  • A standard spatial join between the foreclosure buffer and the MLS sales is not appropriate, as not all properties are comparable. Differences in year built, number of bathrooms, etc. can decrease the accuracy of a CMA. For example, consider a foreclosure buffer with 6 recently sold properties, of which 2 were built in 2004 and the other 4 in the 1950s. If the foreclosure property was built in the 1950s, it might not be wise to include the two newer properties in the CMA, as their prices might be substantially higher than those of the other four.
    • This caused a major wrinkle, as I know of no way to exclude such properties from a spatial join using ArcGIS built-in tools. In other words, I know of no way to perform a spatial join filtered by a query based on the values of each feature of the target layer. When planning this class I knew this would be the major stumbling point, so I created a Python script that did just this. After I clean it up a bit this Fall semester, I will post it here and to ArcScripts.
  • Demographics, especially crime rates and potentially employment outlook, can play a major role in an investor's decision to invest in a property. After the CMA, it is then necessary to filter, rank, or weight the results by these demographic attributes.
    • We used block group data provided by Applied Geographic Solutions (AGS), which the library has been purchasing for the last few years. We are shifting to accessing the data via SimplyMap, but that is a whole other post.
  • A circular buffer is not the most reliable way to define a neighborhood. A better method is to include only those properties in the same subdivision.
    • While spotty subdivision data is included in MLS, it is not included at all in most foreclosure listings. The solution again was to turn to TAD. If you geocode both the foreclosure listings and MLS data using a parcel shapefile as the reference, you can spatially join the parcel shapefile back to the two geocoded point layers to obtain the subdivision. Then, you need to perform another query-based spatial join (see above), which to the best of my knowledge is not included within the standard ArcGIS tools. The script that I wrote allows for this as well.
  • Yeah, there are other issues, but this is enough for now.
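
The query-based spatial join described above, where the filter depends on each individual foreclosure rather than on a fixed layer-wide query, can be sketched as follows. It is not the script mentioned above; it is a hypothetical Python illustration with made-up field names (`year_built`, `sqft`, `subdivision`), a made-up 10-year tolerance, a made-up half-mile radius, and made-up sales. It compares price per square foot and can optionally restrict comparables to the same subdivision.

```python
from math import hypot


def filtered_cma(foreclosure, sold, radius_ft=2640.0, max_year_gap=10,
                 same_subdivision=False):
    """Average sold price per square foot of COMPARABLE nearby sales.

    Each sale is tested against this particular foreclosure (the target
    feature), so the 'query' changes from one foreclosure to the next.
    All field names and thresholds are hypothetical.
    Returns (average comp price/sqft, average minus foreclosure price/sqft),
    or None if no comparables remain.
    """
    def is_comparable(s):
        # Inside the circular buffer?
        if hypot(s["x"] - foreclosure["x"], s["y"] - foreclosure["y"]) > radius_ft:
            return False
        # Built in roughly the same era?
        if abs(s["year_built"] - foreclosure["year_built"]) > max_year_gap:
            return False
        # Optionally, same subdivision (e.g. from the appraisal district parcels)
        if same_subdivision and s["subdivision"] != foreclosure["subdivision"]:
            return False
        return True

    comps = [s["sold_price"] / s["sqft"] for s in sold if is_comparable(s)]
    if not comps:
        return None
    avg_ppsf = sum(comps) / len(comps)
    list_ppsf = foreclosure["list_price"] / foreclosure["sqft"]
    return avg_ppsf, avg_ppsf - list_ppsf


# Made-up data: a 1955 foreclosure and four nearby sales
foreclosure = {"x": 0, "y": 0, "list_price": 90000, "sqft": 1500,
               "year_built": 1955, "subdivision": "Oak Hills"}
sold = [
    {"x": 200, "y": 0, "sold_price": 120000, "sqft": 1500, "year_built": 1958, "subdivision": "Oak Hills"},
    {"x": 0, "y": 300, "sold_price": 140000, "sqft": 1750, "year_built": 1952, "subdivision": "Oak Hills"},
    {"x": 100, "y": 100, "sold_price": 330000, "sqft": 2200, "year_built": 2004, "subdivision": "Oak Hills"},
    {"x": 500, "y": 0, "sold_price": 110000, "sqft": 1100, "year_built": 1956, "subdivision": "Elm Park"},
]
print(filtered_cma(foreclosure, sold))                         # 2004 house excluded
print(filtered_cma(foreclosure, sold, same_subdivision=True))  # (80.0, 20.0)
```

The design choice here is that the filter is a function of the target record, which is what a standard spatial join with a single definition query cannot express.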
How is This Process Automated Without Any Coding?

Using the ArcGIS ModelBuilder, this can be automated up to a point, but not as smoothly as I had hoped. The Python script I wrote can be incorporated into a Model as a geoprocessing object, so that part went fairly smoothly. Two major difficulties arose. First, ModelBuilder's limitations make it difficult to set parameters for temporary layers that are contingent on other parameters. Second, I find the ModelBuilder flaky and inconsistent in general. While everyone did a fantastic job on the first section, students had varying degrees of success automating the process. I recommended that everyone interested take the Python Scripting for ArcGIS course I am teaching in the Spring 2008 semester.

Lots of Work for One Class...What Now?

Yes, initial development of these special courses can be quite exhausting. I expect to teach this course once per year for the foreseeable future, and I plan to hold a library workshop this Fall semester showing everyone how GIS can help investors flip houses. If this topic cannot get folks into the library for a workshop, I do not know what will. (Actually, the four workshops held in the Spring 2007 semester averaged over 20 attendees each.) This workshop will be part of our GIS Day activities, but it deserves its own post as well.

Monday, April 23, 2007

Position: Map and Data Services Librarian (University of Illinois at Chicago)

Map and Data Services Librarian and Assistant Professor (University of Illinois at Chicago)
"The University of Illinois at Chicago Library seeks a dynamic and energetic librarian to provide reference, research consultation/user education services for cartographic material and social science data sets, as well as implementation of GIS applications within the library and the university community."
There sure seems to be a need for good GIS folks in Illinois, eh? Last week, I posted the newly available position at the University of Illinois Library at Urbana-Champaign, and now there is a similar opening in Chicago.

As its title suggests, the posting describes this position as a mix of GIS, traditional cartographic materials, and data/numeric services. However, there does seem to be a strong focus on the social sciences.

Here are some snippets:
  • Works closely with primary users in a wide variety of academic areas including urban planning, history, earth sciences, public administration, sociology, political science, architecture, social work, education, public health.
  • Participates in collection development decisions for cartographic materials and social science data (maps, atlases, remote sensing images, geospatial data).
  • Participates in planning, design, and maintenance of web pages that include specific information about cartographic resources and GIS and social science data.
  • Creates local indexes (property listings, remote sensing products, aerial photography, etc.).
  • Works with other library units to assure consistent policies for the cataloging of maps, remote sensing imagery, aerial photographs, and data sets. Creates and reviews metadata for digital geospatial data and data sets, as needed.
Salary begins at $40,000, with faculty status.

Friday, April 20, 2007

Firefox Mapping Extensions in a Single .XPI Package

Lots and lots of fantastic mapping extensions for the Firefox browser. Which ones, you ask? Well, the 16 extensions listed below.

There are also a couple of extensions that allow users to back up all the extensions installed in their browser and package them into one .xpi extension file. The extensions are Firefox Extension Backup Extension and its partner Compact Library Extension Organizer.

So, I used FEBE and CLEO to back up and package all of the extensions in the following list into one convenient .xpi extension file: FFmapping.xpi. Install this file and you will have just about every Firefox mapping extension that I am aware of...well, those that are compatible with Firefox 2.0, anyway. Note that the server might not be configured properly for direct extension installations, so you might need to download the extension first (right-click/Save As), then choose File/Open and browse to the file. This will get it installed.

Why use Firefox over IE? Well, here is one reason.

Google/Yahoo!/Live Local et al Map Extensions

  • Full Map
    • See more of the actual map on Google Maps. Rotate through 3 modes.
  • Map+
    • View a Yahoo! map of a selected address without having to open a new window or tab.
  • Map This
    • This extension will let you get a Google map for any address on a web page.
  • All Your Maps Are Belong To Us
    • Translates URLs for other mapping sites to Google Maps.
  • Firefox Toolbar for LookLOCAL Maps
    • The LookLOCAL Firefox Toolbar is a convenient extension to the Firefox browser that enables you to map a location, get directions, or search for products and services from any web page you are on without first navigating to an online mapping site...
  • MapIt!
    • Highlight an address and get a map and/or driving directions using your favorite online mapping site.
  • GDirections
    • Finds directions on Google and Yahoo Maps based on your selected text and one of various home addresses.
  • Freeway Driving Directions
    • Uses your favorite driving directions web site -- Expedia, MapQuest, or Yahoo! Maps -- to display the driving directions in a new tab when you highlight an address on any web page.
Region Specific
  • Streetmap
    • Simple UK streetmap search from context menu.
  • BuscaDirs
    • Gets a map for a selected address in Argentina, as well as for some cities in other Spanish-speaking countries that use the same address format.
  • Locate Address in Israel
    • Enables you to locate addresses written in Hebrew using the mapa.co.il map database.
Geotags (Websites & Images)
  • Shazou
    • The product called Shazou (pronounced "Shazoo"; it is Japanese for mapping) enables the user to map and geo-locate, with one click, any website they are currently viewing.
  • Nearby
    • Shows you Flickr photos, and provides GeoURL links and Degree Confluences near the website you're viewing.
  • GeoURL
    • Opens useful sites for pages geographically marked with ICBM or geo.position META tags.
  • Photo Map
    • Display user contributed photos on a map...
Track Packages

Wednesday, April 18, 2007

American Hospital Directory: Summary Hospital Data Maps


The American Hospital Directory (AHD) now supplements their public hospital data with Google-powered maps. I (and many of our health care researchers) have used AHD for hospital data, but the ability to access and view the hospital data through maps is fantastic and so very useful.

Check out their map of the 596 Texas hospitals in their database.

Hospitals are broken up into 8 categories:
  1. Short term acute care
  2. Critical access
  3. Psychiatric
  4. Long term
  5. Rehabilitation
  6. Children's
  7. Other
  8. Unknown
Great data is provided for each hospital, including revenue, number of beds, discharges, type of service, inpatient origin, financial statistics, and more. There are a number of subscription services available on their site, but this free data is great.

Here is the press release announcing their new mapping service.

Tuesday, April 17, 2007

Position: GIS Librarian (University of Illinois Library at Urbana-Champaign)

"The University of Illinois Library is seeking an energetic and creative person to serve as the lead in the Library's collection and delivery of digital geospatial data and associated geographic information systems (GIS) services. The GIS Librarian will coordinate all aspects of the Library's digital geospatial information program at the intersection of user needs, technology, and data content. Creating a portal to geospatial data, the GIS Librarian will move the profession and services offered from traditional map librarianship towards geoinformatics with an increased interest in modeling geospatial data and in techniques applied to geospatial information systems for data management, retrieval, and analysis."

What a position! I have not posted a position here in a while, as my regular blogging habits gave way to the busyness of this semester, but this position seems so exciting that I cannot help myself... The folks over at Urbana-Champaign sure know what it takes to define the role of a GIS Librarian. If anyone out there is unsure about such a role, this job announcement is a fantastic, concise summary.

Here are some snippets from the job announcement:
  • "This is a full-time faculty position in the Map and Geography Library of the University Library."
  • "Provides reference service, research assistance, and instruction in the selection and possible uses of digital geospatial data through in-person and remote one-on-one interactions, workshops/seminars, websites, blogs, and other avenues of widely distributed communication..."
  • "Works with teaching faculty to implement GIS modules in courses. Prepares datasets to support course assignments."
  • "GIS Librarian will craft a digital geospatial data collection development plan and will develop a campus clearinghouse and archive for digital geospatial data."
  • "GIS Librarian will describe acquired data using appropriate metadata schema and mount and maintain data on Library or University servers, providing access to the campus community at large."
  • "Serves as Library's contact person for campus site-license software such as ESRI and ERDAS products."
  • "Develops and maintains close liaison relationships with local and state geospatial data producers."
There is no specific mention of a salary here, but considering the requirements and the high research level of the university, the salary should be nice.

Monday, April 16, 2007

GIS-Related Dissertations: Latest Batch


Here's the latest installment of GIS-related dissertations that have caught my eye. Previous lists include 11/05, 01/06, 01/06, 04/06, 07/06, 10/06. (Of course, this is not a comprehensive list...only those I find interesting.)

I will continue to link to the ProQuest Digital Dissertations public database, which provides free abstracts and 24-page previews for many dissertations and theses. However, due to ProQuest's migration to the ProQuest Dissertations & Theses (PQDT) database, the public database will provide only brief citations beginning on July 21, 2007. I will link to full-text versions if Google can point them out to me.
  • Airfare, competition, and spatial structure: New evidence in the United States airline deregulation, by Gong, Gang, Ph.D., Boston University, 2006, 170 pages.
    • "The dynamics of airline deregulation has caused dramatic changes in airfare and competition structure... The spatial distribution of airfare has not been even. Pricing dynamics have resulted in geographic patterns of lower airfare for cities in the west and southwestern United States while higher airfare was found in the South, New England, and Midwest."
    • I find this research extremely interesting, and I bet a lot of folks would find it intriguing as well. Perhaps if I can pull the data together, this would make an excellent workshop in the Fall 2007 semester. See here for Spring 2007 workshops.
  • Communal ontology for navigation support in urban region: Getting directions from familiar landmarks, by Hong, Ilyoung, PhD, State University of New York at Buffalo, 2007.
    • "This dissertation proposes a communal ontology as a type of regional knowledge with a formal structure that can be incorporated with geographical information systems. As part of an effort toward the realization of community wayfinding, this research explores several methodologies. To figure out what the shared geographical places are, the preference and degree of familiarity of different places are measured using the behavioral geographer's methodology. For investigating similar geographical interests within a community, social network analysis is conducted with the help of a person-place matrix and centrality measures are calculated."
  • A co-evolutionary cellular automata for the integration of spatial and temporal scales in forest management planning, by Mathey, Anne-Helene, PhD, The University of British Columbia, 2006.
    • "This thesis presents a case for more holistic numerical planning tools which can handle spatial objectives and inter-temporal trade-offs. A novel algorithm based on cellular automata (CA) is designed to address forest planning objectives that are both spatial and temporal and subject to global constraints."
  • A geographic information system prototype for archived data from intelligent transportation systems: A multidimensional analysis, by Cusack, Maggie, PhD, State University of New York at Albany, 2006.
    • "This work suggests a GIS prototype that will exploit existing industry data collection technologies, and apply sound Information Science (IS) principles to a growing transportation industry database problem. The prototype demonstrates a rational approach to applying those principals to the ITS data archiving and retrieval problem, with emphasis on the possibilities for data analysis."
  • Archaeological predictive model of southwestern Kansas, by Campbell, Joshua Stewart, M.A., The University of Kansas, 2006, 131 pages.
    • "Knowledge on the archaeological condition of southwestern Kansas is anomalously low, therefore a high-resolution archaeological predictive model has been constructed for the High Plains region of southwestern Kansas. Using quantitative data about the environment as independent variables, the model was constructed using a combination of Geographic Information Systems (GIS) and statistical software."
  • Association between ozone and emergency department visits: Application of geostatistics and geographic information systems (GIS), by Choi, Mona, Ph.D., University of Maryland, Baltimore, 2006, 123 pages.
    • "Using traditional statistics and geostatistics in combination with GIS, the association between ozone concentration and emergency department (ED) admissions for cardiovascular and respiratory conditions were examined at the ZIP code level... Findings suggest that respiratory and cardiovascular ED visits increased even at lower ozone concentration than the EPA's air quality standards."
  • A Web-based spatial decision support system for utilizing organic wastes as renewable energy resources in New York State, by Ma, Jianguo, Ph.D., Cornell University, 2006, 129 pages.
    • "As the 3rd largest dairy state in the nation and the host for many food waste generators, New York State produces a large amount of organic waste. Recently there has been a renewed interest in farm-based co-digestion, which has created strong needs for research in this field... [A] Web-based spatial decision support system (SDSS) is developed by integrating geographic information systems (GIS), the Internet, and modeling. ArcGIS, Manifold, VB.Net, JavaScript and HTML are used during the design process. This system consists of three modules: (1) Dynamic Mapping and Querying; (2) Food Waste Estimator; and (3) Co-digestion Economic Analysis."
  • Bahamian cave and karst geodatabase, and GIS analysis of San Salvador Island, Bahamas, by Walker, Adam Dennis, M.S., Mississippi State University, 2006, 94 pages.
    • Full-Text
    • "A geodatabase and a data management program have been created to store and manipulate cave and karst feature data from the Bahamas. A geographic information system was used to recognize any spatial patterns in the cave and karst data from San Salvador Island."
  • Comprehensive conservation modeling: A spatially explicit individual-based approach using grizzly bears as a case study, by Backus, Vickie Marie, Ph.D., The University of Utah, 2006, 239 pages.
    • "This dissertation illustrates how a mechanistic bottom-up approach to constructing a spatially explicit individual-based model (IBM) provides the proper theoretical and operational frameworks for constructing population viability analysis (PVA) models that avoid many of the substantive and theoretical criticism of the conventional demographic models used in PVA. Using Java™, such a model is developed for the grizzly bear population of the Cabinet-Yaak Ecosystem."
  • Creation of a system for assessing and communicating the risks associated with terrestrial chemical spills, by Bryant, Derek L., Ph.D., Vanderbilt University, 2006, 85 pages.
    • Full-Text
    • "Adequately preparing for and responding to potential terrestrial (land-based) chemical spills are critical to the protection of human health and ecology. In this research, an environmental risk management system is developed to support analysis and facilitate decision-making for terrestrial chemical spill planning and response... The system leverages geographic information systems (GIS) technology to assess and delineate the immediate threat to human and environmental receptors from terrestrial chemical spills. It characterizes a spilled chemical's ability to immediately impact human health, groundwater, surface water, and soil resources, and incorporates these four receptors into an overall measure of terrestrial chemical risk."