Friday, October 05, 2007

Large Table Manipulation: ArcMap vs. Microsoft Access

Yesterday I helped an engineering student display a table of over 500,000 XY coordinates and convert it to a shapefile. The table the student brought was a 2.5 GB CSV file, and it gave me some unexpected trouble.

The conclusion is that the manipulation of large tables is often best done using Microsoft Access. Specifically, converting field types on tables with hundreds of thousands of records is best done in Microsoft Access, not in ArcMap.

If you are interested in the particulars, read on below.

Brought the table into ArcMap, but every field in the table was read by ArcMap as text. Opened the table in ArcMap, and all the values were numerical coordinates. Opened the table in Excel, and of course the table only partially opened, as there were too many records; the portion that did open was all numerical coordinates, but Excel was also reading the fields as text.

No problem, I thought. Switched back over to ArcMap and attempted to add two new double fields, planning to use the field calculator to copy the text fields into them. ArcMap cannot add fields to text files, so we had to export the table as a DBF first, which took over 5 minutes given the amount of data. Every time we tried to add a new double field to the large DBF file, the screen turned white and hung there indefinitely. We tried this twice, and gave up on each attempt after 5 minutes of inactivity.

Then I had the idea to import the table into a blank Microsoft Access database, where I could directly change the field types from text to double. Before changing the field types, I opened the data table, and sure enough every field value was surrounded by quotation marks {"} as a text qualifier. THAT was the problem! Switched over to design view, changed the field types, and sure enough that did the trick. ArcMap honors Access field specifications, and we were able to view the event class in no time. It took a bit more patience waiting while we exported the shapefile, but it was nice to see the completion of what should have been a routine process.

Why were there quotation-mark text qualifiers around each field? Well, the student told me the file was created using Excel 2007, which I have not yet used. Perhaps this is standard behavior for Excel 2007? If not, perhaps there is some setting in Excel that will inadvertently place text qualifiers around fields?
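For what it's worth, here is a quick sketch (the coordinate values are made up) of how a quote-qualified row behaves. Even a parser that honors the qualifiers, like Python's csv module, hands the values back as strings, so they still have to be explicitly converted before they can be used as numeric coordinates:

```python
import csv
import io

# A couple of rows exported with quotation-mark text qualifiers,
# like the student's file
raw = '"X","Y"\n"1234.5","6789.0"\n'

reader = csv.reader(io.StringIO(raw))
header = next(reader)          # ['X', 'Y'] -- qualifiers stripped
x_text, y_text = next(reader)  # but the values are still strings

# ...so they must be converted explicitly before use as coordinates
x, y = float(x_text), float(y_text)
print(x, y)
```

ArcMap evidently makes the stricter choice and types the whole field as text, which is why the quotes had to go before Add XY Data would work.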

Repeating from above: the conclusion is that the manipulation of large tables is often best done using Microsoft Access.

19 comments:

Emilio Mayorga said...

Alternatively, manipulation of csv files (such as removing quotes) can be done with a good text editor -- specifically not Notepad. TextPad is a good option (http://textpad.com/), but there are many others.

Anonymous said...

in arcmap, tools -> add xy data didn't work?

Anonymous said...

Alternatively, manipulation of any large text file is best left to the tools which were created over 20 years ago for this very purpose: awk, sed, grep, cut, paste, etc. Spreadsheets and database programs are not an ideal environment for pre-processing large text files.

mapz said...

emilio and dylan, you guys are so right. Tried opening the table in Notepad, but it could not handle the size of the file. Using textpad or a more traditional straight text editor without the limitations of Notepad is the best and most direct solution... Where were you guys yesterday when I was struggling with this, eh?? (kidding)

Appreciate the tip!

mapz said...

anonymous, you can only display xy data using numeric fields for x and y. The quotes in the x and y fields were forcing ArcMap to read these fields as text. The fields must somehow be converted to a numeric field format (such as a double) to view the xy data. What I did was import the table into Access and force the field types into a double. Another solution, as suggested by emilio and dylan, is to open the file in a text editor (not Notepad) and replace the quotes. Without the quotes, ArcMap would read these fields as doubles.

Anonymous said...

It would also be possible to write a small script (fewer than 10 lines) in a language such as Python which would read the CSV file line by line and strip the quotes.

mapz said...

justin, yep, this sure would be an easy Python script to write. Read the file line by line, replace the quotes, and write the file out again.
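A minimal sketch of the script described above (the function and file names are just placeholders). Working line by line means the whole 2.5 GB file never has to fit in memory:

```python
def strip_quotes(in_path, out_path):
    """Remove quotation-mark text qualifiers from a CSV,
    one line at a time, and write the cleaned copy out."""
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for line in infile:
            outfile.write(line.replace('"', ''))

# e.g. strip_quotes("coords.csv", "coords_clean.csv")
```

Note this blindly removes every quote character, which is fine for a table of bare numeric coordinates but would mangle quoted fields that contain embedded commas.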

Anonymous said...

also, with one click you can convert csv to dbf if you handle a lot of csv or xls files and want them to be dbf
http://www.ozgrid.com/file-conversion/advanced-csv-converter.htm#buy

Dylan said...

Hi, as a follow-up here are some generic constructs which make awk, sed, or grep extremely useful when working with large quantities of fussy text:
sed 's/"//g' infile > outfile
This would replace, globally, all occurrences of '"' with an empty string. Stream editors are the best way to handle big files like this. The best part is that you don't need to open any expensive or CPU-intensive apps to use them. And they are free!
Good Luck on future stuff,

Dylan

Mike C. said...

For manipulating large text files you can use Ultraedit32. I have used it to convert various text files and it worked excellently. Replacing characters is very quick, saving is fast too, and it has a lot of useful features.

Unknown said...

Is anyone aware of a way to map data in GIS using Excel and Access so that as you zoom into an area you can see a small-scale graph of frequency distributions, and then zoom back out to the frequency distributions for the larger area? For example, I have 3 zones for sampling; within those I have sub-zones, which are further divided into sample areas, and each needs a graph.

Any ideas about how to do this?

Anonymous said...

As this student was using Excel 2007, he probably also has Access 2007. That program can now link large text files instead of importing them, and it is now possible to use import/export specifications for the linking process as well, so why use Excel first? Directly linking the text file with a self-managed specification is much easier...

Anonymous said...

Manipulating numeric data can be done with numeric software such as MATLAB (expensive), Octave (free), and many similar others.
One has to study the syntax, but it's worth it.

Anonymous said...

Wow thanks for this post! It is really helpful.. best regards, nikolai
