Friday, April 10, 2015

Data Normalization, Geocoding, & Error Assessment

Goals and Objectives: The main goal of this lab was to take the data provided by the Wisconsin DNR and locate all of the sand mines by geocoding them in ArcGIS. Before geocoding, the data had to be normalized because many of the records were inconsistent or incorrect. These errors make finding the locations harder than it needs to be for whoever is using the data. We also needed to compare our geocoded locations with the actual locations of the sand mines to see how far apart the two points were for each mine.

Methods: Our first step was to normalize the data that was given to us. Some sand mines had a street address, some used the PLSS (Public Land Survey System) for their locations, and some had both. To normalize the table I separated the location information into new fields such as PLSS, Street Address, City, State, and Zip Code. When a mine had no address or no PLSS description, I left that field blank. Using the Geocoding Tool in ArcGIS, the addresses were placed on the map. The problem was that most of these addresses were placed in the wrong spot. Mines with only a PLSS description would not geocode at all, so I had to use the PLSS quarter-quarter layer to pinpoint a location for the mine. Once I had figured out where all of my mines were, I used the Merge Tool to combine my mine locations with the ones my classmates had geocoded. The actual mine locations were then added to the map so that we could see how close the geocoded locations were to the actual locations. For this I used the Point Distance Tool, which showed the distance between the geocoded and actual location of each mine.
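The normalization step was done by hand in the table, but the idea can be sketched in a few lines of Python. This is only a rough illustration, not what was used in the lab: the sample strings, the field names, and the two patterns are assumptions, and the real DNR table is messier than anything these patterns would catch. It assumes a street address looks like "number street, city, ST zip" and a PLSS description looks like "S12 T28N R9W" or "Sec 5, T25N, R8W" (townships in Wisconsin are all north, so only N is accepted).

```python
import re

# Assumed PLSS form: section, township, range, e.g. "S12 T28N R9W" or "Sec 5, T25N, R8W".
PLSS_RE = re.compile(
    r"S(?:EC\.?)?\s*(?P<sec>\d{1,2})[,\s]+T\s*(?P<twp>\d{1,2}N)[,\s]+R\s*(?P<rng>\d{1,2}[EW])",
    re.IGNORECASE,
)

# Assumed address form: "1234 Main St, Chippewa Falls, WI 54729".
ADDR_RE = re.compile(
    r"(?P<street>\d+\s+[^,]+),\s*(?P<city>[^,]+),\s*(?P<state>[A-Z]{2})\s+(?P<zip>\d{5})"
)


def normalize(raw):
    """Split one free-form location string into separate fields.

    Fields with no matching information are left as empty strings,
    the same way blank cells were left in the normalized table.
    """
    row = {"street": "", "city": "", "state": "", "zip": "", "plss": ""}

    addr = ADDR_RE.search(raw)
    if addr:
        row.update({key: value.strip() for key, value in addr.groupdict().items()})

    plss = PLSS_RE.search(raw)
    if plss:
        row["plss"] = "S{sec} T{twp} R{rng}".format(**plss.groupdict()).upper()

    return row


# Hypothetical records: one with both an address and PLSS, one with PLSS only.
both = normalize("1234 Main St, Chippewa Falls, WI 54729; S12 T28N R9W")
plss_only = normalize("Sec 5, T25N, R8W")
```

A record with only a PLSS description ends up with empty address fields, which is a quick way to flag the mines that the address geocoder cannot place and that have to be located with the quarter-quarter layer instead.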



Figure 1: (Table 1) This is a picture of the WI DNR data that we received. This table is an example of data that is not normalized.

Figure 2: (Table 2) This is a picture of the same data after normalization, using new fields such as PLSS and County.


Results: Here are the results comparing the geocoded mine locations with the actual mine locations. The purple dots represent the sand mines the class located, and the green dots show where the mines actually are. The table shows the distances between the actual and geocoded mine locations.


Figure 3: A map showing the geocoded and actual sand mines across WI.


Figure 4: This is the table that shows the distance between the two locations.
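For anyone who wants to check distances like these outside of ArcGIS, here is a minimal Python sketch. It does not reproduce the Point Distance Tool. It swaps in the haversine (great-circle) formula on latitude and longitude, and it assumes each mine has an ID that appears in both the geocoded set and the actual set. The function names and the coordinates in the example are made up for illustration and are not values from the lab.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def positional_errors(geocoded, actual):
    """Return {mine_id: distance in meters} for mine IDs found in both dicts.

    Both arguments map a mine ID to a (lat, lon) tuple.
    """
    return {
        mine_id: haversine_m(*geocoded[mine_id], *actual[mine_id])
        for mine_id in geocoded
        if mine_id in actual
    }


# Hypothetical coordinates, not real mine locations.
geocoded = {"mine_a": (44.80, -91.50), "mine_b": (44.95, -91.30)}
actual = {"mine_a": (44.81, -91.48), "mine_b": (44.95, -91.30)}

for mine_id, meters in sorted(positional_errors(geocoded, actual).items()):
    print(f"{mine_id}: {meters:.0f} m")
```

Sorting the results by distance would make it easy to spot the mines where the geocoded address landed far from the real site.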


Discussion: There are always going to be people who cut corners and don't provide their best work. Gross, systematic, and random errors can corrupt data that really shouldn't be hard to get right in the first place. The problem with the table we got from the WI DNR was operational error. The information in the table was disorganized, which made some of the attribute data input confusing.

Conclusion: This lab taught me that fixing other people's mistakes can take up a lot of your own time. From normalizing the data to locating it on a map, the process can get a little hectic. The main point is to check data accuracy, because even after you geocode an address, the point on the map may not match the real location.

Sources:

Wisconsin DNR

Lo, C. P., and Albert K. W. Yeung. "Chapter 4." Concepts and Techniques in Geographic Information Systems. Upper Saddle River, NJ: Pearson Prentice Hall, 2003. N. pag. Print.

