Parsing a United States Location String with PHP
I needed a way to parse a location string into a city and state, or a zip code, for a client’s site. If you don’t quite know what I’m talking about, head over to Superpages and look at the location field. A couple of Google searches later, I decided to just go ahead and write my own function. This sounds easy until you think about it. All of the following inputs need to be parsable:
- dallas tx
- dallas, texas
- los angeles california
- washington district of columbia
- richmond virginia
- charleston west virginia
- 90001
This is beyond the scope of a simple regular expression. We can’t simply take the last word of the input either, since the state can span multiple words, as in case #4. There’s no way to tell which word or group of words is the state without testing the string against an array of states. Cases #5 and #6 complicate things even further: if we test from the end of the string, virginia could actually be the tail of west virginia.
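To make the "virginia vs. west virginia" problem concrete, here is a minimal sketch of one way to handle it: test the longest run of trailing words first and shrink from there. This is not the function from this post, just an illustration. The helper name `matchTrailingState`, the `$maxWords` parameter, and the trimmed lowercase lookup table are all made up for the example.

```php
<?php
// Hypothetical helper: find the longest run of trailing words that names a state.
function matchTrailingState(array $words, array $states, $maxWords = 3) {
    $limit = min($maxWords, count($words));
    // Try the longest candidate first so "west virginia" wins over "virginia".
    for ($n = $limit; $n >= 1; $n--) {
        $candidate = strtolower(implode(' ', array_slice($words, -$n)));
        if (isset($states[$candidate])) {
            return array(
                'city'  => implode(' ', array_slice($words, 0, count($words) - $n)),
                'state' => $states[$candidate],
            );
        }
    }
    return false;
}

// Trimmed lookup table for illustration only: lowercase name => abbreviation.
$states = array(
    'virginia'             => 'VA',
    'west virginia'        => 'WV',
    'district of columbia' => 'DC',
    'california'           => 'CA',
);

$result = matchTrailingState(array('charleston', 'west', 'virginia'), $states);
// $result is array('city' => 'charleston', 'state' => 'WV')
```

Checking the longest candidate first means the two-word state is found before the one-word state gets a chance to match.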
Here is the function I came up with. Please let me know if you find a bug or condition that cannot be parsed.
/**
** You are free to use and distribute this function provided this notice stays intact.
**
** parseLocation returns array containing city and state (or false if no state is found)
** Sample Usage
**   $str = 'dallas tx';
**   $parsed = parseLocation($str);
**   print 'Your city is '.$parsed['city'].', '.$parsed['state'];
** Original parseLocation function obtained from EoTz.com.
**/
function parseLocation($str){
    $state_list = array(
        'AL'=>"Alabama", 'AK'=>"Alaska", 'AZ'=>"Arizona", 'AR'=>"Arkansas", 'CA'=>"California",
        'CO'=>"Colorado", 'CT'=>"Connecticut", 'DE'=>"Delaware", 'DC'=>"District Of Columbia",
        'FL'=>"Florida", 'GA'=>"Georgia", 'HI'=>"Hawaii", 'ID'=>"Idaho", 'IL'=>"Illinois",
        'IN'=>"Indiana", 'IA'=>"Iowa", 'KS'=>"Kansas", 'KY'=>"Kentucky", 'LA'=>"Louisiana",
        'ME'=>"Maine", 'MD'=>"Maryland", 'MA'=>"Massachusetts", 'MI'=>"Michigan", 'MN'=>"Minnesota",
        'MS'=>"Mississippi", 'MO'=>"Missouri", 'MT'=>"Montana", 'NE'=>"Nebraska", 'NV'=>"Nevada",
        'NH'=>"New Hampshire", 'NJ'=>"New Jersey", 'NM'=>"New Mexico", 'NY'=>"New York",
        'NC'=>"North Carolina", 'ND'=>"North Dakota", 'OH'=>"Ohio", 'OK'=>"Oklahoma", 'OR'=>"Oregon",
        'PA'=>"Pennsylvania", 'RI'=>"Rhode Island", 'SC'=>"South Carolina", 'SD'=>"South Dakota",
        'TN'=>"Tennessee", 'TX'=>"Texas", 'UT'=>"Utah", 'VT'=>"Vermont", 'VA'=>"Virginia",
        'WA'=>"Washington", 'WV'=>"West Virginia", 'WI'=>"Wisconsin", 'WY'=>"Wyoming"
    );

    // defaults, so the state check below also works when the input has no comma
    $city = '';
    $state = '';

    // let's see if the comma forms a valid state and city
    if(strstr($str,',')){
        $parts = explode(',',$str);
        $state = trim($parts[1],", ");
        $city = trim($parts[0],", ");
    }

    if(!array_key_exists(strtoupper($state),$state_list) && !in_array(ucwords(strtolower($state)),$state_list)){
        // split on whitespace, commas and semicolons, dropping empty pieces
        $parts = preg_split("/[\s,;]+/",$str,-1,PREG_SPLIT_NO_EMPTY);
        if(count($parts) == 0) return false;
        $state = array_pop($parts);

        // first see if the last array element is a state abbreviation
        if(strlen($state) == 2 && array_key_exists(strtoupper($state),$state_list)){
            $state = strtoupper($state);
            $city = implode(' ',$parts);
        } else {
            // since it's not an abbreviation let's see if the last element is the full name of a state
            if(in_array(ucwords(strtolower($state)),$state_list)){
                $state = ucwords(strtolower($state));
                $city = implode(' ',$parts);
                // check if this could be the wrong state (i.e. virginia could be west virginia)
                if(count($parts) > 0 && in_array(ucwords(strtolower($parts[count($parts)-1] . ' ' . $state)),$state_list)){
                    $state = ucwords(strtolower(array_pop($parts) . ' ' . $state));
                    $city = implode(' ',$parts);
                }
            } else {
                // we need at least 2 words left to continue
                if(count($parts) < 2) return false;
                // check if the last 2 words form a valid state name
                $state = array_pop($parts) . ' ' . $state;
                if(in_array(ucwords(strtolower($state)),$state_list)){
                    $state = ucwords(strtolower($state));
                    $city = implode(' ',$parts);
                } else {
                    // we need at least 2 words left to continue
                    if(count($parts) < 2) return false;
                    // check if the 3rd word from the end forms a valid state name
                    $state = array_pop($parts) . ' ' . $state;
                    if(in_array(ucwords(strtolower($state)),$state_list)){
                        $state = ucwords(strtolower($state));
                        $city = implode(' ',$parts);
                    } else {
                        return false;
                    }
                }
            }
            // convert the full state name to its abbreviation
            $state = array_search($state,$state_list);
        }
    } else {
        // the comma split gave a valid state; normalize it to its abbreviation as well
        if(array_key_exists(strtoupper($state),$state_list)){
            $state = strtoupper($state);
        } else {
            $state = array_search(ucwords(strtolower($state)),$state_list);
        }
    }

    // here we can query the result against a database to make sure it's valid
    // (the mysql_* functions below were removed in PHP 7; use mysqli or PDO with
    // prepared statements if you enable this check)
    /*$sql = 'SELECT `city`,`state` FROM `city_state` WHERE `city`="'.$city.'" and `state`="'.$state.'"';
    $result = mysql_query($sql);
    if(mysql_num_rows($result) > 0){
        return array('city'=>mysql_result($result,0,'city'),'state'=>mysql_result($result,0,'state'));
    } else {
        return false;
    }*/

    return array('city'=>$city,'state'=>$state);
}
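One case from the list that the function does not appear to cover is the zip-only input (90001): with no state in the string, it returns false. A minimal sketch of a pre-check is below. The helper name `parseZip` and the decision to also accept the ZIP+4 form are my assumptions, not part of the original function.

```php
<?php
// Hypothetical helper: return the 5-digit zip code if the input is only a zip
// (optionally in ZIP+4 form), otherwise false.
function parseZip($str) {
    if (preg_match('/^\s*(\d{5})(?:-\d{4})?\s*$/', $str, $m)) {
        return $m[1];
    }
    return false;
}

$zip = parseZip('90001');
// A caller could try parseZip() first and fall back to parseLocation()
// only when it returns false.
```

Running the zip check first keeps the city/state logic untouched and avoids treating a number as a city name.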