Parsing a United States Location String with PHP
So I needed a way to parse a location string into a city and state, or a zip code, for a client’s site. If you don’t quite know what I’m talking about, head over to Superpages and look at the location field. A couple of Google searches later, I decided to just go ahead and write my own function. This sounds easy until you think about it. Any of the following inputs needs to be parsed.
- dallas tx
- dallas, texas
- los angeles california
- washington district of columbia
- richmond virginia
- charleston west virginia
- 90001
This is beyond the scope of a simple regular expression. We can’t simply use the last word of the input either, since the state could be multiple words, as in case #4. There’s no way to determine which word or group of words is the state without testing the string against an array of states. Cases like #5 and #6 complicate things even further: if we test from the end of the string, a match on virginia could really be the tail end of west virginia.
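To make the virginia / west virginia ambiguity concrete, here is a minimal sketch of one way to resolve it: test the longest group of trailing words first and only then fall back to shorter groups. The helper name `matchTrailingState` and the shortened, lowercase state list are assumptions for illustration; this is not the function presented below.

```php
<?php
// Hypothetical helper for illustration only: find the longest run of
// trailing words that names a state. Expects lowercase state names.
function matchTrailingState(string $input, array $states): ?array
{
    $words = preg_split('/[\s,;]+/', strtolower(trim($input)), -1, PREG_SPLIT_NO_EMPTY);
    $count = count($words);

    // Try the longest candidate first (3 words covers "district of columbia"),
    // so "west virginia" is preferred over plain "virginia".
    for ($len = min(3, $count); $len >= 1; $len--) {
        $candidate = implode(' ', array_slice($words, $count - $len));
        if (in_array($candidate, $states, true)) {
            $city = implode(' ', array_slice($words, 0, $count - $len));
            return ['city' => $city, 'state' => $candidate];
        }
    }
    return null; // no trailing state found
}

// Shortened list; a real list would hold every state.
$states = ['virginia', 'west virginia', 'california', 'district of columbia'];
$result = matchTrailingState('charleston west virginia', $states);
// $result is ['city' => 'charleston', 'state' => 'west virginia']
```

Checking longest-first means the shorter name only wins when the longer one does not match, which is the same outcome the function below reaches by checking the shorter name first and then looking one word back.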
Here is the function I came up with. Please let me know if you find a bug or condition that cannot be parsed.
DEMO: locationparse.php
DOWNLOAD: locationparse.phps
/**
**  You are free to use and distribute this function provided this notice stays in tact.
**
**  parseLocation returns array containing city and state
**  Sample Usage
        $str = 'dallas tx';
        $parsed = parseLocation($str);
        print 'Your city is '.$parsed['city'].', '.$parsed['state'];
**  Original parseLocation function obtained from EoTz.com.
**  Last updated: 05/24/2011
**
**/
function parseLocation($str){
    $state_list = array(
        'AL'=>"Alabama", 'AK'=>"Alaska", 'AZ'=>"Arizona", 'AR'=>"Arkansas", 'CA'=>"California",
        'CO'=>"Colorado", 'CT'=>"Connecticut", 'DE'=>"Delaware", 'DC'=>"District Of Columbia", 'FL'=>"Florida",
        'GA'=>"Georgia", 'HI'=>"Hawaii", 'ID'=>"Idaho", 'IL'=>"Illinois", 'IN'=>"Indiana",
        'IA'=>"Iowa", 'KS'=>"Kansas", 'KY'=>"Kentucky", 'LA'=>"Louisiana", 'ME'=>"Maine",
        'MD'=>"Maryland", 'MA'=>"Massachusetts", 'MI'=>"Michigan", 'MN'=>"Minnesota", 'MS'=>"Mississippi",
        'MO'=>"Missouri", 'MT'=>"Montana", 'NE'=>"Nebraska", 'NV'=>"Nevada", 'NH'=>"New Hampshire",
        'NJ'=>"New Jersey", 'NM'=>"New Mexico", 'NY'=>"New York", 'NC'=>"North Carolina", 'ND'=>"North Dakota",
        'OH'=>"Ohio", 'OK'=>"Oklahoma", 'OR'=>"Oregon", 'PA'=>"Pennsylvania", 'RI'=>"Rhode Island",
        'SC'=>"South Carolina", 'SD'=>"South Dakota", 'TN'=>"Tennessee", 'TX'=>"Texas", 'UT'=>"Utah",
        'VT'=>"Vermont", 'VA'=>"Virginia", 'WA'=>"Washington", 'WV'=>"West Virginia", 'WI'=>"Wisconsin",
        'WY'=>"Wyoming"
    );

    // start with empty values so the checks below never read an undefined variable
    $city = '';
    $state = '';

    // lets see if the comma forms a valid state and city
    if(strstr($str,',')){
        $parts = explode(',',$str);
        $state = trim($parts[1],", ");
        $city = trim($parts[0],", ");
    }

    if(!array_key_exists(strtoupper($state),$state_list) && !in_array(ucwords(strtolower($state)),$state_list)){
        $parts = preg_split("/[\s,;]+/",$str);
        $state = array_pop($parts);

        // first see if the last array element is a state abbreviation
        if(strlen($state) == 2 && array_key_exists(strtoupper($state),$state_list)){
            $state = strtoupper($state);
            $city = implode(' ',$parts);
        } else {
            // since it's not an abbreviation let's see if the last element is the full name of a state
            if(in_array(ucwords(strtolower($state)),$state_list)){
                $state = ucwords(strtolower($state));
                $city = implode(' ',$parts);

                // check if this could be the wrong state (i.e. virginia could be west virginia)
                // the count() guard avoids reading past the start of the array when no words are left
                if(count($parts) > 0 && in_array(ucwords(strtolower($parts[count($parts)-1] . ' ' . $state)),$state_list)){
                    $state = ucwords(strtolower(array_pop($parts) . ' ' . $state));
                    $city = implode(' ',$parts);
                }
            } else {
                // we need at least 2 words left to continue
                if(count($parts) < 2) return false;

                $state = array_pop($parts) . ' ' . $state;
                if(in_array(ucwords(strtolower($state)),$state_list)){
                    $state = ucwords(strtolower($state));
                    $city = implode(' ',$parts);
                } else {
                    // we need at least 2 words left to continue
                    if(count($parts) < 2) return false;

                    // check if the 3rd word from the end forms a valid state name
                    $state = array_pop($parts) . ' ' . $state;
                    if(in_array(ucwords(strtolower($state)),$state_list)){
                        $state = ucwords(strtolower($state));
                        $city = implode(' ',$parts);
                    } else {
                        return false;
                    }
                }
            }
            // convert the full state name to its abbreviation
            $state = array_search($state,$state_list);
        }
    }

    // Here we can query the result against a database to make sure it's valid.
    // Ignore this section if you don't want to check against a database.
    /*$sql = 'SELECT `city`,`state` FROM `city_state` WHERE `city`="'.$city.'" and `state`="'.$state.'"';
    $result = mysql_query($sql);
    if(mysql_num_rows($result) > 0){
        return array('city'=>mysql_result($result,0,'city'),'state'=>mysql_result($result,0,'state'));
    } else {
        return false;
    }*/

    return array('city'=>$city,'state'=>$state);
}
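The list of sample inputs includes a bare zip code (90001), but as far as I can tell the function above returns false for it, since a single word never matches a state and there are not enough words left to keep trying. One possible way to cover that case is a small pre-check run before parseLocation(). The helper name `extractZip` is hypothetical, and accepting the ZIP+4 form is an assumption of this sketch.

```php
<?php
// Hypothetical helper (not part of the original function): returns the
// 5-digit zip code if the whole input is a zip or ZIP+4, otherwise null.
function extractZip(string $str): ?string
{
    if (preg_match('/^\s*(\d{5})(?:-\d{4})?\s*$/', $str, $m) === 1) {
        return $m[1];
    }
    return null;
}

$zip = extractZip('90001');       // "90001"
$none = extractZip('dallas tx');  // null, so fall back to parseLocation()
```

A caller could try extractZip() first and hand the string to parseLocation() only when it returns null.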
I found this parser script very helpful. It’s really a great job done by you.
Thanks for the same.
Comment by Vipin — October 7, 2009 @ 12:52 am
Jesus H. Christ, you rawk!!
Comment by Communibus Locis — October 13, 2009 @ 6:05 pm
Should the last line of the code be (if you aren’t using a database)?
//return array('city'=>mysql_result($result,0,'city'),'state'=>mysql_result($result,0,'state'));
return array('city'=>$city,'state'=>$state);
Comment by Communibus Locis — October 13, 2009 @ 6:12 pm
Yes, thanks for pointing that out. Should be fixed.
Comment by chris — October 13, 2009 @ 6:20 pm
So, unless there is a comma in the string it doesn’t work for me. Most of the examples you list at the top of the article don’t work, only the ones with a comma in them.
Also, the last line has a syntax error. It should be: return array('city'=>$city,'state'=>$state); instead of return array('city'=$city,'state'=>$state);
Comment by Nate — May 21, 2011 @ 10:55 pm
WordPress messed up the formatting.
$parts = preg_split("/[s,;]+/",$str);
should be
$parts = preg_split("/[\s,;]+/",$str);
(Added the backslash before the “s”)
This should solve your issue. Thanks.
Comment by chris — May 24, 2011 @ 12:24 am
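To illustrate the fix described in the comment above, here is a short sketch of how the two patterns behave on one of the sample inputs. The variable names are only for illustration.

```php
<?php
$str = 'los angeles california';

// Without the backslash, [s,;] matches the literal letter "s", a comma, or a semicolon.
$broken = preg_split("/[s,;]+/", $str);
// $broken is ['lo', ' angele', ' california']

// With the backslash, [\s,;] matches whitespace, a comma, or a semicolon.
$fixed = preg_split("/[\s,;]+/", $str);
// $fixed is ['los', 'angeles', 'california']
```

With the broken pattern the last element is " california" with a leading space, so it never matches a state name, which would explain why only the comma inputs worked.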
What if I want to convert $str “TN” to “TENNESSEE”?
Comment by Steve — March 25, 2012 @ 2:21 pm
Steve,
The method is designed to parse a city and state. If you want only a state or if you want to check to see if the string only contains a state, I would just make a separate method to search that array of states before running parseLocation().
Chris
Comment by chris — April 4, 2012 @ 12:11 pm
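A separate state lookup along the lines suggested in the reply above might look something like this. It is only a sketch: the name `lookupState` is hypothetical, the state list is shortened for brevity, and a real version would reuse the full array from parseLocation().

```php
<?php
// Hypothetical helper: resolves a lone state abbreviation or full name
// to both forms, or returns null if the string is not a state.
function lookupState(string $str, array $stateList): ?array
{
    $str = trim($str);
    $abbr = strtoupper($str);

    // First try the input as an abbreviation (the array keys).
    if (isset($stateList[$abbr])) {
        return ['abbr' => $abbr, 'name' => $stateList[$abbr]];
    }

    // Otherwise compare case-insensitively against the full names.
    foreach ($stateList as $code => $name) {
        if (strcasecmp($name, $str) === 0) {
            return ['abbr' => $code, 'name' => $name];
        }
    }
    return null;
}

// Shortened list for the example.
$stateList = ['TN' => 'Tennessee', 'TX' => 'Texas', 'WV' => 'West Virginia'];
$found = lookupState('TN', $stateList);
$upper = strtoupper($found['name']); // "TENNESSEE"
```

If lookupState() returns null, the string is not a state on its own and can be passed on to parseLocation().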