Parsing a United States Location String with PHP

So I needed a way to parse a location string into a city and state, or a zip code, for a client’s site. If you don’t quite know what I’m talking about, head over to Superpages and look at the location field. A couple of Google searches later, I decided to just go ahead and write my own function. This sounds easy until you think about it. The function needs to handle any of the following inputs.

  1. dallas tx
  2. dallas, texas
  3. los angeles california
  4. washington district of columbia
  5. richmond virginia
  6. charleston west virginia
  7. 90001

A simple regular expression can’t handle this. We can’t just take the last word of the input either, since the state could be multiple words, as in case #4. There’s no way to determine which word or group of words is the state without testing the string against an array of states. Cases #5 and #6 complicate things even further: if we test from the end of the string, virginia could also be the tail of west virginia.
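One way around the virginia / west virginia ambiguity is to try the longest candidate first. The following is only a minimal sketch of that idea, not the function presented in this post: the helper name matchTrailingState and the shortened state list are assumptions made for illustration.

```php
<?php
// Hypothetical helper: find the longest run of trailing words that names a state.
// $states maps abbreviation => full name; only a few entries are used below.
function matchTrailingState(array $words, array $states)
{
    // Lowercase the names once so the comparison is case-insensitive.
    $names = array_map('strtolower', $states);

    // Try three trailing words first, then two, then one, so that
    // "west virginia" wins over "virginia".
    for ($n = min(3, count($words)); $n >= 1; $n--) {
        $candidate = strtolower(implode(' ', array_slice($words, -$n)));
        $abbr = array_search($candidate, $names, true);
        if ($abbr !== false) {
            return array(
                'city'  => implode(' ', array_slice($words, 0, count($words) - $n)),
                'state' => $abbr,
            );
        }
    }
    return false; // no trailing word group matched a state name
}

$states = array('VA' => 'Virginia', 'WV' => 'West Virginia', 'DC' => 'District Of Columbia');
$words  = preg_split('/[\s,;]+/', 'charleston west virginia', -1, PREG_SPLIT_NO_EMPTY);
print_r(matchTrailingState($words, $states));
```

With this input the two-word candidate "west virginia" matches before "virginia" is ever tried, so the city comes back as charleston and the state as WV. The sketch matches full state names only; two-letter abbreviations would need a separate check against the array keys.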

Here is the function I came up with. It covers the city and state forms; a bare zip code like #7 comes back as false. Please let me know if you find a bug or a condition that cannot be parsed.

/**
** You are free to use and distribute this function provided this notice stays intact.
**
** parseLocation returns an array containing the city and state, or false if the string cannot be parsed.
**
** Sample Usage
 
$str = 'dallas tx';
$parsed = parseLocation($str);
print 'Your city is '.$parsed['city'].', '.$parsed['state'];
 
** Original parseLocation function obtained from EoTz.com.
**/
function parseLocation($str){
 
	$state_list = array('AL'=>"Alabama",
	'AK'=>"Alaska",
	'AZ'=>"Arizona",
	'AR'=>"Arkansas",
	'CA'=>"California",
	'CO'=>"Colorado",
	'CT'=>"Connecticut",
	'DE'=>"Delaware",
	'DC'=>"District Of Columbia",
	'FL'=>"Florida",
	'GA'=>"Georgia",
	'HI'=>"Hawaii",
	'ID'=>"Idaho",
	'IL'=>"Illinois",
	'IN'=>"Indiana",
	'IA'=>"Iowa",
	'KS'=>"Kansas",
	'KY'=>"Kentucky",
	'LA'=>"Louisiana",
	'ME'=>"Maine",
	'MD'=>"Maryland",
	'MA'=>"Massachusetts",
	'MI'=>"Michigan",
	'MN'=>"Minnesota",
	'MS'=>"Mississippi",
	'MO'=>"Missouri",
	'MT'=>"Montana",
	'NE'=>"Nebraska",
	'NV'=>"Nevada",
	'NH'=>"New Hampshire",
	'NJ'=>"New Jersey",
	'NM'=>"New Mexico",
	'NY'=>"New York",
	'NC'=>"North Carolina",
	'ND'=>"North Dakota",
	'OH'=>"Ohio",
	'OK'=>"Oklahoma",
	'OR'=>"Oregon",
	'PA'=>"Pennsylvania",
	'RI'=>"Rhode Island",
	'SC'=>"South Carolina",
	'SD'=>"South Dakota",
	'TN'=>"Tennessee",
	'TX'=>"Texas",
	'UT'=>"Utah",
	'VT'=>"Vermont",
	'VA'=>"Virginia",
	'WA'=>"Washington",
	'WV'=>"West Virginia",
	'WI'=>"Wisconsin",
	'WY'=>"Wyoming");
 
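	// start with empty values so the checks below never read an undefined variable
	$city = '';
	$state = '';

	// let's see if the comma separates a valid city and state
	if(strstr($str,',')){
		$parts = explode(',',$str);
		$state = trim($parts[1],", ");
		$city = trim($parts[0],", ");
	}
 
	if(array_key_exists(strtoupper($state),$state_list)){
		// the part after the comma is a valid abbreviation
		$state = strtoupper($state);
	} elseif(in_array(ucwords(strtolower($state)),$state_list)){
		// the part after the comma is a full state name, so convert it to its abbreviation
		$state = array_search(ucwords(strtolower($state)),$state_list);
	} else {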
 
		$parts = preg_split("/[\s,;]+/",$str);
		$state = array_pop($parts);
		// first see if the last array element is a state abbreviation
		if(strlen($state) == 2 && array_key_exists(strtoupper($state),$state_list)){
			$state = strtoupper($state);
			$city = implode(' ',$parts);
		} else {
			// since it's not an abbreviation let's see if the last element is the full name of a state
			if(in_array(ucwords(strtolower($state)),$state_list)){
				$state = ucwords(strtolower($state));
				$city = implode(' ',$parts);
 
				//check if this could be the wrong state (i.e. virginia could be west virginia)
				if(count($parts) > 0 && in_array(ucwords(strtolower($parts[count($parts)-1] . ' ' . $state)),$state_list)){
					$state = ucwords(strtolower(array_pop($parts) . ' ' . $state));
					$city = implode(' ',$parts);
				}
 
			} else {
				// we need at least 2 words left to continue
				if(count($parts) < 2) return false;
 
				$state = array_pop($parts) . ' ' . $state;
				if(in_array(ucwords(strtolower($state)),$state_list)){
					$state = ucwords(strtolower($state));
					$city = implode(' ',$parts);
				} else {
					// we need at least 2 words left to continue
					if(count($parts) < 2) return false;
 
					// check if the 3rd word from the end forms a valid state name
					$state = array_pop($parts) . ' ' . $state;
					if(in_array(ucwords(strtolower($state)),$state_list)){
						$state = ucwords(strtolower($state));
						$city = implode(' ',$parts);
					} else {
						return false;
					}
 
				}
 
			}
			$state = array_search($state,$state_list);
		}
	}
 
// here we can query the result against a database to make sure it's valid
/*$sql = 'SELECT `city`,`state` FROM `city_state` WHERE `city`="'.$city.'" and `state`="'.$state.'"';
$result = mysql_query($sql);
if(mysql_num_rows($result) > 0){
return array('city'=>mysql_result($result,0,'city'),'state'=>mysql_result($result,0,'state'));
} else {
return false;
 
}*/
// without the database check, return the parsed values directly
return array('city'=>$city,'state'=>$state);
}
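
Input #7 is a bare zip code, so there is no state to match against the list and parseLocation returns false for it. One way to cover that case is to test for a zip code before calling parseLocation. This is a sketch under the assumption that a zip code is five digits with an optional four-digit extension; the helper name parseZip is hypothetical and not part of the function above.

```php
<?php
// Hypothetical helper: return the 5-digit zip code if the input is one, else false.
// Assumes the forms "90001" or "90001-1234"; it does not check that the zip code exists.
function parseZip($str)
{
    if (preg_match('/^\s*(\d{5})(?:-\d{4})?\s*$/', $str, $m)) {
        return $m[1];
    }
    return false;
}

$input = '90001';
$zip = parseZip($input);
if ($zip !== false) {
    print 'Your zip code is ' . $zip;
} else {
    // not a zip code, so hand the string to the city/state parser instead,
    // e.g. $parsed = parseLocation($input);
    print 'Not a zip code';
}
```

Checking for the zip code first keeps the state-matching logic untouched, and anything that is not purely numeric still goes through the city and state path.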
