EoTz.com Web Development Blog

July 21, 2008

Parsing a United States Location String with PHP

Filed under: PHP — Tags: , — chris @ 8:15 pm

I needed a way to parse a location string into a city and state, or a zip code, for a client’s site. If you don’t quite know what I’m talking about, head over to Superpages and look at the location field. A couple of Google searches later, I decided to just go ahead and write my own function. This sounds easy until you think about it. The function has to handle any of the following inputs.

  1. dallas tx
  2. dallas, texas
  3. los angeles california
  4. washington district of columbia
  5. richmond virginia
  6. charleston west virginia
  7. 90001

This is beyond the scope of a simple regular expression. We can’t simply take the last word of the input either, since the state could be multiple words, as in case #4. There’s no way to determine which word or group of words is the state without testing the string against an array of states. Cases #5 and #6 complicate things even further: if we test from the end of the string, virginia matches on its own, yet it could also be the tail of west virginia.
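The virginia / west virginia problem can be handled by testing the longest possible state name first. The following is only a minimal sketch of that idea, not the function from this post: splitCityState() and the shortened $stateNames list are hypothetical names made up for illustration.

```php
<?php
// Hypothetical helper, not part of the original function: split a location
// into city and state name by trying the longest candidate state first
// (3 words, then 2, then 1), always reading from the end of the string.
function splitCityState(string $input, array $stateNames): ?array
{
    $words = preg_split('/[\s,;]+/', trim($input), -1, PREG_SPLIT_NO_EMPTY);
    for ($len = min(3, count($words)); $len >= 1; $len--) {
        // Normalize the last $len words to the "Title Case" used in the list.
        $candidate = ucwords(strtolower(implode(' ', array_slice($words, -$len))));
        if (in_array($candidate, $stateNames, true)) {
            $city = implode(' ', array_slice($words, 0, count($words) - $len));
            return ['city' => $city, 'state' => $candidate];
        }
    }
    return null; // no trailing state name found
}

// Abbreviated list for illustration only.
$stateNames = ['Virginia', 'West Virginia', 'District Of Columbia', 'California'];

print_r(splitCityState('charleston west virginia', $stateNames));
print_r(splitCityState('richmond virginia', $stateNames));
```

Longest-match-first still cannot tell a Virginia city whose name ends in "west" from a city in West Virginia. Only a lookup against real city data can settle that.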

Here is the function I came up with. Please let me know if you find a bug or condition that cannot be parsed.

DEMO: locationparse.php
DOWNLOAD: locationparse.phps

/**
** You are free to use and distribute this function provided this notice stays intact.
**
** parseLocation returns an array containing the city and state, or false if no state is found.
**
** Sample usage:
**   $str = 'dallas tx';
**   $parsed = parseLocation($str);
**   print 'Your city is '.$parsed['city'].', '.$parsed['state'];
**
** Original parseLocation function obtained from EoTz.com.
** Last updated: 05/24/2011
**
**/
function parseLocation($str){
	$state_list = array('AL'=>"Alabama",
	'AK'=>"Alaska",
	'AZ'=>"Arizona",
	'AR'=>"Arkansas",
	'CA'=>"California",
	'CO'=>"Colorado",
	'CT'=>"Connecticut",
	'DE'=>"Delaware",
	'DC'=>"District Of Columbia",
	'FL'=>"Florida",
	'GA'=>"Georgia",
	'HI'=>"Hawaii",
	'ID'=>"Idaho",
	'IL'=>"Illinois",
	'IN'=>"Indiana",
	'IA'=>"Iowa",
	'KS'=>"Kansas",
	'KY'=>"Kentucky",
	'LA'=>"Louisiana",
	'ME'=>"Maine",
	'MD'=>"Maryland",
	'MA'=>"Massachusetts",
	'MI'=>"Michigan",
	'MN'=>"Minnesota",
	'MS'=>"Mississippi",
	'MO'=>"Missouri",
	'MT'=>"Montana",
	'NE'=>"Nebraska",
	'NV'=>"Nevada",
	'NH'=>"New Hampshire",
	'NJ'=>"New Jersey",
	'NM'=>"New Mexico",
	'NY'=>"New York",
	'NC'=>"North Carolina",
	'ND'=>"North Dakota",
	'OH'=>"Ohio",
	'OK'=>"Oklahoma",
	'OR'=>"Oregon",
	'PA'=>"Pennsylvania",
	'RI'=>"Rhode Island",
	'SC'=>"South Carolina",
	'SD'=>"South Dakota",
	'TN'=>"Tennessee",
	'TX'=>"Texas",
	'UT'=>"Utah",
	'VT'=>"Vermont",
	'VA'=>"Virginia",
	'WA'=>"Washington",
	'WV'=>"West Virginia",
	'WI'=>"Wisconsin",
	'WY'=>"Wyoming");
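	// default both values so the state check below never reads an undefined variable
	$city = '';
	$state = '';
	// let's see if the comma forms a valid state and city
	if(strstr($str,',')){
		$parts = explode(',',$str);
		$state = trim($parts[1],", ");
		$city = trim($parts[0],", ");
		// normalize a valid state to its abbreviation, matching the no-comma path below
		if(array_key_exists(strtoupper($state),$state_list)){
			$state = strtoupper($state);
		} elseif(in_array(ucwords(strtolower($state)),$state_list)){
			$state = array_search(ucwords(strtolower($state)),$state_list);
		}
	}
	if(!array_key_exists(strtoupper($state),$state_list) && !in_array(ucwords(strtolower($state)),$state_list)){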
		$parts = preg_split("/[\s,;]+/",$str);
		$state = array_pop($parts);
		// first see if the last array element is a state abbreviation
		if(strlen($state) == 2 && array_key_exists(strtoupper($state),$state_list)){
			$state = strtoupper($state);
			$city = implode(' ',$parts);
		} else {
			// since it's not an abbreviation let's see if the last element is the full name of a state
			if(in_array(ucwords(strtolower($state)),$state_list)){
				$state = ucwords(strtolower($state));
				$city = implode(' ',$parts);
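				// check if this could be the wrong state (e.g. virginia could be west virginia);
				// the count() guard avoids reading a missing element when no words are left
				if(count($parts) > 0 && in_array(ucwords(strtolower($parts[count($parts)-1] . ' ' . $state)),$state_list)){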
					$state = ucwords(strtolower(array_pop($parts) . ' ' . $state));
					$city = implode(' ',$parts);
				}
			} else {
				// we need at least 2 words left to continue
				if(count($parts) < 2) return false;
				$state = array_pop($parts) . ' ' . $state;
				if(in_array(ucwords(strtolower($state)),$state_list)){
					$state = ucwords(strtolower($state));
					$city = implode(' ',$parts);
				} else {
					// we need at least 2 words left to continue
					if(count($parts) < 2) return false;
					// check if the 3rd word from the end forms a valid state name
					$state = array_pop($parts) . ' ' . $state;
					if(in_array(ucwords(strtolower($state)),$state_list)){
						$state = ucwords(strtolower($state));
						$city = implode(' ',$parts);
					} else {
						return false;
					}
				}
			}
			$state = array_search($state,$state_list);
		}
	}
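// Here we can query the result against a database to make sure it's valid. Ignore this section if you don't want to check against a database.
// Note: the mysql_* functions were removed in PHP 7.0, and this query concatenates raw input. On current PHP, use mysqli or PDO with a prepared statement instead.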
/*$sql = 'SELECT `city`,`state` FROM `city_state` WHERE `city`="'.$city.'" and `state`="'.$state.'"';
$result = mysql_query($sql);
if(mysql_num_rows($result) > 0){
return array('city'=>mysql_result($result,0,'city'),'state'=>mysql_result($result,0,'state'));
} else {
return false;
}*/
return array('city'=>$city,'state'=>$state);
}
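
The input list at the top includes a bare zip code (case #7). As posted, parseLocation() appears to return false for it, since a lone token can never match a state. One way to cover that case is to test for a zip code before calling the function. This is a minimal sketch only, assuming a hypothetical helper named parseZip() that is not part of the original code:

```php
<?php
// Hypothetical helper, not part of the original post: recognize a bare
// 5-digit ZIP or ZIP+4 and return the 5-digit part, or null otherwise.
function parseZip(string $input): ?string
{
    if (preg_match('/^\s*(\d{5})(?:-\d{4})?\s*$/', $input, $m)) {
        return $m[1]; // keep only the 5-digit part
    }
    return null;
}

var_dump(parseZip('90001'));      // string(5) "90001"
var_dump(parseZip('dallas tx'));  // NULL
```

A caller could try parseZip() first and fall back to parseLocation() only when it returns null.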

8 Comments »

  1. I found this parser script very helpful. Its really a great job done by u .
    Thanx for d same.

    Comment by Vipin — October 7, 2009 @ 12:52 am

  2. Jesus H. Christ, you rawk!!

    Comment by Communibus Locis — October 13, 2009 @ 6:05 pm

  3. Should the last line of the code be (if you aren’t using a database)?

    //return array('city'=>mysql_result($result,0,'city'),'state'=>mysql_result($result,0,'state'));
    return array('city'=>$city,'state'=>$state);

    Comment by Communibus Locis — October 13, 2009 @ 6:12 pm

  4. Yes, thanks for pointing that out. Should be fixed.

    Comment by chris — October 13, 2009 @ 6:20 pm

  5. So, unless there is a comma in the string it doesn’t work for me. Most of the examples you list at the top of the article don’t work, only the ones with a comma in them.

    Also, the last line has a syntax error, it should be: return array('city'=>$city,'state'=>$state); instead of return array('city'=$city,'state'=>$state);

    Comment by Nate — May 21, 2011 @ 10:55 pm

  6. WordPress messed up the formatting.

    $parts = preg_split("/[s,;]+/",$str);

    should be

    $parts = preg_split("/[\s,;]+/",$str);

    (Added the back slash before the “s”)

    This should solve your issue. Thanks.

    Comment by chris — May 24, 2011 @ 12:24 am

  7. What if I want to convert $str “TN” to “TENNESSEE”?

    Comment by Steve — March 25, 2012 @ 2:21 pm

  8. Steve,
    The method is designed to parse a city and state. If you want only a state or if you want to check to see if the string only contains a state, I would just make a separate method to search that array of states before running parseLocation().
    Chris

    Comment by chris — April 4, 2012 @ 12:11 pm
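
The suggestion in comment 8, a separate method that searches the array of states, could look roughly like the sketch below. It is an illustration only: lookupState() is a hypothetical name, and the short $stateList stands in for the full array inside parseLocation().

```php
<?php
// Hypothetical helper, not part of the original post: resolve a state
// abbreviation or full name to its full name, or null if it is unknown.
function lookupState(string $input, array $stateList): ?string
{
    $input = trim($input);
    $abbr = strtoupper($input);
    if (isset($stateList[$abbr])) {
        return $stateList[$abbr]; // input was an abbreviation such as "TN"
    }
    $name = ucwords(strtolower($input));
    return in_array($name, $stateList, true) ? $name : null;
}

// Abbreviated list for illustration; the real function carries all 51 entries.
$stateList = ['TN' => 'Tennessee', 'TX' => 'Texas', 'WV' => 'West Virginia'];

echo strtoupper(lookupState('TN', $stateList)), "\n"; // TENNESSEE
```

Running this check first keeps parseLocation() focused on city-and-state strings, as comment 8 describes.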
