0

I am trying to extract the 5-digit zip code from address fields. I have the test data below. The data has 5-digit street numbers at the beginning, 5-digit PO Box numbers in the middle, and 5-9 digit zip codes, some in the middle and some at the end of the string. My objective is to extract the 5-digit zip code from the string, but not the 5-digit street or PO Box number, using regular expressions in SAS. Please take a look at the sample data and help me resolve the issue. I would highly appreciate your kind assistance.

13001 NW42 AVE OPA LOCKA FL 33054 USA
13001 NW 42 AVE OPA LOCKA FL 33054 USA
PO BOX 98748 CHICAGO IL 60693 USA
601 W 80TH STREET CHICAGO IL 60620 2502
12651 S DIXIE HWY, SUITE 321 MIAMI,FLORIDA33156
12713 SW 125TH AVE MIAMIFL 331865932
2
  • 1
    Do you want only the 5-digit ones from the end, or if it uses the 9-digit zip, do you want the 9 digits extracted as well? So would 12713 SW 125TH AVE MIAMIFL 331865932 extract 331865932, and would 601 W 80TH STREET CHICAGO IL 60620 2502 extract 60620 2502?
    – Fences
    May 3, 2013 at 14:20
  • 2
    is the pun intentional?
    – Jodrell
    May 3, 2013 at 14:20

2 Answers

1

This will work for your specific example.

data have;
length str $150;
infile datalines truncover;
input @1 str $150.;
datalines;
13001 NW42 AVE OPA LOCKA FL 33054 USA
13001 NW 42 AVE OPA LOCKA FL 33054 USA
PO BOX 98748 CHICAGO IL 60693 USA
601 W 80TH STREET CHICAGO IL 60620 2502
12651 S DIXIE HWY, SUITE 321 MIAMI,FLORIDA33156
12713 SW 125TH AVE MIAMIFL 331865932
;;;;
run;

data want;
set have;
z_Re = prxparse('`(\d{5}) ?(?:$|USA|\d{4})`o');
rc_z = prxmatch(z_re,trimn(str));
if rc_z then zip = prxposn(z_re,1,str);
put zip=;
run;

You can either customize that to include other possibilities, or do some reasonability check on the places where a 5(+) digit string that is a zip code might appear. For example, you might require it to be within 10 characters of the end of the string, and at least 10 characters from the beginning of the string:

data want;
set have;
z_Re = prxparse('`^.{10,}\D(\d{5}).{0,10}$`o');
rc_z = prxmatch(z_re,trimn(str));
if rc_z then zip = prxposn(z_re,1,str);
put zip=;
run;

I had to include the \D to make sure it matches 33186 instead of 65932 in the last record. This may be better or worse depending on your various other possibilities; depending on your data, it's possible that no pattern is good enough to catch 100%. You could try performing both methods and looking at the records where they disagree.
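As a rough sketch of that last suggestion, the step below applies both patterns to each record and flags the rows where the results differ. The names `compare`, `zip_a`, `zip_b` and `disagree` are made up for illustration, and the patterns are written with `/` delimiters and compiled once on the first row rather than with the `o` option; treat it as a starting point rather than a tested solution.

```sas
/* Sketch: run both patterns side by side and flag disagreements.   */
/* compare, zip_a, zip_b and disagree are hypothetical names.       */
data compare;
  set have;
  length zip_a zip_b $5;
  retain re_a re_b;
  if _n_ = 1 then do;
    /* Pattern A: 5 digits followed by end-of-string, USA, or 4 more digits */
    re_a = prxparse('/(\d{5}) ?(?:$|USA|\d{4})/');
    /* Pattern B: 5 digits at least 10 chars in and within 10 chars of the end */
    re_b = prxparse('/^.{10,}\D(\d{5}).{0,10}$/');
  end;
  /* trimn() is applied inline so that $ sees the real end of the text, */
  /* not the trailing blanks that pad a fixed-length character variable */
  if prxmatch(re_a, trimn(str)) then zip_a = prxposn(re_a, 1, str);
  if prxmatch(re_b, trimn(str)) then zip_b = prxposn(re_b, 1, str);
  disagree = (zip_a ne zip_b);
  drop re_a re_b;
run;

/* Review only the records where the two methods give different answers */
proc print data=compare;
  where disagree = 1;
run;
```

Records where one pattern finds nothing and the other finds a zip are also flagged, since a blank value is not equal to a 5-digit one.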

0

There is a dataset that comes with all SAS installations called SASHELP.ZIPCODE. It contains a fairly up-to-date list of all US zipcodes (or you can download the most recent one from the SAS site here). Simply extract anything that looks like a 5-digit zip and then check it against that list.

If you want to be extra careful, you might pull the statename (or state abbreviation) from the zipcode table and make sure that the name of the state can be found somewhere in the string containing the zip as well.
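A minimal sketch of this approach might look like the following. It assumes SASHELP.ZIPCODE has a numeric `ZIP` variable plus character `STATECODE` and `STATENAME` variables, which is worth confirming with PROC CONTENTS on your installation; `candidates`, `cand`, `zip_num` and `zip_checked` are hypothetical names, and `have` is the input dataset with the address in `str`.

```sas
/* Step 1: write one row per 5-digit run found in each address.     */
data candidates;
  set have;
  length cand $5;
  retain re;
  if _n_ = 1 then re = prxparse('/\d{5}/');
  start = 1;
  stop  = length(str);
  call prxnext(re, start, stop, str, pos, len);
  do while (pos > 0);
    cand    = substr(str, pos, len);
    zip_num = input(cand, 5.);   /* numeric copy for the join below */
    output;
    call prxnext(re, start, stop, str, pos, len);
  end;
  keep str cand zip_num;
run;

/* Step 2: keep candidates that are real zipcodes AND whose state   */
/* abbreviation or state name also appears somewhere in the string. */
proc sql;
  create table zip_checked as
  select distinct c.str, c.cand as zip, z.statecode
  from candidates as c
       inner join sashelp.zipcode as z
       on c.zip_num = z.zip
  where index(upcase(c.str), strip(upcase(z.statecode))) > 0
     or index(upcase(c.str), strip(upcase(z.statename))) > 0;
quit;
```

The state check here is a plain substring search, so a two-letter abbreviation can match by accident inside another word; a stricter version could require word boundaries around it, at the cost of missing run-together cases such as MIAMIFL.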
