search a string to create dummy variables



  • I have a string variable I have created that is a concatenation of 12 columns, each of the 12 columns contains 4 or 5 digit character. I want to create a series of dummy variables, based on the contents of this string (named diag_string) . My code is as follows:

    case when diag_string like '%A0[0-9]%' THEN 1 ELSE 0 END AS  [A00_A09] 
    case when diag_string like '%A[15-19]%' THEN 1 ELSE 0 END AS  [A15_A19] 
    case when diag_string like '%A[20-49]%' THEN 1 ELSE 0 END AS  [A20_A49] 
    case when diag_string like '%A[50-64]%' THEN 1 ELSE 0 END AS  [A50_A64] 
    case when diag_string like '%A[65-79]%' THEN 1 ELSE 0 END AS  [A65_A79] 
    case when diag_string like '%A[80-89]%' THEN 1 ELSE 0 END AS  [A80_A89] 
    case when diag_string like '%A[90-99]%' THEN 1 ELSE 0 END AS  [A90_A99] 
    

    So the first dummy variable, [A00_A09], should be set to 1 where the string contains any values that are A01 through to A09.

    The above is only a subset of the code, there are other dummy variables beginning with B all the way through to Z, but are split up in different ways, for example:

    case when diag_string like '%Z0[0-13]%' THEN 1 ELSE 0 END AS  [Z00_Z13] 
    case when diag_string like '%Z[20-29]%' THEN 1 ELSE 0 END AS  [Z20_Z29] 
    case when diag_string like '%Z[30-39]%' THEN 1 ELSE 0 END AS  [Z30_Z39] 
    case when diag_string like '%Z[40-54]%' THEN 1 ELSE 0 END AS  [Z40_Z54] 
    case when diag_string like '%Z[55-65]%' THEN 1 ELSE 0 END AS  [Z55_Z65] 
    case when diag_string like '%Z[70-76]%' THEN 1 ELSE 0 END AS  [Z70_Z76] 
    case when diag_string like '%Z[80-99]%' THEN 1 ELSE 0 END AS  [Z80_Z99
    

    There is an error somewhere, in that, for example, a record is being flagged as having a code A50 to A64, but inspection of the record shows that this is not the case.

    I am not sure why this code is not working. Is anyone able to advise what the issue might be?



  • I think I see your issue. [] represent a single character. The way you have it written %A[50-64]%', the character immediately following the A needs to be 5, or anything from 0-6 or a 4. Basically, any number 0 to 6.

    If A will always be a leading charator? You can try the below code. Otherwise, put the '%' back before the 'A'. I like to avoid leading '%' to make the code more efficient.

    case when diag_string like 'A0[0-9]%' THEN 1 ELSE 0 END AS  [A00_A09] 
    case when diag_string like 'A1[5-9]%' THEN 1 ELSE 0 END AS  [A15_A19] 
    case when diag_string like 'A[2-4][0-9]%' THEN 1 ELSE 0 END AS  [A20_A49] 
    case when diag_string like 'A5[0-9]%' OR diag_string like 'A6[0-4]%' THEN 1 ELSE 0 END AS  [A50_A64] 
    case when diag_string like 'A6[5-9]%' OR diag_string like 'A7[0-9]%' THEN 1 ELSE 0 END AS  [A65_A79] 
    case when diag_string like 'A8[0-9]%' THEN 1 ELSE 0 END AS  [A80_A89] 
    case when diag_string like 'A9[0-9]%' THEN 1 ELSE 0 END AS  [A90_A99]
    

    Further documentation on how LIKE interprits string can be found https://docs.microsoft.com/en-us/sql/t-sql/language-elements/like-transact-sql?view=sql-server-ver15 .




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2