Thursday, August 20, 2009

Implicit (and Therefore Perhaps Unexpected) Conversions

I was surprised this morning when I saw a colleague post a message in the MSDN T-SQL Forum where he passed a numeric column as the first argument to the RIGHT() function. In other words, something like this:

declare @t table (IntegerColumn int)
insert @t select 12345
union all select 987
union all select 23
select LastDigit=right(IntegerColumn,1) from @t
/*
LastDigit
---------
5
7
3
*/
Huh? I didn’t know you could do that!

If you look at Books Online for the RIGHT() function, it states the following about the first argument:

"character_expression … is an expression of character or binary data. character_expression can be a constant, variable, or column. character_expression can be of any data type, except text or ntext, that can be implicitly converted to varchar or nvarchar. Otherwise, use the CAST function to explicitly convert character_expression."

Hmmm... So it specifically states that it “can be of any data type” and will be “implicitly converted to varchar”.

Intrigued, I wondered if other string functions did the same thing.

The same information regarding “implicit conversion” is not in the documentation for several of the other string functions; however, with a couple exceptions, they all perform the implicit conversion, as you can see in my findings below (using SQL2005):

select left(12345,2)            --Returns '12'
select left($273.56,4) --Returns '273.'
select right(987.510,4) --Returns '.510'
select right(1.23e10,5) --Returns 'e+010'
select substring(12345,3,2) --ERROR: Data Type INT is invalid for argument 1

select len(12345) --Returns 5
select len(1e10) --Returns 6 (Because 1e10 converts to '1e+010')

select lower(12345) --Returns '12345'
select lower(1.23e10) --Returns '1.23e+010'
select upper(1.23e10) --Returns '1.23E+010'
select ltrim(12345) --Returns '12345'
select rtrim(12345) --Returns '12345'

select ascii(12345) --Returns 49, the ASCII value of '1'
select unicode(12345) --Returns 49, the UNICODE value of '1'

select replace(12345,2,9) --Returns '19345'
select stuff(12345,3,1,9) --Returns '12945'

select replicate(23,3) --Returns '232323'
select reverse(123.45) --Returns '54.321'

select charindex(4,543210) --ERROR: Data Type INT is invalid for argument 1
select charindex('4',543210) --Returns 2
select patindex('[0-9]%',12345) --ERROR: Data Type INT is invalid for argument 2

select soundex(12345) --Returns '0000'
select difference(12345,98765) --Returns 4, meaning least possible SOUNDEX difference

select quotename(12345) --Returns '[12345]'
Just for kicks, what about datetime?:

declare @dt datetime, @dt2 datetime
set
@dt='2009-08-15 11:15:00'
set @dt2='2009-01-31'

select left(@dt,6) --Returns 'Aug 15'
select right(@dt,9) --Returns '9 11:15AM'
select substring(@dt,5,2) --ERROR: Data Type DATETIME is invalid for argument 1

select len(@dt) --Returns 19
select lower(@dt) --Returns 'aug 15 2009 11:15am'
select upper(@dt) --Returns 'AUG 15 2009 11:15AM'
select ltrim(@dt) --Returns 'Aug 15 2009 11:15AM'
select rtrim(@dt) --Returns 'Aug 15 2009 11:15AM'

select ascii(@dt) --Returns 65, the ASCII value of 'A'
select unicode(@dt) --Returns 65, the UNICODE value of 'A'

select replace(@dt,15,99) --Returns 'Aug 99 2009 11:99AM'
select stuff(@dt,5,2,@dt2) --Returns 'Aug Jan 31 12:00AM 2009 11:15AM'

select replicate(@dt,2) --Returns 'Aug 15 2009 11:15AMAug15 2009 11:15AM'
select reverse(@dt) --Returns 'MA51:11 9002 51 guA'

select charindex('5',@dt) --Returns 6
select patindex('A%',@dt) --ERROR: Data Type DATETIME is invalid for argument 2

select soundex(@dt) --Returns 'A200'
select soundex(@dt2) --Returns 'J500'
select difference(@dt,@dt2) --Returns 2

select quotename(@dt) --Returns '[Aug 15 2009 11:15AM]'
Is this a good thing, or a bad thing? I haven’t decided yet, though I’m leaning more towards “bad thing”. I thought T-SQL was a little more strongly-typed than this. Anyway, keep this knowledge in the back of your head and try not to accidentally use a numeric or datetime column in a string function in your coding.

8 comments:

  1. Implicit conversion is a Good Thing. If treating it as text works, why not? Errors only help the developer. But if the developer forgot to type something, should a run-time error actually catch it? Perhaps some pre-compiler can throw a warning.

    Other than that, text is text is text. And if it isn't text, we'll make it text. The only time that does not make sense, is when the FUNCTION itself requires a non-text parameter.

    I don't necessarily like SQL Server's data type priority, but that is something else.

    ReplyDelete
  2. I understand what you're saying.

    I guess it has to do with where I came from. In the Visual FoxPro world, we didn't have strong datatyping for variables. In other words, I could assign a data value to a variable and then turn right around and assign a number to the same variable, and then later assign a string to the same variable. That's a really foreign concept in C#, but not in VFP.

    However, if I didn't pass a string as the first argument to RIGHT(), it would be an error... and it would be a run-time error (because VFP is an interpreted language and datatypes of variables are not rigid and no datatype checking is done when the code is parsed for coding errors).

    But in SQL, the code is compiled, and the datatype of a variable (or column) is set in stone, so it will know at COMPILE time whether the argument you're passing to RIGHT() is a string or not, so I expected it to yell at me before it even executed.

    You see what I'm saying?

    ReplyDelete
  3. Yes, i do see what you are saying. Indeed, i agree to some extent. That is, that the compiler should produce a warning: "Um, you know that's a number you are providing to a FUNCTION that expects INT. [Ignore] [Cancel]"

    But to not allow implicit conversions at all, well, that's just crazy-talk. :)

    I find implicit conversions nice for code that accepts input that may not be in the correct format. To allow the input with error, text must be used to store it. But, if the data actually is a number, it *still* gets treated as text. And CASTing it, well, that's just cumbersome.

    What next? Remove string dates unless they match the SQL standard for date literals?

    Another point here is context. We don't need to supply a TABLE name when context gets it.

    TABLE A: A, B
    TABLE B: B, C

    SELECT A FROM A works. SELECT A.A FROM A would just be redundant.

    SELECT C FROM A, B should technically force SELECT B.C FROM A, because two TABLEs are in context. And yet, since it can be figured out, it works. (I think it is poor coding, but no reason it shouldn't work.)

    Only SELECT B FROM A,B should fail. Why? Because, as the error states, the COLUMN is ambiguous. IOW, context is not enough to figure out what is being requested.

    Similarly, if a FUNCTION requires text as an input, and an INT is provided, the context of the FUNCTION implies CAST. What that you say? The compiler should scream at the coder? Perhaps it should, but the request has been made, and context does answer any questions. Hmm... maybe the user made a mistake and didn't know what the FUNCTION did? Well, relying on a warning instead of reading the docs is not really a good reason to break it for everyone else.

    ReplyDelete
  4. I understand...

    Ambiguity MUST be checked (as in your example with TableA and TableB)... you can't make assumptions about that.

    Yes, I figured the compiler should scream at the coder; however, you bring up a really good point with dates. It's nice and convenient to pass string literals to date-related functions. In VFP, we would have to specifically indicate that it's a date by using curly brackets (and the carat indicates that the date is in YYYY-MM-DD format)... for example, MONTH({^2009-07-25}).

    Anyway, I don't have a big problem with the implicit conversions... I certainly don't think the behavior should be changed... it just kinda surprised me, that's all.

    Thanks for your thoughts... I may have jumped on the "bad thing" bandwagon too soon... I think I meant something more like "surprising thing".

    ReplyDelete
  5. Brad, i forget the basics all the time. Posts like yours that illuminate a bit of testing, are refreshing. (Yes, that is a pun.)

    ReplyDelete
  6. THE PROBLEM, my dear, is when that implicit conversion kills performance. And the choices the "optimizer" makes aren't always the logical ones.

    for example, given table Tab (strCol varchar(10)).
    insert into Tab select '1';
    insert into Tab select '2';
    "select * from Tab where strCol = 2;"
    performs an implicit convert(int,strCol) instead of convert(varchar,2);

    What's the difference? well, if Tab has 10 million rows, the database must perform that convert(int,strCol) 10 million times.

    "select * from Tab where strCol = '2';" fixes the problem (no convert), as does
    "select * from Tab where strCol = convert(varchar,2);"

    I vote for "implicit conversion is bad".

    ReplyDelete
  7. @Anonymous:

    Certainly true... and important.

    BTW, You might be interested in a post on implicit conversions by Paul White that he just wrote a couple days ago:

    http://sqlblog.com/blogs/paul_white/archive/2011/07/19/join-performance-implicit-conversions-and-residuals.aspx

    If you aren't already reading Paul's blog, add it to your reader now... Excellent stuff.

    Best...

    --Brad

    ReplyDelete
  8. I have read your blog its very attractive and impressive. I like your blog. MSBI online course Bangalore

    ReplyDelete