SQL Server: Weird behavior with CONTAINSTABLE, ISABOUT and weighted terms

I came across the following weird behavior in an ISABOUT query in SQL SERVER, which sums up to the following bullets. Further down you can read the exact queries that reproduce this behavior and a not so elegant solution.
Best match for word “weird” is result with KEY 1:

  • weight(1): RANK of KEY 1 is 249
    (results order 1,2,3)
  • weight(0.8): RANK of KEY 1 is 321
    (weight down => rank up, results order 1,2,3)
  • weight(0.2): RANK of KEY 1 is 998
    (weight down => rank up, results order 1,2,3)
  • weight(0.17): RANK of KEY 1 is 802
    (weight down => rank down, results order 2,3,1)
  • weight(0.16): RANK of KEY 1 is 935
    (weight down => rank up, results order 2,1,3)
  • weight(0.01): RANK of KEY 1 is 50
    (weight down => rank down, results order 3,2,1)

As you can see, from 0.2 to 0.17 ranking decreases and results are messed up! From 0.16 results are inverted (the weight values that reproduce this behavior depend on terms, columns searched, etc). It seems there is a point where weight means the opposite, something like “do not include this term”.

Microsoft states here that the actual value of RANK is meaningless. But what about results order!

These are the exact queries that I used to reproduce this behavior:

QUERY 1 (weight 1): (Initial ranking)

    SELECT * FROM 
    CONTAINSTABLE(FullTextTable, FullTextColumn, 'ISABOUT ("weird" weight (1) ) ') 
    ORDER BY RANK DESC, [KEY]
    
    KEY	    RANK
    1	249
    2	156
    3	114

QUERY 2 (weight 0.8): (Ranking increases, initial order is preserved)

    SELECT * FROM 
    CONTAINSTABLE(FullTextTable, FullTextColumn, 'ISABOUT ("weird" weight (0.8) ) ') 
    ORDER BY RANK DESC, [KEY]

     KEY     RANK
     1  321
     2  201
     3  146

QUERY 3 (weight 0.2): (Ranking increases, initial order is preserved)

    SELECT * FROM 
    CONTAINSTABLE(FullTextTable, FullTextColumn, 'ISABOUT ("weird" weight (0.2) ) ') 
    ORDER BY RANK DESC, [KEY]
    
     KEY    RANK
     1	998
     2	877
     3	692

QUERY 4 (weight 0.17): (Ranking decreases, best match is now last, inverted behavior for these terms begin at 0.17)

    SELECT * FROM 
    CONTAINSTABLE(FullTextTable, FullTextColumn, 'ISABOUT ("weird" weight (0.17) ) ') 
    ORDER BY RANK DESC, [KEY]

     KEY      RANK
     2   960
     3   958
     1   802

QUERY 5 (weight 0.16): (Ranking increases, best match is now second)

    SELECT * FROM 
    CONTAINSTABLE(FullTextTable, FullTextColumn, 'ISABOUT ("weird" weight (0.16) ) ') 
    ORDER BY RANK DESC, [KEY]

     KEY      RANK
     2   978
     1   935
     3   841

QUERY 6 (weight 0.01): (Ranking decreases, best match is last again)

    SELECT * FROM 
    CONTAINSTABLE(FullTextTable, FullTextColumn, 'ISABOUT ("weird" weight (0.01) ) ') 
    ORDER BY RANK DESC, [KEY]
 
     KEY    RANK
     3	105
     2	77
     1	50

This of course causes major problems when you use a custom “word-breaker”, creating something like this:

    CONTAINSTABLE(FullTextTable, FullTextColumn, 
          'ISABOUT (
              "wordA wordB wordC" weight (0.8), 
              "wordA*" NEAR "wordB*" NEAR "wordC*" weight (0.6), 
              "wordA*" weight (0.1), 
              "wordB*" weight (0.1), 
              "wordC*" weight (0.1), 
           ) ')

For now, until a better solution is found, I just use weights above 0.2!

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s