bosla

Search and regular expressions - how to?


I'm not yet sure how the search (and thus the saved active search) works. I've tried some expressions:

 

sap | crm => finds articles containing either "sap" or "crm"

sap crm => finds articles containing both "sap" and "crm"

sap -crm => finds articles containing "sap" but not "crm"

 

So far, it seems that the pipe means "or" and the hyphen means "not". Is that so?
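
To make that reading of the operators concrete, here is a minimal Python sketch of the behavior observed above. It is only a toy model of the three results listed, not how the site's engine is implemented; the `matches` function and the sample articles are made up for illustration.

```python
def matches(query: str, text: str) -> bool:
    """Toy model of the observed operators: '|' = OR, space = AND, '-' = NOT."""
    words = set(text.lower().split())
    # Split on '|' first so OR binds loosest; each alternative is an AND group.
    for alternative in query.lower().split("|"):
        terms = alternative.split()
        if not terms:
            continue
        required = [t for t in terms if not t.startswith("-")]
        excluded = [t[1:] for t in terms if t.startswith("-")]
        if all(t in words for t in required) and not any(t in words for t in excluded):
            return True
    return False


# Hypothetical sample articles, just to exercise the three queries.
articles = ["sap rollout delayed", "crm vendor news", "sap crm integration"]
print([a for a in articles if matches("sap | crm", a)])  # either word
print([a for a in articles if matches("sap crm", a)])    # both words
print([a for a in articles if matches("sap -crm", a)])   # "sap" without "crm"
```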

 

How do I group terms (for example, to search for the phrase "sap crm" as such instead of the two keywords "sap" and "crm")?

 

Furthermore, it seems that non-English characters do not work at all:

 

israel => finds 413 articles

 

israel | aviv => finds 421 articles (that seems logical)

israel | israël => finds only 1 article ?!

lyon | rhône | rhone | isère => finds exactly one article, which contains none of the keywords ?!

 

But then, 

 

régie => finds 205 articles

régie sap => finds 20 articles

régie | sap => finds 1920 articles

 

Does that mean that "é" is OK but "ë" is not?

 

Are there any wildcards? (E.g. Alban* = Albania, Albanien, Albanisch, etc.)

 

Thanks a lot in advance for your help,

Boris

 


Hello, 


Yes, you are right about the search operators.


For grouping, wrap the words in slashes /..../ and put the OR operator (|) between the keywords (word1 | word2 | ....) to match any of them, or just separate the words with spaces, /word1 word2 word3/, to search for all of them.


So to filter by several keywords, you would use an expression like /word1 | word2 | word3 | ..../.


Regarding non-English characters, you shouldn't see this kind of behavior with "ë", for example.

Are you sure that when you search for israël, for example, you are searching in all of your subscriptions? Try with "israël" only.

As for wildcards, we don't support them at this stage.


Dear Wesson, thanks for the answer. I think there's something wrong with non-English characters, e.g.

Iran => 265 articles
Iran | Teheran => 265 articles
Teheran => 0
Téhéran => 1 (but the article contains neither Teheran nor Téhéran)

Now let me try to search for any one of these three:

Iran | Teheran | Téhéran => 1
Iran | ( Teheran | Téhéran ) => 266
( Iran | Teheran ) | Téhéran => 1

So it seems that A|(B|C) works but A|B|C and (A|B)|C do not, which is not what I would have expected.

I'll try a third term which I'm sure does not exist :

Iran | Teheran | blablabla => 265 (this looks fine as A or B exists)
Iran | Teheran | blébléblé => 33 (too few)
Iran | Teheran | blé => 298 (the word blé exists but I would not expect it to be so frequent)
Iran | Teheran | bléßß => 298 (too many)
Iran | Teheran | bléblébléßß => 33 (too few)
Iran | Teheran | schúnlkj => nothing
Iran | Teheran | sch§n => 75 (too few)
Iran | Teheran | sxchýý => 265 ok
Iran | Teheran | schèè => 463 (the character string "schèè" is impossible)

 

The non-existing term alone:

bléblébléßß => 33 (impossible)
sxchýý => 0
schèè => 198 (impossible)
schúnlkj => 0

 

 

Another example:

consultant => 11608
consültant => 0
consültänt => 0

travaillez => 150 (looks fine)
travailléz => 150 (é matches e)
travaillëz => 1 (ë matches only once)

It seems that "é" matches "e" but not the other way around, but the same is not true for "ä" matching "a". And "ë" means "get me only one article"?! 
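
One thing that could produce the "é matches e" effect is accent folding, where an index strips diacritics before comparing terms. Whether this site's engine does that is an assumption, not something confirmed in this thread. A minimal Python sketch of such folding (`fold_accents` is a hypothetical helper) shows what consistent folding would give for the terms tried above:

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Strip diacritics: decompose to NFD, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


for word in ["travailléz", "travaillëz", "consültänt", "Téhéran", "Isère"]:
    print(word, "->", fold_accents(word))
```

If folding were applied uniformly, travaillez, travailléz and travaillëz would all return the same count, and consültant would match consultant, so the numbers above suggest that folding alone does not explain the behavior.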

To me, those regular expressions are really mysterious... I read that you're using Sphinx as the underlying engine? How does the interface translate my search term into a Sphinx query? Maybe that could help me understand where my difficulties come from?

 

Sorry for the inconvenience...


And the solution is... quotation marks !!

 

"Isère" => 15 right
Isère => 683 wrong
Isere => 10
Isere | "Isère" => 25 right
 
Iran | Teheran => 266
Iran | Teheran | Téhéran => only 1, which is nonsense
Iran | Teheran | "Téhéran" => 266 (right, as Téhéran is not found)
 
So just another lesson learnt... if you want to use non-English characters, you'll have to put them into quotation marks, and everything works fine.
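
For anyone building such queries by script, the workaround can be automated. This is a rough Python sketch under a few assumptions: `quote_non_ascii_terms` is a made-up helper, terms and operators are separated by spaces as in the examples above, and a leading "-" is assumed to stay outside the quotes (not tested on the site).

```python
def quote_non_ascii_terms(query: str) -> str:
    """Wrap every space-separated term containing non-ASCII characters in quotes."""
    quoted = []
    for token in query.split():
        prefix = ""
        # Assumption: a leading '-' (NOT) stays outside the quotation marks.
        if token.startswith("-") and len(token) > 1:
            prefix, token = "-", token[1:]
        already_quoted = len(token) >= 2 and token[0] == '"' and token[-1] == '"'
        if not token.isascii() and not already_quoted:
            token = f'"{token}"'
        quoted.append(prefix + token)
    return " ".join(quoted)


print(quote_non_ascii_terms("Iran | Teheran | Téhéran"))
print(quote_non_ascii_terms('Isere | "Isère"'))
```

Multi-word phrases and parentheses glued directly to a term are not handled here; the sketch only covers single terms like the ones in this thread.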
