regex help

mdew

This isn't specifically a Perl question: I'm running squid and filtering
with a regex. I'm trying to filter out some spam/ad sites, and I started
with the "Penis Enlarging" websites.

I want to match "penis" and "enlarge" in any order. So far I've got

[(penis)(large)(.*)]

I've also tried [(penis)(large)(.*)]{1,} with no luck. Is anyone a regex
king who could help me out? :)
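
A note on why the attempt above misbehaves: in most regex dialects square brackets delimit a character class, so `[(penis)(large)(.*)]` matches any single character out of `( ) p e n i s l a r g . *` rather than the words themselves. Here is a minimal Python sketch of that effect. The sample strings are made up, and squid's regex dialect differs from Python's in other details, so treat this as an illustration only.

```python
import re

# The pattern from the post. The outer brackets turn everything inside
# into ONE character class: any single character from "()penislarg.*".
char_class = re.compile(r"[(penis)(large)(.*)]")

# Hypothetical sample strings, for illustration only.
samples = ["www.example.com", "penis-enlarge.example", "xyz"]

for s in samples:
    m = char_class.search(s)
    result = f"matches {m.group()!r}" if m else "no match"
    print(f"{s!r}: {result}")
```

Almost every URL contains a dot or one of those letters, so a pattern like this would flag nearly everything. Adding `{1,}` only asks for one or more such characters, which does not change what gets flagged.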
 
Anno Siegel

Matching both "this" and "that" in any order *can* be done in a single
(Perl) regex, but it's a nuisance and doesn't scale. Use a separate
regex for each word.

Anno
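
The "one regex per word" advice can be sketched as follows, in Python rather than Perl so it is easy to run anywhere. The names `REQUIRED` and `is_spam` and the sample URLs are assumptions for illustration, not anything from squid or the thread.

```python
import re

# One compiled, case-insensitive regex per required word.
# The word list is illustrative; extend it as needed.
REQUIRED = [re.compile(word, re.IGNORECASE) for word in ("penis", "enlarge")]

def is_spam(url: str) -> bool:
    """Return True only if every required word occurs somewhere in the URL."""
    return all(rx.search(url) for rx in REQUIRED)

# Hypothetical URLs, for illustration only.
for url in ("http://enlarge-your-penis.example/",
            "http://photo-enlarge.example/"):
    print(url, is_spam(url))
```

Because each word is checked independently, the order in which the words appear does not matter, and adding a third word is a one-line change.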
 
mdew

I'm testing for two possibilities. To prevent legitimate websites from
being unnecessarily filtered, I'm thinking of *penis*enlarge* and, in
reverse, *enlarge*penis*. What's the proper regex way of doing this?
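
For what it's worth, the two glob-style patterns above translate fairly directly: a glob `*` corresponds to `.*` in a regex, and the leading and trailing `*` can be dropped because an unanchored search already allows text on either side. A minimal Python sketch follows; the name `matches_either_order` and the URLs are made up for illustration.

```python
import re

# "*penis*enlarge*" OR "*enlarge*penis*", written as a single regex.
# ".*" means "any run of characters"; "|" separates the two orderings.
EITHER_ORDER = re.compile(r"penis.*enlarge|enlarge.*penis", re.IGNORECASE)

def matches_either_order(url: str) -> bool:
    """True if both words occur in the URL, in either order."""
    return EITHER_ORDER.search(url) is not None

# Hypothetical URLs, for illustration only.
for url in ("http://penis-enlargement.example/",
            "http://enlarge-your-penis.example/",
            "http://photo-enlarge.example/"):
    print(url, matches_either_order(url))
```

This form uses only alternation and `.*`, which should also be available in POSIX-style engines (an assumption worth checking against your squid build). The drawback is that every extra word multiplies the number of orderings that have to be spelled out: three words already need six alternatives.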
 
Vlad Tepes

print "Spam!" if (/penis enlarge/i || /enlarge penis/i );
 
Sam Holden

/enlarge/ || /penis/

You're just trying to make us use rude words (like "enlarge"), aren't you? :)
 
mdew

I'm no big Perl guru, but doesn't || mean "OR", so any URL with "enlarge"
in it would get marked as spam? How about

(enlarge AND penis) OR (penis AND enlarge)

The OR could be ditched. As far as the regex goes, am I looking at this
the right way?
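
The difference between OR and AND here can be checked with a small Python sketch. The helper names `has`, `flagged_or`, and `flagged_and` and the URL are assumptions for illustration only.

```python
import re

def has(word: str, url: str) -> bool:
    """Case-insensitive substring test via a regex search."""
    return re.search(word, url, re.IGNORECASE) is not None

def flagged_or(url: str) -> bool:
    # Flags the URL if EITHER word is present.
    return has("enlarge", url) or has("penis", url)

def flagged_and(url: str) -> bool:
    # Flags the URL only if BOTH words are present.
    return has("enlarge", url) and has("penis", url)

# Hypothetical, legitimate-looking URL that contains only one of the words.
url = "http://photo-enlarge.example/"
print(flagged_or(url), flagged_and(url))  # True False
```

So the OR version does produce the false positive described above. Since AND does not care about order, "enlarge AND penis" and "penis AND enlarge" are the same test, which is why the outer OR can indeed be dropped.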
 
