<http://www.uwasa.fi/~ts/info/proctips.html> Copyright © 1999-2002 by Prof. Timo Salmi Last modified Thu 19-Dec-2002 15:47
Although there already is an abundance of procmail material on the net, here are some of my own tips and observations. This tips page is a companion of my Foiling Spam with an Email Password System page. The items on this page are in no particular order.
Find out what your email directory is. Go ("cd") to the directory where your email folders are located and type "pwd". Assume in this item that you get "/home/myid/Mail". Further assume in the example that "/home/myid" is your home directory so that you can use the variable "${HOME}" to denote it.
Find out where your system's Bourne shell is located by typing "which sh". Assume that you get "/usr/bin/sh".
Prepare a "~/.procmailrc" file with a suitable editor. For example you might use "emacs ~/.procmailrc". To start with, put something like this into the ~/.procmailrc file:
#Preliminaries
SHELL=/usr/bin/sh #Use the Bourne shell (check your path!)
MAILDIR=${HOME}/Mail #First check what your mail directory is!
LOGFILE=${MAILDIR}/procmail.log
LOG="--- Logging ${LOGFILE} for ${LOGNAME}, "
#Whatever recipes you'll use
#The order of the recipes is significant
:0
* ^From: scam@cyberspam\.com
/dev/null
# Accept all the rest to your default mailbox
:0:
${DEFAULT}
For the "~/.procmailrc" file a read permission for the user him/herself will be sufficient. To ensure this, give the command "chmod u+r ~/.procmailrc".
Find out where the "procmail" program is located on your system by typing "which procmail". Assume below that you get "/usr/local/bin/procmail". Also check what your id is: "whoami". Assume that you get "myid".
Next comes the crucial step. Put the following line in your "~/.forward" file. Include the quotes (") in the ~/.forward file contents.
"|IFS=' ' && exec /usr/local/bin/procmail || exit 75 #myid"
Set adequate permissions for accessing the "~/.forward" file: "chmod 644 ~/.forward". Lastly, check ("ls -lFd ~/") that your main directory permissions are at least (the equivalent of) "drwx--s--x". If not, "chmod u+rwx ~/" and "chmod og+x ~/".
You should now be set to go. To check, send an email to yourself to see if it gets through. If there is a problem see the advice on troubleshooting.
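The path and id substitutions in the ~/.forward line are easy to get wrong when typed by hand. As a minimal sketch (the helper name forward_line is made up for this illustration, and the procmail path and user id are just the example values assumed above), the line could be composed like this:

```shell
#!/bin/sh
# Hypothetical helper: compose the one-line ~/.forward contents.
# $1 = full path to procmail (from "which procmail")
# $2 = your user id (from "whoami")
forward_line() {
    printf '"|IFS='"'"' '"'"' && exec %s || exit 75 #%s"\n' "$1" "$2"
}

# Print the line for the example values used in this item.
forward_line /usr/local/bin/procmail myid
```

To install it, one would redirect the output into ~/.forward and then apply the chmod commands given above.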
#The executable file named "proctest"
#!/bin/sh
#
# You need a test directory.
TESTDIR=/home/myid/test
if [ ! -d "${TESTDIR}" ] ; then
echo "Directory ${TESTDIR} does not exist; first create it"
exit 1
fi
#
#Feed an email message to procmail. Apply proctest.rc recipes file.
#First prepare a mail.msg email file which you wish to use for the
#testing.
procmail ${TESTDIR}/proctest.rc < mail.msg
#
#Show the results.
less ${TESTDIR}/Proctest.log
clear
less ${TESTDIR}/Proctest.mail
#
#Clean up.
rm -i ${TESTDIR}/Proctest.log
rm -i ${TESTDIR}/Proctest.mail
The beauty of this method is that besides "proctest.rc" you can easily edit also "mail.msg" for testing different kinds of incoming mail and the behavior of your recipes in various situations. Note, however, that it is best to test only for one email message at a time. In other words, do not put more than one email message into the mail.msg test file.
A question remains. Where does one get the structure of a posting for the "mail.msg" test posting? Easy. Invoke elm, select a suitable, existing posting, and make a copy of it to "mail.msg" by pressing C (capital C) and reply mail.msg to "Copy message to:". Other mail programs probably have similar options.
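If no suitable message is at hand, a test message can also be written from scratch. The sketch below is only one possible layout: the function name make_test_msg, the addresses and the date are placeholders invented for the illustration. The essential points are the leading "From " line, the header fields, and the empty line that separates the header from the body.

```shell
#!/bin/sh
# Print a minimal single-message test file on standard output.
# All addresses and the date are placeholders; adjust them to
# exercise your own recipes.
make_test_msg() {
    cat <<'EOF'
From myid@myhost.mydom Thu Dec 19 15:47:00 2002
From: myid@myhost.mydom (My Name)
To: myid@myhost.mydom
Subject: Testing procmail recipes
Date: Thu, 19 Dec 2002 15:47:00 +0200

This is the body of the test message.
EOF
}

# Show just the header part (everything up to the first empty line).
make_test_msg | sed -n '/^$/q;p'
```

Running "make_test_msg > mail.msg" in the test directory would then give the proctest script something to feed to procmail.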
Below is the proctest.rc recipe file which I used in preparing for this item:
SHELL=/bin/sh
TESTDIR=/home/myid/test
MAILDIR=${TESTDIR}
LOGFILE=${TESTDIR}/Proctest.log
LOG="--- Logging for ${LOGNAME}, "
#Troubleshooting:
VERBOSE=yes
LOGABSTRACT=all
#Let's test stripping lines from the email message's header
:0 fwh
| egrep -vi "(^Content-|^MIME-Version:.)"
#If it is from myself, store the email message
:0:
* $ ^From:.*${LOGNAME}
${TESTDIR}/Proctest.mail
#Otherwise, discard the email message
:0
/dev/null
#Let's test stripping lines from the email message's header,
#but only when they're there
:0 fwh
* ^(Mime-Version:|Content-)
| formail -IMime-Version: -IContent-
To continue myself. The flags are as follows: "f" use the pipe as a
filter, "w" execute before proceeding, "h" it is about the header of
the email message.
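The formail -I switch means that any existing fields of the given name are removed. Since only a field name without any content is given here, nothing is put in their place, so the fields are simply deleted. (The lowercase -i switch would instead keep the existing fields, renamed with an "Old-" prefix.)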
#Trivial catching of potential spam towards the end of a ~/.procmailrc
#Place only after accepting all the mailing lists you want to receive
:0:
* ! ^TO_ts@([-a-z0-9_]+\.)*uwasa\.fi
* ! ^TO_timo\.salmi
${HOME}/.mail/PotentialSpam.mail
For entering an "or" rule, consider the following example:
#Accept email from Era Eriksson, the author of the major procmail FAQ
:0:
* ^From:.*reriksso@([-a-z0-9_]+\.)*helsinki\.fi|\
^From:.*era@iki\.fi
${DEFAULT}
Let's look at a few details:
:0:
* 1^0 ^From:.*reriksso@([-a-z0-9_]+\.)*helsinki\.fi
* 1^0 ^From:.*era@iki\.fi
${DEFAULT}
Likewise, you could alternatively use ( ) grouping
:0:
* ^From:.*(\
reriksso@([-a-z0-9_]+\.)*helsinki\.fi|\
era@iki\.fi)
${DEFAULT}
Feedback:
That condition looks a bit ugly to me. Let me rephrase it to show
you what I mean:
* ^From:.*(reriksso@([-a-z0-9]+\.)*helsinki|era@iki)\.fi
(an underscore cannot be part of a hostname, as far as I
know.)
Yes, many of the rules presented in this FAQ can be written more
concisely and/or effectively. The rules, as presented in the FAQ,
are often formulated for easier understanding than efficiency. But
it is useful to improve on the efficiency after one first has got
the basic logic of a rule outlined.
#Test if the message has a "Subject:" header and has a subject in it
#(The brackets [] contain a space and a tab)
:0:${TESTDIR}/Proctest.mail.lock
* ^Subject:
* ^Subject:[ ]*\/[^ ].*
| echo "A ^Subject: header found with" >> ${TESTDIR}/Proctest.mail &&\
echo "${MATCH}" >> ${TESTDIR}/Proctest.mail
Likewise, a single command can be subdivided for easier
documentation:
| echo "A ^Subject: header found but there is no subject" \
>> ${TESTDIR}/Proctest.mail
Below is another example with a slightly different
syntax using the semicolon ";" as the separator. The example also
demonstrates how to save disk space by zipping email from a
particular source. You'll need Info-ZIP's zip and unzip in order to
be able to apply it. (They are available from the proper Unix
section of the Garbo program archives at the University of Vaasa,
Finland.)
:0w:Test.mail.lock
* ^From:.*test
| unzip ${HOME}/mail/Test.zip; \
cat >> Test.mail; \
zip -oj9 ${HOME}/mail/Test.zip Test.mail; \
rm -f Test.mail
What happens on the action line is this:
SHELL=/bin/sh
TESTDIR=/home/myid/test
MAILDIR=${TESTDIR}
LOGFILE=${TESTDIR}/Proctest.log
LOG="--- Logging for ${LOGNAME}, "
First, a few environment variables are included.
#Troubleshooting:
VERBOSE=yes
LOGABSTRACT=all
The above means: Use full reporting for the debugging.
#An auxiliary regular expression to detect text,
#The brackets [] contain a space and a tab
GETTEXT="[ ]*\/[^ ].*"
If the same expression is used several times in a recipe file, it is convenient to put the expression into an environment variable instead of writing it out repeatedly.
Of course, there are other options for extracting the subject into an environment variable. One is to utilize "formail" which is a companion to the procmail program. If you include the following expression at the beginning of your ~/.procmailrc recipes file, you will have the variable ${SUBJECT} available for the rest of the recipes file.
#Environment variables for procmail
#
#Get the subject
#Discard some dangerous special chars + any leading and trailing blanks
SUBJECT=`formail -xSubject: \
| sed -e 's/[;\`\\]/ /g' \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
For an example of usage see the
Foiling Spam with an Email Password System page.
Feedback:
Extracting the header from inside procmail using the \/ token is
_much_ faster than the formail solution.
Feedback:
If the SUBJECT variable is left empty, apply quotes on the first
line, i.e.
SUBJECT=`formail -x"Subject: "\
#Header logging
:0hc:${HOME}/.mail/Procmail.head.lock
| cat >> ${HOME}/.mail/Procmail.head
That eliminates a cat and a shell process, plus the pipe and
extra reads and writes.
Now, if you want to overwrite the file with each new message [or do
some further shell operations within the pipe], then the cat command
is a reasonable choice.
[A further point] That would have been an odd name for the lockfile.
Why not $HOME/headers.cut.$LOCKEXT?
Perhaps the strongest generic trick against spam is to shirk any email that is not addressed to you directly, since most spam is addressed to some kind of mailing list. Of course, you will first have to accept email from any legitimate mailing list which you have subscribed to. If you put a suitable recipe after your recipes that accept the legitimate email lists, much of the incoming spam will be caught. Below is a simplified (and a bit munged) version of what I do in my own ~/.procmailrc:
#Catch potential spam
:0
* !^TO_(ts|timo\.salmi)@([-a-z0-9_]+\.)*uwasa\.fi
{
:0 fwh
* ^Content-Length:
| formail -IContent-Length:
:0:
Spam.mail
}
If you look carefully through this page, you'll find explanations for all the details in the above recipe. It will be a good exercise to do so. :-)
Since so much, if not practically all spam comes from forged sender addresses it is much more effective to block certain suspect email routes than to try to match the elusive spammers. The scoring recipe example below treats as spam all email that is routed via dialsprint.net and that is not addressed to "me" personally.
#Spam avoidance of certain routes and if not for me personally
:0:
* -1^0
* 1^0 ? formail -x"Received:" | egrep -is "dialsprint\.net"
* 1^0 ! ^TO_(myid|myFirstName[ _\.]myLastName)@([-a-z0-9_]+\.)*myhost\.mydom
Spam.mail
:0B:
* (remove@|removeme@)
PotentialSpam.mail
:0D:
* ^Subject:.*ADV
PotentialSpam.mail
:0:
* (^Subject:.*make.*money.*fast|^Subject:.*\$\$\$)
PotentialSpam.mail
Feedback:
The regexp:
(remove@|removeme@)
is much slower than
remove(me)?@
Having the 'top-level' of the regexp be an alternation (via '|')
slows down matching by quite a bit. The more that can be factored
out at the beginning of the regexp, the better. The same goes for
the recipe that matches against the Subject: header field:
^Subject:.*(make.*money.*fast|\$\$\$)
is faster than:
(^Subject:.*make.*money.*fast|^Subject:.*\$\$\$)
My comment: Of course it is commendable to be efficient, especially
where easy understanding is not compromised. However, if the two
clash, I often prefer clarity of expression and convenience over a
strict maximization of code efficiency. Don't we have our powerful
modern computers to perform our tasks for us, not vice versa :-).
(This is not about the particular feedback above. The improvements
are useful. They are both legible and instructive.)
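A quick way to convince oneself that the factored form selects the same lines as the original is to run both spellings through "grep -E" on a few made-up header lines. This is only a sketch: grep's extended regular expressions are close to, but not identical with, procmail's, and the sample lines and the helper names are invented for the illustration. It says nothing about speed, only about equivalence.

```shell
#!/bin/sh
# Two spellings of the same Subject: test, as discussed above.
SLOW='(^Subject:.*make.*money.*fast|^Subject:.*\$\$\$)'
FAST='^Subject:.*(make.*money.*fast|\$\$\$)'

# Made-up sample header lines.
sample_subjects() {
    cat <<'EOF'
Subject: Make money fast
Subject: $$$ for you
Subject: Meeting at noon
X-Note: make money fast
EOF
}

# Print the sample lines that a given pattern matches, ignoring case.
matches() {
    sample_subjects | grep -Ei -- "$1"
}

echo "Unfactored form matches:"
matches "$SLOW"
echo "Factored form matches:"
matches "$FAST"
```

Both calls should list the same lines, which is the property one wants to keep when tidying up a condition.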
More feedback:
The "* ^Subject:.*ADV" rule is overly simplistic and catches
many non-spam subjects. Maybe rather something like
"* ^Subject:\<*ADV\>"
My comment. Ok. Let's try
:0D:
#(The brackets [] start with a space and a tab)
* ^Subject:.*([ \{<]+)ADV([ :\}>]+|$) |\
^Subject:.*(\[+)ADV(:)?(\]+|$)
PotentialSpam.mail
It is far from perfect, but it should work reasonably well for regular
purposes. Spam detection requires experimenting anyway. Regular
expressions are not easy. They are quite a large subject area of their
own.
The above assumes that there is (at least) one space after the "Subject:" header before the subject begins. This can be ensured by first applying "formail -z", which you can have high up in your ~/.procmailrc. For example, I have the upper two lines in mine.
:0 fwh
| formail -z -iContent-Length:
:0D:
* ^Subject:.*([ \{<]+)ADV([ :\}>]+|$) |\
^Subject:.*(\[+)ADV(:)?(\]+|$)
PotentialSpam.mail
See the other items in this tips file for an explanation of the
"fwh" flags. The formail program with the "-z" switch will insert
the desired blanks into the header. The "-iContent-Length:" switch
(which is outside the theme of the current item) will replace the
Content-Length: headers with Old-Content-Length: headers.
I use a slightly different recipe in my own ~/.procmailrc recipes file:
:0D
* ^Subject:.*([ ]|<|\[)ADV([ ]|>|:|\]|$)
{
:0
{ RULE="Catch potential spam by detecting an ADV keyword" }
:0
/dev/null
}
If you wonder about the "RULE" variable, see the item about logging which rules have been used.
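The ADV pattern can be tried out with "grep -E" before it goes into a recipe file. A rough sketch, assuming that grep's [[:blank:]] class is an acceptable stand-in for the space-and-tab brackets, and leaving out -i to mimic the case-sensitive D flag. The function name is_adv and the sample subjects are made up for the illustration.

```shell
#!/bin/sh
# The ADV condition from the recipe above, rewritten for grep -E.
ADV_PAT='^Subject:.*([[:blank:]]|<|\[)ADV([[:blank:]]|>|:|\]|$)'

# Succeeds if the header line given as $1 matches the pattern.
is_adv() {
    printf '%s\n' "$1" | grep -Eq -- "$ADV_PAT"
}

for line in 'Subject: ADV: cheap loans' \
            'Subject: [ADV] special offer' \
            'Subject: ADVANCED course starts' \
            'Subject: adv: lowercase'; do
    if is_adv "$line"; then
        echo "caught: $line"
    else
        echo "passed: $line"
    fi
done
```

Subjects where ADV is only the beginning of a longer word should pass through, which is the point of the delimiters around the keyword.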
On to a different facet. Some ISPs (Internet Service Providers) do not allow numbers in the email addresses. Thus, you may identify some of the forged spam by catching a violation in this respect. The following recipe catches email with numbers in the user id before the @ mark from all the various nodes on "respectable.net".
:0:
* ^From:.*[0-9].+@([-a-z0-9_]+\.)*respectable.net
PotentialSpam.mail
Date: Thu, 19 Dec 2002 10:44:44 +1000
From: Philip Gunter
To: Timo Salmi
Subject: A procmail tidbit
Hi Timo, thanks for your excellent procmail reference. Here is a small recipe you might like to add to your site. It limits the number of emails being forwarded from an account, useful to stop sms storms.
Cheers, Philip.
:0
{
:0
{
# remove any sms-alert files older than 5 minutes
GLOP_=`find /var/tmp/sms -name sms-alert\* -cmin +5 -exec rm -f {} \;`
# Create an sms-alert file for this message.
GLOP_=`touch /var/tmp/sms/sms-alert$$`
# Count the number of sms-alert files
COUNT=`ls /var/tmp/sms | grep sms-alert | wc -l`
COUNT1=`expr ${COUNT}`
# Check if number of alerts in the last 5 minutes is less than 2
ISLT=`expr ${COUNT1} \< 2`
}
:0:
# if the expression is true then forward the email
* ISLT ?? ^^1^^
! 0123456789@pager.net
}
#Truncate messages longer than 4000 bytes to 100 lines
:0
* > 4000
{
:0 fwh
* ^Content-Length:
| formail -IContent-Length:
:0:Truncated.mail.lock
| head -100 >> Truncated.mail
}
Some details:
#Truncate messages longer than 4000 bytes to 100 + 10 lines
:0
* > 4000
{
:0 fwh
* ^Content-Length:
| formail -IContent-Length:
:0c:Truncated.mail.lock
| head -100 >> Truncated.mail &&\
echo "-:-:-:- (snip) -:-:-:-" >> Truncated.mail
:0:Truncated.mail.lock
| tail +101 | tail -10 >> Truncated.mail
}
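The head and tail arithmetic of the recipe above can be checked outside procmail. The sketch below is an assumption-laden stand-in, not the recipe itself: truncate_msg is a made-up name, and it buffers its input in a temporary file because standard input can be read only once, whereas the recipe gets the message fed twice. The skip is written "tail -n +101", which is the POSIX spelling of the older "tail +101".

```shell
#!/bin/sh
# Keep the first $1 lines, a snip marker, and the last $2 of the
# remaining lines of whatever arrives on standard input.
truncate_msg() {
    head_n=$1
    tail_n=$2
    tmp=$(mktemp) || return 1
    cat > "$tmp"
    head -n "$head_n" "$tmp"
    echo "-:-:-:- (snip) -:-:-:-"
    # Skip the part already shown, then keep the final lines.
    tail -n +"$((head_n + 1))" "$tmp" | tail -n "$tail_n"
    rm -f "$tmp"
}

# Example: a 200-line input shrinks to 100 + 1 + 10 = 111 lines.
awk 'BEGIN { for (i = 1; i <= 200; i++) print i }' \
    | truncate_msg 100 10 | wc -l
```

Skipping the first 100 lines before taking the last 10 matters only for messages shorter than 110 lines, where it prevents lines from being shown twice.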
A few observations:
#The executable file named "greptest"
#!/bin/sh
egrep -i '(ts|timo\.salmi)@([-a-z0-9_]+\.)*uvasa\.fi' mail.msg
#
#Allow a quick visual comparison on the screen
echo ""
cat mail.msg
#The mail.msg target file with the trial text for the matching
ts@uvasa.Fi
ts@loisto.uvasa.fi
Timo.Salmi@uvasa.Fi
Timo.Salmi
null@uvasa.fi
Then, just give the command "greptest" and visually compare the outputs.
Miscellaneous notes:
From: scam@cyberspam.com (The Big Bad Spammer)
The first solution that comes to mind is the following, but it is not entirely accurate.
:0:
* ^From:.*\.com
* !^From:.*\.com\.
* !^TO_(ts|timo\.salmi)@([-a-z0-9_]+\.)*uwasa\.fi
ProbableComSpam.mail
# Get the sender's address
# Discard any leading and trailing whitespaces
FROMADDR_=`formail -rt -xTo: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
# Test if the email came from the .com domain
:0:
* $ ? echo ${FROMADDR_} | egrep -is '\.com$'
ComDomain.mail
:0:
* ^From:.*\.hk|\
^From:.*\.kr|\
^From:.*\.tr
* !^From:.*\.hk\.|\
!^From:.*\.kr\.|\
!^From:.*\.tr\.
* !^TO_myid@([-a-z0-9_]+\.)*myhost\.mydom
ProbableSpam.mail
An aside: You could also utilize a more condensed format:
* ^From:.*\.(hk|kr|tr)
(Condensing the rest of the above recipe is left as an exercise.)
Using scoring is one option. The recipe could also be rewritten as
#Define getting the sender's address
#Discard any leading and trailing whitespaces
FROM_=`formail -rt -xTo: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
#Whatever other recipes in between.
#Spam screening of certain susceptible domains
:0:
* -1^0
* 1^0 $ ? echo ${FROM_} | egrep -is '\.hk$'
* 1^0 $ ? echo ${FROM_} | egrep -is '\.kr$'
* 1^0 $ ? echo ${FROM_} | egrep -is '\.tr$'
* 1^0 !^TO_myid@([-a-z0-9_]+\.)*myhost\.mydom
ProbableSpam.mail
There also is the option
:0:
* ^From:.*\.hk([ >]|$)|\
^From:.*\.kr([ >]|$)|\
^From:.*\.tr([ >]|$)
* ! ^TO_myid@([-a-z0-9_]+\.)*myhost\.mydom
ProbableSpam.mail
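The idea of requiring a delimiter after the country code can be tried with "grep -E" on a few made-up "From:" lines. This is a sketch under assumptions: the three alternatives are condensed into one group, the brackets contain only a space here, and the function name from_matches and the addresses are invented for the illustration.

```shell
#!/bin/sh
# The country code must be followed by a space, a closing ">",
# or the end of the line.
PAT='^From:.*\.(hk|kr|tr)([ >]|$)'

# Succeeds if the header line given as $1 matches the pattern.
from_matches() {
    printf '%s\n' "$1" | grep -Eiq -- "$PAT"
}

for line in 'From: Some Sender <x@mail.example.hk>' \
            'From: x@host.example.kr (Some Sender)' \
            'From: x@host.example.tr' \
            'From: x@shop.hk.example.com'; do
    if from_matches "$line"; then
        echo "suspect: $line"
    else
        echo "ok:      $line"
    fi
done
```

The last sample shows why the delimiter is needed: a ".hk." in the middle of a host name is not a top-level domain.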
#Accept all email from myself, weed out autoreplies
:0:
* ^From:.*myid@([-a-z0-9_]+\.)*myhost\.mydom
* ! ^X-Loop: myid@myhost\.mydom
${DEFAULT}
Next, let's extend the matching to more fields in the header:
:0
* ? formail -x"From" -x"From:" -x"Reply-To:" -x"Errors-To:"\
| egrep -i "scam@cyberspam\.com"
/dev/null
FROM="^(From[ ]|(Old-|X-)?(Resent-)?(From|Reply-To|Sender):)(.*\<)?"
#(whatever else in between)
:0
* $ ${FROM}scam@cyberspam\.com
/dev/null
:0
* ? formail -x"Received:"\
| egrep -i "cyberspam\.com"
/dev/null
Spam email is sometimes indicated by a missing or an empty "From:" line in the header. Furthermore, the "From:" line might contain an empty <> instead of having a proper address within the <>. Using scoring we might have something like
:0:
* 1^0 ^From:([ ]$|$)
* 1^0 ! ^From:
#A catch: Don't use here the word-boundary operators \< \>
#Use just the plain <>
* 1^0 ^From:.*<>
NoFrom.mail
Under a worst-case scenario, the various sender headers might all be empty. To test for this unlikely eventuality we can utilize the fact that formail would put a "foo@bar" into the "FROM_" under such circumstances.
# Define getting the sender's address
# Discard any leading and trailing whitespaces
FROM_=`formail -rt -xTo: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
# Test if the sender could not be identified at all
:0:
* FROM_ ?? foo@bar
NoSender.mail
As always, there are several alternatives to solving a problem. Consider a potential case where a spammer poses as the mailer-daemon but the "From:" header is either missing or total gibberish. How to detect this situation? The second condition in the recipe below ensures that there is a "From:" line in the header, and that it has some elementary validity.
:0:
* ^From[ ]*MAILER-DAEMON
* ! ? formail -x"From:" | egrep -is "[a-z]"
ProbableSpam.mail
REPLYTO_=`egrep "^Reply-To:" | head -1 \
| formail -c -rt -xTo: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
#Get the sender's address, the generic version
FROM_=`formail -rt -xTo: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
#Get the sender's host
FHOST_=`echo "${FROM_}" | awk -F@ '{ print $2 }'`
#Build the postmaster's address
FMAST_="postmaster@${FHOST_}"
Thus, you have the postmaster's alleged address available as ${FMAST_} from this point on in your recipes file. Note, however, that all validity testing of the address is missing.
What happens in the FROM_ formula:
#Get the sender's address, ignore Reply-To:
FROM_=`formail -I"Reply-To:" -rt -xTo: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
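The host and postmaster steps from the generic version above can be tried in plain shell without formail. A minimal sketch, assuming the bare sender address is already at hand; the function name postmaster_for is made up, and the emptiness check is an addition that is not in the original assignments.

```shell
#!/bin/sh
# Derive the alleged postmaster address from a bare sender address,
# mirroring the FHOST_ and FMAST_ assignments.
postmaster_for() {
    host=$(printf '%s\n' "$1" | awk -F@ '{ print $2 }')
    # Added precaution: refuse to build an address when no host
    # part was found after the @ sign.
    [ -n "$host" ] || return 1
    printf 'postmaster@%s\n' "$host"
}

postmaster_for 'scam@cyberspam.com'
```

Even with this check the result remains only an alleged address, since nothing verifies that the host exists or that the sender line was genuine.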
#Ensure a whitespace exists between field name and content
#Comment "Old-" the Content-Length field from all the headers
:0 fwh
| formail -z -i"Content-Length:"
#(whatever else in between)
:0
* From:.*the-mailing-list-maintainer
* ^TO_the@first\.recipient\.edu
{
:0 fw
| formail -I"To:" -I"X-" -I"Content-Type:" -I"MIME-Version:"\
-A "To: Maintainer's long recipient list suppressed" \
| sed -e '/^This is a multi-part /,/^Content-Transfer-Encoding: /d' \
-e '/------=_NextPart_/,$d'
:0:
${DEFAULT}
}
Consider the following simple spam foiling recipe. It will put the email into the ProbableSpam.mail file if the score adds up to at least one. If the first condition is met, 1 is added to the score. Ditto for the second condition. Thus if either of the tell-tale spam signals occurs, the score will be positive (that is, greater than zero) and the action (storing the email message into the ProbableSpam.mail file) will be enacted.
:0:
* 1^0 ^Subject:.*make money fast
* 1^0 ^Subject:.*\$\$\$
ProbableSpam.mail
The example above uses equally-weighted scoring. One can also have unequal scores. Below, a hit of the second condition gives two points while a hit of the first only gives one.
* 1^0 ^Subject:.*make money fast
* 2^0 ^Subject:.*\$\$\$
Scoring can be used to build some extremely trivial artificial intelligence into the recipes. Consider the following
:0:
* -1^0
* 1^0 ^Subject:.*money
* 1^0 ^Subject:.*fast
* 1^0 ^Subject:.*\$\$\$
ProbableSpam.mail
:0:
* ^Subject:[ ]*\/[^ ].*
* -2^0
* 1^0 MATCH ?? ()\<easy\>
* 1^0 MATCH ?? ()\<fast\>
* 1^0 MATCH ?? ()\<(cash|money)\>
* 1^0 MATCH ?? \$\$\$
ProbableSpam.mail
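The arithmetic of the recipe above can be imitated in plain shell to see which subjects would end up with a positive score. This is only an emulation under assumptions: subject_score is a made-up name, "grep -w" stands in for the \< \> word boundaries, and the sample subjects are invented. Procmail enacts the action when the final score is positive.

```shell
#!/bin/sh
# Start at -2 and add 1 for each keyword group found in the
# subject text given as $1; print the final score.
subject_score() {
    score=-2
    for word in 'easy' 'fast' '(cash|money)'; do
        if printf '%s\n' "$1" | grep -Eiwq -- "$word"; then
            score=$((score + 1))
        fi
    done
    if printf '%s\n' "$1" | grep -Eq -- '\$\$\$'; then
        score=$((score + 1))
    fi
    echo "$score"
}

for subj in 'Make easy money fast' 'Fast train timetable' 'Easy cash $$$'; do
    echo "$(subject_score "$subj")  $subj"
done
```

With the starting value of -2, at least three of the four signals have to be present before the score becomes positive.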
#Catch potential spam by examining the email route
:0:
* 1^0 ? formail -x"Received:" | egrep -i "157\.161\.140\.2"
* 1^0 ? formail -x"Received:" | egrep -i "199\.217\.231\.46"
* 1^0 ? formail -x"Received:" | egrep -i "212\.106\.213\.36"
* 1^0 ? formail -x"Received:" | egrep -i "216\.154\.1\.82"
ProbableSpam.mail
#Avoid a specific forgery spam
:0:
* -1^0
* 1^0 ^From:.*mikerobbins2000@hotmail\.com
* 1^0 ? formail -x"Received:" | egrep -is "psi\.net"
Spam.mail
Scoring and ordinary conditions can be mixed in the rules. For example, the two recipes below achieve roughly the same thing, but the latter option takes fewer steps if the email is for you.
:0:
* -1^0
* 1^0 ? formail -c -x"Received:" | fgrep -is 'alladvantage.com'
* 1^0 ? formail -c -x"Received:" | fgrep -is 'ameritech.net'
* 1^0 ? formail -c -x"Received:" | fgrep -is 'bellatlantic.net'
* 1^0 ! ^TO_(myid)@([-a-z0-9_]+\.)*myhost\.mydom
ProbableSpam.mail
:0:
* ! ^TO_(myid)@([-a-z0-9_]+\.)*myhost\.mydom
* 1^0 ? formail -c -x"Received:" | fgrep -is 'alladvantage.com'
* 1^0 ? formail -c -x"Received:" | fgrep -is 'ameritech.net'
* 1^0 ? formail -c -x"Received:" | fgrep -is 'bellatlantic.net'
ProbableSpam.mail
The formail switches in the above are
:0:
* ! ^TO_(myid)@([-a-z0-9_]+\.)*myhost\.mydom
* ^Received:.*(\
alladvantage\.com|\
ameritech\.net|\
bellatlantic\.net)
ProbableSpam.mail
:0:
* 1^0 ^Subject:([ ]$|$)
* 1^0 !^Subject:
NoSubject.mail
As usual, the brackets [] contain a space and a tab.
There are other options to test for an empty "Subject:" or an entirely missing "Subject:" field. The one below puts the subject contents in a variable. The actual recipe then tests if the value of the "SUBJ_" variable is empty. (Also see the feedback about the syntax.)
#Get the subject discarding any leading and trailing blanks
SUBJ_=`formail -xSubject: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
#Test for an empty or missing subject
:0:
* SUBJ_ ?? ^^^^
NoSubject.mail
:0 fhw
* To.*myoldid@myoldhost.myolddom
| formail -i "To: mynewid@mynewhost.mynewdom"
MAILDIR=/home/myid/Mail #The location of your own mail directory
# Whatever other preliminaries
# Whatever other recipes
# Test if the email's sender is in the blacklist
:0
* ? formail -x"From" -x"From:" -x"Sender:" \
-x"Reply-To:" -x"Return-Path:" -x"To:" \
| egrep -is -f black.lst
/dev/null
abc23@airnewz.ccn
abdu@advis.com.tr
adexec@mail.com
dinner@dine.com
friend@public.com
helpingyou@mail.com
mk1977@ms1.kingnet.com.tw
nb8MAMxhq@mail.com
no@body.com
owieuj@peterlink.ru
patkline00@usa.net
promotions@web-vertise.com
unknown@unknown.com
#Get the sender's bare email address from the first "From" line
FROM_=`formail -c -x"From " \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g' \
| awk '{ print $1 }'`
#Get the original subject of the email
#Discard superfluous tabs and spaces
#On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -c -xSubject: \
| expand \
| sed -e 's/  */ /g' \
| sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
#Whatever other recipes you'll use
:0
* ^From:.*infolist@([-a-z0-9_]+\.)*infohost\.infodom
# Avoid email loops
* ! ^X-Loop: myid@myhost\.mydom
{
:0c: #Preserve a copy of the email
Infolist.mail
:0fwh #Adjust some headers before forwarding
| formail -A"X-Loop: myid@myhost.mydom" \
-A"X-From-Origin: ${FROM_}" \
-i"Subject: $SUBJ_ (fwd)"
# Forward the email
:0
!mydept@myhost.mydom
}
:0
* ^From.*info.gov
* ! ^X-Loop: myid@myhost\.mydom
{
:0fwh
| formail -A"X-Loop: myid@myhost.mydom"
:0c
! friend@somehost.domain
:0
! myid2@myhost.mydom
}
The X-Loop is not relevant from the point of the stated problem, but
using it as a safeguard is always advisable.
Feedback:
The reason that the first one does not work is that the
recipients' addresses are separated by space while they should be
separated by a comma [as in]
:0
! friend@somehost.domain,myid2@myhost.mydom
(I have not tested this one.)
#Define getting the sender's address
#Discard any leading and trailing whitespaces
FROM_=`formail -rt -xTo: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
#Whatever other recipes in between.
#Return certain email
:0
#
# Is the email from a frequent spam domain?
# (Note: fgrep takes no regular expressions)
* ? formail -c -x"Received:" | fgrep -is 'cyperspam.com'
#
# Is it for a mailing list rather than to me?
* ! ^TO_(myid|myFirst\.mySecond)@([-a-z0-9_]+\.)*myhost\.mydom
#
# Avoid forgeries that pretend to be from my own site
* ! $ ? echo ${FROM_} | fgrep -is 'myhost.mydom'
* $ ? echo ${FROM_} | fgrep -is '.'
* $ ? echo ${FROM_} | fgrep -is '@'
#
# Avoid email loops
* ! ^X-Loop: myid@myhost\.mydom
{
# Make a temporary file of the message to be returned
:0c:formail.lock
# Discard whitespaces, insert a leading blank
| expand | sed -e 's/[ ]*$//g' | sed -e 's/^/ /' > return.tmp
# Prepare and send the rejection
# Be sure to customize your sendmail path
:0:formail.lock
| (formail -r -I"Subject: Rejected mail: Recipient refusal" \
-A"X-Loop: myid@myhost.mydom" ; \
echo "--- begin rejected mail ---" ; \
cat return.tmp ; \
echo "--- end rejected mail ---" ; \
rm -f return.tmp) \
| /usr/lib/sendmail -t
}
:0
* ^From:.*(charpie|charpie5266)@mydeja\.com
{ REJECT="charpie5266@mydeja.com" }
:0
* ^From:.*umidextr@([-a-z0-9_]+\.)*mindfall\.com
{ REJECT="umidextr@mindfall.com" }
:0
* ^From:.*(rasch|Greg.*\.Rasch)@([-a-z0-9_]+\.)*millkirn\.com
{ REJECT="rasch@millkirn.com" }
:0
* ^From:.*(daren|Daren[_\.]Risenthal)@([-a-z0-9_]+\.)*slunet\.org
{ REJECT="daren@slunet.org" }
:0
* ! REJECT ?? ^^^^
{
:0
{ RULE="These users I do not want to talk with" }
:0cw
| expand | sed -e 's/[ ]*$//g' | sed -e 's/^/ /' > return.tmp
:0:procmail.lock
| (formail -r -I"To: ${REJECT}" \
-I"Subject: Rejected mail: Recipient refusal" \
-A"X-Loop: myid@myhost.mydom" ; \
echo "--- begin rejected mail ---" ; \
cat return.tmp ; \
echo "--- end rejected mail ---" ; \
rm -f return.tmp) \
| /usr/lib/sendmail -t
}
Note how the above set of rules has two parts, the actual detection
plus the return address definition, and the return action. The
latter could be written in many alternative ways, including
:0
* ! REJECT ?? ^^^^
{
:0cw
| expand | sed -e 's/[ ]*$//g' | sed -e 's/^/ /' > return.tmp
:0 fwh
| formail -r \
-A"Subject: Rejected mail: Recipient refusal" \
-A"From: myid@myhost.mydom" \
-A"X-Loop: myid@myhost.mydom" ; \
echo "--- begin rejected mail ---" ; \
cat return.tmp ; \
echo "--- end rejected mail ---" ; \
rm -f return.tmp
:0
! ${REJECT}
}
#Note: On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -c -xSubject: \
| sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
:0
# Was it to me
* ^TO_myoldid@myoldhost\.myolddom
# Ignore messages for daemons
* ! ^FROM_DAEMON
# Avoid email loops
* ! ^X-Loop: myid@myhost\.mydom
{
:0 c
! myid@myhost.mydom
:0:dejatold.lock
| formail -rD 8192 dejatold.cache
:0 eh
| (formail -r \
-A"X-Loop: myid@myhost.mydom" \
-I"Subject: Changed email address" ; \
echo "Dear Sender," ; \
echo "" ; \
echo "Thank you for your email about" ; \
echo "\"${SUBJ_}\"" ; \
echo "" ; \
echo "My email address has changed." ; \
echo "Old: myoldid@myoldhost.myolddom" ; \
echo "New: myid@myhost.mydom" ; \
echo "Your email has been forwarded to my new address." ) \
| /usr/lib/sendmail -oi -t
}
Some explanations:
From: ts@UWasa.Fi (Timo Salmi)
Newsgroups: comp.mail.misc
Subject: Re: Procmail: How do I filter by the body
Date: Sun Apr 23 09:34:38 EET DST 2000
X-Comment: Slightly modified
I am trying to save all the messages that
come to me with "mypassword" in the body to a folder called
password. How do I do that?
As the manuals state:
Hence, all there is to it is
:0 B:
* mypassword
password
If you want your password case sensitive then use ":0 BD:".
From: ts@UWasa.Fi (Timo Salmi)
Newsgroups: comp.mail.misc
Subject: Re: Question of procmail newbie
Date: Tue Nov 23 23:09:41 EET 1999
X-Comment: Slightly modified
How could I solve the following problem with procmail: I receive e-mails with a body like this:
#Preliminaries
SHELL=/usr/bin/sh #Use the Bourne shell (check your path!)
CATE=`cat | egrep "^Category:" | awk '{ print $2 }'`
SCAT=`cat | egrep "^Subcategory:" | awk '{ print $2 }'`
FILE=`cat | egrep "^File:" | awk '{ print $2 }'`
#Whatever other recipes
:0B:Procmail.lock
* ^Category:[ ].+[a-z0-9]
* ^Subcategory:[ ].+[a-z0-9]
* ^File:[ ].+[a-z0-9]
| mkdir ${CATE} ; mkdir ${CATE}/${SCAT} ;\
cat >> ${CATE}/${SCAT}/${FILE}
#Whatever other recipes
As a validity check the condition lines require that all the
key-lines are present in the email message body and that the lines
contain names.
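The key-line extraction can be tried on its own in plain shell. The sketch below is hedged on two points. The helper names field_value and safe_name are made up for the illustration, and the name check is an added precaution that is not part of the recipe above: since the directory and file names come from the body of an incoming email, it seems prudent to reject anything that could point outside the intended directory.

```shell
#!/bin/sh
# Print the first word after "Key:" from the text on standard input.
# $1 = key name, for example Category
field_value() {
    grep -E "^$1:" | head -n 1 | awk '{ print $2 }'
}

# Accept only plain names made of letters, digits, dot, dash and
# underscore, and not starting with a dot.
safe_name() {
    case $1 in
        ''|.*|*[!A-Za-z0-9._-]*) return 1 ;;
        *) return 0 ;;
    esac
}

# A made-up message body for the demonstration.
body='Category: tools
Subcategory: unix
File: readme.txt'

cate=$(printf '%s\n' "$body" | field_value Category)
if safe_name "$cate"; then
    echo "would store under $cate/"
else
    echo "refusing suspicious name"
fi
```

In a real recipe "mkdir -p" would also spare the separate mkdir calls, but that is a matter of taste.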
Next, let's consider a more tricky task. Find from the body of the text the last line that potentially contains the string "mailto:". Insert the contents of that line into a MAILTO_ variable.
:0
* ^Subject:.*Whatever
{
:0
{ MAILTO_=`sed -e '1,/^$/ d' \
| egrep "mailto:" \
| tail -1 \
| expand \
| sed -e 's/^[ ]*//g' -e 's/[ ]*$//g' \
| sed -e 's/[^o]://g' -e 's/^://g' \
| awk -F: '{ print $2 }' | awk '{ print $1 }'` }
:0:
WhichEverFolderYouWant
}
Consider the MAILTO_ construct. (The rest of the recipe should be self-explanatory.)
:0
{
:0 fhw
| cat - ; \
echo "===== Filtered email ====="
:0:
${DEFAULT}
}
So far so good. Next let's add the forwarding so that the token will only appear in the forwarded message. (If you wish to change that, adjust the order of the rules.)
:0
{
:0c:
${DEFAULT}
:0 fhw
| cat - ; \
echo "======= Forwarded Mail =========="
:0
!forward@myhost.mydom
}
Finally, let's add avoiding email loops.
# Discard loops
:0
* ^X-Loop: myid@myhost\.mydom
/dev/null
:0
{
:0c:
${DEFAULT}
:0 fhw
| cat - ; \
echo "======= Forwarded Mail =========="
:0 fhw
| formail -A"X-Loop: myid@myhost.mydom"
:0
!forward@myhost.mydom
}
# Matching a few undelivery and such reports
:0:
* ^Subject:.*Undeliver(ed|able) (e)?mail|\
^Subject:.*Returned (spam )?(e)?mail
* ^TO_(myid|firstname\.lastname)@([-a-z0-9_]+\.)*myhost\.mydom
Returned.mail
Consider the first rule of the recipe above. It will match all email with the following on the "Subject:" line in the header:
* ^Subject:[ ]+Undeliver(ed|able) (e)?mail
In other words only spaces and/or tabs are allowed between "Subject:" and the start of the actual subject.
Let's consider another example. Say that we have two hosts
:0:
* ^From:.*cyber.com([^\.]|$)
ProbableSpam.mail
That is, do not allow a dot after the .com or alternatively require that the line ends there. However, cyber.comet would be matched! Thus, depending on what you want to achieve, you might have e.g.
:0:
* ^From:.*cyber.com( |"|>|$)
ProbableSpam.mail
What is the difference between the rules below?
* ^From:.*myid@([-a-z0-9_]+\.)*myhost.mydom
* ^From:.*myid@([-a-z0-9_]+\.)?myhost.mydom
* ^From:.*myid@([-a-z0-9_]+\.)+myhost.mydom
The first one matches any of
Symbol | Interpretation |
---|---|
* | Match zero or more times |
? | Match zero or one times |
+ | Match one or more times |
. | Any character |
[ ] | Match from the list within the brackets |
^ | The start of the line (within [] however, a negation) |
$ | The end of the line |
\ | Quote the next character to take it literally |
( ) | Grouping |
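The difference between the *, ? and + repetitions is easiest to see by trying the three host name variants on a few addresses. The sketch below uses "grep -E", whose extended regular expressions behave like procmail's for these particular symbols. The names hits and show are made up, the addresses are invented, and the dots in myhost.mydom are escaped here so that they match literally.

```shell
#!/bin/sh
# Zero or more, zero or one, and one or more leading host labels.
STAR='^From:.*myid@([-a-z0-9_]+\.)*myhost\.mydom'
OPT='^From:.*myid@([-a-z0-9_]+\.)?myhost\.mydom'
PLUS='^From:.*myid@([-a-z0-9_]+\.)+myhost\.mydom'

# Succeeds if header line $2 matches pattern $1.
hits() {
    printf '%s\n' "$2" | grep -Eiq -- "$1"
}

# Report the outcome: $1 = label, $2 = pattern, $3 = header line.
show() {
    if hits "$2" "$3"; then
        echo "$1 matches:        $3"
    else
        echo "$1 does not match: $3"
    fi
}

for addr in 'myid@myhost.mydom' \
            'myid@a.myhost.mydom' \
            'myid@a.b.myhost.mydom'; do
    show '*' "$STAR" "From: $addr"
    show '?' "$OPT" "From: $addr"
    show '+' "$PLUS" "From: $addr"
done
```

Only the * variant accepts all three addresses: ? stops at a single extra label and + insists on at least one.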
VAR1_=Whichever expression you devise
:0:
* VAR1_ ?? regexp
wherever
But you can build rules like
VAR1_=Whichever expression you devise
VAR2_=whatever
:0:
* $ VAR1_ ?? ${VAR2_}
wherever
Note, however, that the above still is regular expression matching, not an equality.
The blank after the first $ is significant. It tells that the variable references on the line (${VAR2_}) are to be expanded, not to be taken as a literal text.
Feedback:
That's easily resolved using $\var expansion and anchoring both ends
of the regexp:
* VAR1_ ?? $ ^^$\VAR2_^^
That condition will succeed if and only if VAR1_ and VAR2_ have the same contents, with the possible exception of VAR1_ having one more trailing newline than VAR2_.
Philip Guenther
(Timo's addendum: As far as I understand \< is a word-boundary in procmail. Hence \< is best avoided, when not used as an actual boundary.)
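Why regular expression matching is not the same as an equality test can also be seen in plain shell. The sketch below is not procmail code; `matches_as_regex` and `equals_literally` are hypothetical helpers, and `grep -E` stands in for procmail's matching.

```shell
#!/bin/sh
# Regular expression matching: the second argument is read as a pattern
# and may match anywhere within the first argument.
matches_as_regex() {
    printf '%s\n' "$1" | grep -Eq -- "$2"
}

# Strict equality: both arguments are compared as literal text.
equals_literally() {
    [ "$1" = "$2" ]
}

VAR1_='abc'
VAR2_='a.c'      # the dot is a wildcard when read as a regexp

matches_as_regex "$VAR1_" "$VAR2_" && echo "regexp: '$VAR1_' matches '$VAR2_'"
equals_literally "$VAR1_" "$VAR2_" || echo "equality: '$VAR1_' differs from '$VAR2_'"
```

Quoting the special characters and anchoring both ends of the pattern is what turns the match into a comparison of the whole contents.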
#Get the subject discarding any leading and trailing blanks
#Note: On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -xSubject: \
  | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
:0
* YourFirstSelectionCriterion
{
  :0 fwh
  | formail -I"Subject: WhateverYouAdd_1 ${SUBJ_}"
  :0:
  YourFirstFolder
}
:0
* YourSecondSelectionCriterion
{
  :0 fwh
  | formail -I"Subject: WhateverYouAdd_2 ${SUBJ_}"
  :0:
  YourSecondFolder
}
The flags are as follows: "f" use the pipe as a filter, "w" wait for the filter to finish before proceeding, "h" feed only the header of the email message to the pipe.
The -I option in formail removes and replaces the old header. Should you wish to retain the old subject header with an "Old-" prefix added, use -i instead.
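The cleanup applied to the subject can be tested separately from procmail. A minimal sketch, in which `trim_blanks` is a hypothetical name and the `printf` line merely stands in for the output of `formail -xSubject:`:

```shell
#!/bin/sh
# Same cleanup as in the recipe: expand tabs into blanks, then strip
# the leading and trailing blanks. Reads stdin, writes stdout.
trim_blanks() {
    expand | sed -e 's/^[ ]*//' -e 's/[ ]*$//'
}

# Made-up input with a leading tab and surrounding blanks
printf '\t  Timo testing   \n' | trim_blanks
```

The `expand` step matters because the bracket in the sed expressions contains only a blank, so a tab would otherwise survive.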
Consider
#Note: On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -c -xSubject:`
# Responses to filter reports
:0:
* -1^0
* 1^0 $ ? echo \"${SUBJ_}\" | fgrep -is 'Re: Filter report'
* 1^0 ^TO_myid@([-a-z0-9_]+\.)*myhost\.mydom
Response.mail
#Note: On some systems -xSubject: has to be -x"Subject: "
#Note: The second sed has two blanks before the star (squeeze runs of blanks)
SUBJ_=`formail -c -xSubject: \
  | expand | sed -e 's/[;|\$\`\\]/ /g' \
  | sed -e 's/  */ /g' \
  | sed -e 's/(/\\\(/g' -e 's/)/\\\)/g' \
  | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
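The sanitizing pipeline can be tried on its own before it is trusted inside a recipe. The sketch below is a stand-alone version with the hypothetical name `sanitize_subject`; because it is not wrapped in backquotes, one level of backslash quoting disappears compared with the procmail assignment. It assumes that runs of blanks are meant to be squeezed into one blank.

```shell
#!/bin/sh
# Hypothetical stand-alone version of the cleanup pipeline:
#  1. replace characters the shell would act upon ( ; | $ ` \ ) by blanks
#  2. squeeze runs of blanks into a single blank
#  3. put a backslash in front of parentheses
#  4. strip the leading and trailing blanks
sanitize_subject() {
    expand \
    | sed -e 's/[;|$`\\]/ /g' \
    | sed -e 's/  */ /g' \
    | sed -e 's/(/\\(/g' -e 's/)/\\)/g' \
    | sed -e 's/^[ ]*//' -e 's/[ ]*$//'
}

# Made-up subject containing troublesome characters
printf '%s\n' 'Re: Filter report; `id` | $HOME (urgent)' | sanitize_subject
```

The point of the exercise is that the cleaned subject can be passed through `echo` in a procmail condition without the shell running anything that a sender has planted in the Subject line.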
SHELL=/usr/bin/sh                # Use Bourne shell
MAILDIR=${HOME}/Mail             # Customize as appropriate
LOGFILE=${MAILDIR}/procmail.log  # Your procmail log
VERBOSE=yes                      # Produce full information
LOGABSTRACT=all                  #      - " -
However, this produces so much information that it is not convenient for routine checking by visual examination. But you can include a suitable (dummy) variable definition in each one of your recipes and then search the log file for occurrences of that variable. Here is an example demonstrating how it goes. Consider a recipe that originally is
# Discard probable spam mail, set 1
:0:
* ! ^TO_(myid)@([-a-z0-9_]+\.)*myhost\.mydom
* 1^0 ^From:.*alladvantage.com
* 1^0 ^From:.*ameritech.net
* 1^0 ^From:.*bellatlantic.net
ProbableSpam.mail
Change this to be
:0
* ! ^TO_(myid)@([-a-z0-9_]+\.)*myhost\.mydom
* 1^0 ^From:.*alladvantage.com
* 1^0 ^From:.*ameritech.net
* 1^0 ^From:.*bellatlantic.net
{
  :0
  { RULE="Discard probable spam mail, set 1" }
  :0:
  ProbableSpam.mail
}
Apply the same principle for all your recipes in your ~/.procmailrc file. Then, as email has arrived, you can check which rules have been used by searching the log file with the command grep "RULE=" ${HOME}/Mail/procmail.log. If you need this regularly, make the grep search one of your Unix scripts:
#!/usr/bin/sh
grep "Assigning \"RULE=" ${HOME}/Mail/procmail.log
In the altered procmail recipe, further up, carefully note some of the syntax
:0
* ^TO_my-mailing-list
{
  :0
  * ^From:.*@([-a-z0-9_]+\.)*myhost\.mydom
  {
    :0
    { RULE="To my-mailing-list, probably legitimate" }
    :0:
    ${DEFAULT}
  }
  :0E
  {
    :0
    { RULE="To my-mailing-list, probably spam" }
    :0:
    Spam.mail
  }
}
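Once the recipes carry RULE= assignments, the log can be summarized instead of just listed. The script below is a sketch only: `rule_counts` is a hypothetical helper, the default log path is the one assumed earlier, and the exact wording of the log line is taken from the grep command above.

```shell
#!/bin/sh
# Hypothetical helper: list the RULE= texts found in a procmail log
# together with how often each occurs, most frequent first.
rule_counts() {
    grep 'Assigning "RULE=' "$1" \
    | sed -e 's/^.*Assigning "RULE=//' -e 's/"$//' \
    | sort | uniq -c | sort -rn
}

# Use the first argument as the log file, else the assumed default
LOG_=${1:-${HOME}/Mail/procmail.log}
if [ -r "${LOG_}" ]; then
    rule_counts "${LOG_}"
else
    echo "No readable log at ${LOG_}" >&2
fi
```

Remember that the assignments only show up in the log while VERBOSE is on.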
LOGFILE=$HOME/.MailFilter_log
SHELL=/bin/sh
:0 B
* .*spam
{
LOG="TRAPPED SPAM - "
:0
/dev/null
}
#- Accept All other mail -#
:0
{
LOG="ACCEPTED MAIL - "
:0
$ORGMAIL
}
The output looks something like this:
TRAPPED SPAM - From spammer@spam.com Thu May 16 03:52:42 2002
Subject: Make Money Fast
Folder:
/dev/null 43140
ACCEPTED MAIL - From goodguy@save.com Thu May 16 03:54:08 2002
Subject: Legitimate email message
Folder:
var/spool/mail/username 4683
My comment: If you look at the example for testing individual procmail recipes you'll see that for logging one sets (usually for troubleshooting)
#Troubleshooting:
VERBOSE=yes
LOGABSTRACT=all
For the method in the feedback above, leave those variables out or set
VERBOSE=no
However, do not set
LOGABSTRACT=no
because then you'll miss all but the actual log variable identification. Instead, just leave the line out.
# Customize these paths if they do not match yours
SHELL=/usr/bin/sh
SENDMAIL=/usr/lib/sendmail
:0
* ^Subject:.*Timo testing
{
# Put the email intact in the default folder
:0c:
${DEFAULT}
# The "c" flag above tells the recipe to continue
# Now we prepare a different version of the message
:0
{
# Get the subject into a variable
# Expand the possible tabs into blanks
# Discard any leading and trailing blanks
# On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -xSubject: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
# Get the body of the message into a variable
# Accept only the first five lines
# Discard newlines, i.e. put everything on one line
BODY_=`sed -e '1,/^$/ d' | head -5 | tr -d '\n'`
}
# Prepare and send a message with no body
# -X "" extracts just the header (discards the body)
# Plug in the new subject
# Content fields might cause problems if not discarded
# Change to To: address
:0:proc.lock
| formail -X "" \
-I"Subject: ${SUBJ_} ${BODY_}" \
-i"Content-Type:" \
-i"Content-Length:" \
-I"To: your@second.address" \
| ${SENDMAIL} -t
}
The line
BODY_=`sed -e '1,/^$/ d' | head -5 | tr -d '\n'`
retrieves the first five lines from the body of the text. It would be more useful to retrieve a specified number of characters from it. Say we wish to retrieve 160 characters. This is how to do that.
BODY_=`sed -e '1,/^$/ d' | tr -d '\n' | dd bs=1 count=160`
Solving the alternative of having a maximum of 160 characters in the concatenated SUBJ_ and BODY_ is left as an exercise to the reader.
There also is another, more important improvement that can be made in the action above. Replace tr -d '\n' with tr '\n' ' ' so that when the lines are concatenated a space is put in between them.
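The truncation can be tested on a saved message before it goes into a recipe. The sketch below combines the character limit with the blank-for-newline replacement; `body_excerpt` is a hypothetical name, and the redirection of dd's statistics to /dev/null is an addition that only keeps the terminal tidy.

```shell
#!/bin/sh
# Hypothetical helper: read a whole message on stdin, drop the header
# (everything up to the first empty line), join the body lines with
# blanks and keep at most $1 characters (bytes, strictly speaking).
body_excerpt() {
    sed -e '1,/^$/ d' | tr '\n' ' ' | dd bs=1 count="$1" 2>/dev/null
}

# Made-up message for demonstration
printf 'From: a@b.c\nSubject: Timo testing\n\nfirst line\nsecond line\n' \
    | body_excerpt 16
echo
```

Since every newline becomes a blank, an excerpt that reaches the end of the body ends with a blank, which is harmless when the text is appended to a subject.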
:0
* ^Subject: Whatever
{
:0 fbw
| sed -e '/^-- /,$ d'
:0:
${DEFAULT}
}
Let's look at what we've got:
Replacing
sed -e '/^-- /,$ d'
with
sed -e '/^-- /,/^$/ d'
will instead delete everything starting from the "-- " until
the first encountered empty line. Thus if there is e.g. an
attachment after the signature, the attachment will not be thrown
away.
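The two sed commands are easy to compare on a sample body. In this sketch the function names and the sample text are made up; only the two sed expressions come from the recipe.

```shell
#!/bin/sh
# Delete from the signature separator ("-- ") to the end of the input.
strip_sig_to_end() {
    sed -e '/^-- /,$ d'
}

# Delete from the separator only up to the first empty line after it.
strip_sig_block() {
    sed -e '/^-- /,/^$/ d'
}

# Made-up body: text, a signature, an empty line, then further content
sample_body() {
    printf 'Body text\n-- \nTimo\n\nFurther content\n'
}

echo '--- to the end of the message:'
sample_body | strip_sig_to_end
echo '--- signature block only:'
sample_body | strip_sig_block
```

Note that if no empty line follows the signature, the second form also deletes everything to the end of the message.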
Note that the "^H" is not a "^" and an "H", but a CTRL-H, i.e. ASCII 8 (the backspace character). To make the "manuals" file executable type "chmod u+x manuals".
#!/bin/sh
TODIR=${HOME}/myman
echo ${TODIR}
man egrep | sed -e 's/_^H//g' > ${TODIR}/egrep.man
man formail | sed -e 's/_^H//g' > ${TODIR}/formail.man
man procmail | sed -e 's/_^H//g' > ${TODIR}/procmail.man
man procmailex | sed -e 's/_^H//g' > ${TODIR}/procmaex.man
man procmailrc | sed -e 's/_^H//g' > ${TODIR}/procmarc.man
man regexp | sed -e 's/_^H//g' > ${TODIR}/regexp.man
man sendmail | sed -e 's/_^H//g' > ${TODIR}/sendmail.man
ls -lF ${TODIR}
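If typing a literal CTRL-H into the script is awkward in your editor, the backspace can be produced with printf instead. A minimal sketch, assuming a printf that understands the \b escape (as POSIX requires); `BS_` and `strip_underline` are hypothetical names.

```shell
#!/bin/sh
# Build the backspace character (ASCII 8) instead of typing CTRL-H.
BS_=$(printf '\b')

# Remove the underscore-backspace pairs that man uses for underlining.
strip_underline() {
    sed -e "s/_${BS_}//g"
}

# Made-up sample: the word "procmail" underlined the way man does it
printf '_\bp_\br_\bo_\bc_\bm_\ba_\bi_\bl\n' | strip_underline
```

The sed script is in double quotes here so that the shell can substitute the backspace character into it.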
Many of the recipes in this FAQ utilize sed
and/or awk. Some useful links (note, however, as is common with
links, I can't guarantee that they still are current):
Since this is a collection of procmail advice, not of vacation program advice, I'll assume that you are reasonably familiar with the vacation program. If not, start with "man vacation". You have to use procmail to customize the ~/.vacation.msg file because when invoked via procmail, the vacation $SUBJECT variable is not necessarily set.
Usually, when vacation is used, it is first called interactively to create the ~/.vacation.msg file and to replace the ~/.forward file. If you are going to use the procmail solution it is very important not to do this. In particular, the ~/.forward file must not be touched in any way. The reason is that in this solution it is used to invoke procmail, not vacation. (The vacation program is, of course, called by procmail now.)
# Set a number of variables high up in your ~/.procmailrc
#
VACATION=/usr/bin/vacation
ONVACAT=yes
VACFREQ=5d
VACMSG=${HOME}/.vacation.msg
MYNAME_="MyFirstName MyLastName"
MYEMAIL_=myid@myhost.mydom
# Get the subject discarding any leading and trailing blanks
# Note: On some systems -xSubject: has to be -x"Subject: "
#
SUBJ_=`formail -xSubject: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
# Prepare the vacation message's base
# This is done only once in ~/.procmailrc
#
:0 cwi
* ONVACAT ?? ^^yes^^
| echo "From: ${MYEMAIL_}" > ${VACMSG} ;\
echo "Subject: ${MYNAME_}, away from my mail" >> ${VACMSG} ;\
echo "X-Loop: myid@myhost.mydom" >> ${VACMSG} ;\
echo "" >> ${VACMSG} ;\
echo "Thank you for your email about:" >> ${VACMSG} ;\
echo "\"$SUBJ_\"" >> ${VACMSG} ;\
echo "" >> ${VACMSG} ;\
echo "Your email will be seen to when I return." >> ${VACMSG} ;\
echo "" >> ${VACMSG} ;\
cat ${HOME}/.signature >> ${VACMSG}
# Here we go invoking vacation and also saving the email
# You might have several different recipes like this
#
:0
* ^Subject:.*Whatever
{
:0
{ RULE="Testing" }
:0 cwi
* ONVACAT ?? ^^yes^^
* ! ^X-Loop:.*myid@myhost\.mydom
| ${VACATION} -t${VACFREQ} myid
:0:
WhateverFolder
}
Feedback:
Maybe I [Collin Park] can add one more comment: I think you need
a global LOCKFILE to cover the area from when you generate the
vacation message to the place where you invoke $VACATION.
Otherwise, message #N may generate .vacation.msg, then
message #N+1 overwrites it before #N invokes $VACATION.
On rare occasions I have also been asked to email my own personal ~/.procmailrc or my own spamfoiling scripts. The answer is a definite no. There are two main reasons. First, that material is private. Second, I have neither the willingness nor the time to send out material to users on individual requests. If and when I want to share my material I make it available for users to retrieve themselves via WWW or FTP.
Let's see if we can put to work the methods presented in this FAQ to solve some tasks, part of them having come up on the Usenet news.
Ex.1) Keep a copy of incoming email, and at
the same time, get only the first five lines from the message body
and forward it to another account.
# Discard potential email loops
:0
* ^X-Loop: myid@myhost\.mydom
/dev/null
:0
* Any rule(s) you might wish to have
{
  # Keep a copy, but don't stop yet ( the c )
  :0c:
  ${DEFAULT}
  # Comment with "Old-" the Content-Length field from the header
  # Ensure that a whitespace exists between field name and content
  :0 fwh
  * ^Content-Length:
  | formail -z -i"Content-Length:"
  # Add the loop avoidance
  # ( f for piping; w for waiting for completion; h for headers )
  :0 fwh
  | formail -A"X-Loop: myid@myhost.mydom"
  # Truncate the body ( the b ) to five lines
  :0 fwb
  | head -5
  # Forward to the other account
  :0
  ! myid2@myhost.mydom
}
It is important to handle the Content-Length header field when the length of the email is altered. This is done to ensure that the receiving email program will not break the forwarded message when it is read. The -i switch is used to retain the information about the original message length for the attention of the receiver.
Ex.2) Forward the first 10 lines of the
message body to the user's second account while preserving all the
original message headers. I.e., at the receiving side, the
user wants to see all the message travel history and only the first 10
lines of the message body.
This is a more complicated version of the first exercise. The
transformed task is not trivial, since when you forward, the
original message headers will be replaced by your forwarding
headers. Therefore, you'll have to see to preserving also the
original headers. Below is how I would solve the problem based on
several items in this FAQ.
# A trick to extract the subject into a variable
# Note: On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -c -xSubject: | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
# The actual recipe to solve the exercise starts here
:0
* Whatever condition(s) you wish to select the messages for forwarding
# Avoid email loops
* ! ^X-Loop: myid@myhost\.mydom
{
:0c: #If you want to, preserve a full copy of the email, else omit
${DEFAULT}
:0fwh #Preserve the information about the original content length
* ^Content-Length:
| formail -z -i"Content-Length:"
:0fwb #Truncate the body of the message to ten lines
| head -10
:0fwh #Insert a blank line at the beginning of the body for clarity
| cat - ; echo ""
:0fwh #Store the original headers, quoting them to avoid problems
| sed -e 's/^/\> /'
:0fwh #Insert some of your own information before forwarding
| formail -A"X-Loop: myid@myhost.mydom" \
-A"X-Info: Forwarded body truncated to 10 lines" \
-i"Subject: $SUBJ_ (fwd)"
#Finally, forward the adjusted email
:0
!my2dnId@myhost.mydom
}
# Discard potential email loops
:0
* ^X-Loop: myid@myhost\.mydom
/dev/null
Feedback:
The recipe with head probably needs an "i" on the flags line,
as:
Ex.3) Match a potential [TS999] identification in the Subject header, such as "[TS001] Timo testing". If found, insert a "Subject id: [TS999]" as the first line in the body of the message. (The rest of the original subject line must not reappear in the id.)
:0
* ^Subject:.*\/\[TS[0-9]+\]
{
  :0 fhw
  | cat - ; \
    echo "Subject id: ${MATCH}"
  :0:
  ${DEFAULT}
}
But what if you do want to include the rest of the original subject line? In that case use
* ^Subject:.*\/\[TS[0-9]+\].*
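What ends up in MATCH can be approximated in plain shell when experimenting with the pattern. The sketch below uses sed as a stand-in for procmail's \/ operator; `subject_id` is a hypothetical helper and the sample subject is the one from the exercise.

```shell
#!/bin/sh
# Hypothetical stand-in for procmail's MATCH: print the [TSnnn] id of
# a Subject line, or nothing at all if the line carries no such id.
subject_id() {
    printf '%s\n' "$1" \
    | sed -n -e 's/^Subject:.*\(\[TS[0-9][0-9]*]\).*$/\1/p'
}

subject_id 'Subject: [TS001] Timo testing'
```

Leaving the trailing `.*` inside the parentheses would correspond to the second condition, where the rest of the subject line is kept as well.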
Ex.4) Multi-part messages (which typically
include attachments) have in their headers a field like the two
examples below:
Content-Type: multipart/mixed; boundary=ELM965173874-25050-0_
Content-Type: multipart/mixed;
boundary="------------BA45271FBDAA479CECA7E20A"
Write a recipe that inserts into a variable (call it BOUND) the
boundary string. Note that the potential quotes (") are not to be
part of that string. Also note that the header might be divided on
multiple lines as in
Content-Type: multipart/mixed;
boundary=ELM965173874-25050-0_
There are alternative solutions, which are not necessarily quite
equivalent. The first one is putting high up in your ~/.procmailrc
recipe file the line(s)
BOUND1=`formail -z -x"Content-Type:" \
| awk -F= '{ print $2 }' \
| sed -e 's/\"//g' | tr -d '\n'`
A second one is:
:0h
* ^Content-Type:
{ BOUND2=`egrep -i 'boundary=' \
| awk -F= '{ print $2 }' | sed -e 's/\"//g'` }
This was not in the exercise, but you can then have recipes like
:0:
* ! BOUND2 ?? ^^^^
WhateverFolder
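The extraction itself can be checked against sample headers without sending any email. A sketch along the lines of the second solution; `boundary_of` is a hypothetical name, and grep is used here in place of egrep.

```shell
#!/bin/sh
# Hypothetical helper: read header text on stdin and print the boundary
# string without the optional quotes.
# Caveat: a boundary that itself contains "=" would be cut short.
boundary_of() {
    grep -i 'boundary=' | awk -F= '{ print $2 }' | sed -e 's/"//g'
}

# The folded, quoted example header from the exercise
printf 'Content-Type: multipart/mixed;\n boundary="------------BA45271FBDAA479CECA7E20A"\n' \
    | boundary_of
```

Because the line with boundary= is picked out by grep, it does not matter whether the header is folded onto a second line or not.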
Ex.5) Identify if the
arriving email is in Korean. If so, return the message to the sender
and his/her postmaster. Ignore a potential Reply-To: field in the
header. Avoid email loops. Avoid forgeries which appear to come from
your own host. Avoid forgeries which lack a host name. Be careful
not to take Finnish/Swedish or French as
Korean.
This is quite a difficult exercise with many details involved.
# Get the sender's address, ignore Reply-To:
FROM_=`formail -c -I"Reply-To:" -rt -xTo: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
# Get the sender's host
FHOST_=`echo "${FROM_}" | awk -F@ '{ print $2 }'`
# Your path to sendmail
SENDMAIL="/usr/lib/sendmail"
# Reject probable Korean email using character scoring
:0
* ! ^X-Loop:.*myid@myhost\.mydom
* ! $ ? echo ${FHOST_} | fgrep -is 'myhost.mydom'
* $ ? echo ${FHOST_} | fgrep -is '.'
{
:0BD
* -1^1 .
* 2^1 =[0-9A-F][0-9A-F]
* 20^1 [¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿]
* 20^1 [ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß]
* 20^1 [àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]
* 20^1 =[89A-F][0-9A-F]
* -20^1 [åÅäÄöÖàáâçèéêë]
* -20^1 =(E5|C5|E4|C4|F6|D6|E0|E1|E2|E7|E8|E9|EA|EB)
{
:0
{ RULE="Probable Korean email" }
#
:0c:${HOME}/procmail.lock
| expand | sed -e 's/[ ]*$//g' \
| sed -e 's/^/ /' > ${HOME}/procmail.reject.korean
#
:0:${HOME}/procmail.lock
| (formail -r -I"Subject: Autorejected email" \
-I"To: ${FROM_}" \
-I"Cc: postmaster@${FHOST_}" \
-A"X-Loop: myid@myhost.mydom" ; \
echo "--- begin rejected probable Korean email ---" ; \
echo "" ; \
cat ${HOME}/procmail.reject.korean ; \
echo "--- end of rejected probable Korean email ---" ; \
rm -f ${HOME}/procmail.reject.korean) \
| ${SENDMAIL} -t
}
}
Ex.6) If the subject of the email contains the
identifier [INFO], in capitals, put the body of the incoming email into
a temporary file. Ensure that the name of the temporary file is unique.
Insert the full subject line at the top of the temporary file. (Why, and
what then is beyond this exercise.)
#Get the subject discarding any leading and trailing blanks
#Note: On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -xSubject: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
# Assign a temporary file name
TMPFILE_=proctemp.$$
:0D
* ^Subject.*\[INFO\]
{
:0 fwbi
| echo "Subject: ${SUBJ_}" > ${TMPFILE_}; \
echo >> ${TMPFILE_}; \
cat >> ${TMPFILE_}
}
Ex.7) If the email comes from a certain
sender, check if the time-zone information is present in the Date
header. If not, add it assuming +3 hours.
#Get the date discarding any leading and trailing blanks
DATE_=`formail -xDate: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
:0
* ^From:.*TheCertainSender
* ! ^Date:.*(EET|DST|GMT)
{
:0 fwhi
| formail -i"Date: ${DATE_} +0300 (EET DST)"
:0:
${DEFAULT}
}
Ex.8) The simple spamfoiling recipe below
won't work. Correct it.
:0:
* !^TO$USER@xxxxxxx.xxx
ProbableSpam.mail
:0
{
:0
{ USER=`whoami` }
:0:
* $ ! ^TO_${USER}@([-a-z0-9_]+\.)*xxxxxxx\.xxx
ProbableSpam.mail
}
The ([-a-z0-9_]+\.)* is optional.
Another solution:
:0:
* $ ! ^TO_${LOGNAME}@([-a-z0-9_]+\.)*xxxxxxx\.xxx
ProbableSpam.mail
Ex.9) Insert at the beginning of the
subject the date/time of receiving the incoming message in the
YYYYMMDD HHMMSS format.
:0
* Whatever rules
{
:0
{ SUBJ_=`formail -c -xSubject: \
| sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'` }
:0
{ DATETIME_=`date "+%Y%m%d %H%M%S"` }
:0 fhwi
| formail -I"Subject: ${DATETIME_} ${SUBJ_}"
:0:
${DEFAULT}
}
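The date format is worth checking at the shell prompt first. In the sketch below `stamp_subject` is a hypothetical helper; it uses %H, which always gives a two-digit hour, so that the time part is six digits also before ten in the morning.

```shell
#!/bin/sh
# Hypothetical helper: prefix a subject with the current date and time
# in the YYYYMMDD HHMMSS format.
stamp_subject() {
    printf '%s %s\n' "$(date '+%Y%m%d %H%M%S')" "$1"
}

stamp_subject 'Timo testing'
```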
Ex.10) This partly is based on an actual
incident. Consider the following recipe with three small, but crucial
syntax errors, and one omission. Find them.
:0
* ^From:.*(\
(abuse(-news)?|acct_closed)@
(pacificnet\.net|\
mindspring\.net|\
InfoAve\.net|\
netcom\.com\|
yahoo\.com|\
alladvantage\.com|\
hotmail\.com))
* ^TO_(myid|myFirstName\.mySecondName)@([-a-z0-9_]+\.)*myhost\.mydom
{
:0
{RULE="Abuse reception notes"}
:0
ReceivedNotes
}
The answer is a bit further down
:
:
:
:
:
:
:
:
:
:
:
:
:
:
:0
* ^From:.*(\
(abuse(-news)?|acct_closed)@\
(pacificnet\.net|\
mindspring\.net|\
InfoAve\.net|\
netcom\.com|\
yahoo\.com|\
alladvantage\.com|\
hotmail\.com))
* ^TO_(myid|myFirstName\.mySecondName)@([-a-z0-9_]+\.)*myhost\.mydom
{
:0
{ RULE="Abuse reception notes" }
:0:
ReceivedNotes
}
Ex.11) Write a recipe to match the subject
line below. The (RECENT) may or may not be there, and the numbers
will change from posting to posting.
Subject: Re: [SpamCop:(RECENT)38.204.225.29,id:16135684] Make lotsof $$$
:0:
* ^Subject: Re: \[SpamCop:(\(RECENT\))?[0-9\.]+,id:[0-9]+\]
WhateverFolder
Ex.12) It is fairly common that spam email
has the same sender and recipient in the From: and To: fields.
Devise a recipe that detects such postings.
This is not quite as simple as it first sounds, since it is advisable to take into account the fact that the contents of the two fields may not be quite identical even when the actual addresses are the same. Thus, as one possible solution, I would use regular expression matching both ways as below. By default, variable comparisons are regular expression matching, not strict equalities. Also note the avoidance of email loops and of falsely targeting email which one may have sent to oneself.
# Note: the first sed has two blanks before the star (squeeze runs of blanks)
WHOFROM=`formail -xFrom: \
  | expand \
  | sed -e 's/  */ /g' \
  | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
WHOTO=`formail -xTo: \
  | expand \
  | sed -e 's/  */ /g' \
  | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
:0:
* -100^0 ^X-Loop: myid@myhost\.mydom
* -100^0 ^TO_(myid|myFirst\.mySecond)@([-a-z0-9_]+\.)*myhost\.mydom
* -100^0 ^From:.*LegitimateMailingList
* 1^0 $ WHOFROM ?? ${WHOTO}
* 1^0 $ WHOTO ?? ${WHOFROM}
ProbableSpam.mail
Ex.13) Write a (spam avoidance) recipe to
detect email with more than seven recipients in the "To:" header
field. Assume for simplicity that each address will have exactly one
"@" character in it.
:0
* ^Subject:.*The information you requested
{
:0
{
WHOTO=`formail -z -xTo:`
COUNT=`echo ${WHOTO} | sed -e 's/[^@]//g' | wc -c`
COUNT1=`expr ${COUNT} - 1`
ISGT=`expr ${COUNT1} \> 7`
}
:0:
* ISGT ?? ^^1^^
ProbableSpam.mail
}
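The counting trick can be verified outside procmail with a few made-up address lists. This is a sketch only: `count_recipients` and `too_many_recipients` are hypothetical helpers, shell arithmetic is used in place of expr, and the assumption of exactly one "@" per address is the one made in the exercise.

```shell
#!/bin/sh
# Hypothetical helper: count the "@" characters in an address list.
count_recipients() {
    # wc -c also counts the trailing newline, hence the subtraction
    chars_=$(printf '%s\n' "$1" | sed -e 's/[^@]//g' | wc -c)
    echo $(( chars_ - 1 ))
}

# Succeeds if the list holds more than seven recipients.
too_many_recipients() {
    [ "$(count_recipients "$1")" -gt 7 ]
}

# Made-up recipient list
WHOTO='a@x.dom, b@x.dom, c@x.dom'
echo "Recipients: $(count_recipients "${WHOTO}")"
```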
Ex.14) Make procmail forward email that
arrives between 9am and 5pm to a predefined daytime email
address.
:0
# Omit the condition line below if this is for all email
* ^Subject:.*Whatever
{
:0
{
TIME=`date +%H%M`
ISGT=`expr ${TIME} \> 0900`
ISLT=`expr ${TIME} \< 1700`
}
:0
* ISGT ?? ^^1^^
* ISLT ?? ^^1^^
! daytime_forward_address
}
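The time window test can be tried with fixed times instead of waiting for the clock. In this sketch `in_daytime` is a hypothetical helper built on the same expr comparisons as the recipe; note that both limits are exclusive, so 0900 and 1700 themselves fall outside the window.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if an HHMM time lies strictly between
# 0900 and 1700, using the same expr comparisons as the recipe.
in_daytime() {
    [ "$(expr "$1" \> 0900)" = 1 ] && [ "$(expr "$1" \< 1700)" = 1 ]
}

TIME=$(date +%H%M)
if in_daytime "${TIME}"; then
    echo "${TIME}: would forward to the daytime address"
else
    echo "${TIME}: would leave the email alone"
fi
```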
Ex.15) Write a Procmail recipe which
detects if there is a Word document attached to the incoming email.
# Email with a Word document attached
:0
* ^Content-Type: multipart/
{
:0 B
* ^Content-.*attachment.*name=.*\.(doc|rtf)
{
:0
{ RULE="Email with a Word document attached" }
:0:
WordAttachmentEmail
}
}
Aughey, John
Bump, Jorey
Davey, David
Dnes, Walter
Eriksson, Era
Guenther, Philip
Hebeisen, Christoph
Hirvonen, Hannu
Melish, Jacob
Menezes, Evandro
Novak, Curtis
Park, Collin
van Tol, Ruud
Van Steenkist, Vernon
Any errors and inadequacies are, however, solely my own responsibility.
A legal note: The author shall not be liable to the user, the reply
target or any third party for any direct, indirect or consequential
loss or damage arising from using, abusing, or a failure to be able
to use, the information in this message/file howsoever caused. No
warranty is given that all the information contained is correct, or
that it is current.
[ts@uwasa.fi]