<http://www.uwasa.fi/~ts/info/proctips.html>
Copyright © 1999-2002 by Prof. Timo Salmi
Last modified Thu 19-Dec-2002 15:47

Timo's procmail tips and recipes

Although there already is an abundance of procmail material on the net, here are some of my own tips and observations. This tips page is a companion of my Foiling Spam with an Email Password System page. The items on this page are in no particular order.

I want to filter my email automatically. How do I get started with procmail?
Building a testbench. How can I test individual procmail recipes?
I know how to make "and" rules in procmail recipes, but how do I make "or" rules?
How can one perform multiple shell commands on the action line?
How can I find out what the subject of a posting is?
How do I get a copy of the headers of all the incoming email into a separate file?
Would you give some further hints for spam foiling recipes?
I have limited disk space. How can I truncate long messages?
How can I quickly test if my rules with regular expressions match?
How can I detect if the email comes, say, from the .com domain?
What alternatives do I have to detect a sender all through the various header-fields?
How can I extract a valid address from the Reply-To field?
How can I extract the address of the sender's postmaster?
How can I weed out an inordinately long recipient list?
What is this procmail scoring? How can I utilize it?
How can I test if the subject is empty or if the subject field is missing altogether?
How can I modify the "To:" field of the email I received?
I have a long list of spammers in a separate file. How can I utilize it?
How do I forward certain messages that I get, and preserve myself a copy?
How do I forward certain messages to two different addresses?
How do I automatically return certain email messages?
My address has changed. How do I forward a copy to myself and tell the sender?
How can I set variable values based on the text in the body of the email message?
How can I insert some token text in front of the body of incoming email?
Do you have any useful tips for regular expression matching?
How can I test if two procmail variables have the same contents?
I am having difficulties with "<". How does one match it?
How can I insert identification text to the beginning of the subject line?
I tried out your tips, but some of them failed on my system. What next?
Is there a cure for the echo and grep blues?
How do I know which of my many procmail recipes has been enacted?
How can I detect Korean, Cyrillic, or Chinese to avoid such frequent spam?
How can I change the subject line and include part of the message body into it?
How can I remove the signature from the incoming email?
What unix manuals relating to procmail should I get?
Is it possible to use procmail to call the vacation program?
Could you please solve for me this procmail problem of mine?
I liked this material. Do you have anything else on programming?
Exercises
Acknowledgements for useful advice and/or feedback

I want to filter my email automatically. How do I get started with procmail?

Unix email can conveniently be preprocessed with automatic filters such as procmail, the "Autonomous mail processor". This item repeats what already is presented about getting started in many of the other FAQs, including mine on spamfoiling. Nevertheless, this is so crucial that I'll try to give the essential outline also here.

Find out what your email directory is. Go ("cd") to the directory where your email folders are located and type "pwd". Assume in this item that you get "/home/myid/Mail". Further assume in the example that "/home/myid" is your home directory so that you can use the variable "${HOME}" to denote it.

Find out where your system's Bourne shell is located by typing "which sh". Assume that you get "/usr/bin/sh".

Prepare a "~/.procmailrc" file with a suitable editor. For example you might use "emacs ~/.procmailrc". To start with, put something like this into the ~/.procmailrc file:

#Preliminaries
SHELL=/usr/bin/sh               #Use the Bourne shell (check your path!)
MAILDIR=${HOME}/Mail            #First check what your mail directory is!
LOGFILE=${MAILDIR}/procmail.log
LOG="--- Logging ${LOGFILE} for ${LOGNAME}, "

#Whatever recipes you'll use
#The order of the recipes is significant
:0
* ^From: scam@cyberspam\.com
/dev/null

# Accept all the rest to your default mailbox
:0:
${DEFAULT}

For the "~/.procmailrc" file a read permission for the user him/herself will be sufficient. To make sure it is set, give the command "chmod u+r ~/.procmailrc".

Find out where the "procmail" program is located on your system by typing "which procmail". Assume below that you get "/usr/local/bin/procmail". Also check what your id is: "whoami". Assume that you get "myid".

Next comes the crucial step. Put the following line in your "~/.forward" file. Include the quotes (") into the ~/.forward file contents.

"|IFS=' ' && exec /usr/local/bin/procmail || exit 75 #myid"

Set adequate permissions for accessing the "~/.forward" file: "chmod 644 ~/.forward". Lastly, check ("ls -lFd ~/") that your main directory permissions are at least (the equivalent of) "drwx--s--x". If not, "chmod u+rwx ~/" and "chmod og+x ~/".

You should now be set to go. To check, send an email to yourself to see if it gets through. If there is a problem see the advice on troubleshooting.
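The "~/.forward" line depends on two site-specific values, the path to procmail and your user id. As a minimal sketch (the helper name "forward_line" is made up for this illustration and is not part of procmail), the line can be generated instead of typed by hand:

```shell
#!/bin/sh
# Print the line that belongs in ~/.forward, quotes included.
# $1 = full path to procmail, $2 = user id (both are site-specific).
forward_line() {
    printf "\"|IFS=' ' && exec %s || exit 75 #%s\"\n" "$1" "$2"
}

# Example with the values assumed in this item:
forward_line /usr/local/bin/procmail myid
```

To fill in the real values one could call it as forward_line "$(command -v procmail)" "$(whoami)" and compare the result with the line shown above before putting it into "~/.forward".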

How can I test individual procmail recipes? I do not wish to disturb my regular ~/.procmailrc recipes file in the process.

There are several options. One convenient method is to build a simple test environment as follows. If you apply it right, it allows testing without affecting your normal flow of email in any way. Create the following "proctest" file, preferably in a directory that is on your path. Make it executable using "chmod u+x proctest". Thus you'll have a new command "proctest" available.

#!/bin/sh
#The executable file named "proctest"
#(The #!/bin/sh line must be the very first line of the file)
#
# You need a test directory.
TESTDIR=/home/myid/test
if [ ! -d "${TESTDIR}" ] ; then
echo "Directory ${TESTDIR} does not exist; First create it"
exit 1
fi
#
#Feed an email message to procmail. Apply proctest.rc recipes file.
#First prepare a mail.msg email file which you wish to use for the
#testing.
procmail ${TESTDIR}/proctest.rc < mail.msg
#
#Show the results.
less ${TESTDIR}/Proctest.log
clear
less ${TESTDIR}/Proctest.mail
#
#Clean up.
rm -i ${TESTDIR}/Proctest.log
rm -i ${TESTDIR}/Proctest.mail

The beauty of this method is that besides "proctest.rc" you can easily edit also "mail.msg" for testing different kinds of incoming mail and the behavior of your recipes in various situations. Note, however, that it is best to test only for one email message at a time. In other words, do not put more than one email message into the mail.msg test file.

A question remains. Where does one get the structure of a posting for the "mail.msg" test posting? Easy. Invoke elm, select a suitable, existing posting, and make a copy of it to "mail.msg" by pressing C (capital C) and answering mail.msg to the "Copy message to:" prompt. Other mail programs probably have similar options.
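If no mail program is at hand, a test message can also be written by hand. The following is only a sketch: the function name "make_test_message" and all the addresses in it are placeholders. It prints a minimal single message in mailbox format, i.e. a "From " line, a few header fields, an empty line and a body.

```shell
#!/bin/sh
# Write a minimal single-message test mail to standard output.
# The addresses, the date and the subject are placeholders.
make_test_message() {
    cat <<'EOF'
From myid@myhost.mydom Thu Dec 19 10:44:44 2002
From: myid@myhost.mydom
To: myid@myhost.mydom
Subject: procmail test

This is the body of the test message.
EOF
}

make_test_message
```

Redirecting the output, as in "make_test_message > mail.msg", gives a file that can be fed to the testbench and edited freely between runs.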

Below is the proctest.rc recipe file which I used in preparing for this item:

SHELL=/bin/sh
TESTDIR=/home/myid/test
MAILDIR=${TESTDIR}
LOGFILE=${TESTDIR}/Proctest.log
LOG="--- Logging for ${LOGNAME}, "

#Troubleshooting:
VERBOSE=yes
LOGABSTRACT=all

#Let's test stripping lines from the email message's header
:0 fwh
| egrep -vi "(^Content-|^MIME-Version:.)"

#If it is from myself, store the email message
:0:
* $ ^From:.*${LOGNAME}
${TESTDIR}/Proctest.mail

#Otherwise, discard the email message
:0
/dev/null

Feedback: The header stripping does not work if any of those header lines is continued. It is almost always an error to use grep/egrep/fgrep when filtering a message header. A better recipe would be the following, utilizing formail:

#Let's test stripping lines from the email message's header,
#but only when they're there
:0 fwh
* ^(Mime-Version:|Content-)
| formail -IMime-Version: -IContent-

To continue myself: the flags are as follows. "f" uses the pipe as a filter, "w" waits for the filter to finish and checks its exit code before proceeding, and "h" feeds the header of the email message to the pipe.

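The formail -I switch means that any existing fields of the given name are simply removed. If a field body is given, a new field with that body is inserted in their place; if only the field name is given (as here), the field is effectively deleted. The lower-case -i switch differs in that the existing fields are not removed but renamed with an "Old-" prefix.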
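The problem with continued header lines is easy to reproduce outside procmail. In the sketch below the sample header is invented for the purpose, and "grep -E" stands in for egrep. The line-based filter removes the "Content-Type:" line but leaves its continuation line behind, where it would become part of the preceding "MIME-less" header field.

```shell
#!/bin/sh
# A header whose Content-Type: field is continued on a second line.
sample_header() {
    cat <<'EOF'
From: myid@myhost.mydom
MIME-Version: 1.0
Content-Type: multipart/mixed;
        boundary="xyz"
Subject: test
EOF
}

# The line-based filter of the first recipe. The orphaned
# continuation line (boundary=...) survives the filtering.
sample_header | grep -Evi '(^Content-|^MIME-Version:.)'
```

Since formail works on whole header fields rather than on lines, it does not leave such a remnant.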

I know how to make "and" rules in procmail recipes, but how do I make "or" rules?

Just in case, let's first revisit an "and" rule by a common example:

#Trivial catching of potential spam towards the end of a ~/.procmailrc
#Place only after accepting all the mailing lists you want to receive
:0:
* ! ^TO_ts@([-a-z0-9_]+\.)*uwasa\.fi
* ! ^TO_timo\.salmi
${HOME}/.mail/PotentialSpam.mail

For entering an "or" rule, consider the following example:

#Accept email from Era Eriksson, the author of the major procmail FAQ
:0:
* ^From:.*reriksso@([-a-z0-9_]+\.)*helsinki\.fi|\
  ^From:.*era@iki\.fi
${DEFAULT}

Let's look at a few details:

The "^TO_" in the first recipe is a procmail reserved predefined special expression "which should catch all destination specifications containing a specific address." It must be written in upper case.
The "!" in the first recipe is the familiar operator indicating a negation.
If "${HOME}/.mail" is your mail directory you don't need to spell out the entire path "${HOME}/.mail/PotentialSpam.mail". Just "PotentialSpam.mail" will be sufficient.
The first detail of the "or" example is complicated and is per se unrelated to the "or" issue at hand. The "([-a-z0-9_]+\.)*" expression in "reriksso@([-a-z0-9_]+\.)*helsinki\.fi" sees to it that if Era has several machines in his domain (as I do under mine), all will be matched by the recipe. The "[-a-z0-9_]" matches any of the characters within the brackets "[]", the trailing "+" tells that there must be at least one repeat of those characters, the "\." matches a dot, and the "*" tells that there have to be zero or more repeats of the preceding expression within the parentheses "()". [This item owes heavily to Era's friendly guidance.]
The backslash "\" in "helsinki\.fi" sees to it that the actual dot (.) is matched. This is because if the "quote next character" "\" is omitted, the "." is taken as a regular expression matching any (exactly one) character.
The "|" in the "|\" indicates an "or" condition, and the "\" quotes the embedded end of line, i.e. tells that the rule is continued on the next line.
The "|" or condition sees to it that the recipe matches email coming from Era either from the "helsinki.fi" or the "iki.fi" domain.
The "${DEFAULT}" puts the email in the regular mailbox.
The trailing ":" in the recipe start line ":0:" tells procmail to use temporary file locking to avoid writing simultaneously arriving email messages on top of each other at your "${DEFAULT}" mailbox. Since no lock file name is given after the ":0:", procmail will provide the lockfile name. Always use this format when delivering to a mail folder, unless the target folder is /dev/null, that is, unless you want the email to be discarded.
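The behaviour of the host part of the expression can be tried out with "grep -E" before it goes into a recipe. This is a sketch only: the function name "matches_era" and the test addresses with "cc." and "helsinkixfi" are made up, and procmail's own matching differs from egrep's in some details.

```shell
#!/bin/sh
# Succeeds if the address in $1 matches either alternative of the
# "or" rule. The -i switch mimics procmail's default of ignoring case.
matches_era() {
    printf '%s\n' "$1" |
        grep -Eiq 'reriksso@([-a-z0-9_]+\.)*helsinki\.fi|era@iki\.fi'
}

for addr in reriksso@helsinki.fi reriksso@cc.helsinki.fi \
            era@iki.fi reriksso@helsinkixfi; do
    if matches_era "$addr"; then
        echo "match:    $addr"
    else
        echo "no match: $addr"
    fi
done
```

The last address fails because the quoted "\." insists on a real dot between "helsinki" and "fi".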

There are alternatives. Scoring could be used for the same purpose:

:0:
* 1^0 ^From:.*reriksso@([-a-z0-9_]+\.)*helsinki\.fi
* 1^0 ^From:.*era@iki\.fi
${DEFAULT}

Likewise, you could alternatively use ( ) grouping:

:0:
* ^From:.*(\
reriksso@([-a-z0-9_]+\.)*helsinki\.fi|\
era@iki\.fi)
${DEFAULT}

Feedback: That condition looks a bit ugly to me. Let me rephrase it to show you what I mean:

* ^From:.*(reriksso@([-a-z0-9]+\.)*helsinki|era@iki)\.fi

(an underscore can not be part of a hostname, as far as I know.)

Yes, many of the rules presented in this FAQ can be written more concisely and/or effectively. The rules, as presented in the FAQ, are often formulated for easier understanding than efficiency. But it is useful to improve on the efficiency after one first has got the basic logic of a rule outlined.

How can one perform multiple shell commands on the action line?

See the action line below (i.e. the one starting with the "|" pipe). Separate the commands with "&&". If you wish to continue on a second line for readability, end the line with "\". Alternatively, just one long line could have been used. The recipe below is from a test with the testbench, so its purpose is just to show this method of giving multiple commands.

#Test if the message has a "Subject:" header and has a subject in it
#(The brackets [] contain a space and a tab)
:0:${TESTDIR}/Proctest.mail.lock
* ^Subject:
* ^Subject:[ ]*\/[^ ].*
| echo "A ^Subject: header found with" >> ${TESTDIR}/Proctest.mail &&\
echo "${MATCH}" >> ${TESTDIR}/Proctest.mail

Likewise, a single command can be subdivided for easier documentation:

| echo "A ^Subject: header found but there is no subject" \
>> ${TESTDIR}/Proctest.mail

Below is another example with a slightly different syntax using the semicolon ";" as the separator. The example also demonstrates how to save diskspace by zipping email from a particular source. You'll need Info-ZIP's zip and unzip in order to be able to apply it. (They are available from the proper Unix section of Garbo program archives at the University of Vaasa, Finland.)

:0w:Test.mail.lock
* ^From:.*test
| unzip ${HOME}/mail/Test.zip; \
  cat >> Test.mail; \
  zip -oj9 ${HOME}/mail/Test.zip Test.mail; \
  rm -f Test.mail

What happens on the action line is this:

The potentially existing "Test.zip" zip-file is unzipped to obtain the earlier email messages that already might be within Test.zip.
The incoming email is appended to the extracted Test.mail file.
The updated Test.mail file is compressed back into the Test.zip zip-file.
The uncompressed Test.mail is deleted.

To be on the safe side procmail is told to wait (the "w" flag in ":0w:Test.mail.lock") until the pipe ("|") has been performed.
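The two separators are not interchangeable. With "&&" the next command is run only if the previous one succeeded, while with ";" it is run in any case. The sketch below shows the difference in plain shell; "run_chain" is a made-up helper, and "false" stands in for a failing first command, such as an unzip that does not find its archive.

```shell
#!/bin/sh
# Run a command chain in a fresh shell and pass on whatever it prints.
run_chain() { sh -c "$1"; }

# With "&&" the second command is skipped after a failure,
# with ";" it is run regardless.
echo "with &&: [$(run_chain 'false && echo second command ran')]"
echo "with ; : [$(run_chain 'false ; echo second command ran')]"
```

This is presumably why the zip example needs ";": on the very first message there is no Test.zip yet, and the rest of the chain still has to be performed.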

How can I find out what the subject of a posting is?

Now is a good time to utilize my testbench in order to find out if a logic works. Build a /home/myid/test/proctest.rc file.

SHELL=/bin/sh
TESTDIR=/home/myid/test
MAILDIR=${TESTDIR}
LOGFILE=${TESTDIR}/Proctest.log
LOG="--- Logging for ${LOGNAME}, "

First, a few environment variables are included.

#Troubleshooting:
VERBOSE=yes
LOGABSTRACT=all

The above means: Use full reporting for the debugging.

#An auxiliary regular expression to detect text,
#The brackets [] contain a space and a tab
GETTEXT="[  ]*\/[^  ].*"

If the same expression is used several times in a recipe file, it is convenient to put the expression into an environment variable instead of writing it out repeatedly.

The first part "[ ]*" of the regular expression matches any number of spaces and tabs (even the case of none) which can precede the subject.
The "\/" is a special procmail-only operator which puts a (possible) match found by the rest of the expression into a variable named MATCH.
"[^ ]" means any one character other than the ones within the brackets, i.e. the first non-space, non-tab character. The ".*" then matches the rest of the line, so that MATCH will contain the subject text without its leading blanks.

#Test if the message has a "Subject:" header and has a subject in it
:0c:${TESTDIR}/Proctest.mail.lock
* ^Subject:
* $ ^Subject:${GETTEXT}
| echo "A ^Subject: header found with" >> ${TESTDIR}/Proctest.mail &&\
echo "${MATCH}" >> ${TESTDIR}/Proctest.mail

The "c" flag in ":0c:" tells that the processing should continue also after this particular recipe has been acted upon. (When the "c" flag is not present and the recipe delivers the message, all the rest of the recipes in proctest.rc are skipped.) Adding the "w" flag, as in ":0wc:", would additionally tell procmail to wait until the "|" pipe has finished.
The ":${TESTDIR}/Proctest.mail.lock" tells which lockfile to use in order to avoid the confusion from the possibility of simultaneous arrival of several email messages. Note that since we use a pipe "|" in the actions part, it is prudent to explicitly give the name of the lock.
Note the first "$" on the "$ ^Subject:${GETTEXT}" condition line. It tells that the environment variables (in this case "GETTEXT") on the line are to be expanded, not to be taken as literal text.

#Test if the message has a "Subject:" header but has no subject in it
:0c:${TESTDIR}/Proctest.mail.lock
* ^Subject:
* $ !^Subject:${GETTEXT}
| echo "A ^Subject: header found but there is no subject" \
>> ${TESTDIR}/Proctest.mail

#Test if the message has a "Subject:" at all
:0c:${TESTDIR}/Proctest.mail.lock
* !^Subject:
| echo "No ^Subject: header was found" >> ${TESTDIR}/Proctest.mail

#Otherwise, discard the message
:0
/dev/null

After the recipes above have been testbenched and cleared, you know that the methods used in them will work for you in your own environment.
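The three cases which the recipes distinguish (a subject with text, an empty subject, no "Subject:" header at all) can also be mimicked in plain shell for a quick cross-check. This is a rough sketch with an invented function name "subject_state". It does not handle continued header lines and is no substitute for procmail's own matching.

```shell
#!/bin/sh
# Classify the Subject: header of a message read from standard input.
# Prints "text: <subject>", "empty" or "missing".
subject_state() {
    # Look at the header only, i.e. the lines up to the first empty line.
    line=$(sed -e '/^$/q' | grep -i '^Subject:' | head -n 1)
    if [ -z "$line" ]; then
        echo "missing"
        return
    fi
    # Drop the field name and any leading spaces or tabs.
    text=$(printf '%s\n' "$line" | sed -e 's/^[^:]*:[[:space:]]*//')
    if [ -z "$text" ]; then
        echo "empty"
    else
        echo "text: $text"
    fi
}

printf 'From: a@b.c\nSubject:   Hello there\n\nbody\n' | subject_state
```

Feeding the same mail.msg to both the testbench and "subject_state < mail.msg" should give the same verdict.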

Of course, there are other options for extracting the subject into an environment variable. One is to utilize "formail" which is a companion to the procmail program. If you include the following expression at the beginning of your ~/.procmailrc recipes file, you will have the variable ${SUBJECT} available for the rest of the recipes file.

#Environment variables for procmail
#
#Get the subject
#Discard some dangerous special chars + any leading and trailing blanks
SUBJECT=`formail -xSubject: \
| sed -e 's/[;\`\\]/ /g' \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

For an example of usage see the Foiling Spam with an Email Password System page.
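What the sed and expand part of the SUBJECT definition does can be seen by running it on its own. In the sketch below "sanitize" is a made-up name, and the pattern is written without the extra backslashes which the backquotes of the ~/.procmailrc version require.

```shell
#!/bin/sh
# Replace ; ` and \ with a space, turn tabs into spaces, then trim
# the leading and the trailing blanks.
sanitize() {
    sed -e 's/[;`\\]/ /g' | expand | sed -e 's/^[ ]*//' -e 's/[ ]*$//'
}

printf '%s\n' '  a;b`c\d  ' | sanitize
```

The sample input comes out as "a b c d": the three special characters have become spaces and the surrounding blanks are gone.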

Feedback: Extracting the header from inside procmail using the \/ token is _much_ faster than the formail solution.

Feedback: If the SUBJECT variable is left empty, apply quotes on the first line, i.e.
SUBJECT=`formail -x"Subject: "\

How do I get a copy of the headers of all the incoming email into a separate file?

You can use

#Header logging
:0hc:${HOME}/.mail/Procmail.head.lock
| cat >> ${HOME}/.mail/Procmail.head

The "h" flag in ":0hc" tells that the header should be accessed.
The "c" flag in ":0hc" orders the processing to continue also after this recipe. In other words, you put your other recipes, after the header-catching, in the ordinary fashion. The email will reach them.
The ":${HOME}/.mail/Procmail.head.lock" tells which particular lockfile to use.
Since there are no condition lines (lines starting with *) this item will always be acted upon when it is reached. You wanted to log the headers from all the incoming email, right?
The "| cat >> ${HOME}/.mail/Procmail.head" appends the headers to the ${HOME}/.mail/Procmail.head file.
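With the "h" flag only the header reaches the pipe, i.e. everything before the first empty line of the message. Outside procmail the same part can be cut from a saved message as sketched below; the function name "header_of" is invented for the illustration.

```shell
#!/bin/sh
# Print only the header of the message on standard input:
# everything up to, but not including, the first empty line.
header_of() {
    sed -e '/^$/,$d'
}

printf 'From: a@b.c\nSubject: hi\n\nbody line\n' | header_of
```

This is handy for checking what a header log would contain for a given mail.msg, e.g. "header_of < mail.msg".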

Feedback: Since appending to a file is the result of a normal mailbox delivery, that can be more efficiently written as simply:

:0 hc:
$HOME/headers.cut

That eliminates a cat and a shell process, plus the pipe and extra reads and writes.

Now, if you want to overwrite the file with each new message [or do some further shell operations within the pipe], then the cat command is a reasonable choice.

[A further point] That would have been an odd name for the lockfile. Why not $HOME/headers.cut$LOCKEXT? (LOCKEXT defaults to ".lock", so no extra dot is needed.)

Would you give some further hints for spam foiling recipes?

Besides what is on my page Foiling Spam with an Email Password System and a separate item on detecting the sender, below are some instructive little tricks.

Perhaps the strongest generic trick against spam is to set aside any email that is not addressed to you directly, since most spam is addressed to some kind of mailing list. Of course, you first will have to accept email from any legitimate mailing list which you have subscribed to. If you put a suitable recipe after your recipes that accept the legitimate email lists, much of the incoming spam will be caught. Below is a simplified (and a bit munged) version of what I do in my own ~/.procmailrc:

#Catch potential spam
:0
* !^TO_(ts|timo\.salmi)@([-a-z0-9_]+\.)*uwasa\.fi
{
  :0 fwh
  * ^Content-Length:
  | formail -IContent-Length:
  :0:
  Spam.mail
}

If you look carefully through this page, you'll find explanations for all the details in the above recipe. It will be a good exercise to do so. :-)

Since so much spam, if not practically all of it, comes from forged sender addresses, it is much more effective to block certain suspect email routes than to try to match the elusive spammers. The scoring recipe example below treats as spam all email that is routed via dialsprint.net and that is not addressed to "me" personally.

#Spam avoidance of certain routes and if not for me personally
:0:
* -1^0
*  1^0 ? formail -x"Received:" | egrep -is "dialsprint\.net"
*  1^0 ! ^TO_(myid|myFirstName[ _\.]myLastName)@([-a-z0-9_]+\.)*myhost\.mydom
Spam.mail

The "?" at the start of the condition executes and evaluates what is on the condition line instead of searching for a literal match.
Procmail's companion program formail is used to extract all the "Received:" routing information from the posting's header. Then "dialsprint.net" is sought for using Unix egrep via the "|" pipe.
This is a sideline, but the simpler, less general form of the last condition line would, of course, be just "* 1^0 ! ^TO_myid@myhost\.mydom"
The scoring system is explained elsewhere on this page, but in brief the score is initialized at -1. Each explicit condition is given a weight of 1. If the total score is at least 1 (i.e. positive) then the action (storing to the Spam.mail file) is initiated.
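The arithmetic of the scoring recipe can be spelled out for the four possible cases. The sketch below only redoes the sum in shell (the name "route_score" is invented); it does not call procmail. Only when both conditions hold does the total rise above zero.

```shell
#!/bin/sh
# $1 = 1 if the message was routed via the suspect host, else 0
# $2 = 1 if the message is NOT addressed to me personally, else 0
# Prints the total score. The action is taken only for a positive total.
route_score() {
    echo $(( -1 + $1 + $2 ))
}

for pair in "1 1" "1 0" "0 1" "0 0"; do
    set -- $pair
    echo "route=$1 not_for_me=$2 score=$(route_score "$1" "$2")"
done
```

In other words, starting from -1 turns two "or"-like weighted conditions into an "and".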

Fairly often there is a tell-tale exhortation to email to a remove@ or a removeme@ address within the actual message. As you may know, these are just common ploys of the spammers to get your address confirmed to make matters even worse for you.

:0B:
* (remove@|removeme@)
PotentialSpam.mail

The "B" flag tells the recipe to search through the body of the email message.
Note the "or" testing on the conditions line.
Note again the file locking (the trailing : in ":0B:"). Since the email message is directed to a folder, we do not need explicitly to name the lockfile. We can let procmail do it. As a default it will use the name PotentialSpam.mail.lock
The "B" means the body and only the body of the message. The header is not included. However, I have as hearsay that some procmail versions have a bug in this respect, but I have not been able to test that situation myself.

The allegedly more respectable [sic] unsolicited advertising has an "ADV" marker in upper case on the subject line. (For an imaginary legitimacy such spammers occasionally attach some xenophobic quibble about U.S. legislation, not very relevant on the international Internet.)

:0D:
* ^Subject:.*ADV
PotentialSpam.mail

The "D" flag tells to distinguish between the lower and the upper case in testing for a match.

There are some obvious code words that tend to appear on the subject line, such as "make money fast" and "$$$".

:0:
* (^Subject:.*make.*money.*fast|^Subject:.*\$\$\$)
PotentialSpam.mail

Note, not "^Subject:.*$$$", but "^Subject:.*\$\$\$" because, if not quoted with "\", a "$" is taken as a regular expression indicating the end of line.
Other typical subjects which you might wish to catch include:
- cable descrambler
- FOR SALE
- laser printer toner
- million email addresses
- ONLY $
- Quit Your Job
Other typical contents include:
- absolutely no obligation
- call now 24 h
- to be taken off our list

Don't overdo it, though, lest you end up weeding also some legitimate email.
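Before such code words go into a recipe it is easy to check with "grep -E" that the quoting of the "$" characters works as intended. The following is a sketch with a made-up function name and made-up subject lines; the "-i" switch mimics procmail's default of ignoring the case.

```shell
#!/bin/sh
# Succeeds if the subject line in $1 contains one of the code words.
spammy_subject() {
    printf '%s\n' "$1" |
        grep -Eiq '^Subject:.*(make.*money.*fast|\$\$\$)'
}

for subj in 'Subject: Make Money FAST' 'Subject: $$$ for you' \
            'Subject: meeting on Monday'; do
    if spammy_subject "$subj"; then
        echo "caught: $subj"
    else
        echo "passed: $subj"
    fi
done
```

Running a handful of your legitimate subject lines through the same test is a cheap way to see whether a new code word is too greedy.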

Feedback: The regexp:

(remove@|removeme@)

is much slower than

remove(me)?@

Having the 'top-level' of the regexp be an alternation (via '|') slows down matching by quite a bit. The more that can be factored out at the beginning of the regexp, the better. The same goes for the recipe that matches against the Subject: header field:

^Subject:.*(make.*money.*fast|\$\$\$)

is faster than:

(^Subject:.*make.*money.*fast|^Subject:.*\$\$\$)

My comment: Of course it is commendable to be efficient, especially where easy understanding is not compromised. However, if the two clash, I often prefer clarity of expression and convenience over a strict maximization of code efficiency. Don't we have our powerful modern computers to perform our tasks for us, not vice versa :-). (This is not about the particular feedback above. The improvements are useful. They are both legible and instructive.)

More feedback: The "* ^Subject:.*ADV" rule is overly simplistic and catches many non-spam subjects. Maybe rather something like "* ^Subject:\<*ADV\>"

My comment. Ok. Let's try

:0D:
#(The brackets [] start with a space and a tab)
* ^Subject:.*([ \{<]+)ADV([ :\}>]+|$) |\
^Subject:.*(\[+)ADV(:)?(\]+|$)
PotentialSpam.mail

It is far from perfect, but it should work reasonably well for regular purposes. Spam detection requires experimenting anyway. Regular expressions are not easy. They are quite a large subject area of their own.

The above assumes that there is (at least) one space after the "Subject:" header before the subject begins. This can be ensured by first applying "formail -z", which you can have high up in your ~/.procmailrc. For example, I have the first two lines below in mine.

:0 fwh
| formail -z -iContent-Length:

:0D:
* ^Subject:.*([ \{<]+)ADV([ :\}>]+|$) |\
^Subject:.*(\[+)ADV(:)?(\]+|$)
PotentialSpam.mail

See the other items in this tips file for an explanation of the "fwh" flags. The formail program with the "-z" switch will insert the desired blanks into the header. The "-iContent-Length:" switch (which is outside the theme of the current item) will replace the Content-Length: headers with Old-Content-Length: headers.

I use a slightly different recipe in my own ~/.procmailrc recipes file:

:0D
* ^Subject:.*([ ]|<|\[)ADV([ ]|>|:|\]|$)
{
  :0
  { RULE="Catch potential spam by detecting an ADV keyword" }
  :0
  /dev/null
}

If you wonder about the "RULE" variable, see the item about logging which rules have been used.
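The effect of requiring delimiters around "ADV" can be tried out in the shell. The sketch below is an adaptation for "grep -E", not the recipe itself: the alternations are written as bracket expressions, the tab is left out, and the function name and the sample subjects are made up. The test is case-sensitive like the "D" flag.

```shell
#!/bin/sh
# Succeeds if $1 is a subject line with a delimited "ADV" marker.
# "[ <[]" is a space, "<" or "[" in front of the marker, and
# "[] >:]" is a "]", space, ">" or ":" after it (or the line ends).
adv_subject() {
    printf '%s\n' "$1" | grep -Eq '^Subject:.*[ <[]ADV([] >:]|$)'
}

for subj in 'Subject: ADV: cheap toner' 'Subject: [ADV] offer' \
            'Subject: ADVANCED course' 'Subject: new advice'; do
    if adv_subject "$subj"; then
        echo "caught: $subj"
    else
        echo "passed: $subj"
    fi
done
```

The first two subjects are caught, whereas "ADVANCED" and the lower case "advice" get through, which is the point of the more careful rule.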

On to a different facet. Some ISPs (Internet Service Providers) do not allow numbers in the email addresses. Thus, you may identify some of the forged spam by catching a violation in this respect. The following recipe catches email with numbers in the user id before the @ mark from all the various nodes on "respectable.net".

:0:
* ^From:.*[0-9].*@([-a-z0-9_]+\.)*respectable\.net
PotentialSpam.mail

Date: Thu, 19 Dec 2002 10:44:44 +1000
From: Philip Gunter
To: Timo Salmi
Subject: A procmail tidbit

Hi Timo, thanks for your excellent procmail reference.

Here is a small recipe you might like to add to your site.
It limits the number of emails being forwarded from an account,
useful to stop sms storms.

Cheers,
Philip.

:0
{
  :0
  {
    # remove any sms-alert files older than 5 minutes
    GLOP_=`find /var/tmp/sms -name sms-alert\* -cmin +5 -exec rm -f {} \;`

    # Create an sms-alert file for this message.
    GLOP_=`touch /var/tmp/sms/sms-alert$$`

    # Count the number of sms-alert files
    COUNT=`ls /var/tmp/sms | grep sms-alert | wc -l`
    COUNT1=`expr ${COUNT}`

    # Check if number of alerts in the last 5 minutes is less than 2
    ISLT=`expr ${COUNT1} \< 2`
  }
  :0:
  # if the expression is true then forward the email
  * ISLT ?? ^^1^^
    ! 0123456789@pager.net
}

I have limited disk space. How can I truncate long messages?

Before we proceed any further, there is a very important email feature to observe. If you alter the content-length of a message it is highly advisable first to discard any "Content-Length:" lines from the email's header. If you fail to do that, there is the danger that next time you read the relevant email folder your email program will break your folder because of erroneous length information. Many email programs are brain-dead that way.

#Truncate messages longer than 4000 bytes to 100 lines
:0
* > 4000
{
  :0 fwh
  * ^Content-Length:
  | formail -IContent-Length:

  :0:Truncated.mail.lock
  | head -100 >> Truncated.mail
}

Some details:

The "* > 4000" matches email messages longer than 4000 bytes.
The already familiar set of flags "fwh" tells procmail to pass the email's header through a filter and to wait for the result.
Formail is used to make sure that even complicated (e.g. continued) "Content-Length:" lines are removed.
The above also serves as an example of "block nesting", i.e. the rules and actions between the braces "{ }".

Let's expand the recipe a bit.

#Truncate messages longer than 4000 bytes to 100 + 10 lines
:0
* > 4000
{
  :0 fwh
  * ^Content-Length:
  | formail -IContent-Length:

  :0c:Truncated.mail.lock
  | head -100 >> Truncated.mail &&\
    echo "-:-:-:- (snip) -:-:-:-" >> Truncated.mail

  :0:Truncated.mail.lock
  | tail -n +101 | tail -n 10 >> Truncated.mail
}

A few observations:

The first 100 lines are included. So are the last 10.
The above also exemplifies giving multiple commands. Recall that a standard recipe only allows one action line.
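The head and tail arithmetic can be checked on numbered lines before trusting it with real mail. This is a sketch in plain shell with the made-up name "truncate_lines". It uses the "-n" forms of head and tail and assumes that mktemp is available; the input is buffered in a temporary file because it has to be read twice.

```shell
#!/bin/sh
# Keep the first 100 and the last 10 lines of standard input,
# with a marker line in between.
truncate_lines() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"
    head -n 100 "$tmp"
    echo "-:-:-:- (snip) -:-:-:-"
    # Skip the first 100 lines, then keep the last 10 of the rest.
    tail -n +101 "$tmp" | tail -n 10
    rm -f "$tmp"
}

# Try it on 200 numbered lines and show the lines around the cut.
awk 'BEGIN { for (i = 1; i <= 200; i++) print i }' |
    truncate_lines | sed -n '99,103p'
```

Skipping the first 100 lines before taking the last 10 sees to it that no line appears twice when the message is only a little over 100 lines long.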

Another option is to compress the incoming email instead of truncating it.

How can I quickly test if my rules with regular expressions match? The fuller procmail testbench is a bit heavy a machinery for quick testing.

Let's see. A lite version of the testbench could be the following. Put the rules you wish to try out into a "greptest" file as egrep commands, since procmail matching closely (but not quite!) follows egrep's. Make the file executable with "chmod u+x greptest". Then make a "mail.msg" file with the texts you wish to try to match (or not to match). Thus you might have:

#!/bin/sh
#The executable file named "greptest"
egrep -i '(ts|timo\.salmi)@([-a-z0-9_]+\.)*uvasa\.fi' mail.msg
#
#Allow a quick visual comparison on the screen
echo ""
cat mail.msg

#The mail.msg target file with the trial text for the matching
ts@uvasa.Fi
ts@loisto.uvasa.fi
Timo.Salmi@uvasa.Fi
Timo.Salmi
null@uvasa.fi

Then, just give the command "greptest" and visually compare the outputs.

Miscellaneous notes:

There are some special differences between procmail extended matching rules and the egrep expressions. Thus under special circumstances they do not match the regular expressions quite the same way. This might raise occasional confusion. See "man procmailrc" for the details.
You can also test egrep regular expressions on your PC since egrep clones are available from the Garbo program archives. For example you might try gnuegrep.zip, egrep.zip and dgrep.zip.

How can I detect if the email comes, say, from the .com domain?

I have been puzzling over this item myself, because it is not as trivial as it first appears. The catch is that the ".com" must be exactly at the end of the address. The problem naturally is that in the email headers there can be text after the email address, such as the sender's name. E.g.

From: scam@cyberspam.com (The Big Bad Spammer)

The first solution that comes to mind is the following, but it is not entirely accurate.

:0:
* ^From:.*\.com
* !^From:.*\.com\.
* !^TO_(ts|timo\.salmi)@([-a-z0-9_]+\.)*uwasa\.fi
ProbableComSpam.mail

The first condition line matches a ".com" anywhere on the "From:" address line. It would match, for example, email from "someone@my.company.net".
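The second condition line tries to narrow the condition down by excluding a ".com." that is followed by further domain parts, as in an address under ".com.au". It does not help against "someone@my.company.net", however, which the first line still matches, so the recipe is not quite accurate. (An address like "someone@my.ispcom.net" is not matched at all, since there is no dot directly in front of its "com".)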
The third condition line is just standard spam avoidance, not necessarily related to the task at hand. It is just that much, if not the majority of spam appears to involve .com addresses.

Quite possibly there are better solutions, but below is what I came up with for hopefully an accurate match:

# Get the sender's address
# Discard any leading and trailing whitespaces
FROMADDR_=`formail -rt -xTo: \
           | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

# Test if the email came from the .com domain
:0:
* $ ? echo ${FROMADDR_} | egrep -is '\.com$'
ComDomain.mail

Let formail take care of finding out from the headers what the sender's address is. Get rid of any leading and/or trailing white spaces using "expand" for tabs and "sed" for the remaining spaces. You should have this definition high up in your ~/.procmailrc
The "$" on the condition line tells procmail to expand any variables on the line, in this case the "${FROMADDR_}", instead of taking them literally.
As far as I understand, the "?" executes a line (and tells to transmit an exit code, but that is beside the current point). BTW, if you have the procmail extended diagnostics on ("VERBOSE=yes") you can get in your procmail logfile a sinister looking "Program failure (1)". Don't panic. It just is egrep's exit code telling that no match was found for that particular email message, i.e. that it was not from the ".com" domain.
The condition line echoes the stripped email address to "egrep" in order to test if there is a match. The "-i" switch is used since email addresses are case insensitive. The essence of the "egrep" is the trailing "$" matching the end of the extracted address. The "-s" switch tells egrep to work silently, i.e. only to give the return code.
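
The end-of-address test can also be tried outside procmail before it goes into the ~/.procmailrc. Below is a minimal sketch, assuming a POSIX shell. The function names trim_address and is_com_address are made up for this illustration, and "grep -E -i -q" stands in for the "egrep -is" of the recipe.

```shell
#!/bin/sh
# Hypothetical helper: strip leading and trailing blanks and tabs,
# as the expand + sed pipe in the FROMADDR_ definition does.
trim_address() {
  printf '%s\n' "$1" | expand | sed -e 's/^[ ]*//' -e 's/[ ]*$//'
}

# Hypothetical helper: succeed only if the trimmed address ends in ".com".
# "grep -E -i -q" is used here in place of "egrep -is".
is_com_address() {
  trim_address "$1" | grep -E -i -q '\.com$'
}

for addr in 'scam@cyberspam.com' 'someone@my.company.net' 'joe@cyber.com.au'; do
  if is_com_address "$addr"; then
    echo "$addr: .com domain"
  else
    echo "$addr: not .com"
  fi
done
```

Only the first of the three sample addresses should be reported as a .com address, since the trailing "$" rejects both ".company.net" and ".com.au".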

There is one small convenience in the first, inaccurate recipe version. It is easy to include several domains into the same recipe. For example:

:0:
* ^From:.*\.hk|\
  ^From:.*\.kr|\
  ^From:.*\.tr
#The "!" negates only at the start of a condition,
#hence one negated condition line per domain
* !^From:.*\.hk\.
* !^From:.*\.kr\.
* !^From:.*\.tr\.
* !^TO_myid@([-a-z0-9_]+\.)*myhost\.mydom
ProbableSpam.mail

An aside: You could also utilize a more condensed format:

* ^From:.*\.(hk|kr|tr)

(Condensing the rest of the above recipe is left as an exercise.)

Using scoring is one option. The recipe could also be rewritten as

#Define getting the sender's address
#Discard any leading and trailing whitespaces
FROM_=`formail -rt -xTo: \
         | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

#Whatever other recipes in between.

#Spam screening of certain susceptible domains
:0:
* -1^0
*  1^0 $ ? echo ${FROM_} | egrep -is '\.hk$'
*  1^0 $ ? echo ${FROM_} | egrep -is '\.kr$'
*  1^0 $ ? echo ${FROM_} | egrep -is '\.tr$'
*  1^0 !^TO_myid@([-a-z0-9_]+\.)*myhost\.mydom
ProbableSpam.mail
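
Since an address can end in only one domain, the three egrep tests above could presumably be condensed into one test with an alternation. Below is a minimal sketch of that test for trying out in a POSIX shell. The function name from_suspect_domain is hypothetical, and "grep -E -i -q" again stands in for "egrep -is".

```shell
#!/bin/sh
# Hypothetical helper: succeed if the address ends in one of the listed
# top-level domains. The list (hk, kr, tr) is the one from the recipe above.
from_suspect_domain() {
  printf '%s\n' "$1" | grep -E -i -q '\.(hk|kr|tr)$'
}

# In a recipe the same pattern would sit on a single scored line, e.g.
#   *  1^0 $ ? echo ${FROM_} | egrep -is '\.(hk|kr|tr)$'
for addr in 'abdu@advis.com.tr' 'someone@host.hk.example.com'; do
  if from_suspect_domain "$addr"; then
    echo "$addr: suspect domain"
  else
    echo "$addr: ok"
  fi
done
```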

There also is the option

What alternatives do I have to detect a sender all through the various header-fields?

If we only look at the "From:" field in the header we have the familiar:

#Accept all email from myself, weed out autoreplies
:0:
* ^From:.*myid@([-a-z0-9_]+\.)*myhost\.mydom
* ! ^X-Loop: myid@myhost\.mydom
${DEFAULT}

Next, let's extend the matching to more fields in the header:

:0
* ? formail -x"From" -x"From:" -x"Reply-To:" -x"Errors-To:"\
    | egrep -i "scam@cyberspam\.com"
/dev/null

The "?" at the start of the condition executes and evaluates what is on the condition line instead of searching for a literal match.
Use formail to extract from the headers.
The "-x" switch means extract the contents of a header field from the header. Formail is convenient (also) because it can concatenate the potential continuation lines in a header field.
Pipe the results to "egrep" regular expression search. The "-i" switch tells egrep to ignore the lower/uppercase status of the target string.
Incidentally: Since we discard the email message to "/dev/null", file locking ":0:" must not be used.

We can utilize a predefined expression to match the header fields. The clever "FROM" expression below comes from Jari Aaltonen's procmail material.

FROM="^(From[ ]|(Old-|X-)?(Resent-)?(From|Reply-To|Sender):)(.*\<)?"
#(whatever else in between)
:0
* $ ${FROM}scam@cyberspam\.com
/dev/null

The first "$" on the condition line tells that the environment variable(s) on the line are to be expanded, instead of taking all the text on the condition line literally.

You may go even further in your detective work and include the information from the header's "Received:" lines. That is, you can also detect if something that you wish to avoid is along the route where the email came from.

:0
* ? formail -x"Received:"\
    | egrep -i "cyberspam\.com"
/dev/null

Spam email is sometimes indicated by a missing or an empty "From:" line in the header. Furthermore, the "From:" line might contain an empty <> instead of having a proper address within the <>. Using scoring we might have something like

:0:
* 1^0 ^From:([ ]$|$)
* 1^0 ! ^From:
#A catch: Don't use here the word-boundary operators \< \>
#Use just the plain <>
* 1^0 ^From:.*<>
NoFrom.mail

Under a worst-case scenario, the various sender headers might all be empty. To test for this unlikely eventuality we can utilize the fact that formail would put a "foo@bar" into the "FROM_" under such circumstances.

# Define getting the sender's address
# Discard any leading and trailing whitespaces
FROM_=`formail -rt -xTo: \
         | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

# Test if the sender could not be identified at all
:0:
* FROM_ ?? foo@bar
NoSender.mail

As always, there are several alternatives to solving a problem. Consider a potential case where a spammer poses as the mailer-daemon but the "From:" header is either missing or total gibberish. How to detect this situation? The second condition in the recipe below tests whether there is a "From:" line in the header with some elementary validity. Because the condition is negated, the recipe fires when that test fails.

:0:
* ^From[  ]*MAILER-DAEMON
* ! ? formail -x"From:" | egrep -is "[a-z]"
ProbableSpam.mail

The first condition is to check the first From line in the header.
The [] contains a space and a tab.
In the second condition the "!" is the familiar operator indicating a negation.
The "?" tells to execute and evaluate what is on the condition line instead of searching for a literal match.
formail's -x"From:" extracts the From: header contents (without the field name).
Unix egrep is used to test whether the "From:" field exists and contains at least one ordinary letter, upper or lower case ("i"), working silently ("s").

How can I extract a valid address from the Reply-To field, and that field only?

One trick is to utilize the following variable definition letting formail do the worrying about the proper address format.

REPLYTO_=`egrep "^Reply-To:" | head -1 \
         | formail -c -rt -xTo: \
         | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

Assume that indeed you strictly want the address from a "Reply-To:" header. No address in any other header will do. Use egrep to extract the "Reply-To:" header field from the incoming email.
head -1 ensures that only the first occurrence of a "Reply-To:" in the message counts.
formail -c -rt -xTo: is a standard, special trick to form a return address. The key is the -r switch which "generates an autoreply header". The "-c" switch concatenates any continued fields in the header.
If no "Reply-To:" header is found in the email message, foo@bar will be returned as the address.
The last line removes any leading and trailing tabs and blanks from the address.

If you put the REPLYTO_ definition high up in your ~/.procmailrc you will have the variable available to the rest of your recipes.

Feedback: Let me suggest this:

REPLYTO_=`formail -cXReply-To: | head -1 | formail -rtzxTo:`

"formail -cX" rather than "egrep" in case the header has a different capitalization -- or if the real address is on a continuation line.
formail "-z" flag to avoid "expand" and "sed".

Timo's further comments:

The "-c" switch concatenates continuation lines.
The "-X" switch extracts the header field, preserving the field name.
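The "-rt", "-x" and "To:" trick prepares a return address.
The "-z" switch ensures that a whitespace exists between field name and content. It also trims leading and trailing whitespace from fields extracted with "-x", which is why "expand" and "sed" are no longer needed.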
If the Reply-To: header field is empty or missing, the value of the REPLYTO_ variable will be foo@bar

How can I extract the address of the sender's postmaster?

Put these definitions high up in your ~/.procmailrc :

#Get the sender's address, the generic version
FROM_=`formail -rt -xTo: \
         | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

#Get the sender's host
FHOST_=`echo "${FROM_}" | awk -F@ '{ print $2 }'`

#Build the postmaster's address
FMAST_="postmaster@${FHOST_}"

Thus, you have the postmaster's alleged address available as ${FMAST_} from this point on in your recipes file. Note, however, that all validity testing of the address is missing.
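
As for the missing validity testing, below is a minimal sketch of what an elementary check might look like, written as a POSIX shell function that can be tried on the command line. The function name postmaster_for and the particular checks (exactly one "@", only letters, digits, dots and hyphens in the host, at least one dot) are assumptions made for this illustration, not a complete address validation.

```shell
#!/bin/sh
# Hypothetical helper: print postmaster@<host> for a bare address,
# or fail if the address has no usable host part.
postmaster_for() {
  # Take the host only if there is exactly one "@" in the address
  host=$(printf '%s\n' "$1" | awk -F@ 'NF == 2 { print $2 }')
  case "$host" in
    ''|*[!-A-Za-z0-9.]*|.*|*.) return 1 ;;    # empty, odd characters, stray dot
    *.*) printf 'postmaster@%s\n' "$host" ;;   # require at least one dot
    *) return 1 ;;                             # e.g. formail's foo@bar
  esac
}

postmaster_for 'scam@cyberspam.com'
postmaster_for 'foo@bar' || echo "no usable host in foo@bar"
```

With such a check the "foo@bar" that formail returns when it cannot find a sender would be refused instead of producing "postmaster@bar".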

What happens in the FROM_ formula:

At a quick glance it may appear that the "From:" header and the "To:" header have been confused in the formula, but this is not the case. The formail program is asked ("-r") to prepare a reply header to send email back to the sender. Then that return address is extracted. That is why we have a "-xTo:" since we want to extract where the reply would be sent. That is where we assume that the email came from.
In the pipe "expand" is used to replace potential tabs with spaces, and "sed" is used to omit any leading and trailing white spaces.

Formail uses a certain priority order in preparing the reply header. If there is a "Reply-To:" field in the header, the "FROM_" variable will contain that address. In some cases one may wish to ignore that field, for example to prevent malicious relaying. Here is how:

#Get the sender's address, ignore Reply-To:
FROM_=`formail -I"Reply-To:" -rt -xTo: \
         | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

How can I weed out an inordinately long recipient list? I am one of the recipients of a very useful professional mailing list, but it lists in its "To: " header field all the recipients to the list. Furthermore, it repeats the messages in HTML format. The text format is sufficient for me.

The (only slightly modified) example below is based on a true situation from my own ~/.procmailrc.

#Ensure a whitespace exists between field name and content
#Comment "Old-" the Content-Length field from all the headers
:0 fwh
| formail -z -i"Content-Length:"

#(whatever else in between)

:0
* ^From:.*the-mailing-list-maintainer
* ^TO_the@first\.recipient\.edu
{
  :0 fw
  | formail -I"To:" -I"X-" -I"Content-Type:" -I"MIME-Version:"\
    -A "To: Maintainer's long recipient list suppressed" \
  | sed -e '/^This is a multi-part /,/^Content-Transfer-Encoding: /d' \
        -e '/------=_NextPart_/,$d'

  :0:
  ${DEFAULT}
}

There are two condition lines.
- Match if it is from the mailing list maintainer.
- Match if it is for the full mailing list and not only to me personally from the maintainer.
Filter ("f") the email message through a pipe of several lines. Tell procmail to wait ("w") for the pipe to finish.
- Let formail weed out superfluous fields.
- Append a very brief "To:"-field for your information.
- Let sed take out any special format information.
- Let sed weed from the start of the HTML part to the end of the message.
This example shows the principles, but it is based on the established format of the postings on the particular mailing list. Therefore it is not applicable as such, but you'll have to customize and test it for your own situation. (See the items on test methods on this page.)

What is this procmail scoring? How can I utilize it?

This is a somewhat complicated subject with material dispersed throughout the various procmail FAQs. Basically scoring is a method to count how many of the conditions are fulfilled in a recipe and if the "score" is positive, that is the score is 1 or more, the action line in the recipe will be performed. There is much, much more to scoring, but this is a good starting point.

Consider the following simple spam foiling recipe. It will put the email into the ProbableSpam.mail file if the score adds up to at least one. If the first condition is met, 1 is added to the score. Ditto for the second condition. Thus if either of the tell-tale spam signals occurs, the score will be positive (that is greater than zero) and the action (storing the email message into the ProbableSpam.mail file) will be enacted.

:0:
* 1^0 ^Subject:.*make money fast
* 1^0 ^Subject:.*\$\$\$
ProbableSpam.mail

The example above uses equally-weighted scoring. One can also have unequal scores. Below, a hit of the second condition gives two points while a hit of the first only gives one.

* 1^0 ^Subject:.*make money fast
* 2^0 ^Subject:.*\$\$\$

Scoring can be used to build some extremely trivial artificial intelligence into the recipes. Consider the following

:0:
* -1^0
*  1^0 ^Subject:.*money
*  1^0 ^Subject:.*fast
*  1^0 ^Subject:.*\$\$\$
ProbableSpam.mail

The initial score is set at -1. Thus at least two of the subsequent conditions have to be met in order for the entire recipe to match. If none or only one of the conditions is met, the score will not rise above zero.
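
The arithmetic of the recipe can be imitated in a few lines of shell, which may help in getting a feel for it. This is only a sketch of the counting, assuming a POSIX shell, and not how procmail itself is implemented. The function names subject_score and is_probable_spam are made up for the illustration.

```shell
#!/bin/sh
# Hypothetical helper: emulate the "-1^0 plus three 1^0 conditions" recipe
# for a given subject text and print the resulting score.
subject_score() {
  score=-1                       # corresponds to "* -1^0"
  for pattern in 'money' 'fast' '\$\$\$'; do
    if printf '%s\n' "$1" | grep -E -i -q "$pattern"; then
      score=$((score + 1))       # corresponds to one "* 1^0 ..." line
    fi
  done
  echo "$score"
}

# The recipe delivers to ProbableSpam.mail only if the score is positive.
is_probable_spam() {
  [ "$(subject_score "$1")" -gt 0 ]
}

for subj in 'Make money fast' 'Fast cars' 'Hello'; do
  echo "$subj: score $(subject_score "$subj")"
done
```

A subject with two of the three signals ends up with a score of 1 and is caught. A subject with one signal stays at 0 and passes.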

An alternative formulation of scoring to foil spam is given below. This time it is required that at least three of the score-condition lines match. (The [] contain a space and a tab, as usual.)

:0:
* ^Subject:[ ]*\/[^ ].*
* -2^0
* 1^0 MATCH ?? ()\<easy\>
* 1^0 MATCH ?? ()\<fast\>
* 1^0 MATCH ?? ()\<(cash|money)\>
* 1^0 MATCH ?? \$\$\$
ProbableSpam.mail

The procmail "\/" operator is used to extract the subject of the email into the reserved MATCH variable.
The variable test "??" is used.
Word matching is used applying the word boundaries "\<" and "\>". Thus "fast" would be matched, but not "faster".
If both the words "cash" and "money" appear on the subject line no more than one score point will be awarded.

Further, simple examples

#Catch potential spam by examining the email route
:0:
* 1^0 ? formail -x"Received:" | egrep -i "157\.161\.140\.2"
* 1^0 ? formail -x"Received:" | egrep -i "199\.217\.231\.46"
* 1^0 ? formail -x"Received:" | egrep -i "212\.106\.213\.36"
* 1^0 ? formail -x"Received:" | egrep -i "216\.154\.1\.82"
ProbableSpam.mail

As usual, the "?" executes and evaluates what is on the rest of the condition line instead of searching for a literal match. Note the syntax order.
Incidentally, there is a subtle catch in using the IP numbers. Assume that you wish to detect the nodes from 216.154.1.74 through to 216.154.1.86. This rule won't work quite right: "216\.154\.1\.[74-86]". Why? The "[74-86]" will match 4-8. (The 7 and 6 would be superfluous since they already are within the 4-8 range.) The rule would find matches outside the intended range. E.g. "216\.154\.1\.72" would be matched. Instead, applying both "216\.154\.1\.7[4-9]" and "216\.154\.1\.8[0-6]" would match correctly.
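
The catch is easy to verify on the command line. Below is a minimal sketch, assuming a POSIX shell with "grep -E". The variable names BAD and GOOD and the helper matches are hypothetical, and the trailing "([^0-9]|$)" group is an addition for this sketch so that longer numbers such as ".745" are not matched.

```shell
#!/bin/sh
# Pattern with the faulty character class: [74-86] means "7, 4-8 or 6".
BAD='216\.154\.1\.[74-86]'
# Two branches covering 74-79 and 80-86. The last group (an addition
# for this sketch) requires a non-digit or the end of line after the number.
GOOD='216\.154\.1\.(7[4-9]|8[0-6])([^0-9]|$)'

matches() {   # usage: matches PATTERN TEXT
  printf '%s\n' "$2" | grep -E -q "$1"
}

for ip in 216.154.1.72 216.154.1.74 216.154.1.86 216.154.1.87; do
  matches "$BAD" "$ip"  && echo "$ip: hit by the faulty pattern"
  matches "$GOOD" "$ip" && echo "$ip: hit by the corrected pattern"
done
```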

This 'precision' recipe checks in the message header both the "From:" field and the "Received:" path of a forgery spam.

#Avoid a specific forgery spam
:0:
* -1^0
*  1^0 ^From:.*mikerobbins2000@hotmail\.com
*  1^0 ? formail -x"Received:" | egrep -is "psi\.net"
Spam.mail

Scoring and ordinary conditions can be mixed in the rules. For example the two recipes below achieve roughly the same thing, but the latter option takes fewer steps if the email is for you.

:0:
* -1^0
*  1^0 ? formail -c -x"Received:" | fgrep -is 'alladvantage.com'
*  1^0 ? formail -c -x"Received:" | fgrep -is 'ameritech.net'
*  1^0 ? formail -c -x"Received:" | fgrep -is 'bellatlantic.net'
*  1^0 ! ^TO_(myid)@([-a-z0-9_]+\.)*myhost\.mydom
ProbableSpam.mail

:0:
* ! ^TO_(myid)@([-a-z0-9_]+\.)*myhost\.mydom
* 1^0 ? formail -c -x"Received:" | fgrep -is 'alladvantage.com'
* 1^0 ? formail -c -x"Received:" | fgrep -is 'ameritech.net'
* 1^0 ? formail -c -x"Received:" | fgrep -is 'bellatlantic.net'
ProbableSpam.mail

The formail switches in the above are

-c Concatenate continued fields in the header.
-x Get the contents of the said header field. Do not include the field name.

The fgrep (search a file for a fixed-character string) switches in the above are

-i Ignore upper/lower case distinction during comparisons.
-s Silent (only produce error messages) in order to check the return status without any output.

The above example could also be written more efficiently without scoring as

:0:
* ! ^TO_(myid)@([-a-z0-9_]+\.)*myhost\.mydom
* ^Received:.*(\
alladvantage\.com|\
ameritech\.net|\
bellatlantic\.net)
ProbableSpam.mail

How can I test if the subject is empty or if the subject field is missing altogether?

Scoring seems to be the answer:

:0:
* 1^0 ^Subject:([  ]$|$)
* 1^0 !^Subject:
NoSubject.mail

As usual, the brackets [] contain a space and a tab.

There are other options to test for an empty "Subject:" or an entirely missing "Subject:" field. The one below puts the subject contents in a variable. The actual recipe then tests if the value of the "SUBJ_" variable is empty. (Also see the feedback about the syntax.)

#Get the subject discarding any leading and trailing blanks
SUBJ_=`formail -xSubject: \
       | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

#Test for an empty or missing subject
:0:
* SUBJ_ ?? ^^^^
NoSubject.mail

^^^^ denotes empty contents. The trick is adopted from the procmail material of some other authors, where the ^^ anchor is explained better than I can do it. Also see the procmailrc manual page ("man procmailrc") for it.
Likewise, see the procmailrc manual page for the ?? definition.
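
To check the empty-or-missing logic without feeding test messages through procmail, something like the following could be used. It is a minimal sketch assuming a POSIX shell. The function names are hypothetical, and a plain sed/awk pipe stands in for "formail -xSubject:", so continuation lines are not handled.

```shell
#!/bin/sh
# Hypothetical helper: read a message on stdin and print the trimmed
# contents of the first Subject: line in the header. The sed/awk pipe
# is a stand-in for "formail -xSubject:" (no continuation lines).
subject_of() {
  sed -e '/^$/q' \
    | awk 'tolower($0) ~ /^subject:/ { sub(/^[^:]*:/, ""); print; exit }' \
    | expand | sed -e 's/^[ ]*//' -e 's/[ ]*$//'
}

# Succeeds when the Subject: field is missing or holds only whitespace,
# which is what "* SUBJ_ ?? ^^^^" tests in the recipe.
subject_missing_or_empty() {
  [ -z "$(subject_of)" ]
}

printf 'From: a@b.c\nSubject: \n\nbody\n' \
  | subject_missing_or_empty && echo "empty or missing subject"
```

The sed command stops reading at the first empty line, so a "Subject:" that happens to appear in the body is not mistaken for the header field.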

How can I modify the "To:" field of the email I received?

I am not exactly sure why you wish to do this, but here is how to replace the "To:" header field of a message using formail. Choose the formail "-i" option to rename the old "To:" field to be "Old-To:" and to insert the new "To:" header field. The flags in the recipe are as follows: "f" use the pipe as a filter, "h" it is about the header of the email message, "w" wait for the filter to finish before proceeding down the rest of the "~/.procmailrc".

:0 fhw
* ^To:.*myoldid@myoldhost\.myolddom
| formail -i "To: mynewid@mynewhost.mynewdom"

I have a long list of spammers and other Internet lowlife in a separate file. How can I utilize it?

The technique is fairly simple. Put this in your "~/.procmailrc" file:

MAILDIR=/home/myid/Mail   #The location of your own mail directory
# Whatever other preliminaries

# Whatever other recipes

# Test if the email's sender is blacklisted
:0
* ? formail -x"From" -x"From:" -x"Sender:" \
    -x"Reply-To:" -x"Return-Path:" -x"To:" \
    | egrep -is -f black.lst
/dev/null

All the common email sender headers are covered.
Also the "To:" field is covered in the recipe, since spammers often name their mailing lists as phony addresses.
Continuation lines ("\") are utilized. Incidentally, ensure that there are no trailing whitespaces after the "\" on a line.
The "-i" option in egrep tells to ignore upper/lower case distinction. The "-s" is for silence. The "-f file" option tells to take the list of the regular expressions from file.

Prepare a "/home/myid/Mail/black.lst" file with contents something like:

abc23@airnewz.ccn
abdu@advis.com.tr
adexec@mail.com
dinner@dine.com
friend@public.com
helpingyou@mail.com
mk1977@ms1.kingnet.com.tw
nb8MAMxhq@mail.com
no@body.com
owieuj@peterlink.ru
patkline00@usa.net
promotions@web-vertise.com
unknown@unknown.com

The black.lst file should reside in your "${MAILDIR}" mail directory (unless you explicitly include the path in your "~/.procmailrc").
The problem with such lists is that most of the spam related addresses are very transient by nature. I do not think such lists alone are a very effective method, as I have explained on my Foiling Spam with an Email Password System page.
For exact matching you might wish to use e.g. "no@body\.com" instead of "no@body.com", since an unquoted dot matches any character. Alternatively, one could use fgrep (fixed grep) or grep -F.
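
The difference between the regular expression reading and the fixed-string reading of the list can be seen with a small experiment. Below is a minimal sketch assuming a POSIX shell. The temporary file name and the function names are hypothetical. In real use the file would be the black.lst.

```shell
#!/bin/sh
# Hypothetical demonstration file standing in for black.lst
LIST="${TMPDIR:-/tmp}/black.lst.$$"
trap 'rm -f "$LIST"' EXIT
printf '%s\n' 'no@body.com' 'unknown@unknown.com' > "$LIST"

# The list entries are taken as regular expressions: "." matches any character
in_list_regex() { printf '%s\n' "$1" | grep -E -i -q -f "$LIST"; }
# The list entries are taken as literal strings
in_list_fixed() { printf '%s\n' "$1" | grep -F -i -q -f "$LIST"; }

in_list_regex 'no@bodyxcom' && echo "regex list: no@bodyxcom matches"
in_list_fixed 'no@bodyxcom' || echo "fixed list: no@bodyxcom does not match"
```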

How do I forward certain messages that I get, and preserve myself a copy?

Below is an example:

#Get the sender's bare email address from the first "From" line
FROM_=`formail -c -x"From " \
         | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g' \
         | awk '{ print $1 }'`

#Get the original subject of the email
#Discard superfluous tabs and spaces
#On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -c -xSubject: \
         | expand \
         | sed -e 's/  */ /g' \
         | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

#Whatever other recipes you'll use

:0
* ^From:.*infolist@([-a-z0-9_]+\.)*infohost\.infodom
# Avoid email loops
* ! ^X-Loop: myid@myhost\.mydom
{
  :0c:   #Preserve a copy of the email
  Infolist.mail
  :0fwh  #Adjust some headers before forwarding
  | formail -A"X-Loop: myid@myhost.mydom" \
            -A"X-From-Origin: ${FROM_}" \
            -i"Subject: $SUBJ_ (fwd)"
  # Forward the email
  :0
  !mydept@myhost.mydom
}

How do I forward certain messages to two different addresses?

I have the following recipe in my ~/.procmailrc file, but the email does not get forwarded to the myid2@myhost.mydom address.

  :0 c
  *^From.*info.gov
    ! friend@somehost.domain myid2@myhost.mydom

I am not sure what is wrong with that, but at least the solution below should work:

:0
* ^From.*info.gov
* ! ^X-Loop: myid@myhost\.mydom
{
  :0fwh
  | formail -A"X-Loop: myid@myhost.mydom"
  :0c
  ! friend@somehost.domain
  :0
  ! myid2@myhost.mydom
}

The X-Loop is not relevant from the point of view of the stated problem, but using it as a safeguard is always advisable.

Feedback: The reason that the first one does not work is that the recipients' addresses are separated by space while they should be separated by a comma [as in]

:0
! friend@somehost.domain,myid2@myhost.mydom

(I have not tested this one.)

How do I automatically return certain email messages?

Ah! Another potential case of spam avoidance? (This is a companion page to Foiling Spam with an Email Password System, remember.) Below is an example. But be sensible in using the method, since most spam has forged senders.

#Define getting the sender's address
#Discard any leading and trailing whitespaces
FROM_=`formail -rt -xTo: \
  | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

#Whatever other recipes in between.

#Return certain email
:0
#
# Is the email from a frequent spam domain?
# (Note: fgrep takes no regular expressions)
* ? formail -c -x"Received:" | fgrep -is 'cyberspam.com'
#
# Is it for a mailing list rather than to me?
* ! ^TO_(myid|myFirst\.mySecond)@([-a-z0-9_]+\.)*myhost\.mydom
#
# Avoid forgeries that pretend to be from my own site
* ! $ ? echo ${FROM_} | fgrep -is 'myhost.mydom'
* $ ? echo ${FROM_} | fgrep -is '.'
* $ ? echo ${FROM_} | fgrep -is '@'
#
# Avoid email loops
* ! ^X-Loop: myid@myhost\.mydom
{
  # Make a temporary file of the message to be returned
  :0c:formail.lock
  # Discard whitespaces, insert a leading blank
  | expand | sed -e 's/[ ]*$//g' | sed -e 's/^/ /' > return.tmp
  # Prepare and send the rejection
  # Be sure to customize your sendmail path
  :0:formail.lock
  | (formail -r -I"Subject: Rejected mail: Recipient refusal" \
    -A"X-Loop: myid@myhost.mydom" ; \
    echo "--- begin rejected mail ---" ; \
    cat return.tmp ; \
    echo "--- end rejected mail ---" ; \
    rm -f return.tmp) \
    | /usr/lib/sendmail -t
}

The spamfoiling page has a further example.
The "-r" option tells formail to generate an auto-reply header.

There can be many variants of detecting and returning email which one does not wish to get. Below is a fictitious example utilizing variables to enhance the flexibility of the return address handling. (If you are baffled by the "RULE" variable, which is just a sideline here, see the item on identifying executed recipes.)

:0
* ^From:.*(charpie|charpie5266)@mydeja\.com
{ REJECT="charpie5266@mydeja.com" }
:0
* ^From:.*umidextr@([-a-z0-9_]+\.)*mindfall\.com
{ REJECT="umidextr@mindfall.com" }
:0
* ^From:.*(rasch|Greg.*\.Rasch)@([-a-z0-9_]+\.)*millkirn\.com
{ REJECT="rasch@millkirn.com" }
:0
* ^From:.*(daren|Daren[_\.]Risenthal)@([-a-z0-9_]+\.)*slunet\.org
{ REJECT="daren@slunet.org" }

:0
* ! REJECT ?? ^^^^
{
  :0
  { RULE="These users I do not want to talk with" }
  :0cw
  | expand | sed -e 's/[ ]*$//g' | sed -e 's/^/ /' > return.tmp
  :0:procmail.lock
  | (formail -r -I"To: ${REJECT}" \
    -I"Subject: Rejected mail: Recipient refusal" \
    -A"X-Loop: myid@myhost.mydom" ; \
    echo "--- begin rejected mail ---" ; \
    cat return.tmp ; \
    echo "--- end rejected mail ---" ; \
    rm -f return.tmp) \
    | /usr/lib/sendmail -t
}

Note how the above set of rules has two parts, the actual detection plus the return address definition, and the return action. The latter could be written in many alternative ways, including

:0
* ! REJECT ?? ^^^^
{
  :0cw
  | expand | sed -e 's/[ ]*$//g' | sed -e 's/^/ /' > return.tmp
  :0 fwh
  | formail -r \
    -A"Subject: Rejected mail: Recipient refusal" \
    -A"From: myid@myhost.mydom" \
    -A"X-Loop: myid@myhost.mydom" ; \
    echo "--- begin rejected mail ---" ; \
    cat return.tmp ; \
    echo "--- end rejected mail ---" ; \
    rm -f return.tmp
  :0
  ! ${REJECT}
}

My address has changed. How do I forward a copy to myself and tell the sender?

This is a theme whose constituents already are covered throughout this material. But also take a look at "man procmailex" for the "vacation database" idea even if a better name here would be something like "dejatold database".

#Note: On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -c -xSubject: \
       | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

:0
# Was it to me
* ^TO_myoldid@myoldhost\.myolddom
# Ignore messages for daemons
* ! ^FROM_DAEMON
# Avoid email loops
* ! ^X-Loop: myid@myhost\.mydom
{
  :0 c
  ! myid@myhost.mydom
  :0:dejatold.lock
  | formail -rD 8192 dejatold.cache
  :0 eh
  | (formail -r \
     -A"X-Loop: myid@myhost.mydom" \
     -I"Subject: Changed email address" ; \
     echo "Dear Sender," ; \
     echo "" ; \
     echo "Thank you for your email about" ; \
     echo "\"${SUBJ_}\"" ; \
     echo "" ; \
     echo "My email address has changed." ; \
     echo "Old: myoldid@myoldhost.myolddom" ; \
     echo "New: myid@myhost.mydom" ; \
     echo "Your email has been forwarded to my new address." ) \
     | /usr/lib/sendmail -oi -t
}

Some explanations:

The "-r" switch prepares a reply header for sending email back to the sender.
The "-D maxlen idcache" switch in "-rD" controls the message identification cache. For more see "man formail".
The "c" flag in ":0 c" tells that the processing should continue also after this particular recipe has been acted upon.
The "e" flag in ":0 eh" decrees that the recipe only executes if the immediately preceding recipe failed (here: if the sender was not yet found in the dejatold.cache).
The "h" flag in ":0 eh" tells procmail to feed only the header to the pipe. Without it both the header and the body would be fed, which is the default.

Naturally, the recipe does not stand alone in the ~/.procmailrc but is a part of it. Thus you would e.g. have previous recipes that take care of email that is not to you, and email that was for mailer daemons.

How can I set variable values based on the text in the body of the email message?

Let's start with another, much simpler question:

From: ts@UWasa.Fi (Timo Salmi)
Newsgroups: comp.mail.misc
Subject: Re: Procmail: How do I filter by the body
Date: Sun Apr 23 09:34:38 EET DST 2000
X-Comment: Slightly modified

I am trying to save all the messages that come to me with "mypassword" in the body to a folder called password. How do I do that?

As the manuals state:

Flags can be any of the following:

B Egrep the body.

Hence, all there is to it is

:0 B:
* mypassword
password

If you want your password case sensitive then use ":0 BD:".

All the best, Timo

From: ts@UWasa.Fi (Timo Salmi)
Newsgroups: comp.mail.misc
Subject: Re: Question of procmail newbie
Date: Tue Nov 23 23:09:41 EET 1999
X-Comment: Slightly modified

How could I solve the following problem with procmail: I receive e-mails with a body like this:

Category: aaa

Subcategory: bbb

File: ccc

I need to store this mail to the folder aaa/bbb/ccc, so procmail should create directories aaa/bbb . What kind of .procmailrc should I write?

The trick is to extract the appropriate text from the body of the email message and to set procmail variable values on the basis of the results. This is how it can be done.

#Preliminaries
SHELL=/usr/bin/sh #Use the Bourne shell (check your path!)

CATE=`cat | egrep "^Category:" | awk '{ print $2 }'`
SCAT=`cat | egrep "^Subcategory:" | awk '{ print $2 }'`
FILE=`cat | egrep "^File:" | awk '{ print $2 }'`

#Whatever other recipes

:0B:Procmail.lock
* ^Category:[ ].+[a-z0-9]
* ^Subcategory:[ ].+[a-z0-9]
* ^File:[ ].+[a-z0-9]
| mkdir ${CATE} ; mkdir ${CATE}/${SCAT} ;\
cat >> ${CATE}/${SCAT}/${FILE}

#Whatever other recipes

As a validity check the condition lines require that all the key-lines are present in the email message body and that the lines contain names.

All the best, Timo

Feedback: It would be much more efficient rewriting these definitions using awk's pattern matching, such as:

CATE=`cat | awk '/^Category:/ { print $2 }'`
etc
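
The extraction step can be tried on a sample message before it goes into the ~/.procmailrc. Below is a minimal sketch assuming a POSIX shell. The function names field_value and folder_path are hypothetical, and the refusal of empty values and of ".." is an extra precaution added for this illustration, since the folder name comes straight from the body of an incoming email.

```shell
#!/bin/sh
# Hypothetical helper: print the second word of the first line of stdin
# that starts with the given key, e.g. "Category:".
field_value() {
  awk -v key="$1" 'index($0, key) == 1 { print $2; exit }'
}

# Hypothetical helper: read a message on stdin and print the folder
# path Category/Subcategory/File, or fail if a value is unusable.
folder_path() {
  msg=$(cat)
  cate=$(printf '%s\n' "$msg" | field_value 'Category:')
  scat=$(printf '%s\n' "$msg" | field_value 'Subcategory:')
  file=$(printf '%s\n' "$msg" | field_value 'File:')
  # Refuse missing values and anything that could leave the mail directory
  case "$cate/$scat/$file" in
    /*|*//*|*/|*..*) return 1 ;;
  esac
  printf '%s/%s/%s\n' "$cate" "$scat" "$file"
}

printf 'From: x@y.z\n\nCategory: aaa\n\nSubcategory: bbb\n\nFile: ccc\n' \
  | folder_path
```

On the action line, "mkdir -p" would create both directory levels in one step and would not complain if they already exist.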

Apropos awk. On the Usenet there are the dedicated newsgroups comp.lang.awk and alt.lang.awk. Furthermore, although used in quite another connection than procmail, there are several awk (actually GnuAWK) usage examples in my MS-DOS batch programming tricks collection.

Next, let's consider a more tricky task. Find from the body of the text the last line that potentially contains the string "mailto:". Insert the contents of that line into a MAILTO_ variable.

:0
* ^Subject:.*Whatever
{
  :0
  {
  MAILTO_=`sed -e '1,/^$/ d' \
           | egrep "mailto:" \
           | tail -1 \
           | expand \
           | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g' \
           | sed -e 's/[^o]://g' -e 's/^://g' \
           | awk -F: '{ print $2 }' | awk '{ print $1 }'`
  }
  :0:
  WhichEverFolderYouWant
}

Consider the MAILTO_ construct. (The rest of the recipe should be self-explanatory.)

The sed -e '1,/^$/ d' extracts the body of the email message (i.e. the headers are ignored).
The egrep "mailto:" finds all the lines containing mailto:.
If there are several mailto: lines the tail -1 gets the last of them.
The expand expands any TAB characters to SPACE characters.
The sed -e 's/^[ ]*//g' -e 's/[ ]*$//g' omits any leading and trailing blanks.
The sed -e 's/[^o]://g' -e 's/^://g' weeds out from the same line the possible preceding colons (:) which might cause confusion. It is not perfect, though.
The awk -F: '{ print $2 }' gets the rest (until the end of line or the next colon) after the colon (:), i.e. the email address from the mailto: line and what may come after it. The awk '{ print $1 }' discards the potential rest of the line starting with the first blank after the address. What should thus be left is the email address in the mailto: field.

Should you wish to get the entire line with the "mailto:" into the MAILTO_ variable instead of just the email address there, simply leave out the last two lines from the MAILTO_ definition.
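
A pipe of this length is best tested on its own with a sample message. Below is a minimal sketch assuming a POSIX shell. The function name last_mailto and the sample addresses are made up, and "grep -E" and "tail -n 1" are used in place of "egrep" and "tail -1".

```shell
#!/bin/sh
# Hypothetical helper: read a whole message on stdin and print the address
# from the last "mailto:" line of the body, using the same steps as MAILTO_.
last_mailto() {
  sed -e '1,/^$/d' \
    | grep -E 'mailto:' \
    | tail -n 1 \
    | expand \
    | sed -e 's/^[ ]*//' -e 's/[ ]*$//' \
    | sed -e 's/[^o]://g' -e 's/^://' \
    | awk -F: '{ print $2 }' | awk '{ print $1 }'
}

# Sample message: two mailto: lines, the last one with a preceding colon
printf '%s\n' 'From: a@b.c' 'Subject: Whatever' '' 'Hello' \
  'mailto:first@example.com' \
  'Contact: mailto:second@example.org (office)' \
  | last_mailto
```

In the sample the colon after "Contact" is removed by the second sed, so the address after "mailto:" ends up as awk's second field.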

How can I insert some token text in front of the body of incoming email?

I have a really simple procmail question. All I want to do is add a line
"======= Forwarded Mail =========="
to the top of the body of all incoming messages, and forward them to another account.

Let us start by considering the first part of the question only. This is how it is done. The solution owes heavily to Philip Guenther.

:0
{
  :0 fhw
  | cat - ; \
  echo "===== Filtered email ====="
  :0:
  ${DEFAULT}
}

So far so good. Next let's add the forwarding so that the token will only appear in the forwarded message. (If you wish to change that, adjust the order of the rules.)

:0
{
  :0c:
  ${DEFAULT}
  :0 fhw
  | cat - ; \
  echo "======= Forwarded Mail =========="
  :0
  !forward@myhost.mydom
}

Finally, let's add avoiding email loops.

# Discard loops
:0
* ^X-Loop: myid@myhost\.mydom
/dev/null

:0
{
  :0c:
  ${DEFAULT}
  :0 fhw
  | cat - ; \
  echo "======= Forwarded Mail =========="
  :0 fhw
  | formail -A"X-Loop: myid@myhost.mydom"
  :0
  !forward@myhost.mydom
}

Do you have any useful tips for regular expression matching?

This is a terribly complicated subject involving many many features which I do not know. Let's nevertheless look at some further example recipes.

# Matching a few undelivery and such reports
:0:
* ^Subject:.*Undeliver(ed|able) (e)?mail|\
  ^Subject:.*Returned (spam )?(e)?mail
* ^TO_(myid|firstname\.lastname)@([-a-z0-9_]+\.)*myhost\.mydom
Returned.mail

Consider the first rule of the recipe above. It will match all email with the following on the "Subject:" line in the header:

Undelivered mail
Undeliverable mail
Undelivered email
Undeliverable email
Re: Undelivered mail
etc...

The continuation line will match

Returned mail
Returned email
Returned spam mail
Returned spam email
Re: Returned mail
etc...

What if you don't want to match "Re: Undelivered mail"? The following condition gives a more exact match

* ^Subject:[  ]+Undeliver(ed|able) (e)?mail

In other words only spaces and/or tabs are allowed between "Subject:" and the start of the actual subject.

Let's consider another example. Say that we have two hosts

cyber.com
cyber.com.au

How to catch email from the former, but not the latter:

:0:
* ^From:.*cyber\.com([^\.]|$)
ProbableSpam.mail

That is, do not allow a dot after the .com or alternatively require that the line ends there. However, cyber.comet would be matched! Thus, depending on what you want to achieve, you might have e.g.

:0:
* ^From:.*cyber\.com( |"|>|$)
ProbableSpam.mail
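
The two conditions above can be compared outside procmail. Below is a minimal sketch, assuming that grep -E is a fair stand-in for procmail's egrep-like matcher (the two are not identical). The helper names from_matches and report, and the sample From: lines, are made up for this illustration. The dot in cyber.com is escaped here so that it only matches a literal dot.

```shell
#!/bin/sh
# from_matches PATTERN LINE: succeed if LINE matches PATTERN.
# -i is used because procmail conditions ignore case by default.
from_matches() {
  printf '%s\n' "$2" | grep -E -i -q -- "$1"
}

# report PATTERN LINE: print whether LINE matches PATTERN.
report() {
  if from_matches "$1" "$2"; then
    echo "match:    $2"
  else
    echo "no match: $2"
  fi
}

LOOSE='^From:.*cyber\.com([^.]|$)'      # no dot may follow .com
STRICT='^From:.*cyber\.com( |"|>|$)'    # only space, quote, > or end of line

for pattern in "$LOOSE" "$STRICT"
do
  echo "pattern:  $pattern"
  report "$pattern" 'From: joe@cyber.com'
  report "$pattern" 'From: Joe <joe@cyber.com>'
  report "$pattern" 'From: joe@cyber.com.au'
  report "$pattern" 'From: joe@cyber.comet'
done
```

Only the last sample line should behave differently under the two patterns.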

What is the difference between the rules below?

* ^From:.*myid@([-a-z0-9_]+\.)*myhost\.mydom
* ^From:.*myid@([-a-z0-9_]+\.)?myhost\.mydom
* ^From:.*myid@([-a-z0-9_]+\.)+myhost\.mydom

The first one matches any of

myid@myhost.mydom
myid@subhost1.myhost.mydom
myid@mypc.subhost1.myhost.mydom

The first one does not match myid@.myhost.mydom (and neither should it!).
The second one matches 1 and 2, but not 3.
The third one matches 2 and 3, but not 1.
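
The three cases can be verified with a few lines of shell. This is only a sketch: grep -E stands in for procmail's matcher, and count_matches is a made-up helper that counts how many of the three sample addresses a pattern accepts.

```shell
#!/bin/sh
# count_matches PATTERN: print how many of the three sample From: lines
# match PATTERN (case-insensitively).
count_matches() {
  printf '%s\n' \
    'From: myid@myhost.mydom' \
    'From: myid@subhost1.myhost.mydom' \
    'From: myid@mypc.subhost1.myhost.mydom' \
  | grep -E -i -c -- "$1"
}

count_matches '^From:.*myid@([-a-z0-9_]+\.)*myhost\.mydom'   # zero or more subhosts
count_matches '^From:.*myid@([-a-z0-9_]+\.)?myhost\.mydom'   # at most one subhost
count_matches '^From:.*myid@([-a-z0-9_]+\.)+myhost\.mydom'   # at least one subhost
```

Run as is, the three calls print 3, 2 and 2.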

To recount the purpose of the main special regexp symbols:

Symbol	Interpretation
*	Match zero or more times
?	Match zero or one times
+	Match one or more times
.	Any character
[ ]	Match from the list within the brackets
^	The start of the line (within [] however, a negation)
$	The end of the line
\	Quote the next character to take it literally
( )	Grouping

How can I test if two procmail variables have the same contents?

Basically the syntax for variable value tests is

VAR1_=Whichever expression you devise
:0:
* VAR1_ ?? regexp
wherever

But you can build rules like

VAR1_=Whichever expression you devise
VAR2_=whatever
:0:
* $ VAR1_ ?? ${VAR2_}
wherever

Note, however, that the above still is regular expression matching, not an equality.

The blank after the first $ is significant. It tells that the variable references on the line (${VAR2_}) are to be expanded, not to be taken as a literal text.

Feedback: That's easily resolved using $\var expansion and anchoring both ends of the regexp:

        * VAR1_ ?? $ ^^$\VAR2_^^

That condition will succeed if and only if VAR1_ and VAR2_ have the same contents, with the possible exception of VAR1_ having one more trailing newline than VAR2_.
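
To see why an unanchored regular expression test is not an equality test, here is a small shell sketch. It assumes grep -E as a stand-in for procmail's matcher. The helpers regex_match and same_contents are invented for this illustration.

```shell
#!/bin/sh
# regex_match VALUE PATTERN: succeed if some part of VALUE matches PATTERN.
# This mirrors an unanchored "VAR1_ ?? regexp" test.
regex_match() {
  printf '%s\n' "$1" | grep -E -q -- "$2"
}

# same_contents A B: plain string equality, no regexp interpretation.
same_contents() {
  [ "$1" = "$2" ]
}

VAR1_='xabcx'
VAR2_='a.c'
regex_match "$VAR1_" "$VAR2_" && echo "regexp test: match"
same_contents "$VAR1_" "$VAR2_" || echo "equality test: different"
```

The regexp test succeeds because the dot matches any character and because nothing ties the pattern to the start and end of the value. Those are the two things that the $\ expansion and the ^^ anchors take care of.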

I am having difficulties with "<". How does one match it?

Date: 09 Dec 1999 23:06:41 -0600
From: Philip Guenther
Newsgroups: comp.mail.misc
Subject: Re: procmail, trivial html detection, and a quirk

ts@UWasa.Fi (Timo Salmi) wrote:
> I just noted that, at least in procmail v3.13.1 1999/04/05
>
> :0B:
> * </body>
> * </html>
>
> does not work. Instead one has to apply
>
> :0B:
> * [<]/body>
> * [<]/html>

Yep. A leading '<' or '>' on a condition causes procmail to interpret the condition as a size test. If you want a normal regexp condition that starts by matching a literal '<' or '>' character you have to protect the leading character from such interpretation. There are several ways of doing so. The most efficient are to use parens or a backslash:

* ()</body>

* (<)/body>

* (</body>)

* \</body>

That last one is generally avoided because it looks like you're using the \< regexp special when you really aren't. Putting the '<' or '>' in brackets also works, as you did above, but it slows down the matching ever so slightly as a character class is slower to match than a single normal character. Thus, one of the above four methods is usually preferred.

Philip Guenther

(Timo's addendum: As far as I understand \< is a word-boundary in procmail. Hence \< is best avoided, when not used as an actual boundary.)

How can I insert identification text to the beginning of the subject line?

I know how to sort my incoming email with procmail into different folders, but how do I use formail to automatically add some suitable identification text to the subject line of the email that I receive?

The general idea is this

#Get the subject discarding any leading and trailing blanks
#Note: On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -xSubject: \
       | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

:0
* YourFirstSelectionCriterion
{
  :0 fwh
  | formail -I"Subject: WhateverYouAdd_1 ${SUBJ_}"
  :0:
  YourFirstFolder
}

:0
* YourSecondSelectionCriterion
{
  :0 fwh
  | formail -I"Subject: WhateverYouAdd_2 ${SUBJ_}"
  :0:
  YourSecondFolder
}

The flags are as follows: "f" use the pipe as a filter, "w" execute before proceeding, "h" it is about the header of the email message.

The -I option in formail removes and replaces the old header. Should you wish to retain the old subject header with an "Old-" prefix added, use -i instead.
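
The idea of trimming the old subject and putting a tag in front of it can also be tried out with standard tools when formail is not at hand. The sketch below is not what formail does internally. It is an awk stand-in under simplifying assumptions: the function name tag_subject is made up, and a Subject: header folded over several lines is not handled.

```shell
#!/bin/sh
# tag_subject TAG: read a message on stdin and write it to stdout with TAG
# inserted at the start of the Subject: header. Only the header (everything
# before the first empty line) is touched.
tag_subject() {
  awk -v tag="$1" '
    BEGIN { in_header = 1 }
    in_header && /^$/ { in_header = 0 }
    in_header && /^[Ss]ubject:/ {
      subj = $0
      sub(/^[Ss]ubject:[ \t]*/, "", subj)   # drop field name and leading blanks
      sub(/[ \t]+$/, "", subj)              # drop trailing blanks
      print "Subject: " tag " " subj
      next
    }
    { print }
  '
}

printf 'From: someone@example.com\nSubject:   Hello there  \n\nBody text\n' \
  | tag_subject '[LIST]'
```

A line in the body that happens to start with "Subject:" is left alone, since the rewriting stops at the first empty line.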

I tried out your tips, but some of them failed on my system. What next?

Here are a few ideas:

Have you copied right? For example:
- If you cut and paste, the brackets [] containing tabs will not be copied correctly, since on this page the assumed tabs aren't true tabs.
- Make sure that you have not misinterpreted the meaning of the quotation (") marks anywhere in the advice.
- If you have a backslash \ at the end of the line to continue the line, it is very important to ensure that you do not have white spaces after the \ backslash.
Have you customized all your file-paths right? Some of the recipes may require a slightly different setup in your environment than assumed in this FAQ.
Check that procmail is getting your proper path. Try "echo ${PATH}" and then include "PATH=WhatYouGot" high up in your ~/.procmailrc recipe file.
Include "VERBOSE=yes" high up in your ~/.procmailrc recipe file. Then see what is in the logfile procmail produced for debugging. The testbench is a useful aid in the debugging.
The shell you use may affect some actions. Check where your Bourne shell sh is with "which sh". If it is e.g. /bin/sh then include "SHELL=/bin/sh" at the beginning of your ~/.procmailrc recipe file and see if anything changes. Bourne shell is the shell I have used in preparing this tips page.
Work systematically. Try to pinpoint which particular line is causing the offense and how. If the problem is with the condition part, make the condition general enough to get it to match. Then narrow it down towards what you wanted until the recipe fails. If the problem is with an action, try to separate whether the problem is with the actual action or your procmail syntax. For example, if you pipe the email to a program, try to establish whether it is the call syntax that is in error (e.g. do you manage to convey the parameters right) or whether it is the actual program you called that fails.
If you have a procmail problem which you can't solve after trying properly, post your problem to the comp.mail.misc Usenet newsgroup and/or your corresponding local newsgroup. If you have genuine feedback about my procmail tips, your email is most welcome, but please refrain from using email for private consultation requests.

Echo and grep blues. I am having difficulties with echo and grep usages in procmail.

The combination of quoting and regular expressions can cause some subtle problems when the Unix echo and one of the greps (grep, fgrep, egrep) is used in the procmail recipes.

Consider

#Note: On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -c -xSubject:`

# Responses to filter reports
:0:
* -1^0
*  1^0 $ ? echo \"${SUBJ_}\" | fgrep -is 'Re: Filter report'
*  1^0 ^TO_myid@([-a-z0-9_]+\.)*myhost\.mydom
Response.mail

In the example the email's subject header is put into a "SUBJ_" variable utilizing formail "-x" option.
The "-c" option is used to concatenate the potential continuation lines, since occasionally the headers are divided onto several lines. This is more common on the "Received:" line, but can also occur on the "Subject:" line.
If the quoted quotes (\") are not used in the echo, the special characters on the email's Subject line in the header will be processed as shell related operators. This must not be allowed, since it will result in errors that may be hard to trace. For example operators such as "(", ")", "`", "'", "<", ">" and "|" all have a special meaning to the shell.
It is safer to use fgrep (the fixed-character expression search) because fgrep also interprets the regular expression special characters literally. For example, for fgrep you could use fgrep 'myhost.mydom' instead of egrep "myhost\.mydom". BTW, as you gather from the example above, procmail uses egrep-like syntax.
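
The effect of the quoting and of the fixed-string search can be tried in a plain shell script, where only one round of shell interpretation applies (inside ~/.procmailrc there is one more, hence the \" in the recipe). This is a sketch: subject_has is a made-up helper, and grep -F is used as the modern spelling of fgrep.

```shell
#!/bin/sh
# subject_has SUBJECT TEXT: succeed if SUBJECT contains TEXT, ignoring case.
# The double quotes around "$1" keep the shell from acting on characters
# such as ( ) ` < > | in the subject. grep -F takes TEXT literally.
subject_has() {
  printf '%s\n' "$1" | grep -F -i -q -- "$2"
}

SUBJ_='Re: Filter report (was: a|b > c) `date`'
if subject_has "$SUBJ_" 'Re: Filter report'; then
  echo "response to a filter report"
fi
```

With grep -F the dot in a search text such as 'myhost.mydom' only matches a real dot, so no backslash is needed.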

Consider a more complicated expression to extract the subject:

#Note: On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -c -xSubject: \
         | expand | sed -e 's/[;|\$\`\\]/ /g' \
         | sed -e 's/  */ /g' \
         | sed -e 's/(/\\\(/g' -e 's/)/\\\)/g' \
         | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

The potential tabs are expanded.
Some of the problem special characters are substituted with spaces.
Multiple spaces are substituted with a single space.
Parentheses are covered with backslashes "\". Here things can get really complicated, since the number of backslashes must be compatible with the number of interpretation rounds through procmail and the shell.
The last sed gets rid of any leading and trailing whitespaces.
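
The same steps can be tested on their own in a shell script before they go into a recipe. In the sketch below the function name sanitize_subject is made up, tr replaces expand for the tabs, and the number of backslashes is the one needed in a plain script with single quotes. It is not the number needed inside backquotes in ~/.procmailrc.

```shell
#!/bin/sh
# sanitize_subject: read one subject line on stdin and write the
# cleaned-up version to stdout.
sanitize_subject() {
  tr '\t' ' ' \
  | sed -e 's/[;|$`\\]/ /g' \
        -e 's/  */ /g' \
        -e 's/(/\\(/g' -e 's/)/\\)/g' \
        -e 's/^ *//' -e 's/ *$//'
}

printf '%s\n' '  Win $$$ (now); call | me  ' | sanitize_subject
```

For the sample line this prints Win \(now\) call me.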

There is much more to the echo and grep interactions with the shell and the regular expressions. That is why sufficient trials using the testbench are advisable before including the more complicated recipes into one's "~/.procmailrc" file.

How do I know which of my many procmail recipes has been enacted?

To get a log of what happens, set the following at the beginning of your ~/.procmailrc recipes file

SHELL=/usr/bin/sh                 # Use Bourne shell
MAILDIR=${HOME}/Mail              # Customize as appropriate
LOGFILE=${MAILDIR}/procmail.log   # Your procmail log
VERBOSE=yes                       # Produce full information
LOGABSTRACT=all                   #       - " -

However, this produces so much information that it is not convenient for routine visual checking. But you can include a suitable (dummy) variable definition in each one of your recipes and then search the log file for occurrences of that variable. Here is an example demonstrating how it goes. Consider a recipe that originally is

# Discard probable spam mail, set 1
:0:
* ! ^TO_(myid)@([-a-z0-9_]+\.)*myhost\.mydom
* 1^0 ^From:.*alladvantage.com
* 1^0 ^From:.*ameritech.net
* 1^0 ^From:.*bellatlantic.net
ProbableSpam.mail

Change this to be

:0
* ! ^TO_(myid)@([-a-z0-9_]+\.)*myhost\.mydom
* 1^0 ^From:.*alladvantage.com
* 1^0 ^From:.*ameritech.net
* 1^0 ^From:.*bellatlantic.net
{
  :0
  { RULE="Discard probable spam mail, set 1" }
  :0:
  ProbableSpam.mail
}

Apply the same principle for all your recipes in your ~/.procmailrc file. Then, as email has arrived, you can check which rules have been used by searching the log file with the command grep "RULE=" ${HOME}/Mail/procmail.log. If you need this regularly, make the grep search one of your Unix scripts:

#!/usr/bin/sh
grep "Assigning \"RULE=" ${HOME}/Mail/procmail.log
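
If a count per rule is more useful than the raw lines, the grep can be extended a little. This is a sketch with a made-up function name, rule_counts. It assumes that the verbose log writes each assignment on one line in the form procmail: Assigning "RULE=some text", so check your own log first.

```shell
#!/bin/sh
# rule_counts LOGFILE: list each RULE value found in a verbose procmail
# log together with the number of times it occurs, most frequent first.
rule_counts() {
  grep 'Assigning "RULE=' "$1" \
  | sed -e 's/.*Assigning "RULE=//' -e 's/"$//' \
  | sort | uniq -c | sort -rn
}
```

It would be called as rule_counts ${HOME}/Mail/procmail.log.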

In the altered procmail recipe, further up, carefully note some of the syntax

The location of the lockfile invocation ":".
Above the RULE="..." line there is no cloning "c" flag in ":0" since setting a variable is a non-delivering action. The next line will be reached anyway. In fact, it would be a mistake to use a "c" there. It would lead to complications.
In setting the RULE variable ensure that there is a space after the "{" and prior to the "}". Otherwise the email will go to a folder with rather a long and complicated name.

Procmail recipes nesting can get fairly complicated. Consider the following example involving setting the RULE variable and procmail else if conditions ":0E".

:0
* ^TO_my-mailing-list
{
  :0
  * ^From:.*@([-a-z0-9_]+\.)*myhost\.mydom
    {
      :0
      { RULE="To my-mailing-list, probably legitimate" }
      :0:
      ${DEFAULT}
    }
  :0E
    {
      :0
      { RULE="To my-mailing-list, probably spam" }
      :0:
      Spam.mail
    }
}

Feedback: There is a method for logging which action took place without using VERBOSE=yes, which creates large log files. This method uses the LOG variable:

LOGFILE=$HOME/.MailFilter_log
SHELL=/bin/sh

:0 B
* .*spam
{
  LOG="TRAPPED SPAM - "
  :0
    /dev/null
}

#- Accept All other mail -#
:0
{
  LOG="ACCEPTED MAIL - "
  :0
  $ORGMAIL
}

The output looks something like this:

TRAPPED SPAM - From spammer@spam.com Thu May 16 03:52:42 2002
 Subject: Make Money Fast
  Folder: /dev/null 43140
ACCEPTED MAIL - From goodguy@save.com Thu May 16 03:54:08 2002
 Subject: Legitimate email message
  Folder: /var/spool/mail/username 4683

My comment: If you look at the example for testing for individual procmail recipes you'll see that for logging one sets (usually for troubleshooting)

#Troubleshooting:
VERBOSE=yes
LOGABSTRACT=all

For the method in the feedback above, leave those variables out or set

VERBOSE=no

However, do not set

LOGABSTRACT=no

because then you'll miss all but the actual LOG variable identification. Instead, just leave the line out.

How can I detect Korean, Cyrillic, or Chinese to avoid such frequent spam?

There is a very good page by Walter Dnes explaining the method. So for once I'll direct you elsewhere. The method relies on ad-hoc approximation. In brief, scoring is used to detect if more than 5 per cent of the characters in the body of the message are high-bit characters typical of the said language codes. If you have gone through the items in my procmail FAQ, it should be easy to understand the inventive method given on Walter's page. Also see the exercise at the end of the current FAQ involving detecting Korean.

How can I change the subject line and include part of the message body into it?

I have a cellular phone. I want to save the incoming email normally and also to send a modified copy to my second account (a Short Message Service). The forwarded copy should include the original subject AND five lines of the original message text. The original body should not be included. Is this possible with procmail?

Well yes, it is. It takes some figuring out and draws on many of the principles presented in the other items in my proctips collection. It also needs a few tricks with Bourne shell programming. Perhaps most importantly, this item demonstrates how to put the body of the message into a variable.

# Customize these paths if they do not match yours
SHELL=/usr/bin/sh
SENDMAIL=/usr/lib/sendmail

:0
* ^Subject:.*Timo testing
{
  # Put the email intact in the default folder
  :0c:
  ${DEFAULT}
  # The "c" flag above tells the recipe to continue
  # Now we prepare a different version of the message
  :0
  {
    # Get the subject into a variable
    # Expand the possible tabs into blanks
    # Discard any leading and trailing blanks
    # On some systems -xSubject: has to be -x"Subject: "
    SUBJ_=`formail -xSubject: \
      | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`
    # Get the body of the message into a variable
    # Accept only the first five lines
    # Discard newlines, i.e. put everything on one line
    BODY_=`sed -e '1,/^$/ d' | head -5 | tr -d '\n'`
  }
  # Prepare and send a message with no body
  # -X "" extracts just the header (discards the body)
  # Plug in the new subject
  # Content fields might cause problems if not discarded
  # Change the To: address
  :0:proc.lock
  | formail -X "" \
      -I"Subject: ${SUBJ_} ${BODY_}" \
      -i"Content-Type:" \
      -i"Content-Length:" \
      -I"To: your@second.address" \
  | ${SENDMAIL} -t
}

The line

BODY_=`sed -e '1,/^$/ d' | head -5 | tr -d '\n'`

retrieves the first five lines from the body of the text. It would be more useful to retrieve a specified number of characters from it. Say we wish to retrieve 160 characters. This is how to do that.

BODY_=`sed -e '1,/^$/ d' | tr -d '\n' | dd bs=1 count=160`

Solving the alternative of having a maximum of 160 characters in the concatenated SUBJ_ and BODY_ is left as an exercise to the reader.

There also is another, more important improvement that can be made in the action above. Replace tr -d '\n' with tr '\n' ' ' so that when the lines are concatenated a space is put in between them.
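
One possible take on the exercise of limiting the subject and the body together to 160 characters is sketched below. The function name sms_text is made up. The header is dropped with the same sed expression as above, the lines are joined with spaces, and cut does the truncation. On some systems cut -c counts bytes rather than characters, so the limit is only exact for single-byte text.

```shell
#!/bin/sh
# sms_text SUBJECT [MAX]: read a whole message on stdin and print SUBJECT
# followed by the start of the body on one line, cut to at most MAX
# characters in total (default 160).
sms_text() {
  max=${2:-160}
  body=$(sed -e '1,/^$/d' \
    | { tr '\n' ' '; echo; } \
    | sed -e 's/  */ /g' -e 's/^ //' -e 's/ $//')
  printf '%s %s\n' "$1" "$body" | cut -c1-"$max"
}

printf 'Subject: Hi\n\nline one\nline two\n' | sms_text 'Hi' 11
```

In a recipe the result would take the place of "${SUBJ_} ${BODY_}" on the new subject line.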

How can I remove the signature from the incoming email?

The recipe below assumes that the signature properly adheres to the Internet "-- " convention to denote where the signature starts.

:0
* ^Subject: Whatever
{
  :0 fbw
  | sed -e '/^-- /,$ d'
  :0:
  ${DEFAULT}
}

Let's look at what we've got:

The b flag means feed the body to the pipe.
The f flag means use the pipe as a filter.
The w flag means wait for the filter or program to finish.
This is not a sed FAQ, but in brief:
- In the sed script the /^-- / matches the first occurrence of the signature designator string "-- ".
- In sed, a lone $ stands for the last line.
- The d denotes deleting the "pattern space" found.

In the above the sed script will delete everything in the message body starting from the "-- " until the end of the incoming message. Substituting

sed -e '/^-- /,$ d'

with

sed -e '/^-- /,/^$/ d'

will instead delete everything starting from the "-- " until the first encountered empty line. Thus if there is e.g. an attachment after the signature, the attachment will not be thrown away.
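
The two sed variants are easy to compare on a sample body in a shell script. The function names below are made up for the illustration.

```shell
#!/bin/sh
# Both functions read a message body on stdin and write the result to stdout.

# Delete from the first "-- " line to the end of the body.
strip_sig_to_end() {
  sed -e '/^-- /,$d'
}

# Delete from the "-- " line up to and including the next empty line,
# so that text after the signature block survives.
strip_sig_block() {
  sed -e '/^-- /,/^$/d'
}

printf 'Hello\n-- \nTimo\n\nLater part\n' | strip_sig_block
```

With the sample body the second variant keeps "Hello" and "Later part", while the first one keeps only "Hello".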

What unix manuals relating to procmail should I get?

Unix manuals are not very helpful as starting points, but after you have got the rudiments under your belt, you may wish to browse the following manuals for additional information. Below is a simple "manuals" Bourne shell script. It prepares plain text format files of some of the essential Unix man manuals for a procmail user, especially suited for offline reading.

Note that the "^H" is not a "^" and an "H", but a CTRL-H, i.e. ASCII 8 (the backspace character). To make the "manuals" file executable type "chmod u+x manuals".

#!/bin/sh
TODIR=${HOME}/myman
echo ${TODIR}
man egrep      | sed -e 's/_^H//g' > ${TODIR}/egrep.man
man formail    | sed -e 's/_^H//g' > ${TODIR}/formail.man
man procmail   | sed -e 's/_^H//g' > ${TODIR}/procmail.man
man procmailex | sed -e 's/_^H//g' > ${TODIR}/procmaex.man
man procmailrc | sed -e 's/_^H//g' > ${TODIR}/procmarc.man
man regexp     | sed -e 's/_^H//g' > ${TODIR}/regexp.man
man sendmail   | sed -e 's/_^H//g' > ${TODIR}/sendmail.man
ls -lF ${TODIR}

Many of the recipes in this FAQ utilize sed and/or awk. Some useful links (note, however, as is common with links, I can't guarantee that they still are current):

The sed FAQ Eric Pement
SED - the stream editor Sven Guckes
Manipulating Strings with sed
SED Patrick Hartigan
comp.lang.awk FAQ Russell Schulz
Awk -- A Pattern Scanning and Processing Language Aho, Kernighan, Weinberger
Awk programming links The University of Edinburgh
How to Use AWK Patrick Hartigan

Is it possible to use procmail to call the vacation program?

Yes, it is, but it is not quite as straight-forward as one would expect.

Since this is a procmail advice collection, not a vacation program one, I'll assume that you are reasonably familiar with the vacation program. If not, start with "man vacation". You have to use procmail to customize the ~/.vacation.msg file because when invoked via procmail, the vacation $SUBJECT variable is not necessarily set.

Usually, when vacation is used, it is first called interactively to create the ~/.vacation.msg file and to replace the ~/.forward file. If you are going to use the procmail solution it is very important not to do this. In particular, the ~/.forward file must not be touched in any way. The reason is that in this solution it is used to invoke procmail, not vacation. (The vacation program is, of course, called by procmail now.)

# Set a number of variables high up in your ~/.procmailrc
#
VACATION=/usr/bin/vacation
ONVACAT=yes
VACFREQ=5d
VACMSG=${HOME}/.vacation.msg
MYNAME_="MyFirstName MyLastName"
MYEMAIL_=myid@myhost.mydom

# Get the subject discarding any leading and trailing blanks
# Note: On some systems -xSubject: has to be -x"Subject: "
#
SUBJ_=`formail -xSubject: \
    | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

# Prepare the vacation message's base
# This is done only once in ~/.procmailrc
#
:0 cwi
* ONVACAT ?? ^^yes^^
| echo "From: ${MYEMAIL_}" > ${VACMSG} ;\
  echo "Subject: ${MYNAME_}, away from my mail" >> ${VACMSG} ;\
  echo "X-Loop: myid@myhost.mydom" >> ${VACMSG} ;\
  echo "" >> ${VACMSG} ;\
  echo "Thank you for your email about:" >> ${VACMSG} ;\
  echo "\"$SUBJ_\"" >> ${VACMSG} ;\
  echo "" >> ${VACMSG} ;\
  echo "Your email will be seen to when I return." >> ${VACMSG} ;\
  echo "" >> ${VACMSG} ;\
  cat ${HOME}/.signature >> ${VACMSG}

# Here we go invoking vacation and also saving the email
# You might have several different recipes like this
#
:0
* ^Subject:.*Whatever
{
  :0
  { RULE="Testing" }
  :0 cwi
  * ONVACAT ?? ^^yes^^
  * ! ^X-Loop:.*myid@myhost\.mydom
  | ${VACATION} -t${VACFREQ} myid
  :0:
  WhateverFolder
}

Feedback: Maybe I [Collin Park] can add one more comment: I think you need a global LOCKFILE to cover the area from when you generate the vacation message to the place where you invoke $VACATION.

Otherwise, message #N may generate .vacation.msg, then message #N+1 overwrites it before #N invokes $VACATION.

Could you please solve for me this procmail problem of mine?

It is nice that you have found my proctips so useful that you ask for my personal advice. Nevertheless, if you ask me by email for individualized procmail consultation, my response has to be the same as when I am asked for any programming advice. Briefly, the response is that I do not do email consultation. If you have a procmail related problem, please post your question to a Usenet newsgroup like comp.mail.misc. The added advantage of posting is that in a newsgroup both the question and the potential answers will have a wider forum. That way everyone will benefit.

On rare occasions I have also been asked to email my own personal ~/.procmailrc or my own spamfoiling scripts. The answer is a definite no. There are two main reasons. First, that material is private. Second, I have neither the willingness nor the time to send out material to users on individual requests. If and when I want to share my material I make it available for the users to themselves retrieve it via WWW or FTP.

I liked this material. Do you have anything else on programming, etc?

Yes, notably this:

Programming
	Turbo Pascal programming material
	MS-DOS batch programming material
	Unix Bourne shell scripts programming material
Etc
	More links to Timo's FAQ materials

Some exercises

Let's see if we can put to work the methods presented in this FAQ to solve some tasks, part of them having come up on the Usenet news.

Ex.1) Keep a copy of incoming email, and at the same time, get only the first five lines from the message body and forward it to another account.

# Discard potential email loops
:0
* ^X-Loop: myid@myhost\.mydom
/dev/null

:0
* Any rule(s) you might wish to have
{
  # Keep a copy, but don't stop yet ( the c )
  :0c:
  ${DEFAULT}

  # Comment with "Old-" the Content-Length field from the header
  # Ensure that a whitespace exists between field name and content
  :0 fwh
  * ^Content-Length:
  | formail -z -i"Content-Length:"

  # Add the loop avoidance
  # ( f for piping; w for waiting for completion; h for headers )
  :0 fwh
  | formail -A"X-Loop: myid@myhost.mydom"

  # Truncate the body ( the b ) to five lines
  :0 fwb
  | head -5

  # Forward to the other account
  :0
  ! myid2@myhost.mydom
}

It is important to handle the content-length header-field when the length of the email is altered. This is done to ensure that the receiving email program will not break the forwarded message when it is read. The -i switch is used to retain the information about the original message length to the attention of the receiver.

Ex.2) Forward the first 10 lines of the message body to the user's second account while preserving all the original message headers, i.e. at the receiving side, the user wants to see all the message travel history and only the first 10 lines of the message body.

This is a more complicated version of the first exercise. The transformed task is not trivial, since when you forward, the original message headers will be replaced by your forwarding headers. Therefore, you'll have to see to preserving also the original headers. Below is how I would solve the problem based on several items in this FAQ.

# A trick to extract the subject into a variable
# Note: On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -c -xSubject: | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

# The actual recipe to solve the exercise starts here
:0
* Whatever condition(s) you wish to select the messages for forwarding
# Avoid email loops
* ! ^X-Loop: myid@myhost\.mydom
{
  :0c: #If you want to, preserve a full copy of the email, else omit
  ${DEFAULT}
  :0fwh #Preserve the information about the original content length
  * ^Content-Length:
  | formail -z -i"Content-Length:"
  :0fwb #Truncate the body of the message to ten lines
  | head -10
  :0fwh #Insert a blank line at the beginning of the body for clarity
  | cat - ; echo ""
  :0fwh #Store the original headers, quoting them to avoid problems
  | sed -e 's/^/\> /'
  :0fwh #Insert some of your own information before forwarding
  | formail -A"X-Loop: myid@myhost.mydom" \
    -A"X-Info: Forwarded body truncated to 10 lines" \
    -i"Subject: $SUBJ_ (fwd)"
  #Finally, forward the adjusted email
  :0
  !my2ndId@myhost.mydom
}

# Discard potential email loops
:0
* ^X-Loop: myid@myhost\.mydom
/dev/null

Feedback: The recipe with head probably needs an "i" on the flags line, as:

:0 fwbi
| head -10

since write errors on the pipe are likely for messages larger than a certain size. (I've seen numbers like 4096 and 10240... it apparently varies with the system.)

Ex.3) Match a potential [TS999] identification in the Subject header, such as "[TS001] Timo testing". If found, insert a "Subject id: [TS999]" as the first line in the body of the message. (The rest of the original subject line must not reappear in the id.)

:0
* ^Subject:.*\/\[TS[0-9]+\]
{
  :0 fhw
  | cat - ; \
  echo "Subject id: ${MATCH}"
  :0:
  ${DEFAULT}
}

But what if you do want to include the rest of the original subject line? In that case use

* ^Subject:.*\/\[TS[0-9]+\].*

Ex.4) Multi-part messages (which typically include attachments) have in their headers a field like the two examples below:

Content-Type: multipart/mixed; boundary=ELM965173874-25050-0_
Content-Type: multipart/mixed; boundary="------------BA45271FBDAA479CECA7E20A"

Write a recipe that inserts into a variable (call it BOUND) the boundary string. Note that the potential quotes (") are not to be part of that string. Also note that the header might be divided on multiple lines as in

Content-Type: multipart/mixed;
boundary=ELM965173874-25050-0_

There are alternative solutions, which are not necessarily quite equivalent. The first one is to put high up in your ~/.procmailrc recipe file the line(s)

BOUND1=`formail -z -x"Content-Type:" \
| awk -F= '{ print $2 }' \
| sed -e 's/\"//g' | tr -d '\n'`

A second one is:

:0h
* ^Content-Type:
{ BOUND2=`egrep -i 'boundary=' \
| awk -F= '{ print $2 }' | sed -e 's/\"//g'` }

This was not in the exercise, but you can then have recipes like

:0:
* ! BOUND2 ?? ^^^^
WhateverFolder
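
A third variant, which can be tested without formail, is sketched below. It assumes that a continuation line of a divided header starts with a space or a tab, as the mail standards require. The function name boundary_of is made up.

```shell
#!/bin/sh
# boundary_of: read a message on stdin and print the boundary string of its
# Content-Type header, without the surrounding quotes.
boundary_of() {
  # 1. keep the header only, 2. join continuation lines to their field,
  # 3. pick the Content-Type field, 4. cut out the boundary value.
  sed -e '/^$/q' \
  | awk '/^[ \t]/ { line = line " " $0; next }
         { if (line != "") print line; line = $0 }
         END { if (line != "") print line }' \
  | grep -i '^Content-Type:' \
  | sed -n -e 's/.*[Bb][Oo][Uu][Nn][Dd][Aa][Rr][Yy]=//p' \
  | sed -e 's/^"\([^"]*\)".*/\1/' -e 's/;.*//'
}

printf 'Content-Type: multipart/mixed;\n boundary="----ABC123"\n\nbody\n' \
  | boundary_of
```

In a recipe file the pipeline would sit inside backquotes in a variable assignment, which needs the usual extra care with quoting.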

Ex.5) Identify if the arriving email is in Korean. If so, return the message to the sender and his/her postmaster. Ignore a potential Reply-To: field in the header. Avoid email loops. Avoid forgeries which appear to come from your own host. Avoid forgeries which lack a host name. Be careful not to take Finnish/Swedish or French as Korean.

This is quite a difficult exercise with many details involved.

# Get the sender's address, ignore Reply-To:
FROM_=`formail -c -I"Reply-To:" -rt -xTo: \
  | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

# Get the sender's host
FHOST_=`echo "${FROM_}" | awk -F@ '{ print $2 }'`

# Your path to sendmail
SENDMAIL="/usr/lib/sendmail"

# Reject probable Korean email using character scoring
:0
* ! ^X-Loop:.*myid@myhost\.mydom
* ! $ ? echo ${FHOST_} | fgrep -is 'myhost.mydom'
* $ ? echo ${FHOST_} | fgrep -is '.'
{
  :0BD
  *  -1^1 .
  *   2^1 =[0-9A-F][0-9A-F]
  *  20^1 [¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿]
  *  20^1 [ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß]
  *  20^1 [àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]
  *  20^1 =[89A-F][0-9A-F]
  * -20^1 [åÅäÄöÖàáâçèéêë]
  * -20^1 =(E5|C5|E4|C4|F6|D6|E0|E1|E2|E7|E8|E9|EA|EB)
  {
    :0
    { RULE="Probable Korean email" }
    #
    :0c:${HOME}/procmail.lock
    | expand | sed -e 's/[ ]*$//g' \
      | sed -e 's/^/ /' > ${HOME}/procmail.reject.korean
    #
    :0:${HOME}/procmail.lock
    | (formail -r -I"Subject: Autorejected email" \
      -I"To: ${FROM_}" \
      -I"Cc: postmaster@${FHOST_}" \
      -A"X-Loop: myid@myhost.mydom" ; \
      echo "--- begin rejected probable Korean email ---" ; \
      echo "" ; \
      cat ${HOME}/procmail.reject.korean ; \
      echo "--- end of rejected probable Korean email ---" ; \
      rm -f ${HOME}/procmail.reject.korean) \
        | ${SENDMAIL} -t
  }
}

Ex.6) If the subject of the email contains the identifier [INFO], in capitals, put the body of the incoming email into a temporary file. Ensure that the name of the temporary file is unique. Insert the full subject line at the top of the temporary file. (Why, and what then is beyond this exercise.)

#Get the subject discarding any leading and trailing blanks
#Note: On some systems -xSubject: has to be -x"Subject: "
SUBJ_=`formail -xSubject: \
| expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

# Assign a temporary file name
TMPFILE_=proctemp.$$

:0D
* ^Subject.*\[INFO\]
{
  :0 fwbi
    | echo "Subject: ${SUBJ_}" > ${TMPFILE_}; \
    echo >> ${TMPFILE_}; \
    cat >> ${TMPFILE_}
}

Ex.7) If the email comes from a certain sender, check if the time-zone information is present in the Date header. If not, add it assuming +3 hours.

#Get the date discarding any leading and trailing blanks
DATE_=`formail -xDate: \
  | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

:0
* ^From:.*TheCertainSender
* ! ^Date:.*(EET|DST|GMT)
{
  :0 fwhi
  | formail -i"Date: ${DATE_} +0300 (EET DST)"
  :0:
  ${DEFAULT}
}

Ex.8) The simple spamfoiling recipe below won't work. Correct it.

:0:
* !^TO$USER@xxxxxxx.xxx
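ProbableSpam.mail

One solution sets USER explicitly and adds the leading $ flag, which makes procmail expand the variable in the condition: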

:0
{
  :0
  { USER=`whoami` }
  :0:
  * $ ! ^TO_${USER}@([-a-z0-9_]+\.)*xxxxxxx\.xxx
  ProbableSpam.mail
}

The ([-a-z0-9_]+\.)* is optional.

Another solution:

:0:
* $ ! ^TO_${LOGNAME}@([-a-z0-9_]+\.)*xxxxxxx\.xxx
ProbableSpam.mail

Ex.9) Insert at the beginning of the subject the date/time of receiving the incoming message in the YYYYMMDD HHMMSS format.

:0
* Whatever rules
{
  :0
  { SUBJ_=`formail -c -xSubject: \
    | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'` }
  :0
  { DATETIME_=`date "+%Y%m%d %H%M%S"` }
  :0 fhwi
  | formail -I"Subject: ${DATETIME_} ${SUBJ_}"
  :0:
  ${DEFAULT}
}
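
The date format can be checked on its own before it is put into the recipe. A small sketch with a made-up function name, stamp_subject:

```shell
#!/bin/sh
# stamp_subject SUBJECT: print SUBJECT prefixed with the current local time
# as YYYYMMDD HHMMSS. %H always gives a two-digit, zero-padded hour.
stamp_subject() {
  printf '%s %s\n' "$(date '+%Y%m%d %H%M%S')" "$1"
}

stamp_subject 'Timo testing'
```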

Ex.10) This partly is based on an actual incident. Consider the following recipe with three small, but crucial syntax errors, and one omission. Find them.

:0
* ^From:.*(\
(abuse(-news)?|acct_closed)@
(pacificnet\.net|\
mindspring\.net|\
InfoAve\.net|\
netcom\.com\|
yahoo\.com|\
alladvantage\.com|\
hotmail\.com))
* ^TO_(myid|myFirstName\.mySecondName)@([-a-z0-9_]+\.)*myhost\.mydom
{
  :0
  {RULE="Abuse reception notes"}
  :0
  ReceivedNotes
}

The answer is a bit further down
  :
  :
  :
  :
  :
  :
  :
  :
  :
  :
  :
  :
  :
  :

:0
* ^From:.*(\
(abuse(-news)?|acct_closed)@\
(pacificnet\.net|\
mindspring\.net|\
InfoAve\.net|\
netcom\.com|\
yahoo\.com|\
alladvantage\.com|\
hotmail\.com))
* ^TO_(myid|myFirstName\.mySecondName)@([-a-z0-9_]+\.)*myhost\.mydom
{
  :0
  { RULE="Abuse reception notes" }
  :0:
  ReceivedNotes
}

Ex.11) Write a recipe to match the subject line below. The (RECENT) may or may not be there, and the numbers will change from posting to posting.
Subject: Re: [SpamCop:(RECENT)38.204.225.29,id:16135684] Make lotsof $$$

:0:
* ^Subject: Re: \[SpamCop:(\(RECENT\))?[0-9\.]+,id:[0-9]+\]
WhateverFolder
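
A condition like this is convenient to try out with grep -E first. The sketch assumes that grep -E behaves like procmail's matcher for this pattern, and the function name is_spamcop_reply is made up. The literal parentheses around RECENT are written as \( and \), and the whole group is made optional with ?.

```shell
#!/bin/sh
# is_spamcop_reply LINE: succeed if LINE is a SpamCop reply subject with an
# optional "(RECENT)" marker, a dotted number and an id.
is_spamcop_reply() {
  printf '%s\n' "$1" \
  | grep -E -q -- '^Subject: Re: \[SpamCop:(\(RECENT\))?[0-9.]+,id:[0-9]+\]'
}

if is_spamcop_reply 'Subject: Re: [SpamCop:(RECENT)38.204.225.29,id:16135684] Make lotsof $$$'
then
  echo "SpamCop reply"
fi
```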

Ex.12) It is fairly common that spam email has the same sender and recipient in the From: and To: fields. Device a recipe that detects such postings.

This is not quite as simple as it first sounds, since the contents of the two fields may not be quite identical even when the actual addresses are the same. Thus, as one possible solution, I would use regular expression matching both ways, as below. By default, variable comparisons are regular expression matches, not strict equalities. Also note the need to avoid email loops and to avoid falsely targeting email which one may have sent to oneself.

WHOFROM=`formail -xFrom: \
  | expand \
  | sed -e 's/  */ /g' \
  | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

WHOTO=`formail -xTo: \
  | expand \
  | sed -e 's/  */ /g' \
  | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

:0:
* -100^0 ^X-Loop: myid@myhost\.mydom
* -100^0 ^TO_(myid|myFirst\.mySecond)@([-a-z0-9_]+\.)*myhost\.mydom
* -100^0 ^From:.*LegitimateMailingList
* 1^0 $ WHOFROM ?? ${WHOTO}
* 1^0 $ WHOTO ?? ${WHOFROM}
ProbableSpam.mail
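
The reason for matching both ways can be seen by trying the normalisation and the two comparisons in a shell. This is a sketch only: the header values are made up, and grep stands in for procmail's own regular expression matching, which is similar but not identical.

```shell
#!/bin/sh
# Tabs to spaces, squeeze runs of spaces, trim both ends (as in the recipe)
normalise() {
  expand | sed -e 's/  */ /g' -e 's/^[ ]*//' -e 's/[ ]*$//'
}

# Made-up header values standing in for the output of formail
WHOFROM=$(printf '  "Some One"\t<someone@example.com>  \n' | normalise)
WHOTO=$(printf ' someone@example.com \n' | normalise)

# Succeeds if the second argument, used as a regular expression,
# matches somewhere in the first argument
contains() {
  printf '%s\n' "$1" | grep -i -q -e "$2"
}

score=0
if contains "$WHOFROM" "$WHOTO"; then score=$((score + 1)); fi
if contains "$WHOTO" "$WHOFROM"; then score=$((score + 1)); fi
echo "From: [$WHOFROM]  To: [$WHOTO]  score: $score"
```

Here only one of the two comparisons succeeds, because the From: value carries a name in addition to the address. Testing in both directions catches the case whichever field is the longer one.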

Ex.13) Write a (spam avoidance) recipe to detect email with more than seven recipients in the "To:" header field. Assume for simplicity that each address will have exactly one "@" character in it.

:0
* ^Subject:.*The information you requested
{
  :0
  {
    WHOTO=`formail -z -xTo:`
    COUNT=`echo ${WHOTO} | sed -e 's/[^@]//g' | wc -c`
    COUNT1=`expr ${COUNT} - 1`
    ISGT=`expr ${COUNT1} \> 7`
  }
  :0:
  * ISGT ?? ^^1^^
  ProbableSpam.mail
}
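
The counting step can be checked separately in a shell. A minimal sketch with made-up recipient lists; it uses shell arithmetic in place of expr, but the idea of deleting everything except the "@" characters and counting what is left is the same.

```shell
#!/bin/sh
# Count the "@" characters in a recipient list: delete everything else,
# count the bytes, and subtract one for the trailing newline
count_recipients() {
  n=$(printf '%s\n' "$1" | sed -e 's/[^@]//g' | wc -c)
  echo $(( $n - 1 ))
}

# Made-up recipient lists
few='a@example.com, b@example.com, c@example.com'
many='a@x.example, b@x.example, c@x.example, d@x.example, e@x.example, f@x.example, g@x.example, h@x.example'

for list in "$few" "$many"; do
  count=$(count_recipients "$list")
  if [ "$count" -gt 7 ]; then
    echo "$count recipients: probable spam"
  else
    echo "$count recipients: fine"
  fi
done
```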

Ex.14) Make procmail forward email that arrives between 9am and 5pm to a predefined daytime email address.

:0
# Omit the condition line below if this is for all email
* ^Subject:.*Whatever
{
  :0
  {
    TIME=`date +%H%M`
    ISGT=`expr ${TIME} \> 0900`
    ISLT=`expr ${TIME} \< 1700`
  }
  :0
  * ISGT ?? ^^1^^
  * ISLT ?? ^^1^^
  ! daytime_forward_address
}
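
The two expr comparisons can be tried with fixed times instead of the current time. A sketch only; the function name in_daytime is my own and the sample times are arbitrary.

```shell
#!/bin/sh
# Succeeds if the HHMM argument lies strictly between 0900 and 1700.
# expr prints 1 for true and 0 for false; since it also exits non-zero
# when it prints 0, only the printed value is used.
in_daytime() {
  isgt=$(expr "$1" \> 0900 || true)
  islt=$(expr "$1" \< 1700 || true)
  [ "$isgt" = 1 ] && [ "$islt" = 1 ]
}

for t in 0830 0900 1200 1659 1700; do
  if in_daytime "$t"; then
    echo "$t: forward"
  else
    echo "$t: do not forward"
  fi
done
```

Since both comparisons are strict, email arriving at exactly 0900 or 1700 falls outside the window.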

Ex.15) Write a Procmail recipe which detects if there is a Word document attached to the incoming email.

# Email with a Word document attached
:0
* ^Content-Type: multipart/
{
  :0 B
  * ^Content-.*attachment.*name=.*\.(doc|rtf)
  {
    :0
    { RULE="Email with a Word document attached" }
    :0:
    WordAttachmentEmail
  }
}
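
The body condition can be tried against a few sample MIME part header lines with grep -E. This is an approximation of procmail's matching, and the sample lines are made up.

```shell
#!/bin/sh
# The body condition of the recipe as an extended regular expression
pattern='^Content-.*attachment.*name=.*\.(doc|rtf)'

# Succeeds if the given line matches the pattern
line_matches() {
  printf '%s\n' "$1" | grep -E -i -q -e "$pattern"
}

for line in \
  'Content-Disposition: attachment; filename="report.doc"' \
  'Content-Disposition: attachment; filename="photo.jpg"' \
  'Content-Type: application/msword; name="report.doc"'
do
  if line_matches "$line"; then
    echo "match:    $line"
  else
    echo "no match: $line"
  fi
done
```

Only the first sample matches. The third shows that a part which gives the file name only on its Content-Type line, without the word attachment, would slip past this condition.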

Acknowledgements for useful advice and/or feedback:

Aughey, John
Bump, Jorey
Davey, David
Dnes, Walter
Eriksson, Era
Guenther, Philip
Hebeisen, Christoph
Hirvonen, Hannu
Melish, Jacob
Menezes, Evandro
Novak, Curtis
Park, Collin
van Tol, Ruud
Van Steenkist, Vernon

Any errors and inadequacies are, however, solely my own responsibility.

A legal note: The author shall not be liable to the user, the reply target or any third party for any direct, indirect or consequential loss or damage arising from using, abusing, or a failure to be able to use, the information in this message/file howsoever caused. No warranty is given that all the information contained is correct, or that it is current.

