07 Februar 2013

[Howto] Zarafa Mailextraktion

Eine Zarafa-Umgebung benötigt eine MySQL-Datenbank zur Verwaltung von Nutzern, Mailstores, Mailfoldern und sonstigen Email-Objekten. Für das Attachment-Storage gibt es dabei zwei Betriebsmodi:

  • Im Database-Modus sind sämtliche Informationen („Objekte“) in der MySQL-Datenbank enthalten, vor allem die Emails, Email-Header und -Anhänge.
  • Im Files-Modus hingegen werden die potenziell großen Objekte, nämlich Email-Anhänge und volle Emails – die ihre Anhänge als Multiparts einbetten – im Dateisystem abgelegt.

Für uns sind folgende Daten und deren Ablageort interessant:

  • Der reine Email-Header ist in der DB-Tabelle properties zu finden.
  • Daten bezüglich der Userkonten sind verteilt auf die DB-Tabellen users und stores.
  • Die vollständigen Emails sind gespeichert in
    • der lob Tabelle, falls Zarafa im Database-Modus arbeitet, beziehungsweise
    • im Dateisystem typischerweise unter /var/lib/zarafa/…, falls der Files-Modus genutzt wird.
  • Beziehungen zwischen zusammengehörigen Daten mehrerer Tabellen werden unter anderem über Instance_IDs (SQL Schlüsselattribut) hergestellt.

Im Files-Modus hat der volle Pfad der einzelnen, per gzip komprimierten vollständigen Emails die Form {attachment_path}/{INT}/{INT}/{Instance_ID}.gz, wobei der attachment_path in der Zarafa-Standardkonfiguration /var/lib/zarafa lautet. Deren Subverzeichnisse sind von der Instance_ID abgeleitet, ein konkreter Beispielpfad lautet /var/lib/zarafa/4/6/43664.gz. Das folgende Skript liest einen String von der Standardeingabe und sucht passende Email-Header in der properties-Tabelle. Zu den gefundenen Treffern werden die vollen Emails aus der lob-Tabelle extrahiert, Zarafa muss also mit attachment_storage=database betrieben werden – die Suche nach den zugehörigen gz-Dateien im Files-Modus ist hier nicht implementiert. Es sei angemerkt, dass die SQL-Statements je nach Datenbankgröße eine hohe DB-Last erzeugen können. Getestet wurde Zarafa in Version 7.0.9. Ein Beispielaufruf wäre:

[root@zarafa ~]# cat needle.txt
Subject: test Mon, 04 Feb 2013 15:05:12 +0100
[root@zarafa ~]# mkdir hackstack; ./zarafa-extract.pl zarafauser haystack < needle.txt
 Writing haystack/1359986776.0000.txt ...
[root@zarafa ~]# cat haystack/1359986776.0000.txt
Date: Mon, 04 Feb 2013 15:05:12 +0100
To: zarafauser
From: zarafa.testsystem.intern
Subject: test Mon, 04 Feb 2013 15:05:12 +0100
X-Mailer: swaks v20100211.0 jetmore.org/john/code/swaks/
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_MIME_BOUNDARY_000_14806"
 
------=_MIME_BOUNDARY_000_14806
Content-Type: text/plain
 
This is a test mailing
------=_MIME_BOUNDARY_000_14806
Content-Type: application/octet-stream; name="test.xml"
Content-Description: test.xml
Content-Disposition: attachment; filename="test.xml"
Content-Transfer-Encoding: BASE64
 
SGVsbG8gV29ybGQK

Zur Nutzung des Perl-Skripts werden die DBI und DBD::mysql Module benötigt, welche unter Debian beispielsweise durch libdbd-mysql-perl bereitgestellt werden.

#!/usr/bin/perl -w
 
# Copyright (c) 2013, Damian Lukowski 
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#    * Redistributions of source code must retain the above copyright
#      notice, this list of conditions and the following disclaimer.
#    * Redistributions in binary form must reproduce the above copyright
#      notice, this list of conditions and the following disclaimer in the
#      documentation and/or other materials provided with the distribution.
#    * Neither the name of the  nor the
#      names of its contributors may be used to endorse or promote products
#      derived from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL  BE LIABLE FOR ANY
# DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
use strict;
use DBI;
use DBD::mysql;
 
my $DEBUG = 0;
 
# Fill in the MySQL server properties
my ($dbhost,$dbport,$dbname,$dbuser,$dbpass) = ('', 3306, 'zarafa', '', '');
 
if (@ARGV < 2)
{
	print "Usage:	$0  \n
	$0 reads a mail header from STDIN and searches
	for zarafa-users's emails matching the given mail header.
	Each result is stored as a separate file within .
	It is assumed that zarafa is configured with
	attachment_storage = database!\n";
	exit;
}
 
my ($username, $outdir) = @ARGV;
 
chomp(my $needle = join '', );
$needle = "%$needle%";
# RFC822 demands CR LF as a line separator.
$needle =~ s/(?!\r)\n/\r\n/g;
 
my $dbh = DBI->connect("dbi:mysql:$dbname:$dbhost:$dbport", $dbuser, $dbpass)
	  or die $DBI::errstr;
 
my ($qry, $ref, $sth, $uid);
 
if ($username ne '.')
{
 
	$qry = "SELECT user_id FROM stores WHERE user_name = '$username'";
	$ref = $dbh->selectall_arrayref($qry);
 
	if (@$ref > 1)
	{
		print "Ambiguous username. Matching userids: ",
		join ', ', map {$_->[0]} @$ref;
		exit;
	};
 
	die "Username not found\n" unless @$ref;
	$uid = $$ref[0][0];
	print "uid of $username: $uid\n" if $DEBUG;
};
 
if ($DEBUG && $username ne '.')
{
	$qry = "SELECT COUNT(*) FROM properties";
	printf "%d properties\n", ${$dbh->selectall_arrayref($qry)}[0][0];
 
	$qry = "SELECT COUNT(*) FROM lob";
	printf "%d large objects\n", ${$dbh->selectall_arrayref($qry)}[0][0];
 
	$qry = "SELECT COUNT(*) FROM properties WHERE tag = 125";
	printf "%d properties with tag 125 (headers)\n",
		${$dbh->selectall_arrayref($qry)}[0][0];
 
	$qry = "SELECT COUNT(*) FROM properties WHERE val_string LIKE ?";
	$sth = $dbh->prepare($qry);
	$sth->execute($needle);
	printf "%d tag-125 properties matching needle\n",
		${$sth->fetchrow_arrayref}[0];
 
        $qry = "SELECT h.owner, s.user_name, p.hierarchyid FROM "
             . "hierarchy h RIGHT JOIN properties p on h.id = p.hierarchyid "
             . "LEFT JOIN stores s ON s.user_id=h.owner WHERE p.tag = 125 "
             . "AND p.val_string LIKE ?";
 
	$sth = $dbh->prepare($qry);
	$sth->execute($needle);
	while (my @row = $sth->fetchrow_array())
	{
		printf " uid=%s, username=%s, hierarchyid=%s\n", @row;
	}
 
	$qry = "SELECT count(*) FROM hierarchy h JOIN properties p on "
	     . "h.id = p.hierarchyid LEFT JOIN singleinstances s on "
	     . "s.hierarchyid = h.id WHERE p.tag = 125 AND h.owner = $uid";
	printf "%d tag-125 properties of %s\n",
		${$dbh->selectall_arrayref($qry)}[0][0], $username;
 
	$qry = "SELECT h.owner, p.hierarchyid, s.instanceid FROM hierarchy h "
	     . "JOIN properties p on h.id = p.hierarchyid LEFT JOIN "
	     . "singleinstances s on s.hierarchyid = h.id WHERE "
	     . "h.owner = $uid AND p.val_string LIKE ?";
	$sth = $dbh->prepare($qry);
	$sth->execute($needle);
	$ref = $sth->fetchall_arrayref();
	printf "%d tag-125 properties of %s matching needle\n",
		(scalar @$ref), $username;
 
	for my $row (@$ref)
	{
		printf " uid=%s, hierarchyid=%s, instanceid=%s\n", @$row;
	}
}
 
# A tag value of 125 corresponds to the mapitag definition as found in zarafa's
# sources under mapi4linux/include/mapitags.h:
# define PR_TRANSPORT_MESSAGE_HEADERS          PROP_TAG(PT_TSTRING,    0x007D)
# define PR_TRANSPORT_MESSAGE_HEADERS_W        PROP_TAG(PT_UNICODE,    0x007D)
# define PR_TRANSPORT_MESSAGE_HEADERS_A        PROP_TAG(PT_STRING8,    0x007D)
 
if ($username ne '.')
{
	$qry	= '(SELECT s.instanceid FROM properties p JOIN hierarchy h '
		. 'ON h.id = p.hierarchyid JOIN singleinstances s ON '
		. "s.hierarchyid = h.id WHERE h.owner = $uid AND p.tag=125 AND "
		. 'p.val_string LIKE ?)';	
} else
{
	$qry	= '(SELECT s.instanceid FROM properties p '
		. 'JOIN singleinstances s ON '
		. "s.hierarchyid = p.hierarchyid WHERE p.tag=125 AND "
		. 'p.val_string LIKE ?)';
}
 
$sth = $dbh->prepare($qry);
$sth->execute($needle);
 
my $iid = undef;
 
# Hint:
# For zarafa deployments with attachment_storage = files, the requested mails
# are not stored in the lob table, but within /var/lib/zarafa/. Let iid be
# 54321, then the mail is stored in /var/lib/zarafa/1/12/54321.gz
# The extraction logic is not covered here ...
 
my $written = 0;
 
while (my @row = $sth->fetchrow_array())
{
	$iid = $row[0];
	$qry = "SELECT val_binary FROM lob WHERE instanceid = $iid";
	$ref = $dbh->selectall_arrayref($qry);
 
	print "Looking for full mail of iid=$iid\n" if $DEBUG;
 
	for my $row (@$ref)
	{
		print " Found full mail with iid=$iid\n" if $DEBUG;
		my $tmp_file = sprintf '%s/%s.%04d.txt', $outdir,
				time, $written++;
		open my $fh, "> $tmp_file";
		print " Writing $tmp_file ...\n";
		print $fh $row->[0];
		close $fh;
	};
};
 
unless (defined $iid)
{
	print "No iid found, probably because the header didn't match.\n";
	print "If you are sure that the header should be found, try to use ",
	      "'.' (dot) as the username.\nThe DB query will be slower but ",
	      "might find the email.\n" unless $username eq '.';
} elsif ($written == 0)
{
	print	"Instanceid(s) found, but no files written? ", 
		"Probably attachment_storage != database\n";
}

 

 

Alle Artikel zum Thema Groupware stehen auch als eigene Kategorie mit eigenem Feed zur Verfügung. Falls Ihr tiefergehende Fragen zu Support und Services für Zarafa habt seid Ihr bei uns ebenfalls richtig.

Kategorien: HowTos
Tags: E-Mail Zarafa

DL

über den Autor

Damian Lukowski


Beitrag teilen: