Download attachments from a received message
Receive a webhook, list the attachments on the message, and stream the file content to your own storage.
When a message lands in an Emailgistics mailbox, your integration may need the actual attachment files — to OCR an invoice, archive a contract, route a claim form, etc. Emailgistics doesn’t store attachment content; it’s fetched on demand from the customer’s Exchange tenant when you ask for it.
This guide walks the full flow: webhook → list attachments → download.
Prerequisites
- An Emailgistics mailbox with the Webhooks feature enabled. If you don’t use webhooks, you can also get a messageId from the Received Message Detail report.
- An API key whose scopes include both messages:read and messages:content.
- Storage for the files: local disk, an object store (S3 / Azure Blob), or a downstream pipeline.
The flow
Receive the webhook
Configure a webhook in Admin (see Setting up a webhook) and a mailbox rule that triggers it on incoming messages. Each delivery includes the message’s identifier and a hasAttachments flag — short-circuit early when there’s nothing to download.
The id field in the webhook payload is the messageId you’ll use for the API calls below.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.post("/emailgistics-webhook")
def webhook():
    payload = request.get_json()
    if not payload["hasAttachments"]:
        return jsonify({"status": "success", "message": "No attachments"})
    message_id = payload["id"]
    # ...

List attachments
Call List attachments to retrieve metadata. The response is a list of objects with id, name, contentType, size, and attachmentType.
Filter on attachmentType — only fileAttachment can be downloaded. referenceAttachment and itemAttachment show up in the listing but the download endpoint returns 403 for them.
import requests

response = requests.get(
    f"https://c1.emailgistics.com/api/v1/messages/{message_id}/attachments",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
response.raise_for_status()
attachments = response.json()["attachments"]
downloadable = [a for a in attachments if a["attachmentType"] == "fileAttachment"]

From here you can filter further on contentType (PDFs only), size (skip giant files), or name patterns, whatever your integration cares about.
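The further filtering described above could be collected into one helper. This is a minimal sketch, not part of the Emailgistics API: select_attachments and its parameters are hypothetical names, and it assumes the size field is a byte count, which the listing description does not state.

```python
import fnmatch


def select_attachments(attachments, content_types=None, max_bytes=None, name_glob=None):
    """Return downloadable attachments that pass the optional filters.

    Assumption: "size" is a byte count. All filters are optional.
    """
    selected = []
    for a in attachments:
        # Only fileAttachment entries can be downloaded.
        if a.get("attachmentType") != "fileAttachment":
            continue
        if content_types is not None and a.get("contentType") not in content_types:
            continue
        if max_bytes is not None and a.get("size", 0) > max_bytes:
            continue
        # Case-insensitive glob match on the attachment name.
        if name_glob is not None and not fnmatch.fnmatchcase(
            a.get("name", "").lower(), name_glob.lower()
        ):
            continue
        selected.append(a)
    return selected
```

For example, select_attachments(attachments, content_types={"application/pdf"}, max_bytes=50 * 1024 * 1024, name_glob="invoice-*.pdf") would keep only PDFs of at most 50 MB whose names start with "invoice-".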
Download each file
For each attachment you want, call Download an attachment. The response is binary on success; it’s JSON only on failure.
Stream the response rather than buffering it into memory. There’s no documented size cap, so a 100 MB PDF can occur.
import os

def download_attachment(message_id: str, attachment: dict, dest_dir: str) -> str:
    url = (
        f"https://c1.emailgistics.com/api/v1/messages/{message_id}"
        f"/attachments/{attachment['id']}/download"
    )
    # stream=True defers the body so it can be written to disk in chunks;
    # the with block releases the connection when the download finishes.
    with requests.get(
        url,
        headers={"Authorization": f"Bearer {API_KEY}"},
        stream=True,
        timeout=300,
    ) as response:
        response.raise_for_status()
        # Prefer Content-Disposition's filename; fall back to the metadata name.
        disposition = response.headers.get("Content-Disposition", "")
        filename = parse_filename(disposition) or attachment["name"]
        # Keep only the final path component so the name cannot escape dest_dir.
        path = os.path.join(dest_dir, os.path.basename(filename))
        with open(path, "wb") as f:
            for chunk in response.iter_content(chunk_size=64 * 1024):
                f.write(chunk)
    return path

parse_filename (implemented in the full sketch below) pulls the filename parameter out of Content-Disposition. The exact format follows RFC 6266.
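If percent-encoded filename* values ever need handling, one hedged alternative to a hand-written regex is to let Python's standard-library email parser read the header. The sketch below is an assumption-laden illustration rather than Emailgistics guidance: filename_from_disposition is a hypothetical helper name, and it also strips directory components from the result.

```python
import os
from email.message import Message


def filename_from_disposition(disposition):
    """Return the filename carried by a Content-Disposition value, or None."""
    if not disposition:
        return None
    msg = Message()
    msg["Content-Disposition"] = disposition
    # get_filename() reads the filename parameter and decodes the
    # percent-encoded filename*=charset''value form when present.
    name = msg.get_filename()
    if not name:
        return None
    # Drop directory components so the name is safe to join onto a folder.
    name = os.path.basename(name.replace("\\", "/"))
    return name or None
```

It could be swapped in wherever parse_filename is called, with the same fallback to the attachment's metadata name when it returns None.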
Return a webhook response
Whether or not the download succeeded, your webhook handler must return a response envelope. Doing the downloads inline keeps the handler simple but ties the webhook delivery’s success to your storage; doing them asynchronously is more resilient but means the webhook returns before files are saved.
Either approach is fine. Pick based on whether you’d rather have the webhook delivery retry on download failure, or own the retry logic in your own queue.
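To make the asynchronous option concrete, here is a minimal sketch in which the handler only enqueues the message ID and a background thread does the work. Everything in it is illustrative: jobs, start_worker, and enqueue_download are hypothetical names, and the in-process queue stands in for whatever durable queue or task runner you would use in production (an in-memory queue loses pending jobs on restart).

```python
import queue
import threading

# In-process job queue; a stand-in for a durable queue or task runner.
jobs = queue.Queue()


def start_worker(process, max_attempts=3):
    """Start a daemon thread that calls process(message_id) for each queued job."""

    def run():
        while True:
            message_id = jobs.get()
            try:
                for _ in range(max_attempts):
                    try:
                        process(message_id)
                        break
                    except Exception:
                        # Retry; after the last attempt the job is dropped.
                        # A real worker would log and back off here.
                        continue
            finally:
                jobs.task_done()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def enqueue_download(message_id):
    """Called from the webhook handler: queue the work and return immediately."""
    jobs.put(message_id)
```

With this shape, the webhook handler would call enqueue_download(payload["id"]) and return its response envelope straight away, while process would wrap the list and download steps shown earlier. Retries then belong to the worker rather than to webhook redelivery.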
End-to-end Python sketch
A complete handler that downloads every PDF attachment to local disk:
import hmac
import os
import re

import requests
from flask import Flask, jsonify, request

API_KEY = os.environ["EMAILGISTICS_API_KEY"]
WEBHOOK_SECRET = os.environ["EMAILGISTICS_WEBHOOK_SECRET"]
BASE_URL = "https://c1.emailgistics.com/api/v1"
DEST_DIR = "/var/lib/inbox-attachments"

app = Flask(__name__)


def parse_filename(disposition: str) -> str | None:
    # Cheap RFC 6266 — good enough for ASCII filenames, which is what
    # Emailgistics emits today. Handles `filename="x.pdf"` and `filename=x.pdf`.
    match = re.search(r'filename\*?=(?:UTF-8\'\'|")?([^";]+)', disposition)
    return match.group(1) if match else None


@app.post("/emailgistics-webhook")
def webhook():
    # Reject deliveries that do not carry the shared webhook secret.
    auth = request.headers.get("Authorization", "")
    if not auth.startswith("Bearer ") or not hmac.compare_digest(auth[7:], WEBHOOK_SECRET):
        return ("", 401)

    payload = request.get_json()
    if payload.get("event") != "received" or not payload.get("hasAttachments"):
        return jsonify({"status": "success", "message": "Nothing to download."})
    message_id = payload["id"]

    # 1. List
    listing = requests.get(
        f"{BASE_URL}/messages/{message_id}/attachments",
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    )
    listing.raise_for_status()
    attachments = listing.json()["attachments"]
    pdfs = [
        a for a in attachments
        if a["attachmentType"] == "fileAttachment"
        and a["contentType"] == "application/pdf"
    ]

    # 2. Download each
    saved = []
    for a in pdfs:
        url = f"{BASE_URL}/messages/{message_id}/attachments/{a['id']}/download"
        with requests.get(
            url,
            headers={"Authorization": f"Bearer {API_KEY}"},
            stream=True,
            timeout=300,
        ) as resp:
            resp.raise_for_status()
            filename = (
                parse_filename(resp.headers.get("Content-Disposition", ""))
                or a["name"]
            )
            # Keep only the final path component of the supplied name.
            filename = os.path.basename(filename)
            path = os.path.join(DEST_DIR, f"{message_id}-{filename}")
            with open(path, "wb") as f:
                for chunk in resp.iter_content(chunk_size=64 * 1024):
                    f.write(chunk)
            saved.append(path)

    return jsonify({
        "status": "success",
        "message": f"Saved {len(saved)} PDF(s).",
    })

Things to watch for
- Idempotency. Webhook deliveries can retry on transient failures (see Delivery, retries and security). The example above prefixes filenames with message_id, which is enough for “don’t overwrite different messages’ files”, but a retry of the same delivery would still re-download the same files. If that matters, key your storage on message_id + attachment_id and skip downloads whose target already exists.
- Streaming large files. Always use stream=True and iter_content. Buffering a 50 MB file into memory is fine; buffering a 500 MB one will OOM your worker.
- Only fileAttachment is downloadable. A list response can include referenceAttachment (links to files in OneDrive / SharePoint) and itemAttachment (other Outlook items embedded in the message). The download endpoint returns 403 for both. Filter them out at the listing step rather than handling the 403.
- Binary success, JSON failure. A successful download doesn’t have a JSON envelope: it’s the raw bytes with Content-Type matching the attachment. A failed download returns the standard error envelope as JSON. Check the response status before reading the body.
- Per-call timeouts. Listing is cheap; downloads can be slow if Exchange is under load. The example uses 30s for listing and 300s for downloads. Tune for your environment.
- Storage hygiene. Files contain customer data. Encrypt at rest, set retention policies, and don’t log filenames or contents into anything you wouldn’t show the customer.
- Rate limits. The 100 requests-per-minute cap (see Rate limits) is shared across all /api/v1/... calls. A message with 30 attachments needs 31 calls (one list plus 30 downloads), so a high-volume mailbox can rate-limit itself.
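The idempotency advice above (key storage on message_id + attachment_id, skip targets that already exist) could be sketched as follows. This is one possible shape under stated assumptions, not a prescribed implementation: storage_path and save_once are hypothetical helpers, the IDs are sanitized on the assumption that they may contain characters unsuitable for filenames, and the temp-file-plus-rename step is an added precaution so a crashed download never leaves a partial file at the final path.

```python
import os
import tempfile


def _safe(text):
    """Replace anything other than letters, digits, '-' and '_' with '_'."""
    return "".join(c if c.isalnum() or c in "-_" else "_" for c in text)


def storage_path(dest_dir, message_id, attachment_id, filename):
    """Build a path keyed on message and attachment IDs, keeping the extension."""
    ext = os.path.splitext(filename)[1]
    suffix = "." + _safe(ext[1:]) if ext else ""
    return os.path.join(dest_dir, f"{_safe(message_id)}-{_safe(attachment_id)}{suffix}")


def save_once(path, fetch_chunks):
    """Write the chunks from fetch_chunks() to path unless path already exists.

    fetch_chunks is only called when a download is needed, so a retried
    delivery costs no extra API call. Returns True if the file was written.
    """
    if os.path.exists(path):
        return False
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".", suffix=".part")
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in fetch_chunks():
                f.write(chunk)
        os.replace(tmp, path)  # atomic rename onto the final name
    except BaseException:
        os.unlink(tmp)
        raise
    return True
```

In the handler, fetch_chunks would be a small function that issues the streaming request and returns resp.iter_content(...). Sanitizing can map two different IDs to the same name in rare cases; hashing the IDs is an alternative if that is a concern.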