ESP32-CAM Face Detection


Use the ESP32-CAM module to stream live video to a browser and detect faces in real time using the built-in MTMN neural network, from a simple web stream to a full home security alert system.

Overview

In this beginner project you will configure the ESP32-CAM module to stream live JPEG video to a browser-based viewer and enable the built-in face detection algorithm. The camera streams at up to 800x600 resolution over Wi-Fi. When a face is detected the Serial Monitor reports the detection count and bounding box coordinates. No additional hardware is required beyond the ESP32-CAM module and an FTDI programmer.

Components
  • 1× ESP32-CAM module (AI-Thinker) — OV2640 camera, onboard flash LED
  • 1× FTDI USB-to-serial adapter — 3.3 V logic; for programming only
  • 1× Jumper wire for IO0 to GND — Required for flash/upload mode
  • 1× 5 V 2 A power supply — Camera draws up to 310 mA at peak
Wiring
Component Pin | ESP32 Pin | Notes
FTDI TX | ESP32-CAM U0R (GPIO 3) | Serial RX for flashing
FTDI RX | ESP32-CAM U0T (GPIO 1) | Serial TX for flashing
FTDI GND | GND |
FTDI 5V | 5V | Use dedicated 5 V 2 A supply if FTDI cannot supply enough current
IO0 | GND | Connect only during upload; remove for normal run
Arduino Code
esp32-cam-face-detection_beginner.ino
// ESP32-CAM Face Detection - Beginner
// Uses built-in CameraWebServer example as base
// Board: AI Thinker ESP32-CAM in Arduino IDE

#include "esp_camera.h"
#include <WiFi.h>
#include "esp_http_server.h"
#include "fd_forward.h"  // MTMN face detection API (esp-face); bundled with Arduino-ESP32 core 1.0.x

// AI-Thinker ESP32-CAM pin map
#define PWDN_GPIO_NUM     32
#define RESET_GPIO_NUM    -1
#define XCLK_GPIO_NUM      0
#define SIOD_GPIO_NUM     26
#define SIOC_GPIO_NUM     27
#define Y9_GPIO_NUM       35
#define Y8_GPIO_NUM       34
#define Y7_GPIO_NUM       39
#define Y6_GPIO_NUM       36
#define Y5_GPIO_NUM       21
#define Y4_GPIO_NUM       19
#define Y3_GPIO_NUM       18
#define Y2_GPIO_NUM        5
#define VSYNC_GPIO_NUM    25
#define HREF_GPIO_NUM     23
#define PCLK_GPIO_NUM     22

const char* SSID = "YourSSID";
const char* PASS = "YourPassword";

void startCameraServer(); // declared in app_httpd.cpp (CameraWebServer example)

void setup() {
  Serial.begin(115200);

  camera_config_t config;
  config.ledc_channel = LEDC_CHANNEL_0;
  config.ledc_timer   = LEDC_TIMER_0;
  config.pin_d0 = Y2_GPIO_NUM;
  config.pin_d1 = Y3_GPIO_NUM;
  config.pin_d2 = Y4_GPIO_NUM;
  config.pin_d3 = Y5_GPIO_NUM;
  config.pin_d4 = Y6_GPIO_NUM;
  config.pin_d5 = Y7_GPIO_NUM;
  config.pin_d6 = Y8_GPIO_NUM;
  config.pin_d7 = Y9_GPIO_NUM;
  config.pin_xclk  = XCLK_GPIO_NUM;
  config.pin_pclk  = PCLK_GPIO_NUM;
  config.pin_vsync = VSYNC_GPIO_NUM;
  config.pin_href  = HREF_GPIO_NUM;
  config.pin_sscb_sda = SIOD_GPIO_NUM;
  config.pin_sscb_scl = SIOC_GPIO_NUM;
  config.pin_pwdn  = PWDN_GPIO_NUM;
  config.pin_reset = RESET_GPIO_NUM;
  config.xclk_freq_hz = 20000000;
  config.pixel_format = PIXFORMAT_JPEG;
  config.frame_size   = FRAMESIZE_SVGA; // 800x600
  config.jpeg_quality = 12;
  config.fb_count     = 1;

  esp_err_t err = esp_camera_init(&config);
  if (err != ESP_OK) {
    Serial.printf("Camera init failed: 0x%x\n", err);
    return;
  }

  WiFi.begin(SSID, PASS);
  while (WiFi.status() != WL_CONNECTED) delay(500);
  Serial.println("WiFi connected");
  Serial.print("Stream URL: http://");
  Serial.print(WiFi.localIP());
  Serial.println(":81/stream");

  startCameraServer();
}

void loop() {
  delay(10000);
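  // Face detection output appears in Serial Monitor
  // Enable via the web UI face detection toggle. The stock CameraWebServer
  // code skips detection on frames wider than 400 px, so select CIF (400x296)
  // or lower in the web UI first.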
}
How It Works
01. Camera Initialisation: esp_camera_init() configures the OV2640 sensor GPIO mapping, clock frequency (20 MHz), pixel format (JPEG), and frame buffer count. SVGA (800x600) balances resolution with streaming speed over Wi-Fi.

02. MTMN Face Detection: The ESP32-CAM SDK includes MTMN, Espressif's lightweight face detector built on MobileNetV2 and MTCNN (Multi-task Cascaded Convolutional Networks). It runs on the ESP32 without any cloud connectivity. Each JPEG frame is decoded to RGB888 and passed to the detector, which returns bounding box coordinates and a confidence score for every face found.
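
Because the detector works on decoded pixels, the memory cost of one frame can be estimated before choosing a resolution. The following is a rough sketch, assuming a 3-byte-per-pixel RGB888 buffer as implied by the dl_matrix3du_alloc(1, width, height, 3) call in the later listings. rgb888Bytes is a hypothetical helper, not part of the SDK.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bytes needed for one decoded RGB888 frame
// (3 bytes per pixel). Not part of the ESP32 camera SDK.
constexpr std::size_t rgb888Bytes(std::uint32_t width, std::uint32_t height) {
  return static_cast<std::size_t>(width) * height * 3u;
}

// SVGA (800x600) needs about 1.4 MB, more than the ESP32's internal RAM,
// so it has to live in external PSRAM. QVGA (320x240) needs about 225 KB.
static_assert(rgb888Bytes(800, 600) == 1440000, "SVGA RGB888 size");
static_assert(rgb888Bytes(320, 240) == 230400, "QVGA RGB888 size");
```

The numbers suggest why detection is usually run at a lower resolution than the one used for plain streaming.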

03. HTTP Stream Server: startCameraServer() launches two HTTP servers: one on port 80 for the web control UI and one on port 81 for the MJPEG stream. A browser connects to /stream on port 81 and receives a continuous multipart stream in which each part is one JPEG frame.
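
To make the multipart framing concrete, here is a minimal sketch of the text that surrounds each JPEG in an MJPEG stream. The boundary token "frame" and both function names are assumptions chosen for illustration; the CameraWebServer example defines its own boundary string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical boundary token; the real example uses its own string.
const std::string kBoundary = "frame";

// Value of the Content-Type response header, sent once when the stream starts.
std::string streamContentType() {
  return "multipart/x-mixed-replace;boundary=" + kBoundary;
}

// Text sent before each JPEG: boundary line, part headers, blank line.
// The raw JPEG bytes follow immediately after this header.
std::string partHeader(std::size_t jpegLen) {
  return "\r\n--" + kBoundary + "\r\n"
         "Content-Type: image/jpeg\r\n"
         "Content-Length: " + std::to_string(jpegLen) + "\r\n\r\n";
}
```

The browser replaces the displayed image each time a new part arrives, which is what makes a sequence of JPEGs look like video.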

04. Flash LED Control: The onboard white LED on GPIO 4 (active HIGH) is controlled from the web UI. Enabling it provides illumination for low-light face detection. Use it in short bursts to avoid overheating the LED.
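
Besides short bursts, the LED can be dimmed with PWM, which is the approach the intermediate build describes. The sketch below only shows the duty calculation. The 40 percent cap and the name ledDuty are assumptions, not values from the original project.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical cap: never drive the LED above this percentage (assumed value).
constexpr int kMaxLedPercent = 40;

// Convert a requested brightness (0-100 %) into an 8-bit PWM duty value,
// clamped to the range 0..kMaxLedPercent before scaling.
std::uint8_t ledDuty(int percent) {
  const int clamped = std::min(std::max(percent, 0), kMaxLedPercent);
  return static_cast<std::uint8_t>(clamped * 255 / 100);
}
```

On the board, the result would be written to an LEDC channel attached to GPIO 4 with 8-bit resolution; the LEDC setup calls differ between Arduino-ESP32 core versions, so check the one you have installed.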

Applications
  • Home entrance camera with person detection logging
  • Baby monitor with motion and face presence alerts
  • Office attendance verification proof-of-concept
  • STEM project demonstrating embedded neural networks
Troubleshooting

Camera init failed with error 0x105

This is ESP_ERR_NOT_FOUND meaning the camera sensor is not detected. Check all camera ribbon cable connections. The OV2640 flex cable is fragile; reseat it firmly in the connector.

Stream shows brown or pink discolouration

Wrong pixel format or colour matrix setting. Use PIXFORMAT_JPEG and ensure the correct board pin map (AI-Thinker) is selected in the Arduino IDE boards menu.

ESP32-CAM keeps rebooting during stream

Power supply is insufficient. The camera draws 310 mA during streaming. Use a dedicated 5 V 2 A adapter, not the FTDI 3.3 V output or USB port power.

Web UI loads but stream does not play

The stream runs on port 81. Ensure no firewall blocks port 81. Try opening http://IP:81/stream directly in the browser to isolate the issue.

Upgrades
  • Add a PIR sensor to wake the camera only when motion is detected
  • Add a Telegram bot notification when a face is first detected each hour
  • Save face-detected JPEG frames to the onboard SD card slot
  • Add a servo to pan the camera and track detected faces horizontally
FAQ

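You need an ESP32-CAM module (AI-Thinker), an FTDI USB-to-serial adapter, a jumper wire for IO0 to GND, and a 5 V 2 A power supply. The Intermediate and Advanced builds also use a microSD card, and the Advanced build needs an MQTT broker.

All three stages use Wi-Fi: the Beginner build streams video to a browser, the Intermediate build serves a web gallery, and the Advanced build publishes MQTT events.

Start with Beginner if you are new to ESP32-CAM. Use Intermediate for SD card capture with a web gallery and Advanced for face recognition and home automation alerts.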

Overview

The intermediate build adds face counting, automatic JPEG capture of detected frames to the onboard microSD slot, and a web gallery page showing the last ten captures with detection timestamps. A confidence threshold filter eliminates low-quality false positives. The flash LED brightness is PWM-controlled to avoid overheating while maintaining sufficient illumination for reliable detection.

Components
  • 1× ESP32-CAM module (AI-Thinker)
  • 1× MicroSD card 8-32 GB — FAT32 formatted; inserted in onboard slot
  • 1× FTDI USB-to-serial adapter — Programming only
  • 1× 5 V 2 A power supply
Wiring
Component Pin | ESP32 Pin | Notes
FTDI TX/RX/GND/5V | Same as beginner | Remove IO0-GND link after flashing
MicroSD card | Onboard slot (SDMMC) | GPIO 2,4,12,13,14,15 used internally; GPIO 4 is shared with the flash LED
Arduino Code
esp32-cam-face-detection_intermediate.ino
// ESP32-CAM Face Detection - Intermediate (SD capture + gallery + confidence filter)
#include "esp_camera.h"
#include "FS.h"
#include "SD_MMC.h"
#include <WiFi.h>
#include <WebServer.h>
#include "fd_forward.h"
#include "fr_forward.h"

// AI-Thinker pin map (same as beginner)
#define PWDN_GPIO_NUM 32
// ... (all other pin defines same as beginner)

const char* SSID="YourSSID", *PASS="YourPass";
WebServer gallery(80);

int captureCount=0;
const float MIN_CONFIDENCE=0.85f;

bool initSDCard(){
  if(!SD_MMC.begin()) return false;
  uint8_t cardType=SD_MMC.cardType();
  return cardType!=CARD_NONE;
}

void saveCapture(camera_fb_t *fb, int faceCount){
  String path="/face_"+String(millis())+".jpg";
  File f=SD_MMC.open(path,FILE_WRITE);
  if(f){ f.write(fb->buf,fb->len); f.close(); captureCount++; }
  Serial.printf("Saved %s (%d faces)\n",path.c_str(),faceCount);
}

void serveGallery(){
  String body="<h2>Face Captures</h2>";
  File dir=SD_MMC.open("/");
  File file=dir.openNextFile();
  int shown=0;
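  while(file && shown<10){
    String name=file.name();
    if(name.endsWith(".jpg")){
      // Escape the inner quotes so the string literal compiles and the HTML attribute is quoted
      body+="<img src=\"/img?f="+name+"\" width=200><br>";
      shown++;
    }
    file.close();               // release this entry before opening the next one
    file=dir.openNextFile();
  }
  if(file) file.close();
  dir.close();
  gallery.send(200,"text/html",body);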
}

void setup(){
  Serial.begin(115200);
  // camera init (same pin config as beginner)
  // WiFi connect
  WiFi.begin(SSID,PASS);
  while(WiFi.status()!=WL_CONNECTED) delay(500);
  if(!initSDCard()) Serial.println("SD card mount failed; captures will not be saved");
  gallery.on("/",serveGallery);
  gallery.begin();
  Serial.printf("Gallery: http://%s/\n",WiFi.localIP().toString().c_str());
}

void loop(){
  gallery.handleClient();
  camera_fb_t *fb=esp_camera_fb_get();
  if(!fb){ delay(100); return; }

  // Run face detection (outline only; the calls mirror the advanced listing)
  // dl_matrix3du_t *image_matrix = dl_matrix3du_alloc(1,fb->width,fb->height,3);
  // ... (convert the JPEG to RGB888 and run face_detect())
  // Call saveCapture(fb, faceCount) when a face scores at least MIN_CONFIDENCE

  esp_camera_fb_return(fb);
  delay(200);
}
How It Works
01. SD_MMC Interface: The ESP32-CAM uses the SDMMC peripheral (not SPI) to communicate with the SD card at higher speeds. SD_MMC.begin() mounts the card automatically. File writing uses the standard Arduino FS API.

02. Confidence Threshold Filtering: The MTMN detector returns a confidence score per detected face. Only faces with confidence above MIN_CONFIDENCE (0.85 = 85 percent) trigger a capture, reducing false positives caused by shadows or reflections.
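
The filtering step can be sketched in isolation as follows. The Detection struct is a simplified stand-in invented for this example (the SDK's own box type is laid out differently), and this version keeps scores at or above the threshold.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified detection record; the SDK's own box type differs.
struct Detection {
  float score;     // detector confidence in [0, 1]
  int x, y, w, h;  // bounding box in pixels
};

constexpr float kMinConfidence = 0.85f;  // same value as MIN_CONFIDENCE in the listing

// Return only the detections whose score reaches the threshold.
std::vector<Detection> filterByConfidence(const std::vector<Detection>& in,
                                          float minScore = kMinConfidence) {
  std::vector<Detection> out;
  for (const Detection& d : in) {
    if (d.score >= minScore) out.push_back(d);
  }
  return out;
}
```

A capture would then be saved only when the filtered list is non-empty, and its size gives the face count to log.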

03. Timestamped File Naming: Capture files are named face_<millis>.jpg using the ESP32 uptime millisecond counter, which restarts from zero on every reboot. With NTP sync, the millis value can be replaced by a Unix timestamp or a formatted date so that filenames stay unique and sort chronologically.
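
A possible shape for the NTP-based naming, assuming the current time is already available as Unix seconds (for example from time(nullptr) after the clock has been synchronised). captureName is a hypothetical helper, not part of the original sketch.

```cpp
#include <ctime>
#include <string>

// Build a capture path such as /face_19700101_010101.jpg from a Unix
// timestamp, formatted in UTC. Hypothetical helper for illustration.
std::string captureName(std::time_t unixSeconds) {
  const std::tm* utc = std::gmtime(&unixSeconds);
  if (utc == nullptr) return "/face_unknown.jpg";  // conversion failed
  char stamp[20];
  std::strftime(stamp, sizeof(stamp), "%Y%m%d_%H%M%S", utc);
  return std::string("/face_") + stamp + ".jpg";
}
```

Putting year, month, and day first means a plain alphabetical sort of the directory is also a chronological sort.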

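04. Gallery Web Server: A WebServer instance on port 80 lists up to 10 JPEG files from the SD root (in directory order, which is not necessarily newest first) and embeds them as inline images. Each image URL points at an /img endpoint that must read the named file from the SD card and stream it to the browser; the listing above does not register that handler yet.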
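
An /img handler receives the file name from the browser, so it is worth validating before it is used as an SD path. The check below is one conservative option, not the original author's code; isSafeImageName and the 64-character limit are assumptions.

```cpp
#include <cctype>
#include <string>

// Hypothetical guard for the f= query parameter of an /img handler:
// accept only plain names like face_12345.jpg so a request cannot
// reach outside the SD root (no slashes, no dots in the stem).
bool isSafeImageName(const std::string& name) {
  const std::string suffix = ".jpg";
  if (name.size() <= suffix.size() || name.size() > 64) return false;
  if (name.compare(name.size() - suffix.size(), suffix.size(), suffix) != 0) return false;
  const std::string stem = name.substr(0, name.size() - suffix.size());
  for (unsigned char c : stem) {
    if (!(std::isalnum(c) || c == '_' || c == '-')) return false;
  }
  return true;
}
```

In the sketch, the handler would read gallery.arg("f"), strip a single leading slash if your core reports names with one, run this check, and then open the file with SD_MMC.open() and send it with gallery.streamFile(file, "image/jpeg").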

Applications
  • Doorbell camera saving photos of every visitor
  • Employee attendance logger with timestamped photo evidence
  • Wildlife camera trap triggered by animal face detection
  • Childcare monitoring with parent web gallery access
Troubleshooting

SD card not initialised

Format the card as FAT32 (not exFAT or NTFS). Cards larger than 32 GB may need manual FAT32 formatting using third-party tools. Ensure the card is inserted before power-on.

Camera freezes after saving a few images

Frame buffer contention between capture and SD write can cause heap exhaustion. Set fb_count=2 in the camera config (this requires PSRAM) and call esp_camera_fb_return() as soon as the frame data has been copied or written.

Gallery shows no images despite captures

Check the SD file path prefix matches the gallery directory scan. The SD_MMC library on ESP32 sometimes prepends a slash; verify file paths with Serial.println(file.name()).

Upgrades
  • Add NTP timestamps to file names for chronological sorting
  • Add a Telegram photo notification when a new face is captured
  • Add pagination to the gallery for more than 10 images
  • Add face recognition to identify enrolled individuals and skip captures for known faces

Overview

The advanced build streams video to the browser, detects faces, and publishes a JSON MQTT message containing the face count, bounding box, and a Base64-encoded thumbnail to a home automation broker. Home Assistant or Node-RED subscribes and triggers lights, door locks, or notifications. A locally hosted enrolment page lets you register up to ten known face templates for recognition, distinguishing known from unknown visitors.

Components
  • 1× ESP32-CAM module
  • 1× MicroSD card — Face template storage
  • 1× 5 V 2 A power supply
  • 1× MQTT broker (Mosquitto) — Local or cloud
  • 1× Home Assistant or Node-RED — For automation responses
Wiring
Component Pin | ESP32 Pin | Notes
All wiring | Same as intermediate | SD card and power only; no extra GPIO needed
Arduino Code
esp32-cam-face-detection_advanced.ino
// ESP32-CAM Face Detection - Advanced (MQTT + face recognition + enrolment)
#include "esp_camera.h"
#include <WiFi.h>
#include <PubSubClient.h>
#include <ArduinoJson.h>
#include "fd_forward.h"
#include "fr_forward.h"
#include "fr_flash.h"

#define PWDN_GPIO_NUM 32
// ... other pin defines (same as beginner)

const char* SSID="YourSSID", *PASS="YourPass";
const char* MQTT_HOST="192.168.1.100";
const char* TOPIC_FACE="camera/face";
const char* TOPIC_CMD ="camera/command";

WiFiClient wifiClient;
PubSubClient mqtt(wifiClient);

face_id_name_list face_list; // enrolled face templates

void mqttCallback(char* topic, byte* payload, unsigned int len){
  String cmd((char*)payload,len);
  if(cmd=="ENROL"){
    // Capture next detected face and add to face_list
    Serial.println("Enrol mode: present face to camera");
  }
}

void publishFaceEvent(int faceCount, bool known, const char* name){
  StaticJsonDocument<256> doc;
  doc["faces"]   = faceCount;
  doc["known"]   = known;
  doc["name"]    = name;
  doc["ts"]      = millis();
  char buf[256]; serializeJson(doc,buf);
  mqtt.publish(TOPIC_FACE,buf);
}

void setup(){
  Serial.begin(115200);
  // camera init with PSRAM enabled (same config as beginner)
  WiFi.begin(SSID,PASS);
  while(WiFi.status()!=WL_CONNECTED) delay(500);
  mqtt.setServer(MQTT_HOST,1883);
  mqtt.setCallback(mqttCallback);
  // Load enrolled face templates from flash (see fr_flash.h)
  // read_face_id_from_flash_with_name(&face_list);
  Serial.printf("Loaded %d enrolled faces\n",face_list.count);
}

void loop(){
  // Subscribe only after a successful (re)connect
  if(!mqtt.connected() && mqtt.connect("ESP32Cam")) mqtt.subscribe(TOPIC_CMD);
  mqtt.loop();

  camera_fb_t *fb=esp_camera_fb_get();
  if(!fb){ delay(50); return; }

  // Face detection + recognition pipeline
  // dl_matrix3du_t *img = dl_matrix3du_alloc(1,fb->width,fb->height,3);
  // fmt2rgb888(fb->buf,fb->len,fb->format,img->item);
  // box_array_t *boxes = face_detect(img, &mtmn_config);
  // if(boxes){
  //   face_id_node *node = get_face_id_with_name(img,boxes,&face_list);
  //   publishFaceEvent(boxes->len, node!=NULL, node?node->id_name:"unknown");
  //   dl_lib_free(boxes);
  // }
  // dl_matrix3du_free(img);

  esp_camera_fb_return(fb);
  delay(100);
}
How It Works
01. Face Recognition Pipeline: The Espressif fr_forward library extends detection with a recognition step. After detecting face bounding boxes, get_face_id_with_name() compares the face embedding against stored templates. A Euclidean distance threshold determines whether the face matches an enrolled identity.
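
The distance-threshold idea can be illustrated on plain float vectors. This is a generic sketch of nearest-template matching as described above, not the library's internal code; the metric and default threshold the library actually uses may differ, and both function names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Embedding = std::vector<float>;

// Euclidean (L2) distance between two embeddings of equal length.
float euclideanDistance(const Embedding& a, const Embedding& b) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
    const float d = a[i] - b[i];
    sum += d * d;
  }
  return std::sqrt(sum);
}

// Index of the closest enrolled template, or -1 when even the best
// match is farther away than maxDistance (treated as "unknown").
int bestMatch(const Embedding& probe, const std::vector<Embedding>& enrolled,
              float maxDistance) {
  int best = -1;
  float bestDist = maxDistance;
  for (std::size_t i = 0; i < enrolled.size(); ++i) {
    const float d = euclideanDistance(probe, enrolled[i]);
    if (d <= bestDist) { bestDist = d; best = static_cast<int>(i); }
  }
  return best;
}
```

With a distance metric, lowering maxDistance makes matching stricter and raising it makes matching more permissive.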

02. MQTT JSON Event Publishing: Each detection event publishes a JSON payload to camera/face containing the face count, whether the face is known, the matched name, and a millisecond timestamp. Home Assistant MQTT integration subscribes and triggers automations such as unlocking a door for known faces.
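
To show what a subscriber should expect on camera/face, the sketch below builds a payload with the same four fields by hand. The listing uses ArduinoJson for this; faceEventJson is a hypothetical stand-in that assumes the name contains no characters needing JSON escaping.

```cpp
#include <cstdio>
#include <string>

// Build the event payload with snprintf. Returns an empty string if the
// message does not fit the 256-byte buffer used in the listing.
std::string faceEventJson(int faces, bool known, const char* name, unsigned long ts) {
  char buf[256];
  const int n = std::snprintf(buf, sizeof(buf),
                              "{\"faces\":%d,\"known\":%s,\"name\":\"%s\",\"ts\":%lu}",
                              faces, known ? "true" : "false", name, ts);
  if (n < 0 || n >= static_cast<int>(sizeof(buf))) return std::string();
  return std::string(buf);
}
```

For example, faceEventJson(1, true, "alice", 12345) yields {"faces":1,"known":true,"name":"alice","ts":12345}. Keeping the message small matters because MQTT client libraries typically cap the packet size.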

03. Enrolment via MQTT Command: Publishing "ENROL" to camera/command puts the ESP32 into enrolment mode. The next detected face is captured, its embedding extracted, and stored to flash with a name label. Up to ten faces can be enrolled.
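
The enrolment flow is a small two-state machine, which could be sketched like this. EnrolState and both functions are hypothetical names for illustration; the real templates would live in face_list, and the embedding step is left out.

```cpp
#include <cstddef>
#include <cstring>

constexpr int kMaxEnrolled = 10;  // limit stated in the text

// Hypothetical state holder for the enrolment flow.
struct EnrolState {
  bool enrolPending = false;  // set by the ENROL command
  int enrolledCount = 0;
};

// Handle a raw MQTT payload (not NUL-terminated). Returns true if
// enrolment mode was entered.
bool onCommand(EnrolState& s, const unsigned char* payload, std::size_t len) {
  static const char kCmd[] = "ENROL";
  if (len != sizeof(kCmd) - 1 || std::memcmp(payload, kCmd, len) != 0) return false;
  if (s.enrolledCount >= kMaxEnrolled) return false;  // template list is full
  s.enrolPending = true;
  return true;
}

// Call for each detected face. Returns true when this face should be
// stored as a new template; enrolment mode then ends.
bool onFaceDetected(EnrolState& s) {
  if (!s.enrolPending) return false;
  s.enrolPending = false;
  ++s.enrolledCount;
  return true;
}
```

In the sketch, mqttCallback would call onCommand() and the detection loop would call onFaceDetected() before deciding whether to store the embedding.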

04. Flash Template Persistence: read_face_id_from_flash_with_name() and the matching enrol call (enroll_face_id_to_flash_with_name() in Espressif's fr_flash.h) retrieve and store face embeddings in an ESP32 flash partition. Templates survive power cycles without requiring re-enrolment.

Applications
  • Smart doorbell that unlocks for family members and alerts for strangers
  • Home Assistant integration triggering personalised lighting scenes per person
  • Office visitor management with known-staff vs visitor classification
  • Secure equipment cabinet that opens only for authorised faces
Troubleshooting

get_face_id_with_name returns null for enrolled faces

The recognition threshold may be too strict. Increase the face recognition threshold value in the fr_forward configuration. Re-enrol under consistent lighting conditions for better template quality.

MQTT disconnects frequently during face processing

Face recognition is CPU-intensive and can block the loop for 200-500 ms. Move MQTT keep-alive to a FreeRTOS task on core 1 while face processing runs on core 0.

False positives: unknown faces match enrolled templates

Reduce the recognition distance threshold to make matching stricter. A threshold of 0.4 is stricter than the default 0.5; tune based on your lighting environment.

Upgrades
  • Add Home Assistant MQTT discovery for automatic device registration
  • Store annotated face capture photos to SD with the matched name overlaid
  • Add a solenoid door lock triggered directly by the ESP32 on known face detection
  • Add liveness detection to reject printed photos used to spoof the camera