ESP32-CAM Face Detection

Overview

In this beginner project you will configure the ESP32-CAM module to stream live JPEG video to a browser-based viewer and enable the built-in face detection algorithm. The camera streams at up to 800x600 resolution over Wi-Fi. When a face is detected, the Serial Monitor reports the detection count and bounding box coordinates. Beyond the ESP32-CAM module itself, the only hardware you need is an FTDI programmer, a jumper wire, and a 5 V power supply.

Components

1× ESP32-CAM module (AI-Thinker) — OV2640 camera, onboard flash LED
1× FTDI USB-to-serial adapter — 3.3 V logic; for programming only
1× Jumper wire for IO0 to GND — Required for flash/upload mode
1× 5 V 2 A power supply — Camera draws up to 310 mA at peak

Wiring

Component Pin	ESP32 Pin	Notes
FTDI TX	ESP32-CAM U0R (GPIO 3)	Serial RX for flashing
FTDI RX	ESP32-CAM U0T (GPIO 1)	Serial TX for flashing
FTDI GND	GND
FTDI 5V	5V	Use dedicated 5 V 2 A supply if FTDI cannot supply enough current
IO0	GND	Connect only during upload; remove for normal run

Arduino Code

esp32-cam-face-detection_beginner.ino

// ESP32-CAM Face Detection - Beginner
// Uses built-in CameraWebServer example as base
// Board: AI Thinker ESP32-CAM in Arduino IDE

#include "esp_camera.h"
#include <WiFi.h>
#include "esp_http_server.h"
#include "fd_forward.h"  // MTMN face detection API (ships with arduino-esp32 core 1.0.x)

// AI-Thinker ESP32-CAM pin map
#define PWDN_GPIO_NUM     32
#define RESET_GPIO_NUM    -1
#define XCLK_GPIO_NUM      0
#define SIOD_GPIO_NUM     26
#define SIOC_GPIO_NUM     27
#define Y9_GPIO_NUM       35
#define Y8_GPIO_NUM       34
#define Y7_GPIO_NUM       39
#define Y6_GPIO_NUM       36
#define Y5_GPIO_NUM       21
#define Y4_GPIO_NUM       19
#define Y3_GPIO_NUM       18
#define Y2_GPIO_NUM        5
#define VSYNC_GPIO_NUM    25
#define HREF_GPIO_NUM     23
#define PCLK_GPIO_NUM     22

const char* SSID = "YourSSID";
const char* PASS = "YourPassword";

void startCameraServer(); // declared in app_httpd.cpp (CameraWebServer example)

void setup() {
  Serial.begin(115200);

  camera_config_t config;
  config.ledc_channel = LEDC_CHANNEL_0;
  config.ledc_timer   = LEDC_TIMER_0;
  config.pin_d0 = Y2_GPIO_NUM;
  config.pin_d1 = Y3_GPIO_NUM;
  config.pin_d2 = Y4_GPIO_NUM;
  config.pin_d3 = Y5_GPIO_NUM;
  config.pin_d4 = Y6_GPIO_NUM;
  config.pin_d5 = Y7_GPIO_NUM;
  config.pin_d6 = Y8_GPIO_NUM;
  config.pin_d7 = Y9_GPIO_NUM;
  config.pin_xclk  = XCLK_GPIO_NUM;
  config.pin_pclk  = PCLK_GPIO_NUM;
  config.pin_vsync = VSYNC_GPIO_NUM;
  config.pin_href  = HREF_GPIO_NUM;
  config.pin_sscb_sda = SIOD_GPIO_NUM;
  config.pin_sscb_scl = SIOC_GPIO_NUM;
  config.pin_pwdn  = PWDN_GPIO_NUM;
  config.pin_reset = RESET_GPIO_NUM;
  config.xclk_freq_hz = 20000000;
  config.pixel_format = PIXFORMAT_JPEG;
  config.frame_size   = FRAMESIZE_SVGA; // 800x600
  config.jpeg_quality = 12;
  config.fb_count     = 1;

  esp_err_t err = esp_camera_init(&config);
  if (err != ESP_OK) {
    Serial.printf("Camera init failed: 0x%x\n", err);
    return;
  }

  WiFi.begin(SSID, PASS);
  while (WiFi.status() != WL_CONNECTED) delay(500);
  Serial.println("WiFi connected");
  Serial.print("Stream URL: http://");
  Serial.print(WiFi.localIP());
  Serial.println(":81/stream");

  startCameraServer();
}

void loop() {
  delay(10000);
  // Face detection output appears in Serial Monitor
  // Enable via the web UI face detection toggle
}

How It Works

Camera Initialisation: esp_camera_init() configures the OV2640 sensor GPIO mapping, clock frequency (20 MHz), pixel format (JPEG), and frame buffer count. SVGA (800x600) balances resolution with streaming speed over Wi-Fi.

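MTMN Face Detection: The ESP32 camera libraries include MTMN, Espressif's lightweight face detector built on MobileNetV2 and multi-task cascaded convolutional networks (MTCNN). It runs on the ESP32 without any cloud connectivity: each JPEG frame is decoded to RGB888 and passed to the detector, which returns bounding box coordinates and a confidence score for every face. In the stock CameraWebServer example, detection runs only on frames up to 400 pixels wide (CIF, 400x296), so select CIF or a smaller frame size in the web UI before enabling it.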

HTTP Stream Server: startCameraServer() launches two HTTP servers: one on port 80 for the web control UI and one on port 81 for the MJPEG stream. A browser connects to /stream on port 81 and receives a continuous multipart JPEG stream (multipart/x-mixed-replace) in which each part carries one complete JPEG frame.
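
To make the multipart framing concrete, here is a minimal portable C++ sketch of the header that precedes each JPEG frame in such a stream. The boundary token "frame" and the helper name buildPartHeader are illustrative assumptions, not identifiers taken from the CameraWebServer source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical boundary token. The response's Content-Type header must
// announce the same value: multipart/x-mixed-replace;boundary=frame
static const char* const kBoundary = "frame";

// Build the header that precedes one JPEG frame in the multipart stream.
// The raw JPEG bytes follow immediately after the blank line.
std::string buildPartHeader(std::size_t jpegLen) {
  char buf[128];
  const int n = std::snprintf(
      buf, sizeof(buf),
      "\r\n--%s\r\nContent-Type: image/jpeg\r\nContent-Length: %zu\r\n\r\n",
      kBoundary, jpegLen);
  assert(n > 0 && static_cast<std::size_t>(n) < sizeof(buf));  // no truncation
  return std::string(buf, static_cast<std::size_t>(n));
}
```

The server would send this header, then the JPEG bytes, and repeat for every frame; the browser replaces the displayed image each time a new part arrives.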

Flash LED Control: The onboard white LED on GPIO 4 (active HIGH) is controlled from the web UI. Enabling it provides illumination for low-light face detection. Use it in short bursts to avoid overheating the LED.
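
Besides short bursts, dimming the LED with PWM also limits heat. The sketch below shows one possible way to turn a requested brightness into a capped 8-bit duty value. The 60 percent cap and the name ledDutyFromPercent are assumptions chosen for illustration, not values from the ESP32-CAM documentation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cap: never drive the LED above 60 % of full brightness.
constexpr int kMaxPercent = 60;

// Convert a requested brightness (0-100 %) into an 8-bit PWM duty value,
// clamping the request to the range [0, kMaxPercent] first.
uint8_t ledDutyFromPercent(int percent) {
  if (percent < 0) percent = 0;
  if (percent > kMaxPercent) percent = kMaxPercent;
  const int duty = (percent * 255 + 50) / 100;  // round to nearest integer
  assert(duty >= 0 && duty <= 255);
  return static_cast<uint8_t>(duty);
}
```

On the ESP32 the returned value would be written to the LEDC channel attached to GPIO 4. That call is left out here because the LEDC function names differ between Arduino core versions.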

Applications

Home entrance camera with person detection logging
Baby monitor with motion and face presence alerts
Office attendance verification proof-of-concept
STEM project demonstrating embedded neural networks

Troubleshooting

Camera init failed with error 0x105

This is ESP_ERR_NOT_FOUND meaning the camera sensor is not detected. Check all camera ribbon cable connections. The OV2640 flex cable is fragile; reseat it firmly in the connector.

Stream shows brown or pink discolouration

Wrong pixel format or colour matrix setting. Use PIXFORMAT_JPEG and ensure the correct board pin map (AI-Thinker) is selected in the Arduino IDE boards menu.

ESP32-CAM keeps rebooting during stream

Power supply is insufficient. The camera draws 310 mA during streaming. Use a dedicated 5 V 2 A adapter, not the FTDI 3.3 V output or USB port power.

Web UI loads but stream does not play

The stream runs on port 81. Ensure no firewall blocks port 81. Try opening http://IP:81/stream directly in the browser to isolate the issue.

Upgrades

Add a PIR sensor to wake the camera only when motion is detected
Add a Telegram bot notification when a face is first detected each hour
Save face-detected JPEG frames to the onboard SD card slot
Add a servo to pan the camera and track detected faces horizontally

FAQ

You need an ESP32-CAM module (AI-Thinker), an FTDI USB-to-serial adapter for programming, jumper wires, and a 5 V 2 A power supply. The Intermediate and Advanced builds also use a microSD card.

All three stages use Wi-Fi: Beginner streams video to a browser, Intermediate serves a web gallery, and Advanced publishes MQTT messages.

Start with Beginner if you are new to ESP32-CAM. Use Intermediate for SD card capture and a web gallery, and Advanced for MQTT integration and face recognition.

Overview

The intermediate build adds face counting, automatic JPEG capture of detected frames to the onboard microSD slot, and a web gallery page showing the last ten captures with detection timestamps. A confidence threshold filter eliminates low-quality false positives. The flash LED brightness is PWM-controlled to avoid overheating while maintaining sufficient illumination for reliable detection.

Components

1× ESP32-CAM module (AI-Thinker)
1× MicroSD card 8-32 GB — FAT32 formatted; inserted in onboard slot
1× FTDI USB-to-serial adapter — Programming only
1× 5 V 2 A power supply

Wiring

Component Pin	ESP32 Pin	Notes
FTDI TX/RX/GND/5V	Same as beginner	Remove IO0-GND link after flashing
MicroSD card	Onboard slot (SDMMC)	GPIO 2, 4, 12, 13, 14, 15 used internally; GPIO 4 is shared with the flash LED

Arduino Code

esp32-cam-face-detection_intermediate.ino

// ESP32-CAM Face Detection - Intermediate (SD capture + gallery + confidence filter)
#include "esp_camera.h"
#include "FS.h"
#include "SD_MMC.h"
#include <WiFi.h>
#include <WebServer.h>
#include "fd_forward.h"
#include "fr_forward.h"

// AI-Thinker pin map (same as beginner)
#define PWDN_GPIO_NUM 32
// ... (all other pin defines same as beginner)

const char* SSID="YourSSID", *PASS="YourPass";
WebServer gallery(80);

int captureCount=0;
const float MIN_CONFIDENCE=0.85f;

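// Mount the onboard microSD card. The default 4-bit SDMMC mode uses GPIO 4,
// which also drives the flash LED; SD_MMC.begin("/sdcard", true) selects
// 1-bit mode and leaves GPIO 4 free.
bool initSDCard(){
  if(!SD_MMC.begin()) return false;
  uint8_t cardType=SD_MMC.cardType();
  return cardType!=CARD_NONE;
}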

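// Write one JPEG frame to the SD card under a name derived from the uptime.
void saveCapture(camera_fb_t *fb, int faceCount){
  String path="/face_"+String(millis())+".jpg";
  File f=SD_MMC.open(path,FILE_WRITE);
  if(!f){ Serial.printf("Failed to open %s\n",path.c_str()); return; }
  f.write(fb->buf,fb->len);
  f.close();
  captureCount++;
  Serial.printf("Saved %s (%d faces)\n",path.c_str(),faceCount);
}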

void serveGallery(){
  String body="<h2>Face Captures</h2>";
  File dir=SD_MMC.open("/");
  File file=dir.openNextFile();
  int shown=0;
  while(file && shown<10){
    String name=file.name();
    if(name.endsWith(".jpg")){
      body+="<img src=\"/img?f="+name+"\" width=200><br>"; // inner quotes escaped
      shown++;
    }
    file=dir.openNextFile();
  }
  gallery.send(200,"text/html",body);
}

void setup(){
  Serial.begin(115200);
  // camera init (same pin config as beginner)
  // WiFi connect
  WiFi.begin(SSID,PASS);
  while(WiFi.status()!=WL_CONNECTED) delay(500);
  initSDCard();
  gallery.on("/",serveGallery);
  gallery.begin();
  Serial.printf("Gallery: http://%s/\n",WiFi.localIP().toString().c_str());
}

void loop(){
  gallery.handleClient();
  camera_fb_t *fb=esp_camera_fb_get();
  if(!fb){ delay(100); return; }

  // Face detection is left as an outline in this listing:
  // dl_matrix3du_t *image_matrix = dl_matrix3du_alloc(1,fb->width,fb->height,3);
  // ... convert the frame to RGB888, then run face_detect() from fd_forward.h
  // Call saveCapture(fb, count) only when a face scores above MIN_CONFIDENCE,
  // and free image_matrix with dl_matrix3du_free() before returning

  esp_camera_fb_return(fb);
  delay(200);
}

How It Works

SD_MMC Interface: The ESP32-CAM uses the SDMMC peripheral (not SPI) to communicate with the SD card at higher speeds. SD_MMC.begin() mounts the card automatically. File writing uses the standard Arduino FS API.

Confidence Threshold Filtering: The MTMN detector returns a confidence score per detected face. Only faces with confidence above MIN_CONFIDENCE (0.85 = 85 percent) trigger a capture, reducing false positives caused by shadows or reflections.
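
The filtering rule can be sketched in portable C++ as follows. The array of scores stands in for the per-face confidence values the detector returns, and the function name countConfidentFaces is a hypothetical helper rather than part of the Espressif library.

```cpp
#include <cassert>
#include <cstddef>

// Count the detections whose confidence is above the threshold.
// `scores` holds one confidence value per detected face.
int countConfidentFaces(const float* scores, std::size_t len, float minConfidence) {
  assert(scores != nullptr || len == 0);
  int count = 0;
  for (std::size_t i = 0; i < len; ++i) {
    if (scores[i] > minConfidence) ++count;
  }
  return count;
}
```

A capture would then be saved only when the returned count is greater than zero, with MIN_CONFIDENCE passed as the threshold.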

Timestamped File Naming: Capture files are named face_<millis>.jpg using the ESP32 uptime millisecond counter. With NTP sync the millis value can be replaced by a Unix timestamp for human-readable filenames.
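
As a small illustration of the naming scheme, the portable C++ helper below builds the file name from any numeric stamp, so the same function works with the uptime counter or with a Unix timestamp. The name captureFileName is an assumption made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Build "/face_<stamp>.jpg". `stamp` can be the uptime in milliseconds or,
// once the clock is NTP-synced, a Unix timestamp in seconds.
std::string captureFileName(unsigned long stamp) {
  char buf[40];
  const int n = std::snprintf(buf, sizeof(buf), "/face_%lu.jpg", stamp);
  assert(n > 0 && static_cast<std::size_t>(n) < sizeof(buf));  // no truncation
  return std::string(buf, static_cast<std::size_t>(n));
}
```

Because the uptime counter restarts at zero after every reboot, uptime-based names can repeat across power cycles, which is one more reason to switch to a real timestamp.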

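Gallery Web Server: WebServer on port 80 lists up to 10 JPEG files from the SD root, in the order the directory scan returns them, and embeds them as inline images. Each image link points at an /img endpoint that must read the requested file from the SD card and stream it to the browser. The listing above does not register that handler, so add one before relying on the gallery.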

Applications

Doorbell camera saving photos of every visitor
Employee attendance logger with timestamped photo evidence
Wildlife camera trap triggered by animal face detection
Childcare monitoring with parent web gallery access

Troubleshooting

SD card not initialised

Format the card as FAT32 (not exFAT or NTFS). Cards larger than 32 GB may need manual FAT32 formatting using third-party tools. Ensure the card is inserted before power-on.

Camera freezes after saving a few images

Holding the frame buffer while writing to the SD card can starve the camera driver and exhaust the heap. Use two frame buffers (fb_count = 2, allocated in PSRAM) and return the frame buffer immediately after copying the data.

Gallery shows no images despite captures

Check the SD file path prefix matches the gallery directory scan. The SD_MMC library on ESP32 sometimes prepends a slash; verify file paths with Serial.println(file.name()).

Upgrades

Add NTP timestamps to file names for chronological sorting
Add a Telegram photo notification when a new face is captured
Add pagination to the gallery for more than 10 images
Add face recognition to identify enrolled individuals and skip captures for known faces

Overview

The advanced build detects faces, matches them against enrolled templates, and publishes a JSON MQTT message containing the face count, whether the face is known, the matched name, and a timestamp to a home automation broker. Home Assistant or Node-RED subscribes and triggers lights, door locks, or notifications. An MQTT command starts enrolment, so you can register up to ten known faces and distinguish known from unknown visitors.

Components

1× ESP32-CAM module
1× MicroSD card — Face template storage
1× 5 V 2 A power supply
1× MQTT broker (Mosquitto) — Local or cloud
1× Home Assistant or Node-RED — For automation responses

Wiring

Component Pin	ESP32 Pin	Notes
All wiring	Same as intermediate	SD card and power only; no extra GPIO needed

Arduino Code

esp32-cam-face-detection_advanced.ino

// ESP32-CAM Face Detection - Advanced (MQTT + face recognition + enrolment)
#include "esp_camera.h"
#include <WiFi.h>
#include <PubSubClient.h>
#include <ArduinoJson.h>
#include "fd_forward.h"
#include "fr_forward.h"
#include "fr_flash.h"

#define PWDN_GPIO_NUM 32
// ... other pin defines (same as beginner)

const char* SSID="YourSSID", *PASS="YourPass";
const char* MQTT_HOST="192.168.1.100";
const char* TOPIC_FACE="camera/face";
const char* TOPIC_CMD ="camera/command";

WiFiClient wifiClient;
PubSubClient mqtt(wifiClient);

face_id_name_list face_list; // enrolled face templates

void mqttCallback(char* topic, byte* payload, unsigned int len){
  String cmd((char*)payload,len);
  if(cmd=="ENROL"){
    // Capture next detected face and add to face_list
    Serial.println("Enrol mode: present face to camera");
  }
}

void publishFaceEvent(int faceCount, bool known, const char* name){
  StaticJsonDocument<256> doc;
  doc["faces"]   = faceCount;
  doc["known"]   = known;
  doc["name"]    = name;
  doc["ts"]      = millis();
  char buf[256]; serializeJson(doc,buf);
  mqtt.publish(TOPIC_FACE,buf);
}

void setup(){
  Serial.begin(115200);
  // camera init with PSRAM enabled (same config as beginner)
  WiFi.begin(SSID,PASS);
  while(WiFi.status()!=WL_CONNECTED) delay(500);
  mqtt.setServer(MQTT_HOST,1883);
  mqtt.setCallback(mqttCallback);
  // Load enrolled face templates from flash
  // read_face_id_from_flash_with_name(&face_list);
  Serial.printf("Loaded %d enrolled faces\n",face_list.count);
}

void loop(){
  // Reconnect if needed; subscribe only after a successful connect
  if(!mqtt.connected() && mqtt.connect("ESP32Cam")) mqtt.subscribe(TOPIC_CMD);
  mqtt.loop();

  camera_fb_t *fb=esp_camera_fb_get();
  if(!fb){ delay(50); return; }

  // Face detection + recognition pipeline
  // dl_matrix3du_t *img = dl_matrix3du_alloc(1,fb->width,fb->height,3);
  // fmt2rgb888(fb->buf,fb->len,fb->format,img->item);
  // box_array_t *boxes = face_detect(img, &mtmn_config);
  // if(boxes){
  //   face_id_node *node = get_face_id_with_name(img,boxes,&face_list);
  //   publishFaceEvent(boxes->len, node!=NULL, node?node->id_name:"unknown");
  //   dl_lib_free(boxes);
  // }
  // dl_matrix3du_free(img);

  esp_camera_fb_return(fb);
  delay(100);
}

How It Works

Face Recognition Pipeline: The Espressif fr_forward library extends detection with a recognition step. After detecting face bounding boxes, get_face_id_with_name() compares the face embedding against stored templates. A Euclidean distance threshold determines whether the face matches an enrolled identity.
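
The matching rule described above can be sketched in portable C++. This is an illustration of nearest-template matching with a Euclidean distance threshold, not the fr_forward implementation, which may use a different similarity measure. The Embedding type and the function names are assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Embedding = std::vector<float>;

// Euclidean (L2) distance between two embeddings of equal length.
float euclideanDistance(const Embedding& a, const Embedding& b) {
  assert(a.size() == b.size());
  float sum = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const float d = a[i] - b[i];
    sum += d * d;
  }
  return std::sqrt(sum);
}

// Return the index of the closest enrolled template, or -1 when even the
// closest one is farther away than `threshold` (an unknown face).
int matchFace(const Embedding& probe, const std::vector<Embedding>& enrolled,
              float threshold) {
  int best = -1;
  float bestDist = threshold;
  for (std::size_t i = 0; i < enrolled.size(); ++i) {
    const float d = euclideanDistance(probe, enrolled[i]);
    if (d <= bestDist) {
      bestDist = d;
      best = static_cast<int>(i);
    }
  }
  return best;
}
```

With a distance-based rule like this one, lowering the threshold makes matching stricter and raising it makes matching more permissive.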

MQTT JSON Event Publishing: Each detection event publishes a JSON payload to camera/face containing the face count, whether the face is known, the matched name, and a millisecond timestamp. Home Assistant MQTT integration subscribes and triggers automations such as unlocking a door for known faces.
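
For readers who want to see the payload shape without pulling in ArduinoJson, the portable C++ sketch below formats the same four fields by hand. The function name buildFaceEvent is hypothetical, and the sketch does no JSON escaping, so it assumes names contain no quotes or backslashes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Format the face event as JSON. Returns an empty string when the payload
// would not fit in the 256-byte buffer. `name` must not contain quotes or
// backslashes because this sketch performs no escaping.
std::string buildFaceEvent(int faces, bool known, const char* name,
                           unsigned long ts) {
  assert(name != nullptr);
  char buf[256];  // same size as the publish buffer in the listing above
  const int n = std::snprintf(
      buf, sizeof(buf),
      "{\"faces\":%d,\"known\":%s,\"name\":\"%s\",\"ts\":%lu}",
      faces, known ? "true" : "false", name, ts);
  if (n < 0 || static_cast<std::size_t>(n) >= sizeof(buf)) return std::string();
  return std::string(buf, static_cast<std::size_t>(n));
}
```

A subscriber such as Home Assistant or Node-RED can then branch on the known field, for example to unlock a door only when it is true.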

Enrolment via MQTT Command: Publishing "ENROL" to camera/command puts the ESP32 into enrolment mode. The next detected face is captured, its embedding extracted, and stored to flash with a name label. Up to ten faces can be enrolled.

Flash Template Persistence: read_face_id_from_flash_with_name() and write_face_id_to_flash() store and retrieve face embeddings from the ESP32 flash partition. Templates survive power cycles without requiring re-enrolment.

Applications

Smart doorbell that unlocks for family members and alerts for strangers
Home Assistant integration triggering personalised lighting scenes per person
Office visitor management with known-staff vs visitor classification
Secure equipment cabinet that opens only for authorised faces

Troubleshooting

get_face_id_with_name returns null for enrolled faces

The recognition threshold may be too strict. Increase the face recognition threshold value in the fr_forward configuration. Re-enrol under consistent lighting conditions for better template quality.

MQTT disconnects frequently during face processing

Face recognition is CPU-intensive and can block the loop for 200-500 ms. Move MQTT keep-alive to a FreeRTOS task on core 1 while face processing runs on core 0.

False positives: unknown faces match enrolled templates

Reduce the recognition distance threshold to make matching stricter. A threshold of 0.4 is stricter than the default 0.5; tune based on your lighting environment.

Upgrades

Add Home Assistant MQTT discovery for automatic device registration
Store annotated face capture photos to SD with the matched name overlaid
Add a solenoid door lock triggered directly by the ESP32 on known face detection
Add liveness detection to reject printed photos used to spoof the camera
