#3349 OpenCV Pitcher and Batter Motion Tracking

#3349 Provide Python code to post-process video images of a baseball pitcher or batter with OpenCV to identify body movement

Below is a complete Python example that uses OpenCV’s DNN-based pose-estimation module to process a baseball video (pitcher or batter) frame by frame, extract 2D joint locations, and then compute simple movement metrics (e.g., per-frame keypoint displacement). You can adapt or extend this flow to calculate joint angles, track velocities, or flag specific motion patterns.


1. Overview of the Approach

  1. Load a pre-trained pose model (Caffe-based “OpenPose” model trained on the MPI dataset).

  2. Read the video frame by frame using OpenCV’s cv2.VideoCapture.

  3. Perform a forward pass through the pose network to get 2D keypoint heatmaps.

  4. Extract joint coordinates by finding the maxima in each heatmap.

  5. Draw the skeleton (optional, for visualization).

  6. Compute movement metrics:

    • For each joint, compare its location in the current frame to its location in the previous frame.

    • For example, compute Euclidean displacement or angle changes.

    • Store or print these metrics for later analysis.

You can easily extend this to calculate specific joint angles (e.g., elbow flexion/extension), detect when a pitcher’s arm is fully cocked, or flag when a batter’s hip rotation exceeds a threshold.


2. Requirements

  1. OpenCV 4.2 or newer, compiled with the DNN module.

  2. Pre-trained pose files (OpenPose “MPI” version):

    • pose_deploy_linevec_faster_4_stages.prototxt

    • pose_iter_160000.caffemodel
      You can download them from the official OpenPose GitHub or from OpenCV’s sample model zoo.

  3. NumPy (for basic array math).

Place the *.prototxt and *.caffemodel files in a folder (e.g., models/) next to your script, or adjust the paths accordingly.
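
To fail fast on missing files, you can add a quick sanity check before loading the network (a minimal sketch; the paths match the configuration in Section 3 below):

import os

for path in ["models/pose_deploy_linevec_faster_4_stages.prototxt",
             "models/pose_iter_160000.caffemodel"]:
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Missing model file: {path}")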


3. Code Example

import cv2
import numpy as np
import math

# ------------------------------------------------------------
# 3.1. Configuration: Adjust these paths to your environment
# ------------------------------------------------------------
PROTO_FILE = "models/pose_deploy_linevec_faster_4_stages.prototxt"
WEIGHT_FILE = "models/pose_iter_160000.caffemodel"

# Input video path (e.g., a video of a pitcher/batter)
VIDEO_PATH = "input/baseball_action.mp4"

# Threshold for heatmap confidence
# (only consider joints where confidence > THRESH)
THRESH = 0.1

# The MPI model has 15 keypoints; pairs define the skeleton to draw.
# If you use the COCO model instead, update these pairs accordingly.
# (Indices follow the MPI order: 0=Head, 1=Neck, 2=RShoulder, 3=RElbow,
#  4=RWrist, 5=LShoulder, 6=LElbow, 7=LWrist, 8=RHip, 9=RKnee, 10=RAnkle,
#  11=LHip, 12=LKnee, 13=LAnkle, 14=Chest, 15=Background)
POSE_PAIRS = [
    (0, 1), (1, 14), (14, 8), (8, 9), (9, 10),  # Head→Neck→Chest→RHip→RKnee→RAnkle
    (1, 5), (5, 6), (6, 7),                     # Neck→LShoulder→LElbow→LWrist
    (14, 11), (11, 12), (12, 13),               # Chest→LHip→LKnee→LAnkle
    (1, 2), (2, 3), (3, 4)                      # Neck→RShoulder→RElbow→RWrist
]

# MPI number of joints (excluding background)
NUM_JOINTS = 15

# ------------------------------------------------------------
# 3.2. Initialize OpenCV's DNN Pose Model
# ------------------------------------------------------------
net = cv2.dnn.readNetFromCaffe(PROTO_FILE, WEIGHT_FILE)

# (Optional) Use GPU if available. Uncomment if you have a CUDA-enabled build.
# net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
# net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

# ------------------------------------------------------------
# 3.3. Helper Function: Extract Joint Coordinates from Heatmaps
# ------------------------------------------------------------
def extract_keypoints(prob_map, threshold=THRESH):
    """
    Given a single-joint probability map (heatmap), return the location of
    the maximum confidence if it exceeds `threshold`; otherwise return None.
    """
    _, conf, _, point = cv2.minMaxLoc(prob_map)
    if conf > threshold:
        return (int(point[0]), int(point[1]), conf)
    return None

# ------------------------------------------------------------
# 3.4. Process Video: Detect Pose and Compute Movement
# ------------------------------------------------------------
cap = cv2.VideoCapture(VIDEO_PATH)
if not cap.isOpened():
    raise IOError(f"Cannot open video file: {VIDEO_PATH}")

# Store the previous frame's keypoints to compute displacement
prev_keypoints = [None] * NUM_JOINTS
frame_idx = 0

# Loop over each frame in the video
while True:
    ret, frame = cap.read()
    if not ret:
        break  # End of video

    frame_idx += 1
    frame_h, frame_w = frame.shape[:2]

    # -----------------------------
    # 3.4.1. Prepare Input Blob
    # -----------------------------
    # The MPI model expects a 368x368 input (or 456x456); choose the size
    # based on your compute budget.
    in_width = 368
    in_height = 368
    blob = cv2.dnn.blobFromImage(frame, scalefactor=1.0 / 255,
                                 size=(in_width, in_height),
                                 mean=(0, 0, 0), swapRB=False, crop=False)
    net.setInput(blob)

    # -----------------------------
    # 3.4.2. Forward Pass to Get Heatmaps
    # -----------------------------
    output = net.forward()
    # `output` shape: (1, NUM_JOINTS+1, H_out, W_out)
    # Typically H_out = W_out = 46 for a 368x368 input, due to strides/pooling.

    # -----------------------------
    # 3.4.3. Extract Keypoints for This Frame
    # -----------------------------
    current_keypoints = [None] * NUM_JOINTS

    # Loop over each of the NUM_JOINTS heatmaps
    for j in range(NUM_JOINTS):
        # Slice the heatmap of joint 'j' and resize it to the frame size
        prob_map = output[0, j, :, :]
        prob_map_resized = cv2.resize(prob_map, (frame_w, frame_h))
        keypt = extract_keypoints(prob_map_resized, threshold=THRESH)
        if keypt is not None:
            current_keypoints[j] = keypt

    # -----------------------------
    # 3.4.4. Compute Movement Metrics
    # -----------------------------
    # For each joint detected in both the current and previous frames,
    # compute the Euclidean displacement.
    joint_displacements = {}
    for idx in range(NUM_JOINTS):
        prev = prev_keypoints[idx]
        curr = current_keypoints[idx]
        if prev is not None and curr is not None:
            (x0, y0, c0) = prev
            (x1, y1, c1) = curr
            joint_displacements[idx] = math.hypot(x1 - x0, y1 - y0)
        else:
            joint_displacements[idx] = None

    # (Optional) Print or log the per-joint displacement for this frame
    print(f"Frame {frame_idx}: Joint Displacements:")
    for idx, disp in joint_displacements.items():
        if disp is not None:
            print(f"  - Joint {idx:2d}: Δ = {disp:.2f} pixels")
        else:
            print(f"  - Joint {idx:2d}: not detected in one of the frames")

    # -----------------------------
    # 3.4.5. Draw Skeleton (Optional Visualization)
    # -----------------------------
    overlay = frame.copy()
    for part_a, part_b in POSE_PAIRS:
        if current_keypoints[part_a] and current_keypoints[part_b]:
            x_a, y_a, c_a = current_keypoints[part_a]
            x_b, y_b, c_b = current_keypoints[part_b]
            # Draw a line segment between joint a and joint b
            cv2.line(overlay, (x_a, y_a), (x_b, y_b), color=(0, 255, 0), thickness=2)
            # Draw circles at each joint
            cv2.circle(overlay, (x_a, y_a), radius=4, color=(0, 0, 255), thickness=-1)
            cv2.circle(overlay, (x_b, y_b), radius=4, color=(0, 0, 255), thickness=-1)

    # Blend the original frame with the skeleton overlay
    alpha = 0.6
    output_frame = cv2.addWeighted(overlay, alpha, frame, 1 - alpha, 0)

    # Show the result in a window
    cv2.imshow("Pose + Movement", output_frame)
    key = cv2.waitKey(1)
    if key == ord('q') or key == 27:
        break  # Exit on 'q' or 'Esc'

    # -----------------------------
    # 3.4.6. Update previous keypoints
    # -----------------------------
    prev_keypoints = current_keypoints.copy()

# Release resources
cap.release()
cv2.destroyAllWindows()

4. Explanation of Key Sections

  1. Loading the network

    net = cv2.dnn.readNetFromCaffe(PROTO_FILE, WEIGHT_FILE)

    This line reads the Caffe model architecture (.prototxt) and weights (.caffemodel). You must download these files beforehand; the “MPI” variant provides 15 joints for basic human pose.

  2. Reading and resizing frames
    We read each frame, then create a blob with:

    blob = cv2.dnn.blobFromImage(frame, 1.0/255, (368, 368), (0,0,0), swapRB=False, crop=False)

    The DNN expects fixed-size input (368×368) with pixel values normalized to [0,1].

  3. Forward pass & heatmaps

    output = net.forward()

    The output has shape (1, NUM_JOINTS+1, H_out, W_out). Each channel (0 to 14) is a heatmap for one joint; channel 15 is “background.”

  4. Extracting actual 2D coordinates
    For each joint heatmap, we resize it back to the original frame resolution, then locate the maximum. If that maximum confidence exceeds our threshold (e.g., 0.1), we accept it as a valid detection.

    prob_map_resized = cv2.resize(prob_map, (frame_w, frame_h))
    keypt = extract_keypoints(prob_map_resized, threshold=THRESH)
  5. Movement computation
    We store prev_keypoints (from the last frame) and compare to current_keypoints. If both exist, we compute Euclidean distance sqrt((x1-x0)² + (y1-y0)²). This “displacement in pixels/frame” can be a naive proxy for movement magnitude. You can extend this to calculate velocities (displacement per second) by dividing by the frame time, or compute joint-angle differences.
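
    A minimal sketch of that velocity conversion (assumes cap and joint_displacements from the script above; the 30 FPS fallback is an arbitrary default for videos with missing metadata):

    # Convert per-frame displacement into approximate speed (pixels/second).
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back to 30 if metadata is missing
    joint_speeds = {idx: (disp * fps if disp is not None else None)
                    for idx, disp in joint_displacements.items()}
    # joint_speeds[idx] is now in pixels/second rather than pixels/frame.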

  6. Drawing skeleton (optional)
    If you want a quick visual check, draw lines between detected joints using the predefined POSE_PAIRS. Blending the overlay onto the original frame helps see the pose on top of the athlete.

  7. Interpreting results

    • Joint Displacements: Large jumps for the wrist or elbow might indicate a pitching motion or a batting swing.

    • Joint Angles: For more advanced analysis, compute angles between triplets of joints (e.g., shoulder-elbow-wrist) to quantify flexion/extension.

    • Aggregating over time: Store these metrics in lists or NumPy arrays for further post-analysis (e.g., plotting kinematic curves).
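
    As a sketch of that aggregation for one joint (plotting assumes Matplotlib, an extra dependency not used elsewhere in this script):

    import matplotlib.pyplot as plt

    # Declare once, before the while loop:
    wrist_history = []

    # Inside the loop, after computing joint_displacements (MPI joint 4 = RWrist):
    wrist_history.append(joint_displacements[4])

    # After the loop: plot the kinematic curve, skipping undetected frames.
    frames = [i for i, d in enumerate(wrist_history) if d is not None]
    values = [d for d in wrist_history if d is not None]
    plt.plot(frames, values)
    plt.xlabel("Frame")
    plt.ylabel("RWrist displacement (pixels/frame)")
    plt.show()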


5. Extending to More Advanced Analyses

  1. Joint Angles
    To track, for example, the elbow angle:

    # Example: Compute angle at right elbow (joint indices 2=RShoulder, 3=RElbow, 4=RWrist)
    def angle_between(p1, p2, p3):
        # p2 is the vertex: angle between p1→p2 and p3→p2
        v1 = np.array([p1[0] - p2[0], p1[1] - p2[1]])
        v2 = np.array([p3[0] - p2[0], p3[1] - p2[1]])
        cos_angle = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-6)
        return math.degrees(math.acos(np.clip(cos_angle, -1.0, 1.0)))

    # In your loop, once current_keypoints are extracted:
    if current_keypoints[2] and current_keypoints[3] and current_keypoints[4]:
        shoulder = current_keypoints[2]
        elbow = current_keypoints[3]
        wrist = current_keypoints[4]
        elbow_angle = angle_between(shoulder, elbow, wrist)
        print(f"Frame {frame_idx}: Right Elbow Angle = {elbow_angle:.1f}°")
  2. Smoothing / Filtering
    If the raw keypoints are jittery, apply a simple moving average or a Kalman filter on each joint’s coordinates before computing displacements or angles.
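
    For instance, a minimal moving-average sketch (the history buffer is a new variable, not part of the script above):

    from collections import deque

    WINDOW = 5  # number of frames to average over
    history = [deque(maxlen=WINDOW) for _ in range(NUM_JOINTS)]  # declare before the loop

    # Inside the loop, after extracting current_keypoints:
    for j in range(NUM_JOINTS):
        if current_keypoints[j] is not None:
            x, y, conf = current_keypoints[j]
            history[j].append((x, y))
            # Replace the raw keypoint with the windowed mean position
            xs, ys = zip(*history[j])
            current_keypoints[j] = (int(sum(xs) / len(xs)),
                                    int(sum(ys) / len(ys)),
                                    conf)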

  3. Event Detection

    • Pitch Release: Identify the frame when the elbow angle rapidly extends, or when the wrist speed exceeds a threshold (see the sketch after this list).

    • Bat Swing Completion: Detect when the shoulder rotates beyond a certain threshold or when wrist displacement peaks.
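
    A naive pitch-release flag might look like this (a sketch; the wrist index follows the MPI layout above, and the threshold is an arbitrary placeholder to tune per video):

    WRIST = 4            # MPI index of the right wrist
    SPEED_THRESH = 40.0  # pixels/frame; arbitrary placeholder, tune per video

    # Inside the loop, after computing joint_displacements:
    wrist_disp = joint_displacements[WRIST]
    if wrist_disp is not None and wrist_disp > SPEED_THRESH:
        print(f"Frame {frame_idx}: possible release/contact event "
              f"(wrist moved {wrist_disp:.1f} px)")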

  4. Saving Results
    You can store per-frame joint positions, displacements, or angles in a CSV or pass them to a higher-level analytics pipeline (e.g., NumPy/pandas → plotting with Matplotlib).

    import csv
    # Example: Write to CSV
    with open("joint_movement.csv", "w", newline="") as csvfile:
        writer = csv.writer(csvfile)

        # Header row
        header = ["frame"]
        for j in range(NUM_JOINTS):
            header.append(f"joint{j}_dx")
        writer.writerow(header)

        # Inside the loop, after computing joint_displacements:
        row = [frame_idx] + [joint_displacements[j] if joint_displacements[j] is not None
                             else "" for j in range(NUM_JOINTS)]
        writer.writerow(row)

6. Tips & Tricks

  • Model Choice: The above uses the MPI variant of OpenPose (15 keypoints). If you need more granular joints (e.g., foot, face), switch to the COCO (18 keypoints) or BODY_25 models—just update NUM_JOINTS, POSE_PAIRS, and download the corresponding .prototxt/.caffemodel.

  • Performance: A 368×368 input runs at roughly 5–10 FPS on a mid-range GPU. If you need real-time 30 FPS, consider a lighter pose estimator (e.g., MobileNet-based), reduce the input size (e.g., 256×256), or use a specialized library such as MediaPipe Pose, though that falls outside pure OpenCV.
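
    To measure what your setup actually achieves, here is a quick timing sketch around the forward pass, using OpenCV's tick counter:

    # Time one forward pass to estimate achievable FPS on your hardware.
    t0 = cv2.getTickCount()
    output = net.forward()
    elapsed = (cv2.getTickCount() - t0) / cv2.getTickFrequency()
    print(f"Inference: {elapsed * 1000.0:.1f} ms (~{1.0 / elapsed:.1f} FPS)")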

  • Camera Calibration / Perspective: If you need real-world kinematics (e.g., jumping height, bat speed in m/s), calibrate your camera and convert pixel coordinates into physical units.
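
    As a rough first-order shortcut (ignoring perspective and lens distortion, which a full cv2.calibrateCamera workflow would handle), you can derive a pixels-to-meters scale from a reference object of known length; all numbers below are assumed placeholders:

    # Rough pixels→meters scale from a reference object of known length.
    REF_LENGTH_M = 0.85    # e.g., a bat measured at 0.85 m (assumed)
    REF_LENGTH_PX = 310.0  # the same bat's length measured in the image, in pixels
    M_PER_PX = REF_LENGTH_M / REF_LENGTH_PX

    # Approximate wrist speed in m/s from a per-frame pixel displacement:
    # speed_mps = wrist_disp * M_PER_PX * fps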

  • Error Handling: Occasionally a joint will not be detected. In the code above, we store None and skip displacement for that joint. You might interpolate missing joint locations using neighboring frames.
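
    One simple post-hoc option is linear interpolation across short gaps; a sketch, where track is a hypothetical per-joint list of (x, y) positions with None for missed detections:

    def interpolate_gaps(track):
        """Linearly fill None entries between two detected (x, y) positions."""
        filled = list(track)
        known = [i for i, p in enumerate(filled) if p is not None]
        for a, b in zip(known, known[1:]):
            for i in range(a + 1, b):
                t = (i - a) / (b - a)
                filled[i] = (filled[a][0] + t * (filled[b][0] - filled[a][0]),
                             filled[a][1] + t * (filled[b][1] - filled[a][1]))
        return filled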


7. Summary

This script demonstrates how to:

  1. Load a video of a baseball pitcher or batter.

  2. Run OpenCV’s DNN pose‐estimation to find 2D keypoints each frame.

  3. Compute per‐joint displacements (i.e., “body movement”) by comparing joint positions between consecutive frames.

  4. Visualize the skeleton overlaid on the original video.

  5. Print or save movement metrics for further kinematic analysis (e.g., velocity curves, angle trajectories).

From here, you can adapt the code to:

  • Compute specific joint angles (shoulder, elbow, hip, knee).

  • Flag “peak” motion events (e.g., maximum elbow extension, maximum trunk rotation).

  • Integrate a smoothing filter to reduce jitter.

  • Export data into CSV or directly plot using Matplotlib/pandas for more in-depth analysis.

Feel free to adjust thresholds, input resolutions, or switch to a different pose model (COCO/MobileNet) depending on your accuracy vs. speed trade-offs. This should give you a solid foundation for identifying and quantifying body movement in baseball pitchers and batters using OpenCV.


Create an image of a pitcher with body motion identified

The image above illustrates a baseball pitcher mid-throw, with each key joint marked by a colored circle and connected with lines to show limb segments. Arrows indicate the direction of motion for the shoulder, elbow, wrist, and legs, making it clear how the body moves during the pitch.




From <https://chatgpt.com/>  ChatGPT 4o-mini-high
