CS180 • Project 4 · NeRF

Neural Radiance Fields

By Kourosh Salahi · CS180 / 280A

This project implements NeRF completely from scratch: camera calibration, pose estimation, ray sampling, neural fields, volume rendering, and training NeRFs on both the classic Lego dataset and my own captured data. This page contains animations.

Part 0 — Camera Calibration & Dataset Capture

Camera Frustums Visualization

After calibrating my camera with a grid of ArUco tags and photographing my object from multiple viewpoints, I estimated the camera pose for each image, as visualized below.

Viser camera view 1
Viser camera view 2

Calibration Visualization

Detected ArUco corners used for calibration

Part 1 — Neural Field Fitting (2D)

Model Architecture

For image fitting, I used a fully-connected network with width 256 and positional encoding L = 10. The model takes the 2D coordinates (after PE, dimension 4L + 2) and passes them through three hidden layers.

Training used Adam with learning rate 1e-2 and MSE loss.
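To make the 4L + 2 input dimension concrete, here is a minimal NumPy sketch of a sinusoidal positional encoding for 2D coordinates. The function name positional_encoding and the 2^k·π frequency schedule are assumptions for illustration; the actual implementation behind this page may differ.

```python
import numpy as np

def positional_encoding(x, num_freqs):
    """Map coordinates to [x, sin(2^k*pi*x), cos(2^k*pi*x)] for k = 0..num_freqs-1.

    Assumed frequency schedule (2^k * pi); x has shape (N, D) and the
    result has shape (N, D + 2 * num_freqs * D).
    """
    x = np.asarray(x, dtype=np.float64)
    feats = [x]  # keep the raw coordinates alongside the sinusoids
    for k in range(num_freqs):
        freq = (2.0 ** k) * np.pi
        feats.append(np.sin(freq * x))
        feats.append(np.cos(freq * x))
    return np.concatenate(feats, axis=-1)

# 2D pixel coordinates normalized to [0, 1], encoded with L = 10
coords = np.random.rand(5, 2)
encoded = positional_encoding(coords, num_freqs=10)
print(encoded.shape)  # (5, 42), i.e. 4L + 2 with L = 10
```

With D = 2 inputs, each frequency contributes a sine and a cosine per coordinate, which gives 2·2·L encoded values plus the 2 raw coordinates.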

Training Progression — Fox (Width=256, L=10)

Training iterations shown: 0 → 50 → 100 → 200 → 500 → 1000 → 1999.

Final Results

Four combinations of PE frequency (L ∈ {4,10}) and width (W ∈ {64,256}), all at iteration 1999. In these results, the positional encoding frequency appears to have the largest effect on output image quality.

W=64, L=4
W=64, L=10
W=256, L=4
W=256, L=10

PSNR Curves (Fox)

W=64, L=4
W=64, L=10
W=256, L=4
W=256, L=10
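For reference, PSNR is typically derived from the MSE between the prediction and the target image. The sketch below assumes pixel values normalized to [0, 1]; the function name psnr and the max_val parameter are illustrative and not taken from the project code.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB, assuming values in [0, max_val]."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    # Note: mse == 0 (a perfect match) would give an infinite PSNR.
    return 10.0 * np.log10(max_val ** 2 / mse)

# A uniform error of 0.1 gives MSE = 0.01, i.e. 20 dB
print(psnr(np.full((4, 4, 3), 0.1), np.zeros((4, 4, 3))))
```

Halving the per-pixel error raises PSNR by about 6 dB, which is why the curves flatten as training converges.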

Training Progression — Monkey (Width=256, L=10)

Training iterations shown: 0 → 50 → 100 → 200 → 500 → 1000 → 1999.

PSNR for my image

Part 2 — Neural Radiance Field on Multi-View Lego

In this section, I implemented a complete NeRF pipeline: converting image pixels to 3D world-space rays, sampling points along those rays, evaluating a radiance field network, and performing differentiable volume rendering to optimize the model against multiple posed RGB images.

2.1 Camera → Ray Pipeline

Together, these steps implement the full pixel → camera → world → ray pipeline required for NeRF training.
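The pipeline can be sketched roughly as follows in NumPy. This assumes a pinhole intrinsic matrix K, a 4x4 camera-to-world matrix c2w, and an OpenCV-style camera frame (+z forward). The helper names pixel_to_camera and transform are hypothetical; only pixel_to_ray is named on this page, and its signature here is a guess.

```python
import numpy as np

def pixel_to_camera(K, uv, s=1.0):
    """Back-project pixel coordinates (N, 2) at depth s into camera space (N, 3)."""
    uv1 = np.concatenate([uv, np.ones((uv.shape[0], 1))], axis=-1)
    return s * (uv1 @ np.linalg.inv(K).T)

def transform(c2w, x_c):
    """Apply a 4x4 camera-to-world matrix to camera-space points (N, 3)."""
    return x_c @ c2w[:3, :3].T + c2w[:3, 3]

def pixel_to_ray(K, c2w, uv):
    """Return ray origins (N, 3) and unit directions (N, 3) in world space."""
    origins = np.tile(c2w[:3, 3], (uv.shape[0], 1))  # camera center
    x_w = transform(c2w, pixel_to_camera(K, uv, s=1.0))
    dirs = x_w - origins
    dirs = dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)
    return origins, dirs

# The principal point of an identity-pose camera looks straight down +z
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
r_o, r_d = pixel_to_ray(K, np.eye(4), np.array([[50.0, 50.0]]))
print(r_o, r_d)
```

The ray origin is simply the translation column of c2w, and the direction is the normalized offset from that origin to the back-projected point at depth 1.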

2.2 Sampling Rays & Points

I use global ray sampling, where all pixels from all training images are flattened into a single pool. Each pixel is recorded with a +0.5 offset so that sampling occurs at the pixel center. At every iteration, I randomly choose N pixel indices, giving uniformly distributed rays across all views.
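A minimal sketch of this global sampling scheme, assuming the training images are stacked in an array of shape (M, H, W, 3). The names build_pixel_pool and sample_batch are hypothetical, and indices are drawn with replacement here for simplicity, which may not match the project code.

```python
import numpy as np

def build_pixel_pool(images):
    """Flatten (M, H, W, 3) images into per-pixel records.

    Returns the image index, the (u, v) pixel-center coordinates, and the
    color of every pixel, each with M*H*W rows in the same order.
    """
    M, H, W, _ = images.shape
    img_idx, v, u = np.meshgrid(
        np.arange(M), np.arange(H), np.arange(W), indexing="ij"
    )
    uv = np.stack([u, v], axis=-1).reshape(-1, 2) + 0.5  # pixel centers
    return img_idx.reshape(-1), uv, images.reshape(-1, 3)

def sample_batch(pool, batch_size, rng):
    """Draw batch_size pixels uniformly from the pool of all views."""
    img_idx, uv, colors = pool
    idx = rng.integers(0, img_idx.shape[0], size=batch_size)
    return img_idx[idx], uv[idx], colors[idx]

images = np.zeros((2, 3, 4, 3))  # two tiny 3x4 placeholder images
pool = build_pixel_pool(images)
batch_idx, batch_uv, batch_rgb = sample_batch(pool, 10, np.random.default_rng(0))
print(batch_uv.shape)  # (10, 2)
```

The image index is what lets each sampled pixel look up the matching camera pose when its ray is generated.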

For each selected pixel, I compute its ray origin r_o and direction r_d using the pixel_to_ray method from the previous section. Each ray is then discretized into n_samples = 64 points at depths t between near = 2.0 and far = 6.0 using:

x = r_o + r_d · t

During training, I add random noise within each interval (i.e., t_i ← t_i + ε·Δt, with ε drawn uniformly from [0, 1)). This prevents the network from overfitting to a fixed set of depths and ensures that, over the course of training, every part of the ray contributes to the reconstruction. The values n_samples, near, and far are parameters in my implementation, and I later change them for different scenes (e.g., my real dataset uses a smaller range), since the best sampling bounds depend heavily on object scale and the physical capture setup.
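The perturbed sampling can be sketched as below. This is one possible stratified variant, assuming the ray is split into n_samples equal bins with one sample per bin; the function name sample_along_rays and its perturb and rng arguments are illustrative, and the project code may place the base depths differently.

```python
import numpy as np

def sample_along_rays(rays_o, rays_d, near=2.0, far=6.0, n_samples=64,
                      perturb=True, rng=None):
    """Sample points x = r_o + r_d * t with one depth per bin in [near, far).

    rays_o, rays_d: (N, 3). Returns points (N, n_samples, 3) and t (N, n_samples).
    """
    n_rays = rays_o.shape[0]
    dt = (far - near) / n_samples
    starts = near + dt * np.arange(n_samples)  # left edge of each bin
    if perturb:
        rng = np.random.default_rng() if rng is None else rng
        eps = rng.random((n_rays, n_samples))  # uniform in [0, 1)
    else:
        eps = np.full((n_rays, n_samples), 0.5)  # bin midpoints for evaluation
    t = starts[None, :] + eps * dt
    points = rays_o[:, None, :] + rays_d[:, None, :] * t[:, :, None]
    return points, t

rays_o = np.zeros((3, 3))
rays_d = np.tile(np.array([0.0, 0.0, 1.0]), (3, 1))
points, t = sample_along_rays(rays_o, rays_d, rng=np.random.default_rng(0))
print(points.shape, t.shape)  # (3, 64, 3) (3, 64)
```

Because each sample stays inside its own bin, the depths remain sorted along the ray, which the volume rendering step relies on.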

2.3 Dataloader + Precomputation + Viser Visualizations

To accelerate training, I built a preprocessing stage that:

The dataloader then samples rays by simply indexing into these large flattened arrays. I validated correctness of UV alignment, ray orientation, and frustum consistency using Viser visualizations.

Rays from one view (sampled)
Different region of same camera
Other view of rays
All cameras + 100 rays
Other view

2.4 Neural Radiance Field MLP

My NeRF model closely follows the original architecture: a deep point MLP with a skip connection, a density head, and a separate view-dependent color head. Both 3D sample locations and ray directions use sinusoidal positional encoding before being fed into the network.

NeRF MLP Diagram
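As a rough illustration of this structure, here is a forward-pass-only NumPy sketch with random weights. The class name TinyNeRF and all helper names are hypothetical. The defaults (8 trunk layers of width 256, a skip connection at layer index 4, and 10 and 4 encoding frequencies for positions and directions) follow the original NeRF paper rather than this page, so the exact layer counts in the diagram may differ.

```python
import numpy as np

def encode(x, num_freqs):
    """Sinusoidal positional encoding; output dim is D + 2 * num_freqs * D."""
    feats = [x]
    for k in range(num_freqs):
        feats += [np.sin(2.0 ** k * np.pi * x), np.cos(2.0 ** k * np.pi * x)]
    return np.concatenate(feats, axis=-1)

def init_linear(rng, n_in, n_out):
    """Random weight matrix and zero bias (illustrative initialization)."""
    return rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out)), np.zeros(n_out)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyNeRF:
    def __init__(self, rng, width=256, depth=8, skip_at=4, l_pos=10, l_dir=4):
        self.l_pos, self.l_dir, self.skip_at = l_pos, l_dir, skip_at
        d_pos, d_dir = 3 + 6 * l_pos, 3 + 6 * l_dir
        self.trunk = []
        for i in range(depth):
            # The skip layer sees the hidden state plus the encoded position.
            n_in = d_pos if i == 0 else width + (d_pos if i == skip_at else 0)
            self.trunk.append(init_linear(rng, n_in, width))
        self.density = init_linear(rng, width, 1)
        self.feature = init_linear(rng, width, width)
        self.color_hidden = init_linear(rng, width + d_dir, width // 2)
        self.color_out = init_linear(rng, width // 2, 3)

    def __call__(self, points, dirs):
        x_enc, d_enc = encode(points, self.l_pos), encode(dirs, self.l_dir)
        h = x_enc
        for i, (w, b) in enumerate(self.trunk):
            if i == self.skip_at:
                h = np.concatenate([h, x_enc], axis=-1)  # skip connection
            h = relu(h @ w + b)
        # Density head: depends on position only, ReLU keeps sigma >= 0.
        sigma = relu(h @ self.density[0] + self.density[1])
        # Color head: trunk features concatenated with the encoded view direction.
        feat = h @ self.feature[0] + self.feature[1]
        c = relu(np.concatenate([feat, d_enc], axis=-1) @ self.color_hidden[0]
                 + self.color_hidden[1])
        rgb = sigmoid(c @ self.color_out[0] + self.color_out[1])
        return rgb, sigma

rng = np.random.default_rng(0)
model = TinyNeRF(rng, width=32)  # small width just for a quick shape check
pts = rng.random((7, 3))
view_dirs = np.tile(np.array([0.0, 0.0, 1.0]), (7, 1))
rgb, sigma = model(pts, view_dirs)
print(rgb.shape, sigma.shape)  # (7, 3) (7, 1)
```

Keeping density independent of the view direction means geometry stays consistent across views, while color can still vary with viewing angle.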

2.5 Volume Rendering

I implemented volume rendering entirely in a vectorized form. For each ray, the sampled densities σ and colors rgb are combined using the weighting:

w_i = T_i · (1 − exp(−σ_i Δt))

where the transmittance T_i is computed from an exclusive (shifted) cumulative sum of σ·Δt, covering all samples before but not including sample i:

T_i = exp( − Σ_{j<i} σ_j Δt )

This matches the continuous volume rendering equation while remaining efficient, since all rays and samples are processed in parallel. The final pixel color is:

C = Σ w_i · rgb_i
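The equations above can be written as a short vectorized NumPy sketch. It assumes a constant step size Δt; the function name volrend and the array shapes are assumptions for illustration rather than the project's actual interface.

```python
import numpy as np

def volrend(sigmas, rgbs, step_size):
    """Composite per-sample densities and colors into one color per ray.

    sigmas: (N, S, 1), rgbs: (N, S, 3), step_size: scalar dt. Returns (N, 3).
    """
    sigma_dt = sigmas * step_size
    alpha = 1.0 - np.exp(-sigma_dt)  # opacity of each segment
    # Exclusive cumulative sum: shift by one sample so that T_0 = 1.
    cum = np.cumsum(sigma_dt, axis=1)
    cum_excl = np.concatenate([np.zeros_like(cum[:, :1]), cum[:, :-1]], axis=1)
    transmittance = np.exp(-cum_excl)
    weights = transmittance * alpha  # w_i = T_i * (1 - exp(-sigma_i * dt))
    return np.sum(weights * rgbs, axis=1)

# An opaque red sample in front hides the green sample behind it
sigmas = np.array([[[1e3], [1e3]]])
rgbs = np.array([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]])
print(volrend(sigmas, rgbs, step_size=1.0))  # approximately [[1. 0. 0.]]
```

The weights along a ray sum to 1 − exp(−Σ σ_i Δt), so a ray that passes only through empty space renders as black.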

Training Progression (0 → 1400 iterations)

Below are renders from the validation camera at key training iterations: 0, 200, 400, 600, 800, 1000, 1200, 1400.

iter 0
iter 200
iter 400
iter 600
iter 800
iter 1000
iter 1200
iter 1400

PSNR Curves

The validation PSNR curve rises above the target of 23 dB, and the trained model reaches an average PSNR of 23.91 dB over the 10 validation images.

PSNR Val
Individual PSNRs for the 10 tractor images

Spherical Rendering (Novel Views)

Rendered novel-view orbit of Lego

Part 2.6 — NeRF Trained on My Own Captured Object

For this section, I trained a full Neural Radiance Field using the dataset captured in Part 0. This involved using my own calibrated images and camera poses, then training a NeRF model identical in structure to Part 2, with adjustments to near/far ranges and sampling strategy due to the much smaller physical scale of the scene.

Model Architecture & Training Setup

I used a custom NeRF architecture with a skip connection, separate density and color branches, and independent positional encoding frequencies for 3D sample points and view directions.

Hyperparameter & Implementation Adjustments

These adjustments were essential to avoid the model sampling empty space and to keep rays bounded within the tight capture volume around the object.

Training Loss Over Time

MSE Loss Curve
PSNR Over Training

Intermediate Training Renders

Below are snapshots of the NeRF’s output at various stages of training. These demonstrate how density and color predictions refine over time.

iter 0
iter 200
iter 400
iter 600
iter 800
iter 1000
iter 1200
iter 1400
iter 1600
iter 1800

Reference Image

Ground Truth Reference

Novel-View Rendering (Final)

After training, I generated a 360° orbit animation using the provided “look-at-origin” camera generation code, which produces a smooth camera path around the object. Because my ArUco tag was so large, I aimed the camera at a different corner of the tag rather than the origin corner, so I had to change the rotation function to rotate about a different world point. The final rendered GIF is shown below:

Novel-view orbit of my reconstructed NeRF
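Orbiting about an arbitrary world point instead of the origin can be sketched as follows. This is a stand-in for the provided camera generation code, which is not reproduced on this page. It assumes a world frame with +z up and an OpenCV-style camera frame (+x right, +y down, +z forward); look_at_c2w and orbit_poses are hypothetical names.

```python
import numpy as np

def look_at_c2w(cam_pos, target, world_up=(0.0, 0.0, 1.0)):
    """Camera-to-world matrix for a camera at cam_pos looking at target."""
    cam_pos = np.asarray(cam_pos, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    forward = target - cam_pos
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, np.asarray(world_up, dtype=np.float64))
    right = right / np.linalg.norm(right)
    down = np.cross(forward, right)  # completes a right-handed frame
    c2w = np.eye(4)
    c2w[:3, 0], c2w[:3, 1], c2w[:3, 2] = right, down, forward
    c2w[:3, 3] = cam_pos
    return c2w

def orbit_poses(center, radius, height, n_frames):
    """Cameras on a horizontal circle around center, all looking at center."""
    center = np.asarray(center, dtype=np.float64)
    poses = []
    for angle in np.linspace(0.0, 2.0 * np.pi, n_frames, endpoint=False):
        offset = np.array([radius * np.cos(angle), radius * np.sin(angle), height])
        poses.append(look_at_c2w(center + offset, center))
    return np.stack(poses)

# Orbit around a point offset from the world origin (placeholder values)
center = np.array([0.1, 0.2, 0.0])
poses = orbit_poses(center, radius=1.0, height=0.5, n_frames=8)
print(poses.shape)  # (8, 4, 4)
```

The only change relative to a look-at-origin orbit is that the circle is centered on, and every camera looks at, the chosen world point.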

Conclusion

I implemented a complete Neural Radiance Field pipeline from scratch. From camera calibration and PnP to neural fields, ray sampling, and full volumetric rendering, this project deepened my understanding of differentiable rendering and scene representation. Thanks for viewing my project!