Get rid of duplicate /index.php URLs next to your pretty URLs using a simple Nginx rewrite

I'm not sure how it happened, but Google somehow discovered index.php on my website, even though it isn't linked anywhere. And because I use absolute paths everywhere, it also started indexing URLs prefixed with /index.php, resulting in a lot of duplicate content, which is not great for SEO 😩.

I fixed it by matching index.php with a regex and redirecting to whatever path comes after it. To do this, I added the following line to my server block.

rewrite ^/index\.php/(.*)$ /$1 permanent;
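
To make the behaviour concrete, here is the same rule annotated with a few example requests. The paths are made up for illustration, and the dot is escaped so that it only matches a literal ".". By default nginx appends the original query string to the redirect target, so it should survive the redirect.

```nginx
# Example requests (hypothetical paths) and what the rule does with them:
#   /index.php/blog/my-post   -> 301 to /blog/my-post
#   /index.php/blog?page=2    -> 301 to /blog?page=2 (query string is kept)
#   /index.php/               -> 301 to /
#   /index.php                -> no match, there is no slash after index.php
rewrite ^/index\.php/(.*)$ /$1 permanent;
```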

There's a catch though: this rule doesn't cover everything. The actual file, /index.php, is still reachable. I didn't find a solution for direct requests to this file; my attempts ended in infinite redirect loops. Eventually I added it to my robots.txt, hoping that crawlers will ignore it.

Disallow: /index.php
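
For the bare /index.php case, one approach that might avoid the loop is to match on $request_uri instead of the current URI. The loop most likely comes from nginx internally redirecting / to /index.php (through the index and try_files directives) and then evaluating the server-level rewrites again, whereas $request_uri always holds what the client originally asked for. The following is an untested sketch for the server block, not something verified against this setup:

```nginx
# Sketch only: redirect direct requests for /index.php to the site root.
# $request_uri is the original client request, so the internal redirect
# from / to /index.php should not trigger this rule a second time.
if ($request_uri ~ "^/index\.php(\?|$)") {
  # $is_args$args keeps any query string on the redirect target.
  return 301 /$is_args$args;
}
```

Keep in mind that crawlers can only follow this redirect if /index.php is not disallowed in robots.txt, so the two approaches work against each other.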

Just for reference, this is my full Nginx config and robots.txt.

server {
  listen 80;
  index index.php;
  root /code/public;

  client_max_body_size 100m;

  rewrite ^/index\.php/(.*)$ /$1 permanent;

  location / {
    try_files $uri $uri/ /index.php?$args;
  }

  location ~ \.php$ {
    try_files $uri /index.php =404;
    fastcgi_split_path_info ^(.+\.php)(/.+)$;
    fastcgi_pass php-fpm:9000;
    fastcgi_index index.php;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_read_timeout 600;
    include fastcgi_params;
  }
}

User-Agent: *
Allow: /
Disallow: /index.php
Disallow: /admin
Disallow: /admin/*
Disallow: /404

Sitemap: https://wouterdeschuyter.be/sitemap.xml