Extra validation of URLs with 404 HTTP response code

webcoder

New Member
Wurlie User
Nov 8, 2013
While deleting URLs from an abusive user, I noticed that the majority returned 404. So I built some quick and simple logic that goes through the URLs in the DB and checks each one's HTTP response code. It follows 301/302 redirects etc. and, in my case, removes all URLs with a 404 response, but it could instead be a page stating that the destination doesn't exist any more. In some cases the server response is extremely slow; these are marked for a separate drill-down pass with an extended timeout. Some are infinite 301/302 loops; for these, max hops is set to X, after which the URL is treated as a 404 and removed. These are just some ideas. I'm also thinking about implementing an extra check of the page content, e.g. seeing whether it contains keywords such as 'not found' or 'denied', to double-check the result, and about handling other codes such as 5XX in different ways.
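To make the idea concrete, here is a rough sketch of that check in Python. It is not my actual code, just one way it could look. The names `classify_url`, `fetch`, `max_hops` and the result labels are made up for illustration, and `fetch` is assumed to be any function that makes one request without following redirects and returns the status code, the Location header (or None) and the body text, raising `TimeoutError` when the server is too slow.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical fetcher signature: fetch(url, timeout) -> (status, location, body).
# It must NOT follow redirects itself and should raise TimeoutError on slow servers.
Fetch = Callable[[str, float], Tuple[int, Optional[str], str]]

REDIRECT_CODES = {301, 302, 303, 307, 308}
SOFT_404_KEYWORDS = ("not found", "denied")  # keywords from the post; extend as needed


def classify_url(url: str, fetch: Fetch, max_hops: int = 5, timeout: float = 5.0) -> str:
    """Return 'ok', 'remove', 'retry_slow' or 'review' for one stored URL."""
    current = url
    # One initial request plus at most max_hops redirects.
    for _ in range(max_hops + 1):
        try:
            status, location, body = fetch(current, timeout)
        except TimeoutError:
            # Too slow: mark for the separate pass with an extended timeout.
            return "retry_slow"
        if status in REDIRECT_CODES and location:
            # Location may be relative, so resolve it against the current URL.
            current = urljoin(current, location)
            continue
        if status == 404:
            return "remove"
        if 200 <= status < 300:
            # Possible "soft 404": a 2XX page whose text says the target is gone.
            if any(word in body.lower() for word in SOFT_404_KEYWORDS):
                return "review"
            return "ok"
        # 5XX, other 4XX, or a redirect without Location: handle separately.
        return "review"
    # Still redirecting after max_hops: treat like a 404.
    return "remove"
```

URLs that come back as 'retry_slow' would be run again later with a larger `timeout`, and 'review' leaves room for handling 5XX and keyword hits differently from a plain 404.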
 

webcoder

New Member
Wurlie User
Nov 8, 2013
Also, the system should check for multiple insertions of exactly the same originalUrl by the same IP address, which would indicate spam, as normal users usually don't need multiple short URLs for the same source.
 

webcoder

New Member
Wurlie User
Nov 8, 2013
I can't edit previous posts, so I just wanted to add: this is for URLs that are the same, come from the same IP address, and were inserted within a short period of time.
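A rough sketch of that duplicate check, in Python. The function name `find_suspect_duplicates`, the row layout `(ip, original_url, created_at)`, and the defaults of 10 minutes and 2 inserts are assumptions for illustration only; the real table and thresholds may differ.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Iterable, Set, Tuple

Row = Tuple[str, str, datetime]  # (ip, original_url, created_at), assumed layout


def find_suspect_duplicates(
    rows: Iterable[Row],
    window: timedelta = timedelta(minutes=10),
    min_count: int = 2,
) -> Set[Tuple[str, str]]:
    """Return (ip, url) pairs inserted at least min_count times within window."""
    times_by_key = defaultdict(list)
    for ip, url, created_at in rows:
        times_by_key[(ip, url)].append(created_at)

    suspects = set()
    for key, times in times_by_key.items():
        times.sort()
        # Slide over the sorted timestamps: if the first and last of any
        # min_count consecutive inserts fit inside the window, flag the pair.
        for i in range(len(times) - min_count + 1):
            if times[i + min_count - 1] - times[i] <= window:
                suspects.add(key)
                break
    return suspects
```

The same URL from the same IP a week apart would not be flagged, only repeats inside the window.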