Saturday, June 4, 2011

How to : Find duplicate files in Linux / Unix

Recently I bought a 1TB USB hard disk, and I copied around 500GB of movies from one of my friend's hard disks. When I was checking the content, I found that some of the movies were repeated, with different names and in different folders, which made finding them difficult. So how do we find all those files and delete them if they are not required? There is a command in Linux called fdupes (File DUPlicatES) which is not installed by default. Let's see how we can use it.

In Redhat/Fedora/CentOS

#yum install fdupes 

In Debian/Ubuntu

#apt-get install fdupes

Once installed, you can use it right away to find duplicates.

To find duplicates in a folder

#fdupes /path/to/folder

This will find all the duplicate files in that folder. What if you want to search sub-folders too?

Use the -r option for a recursive search:

#fdupes -r /path/to/folder

But this will not show the sizes of the duplicated files. Is there an option to show sizes too? Yes, you can use -S:

#fdupes -S -r /path/to/folder

What about deleting them with a confirmation prompt, so that you don't need to go into every folder and delete them by hand? Use the -d option:

#fdupes -d -r /path/to/folder

Sample output

#fdupes -d -S -r /media/Movies/
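If fdupes is not available on a machine, a rough equivalent can be pieced together from standard coreutils by hashing every file and printing only the hashes that occur more than once. This is just a sketch (the scratch directory and file names below are made up for illustration, and the -w/-D options assume GNU uniq):

```shell
# Build a scratch directory with two identical files and one unique file
demo=$(mktemp -d)
echo "same content" > "$demo/a.txt"
echo "same content" > "$demo/a_copy.txt"
echo "different"    > "$demo/b.txt"

# Hash every file, sort by hash, then print all lines whose first 32
# characters (the md5 hash) repeat -- a poor man's fdupes
find "$demo" -type f -exec md5sum {} + | sort | uniq -w32 -D
```

Only the two identical files show up in the output; unlike fdupes, this sketch does not offer interactive deletion.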

Handy Command Line Tricks for Linux



There are many Linux commands which help us in day-to-day life. Here are some useful ones:

1. Linux comes in several flavors. The following commands will help you determine which Linux Distro is installed on your host, what’s the version of your Linux kernel, the CPU model, processor speed, etc.
$ cat /etc/issue
$ cat /proc/version
$ cat /proc/cpuinfo
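The same details can also be pulled out with uname and nproc, which work even on systems where the /proc layout differs. A small sketch (the /etc/os-release fallback assumes a reasonably modern distro):

```shell
# Kernel release and machine architecture
uname -r
uname -m

# Number of processors the kernel reports
nproc

# Distro name from /etc/os-release where available, else /etc/issue
grep '^PRETTY_NAME=' /etc/os-release 2>/dev/null || cat /etc/issue
```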
2. Find the total amount of RAM available on your Linux box and how much is free.
$ free -m
3. The command cd .. takes you up one directory level, but cd - will move you to the previous working directory. Or use the command pwd to print the full path of the current directory, which you can copy-paste later into the shell.
$ cd -
$ pwd
4. The command history will show a list of all the recently executed commands, each with an associated number. Use !<number> to execute that command again. Or, if the history is too long, use grep to search for a particular command.
$ !<number>
$ history | grep <keyword>
5. You can remove any particular command from the shell history by its number.
$ history -d <number>
6. If you made an error while typing a command name, just enter the correct command name and then use !* to reuse all the previous arguments.
$ <correct command> !*
7. Re-run a command but after replacing the text abc in the command with xyz.
$ ^abc^xyz
8. This will list the size of all sub-folders of a directory in KB, MB or GB.
$ du -sh */
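To see what that prints without touching real data, here is a quick sketch against a made-up directory tree (names and sizes are purely illustrative):

```shell
# Build a small tree with one file in each of two sub-folders
top=$(mktemp -d)
mkdir -p "$top/music" "$top/video"
head -c 2048 /dev/zero > "$top/music/track.bin"
head -c 4096 /dev/zero > "$top/video/clip.bin"

# One human-readable size line per sub-folder
(cd "$top" && du -sh */)
```

The trailing slash in `*/` is what restricts the glob to directories only.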
9. A better version of the ls command that displays file sizes in KB and MB.
$ ls -gho
10. You can use man to learn more about the syntax of a command but what if you don’t remember the name of the command itself? Use apropos then.
$ apropos <keyword>
11. Compare the content of two text files to see what has changed.
$ diff wp-config.php wp-config.php.old
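A quick self-contained sketch of what diff reports, using two made-up config files that differ by a single line:

```shell
# Two scratch files differing in one setting (contents are illustrative)
cfg=$(mktemp -d)
printf 'host=localhost\nport=80\n'   > "$cfg/wp-config.php.old"
printf 'host=localhost\nport=8080\n' > "$cfg/wp-config.php"

# diff prints the changed lines and exits non-zero when files differ
diff "$cfg/wp-config.php" "$cfg/wp-config.php.old"
```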
12. Find lines that are common in any two text files.
$ grep -Fx -f file-A.html file-B.html
13. Compare the content of two directories recursively.
$ diff -urp /old-wp-directory /new-wp-directory
14. Find all files under the current directory that are larger than 10 MB in size.
$ find . -size +10M -exec du -h {} \;
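The same filter can be tried safely on a scratch directory; the sketch below lowers the threshold to +1M so the demo file stays small (paths and sizes are made up):

```shell
# One 2 MB file and one 1 KB file in a throwaway directory
scratch=$(mktemp -d)
head -c 2097152 /dev/zero > "$scratch/big.bin"
head -c 1024    /dev/zero > "$scratch/small.bin"

# Only the file larger than 1 MiB is matched and sized
find "$scratch" -size +1M -exec du -h {} \;
```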
15. Find all files under the current directory that have been modified in the last 2 days.
$ find . -type f -mtime -2
16. Find all files under the current directory that were modified less than 10 minutes ago.
$ find . -type f -mmin -10
17. Find all PHP files that contain a particular word or phrase.
$ find . -name "*.php" -exec grep -i -H "matt mullenweg" {} \;
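Here is a runnable sketch of that search against two made-up PHP files (file names and the search phrase are illustrative), plus an equivalent using GNU grep's recursive mode:

```shell
# Scratch PHP files: only one contains the phrase we look for
src=$(mktemp -d)
echo '<?php echo "hello world"; ?>' > "$src/index.php"
echo '<?php echo "goodbye"; ?>'     > "$src/other.php"

# -i: case-insensitive match, -H: always print the file name
find "$src" -name "*.php" -exec grep -i -H "hello" {} \;

# Usually faster on GNU grep: recurse with a file-name filter
grep -ri --include="*.php" "hello" "$src"
```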
18. When copying or moving files, Linux won't show a warning if you are overwriting an existing file. Therefore always use the -i switch to prevent overwrites.
$ cp -i abc.txt xyz.txt
19. Back up the content of a folder into a tarball file using gzip compression.
$ tar -czvf backup.tar.gz /wp-directory/
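A quick sketch of the round trip on a throwaway folder (names are made up), with `tar -t` used to verify the archive without extracting it:

```shell
# A scratch folder with one file to archive
work=$(mktemp -d)
mkdir "$work/site"
echo "hello" > "$work/site/index.html"

# -c create, -z gzip, -v verbose, -f archive name; -C sets the base dir
tar -czvf "$work/backup.tar.gz" -C "$work" site

# -t lists the archive contents without extracting
tar -tzf "$work/backup.tar.gz"
```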
20. Find processes with the highest CPU usage. Then use kill -9 <pid> to kill a process.
$ ps aux | sort -nrk 3 | head
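The sort works because %CPU is column 3 of `ps aux`. A sketch of both that pipeline and a variant where ps does the sorting itself (the --sort option assumes procps-ng ps, as shipped on most Linux distros):

```shell
# Sort numerically (-n), descending (-r), on column 3 (%CPU)
ps aux | sort -nrk 3 | head -n 5

# Same idea, letting ps sort by CPU usage directly
ps -eo pid,pcpu,comm --sort=-pcpu | head -n 5
```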
21. Execute the following command in your Apache logs directory to determine hits coming from individual IP addresses.
$ cat access.log | awk '{print $1}' | sort | uniq -c | sort -n | tail
22. Monitor hits from Google bots to your website in real-time.
$ tail -f access.log | grep Googlebot
23. To find all files and web pages on your site that return a 404 error, run the following command in the Apache logs directory. (Note the sort before uniq -c: uniq only collapses adjacent duplicate lines.)
$ awk '$9 == 404 {print $7}' access.log | sort | uniq -c | sort -rn | head
24. Find the 100 most popular pages of your site using Apache server logs again.
$ cat access.log | awk '{print $7}' |sort |uniq -c |sort -n |tail -n 100
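The log tricks above are easy to try against a tiny synthetic access.log; the three entries below are entirely made up but follow the Apache combined-log layout, where the client IP is field 1, the request path field 7, and the status code field 9:

```shell
# A minimal fake access.log in a scratch directory
logdir=$(mktemp -d)
cat > "$logdir/access.log" <<'EOF'
1.2.3.4 - - [04/Jun/2011:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512
1.2.3.4 - - [04/Jun/2011:10:00:01 +0000] "GET /missing.html HTTP/1.1" 404 0
5.6.7.8 - - [04/Jun/2011:10:00:02 +0000] "GET /index.html HTTP/1.1" 200 512
EOF

# Hits per IP address (field 1)
awk '{print $1}' "$logdir/access.log" | sort | uniq -c | sort -n

# Paths that returned a 404 (status field 9, path field 7)
awk '$9 == 404 {print $7}' "$logdir/access.log" | sort | uniq -c | sort -rn
```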
25. Quickly find and replace a string in one or more files.
$ find . -type f -name "*.php" -exec sed -i 's/<old>/<new>/g' {} \;
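A concrete sketch of that find-plus-sed replacement on scratch files (the file names and domain strings are made up; sed -i with no backup suffix assumes GNU sed):

```shell
# Two scratch PHP files both containing the string to replace
proj=$(mktemp -d)
echo 'site = "old-domain.com";' > "$proj/config.php"
echo 'link = "old-domain.com";' > "$proj/footer.php"

# -i edits each file in place; s/.../.../g replaces every occurrence
find "$proj" -type f -name "*.php" -exec sed -i 's/old-domain.com/new-domain.com/g' {} \;

# Confirm the replacement landed in both files
grep -r "new-domain.com" "$proj"
```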