Linux File Identification: Do file extensions matter in Linux?
Every operating system, whether Linux, Windows, or macOS, needs some way to identify file types.
That is how the OS associates a file with the right application or action when you open it.
Each operating system takes its own approach, but most identification is done through:
- File extensions
- The contents of the file, checked against MIME types and file signatures.
If you are coming from Windows, you know how heavily that OS relies on file extensions to open and execute files: file type identification on Windows is based on the extension.
A file extension is the set of characters after the last period (.) in a file name, used to identify the file's type or format. For example, “docx” is the extension that identifies the Microsoft Word document format.
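To make the definition concrete, here is a minimal Python sketch that pulls the extension out of a file name. The helper name extension_of and the sample file names are made up for illustration.

```python
from pathlib import PurePosixPath


def extension_of(filename: str) -> str:
    """Return the final extension (including the dot), or '' if there is none."""
    return PurePosixPath(filename).suffix


print(extension_of("report.docx"))     # .docx
print(extension_of("archive.tar.gz"))  # .gz (only the last suffix counts)
print(extension_of("myfile"))          # prints an empty line: no extension
```

Note that the extension is purely part of the name: nothing in this code ever opens the file.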
On Linux, however, file identification does not rely on file extensions.
File extensions do not matter to Linux itself because the system relies on content-based file identification: it checks MIME types and file signatures (magic bytes) that identify a file's type.
For Linux users and applications, though, file extensions are still a quick way to recognize a file's type.
Let’s get into details to see how Linux identifies files and how that differs from a user’s or application’s perspective.
Why file extensions do not matter on Linux
Linux ignores file extensions in two ways:
- It does not use them to determine what type a file is, whether an executable, a data file, or a configuration file.
- It does not use them to determine which program should open a file.
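You can see both points in action with an executable script that has no extension at all. The sketch below is only an illustration: the file name hello is made up, and it assumes a Linux system with /bin/sh and a temporary directory that allows execution. When you run a file, the kernel looks at its first bytes (here the #! line), not at its name.

```python
import os
import stat
import subprocess
import tempfile
from typing import Optional


def run_extensionless_script() -> Optional[str]:
    """Create an executable shell script named 'hello' (no '.sh') and run it."""
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "hello")  # hypothetical name, deliberately no extension
        with open(script, "w") as fh:
            fh.write("#!/bin/sh\necho hi\n")
        # Mark the file as executable for its owner.
        os.chmod(script, os.stat(script).st_mode | stat.S_IXUSR)
        try:
            result = subprocess.run([script], capture_output=True, text=True, check=True)
        except (OSError, subprocess.CalledProcessError):
            return None  # e.g. the temp directory is mounted noexec
        return result.stdout.strip()


if __name__ == "__main__":
    print(run_extensionless_script())
```

If the script runs, it prints hi even though nothing in the file name says it is a shell script.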
How users and applications recognize files
Linux does not recognize files the way humans do, by looking at their names, and it does not follow the conventions that applications use to identify file types either.
When users and applications interact with a file, they usually rely on its extension to determine whether it is an image, a video, or a text file.
That way, the right application can be chosen, or the right action performed, for the file.
That’s why it is so easy on Windows to change a file's extension and pass off a program file as a text file (executable.exe.txt).
In this case, the user cannot tell that the “txt” file is really a program. Upon opening it, the system either executes it or, if it relies on file extensions, opens it as a text file full of unreadable data.
So, at this point, what you should know is that users and applications rely heavily on file extensions to determine a file's type and format.
That’s how you know a file is an image by looking at its extension (.jpg, .jpeg, .png, etc.). The same goes for documents (.pdf, .docx, .txt, .xlsx, etc.), videos (.mp4, .avi, .mov, .mkv, etc.), audio (.wav, .mp3, .aac, etc.), and programming language-specific files (.py, .js, .cpp, etc.).
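Many applications make exactly this kind of name-based guess. As a rough illustration, Python's standard mimetypes module maps a file name to a MIME type without ever opening the file. The helper name guess_from_name and the sample names are made up for this sketch.

```python
import mimetypes
from typing import Optional


def guess_from_name(filename: str) -> Optional[str]:
    """Guess a MIME type from the file name alone; the file is never opened."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type


for name in ("holiday.jpg", "notes.txt", "clip.mp4", "myfile"):
    print(name, "->", guess_from_name(name))
```

On a typical system the first three names come back as image/jpeg, text/plain, and video/mp4, while the extension-less myfile comes back as None: with no extension, a name-based guess has nothing to go on.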
On Linux, however, the system does not rely heavily on file extensions to determine the file type or the action to take when you open a file, for example by double-clicking it.
How does Linux examine a file?
Linux determines the file type and the action to take when opening files using content-based file identification.
Content-based file identification relies on the data contained in the file itself, rather than on its name, to determine the appropriate application or action for that file. The approach relies on MIME types and on file signatures, which use magic numbers.
MIME types file identification on Linux
MIME (Multipurpose Internet Mail Extensions) types are standardized labels that classify files by their nature and format, written as type/subtype (for example, text/plain).
Because the approach is classification-based, Linux relies on a database to look up which MIME type a file belongs to.
So, before opening a file, Linux examines its contents to determine the MIME type, cross-checks the file type associated with that MIME type, and then picks the right application or action to open, manipulate, or execute the file.
So, if the MIME type extracted from a file is “text/plain”, Linux determines that the file is a plain text file, which can be opened with a text editor.
Here are the most common MIME types and their file associations on Linux:
MIME Type | File Association |
---|---|
text/plain | .txt, .log, .cfg, .conf, .sh, .c, .cpp |
application/pdf | .pdf |
application/json | .json |
application/xml | .xml |
application/zip | .zip |
application/gzip | .gz, .tar.gz, .tgz |
application/x-bzip2 | .bz2 |
application/x-tar | .tar |
application/x-rar | .rar |
application/x-7z-compressed | .7z |
image/jpeg | .jpg, .jpeg |
image/png | .png |
image/gif | .gif |
image/bmp | .bmp |
audio/mpeg | .mp3 |
audio/ogg | .ogg |
video/mp4 | .mp4 |
video/quicktime | .mov |
video/webm | .webm |
application/msword | .doc |
application/vnd.openxmlformats-officedocument.wordprocessingml.document | .docx |
application/vnd.ms-excel | .xls |
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | .xlsx |
application/vnd.ms-powerpoint | .ppt |
application/vnd.openxmlformats-officedocument.presentationml.presentation | .pptx |
application/rtf | .rtf |
application/octet-stream | No specific file extension |
If you want to check a file's type based on its MIME type on Linux, you can use the xdg-mime command.
How to check the MIME type of a file using a Linux terminal
Step 1: Open the terminal using CTRL + ALT + T
Step 2: Use the command xdg-mime query filetype /path-to-your-file to check the MIME type of a file.
For example, to check the MIME type of a .txt file, use the following command:
xdg-mime query filetype file.txt
The same works for Python files:
xdg-mime query filetype main.py
You should get text/plain for the text file and a Python-specific type, such as text/x-python, for the script.
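If you want to run the same query from a script, a minimal sketch could look like this. It simply shells out to the xdg-mime query filetype command shown above. The helper name xdg_mime_type is made up, and the function returns None when xdg-mime is not installed or produces no answer.

```python
import shutil
import subprocess
from typing import Optional


def xdg_mime_type(path: str) -> Optional[str]:
    """Return what `xdg-mime query filetype PATH` prints, or None if unavailable."""
    if shutil.which("xdg-mime") is None:
        return None  # xdg-utils is not installed on this machine
    try:
        result = subprocess.run(
            ["xdg-mime", "query", "filetype", path],
            capture_output=True, text=True, timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    print(xdg_mime_type("/etc/passwd"))
```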
Use of magic numbers to identify files on Linux
The other approach to content-based file identification on Linux is reading the magic numbers at the very beginning of a file.
Magic numbers, also known as file signatures, are short sequences of bytes at the start of a file that identify its format and type.
As you may know, a file is a sequence of bytes arranged in an organized manner, and each byte is ultimately stored as 1s and 0s.
File formats take advantage of this by reserving the first few bytes of every file for a fixed signature. For example, every PDF file starts with the characters %PDF, and every PNG file starts with the byte 0x89 followed by the letters PNG.
These signatures help identify the file type and format, and Linux systems rely on them heavily.
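You do not have to take this on faith. The short Python sketch below creates a gzip stream and a ZIP archive in memory using the standard library, then prints their first bytes. The variable names and the hello.txt entry are arbitrary.

```python
import gzip
import io
import zipfile

# Compress a few bytes with gzip, entirely in memory.
gz_bytes = gzip.compress(b"hello")

# Build a ZIP archive with one small entry, also in memory.
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w") as archive:
    archive.writestr("hello.txt", "hello")
zip_bytes = zip_buffer.getvalue()

print(gz_bytes[:2].hex(" "))   # 1f 8b
print(zip_bytes[:4].hex(" "))  # 50 4b 03 04 (the letters "PK" followed by 03 04)
```

Whatever name you later give these files, those leading bytes stay the same, and that is what content-based identification looks at.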
It works like this:
- When a file is accessed, the system reads the first few bytes of the file to extract the magic numbers.
- The extracted magic bytes are cross-referenced against a database of known file formats and their associated types.
- Based on the match, Linux identifies the file's format, its type, and how it should be handled.
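The three steps above can be sketched in a few lines of Python. This is a toy version for illustration only: the SIGNATURES table is a tiny hand-picked sample (real databases hold far more entries and more complex rules), and the function names are made up.

```python
from typing import Optional

# A tiny, illustrative signature table: (leading bytes, MIME type).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
    (b"\x1f\x8b", "application/gzip"),
    (b"\x7fELF", "application/x-executable"),  # simplified label for ELF binaries
]


def sniff_bytes(header: bytes) -> Optional[str]:
    """Step 2 and 3: match the leading bytes against the signature table."""
    for magic, mime_type in SIGNATURES:
        if header.startswith(magic):
            return mime_type
    return None  # unknown signature


def sniff_file(path: str) -> Optional[str]:
    """Step 1: read the first few bytes of the file, then look them up."""
    with open(path, "rb") as fh:
        return sniff_bytes(fh.read(16))


print(sniff_bytes(b"%PDF-1.7\n"))  # application/pdf
print(sniff_bytes(b"hello"))       # None
```

Notice that the file name never enters the picture: sniff_file only ever looks at the bytes.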
With file signatures, Linux can identify a file's type outright even when its extension has been tampered with, as in the earlier Windows example of an executable disguised as a text file.
A good example of a Linux command that relies on magic numbers to identify the file type or MIME type of a file is the file command.
Here’s how to use it to determine the type of a file.
Open the Terminal.
Use the command file /path-to-your-file, for example:
file file.txt
In the example above, file should report the type of the text file as ASCII text.
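The file command can also print a MIME type instead of a description, which is handy in scripts. Here is a minimal sketch that calls it from Python using its --brief and --mime-type options. The helper names are made up, and the function returns None when the file utility is not installed.

```python
import os
import shutil
import subprocess
import tempfile
from typing import Optional


def file_mime_type(path: str) -> Optional[str]:
    """Return the MIME type reported by `file --brief --mime-type PATH`."""
    if shutil.which("file") is None:
        return None  # the file utility is not available
    result = subprocess.run(
        ["file", "--brief", "--mime-type", path],
        capture_output=True, text=True,
    )
    return result.stdout.strip() or None


def demo_file_command() -> Optional[str]:
    """Write a small text file with no extension and ask `file` about it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "notes")  # hypothetical name, no extension
        with open(path, "w") as fh:
            fh.write("hello world\n")
        return file_mime_type(path)


if __name__ == "__main__":
    print(demo_file_command())
```

On a machine with file installed, the demo should report text/plain for the small text file.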
To recap, Linux does not rely on file extensions to identify a file's type. Instead, it relies on more reliable content-based approaches such as MIME types and file signatures.
Why content-based file identification is better than using extensions to identify files
The content-based approach is more accurate because it examines the actual data within a file, so tricks such as spoofed file extensions do not fool it.
It also improves security, because a malicious file cannot get itself opened by the wrong handler simply by carrying a deceptive extension.
It is more flexible at handling new and unknown file types, and it copes well with files that have no extension at all.
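To illustrate the security point, here is a rough sketch of a check for the executable.exe.txt trick from earlier. It flags a file whose name looks harmless but whose first bytes carry an executable signature (ELF on Linux, MZ for Windows programs). The function names and the harmless-name rule are assumptions made for this example, and the check is only a heuristic, not a malware scanner.

```python
import mimetypes
import os
import tempfile

# Leading bytes of ELF binaries and of Windows (MZ) executables.
EXECUTABLE_MAGIC = (b"\x7fELF", b"MZ")


def is_disguised_executable(path: str) -> bool:
    """True if the name suggests text or an image but the content is a program."""
    name_type, _ = mimetypes.guess_type(os.path.basename(path))
    name_looks_harmless = name_type is not None and name_type.startswith(("text/", "image/"))
    with open(path, "rb") as fh:
        header = fh.read(4)
    return name_looks_harmless and header.startswith(EXECUTABLE_MAGIC)


def demo_disguise():
    """Build one disguised and one honest file, then check both."""
    with tempfile.TemporaryDirectory() as tmp:
        fake = os.path.join(tmp, "executable.exe.txt")
        real = os.path.join(tmp, "notes.txt")
        with open(fake, "wb") as fh:
            fh.write(b"MZ\x90\x00" + b"\x00" * 60)  # starts like a Windows program
        with open(real, "wb") as fh:
            fh.write(b"hello\n")
        return is_disguised_executable(fake), is_disguised_executable(real)


if __name__ == "__main__":
    print(demo_disguise())  # (True, False)
```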
Can a file be without an extension?
A file can be without an extension. For example, you can have a text file named ‘myfile’, with no .txt at the end. Linux can handle such a file appropriately because it does not rely on file extensions to access or execute a file.
However, some applications may handle a file with no extension incorrectly. In such a case, you may get meaningless garbled data or an “unsupported file” error.
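An application can avoid that problem by falling back to the content when the name gives no hint. The sketch below is one possible way to do it, not how any particular program works: it tries the file name first and, if that fails, uses a crude "does this decode as UTF-8 text?" test. All function names here are made up.

```python
import codecs
import mimetypes
import os
import tempfile


def looks_like_text(sample: bytes) -> bool:
    """Rough heuristic: no NUL bytes and the sample decodes as UTF-8."""
    if b"\x00" in sample:
        return False
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)  # tolerate a character cut off at the end
    except UnicodeDecodeError:
        return False
    return True


def describe(path: str) -> str:
    """Try the file name first, then fall back to inspecting the content."""
    by_name, _ = mimetypes.guess_type(os.path.basename(path))
    if by_name is not None:
        return by_name
    with open(path, "rb") as fh:
        sample = fh.read(4096)
    return "text/plain" if looks_like_text(sample) else "application/octet-stream"


def demo_extensionless():
    """Describe an extension-less text file and an extension-less binary blob."""
    with tempfile.TemporaryDirectory() as tmp:
        text_path = os.path.join(tmp, "myfile")
        blob_path = os.path.join(tmp, "blob")
        with open(text_path, "w", encoding="utf-8") as fh:
            fh.write("just some notes\n")
        with open(blob_path, "wb") as fh:
            fh.write(b"\x00\x01\x02\xff")
        return describe(text_path), describe(blob_path)


if __name__ == "__main__":
    print(demo_extensionless())  # ('text/plain', 'application/octet-stream')
```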
To find out the type of a file with no extension on Linux, you can use the file or xdg-mime utilities.
Using the file utility to get the type of a file with no extension:
Open the Terminal and execute the following command:
file file-with-no-extension
Even without an extension at the end of the file name, the file utility returns the correct file type for file-with-no-extension.
The same happens when you use the xdg-mime utility to get the type of a file with no extension:
xdg-mime query filetype file-with-no-extension
The command above produces the result:
text/plain
However, it is still good practice to give your files extensions.
Best practices for file naming and extensions in Linux
- Avoid creating files with no extensions. If you write programs, you will often need to reference files by their extensions. Besides, an extension lets you identify a file's type, and the appropriate application to open it with, at first glance.
- Use appropriate file extensions. Name your files according to the type of data they hold; for example, text files should have the .txt extension.
- Use file extensions to avoid ambiguity between files that would otherwise share the same name. A descriptive extension helps prevent accidental overwrites.
- Use consistent naming with meaningful extensions to categorize and organize files more efficiently.
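As a small illustration of the last point, consistent extensions make it trivial to sort files into groups with a script. The function name and the sample file names below are made up.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_extension(filenames):
    """Group file names by their lower-cased extension."""
    groups = defaultdict(list)
    for name in filenames:
        extension = PurePosixPath(name).suffix.lower() or "(none)"
        groups[extension].append(name)
    return dict(groups)


print(group_by_extension(["a.txt", "b.TXT", "c.png", "README"]))
# {'.txt': ['a.txt', 'b.TXT'], '.png': ['c.png'], '(none)': ['README']}
```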
Most common Linux file extensions to use
Although Linux does not use file extensions to determine a file's type, humans do. It is therefore worth ending your file names with an extension, so that you and anyone else who uses your computer can tell at a glance whether a file is a text file, a video, a document, or an image.
Besides, file extensions are important in version control systems and when collaborating with other programmers.
Here are file extensions that you can use on Linux to name your files depending on the data they hold and their intended purpose.
File Extension | File intent/use or data |
---|---|
.deb | Debian Software Package |
.txt | Plain Text File |
.doc, .docx | Microsoft Word Document |
.rtf | Rich Text Format |
.odt | OpenDocument Text (LibreOffice, OpenOffice) |
.xls, .xlsx | Microsoft Excel Spreadsheet |
.ods | OpenDocument Spreadsheet (LibreOffice, OpenOffice) |
.ppt, .pptx | Microsoft PowerPoint Presentation |
.odp | OpenDocument Presentation (LibreOffice, OpenOffice) |
.jpg, .jpeg | JPEG Image |
.png | Portable Network Graphics Image |
.gif | Graphics Interchange Format Image |
.bmp | Bitmap Image |
.svg | Scalable Vector Graphics Image |
.mp3 | MP3 Audio File |
.wav | Waveform Audio File Format |
.ogg | Ogg Vorbis Audio File |
.mp4 | MPEG-4 Video File |
.avi | Audio Video Interleave File |
.mkv | Matroska Video File |
.mov | QuickTime Video File |
.zip | Zip Archive |
.tar | Tape Archive |
.gz | Gzip Compressed Archive |
.bz2 | Bzip2 Compressed Archive |
.7z | 7-Zip Compressed Archive |
.exe | Windows Executable |
.sh | Shell Script (Linux Executable) |
.rpm | Red Hat Package Manager Package |
.c, .cpp | C/C++ Source Code |
.java | Java Source Code |
.py | Python Script |
.html, .htm | HTML Web Page |
.css | Cascading Style Sheet |
.js | JavaScript File |
.sqlite | SQLite Database File |
.db | Database File (Generic) |
.conf | Configuration File (Generic) |
.ini | INI Configuration File |
How to display file extensions when listing files on a Linux terminal
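To display the file extensions of files on a Linux terminal, you can use the ls -l or ls -p commands. The ls utility, which lists files on Linux, always prints full file names, extensions included. The -l flag adds details such as permissions, owner, size, and modification time, while -p appends a slash (/) to directory names so they are easy to tell apart from files.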
Here’s how to list the files and their file extensions on a Linux terminal:
Step 1: Open the Terminal on your Linux machine.
Step 2: Navigate to the directory whose files you want to list, using the cd command.
Step 3: Use the ls -l command to list the files in the directory, including their file extensions:
ls -l
The command displays a detailed list of the files and directories in the current directory, with their names and respective file extensions.
Step 4: Use ls -p for a concise list of file names
To display only the file names and their extensions, without the additional file metadata, you can use the ls -p command:
ls -p
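If you ever need a similar listing inside a script, a rough Python equivalent of ls -p could look like this. It is an approximation for illustration only: the function names are made up, and the sort order may differ from ls, which follows your locale.

```python
import os
import tempfile


def list_like_ls_p(directory: str = "."):
    """List names in a directory, appending '/' to directories, like `ls -p`."""
    names = []
    for entry in sorted(os.scandir(directory), key=lambda e: e.name):
        if entry.name.startswith("."):
            continue  # ls hides dotfiles unless you pass -a
        names.append(entry.name + "/" if entry.is_dir() else entry.name)
    return names


def demo_listing():
    """Create one file and one directory, then list them."""
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "notes.txt"), "w").close()
        os.mkdir(os.path.join(tmp, "photos"))
        return list_like_ls_p(tmp)


if __name__ == "__main__":
    print(demo_listing())  # ['notes.txt', 'photos/']
```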
… and that’s it!