How to Best Utilize FileCloud’s Metadata
Info: This is the first post in the series “Smart Classification, Metadata, and Smart DLP – the powerful combo” about data classification and security in FileCloud.
The current world is all about data. Data mining, data science, AI, everything revolves around gathering data, processing data, and analyzing it. The same applies to modern businesses, where employees need to make informed decisions based on facts and hard statistics rather than assumptions and intuition. However, in 2021, data stored in files is not enough. And this is where metadata comes into the picture.
“Metadata – data that provides information about other data”
Although this definition is rather vague, it explains the concept pretty well: we attach descriptive data to our datasets. Everything that provides valid information about a given file can be treated as metadata. Common file properties like size, creation date, owner, and extension provide basic information that can be used for querying, for quickly understanding what to expect (for example, one knows that a .png will contain an image), and for many other daily operations.
With that in mind, FileCloud provides a submodule that allows customers to attach metadata to the files and folders stored in the system. To keep it simple yet robust, we’ve provided a highly configurable environment organized in a two-level structure: metadata sets that group individual metadata attributes.
One can think of metadata sets as groups of related metadata fields or attributes. Administrators manage permissions for the whole set and can enable or disable its visibility. Currently, FileCloud provides three types of metadata sets:
- Default metadata set
This predefined metadata set is shipped with each FileCloud installation and is automatically attached to every file and folder added to the system. By default, it provides a single metadata attribute called ‘Tags’, which is intended to allow easy classification of files and folders. Administrators can set permissions for the Default set, edit its attributes, and provide a custom name and description if needed.
- Custom metadata sets
Custom metadata sets are fully configurable by administrators. FileCloud allows full control over the list of defined attributes and permissions assigned. Custom metadata sets can be added to files/folders manually by users (this action requires write permission for the given metadata set) or can be used by the Smart Classification subsystem, which will be the topic of another post in this series.
- Built-in metadata sets
The built-in metadata sets are predefined by FileCloud and are mostly intended for automatic assignment and metadata extraction. Administrators can only control the visibility (read permission) of those metadata sets. The only exception is the new color tagging metadata set, which allows admins to edit the list of available colors.
Automatic extraction is a very powerful concept that enables FileCloud to extract, parse, and store metadata directly from supported file types, for example, EXIF metadata for images, embedded properties of Office files, etc.
Attributes are the smallest building blocks of FileCloud’s metadata subsystem. They are responsible for storing single pieces of information. As of version 20.3, FileCloud comes with the following types of attributes:
- Text – a simple textual value. (Name)
- Integer – an integer number. (Age)
- Decimal – a floating-point value. (Distance in miles, price)
- Boolean – a true/false value. (Verified, Confidential)
- Date – a date. (Release Date)
- Enumeration – a list of predefined values; the user can select one value from the list. (Sex)
- Array – a type that allows the user to apply multiple values to the given attribute. (Tags list)
Each attribute can be marked as required, disabled, and given a default value.
That’s enough theory. Now let’s jump into the system to see metadata in action in a basic use case. We’ve created a new Custom set that will allow users to provide information about their music album collections. Below is the definition visible in the Admin Panel.
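To make the set/attribute structure more concrete, here is a minimal Python sketch of the seven attribute types together with the required, disabled, and default options. This is only an illustrative in-memory model, not FileCloud’s implementation or API: the names `AttrType`, `Attribute`, `MetadataSet`, and the sample “Person” set are all assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Any


class AttrType(Enum):
    TEXT = "Text"
    INTEGER = "Integer"
    DECIMAL = "Decimal"
    BOOLEAN = "Boolean"
    DATE = "Date"
    ENUMERATION = "Enumeration"
    ARRAY = "Array"


# One check per attribute type. bool is excluded from the numeric types
# because Python treats it as a subclass of int.
_CHECKS = {
    AttrType.TEXT: lambda v, a: isinstance(v, str),
    AttrType.INTEGER: lambda v, a: isinstance(v, int) and not isinstance(v, bool),
    AttrType.DECIMAL: lambda v, a: isinstance(v, (int, float)) and not isinstance(v, bool),
    AttrType.BOOLEAN: lambda v, a: isinstance(v, bool),
    AttrType.DATE: lambda v, a: isinstance(v, date),
    AttrType.ENUMERATION: lambda v, a: v in a.choices,
    AttrType.ARRAY: lambda v, a: isinstance(v, list),
}


@dataclass
class Attribute:
    """Hypothetical model of a single metadata attribute."""
    name: str
    type: AttrType
    required: bool = False
    disabled: bool = False
    default: Any = None
    choices: tuple = ()  # only meaningful for ENUMERATION

    def validate(self, value: Any) -> Any:
        """Return the value to store (falling back to the default) or raise."""
        if value is None:
            value = self.default
        if value is None:
            if self.required:
                raise ValueError(f"{self.name} is required")
            return None
        if not _CHECKS[self.type](value, self):
            raise ValueError(f"{value!r} is not a valid {self.type.value} for {self.name}")
        return value


@dataclass
class MetadataSet:
    """Hypothetical model of a metadata set: a named group of attributes."""
    name: str
    attributes: list

    def apply(self, values: dict) -> dict:
        """Validate user input against every enabled attribute of the set."""
        stored = {}
        for attr in self.attributes:
            if attr.disabled:
                continue
            value = attr.validate(values.get(attr.name))
            if value is not None:
                stored[attr.name] = value
        return stored


person = MetadataSet("Person", [
    Attribute("Name", AttrType.TEXT, required=True),
    Attribute("Age", AttrType.INTEGER),
    Attribute("Verified", AttrType.BOOLEAN, default=False),
    Attribute("Tags", AttrType.ARRAY),
])
stored = person.apply({"Name": "John Doe", "Age": 42})
```

In this sketch, `stored` contains the two supplied values plus the default for `Verified`, while the unset optional `Tags` attribute is simply omitted.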
We’ve added various attributes to provide information about the album: Title, Band, Year, Genre, Rating, etc. We’ve assigned read permission for the Audiophiles group and write permission for the user named John Doe. With that in mind, we can jump into action.
The user John Doe has uploaded a short list of his favorite music albums to a dedicated team folder location. Now, for each album, he can navigate to the Metadata tab available in the right-side panel and add the Music albums metadata set to that album. Once this is done, he can provide all required metadata attribute values. He is able to do that since he was granted write permission to that metadata set.
This is how the metadata form looks in more detail. It is a dynamically constructed component that includes the set of attributes provided in the definition.
Other users from the group can log in to FileCloud in order to view metadata provided by John Doe. They see a read-only view since they were granted read permission only, but with write permission they could override data provided by John Doe, which would enable full collaboration.
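The read/write split described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not FileCloud’s permission engine: the `SetPermissions` class is hypothetical, the user “Jane Roe” is invented for the example, and the sketch assumes that write permission implies read permission.

```python
from dataclasses import dataclass, field


@dataclass
class SetPermissions:
    """Hypothetical per-set permissions; entries are user or group names."""
    readers: set = field(default_factory=set)
    writers: set = field(default_factory=set)

    def can_write(self, user, groups=()):
        # A user may edit values if they, or one of their groups, hold write.
        return bool({user, *groups} & self.writers)

    def can_read(self, user, groups=()):
        # Assumption of this sketch: anyone who can write can also read.
        return bool({user, *groups} & (self.readers | self.writers))


# The setup from this post: the Audiophiles group reads, John Doe writes.
music_albums = SetPermissions(readers={"Audiophiles"}, writers={"John Doe"})

john_can_edit = music_albums.can_write("John Doe", groups=["Audiophiles"])
jane_can_view = music_albums.can_read("Jane Roe", groups=["Audiophiles"])
jane_can_edit = music_albums.can_write("Jane Roe", groups=["Audiophiles"])
```

Adding “Audiophiles” to `writers` would be the equivalent of granting the whole group write permission, which is what enables the collaborative scenario mentioned above.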
Now we can dive into the core of this post: what can we do with the metadata?
FileCloud allows users to search for files and folders by their metadata. This is an extremely powerful feature. Imagine that users agreed to recommend an album to listen to next with a ‘listen to’ tag. Here are the search results for that use case:
Now, let’s say that we want to search for all albums from the Grunge genre. Below we can see the results.
And to go even deeper, let’s refine the search and look only for the unplugged albums for that genre.
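Conceptually, each of these searches is a filter over attribute values, and refining a search just adds another condition. The sketch below shows that idea on an in-memory dictionary. It does not use FileCloud’s search API: the `search` function, the paths, the `Unplugged` attribute, and the sample values are all illustrative assumptions.

```python
def matches(metadata, criteria):
    """True if every criterion is satisfied by the item's metadata."""
    for key, wanted in criteria.items():
        actual = metadata.get(key)
        if isinstance(actual, list):
            # Array attributes (such as Tags) match if they contain the value.
            if wanted not in actual:
                return False
        elif actual != wanted:
            return False
    return True


def search(items, criteria):
    """Return the paths of all items whose metadata matches the criteria."""
    return [path for path, metadata in items.items() if matches(metadata, criteria)]


# Illustrative sample data, not taken from a real FileCloud instance.
albums = {
    "/albums/nevermind": {"Genre": "Grunge", "Unplugged": False, "Tags": ["listen to"]},
    "/albums/mtv-unplugged-in-new-york": {"Genre": "Grunge", "Unplugged": True, "Tags": []},
    "/albums/unknown-pleasures": {"Genre": "Post-punk", "Unplugged": False, "Tags": []},
}

recommended = search(albums, {"Tags": "listen to"})
grunge = search(albums, {"Genre": "Grunge"})
grunge_unplugged = search(albums, {"Genre": "Grunge", "Unplugged": True})
```

Because the criteria are combined with a logical AND, every additional attribute narrows the result set, which mirrors the refinement steps shown above.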
This becomes extremely useful in cases where your data is scattered across many files or many departments, for example, a project that has marketing files, demos, presentations, research data, etc. Metadata allows organizations to apply horizontal data organization, without the need to store data in a dedicated, per-project directory structure.
FileCloud’s Data Leak Prevention subsystem is the topic of another blog post in this series. This is only a sneak peek at its capabilities. Imagine that, for some reason, the Audiophiles don’t want albums rated below 5 stars to “leak” from the system, so sharing should be disabled for those items. This can be easily achieved by adding a dedicated DLP rule. Below is the outcome: the “Unknown Pleasures” album (which is still amazing, to be clear) cannot be shared since it doesn’t have a rating of 5 stars.
On the other hand, the same rule allows us to share Nirvana’s “Nevermind” album since it was rated with 5 stars.
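The logic behind such a rule can be sketched as a predicate over an item’s metadata. The code below is a simplified model and does not reflect FileCloud’s actual DLP rule syntax: `DlpRule`, `is_allowed`, and the action names are hypothetical, and the 4-star rating for “Unknown Pleasures” is an assumed value (the post only states that it is not rated 5 stars).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DlpRule:
    """Hypothetical rule: deny `action` whenever `deny_when(metadata)` is true."""
    name: str
    action: str
    deny_when: Callable[[dict], bool]


def is_allowed(action, metadata, rules):
    """An action is allowed unless some rule for that action denies it."""
    return not any(
        rule.action == action and rule.deny_when(metadata) for rule in rules
    )


rules = [
    # Items without a rating are treated as 0 stars, so they cannot be shared.
    DlpRule("Share only 5-star albums", "share", lambda md: md.get("Rating", 0) < 5),
]

nevermind = {"Title": "Nevermind", "Rating": 5}
unknown_pleasures = {"Title": "Unknown Pleasures", "Rating": 4}  # assumed rating

can_share_nevermind = is_allowed("share", nevermind, rules)
can_share_unknown_pleasures = is_allowed("share", unknown_pleasures, rules)
```

Note that the rule only affects the `share` action in this sketch, so other hypothetical actions on the same item remain unaffected.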
The above examples show, with a rather simplified use case, how to use metadata in FileCloud in an efficient manner. The problem with that approach is that all metadata has to be applied manually to the files before the system can use it. Since this might be a blocker for bigger organizations, FileCloud provides options to automate the process of metadata extraction for some common file types, like Office documents and images, by shipping two built-in metadata sets: Image metadata and Microsoft Office Tag metadata.
Below is an example of image metadata extraction for a sample image.
Another example is the .docx properties extraction from the draft of this blog post.
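For readers curious about where such properties live: a .docx file is a ZIP package, and its core properties (title, author, keywords) are usually stored in the `docProps/core.xml` part. The sketch below reads a few of them using only the Python standard library. It is not how FileCloud performs extraction, which this post does not describe; `read_core_properties` and `make_sample_package` are hypothetical helpers, and the sample package contains made-up values and only the one part needed for the demonstration, so it is not a complete Word document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core properties part of an Office Open XML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}
FIELDS = {"title": "dc:title", "creator": "dc:creator", "keywords": "cp:keywords"}


def read_core_properties(docx):
    """Read selected core properties from a .docx path or file-like object.

    Raises KeyError if the package has no docProps/core.xml part.
    """
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    props = {}
    for name, tag in FIELDS.items():
        element = root.find(tag, NS)
        if element is not None and element.text:
            props[name] = element.text
    return props


def make_sample_package():
    """Build an in-memory ZIP holding only a core.xml part (demo data)."""
    core_xml = (
        f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}">'
        "<dc:title>Sample draft</dc:title>"
        "<dc:creator>John Doe</dc:creator>"
        "<cp:keywords>metadata, example</cp:keywords>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("docProps/core.xml", core_xml)
    buffer.seek(0)
    return buffer


sample_props = read_core_properties(make_sample_package())
```

With a real document, the same function can be called with a file path, for example `read_core_properties("draft.docx")`, where the file name is a placeholder.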
It is clear that we can use built-in metadata extraction to search for all documents in a given category, with a given set of predefined attributes (for example, image size). This is only the tip of the iceberg, since metadata attribute values can also be set by the very powerful Smart Classification system based on the content of the file! But that is a topic for another post.