Archive for the ‘Artificial intelligence’ Category

Best Practices for ITAR Compliance in the Cloud

The cloud has become part and parcel of today's enterprise. However, remaining compliant with the International Traffic in Arms Regulations (ITAR) demands extensive data management discipline. Most of the regulatory details covered by ITAR aim to guarantee that an organization's materials and information regarding military and defense technologies on the United States Munitions List (USML) are only shared within the US, with US-authorized entities. While this may seem like a simple precept, in practice, attaining it can be extremely difficult for most companies. Defense contractors and other organizations that primarily handle ITAR-controlled technical data have been unable to collaborate on projects while utilizing cloud computing practices with a proven track record of fostering high performance and productivity. Nevertheless, the hurdles impeding the productivity opportunities of the cloud can be overcome. Practices that govern the processing and storage of export-controlled technical data are evolving.

Full ITAR compliance in the cloud is not an end result, but an ongoing effort to protect information assets. In the long run, being ITAR compliant boils down to having a solid data security strategy and sound technology implementation in place.

Utilize End-to-End Encryption

In September 2016, the Directorate of Defense Trade Controls (DDTC) published a rule that established a 'carve out' for the transmission of export-controlled software and technology within a cloud service infrastructure, provided the data is protected with 'end-to-end' encryption. The proviso is that the data has to be encrypted before it crosses any border, and has to remain encrypted at all times during transmission. Likewise, any technical data potentially accessed by a non-US person, inside or outside the United States, has to be encrypted 'end-to-end', which the rule delineates as the provision of continuous cryptographic protection of data between the originator and the intended recipient. In a nutshell, the means of decrypting the data cannot be given to a third party before it reaches the recipient.

The native encryption of data at rest offered by most cloud providers fails to meet this definition of end-to-end encryption, because the cloud provider likely has access to both the encryption key and the data, and therefore effectively has the ability to access export-controlled information. Organizations have to ensure that the DDTC's definition of 'end-to-end' encryption is met before storing their technical data in a public or private cloud environment; otherwise they will be in violation of ITAR.
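As a minimal sketch of the key-custody principle behind 'end-to-end' encryption, the toy one-time-pad cipher below (standard library only; the sample data is hypothetical, and a real deployment would use a vetted cryptography library rather than this illustrative cipher) shows how data encrypted by the originator stays opaque to a provider that never holds the key:

```python
import secrets

def encrypt(plaintext: bytes, key: bytes) -> bytes:
    # XOR one-time pad: a toy cipher used here only to illustrate key custody
    assert len(key) == len(plaintext)
    return bytes(p ^ k for p, k in zip(plaintext, key))

decrypt = encrypt  # XOR is its own inverse

# The originator encrypts BEFORE the data ever reaches the cloud provider
technical_data = b"ITAR-controlled drawing"
key = secrets.token_bytes(len(technical_data))

ciphertext = encrypt(technical_data, key)

# The provider stores only the ciphertext; the key is shared only between
# the originator and the intended recipient, so the provider cannot
# decrypt -- the 'end-to-end' property the DDTC rule requires.
assert decrypt(ciphertext, key) == technical_data
assert ciphertext != technical_data
```

The design point is where the key lives, not which cipher is used: if the provider ever holds both ciphertext and key, the carve-out no longer applies.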

Classify Data Accordingly

Most technologies are not limited to a single use. Whenever an organization that handles technical data related to defense articles shares information regarding a service or product, steps have to be taken to make sure that any ITAR-controlled data is carefully purged in its entirety. Classification entails reviewing existing business activities and contracts to establish whether they fall under ITAR. The process requires a good understanding of licensing terms, court interpretations, agency directives and other guidance. In order to successfully navigate the nuances and complexities of ITAR, organizations have to collect enough metadata to catalog, separate and classify information. For easy identification, the data should be classified into categories such as 'Public Use', 'Confidential', and 'Internal Use Only'. Classifying data is a prerequisite to creating a foolproof Data Loss Prevention (DLP) implementation.
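A metadata-driven classifier can be sketched as follows. The keyword rules and label names here are hypothetical, purely to illustrate the "most restrictive label wins" pattern a real classification engine would apply:

```python
# Hypothetical keyword rules mapping document metadata tags to handling labels
RULES = {
    "itar": "ITAR Controlled",
    "export": "ITAR Controlled",
    "salary": "Confidential",
    "memo": "Internal Use Only",
}

def classify(metadata_tags):
    """Return the most restrictive label matched by the document's tags."""
    priority = ["ITAR Controlled", "Confidential", "Internal Use Only", "Public Use"]
    labels = {RULES[t] for t in metadata_tags if t in RULES}
    for label in priority:          # the most restrictive matching rule wins
        if label in labels:
            return label
    return "Public Use"

assert classify(["export", "memo"]) == "ITAR Controlled"
assert classify(["brochure"]) == "Public Use"
```

In practice the rules would be derived from contract reviews and licensing terms rather than hand-written, but the ordering principle, that a single controlled tag overrides any benign ones, carries over.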

Develop a Data Loss Prevention (DLP) Strategy

Accidental leaks owing to user error and other oversights occur more often than most would care to admit. Mistakes that can happen, will happen. Establishing a set of stringent policies to prevent users from mishandling data, whether accidentally or intentionally, is crucial to ITAR compliance. Organizations should have a strategy in place to guarantee the continual flow of data across their supply chains, while protecting said data from the following employee scenarios:
Well-meaning insiders – employees who make an innocent mistake.
Malicious insiders – employees with ill intent.
Malicious outsiders – individuals looking to commit corporate espionage, hackers, enemy states, and competitors, among others.

Control Access to Technical Data

Access control is a well-known technique used to regulate who can view or use the resources in a computing environment, and it can be employed on a logical or physical level. Physical access control restricts access to physical areas and IT assets. Logical access control allows IT administrators to establish who is accessing information, what information they are accessing, and where they are accessing it from. Roles, permissions, and security restrictions should be established beforehand to ensure that only authorized U.S. persons have access to export-controlled technical information. Multi-factor authentication strengthens access control by making it extremely difficult for unauthorized individuals to access ITAR-controlled information by compromising an employee's credentials.
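The logical access check described above can be sketched as a simple role-based rule. The role names and permission sets below are hypothetical; the point is that the U.S.-person check gates every role:

```python
# Hypothetical role-to-permission mapping for export-controlled documents
ROLE_PERMISSIONS = {
    "engineer_us": {"read", "write"},
    "contractor_foreign": set(),        # no access to ITAR data at all
    "auditor_us": {"read"},
}

def can_access(role: str, action: str, is_us_person: bool) -> bool:
    """Logical access check: only authorized U.S. persons may touch ITAR data."""
    if not is_us_person:                # the export-control gate comes first
        return False
    return action in ROLE_PERMISSIONS.get(role, set())

assert can_access("engineer_us", "write", is_us_person=True)
assert not can_access("auditor_us", "write", is_us_person=True)
assert not can_access("engineer_us", "read", is_us_person=False)
```

A production system would layer multi-factor authentication in front of this check, so a stolen password alone never satisfies it.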

Establish Security Policies and Train the Staff Well

An ITAR-specific security strategy is the cornerstone of data security practices. The policies should address both network and physical security considerations. ITAR is riddled with complications that make it easy for organizations to make mistakes if they don't remain vigilant. An organization is only as secure as its weakest link, and in most cases that link is the staff. A solid security policy on paper simply does not cut it. Without proper staff training, a compliance strategy will be largely ineffective since it doesn't tie in with actual organizational procedures. Investing in end-user training is the only way to ensure security policies are implemented.

In Closing

Organizations have turned to government clouds to manage the complex regulatory issues associated with the cloud. Platforms like AWS GovCloud have developed substantial capabilities that enable organizations subject to ITAR to implement robust document management and access control solutions. When paired with FileCloud, organizations can build and operate document and information management systems that satisfy the strictest security and compliance requirements.


Author: Gabriel Lando

Will Machine Learning Replace Data Scientists?


People have begun to get jumpy at the possibility of Artificial Intelligence being used to automate anything and everything. Now that AI has proven it has the potential to push out blue-collar jobs (via robotics) and white-collar professions (via Natural Language Processing), cultural anxiety surrounding this technology is on the rise. After decades of exploring symbolic AI methods, the field has shifted toward statistical approaches that have recently begun working in a vast array of ways, largely due to the wave of available data and computing power. This has led to the rise of machine learning.

In today's digital world, machine learning and big data analytics have become staples in business, and are increasingly being incorporated into business strategies by organizations. The 'data-driven enterprise' makes all its decisions based on the insights it gets from collected data. However, as AI and machine learning continue to play a larger role in the enterprise, there is a lot of talk about the role of the data scientist becoming antiquated. The advances made in machine learning by industry titans like Microsoft and Google suggest that much of the work currently being handled by data scientists will be automated in the near future. Gartner also recently reported that 40 percent of data science tasks will be automated by 2020.

The Difference Between Machine Learning and Data Science

Data science is primarily a concept used to tackle big data and is inclusive of data preparation, cleansing and analysis. The rise of big data sparked the rise of data science to support the need for businesses to gain insights from their massive unstructured data sets. While the typical data scientist is envisioned as a programmer experienced in Hadoop, SQL, Python, R and statistics, this is just the tip of the data science iceberg. Essentially, data scientists are tasked with solving real company problems by analyzing them and developing data-driven answers; how they do it is secondary. The Journal of Data Science describes the field as "almost everything that has something to do with data … yet the most important part is its applications – all sorts of applications". Machine learning is one such application.

The rise of big data has also made it possible to train machines with a data-driven approach as opposed to a knowledge-driven approach. Theoretical research relating to recurrent neural networks has become feasible, transitioning deep learning from an academic concept to a tangible, useful class of machine learning that is affecting our everyday lives. Machine learning and AI now dominate the media, overshadowing every other aspect of data science. The prevalent view of a data scientist has thus become that of a researcher focused on machine learning and AI. In reality, data science transcends machine learning.

Machine learning is basically a set of algorithms that train on a set of data to fine-tune their parameters. Preparing that training data itself relies on data science techniques such as clustering and regression. On the other hand, 'data' in data science may or may not come from a mechanical process or a machine. The main difference between the two is that data science covers the entire spectrum of data processing, not just the statistical or algorithmic aspects.
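The "train on data to fine-tune parameters" idea can be shown in a few lines. The numbers below are made up for illustration; the sketch fits a single parameter to noisy data by gradient descent, which is the core loop behind far larger models:

```python
# Fit y = w * x by gradient descent: "training" is just tuning the
# parameter w to reduce the average squared error on the data set.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]   # roughly y = 2x, with noise

w = 0.0          # initial parameter guess
lr = 0.05        # learning rate
for _ in range(200):
    # gradient of mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad

assert abs(w - 2.0) < 0.1   # the learned parameter settles near 2
```

Everything around this loop, gathering the (x, y) pairs, cleaning them, and deciding whether "predict y from x" is even the right question, is the data science work the article describes.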

Human Intuition Cannot Be Automated

Data science is distinguishable from machine learning in that its goal is especially human-focused: to gain insight and understanding. There always has to be a human in the loop. Data scientists utilize a combination of engineering, statistics and human expertise to understand data from a business point of view and provide accurate insights and predictions. While ML algorithms can help identify organizational trends, their role in a data-driven process is limited to making predictions about future outcomes. They are not yet fully capable of understanding what specific data means for an enterprise and its relationships, or even the relationships between varying, unconnected operations.

The judgment and critical thinking of a data scientist are indispensable in monitoring the parameters and making sure that the customized needs of a business are met. Once all the questions have been asked, the data has been gathered and run through the necessary algorithms, a discerning data scientist still has to figure out what the larger business implications are and present the takeaways to management. Ultimately, the interactive interpersonal conversations driving these initiatives are fueled by abstract, creative thinking that cannot be replaced by any modern-day machine.

Advances in AI Are Driving Talent Demand

As the transformational AI wave cuts across end-markets, from enterprise to consumer platforms, from robotics to cyber security, the demand for data scientists is only likely to grow. The role of a data scientist will probably assume a new level of importance and evolve in typical computer science fashion. As machines' ability to accurately analyze data increases, with help from expert statistical modeling and solid algorithms created by data scientists, data scientists will move up the 'abstraction scale' and begin tackling higher-level, more complex tasks. The current demand clearly outpaces the supply: McKinsey Global Institute estimates that the United States could have about 250,000 open data science positions by 2024. This data science skill gap is likely to leave companies scrambling to hire candidates who can meet their analytical needs.

In Closing

Will machine learning replace data scientists? The short answer is no, or at least not yet. Certain aspects of low-level data science can and should be automated. However, machine learning is also creating a real need for data scientists. As AI advances to analyze and establish cause as well as correlation, software will be used to collect and analyze data; but ML tools don't yet possess the human curiosity or the desire to create and validate experiments. That aspect of data science is unlikely to be automated any time soon. Human intelligence remains crucial to the data science field; machine learning can help, but it can't completely take over.


Author: Gabriel Lando

Imagining a Blockchain + AI Hybrid


This past year has seen the implementation of artificial intelligence (AI) and blockchain across a varied range of solutions and applications within several industries. What is yet to be explored is a fusion of the two. This merger could allow enterprises to create a composable business with high-quality service delivery. Blockchain is basically an immutable ledger that is open and decentralized, with strong controls for privacy and data encryption. Smart contracts, trusted computing and proof of work are some of the features that contravene traditional centralized transactions, making blockchain truly transformative.
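The "immutable ledger" property comes from hash-chaining: each block commits to the previous block's hash, so altering history breaks the chain. A minimal sketch (sample transaction strings are hypothetical):

```python
import hashlib
import json

def block_hash(data, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps({"data": data, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

# Build a two-block chain
genesis = {"data": "genesis", "prev": "0" * 64}
genesis["hash"] = block_hash(genesis["data"], genesis["prev"])

block1 = {"data": "tx: alice -> bob", "prev": genesis["hash"]}
block1["hash"] = block_hash(block1["data"], block1["prev"])

# Tampering with the first block changes its hash and breaks the link
# that block1 recorded -- this is the immutability guarantee.
genesis["data"] = "forged"
assert block_hash(genesis["data"], genesis["prev"]) != block1["prev"]
```

Real chains add consensus mechanisms such as proof of work on top of this linkage, but the tamper-evidence shown here is the foundation.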

AI, on the other hand, is a general term for varying subsets of technologies; it revolves around the premise of building machines that can perform tasks requiring some level of human intelligence. Some of the technologies striving to make this a reality include deep learning, artificial neural networks and machine learning. By utilizing AI-based technologies, organizations have been able to accomplish everything from the automation of mundane tasks to the study of black holes. These technologies are disruptive in their own right; a framework that harnesses the best of both worlds could be radically transformational.

A Trustworthy AI

As the use of AI continues to become more mainstream in high-profile and public-facing services like defense, medical treatment and self-driving cars, multiple concerns have been raised about what is going on under the hood. The AI black box suffers from an explainability problem. If people are willing to place their lives in the hands of AI-powered devices and applications, then they naturally want to understand how the technology makes its decisions. Having a clear audit trail not only improves the trustworthiness of the data and the models, but also gives a clear route to trace back through the machine learning process.

Additionally, machine learning algorithms rely on the information fed into them to shape their decision-making process. A shift from fragmented databases maintained by individual entities to comprehensive databases maintained by consumers can increase the amount of data available to predictive marketing systems and other recommendation engines, resulting in a measurable improvement in accuracy. Google's DeepMind is already deploying blockchains to offer improved protection for the user health data used in its AI engines, and to make its infrastructure more transparent. Currently, intelligent personal assistants are appealing to consumers, and users will generally sacrifice privacy for the sake of convenience, overlooking what data is being collected from their device, how it is secured, or how it compromises their privacy. An amalgamation of AI and blockchain can reinvent the way information is exchanged: machine learning can sift through and digest vast amounts of data shared via blockchain's decentralized structure.

Decentralized Intelligence

The synthesis of blockchain and AI opens the door for the decentralization of authentication, compute power and data. When data is centrally stored, a breach is always an imminent threat. Blockchain decentralizes user data, reducing the opportunities for fraudsters or hackers to gain access and take advantage of the system. Machine learning algorithms are capable of monitoring the system for behavioral anomalies, becoming more accurate as their intelligence improves. This dismantles the inherent vulnerability of centralized databases, forcing cyber attackers to challenge not one, but multiple points of access, which is exponentially more difficult. Blockchain and AI combine to offer a strong shield against cyber attacks. Aside from enhanced attack-defense mechanisms, decentralization translates to higher amounts of data being processed and more efficient AI networks being built. Imagine a peer-to-peer connection that has been streamlined with natural language processing, image recognition, and multidimensional data transformations in real time.

Access to More Computing Power

An AI-powered blockchain is scalable based on the number of users. AI adds an aspect of computational intelligence that can optimize transaction data in blocks and make the entire process faster. A 2016 report from Deloitte estimated that the cost of validating transactions on a blockchain stood at a whopping $600m annually. A large portion of the cost was generated by specialized computing components which consume a lot of energy while performing mining operations. An AI-based blockchain model can help enterprises set up a low-energy-consumption model by allowing specific nodes to initially perform larger tasks and alert miners to halt less crucial transactions. Enterprises will be able to achieve the latency required for performing transactions faster without making any structural changes to their architecture. A machine learning and blockchain combo might also be the key to figuring out how to leverage the world's idle computing power.

The Ultimate Framework

AI and blockchain each aim to improve the capabilities of the other, while also providing opportunities for better accountability and oversight. AI reinforces blockchain's framework, and together they solve several of the challenges that come with securely sharing information over the IoT. Blockchain provides a decentralized ledger of all transactions, while AI offers intelligent analytics and real-time decision-making. This allows users to take back control and ownership of their personal data and opens the door for more effective security measures. Datasets that are currently only available to tech giants will be put in the hands of the community, subsequently accelerating AI adoption. Sectors such as telecom, financial services, supply chain intelligence, and retail in general are primed for the adoption of both technologies, with health care following suit.


Author: Gabriel Lando

Photo by Hitesh Choudhary on Unsplash

Predictions for Enterprise AI: 2018 and Beyond

The last couple of years have seen Artificial Intelligence (AI) become a household topic, with many curious about how far the technology can go. Consumer products like Amazon Alexa and Google Home have long used machine learning as a selling point; however, AI applications in the enterprise remain limited to narrow machine learning tasks. With progressive improvements in the convergence of hardware and algorithms happening on a daily basis, AI is likely to have a larger impact on business and industry in the coming years.

A multi-sector research study conducted by Cowen and Company revealed that 81 percent of IT decision makers are already investing in, or planning to invest in AI. Furthermore, CIOs are currently integrating AI into their tech stacks with 43 percent reporting that they are in the evaluation phase, while an additional 38 percent have already implemented AI and plan to invest more. Research firm McKinsey estimates that large tech companies spent close to $30 billion on AI in 2016 alone. IDC predicts that AI will grow to become a $47 billion behemoth by 2020, with a compound annual growth rate of 55 percent. With market forecasts predicting explosive growth for the artificial intelligence market, it is quite clear that the future of the enterprise will be defined by artificial intelligence. Below are the top 10 predictions for AI in the enterprise.

Search Will Become More Intelligent

According to a Forrester report, 54 percent of global information workers are interrupted a couple of times a month to spend time looking for insights, information and answers. There are now more file formats and types than ever before, and the bulk of this content is unstructured data, which traditional CRM platforms struggle to recognize. AI-powered cognitive search returns more relevant results by analyzing users' search behavior, the content they read, the pages they visited, and the files they downloaded to establish the searcher's intent or the context of the search query. The machine learning algorithm's ability to self-learn improves search relevance over time, and with it the user experience.
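Under the hood, even the classical baseline that cognitive search builds on, scoring documents by term frequency weighted against corpus rarity (TF-IDF), captures the idea of relevance ranking. A minimal sketch with made-up documents (this is the textbook formula, not any specific vendor's implementation):

```python
import math
from collections import Counter

# Hypothetical mini-corpus of document id -> text
docs = {
    "a": "cloud security compliance report",
    "b": "quarterly sales report",
    "c": "cloud storage pricing",
}

def tf_idf_score(query, doc, corpus):
    """Sum TF-IDF weights of the query terms present in the document."""
    words = doc.split()
    tf = Counter(words)
    score = 0.0
    for term in query.split():
        # document frequency: how many docs in the corpus contain the term
        df = sum(1 for d in corpus.values() if term in d.split())
        if df and term in tf:
            score += (tf[term] / len(words)) * math.log(len(corpus) / df)
    return score

ranked = sorted(docs, key=lambda d: tf_idf_score("cloud security", docs[d], docs),
                reverse=True)
assert ranked[0] == "a"   # the doc matching both query terms ranks first
```

Cognitive search layers behavioral signals (clicks, reads, downloads) and learned models on top of this kind of static relevance score.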

Hackers Will Get More Crafty

ML is especially suited for cyber security since hard-coding rules to detect whenever a hacker is trying to get into your system is quite challenging. However, AI is a double-edged sword when it comes to data security. The more AI advances, the more its potential for attacks grows. Neural networks and deep learning techniques enable computers to identify and interpret patterns, but they can also find and exploit vulnerabilities. 2018 will likely see the rise of intelligent ransomware or malware that learns as it spreads. As a result, data security concerns will speed up the acceptance of AI, forcing companies to adopt it as a cyber security measure. A recent survey conducted by PwC indicates that 27 percent of executives say their organization plans to invest in cyber security safeguards that use machine learning.

AI Will Redefine How Data is Approached

The general feeling over the last few years has been that data is the lifeblood of any organization. A recently concluded study by Oxford Economics and SAP revealed that 94 percent of business tech decision makers are investing in Big Data and analytics, driving more access to real-time data. Throughout 2018 and beyond, data will remain a priority as companies aim to digitally transform their processes and turn insights into actions for real-time results. Additionally, machine learning will play a huge role as companies aim to meet regulatory requirements such as the GDPR. Individuals will be empowered to demand that their personal data be legally recognized as their IP. If or when this happens, both parties will turn to AI to provide answers as to how the data should be used.

AI Will Change the Way We Work

Whenever artificial intelligence and jobs are mentioned in the same breath, one side views it as a destroyer of jobs while the other sees it as a liberator from menial tasks in the workplace. A 2013 working paper from the University of Oxford suggests that half of all jobs in the US economy could be rendered obsolete by 'computerization'. Others argue that intelligent machines will give rise to new jobs. According to a Gartner report, by 2019 more than 10 percent of hires in customer service will mostly be writing scripts for chatbot interactions. The same report also predicts that by 2020, 20 percent of all organizations will dedicate employees to guide and monitor neural networks. What's certain is that AI will elevate the enterprise by completely transforming the way we work, collaborate and secure data.

The AI Talent Race Will Intensify

A major challenge in the AI space has been finding talent: teams need people with the ability to train AI-based systems. Organizations with access to substantial R&D dollars are still trying to fill their ranks with qualified candidates who can take on 'industry-disrupting' projects. In 2018, as some businesses look to re-skill their existing workforce to achieve broader machine learning literacy, larger organizations may look to add data science and AI-related officers in or close to the C-suite. These senior-level decision makers will be responsible for guiding how machine learning and AI can be integrated into the company's existing strategy and products. Others will consider hiring practitioners in algorithms, math and AI techniques to offer input.

The Citizen Data Scientist Will Settle In

As AI-based tools become more user-friendly, users will no longer have to understand how to write code in order to work with them. Gartner defines a citizen data scientist as an individual who generates or creates models that utilize predictive or advanced diagnostic capabilities, but whose primary role falls outside the fields of analytics and statistics. As stated in the IDC 2018 IT industry predictions, over 75 percent of commercial enterprise applications will utilize AI in some form by 2019. As business use cases for AI become more mainstream, functional expertise across the organization will grow in importance, to the point that the domain skill sets AI specialists typically lack will be required. As AI is integrated into every facet of the enterprise, citizen data scientists may end up being more important than computer scientists.

An AI Blockchain Power House Will Emerge

AI and blockchain are ground-breaking technological trends in their own right; when combined, they have the potential to become even more revolutionary. Each serves to improve the capabilities of the other, while also providing opportunities for enhanced accountability and oversight. In the coming year, we can expect to see blockchain combined with AI to create a new level of deep learning that learns faster than previously imagined. The immutable nature of data stored on a blockchain could enhance the accuracy of AI predictions. Sectors such as telecom, financial services and retail are among the key industries best suited for the adoption of these technologies.

Consumer AI Will Drive Enterprise Adoption

AI already plays a key role in shaping consumer experience. Chatbots have become one of the most recognizable forms of AI, with 80 percent of marketing leaders citing the use of chatbots to enhance customer experience. While the market for enterprise AI has recorded substantial growth, enterprises still require more complex solutions. Some consumer products have already made their way into the enterprise, a good example being voice-activated digital assistants. With Amazon's recent announcement of Alexa for Business, we can expect employees to start relying on smart assistants to manage their calendars, make calls, schedule reminders, and run to-do lists without lifting a finger.

MLaaS Will Rise

Now that machine learning has proven its value, more businesses will turn to the cloud for Machine Learning as a Service (MLaaS) as the technology matures. Adoption of MLaaS will increase, starting in private clouds within large organizations and in multi-tenant public cloud environments for medium-sized enterprises. This will enable a wider range of enterprises to take advantage of machine learning without heavily investing in additional hardware or training their own algorithms.

Machine Learning Will Have a Mainstream Appeal

Every forward-thinking, innovative organization currently has an initiative or project around digital transformation, with AI usually as the focus. AI represents a significant change in the way enterprises do business. Predictive algorithms, translators and chatbots have already become mainstream, and businesses across the globe are utilizing them to boost profitability by reducing costs and understanding their customers better. Expect an even higher level of personalization to become ubiquitous and enhance customer experience everywhere.


Author: Gabriel Lando

Photo by Franck V. on Unsplash

A.I Meets B.I: The New Age of Business Analytics

The dawn of the digital age was marked by a monumental shift in the way information is processed and analyzed. The widespread use of the internet further resulted in the increased production of data in the form of text sharing, videos, photos and internet log records, which is where big data (large data sets) emanated from. Data is now deeply embedded in the fabric of society. Over the last couple of years, the world has been introduced to Artificial Intelligence in the form of mobile assistants (Siri, Alexa, Google Assistant), smart devices, self-driving cars, robotic manufacturing and even selfie drones. The ubiquitous availability of open source machine learning (ML) and AI frameworks, and the automation simplicity that comes with them, is redefining how digital information is processed. AI has already begun impacting how we live, work and play in profound ways.

With the realization that big data on its own is not enough to provide valuable insights, businesses are now turning to machine learning to uncover the hidden potential of big data by supercharging performance and implementing innovative solutions to complex business problems. Judging by the massive rise in venture investment in recent years, it's no secret that AI and ML are conceivably the most instrumental technologies to have gained momentum in recent times. Here are some ways in which coupling AI with big data has helped improve business intelligence.

Automated Classification is the First Step Towards Big Data Analytics

Content classification is fundamentally used to predict the grouping or category that a new or incoming data object belongs to. Data streams are continually becoming more complex and varied, whether the data is in the form of technical documents, emails, user reviews, customer support tickets or news articles. Simply structuring, organizing, and preparing the data for analysis can take up a lot of time and resources. Data classification challenges at this scale are relatively new. The more data a business has, the more strenuous it is to analyze; on the other side of the spectrum, the more data the business has, the more precise its predictions will be. Finding that balance is crucial. Doing it manually is impractical because it will not scale and in some cases may lead to privacy violations.

Machine learning and big data analytics are a match made in heaven, given the need to operate on and analyze anonymized datasets at scale. With an artificially intelligent tool, data classification can be used to predict the category of new data elements on the basis of groupings found via a data clustering process. Multi-label classification captures virtually everything and is handy for image and audio categorization, customer segmentation and text analysis. In an instant, the content is classified, analyzed and profiled, and the appropriate policies required to keep data safe are applied.
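The distinguishing feature of multi-label classification is that a document can carry several labels at once, rather than being forced into a single category. A toy sketch with hand-written keyword rules (a real system would learn these associations from labeled data rather than hard-code them):

```python
# Hypothetical label rules: a document can match several labels at once
LABEL_KEYWORDS = {
    "support": {"ticket", "refund", "issue"},
    "technical": {"api", "error", "timeout"},
    "billing": {"invoice", "refund", "charge"},
}

def multi_label(text):
    """Return every label whose keyword set intersects the document's tokens."""
    tokens = set(text.lower().split())
    return {label for label, kws in LABEL_KEYWORDS.items() if tokens & kws}

labels = multi_label("Refund issue after duplicate charge")
assert labels == {"support", "billing"}   # one ticket, two labels
```

Once each incoming item carries its full set of labels, downstream policies (routing, retention, access restrictions) can be applied per label rather than per document.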

A.I Marks an End to Intuition-Based Decision Making

The analytics maturity model represents the stages of data analysis within a company. Analytics maturity traditionally starts with an intent to transform raw data into operational reporting insight to lessen intuition-based decision making. With mounds of data at your disposal, the assumption is that more decisions will be rooted in data analysis than in instinct. But that is often not the case. Countless Excel models, PhDs, and MBAs have taken number crunching to an entirely new level, and yet data analysis is becoming increasingly complex. Data-driven decision tools often require manual development processes to aggregate sums, averages, and counts. In many instances, the findings lack a holistic reflection of the business and don't generally take statistical significance into consideration.

An AI- and ML-driven model facilitates automatic learning without prior explicit programming. This means such models can efficiently analyze enormous volumes of data that may contain too many variables for traditional statistical analysis techniques or manual business intelligence. All the answers are in the data; you just have to apply AI to get them out. A machine learning algorithm automatically discovers the signal in the noise. Hidden patterns and trends in the data that a human mind would be unable to detect are easily identified. Additionally, the AI acquires skill as it finds regularities and structure in the data, becoming a predictor or classifier. The same way an algorithm can teach itself to play Go, it can teach itself what product to push next. And the best part is that the model adapts each time new data is introduced.

Accurate Predictive Analysis

Naturally, businesses are more interested in outcomes and action than in data visualization, interpreting reports, and dashboards. The good news is that 'forecasting' doesn't require crystal balls and tea leaves. After gaining insight into historical data, machine learning answers the question 'what next?'. ML can be utilized to develop generalizations and go beyond knowing what has happened, to offering the best evaluation of what will occur in the future. Classification algorithms typically form the foundation for such predictions. They are trained by running specific sets of historical data through a classifier. The machine learning model learns behavior patterns from the data and determines how likely it is for an individual or a group of people to perform specific actions. This facilitates the anticipation of events so that forward-looking decisions can be made.
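The 'what next?' idea can be sketched with a nearest-neighbour classifier over historical behaviour. The features (visits per week, items in cart) and labels below are entirely hypothetical.

```python
# Sketch: 1-nearest-neighbour prediction of a customer's next action,
# trained on invented historical behaviour data.
history = [
    ((1, 0), "no_purchase"),
    ((2, 1), "no_purchase"),
    ((5, 3), "purchase"),
    ((6, 4), "purchase"),
]

def predict_next(x):
    """Return the label of the historical example closest to `x`."""
    def dist(example):
        return (example[0][0] - x[0]) ** 2 + (example[0][1] - x[1]) ** 2
    return min(history, key=dist)[1]

print(predict_next((5, 2)))  # -> purchase
```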

The Foundation for Risk Analysis

By powering high-performance behavioral analytics, machine learning has taken anomaly detection to greater heights, making it possible to examine multiple actions on a network in real time. The self-learning abilities of AI algorithms allow them to offer investigative context to risky behaviors, advanced persistent threats, and zero-day vulnerabilities. A good use case is fraud detection – AI algorithms can adapt to varying claim patterns, learn from new, unseen cases, and evaluate the legitimacy of a claim. Additionally, ML and AI algorithms can help enterprises conform to strict regulatory oversight by ensuring all regulations, policies, and security measures are being followed. By pinpointing outliers in real time, AI gives businesses an opportunity to take immediate action and mitigate risks.
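A minimal sketch of the anomaly-detection idea: flag any observation that deviates from a learned baseline by more than a few standard deviations. The baseline sample here is invented; a production system would update the baseline continuously.

```python
import statistics

# Hypothetical baseline: requests per minute observed during normal use.
baseline = [20, 22, 19, 21, 20, 23, 18, 22]

def is_anomalous(value, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    return abs(value - mean) / stdev > threshold

print(is_anomalous(21))  # -> False: within normal variation
print(is_anomalous(90))  # -> True: a spike worth investigating
```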

In Closing

In a data-driven world, machine learning will be a key differentiator. As business processes become reliant on digital information, organizations have to adopt next-gen automation technologies to not only survive but thrive. The beauty of combining Business Intelligence (BI) and Artificial Intelligence (AI) lies in the fact that business insights can be discovered at incredible speed. From detecting fraud attempts and cyber breaches to monitoring user behavior to establish patterns and predict customer actions, the potential to boost performance and streamline processes is prodigious. Nonetheless, machine learning tools are only as good as the data used to train them.

image courtesy of freepik


Author: Gabriel Lando

Best Practices to Get The Most Out of Machine Learning

There is currently a lot of excitement surrounding artificial intelligence (AI), machine learning (ML), and natural language processing (NLP). Despite the fact that these technologies have existed for decades, new algorithmic developments coupled with advancements in compute power have made them more attractive to enterprises. Organizations are adopting advanced analytics technologies in order to better understand customer behaviors, improve operational efficiencies, and gain a competitive advantage. Businesses want more accurate insights and the ability to quickly respond to change. Machine learning allows them to build systems that are capable of learning from data to recognize patterns and predict future outcomes with minimal human intervention. How can enterprises get the most accurate results from machine learning across a range of problems? How can organizations empower business analysts to utilize the predictive powers of machine learning? What do these businesses need to know? This post explores some of the best practices enterprises can employ to get the most out of machine learning.

1. Use machine learning in lieu of a complex heuristic

Machine learning is a powerful tool that enables organizations to gain insights into several kinds of behaviors. It is utilized in vertical and horizontal applications to help enterprises become more proactive. If it is possible to structure a set of 'if-then scenarios' or rules to handle the problem in its entirety, then the need for machine learning is invalidated. Additionally, if there is no precedent for a successful outcome after applying machine learning to a specific issue, it may not be the best foray into the world of machine learning. Focusing objectives on specific use cases that will have a meaningful impact for the business is key to success. Once you have the data and a solid idea of what you are trying to accomplish, move on to machine learning. Bluntly put, if you have a $100 million problem, spending $20 million on it is not a big deal.

2. Garbage in, garbage out

“Clean data is better than big data” is a phrase that is regularly echoed among data science professionals. Some people assume that a large volume of data negates any data quality concerns. If you have mounds of disjointed and unstructured data, you will have to 'clean' it before you can gain any insights from it. Good quality data is crucial for models in production; otherwise, the ML models will deteriorate quickly. The effects of poor data quality are not limited to the degradation of machine learning algorithms; they also affect reporting, decision making, and operational efficiencies. If limited data is available, enterprises should start by applying supervised machine learning, using existing labelled training data to begin finding insights.
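A minimal sketch of what such a 'cleaning' pass might look like before any insight is attempted: deduplicate, drop records with missing labels, and normalise text. The rows below are invented for illustration.

```python
# Sketch of a pre-training cleaning pass over hypothetical labelled records.
raw = [
    {"text": "Invoice overdue", "label": "Finance"},
    {"text": "Invoice overdue", "label": "Finance"},   # exact duplicate
    {"text": "Server down", "label": None},            # missing label
    {"text": "  password reset  ", "label": "IT "},    # messy whitespace/case
]

def clean(rows):
    seen, out = set(), []
    for row in rows:
        if row["label"] is None:          # unusable for supervised training
            continue
        record = (row["text"].strip().lower(), row["label"].strip().lower())
        if record not in seen:            # drop duplicates
            seen.add(record)
            out.append(record)
    return out

print(clean(raw))
# -> [('invoice overdue', 'finance'), ('password reset', 'it')]
```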

3. Build the right training data set

It is not uncommon for enterprises to initially make mistakes when building out their training data. Training data sets need multiple example predictor variables to predict or classify a response. In machine learning, the predictor variables are referred to as features, while the responses are referred to as labels. The best way to go about it is to work backward from the solution, explicitly defining the problem and mapping out the data required to populate the models. Capturing temporal variations in your training data set is crucial. The model can also bias the data it subsequently collects; ideally, you should introduce an element of exploration or randomness so as to get less biased samples.
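Working backward from the question to the training set can be sketched like this; the churn-prediction fields below are hypothetical, chosen only to show predictor variables (features) being separated from the response (label).

```python
# Sketch: mapping raw historical records into features and labels,
# working backward from the question "will this account churn?"
records = [
    {"logins_30d": 1,  "tickets": 4, "tenure_months": 2,  "churned": True},
    {"logins_30d": 25, "tickets": 0, "tenure_months": 36, "churned": False},
    {"logins_30d": 3,  "tickets": 2, "tenure_months": 5,  "churned": True},
]

# Predictor variables (features) and the response (label), ready for training.
features = [(r["logins_30d"], r["tickets"], r["tenure_months"]) for r in records]
labels = [r["churned"] for r in records]

print(features[0], labels[0])  # -> (1, 4, 2) True
```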

4. A data strategy goes a long way

AI-based tools need help to unlock the valuable information lurking in the data generated by your systems. A comprehensive data strategy that focuses on data availability, acquisition, labelling, and the technology needed to pull data from disparate systems is a good place to start. Data quality and data governance are two sides of the same coin; one is not possible without the other. This means that a data governance process that is inclusive of practices and policies has to be put in place. Existing governance practices may have to be revamped or expanded as well, but this should be a joint effort between the business and IT.

5. Continuous monitoring and optimization

Machine learning models have to be continuously monitored and updated to prevent degradation over time. Depending on the business issue at hand, a model may have to be frequently updated. Keeping track of the model also helps maintain institutional knowledge. Organizations should consider solutions that centrally monitor, analyze, configure and execute tasks like replication across multiple endpoints. This facilitates capacity planning, performance management, and troubleshooting. A consolidated command center ensures that data remains available, and ready for machine learning analytics.
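The monitoring step above can be sketched as tracking a model's rolling accuracy over recent predictions and flagging it for retraining once accuracy drifts below a threshold; the window size and threshold here are arbitrary illustrations.

```python
from collections import deque

# Sketch: rolling-accuracy monitor that signals when retraining is due.
window = deque(maxlen=5)   # last 5 prediction outcomes
THRESHOLD = 0.8            # minimum acceptable rolling accuracy

def record_outcome(correct):
    """Log whether the latest prediction was right; return True if retraining is due."""
    window.append(1 if correct else 0)
    return len(window) == window.maxlen and sum(window) / len(window) < THRESHOLD

alerts = [record_outcome(c) for c in [True, True, False, True, False, False]]
print(alerts[-1])  # -> True: accuracy over the last 5 outcomes fell to 0.4
```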

6. Leave room for error

Machine learning models require both data and time to adapt, grow, and be informed by experience. This is why an ML solution will typically be incorrect a certain percentage of the time, particularly when it is being informed by varied or new stimuli. Building a solid machine learning solution requires time to carefully select data, label it, choose algorithms, and test in a production environment. There are no "off-the-shelf" ML solutions for complex and unique business use cases. If your task has absolutely no room for error, then machine learning is not the best solution for the job.

The Role of AI in Regulatory Compliance

In a data-driven world, organizations possess vast amounts of data, produced at an accelerating pace as files are created, saved, and shared, amounting to terabytes or even petabytes. The challenge becomes apparent when it is time to pinpoint sensitive data across millions of files, both structured and unstructured; in most cases, an almost impossible endeavor. With regulations enforcing strict data processing practices on companies that handle personal data, the demand for innovative monitoring and auditing tools, as well as sedulous security assessments, has never been higher. It also creates a need to modify existing data classification models and algorithms in order to comply with the newly introduced data security standards.

There is, however, a light at the end of the regulatory tunnel. By utilizing robust data governance solutions that incorporate machine learning and the analytical powers of data science, organizations can get a deeper understanding of the information they possess and ease compliance. With the insights obtained from these solutions in hand and regulatory compliance under control, organizations can start making data-driven decisions that will elevate their business to the next level.

Classification Puts the Focus Where It Matters

Classification is a critical step towards securing data because it keeps the focus on the data that matters the most. Classification algorithms come in handy when the desired output is a distinct label. Not only does classification aid in meeting regulatory requirements and overall governance, it also significantly simplifies the process for internal stakeholders to search for and retrieve data. Computing power has become available to train bigger and more complex models much quicker. Graphics Processing Units (GPUs) that were initially designed to render video game graphics have been repurposed to handle the data crunching needed for machine learning. This compute capacity has further been aggregated into hyper-scalable data centers that can be accessed via the cloud. Machine learning algorithms consume enormous amounts of data and can handle high complexity and variability in that data. More importantly, they are more adaptable to changing data points and parameters.

Leveraging the powers of machine learning (ML) to classify content makes it possible to easily identify similar data sets and group them together for faster retrieval and search. When ML is coupled with natural language processing (NLP), data inputs like metadata, documents, instant messaging, emails, or even the spoken word can be accurately interpreted. This can be taken a step further by developing an algorithm that matches specific regulatory requirements, thus simplifying the process entirely. A robust, well-trained, AI-based classification engine is poised to locate Personally Identifiable Information (PII) across an organization's entire data landscape and trigger actions that help enterprises delete or retain data – both challenging yet critical aspects of the GDPR. Regulated data typically comprises structured data with a consistent pattern. Training a machine learning algorithm with patterns for recognizing medical records, social security and credit card numbers, and other forms of PII, together with policies for HIPAA, GDPR, and other regulations from around the globe, accelerates compliance readiness. Add in features like automatic identification of risky keywords, and the end result is enriched classified content that can be searched with precision and ease.
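The consistent patterns in regulated data are exactly what makes this tractable. A simplified sketch of pattern-based PII recognition follows; real detectors also validate checksums and use ML to reduce false positives.

```python
import re

# Sketch: regex patterns for two structured PII types (simplified).
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

def find_pii(text):
    """Return the sorted list of PII types detected in `text`."""
    return sorted(kind for kind, rx in PATTERNS.items() if rx.search(text))

print(find_pii("SSN 123-45-6789, card 4111 1111 1111 1111"))
# -> ['credit_card', 'ssn']
```

In practice such pattern hits would feed into the classification engine as features, alongside keywords and context, rather than serving as the final verdict.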

Manual processes are typically inconsistent, arduous, and unenforceable. However, by utilizing modern AI based classification technologies, organizations can make sensitive data easier to locate, and redundant data easier to delete.

AI is The Key to Efficient Compliance Teams

Simply keeping an eye out for fraudulent practices no longer cuts it. Organizations have to monitor communications for 'intent' as well. This can only be accomplished by obtaining additional context relating to monitored users and their respective activities – which may include behavioral anomalies or other fluctuations in communications data. The increasing complexity of regulatory requirements is driving the expectations for such detailed analysis. The current approach is to throw more bodies at the issue, but this is expensive and simply not scalable – especially as the enterprise grows and expands into new regions, which introduces more regulatory pressure. Artificial intelligence powered by big data and machine learning has the ability to wholly revolutionize the compliance industry. Applying AI and automation fosters productivity growth and other benefits for compliance teams.

Machine learning technologies can only enhance the compliance process if they fit into the organization's current workflow. AI can't operate in a silo in the compliance world (at least not yet). However, ML enhances the compliance team's view of monitored users, helps identify compliance-related communications, and, if properly implemented, facilitates analyst decision making. A compliance team benefits in the following ways.

1. Significantly lowered costs – compliance in itself is a costly venture. ML will deliver large cost reductions through more accurate analysis of big data.
2. Coherent regulatory compliance – the coherence comes as a direct result of real-time risk detection abilities as well as the digitization and automation of manual compliance and reporting processes.
3. Lower risk of fraud – with an exhaustive focus on monitoring all communication channels, including speech, which has been especially vulnerable to fraudulent activity, instances of fraud are easily identified and minimized.
4. Information de-duplication – clusters of similar or duplicate content are easily identified, thus cutting through the noise and reducing the amount of content that needs review.
5. Unlocking the value in content – compliance professionals can rely on machine learning to help them make sense of the large amount of data they encounter on a daily basis, including updates and regulatory changes.
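Point 4 above, de-duplication, can be sketched with a simple word-set similarity measure; the documents and the 0.7 threshold are illustrative, and real pipelines use richer representations such as shingles or embeddings.

```python
# Sketch: flag near-duplicate documents via Jaccard similarity over word sets.
def jaccard(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

docs = [
    "Quarterly compliance report for EMEA",
    "Quarterly compliance report for EMEA region",
    "Holiday party photos",
]

# Every pair of documents whose similarity crosses the threshold.
dupes = [(i, j) for i in range(len(docs)) for j in range(i + 1, len(docs))
         if jaccard(docs[i], docs[j]) > 0.7]
print(dupes)  # -> [(0, 1)]: the first two documents need only one review
```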

In Closing

Regulatory requirements such as the GDPR will challenge an organization's existing data governance culture and processes. However, with a sound approach to data management that is fueled by machine learning and data science, this anxiety-filled, time-consuming procedure can not only lead to more feasible compliance but also to analytics insights that foster strategic decision-making.


Author: Gabriel Lando

Top 5 Limitations of Machine Learning in an Enterprise Setting

Astounding technological breakthroughs in the field of Artificial Intelligence (AI) and its sub-field Machine Learning (ML) have been made in the last couple of years. Machines can now be trained to behave like humans, enabling them to mimic complex cognitive functions like informed decision-making, deductive reasoning, and inference. Robots behaving like humans is no longer science fiction, but a reality in multiple industries today. As a matter of fact, human society is gradually becoming more reliant on smart machines to solve day-to-day challenges and make decisions. A good example of a simple machine learning use case that has completely permeated our day-to-day lives is the spam filter, which determines whether a message is junk based on how closely it matches emails with a similar tag.

“A.I … is more profound than … electricity or fire”
– Sundar Pichai

However, these basic applications have evolved into 'deep learning', enabling software to complete complex tasks with significant implications for the way business is conducted. In all the hype surrounding these game-changing technologies, the reality that often gets lost amidst both the fears and the headline victories like Cortana, Alexa, Google Duplex, Waymo, and AlphaGo is that AI technologies have several limitations that will still need a substantial amount of effort to overcome. This post explores some of those limitations.

i. Machine Learning Algorithms Require Massive Stores of Training Data

AI systems are ‘trained’, not programmed. This means that they require enormous amounts of data to perform complex tasks at the level of humans. Despite the fact that data is being created at an accelerated pace and the robust computing power needed to efficiently process it is available, massive data sets are not simple to create or obtain for most business use cases. Deep learning utilizes an algorithm called backpropagation that adjusts the weights between nodes to ensure an input translates to the right output. Supervised learning occurs when neural nets are trained to recognize photographs, for example, using millions or billions of previously labeled examples. And every slight variation in an assigned task calls for another large data set to conduct additional training. The major limitation is that neural networks simply require too much ‘brute force’ to function at a level similar to human intellect.
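The weight-adjustment idea behind backpropagation can be shown in miniature: one neuron, one weight, squared error, gradient descent. A real network repeats this update across millions of weights and many layers, which is where the appetite for data and compute comes from.

```python
# Minimal sketch of gradient-based weight adjustment on one neuron.
def train(samples, epochs=200, lr=0.1):
    w = 0.0
    for _ in range(epochs):
        for x, target in samples:
            y = w * x                    # forward pass
            grad = 2 * (y - target) * x  # d(error)/dw for error = (y - target)^2
            w -= lr * grad               # adjust the weight against the gradient
    return w

# The (invented) data follows target = 2 * x, so w should converge to 2.
w = train([(1.0, 2.0), (2.0, 4.0)])
print(round(w, 3))  # -> 2.0
```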

This limitation can be overcome by coupling deep learning with 'unsupervised' learning techniques that don't heavily rely on labeled training data. For example, deep reinforcement learning models learn via trial and error as opposed to via example. The model is optimized over multiple steps by penalizing unfavorable actions and rewarding effective ones.
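Trial-and-error learning can be sketched with an epsilon-greedy agent that estimates the value of two actions purely from observed rewards, with no labelled examples at all; the reward probabilities below are invented and hidden from the agent.

```python
import random

# Sketch of reinforcement-style trial and error: an epsilon-greedy bandit.
random.seed(0)
true_reward = {"a": 0.2, "b": 0.8}   # hidden from the agent
estimates = {"a": 0.0, "b": 0.0}
counts = {"a": 0, "b": 0}

for step in range(500):
    if random.random() < 0.1:                      # explore occasionally
        action = random.choice(["a", "b"])
    else:                                          # otherwise exploit best guess
        action = max(estimates, key=estimates.get)
    reward = 1 if random.random() < true_reward[action] else 0
    counts[action] += 1
    # incremental average: nudge the estimate toward the observed reward
    estimates[action] += (reward - estimates[action]) / counts[action]

print(max(estimates, key=estimates.get))  # -> b: learned from rewards alone
```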

ii. Labeling Training Data Is a Tedious Process

Supervised machine learning using deep neural networks forms the basis for much of today's AI. Labeling is a requisite stage of data processing in supervised learning. This model training style utilizes predefined target attributes from historical data. Data labeling is simply the process of cleaning up raw data and organizing it for cognitive systems (machines) to ingest. Deep learning requires lots of labeled data, and while labeling is not rocket science, it is still a complex task to complete. If unlabeled data is fed into the AI, it is not going to get smart over time. An algorithm can only develop the ability to make decisions, perceive, and behave in a way that is consistent with the environment it will have to navigate if a human has mapped target attributes for it.

To establish what is in the data, a time-consuming process of manually spotting and labeling items is required. However, promising new techniques are coming up, like in-stream supervision, where data is labeled during natural usage. App designers can accomplish this by ‘sneaking in’ features in the design that inherently grow training data. High-quality data collection from users can be used to enhance machine learning over time.

iii. Machines Cannot Explain Themselves

Researchers at MIT hypothesize that the human brain has an intuitive physics engine. This basically means that the information we are able to collect via our senses is noisy and imprecise; nevertheless, we draw conclusions about what we think will likely happen. For decades, common sense has been the most difficult challenge in the field of Artificial Intelligence. A large majority of the AI-based models currently deployed are based on statistical machine learning, which relies on tons of training data to build a statistical model. This is the main reason why adoption of some AI tools is still low in areas where explainability is crucial. A good example is regulation such as the GDPR, which provides for a 'right to explanation'.

Whether the decision is good or bad, having visibility into how and why it was made is crucial, so that human expectations can be brought in line with how the algorithm actually behaves. There are techniques that can be used to interpret complicated machine learning models like neural networks. A nascent approach is Local Interpretable Model-Agnostic Explanations (LIME), which attempts to pinpoint the parts of the input data a trained ML model depends on most to create predictions, by feeding it inputs similar to the original ones and observing how the predictions vary.
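The perturbation idea behind LIME can be sketched as follows: nudge one input feature at a time and watch how a black-box model's output moves. The 'black box' here is a stand-in linear model invented for illustration; real LIME fits a local surrogate model around many random perturbations.

```python
# Sketch of perturbation-based explanation: which feature moves the output most?
def model(features):
    # stand-in black box: secretly driven almost entirely by feature 0
    return 0.9 * features[0] + 0.1 * features[1]

def importance(x, delta=1.0):
    """Score each feature by how much perturbing it shifts the prediction."""
    base = model(x)
    return {i: abs(model(x[:i] + [x[i] + delta] + x[i + 1:]) - base)
            for i in range(len(x))}

scores = importance([1.0, 1.0])
print(max(scores, key=scores.get))  # -> 0: feature 0 dominates the prediction
```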

iv. There is Bias in the Data

As AI and machine learning algorithms are deployed, there will likely be more instances in which potential bias finds its way into algorithms and data sets. In some instances, models that are seemingly performing well may actually be picking up noise in the data. As much as transparency is important, unbiased decision making builds trust. The reliability of an AI solution depends on the quality of its inputs. For example, facial recognition has had a large impact on social media, human resources, law enforcement, and other applications. But biases in the data sets provided to facial recognition applications can lead to inexact outcomes. If the training data is not neutral, the outcomes will inherently amplify the discrimination and bias that lie in the data set. The most effective way to mitigate such risks is by collecting data from multiple random sources. A heterogeneous dataset limits exposure to bias and results in higher quality ML solutions.

v. AI Algorithms Don’t Collaborate

Despite the multiple breakthroughs in deep learning and neural networks, AI models still lack the ability to generalize to conditions that vary from the ones they encountered in training. AI models have difficulty transferring their experiences from one set of circumstances to another. This means that anything a model has achieved for a specific use case will only be applicable to that use case. As a result, organizations are forced to continuously commit resources to train other models, even when the use cases are relatively similar. A solution to this scenario comes in the form of transfer learning, where knowledge obtained from one task is reused in situations where little labeled data is available. As this and other generalized approaches mature, organizations will gain the ability to build new applications more rapidly.



Author: Gabriel Lando

Can Artificial Intelligence (AI) Enrich Content Collaboration? Or Is It Just Lipstick?


Is Artificial Intelligence (AI) the new lipstick? Sure, it is being put on many pigs. Can artificial intelligence improve Enterprise File Sharing and Sync (EFSS), Enterprise Content Management (ECM), and Collaboration? We want to explore whether there are obvious collaboration use cases that can be improved using machine learning. In this article, we will not venture into AI techniques, the impact of AI, or the evolution of AI. We are interested in exploring how EFSS benefits from “machine learning” – a technique that allows systems to learn by digesting data. The key is ‘learning’ – a system that can learn and evolve vs. one that is explicitly programmed. Machine learning is not a new technology; many applications, such as search engines (Google, Bing), already use machine learning.

In the past year, many large players, such as Google, Amazon, and Microsoft, have started offering AI tools and infrastructure as a service. With many of the basic building blocks available, developers can focus on building the right models and applications. Here are a few scenarios in Enterprise File Sharing and Sync (EFSS) and Enterprise Content Collaboration where we can apply machine learning soon.


Search

Search is a significant part of our everyday life. Google and Amazon have made the search box the center of navigation. For instance, a decade ago, the top half of the Amazon homepage was filled with links; it has since been replaced by a search box at the top. However, search hasn't yet taken a significant role in enterprise collaboration. Every day, we search for files that don't fit a simple search criterion. Think of a search that goes 'looking for a design proposal from vendor X I received six months back.' Today, we manually sort through files to find a document that satisfies that criterion. We could use simple query processing, a crawler, and a sophisticated ranker to surface file search results based on estimated relevance. Such a search feature can continue to learn and improve to provide better results each time. Many such machine learning algorithms and techniques are already available to index files, identify relevance, and rank search results by relevance. Hence, applying them to enterprise scenarios requires only a focused effort from solution providers.
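As a rough sketch of relevance ranking for file search, here is a toy TF-IDF-style scorer over file names; the corpus is invented, and a real ranker would index file contents and metadata and learn from click feedback.

```python
import math

# Sketch: rank files against a query by IDF-weighted term overlap.
files = [
    "vendor x design proposal june",
    "holiday schedule 2017",
    "design review notes internal",
]

def score(query, doc, corpus):
    """Sum a smoothed IDF weight for each query term present in `doc`."""
    doc_terms = doc.split()
    n = len(corpus)
    total = 0.0
    for term in query.split():
        if term in doc_terms:
            df = sum(1 for d in corpus if term in d.split())
            total += math.log((n + 1) / (df + 1)) + 1   # rarer terms weigh more
    return total

query = "vendor design proposal"
ranked = sorted(files, key=lambda f: score(query, f, files), reverse=True)
print(ranked[0])  # -> vendor x design proposal june
```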

Predict and organize relevant content

A technique in machine learning called unsupervised learning involves building a model by supplying it with many examples, but without telling it what to look for. The model learns to recognize logical groups (clusters) based on certain unspecified factors, revealing patterns within a data set. Imagine your files being automatically organized based on the projects you are working on, with any file's related files just one click away. Won't such a feature deliver a significant productivity boost?
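A toy version of such clustering: k-means over a single numeric feature. The 'project signal' scores below are hypothetical; a real system would cluster over many features extracted from file content and metadata.

```python
# Sketch: k-means clustering on one feature, no labels supplied.
def kmeans(points, k=2, iters=20):
    centers = points[:k]                       # naive initialisation
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:                        # assign each point to its nearest center
            groups[min(range(k), key=lambda i: abs(p - centers[i]))].append(p)
        # move each center to the mean of its group
        centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
    return groups

# Hypothetical per-file 'project signal' scores; two natural groups emerge.
groups = kmeans([0.1, 0.2, 0.15, 0.9, 0.95, 0.85])
print(sorted(len(g) for g in groups))  # -> [3, 3]
```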


Translation

Collaboration across different languages will be simplified by the many advanced translation tools available today. The Google Cloud Translation API provides a straightforward API to translate a string to and from many languages. Translation of user comments and metadata, such as tags and image information, can be very useful for any large organization that works with partners and vendors across the globe. Combined with machine learning, translation within an enterprise can improve by learning domain knowledge (medical, law, technology, etc.) and internal jargon. Systems can extract the right metadata, apply domain knowledge, and translate content for employees, partners, and customers, so they can easily communicate and collaborate.

User Interface

Interaction with EFSS applications need not be just clicks and text. Users can have more engaging experiences that include conversational interactions; e.g., users could just say "open the sales report that I shared with my manager last week." Personal assistants, such as Siri, Cortana, and Alexa, already provide such conversational interfaces for many personal and home scenarios. Though it sounds complex, some of the underlying technology, such as automatic speech recognition for converting speech to text and natural language understanding, is available through Amazon APIs. Converting the conversation into an actual query might not be as complex as it sounds.

Security and Risk Assessment

Machine learning has an excellent application in monitoring network traffic patterns to spot abnormal activities that might be caused by cyber-attacks or malicious insiders. Solutions like FileCloud use some of these techniques to identify ransomware and highlight potential threats. Similar techniques can identify compliance risks by analyzing whether any documents being shared contain personally identifiable information (PII), such as credit card numbers, or personal health information (PHI). Systems can predict and warn of security risks before a breach happens.

These ideas are just a linear extrapolation into the near future, yet even these simple extrapolations look promising and interesting. Many predict that, within a few years, almost every device and service will have intelligence embedded in it. In the future, the concept of files and folders might be replaced by some other form of data abstraction. As AI and collaboration continue to evolve, the resulting applications may turn out exponentially better than our linear extrapolations, and our current thoughts could appear naïve. Let's hope it doesn't evolve the way Musk fears: "with artificial intelligence, we're summoning the demon."

The Intelligent Cloud: Artificial Intelligence (AI) Meets Cloud Computing


If you thought that mobile communications and the Internet have drastically changed the world, just wait. The coming years will prove to be even more disruptive and mind-blowing. Over the last few years, cloud computing has been lauded as the next big disruption in technology, and true to form, it has become a mainstream element of modern software solutions, just as common as databases or websites. But is there a next phase for cloud computing? Is it an intelligent cloud?

Artificial intelligence (AI) is the type of technology with the capacity not only to enhance current cloud platform incumbents but also to power an entirely new generation of cloud computing technologies. AI is moving beyond simple chat applications like scheduling support and customer service to impact the enterprise in more profound ways, as automation and intelligent systems further develop to serve critical enterprise functions. AI is bound to become ubiquitous in every industry where decision-making is being fundamentally transformed by 'thinking machines'. The need for smarter and faster decision making and the management of big data is the driving factor behind the trend.

Remember Moore’s Law? In 1965, Intel co-founder Gordon Moore observed that the number of transistors per square inch on integrated circuits had doubled each year since their invention. For the next 50 years, Moore’s Law held. In the process, multiple sectors like robotics and biotechnology saw remarkable innovation, because the machines and the computing power they ran on became faster and smaller as the transistors on integrated circuits became more efficient. Now, something even more extraordinary is happening. Accelerating technologies such as big data and artificial intelligence are converging to trigger the next major wave of change. This ‘digital transformation’ will reshape every aspect of the enterprise, including cloud computing.

Artificial intelligence (AI) is expected to burgeon in the enterprise in 2017. Several IT players, including today’s top IT companies, have heavily invested in the space with plans to increase efforts in the foreseeable future.

Despite the fact that AI has been around since the '60s, advances in networking and graphics processing units, along with the demand for big data, have put it back at the forefront of several companies' minds and strategies. Given the recent explosion of data from the Internet of Things (IoT) and applications, and the necessity for quicker, real-time decision making, AI is well on its way to becoming a key differentiator and requirement for major cloud providers.

AI-First Enterprises

In a market that has for the longest time been dominated by four major companies – IBM, Amazon, Microsoft, and Google – an AI-first approach has the potential to disrupt the current dynamic.

“I think we will evolve in computing from a mobile-first to an AI-first world.”

-Sundar Pichai, Chief executive of Google

The consumer world is not new to AI-based systems; products like Siri, Cortana, and Alexa have been making our lives easier for a while now. However, the enterprise applications for AI are completely different. An AI-first enterprise approach should be designed to allow business leaders and data professionals to collect, organize, secure, and govern data efficiently so they can gain the insights they require to become a cognitive business. In order to maintain a competitive advantage, businesses today have to get insights from data; however, acquiring those insights is complex and requires work from skilled data scientists. The ability to make predictions for strategic and tactical purposes has evaded enterprises due to prohibitive resource requirements.

Cloud computing solves the two largest hurdles for AI in the enterprise: abundant, low-cost computing and a means to leverage large volumes of data.


Today, this new breed of AI-as-a-Service (AIaaS) platforms can be applied to all the data that enterprises have been collecting. Major cloud providers are making AI more accessible 'as a service' via open source platforms. For enterprises with an array of complex issues to solve, the need for disparate platforms working together can't be ignored. This is why making machine learning and other variations of AI applications and technology available via open source is critical to the enterprise. By leveraging AI-as-a-Service, businesses can innovate solutions to a vast range of problems.

As machine learning becomes more popular as a service, organizations will have to decide the level at which they want to be involved. While the power of cognitive intelligence is undeniably high, wanting to use it and being able to use it are two completely different things. For this reason, most companies will opt to use a PaaS vendor to manage their entire cycle of data intelligence as opposed to an in-house attempt, allowing them to focus on powering and developing their applications. When looking for an AI provider, you have to ask the right questions. The ideal vendor should be in a position to elucidate both how they handle data and how they intend to solve your specific enterprise problem.

There are multiple digital trends that have the potential to be disruptive; the only way to guarantee smarter business processes, more agility, and increased productivity is by planning ahead for the change and impact that is coming. The main differentiating factor between competing vendors in this space will be how the technology is applied to improve business processes and strategies.

Author: Gabriel Lando
