The purpose of this report is to discuss both the advantages and disadvantages of using Business Intelligence within a business, and to discuss the algorithms that could be used for data mining, which allows the discovery of information about existing or potential future customers. By the end of this report I aim to make clear the advantages of incorporating these tools and techniques within the business, and the benefits that will be seen.
Business Intelligence (Business Intelligence, 2007) is a collection of various technologies and tools used for collecting, organising and analysing data and information, and then presenting that information to the user in a form that helps them make business decisions. There are three major parts to Business Intelligence: Reporting, Integration and Analysis. Reporting is essentially the creation and use of reports, while Integration is about taking data from one source and modifying it to fit another purpose and data source. Finally, Analysis is the production and organisation of structures that have been filled with data taken from a separate source; tools such as OLAP (Online Analytical Processing) (OLAP, n.d.) are commonly used to achieve this. This process is often referred to as Data Mining.
Using Business Intelligence has numerous advantages and is something that every company should consider. One of its most obvious advantages is that it can help show trends and correlations in statistics (e.g. user activity, sales and complaints), which businesses can then use in order to improve. Another considerable advantage is the reliability of the presented information, which allows for relatively accurate predictions and greatly improves planning.
It should be noted, however, that there are some disadvantages to using Business Intelligence. One is that the historical data that is recorded needs to be stored somewhere, which takes up more storage space; this means not only higher storage costs but also slower analysis, as there will be a huge amount of data to process. Another notable disadvantage is the potentially high initial cost, as well as the maintenance cost; although these costs should pay for themselves through improved decision making, there is a possibility of the investment not paying off. There are not many disadvantages to using Business Intelligence, but they should still be taken into consideration (Disadvantages of Business Intelligence, n.d.).
A good example of Business Intelligence being used by a recognisable company is Netflix (Business Intelligence, 2015), the online media streaming service, which uses Business Intelligence to work out which shows will be popular and which of its categories may need reworking. This gives Netflix the information it needs to stay ahead of the curve and to make sure the shows that remain on the site are popular.
With computers being used more and more within businesses, the information a business needs to function (e.g. sales records, customer information) is also stored on these computers. The ability to scan and analyse these massive amounts of information is therefore incredibly beneficial, not only for making business decisions but also for predicting sales trends and identifying areas in need of improvement. There is a wide range of Data Mining algorithms available; the ones discussed here are the Decision Tree, Bayesian Classification and K-Means. I have chosen to discuss and compare these three as they are quite different in how they operate.
One of the most commonly used Data Mining algorithms is the Decision Tree (Decision Tree Algorithm, n.d.). At the top of the decision tree is the root, which is essentially a check on an attribute, and the possible answers to that check form the branches. Each leaf of the tree holds a class label. The advantage of this algorithm compared to the others is that it requires no prior knowledge of the domain in order to function; the other huge advantage, which makes it an attractive solution, is that it is very easy to follow and understand compared to more complex algorithms. The complexity of a decision tree can be measured by the number of leaves it has. This algorithm is a form of supervised learning, which means that the training data is already labelled with classes.
(Image taken from Decision Tree Algorithm, n.d.)
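As a rough illustration of how the root check of a decision tree might be chosen, the sketch below picks the attribute with the highest information gain. This selection rule is an assumption made for the example (it is the approach used by ID3-style trees), and the donor records, attribute names and the helper names entropy, information_gain and best_root are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Drop in entropy of `target` when `rows` are split on `attribute`."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_root(rows, attributes, target):
    """Attribute with the highest information gain, used as the root check."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical, already labelled donor records ("pledged" is the class label).
donors = [
    {"category": "gaming", "returning": "yes", "pledged": "yes"},
    {"category": "gaming", "returning": "no", "pledged": "yes"},
    {"category": "film", "returning": "yes", "pledged": "no"},
    {"category": "film", "returning": "no", "pledged": "no"},
    {"category": "gaming", "returning": "yes", "pledged": "yes"},
    {"category": "film", "returning": "yes", "pledged": "no"},
]

root = best_root(donors, ["category", "returning"], "pledged")
print(root)
```

In this made-up data the project category separates the two classes perfectly, so it is selected as the root; a full tree would repeat the same step on each branch until every leaf holds a single class label.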
The second commonly used Data Mining algorithm is Bayesian Classification (Bayesian Classification, n.d.), which works by predicting the probability that a pattern or set of information belongs to a specific class. It is often favoured among Data Mining techniques for its efficiency, although if the data is highly random then another algorithm would be preferred. It is also not recommended for small data sets, as this can mean very low precision as well as recall. Although this algorithm might seem simple, it can also be highly accurate and is often used in filtering software (email spam, language filters). Bayesian Classification is a supervised learning algorithm, as the user provides it with an already labelled dataset.
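To make the idea of predicting class probabilities more concrete, here is a minimal sketch of a naive Bayes text filter. It assumes word counts as features and add-one (Laplace) smoothing, neither of which is specified in this report, and the training messages and the function names train and predict are hypothetical.

```python
from collections import Counter, defaultdict
from math import log


def train(samples):
    """Count classes and words from a list of (words, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in samples:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def predict(model, words):
    """Return the label with the highest log-probability for `words`."""
    class_counts, word_counts, vocabulary = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = log(count / total)  # prior probability of the class
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word in words:
            # Add-one smoothing so an unseen word never gives zero probability.
            score += log((word_counts[label][word] + 1) / denominator)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical labelled messages (the labelled dataset supplied by the user).
training = [
    (["win", "free", "money"], "spam"),
    (["free", "prize", "now"], "spam"),
    (["project", "update", "thanks"], "ham"),
    (["meeting", "project", "now"], "ham"),
]

model = train(training)
print(predict(model, ["free", "money"]))
print(predict(model, ["project", "update"]))
```

With only four training messages this is a toy example; as noted above, a data set this small would not give reliable precision or recall in practice.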
The third algorithm which should be considered for the Crowd Funding System is K-Means (k-means, n.d.). This algorithm works by dividing a set of objects into groups so that the members of each group are more similar to one another than to members of other groups; this approach is often referred to as 'Cluster Analysis'. Cluster Analysis is a collection of different algorithms which all follow the same pattern (Clusters, n.d.): they create groups (or clusters) in such a way that the members of a cluster are much more similar to each other than to objects outside it. K-Means is generally classed as unsupervised learning: although the user states the number of clusters needed, the algorithm learns which cluster each object belongs to without the user providing labels or any further information.
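The grouping step described above can be sketched in a few lines. This is a simplified version of K-Means under stated assumptions: the first k points are used as the starting centroids (real implementations usually pick them randomly), the features are not rescaled, and the donor data of (age, average pledge) pairs is invented for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(cluster):
    """Centroid (coordinate-wise mean) of a non-empty list of points."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def assign(points, centroids):
    """Place each point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for point in points:
        nearest = min(
            range(len(centroids)),
            key=lambda i: squared_distance(point, centroids[i]),
        )
        clusters[nearest].append(point)
    return clusters


def kmeans(points, k, iterations=10):
    """Alternate between assigning points and moving centroids."""
    centroids = list(points[:k])  # simplistic starting centroids (assumption)
    for _ in range(iterations):
        clusters = assign(points, centroids)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            mean_point(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, assign(points, centroids)


# Hypothetical donors described by (age, average pledge).
donors = [(22, 10), (25, 15), (24, 12), (58, 200), (61, 180), (60, 220)]

centroids, clusters = kmeans(donors, k=2)
for cluster in clusters:
    print(sorted(cluster))
```

The only input besides the data is the number of clusters; no labels are supplied, and the algorithm works out the two groups of donors by itself.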
I have compared three algorithms: a decision-tree-based one, a clustering-based one and a naïve Bayesian one. My recommendation for use with the Crowd Funding Software is the Decision Tree, for a number of reasons. The first is that it is extremely easy to follow, even for someone with no prior knowledge of the algorithm. Because it is easy to follow and understand, it is also easy to maintain and tweak depending on the circumstances. Another major reason I would choose the Decision Tree is that it works quickly and is non-parametric. Non-parametric means that the algorithm does not need the data to follow a specific distribution in order to function.
Data-mining advantages and disadvantages
The main advantage of using Data Mining for the Crowd Funding System is that it could use 'Affinity Analysis' (Affinity Analysis, n.d.). This is essentially a scan of all of a customer's previous shopping history, which then makes it possible to advertise to them directly. This applies to the Crowd Funding System as we can use data mining to find out which projects a customer prefers and then advertise those projects directly to them (e.g. if a particular user often supports Gaming Software projects on the webpage, we can use this information to show Gaming Software projects at the top of their home page). Affinity Analysis can often also be used to detect fraud, which is useful for any company.
Another advantage that this business can gain from Data Mining is Customer Segmentation. This is the process of breaking the customers down into smaller groups based on, say, age, occupation or even gender. The advantage of doing this is that you can then target your advertising at people who will be highly interested, and the more effective the advertising, the more money people will donate to the projects. This applies directly to the Crowd Funding System's first example: using customer segmentation, the film writer / director will be able to advertise her project to all her previous fans, or even to people who are interested in that genre, which means she can reach a much more interested user base.
The other huge advantage of Data Mining that can be applied to the CFS is Sales Forecasting. This is exactly what it sounds like: it uses previous sales records to provide relatively accurate predictions of future sales.
The system can use this for the second example, the Kinect mobile phone battery: if it can predict how many donations the project is going to get, it can either boost the project's advertising or let the user know that previous similar projects were not able to reach their goal, or at least point out where those projects went wrong.
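As a very rough sketch of the forecasting idea, the example below fits a straight line to the pledges recorded so far and projects it to the end of the campaign. The use of a simple least-squares trend is an assumption made for illustration; the figures, the 30-day campaign length, the goal and the function names fit_line and forecast_total are all hypothetical, and at least two days of data are needed.

```python
def fit_line(values):
    """Least-squares slope and intercept for values observed on days 0, 1, 2, ..."""
    n = len(values)
    days = range(n)
    mean_day = sum(days) / n
    mean_value = sum(values) / n
    slope = sum(
        (d - mean_day) * (v - mean_value) for d, v in zip(days, values)
    ) / sum((d - mean_day) ** 2 for d in days)
    intercept = mean_value - slope * mean_day
    return slope, intercept


def forecast_total(cumulative_pledges, campaign_days):
    """Project the cumulative pledge total on the last day of the campaign."""
    slope, intercept = fit_line(cumulative_pledges)
    return intercept + slope * (campaign_days - 1)


# Hypothetical cumulative pledges for the first five days of a 30-day campaign.
pledges_so_far = [100, 200, 300, 400, 500]
goal = 5000

predicted = forecast_total(pledges_so_far, campaign_days=30)
on_track = predicted >= goal
print(predicted, on_track)
```

If the projection falls short of the goal, as it does with these invented numbers, the system could respond by boosting the project's advertising or warning the project owner early.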
One of the concerns the current business has is damage to its reputation. Using data mining techniques, it will be able not only to boost donations and improve its advertising, but also to learn more from its customers, which can only be beneficial for the company. Donor exhaustion was also on the company's list of concerns. Data mining can help to prevent this because it can be used to keep track of which advertisements have been sent to whom and which projects each user is likely to bid on, so routinely changing the projects advertised to each user should help to keep them interested.
CRM stands for Customer Relationship Management (CRM, n.d.) and is used by businesses to keep their customers happy; it uses data mining techniques in order to gather feedback and constantly improve products. The data mining algorithms discussed earlier are extremely useful for gathering and analysing data about customers and their opinions on projects. We can then use this information to make improvements or changes where they are needed, which will greatly increase customer satisfaction as customers will be able to see the changes they wanted. However, it is recommended to only try this with a vast amount of data and a huge number of transactions, as smaller amounts of data can produce inaccurate information. Using CRM will greatly improve the Crowd Funding Company's reputation and mean it has many more satisfied donors.
In conclusion, I strongly recommend that the Crowd Funding System includes data mining algorithms. Data mining has a long list of advantages including sales prediction, improved advertising and, most importantly, improved customer satisfaction. I would also highly recommend the use of the decision tree algorithm as it is easy to follow and can easily be modified depending on the information that needs to be collected. It should be noted that the choice of data source is important: some sources may provide useful information, but there are quite a few that should be ignored. A CRM should also be taken into consideration, as using this software can greatly improve the public's opinion of a business. A modern business can't afford not to use these data mining techniques, as failing to utilise these tools will mean a huge disadvantage against its competitors. The more information that can be collected from this company's customers, the more value the company can provide them, and the happier the customers, the more donations will be made.
Affinity Analysis. (n.d.). Retrieved from https://en.wikipedia.org/wiki/Affinity_analysis
Bayesian Classification. (n.d.). Retrieved from https://www.tutorialspoint.com/data_mining/dm_bayesian_classification.htm
Business Intelligence. (2007, March 6). Retrieved from http://www.cio.com/article/2439504/business-intelligence/business-intelligence-business-intelligence-definition-and-solutions.html
Business Intelligence. (2015, February 26). Retrieved from http://businessintelligence.com/big-data-case-studies/data-driven-proof-netflix-needs-buy-blockbuster/
Clusters. (n.d.). Retrieved from https://en.wikipedia.org/wiki/Cluster_analysis
CRM. (n.d.). Retrieved from http://searchcrm.techtarget.com/definition/CRM
Decision Tree Algorithm. (n.d.). Retrieved from http://www.saedsayad.com/decision_tree.htm
Disadvantages of Business Intelligence. (n.d.). Retrieved from http://business.mapsofindia.com/business-intelligence/disadvantages.html
k-means. (n.d.). Retrieved from https://en.wikipedia.org/wiki/K-means_clustering
OLAP. (n.d.). Retrieved from http://olap.com/olap-definition/
BLOB is used so that users can store their video sales pitches within the database. After some research I realised there wasn't a dedicated media storage format, so the videos have to be stored in binary instead.