Intelligent Web Mining

Menasalvas, Ernestina; Marbán, Oscar; Millán, Socorro; Peña, Jose M.

doi:10.1007/978-3-7908-1772-0_22

Menasalvas, Ernestina ¹
Marbán, Oscar ²
Millán, Socorro ³
Peña, Jose M. ¹

1 Universidad Politécnica de Madrid, Madrid, Spain (ROR https://ror.org/03n6nwv02)
2 Universidad Carlos III de Madrid, Madrid, Spain (ROR https://ror.org/03ths8210)
3 Universidad del Valle (Colombia), Santiago de Cali, Colombia (ROR https://ror.org/00jb9vg53)


Book:

Intelligent Exploration of the Web. Studies in Fuzziness and Soft Computing, vol 111

Publisher: Physica

ISSN: 1434-9922, 1860-0808

ISBN: 9783790825190, 9783790817720

Year of publication: 2003

Pages: 363-388

Type: Book chapter

DOI: 10.1007/978-3-7908-1772-0_22

Abstract

Explosive growth in the size and usage of the World Wide Web has made it necessary for Web site administrators to track and analyze the navigation patterns of Web site visitors. However, data mining techniques are not easily applicable to Web data, owing to problems related both to the technology underlying the Web and to the lack of standards in the design and implementation of Web pages. Information collected by Web servers and kept in the server log is the main source of data for analyzing user navigation patterns.

Once logs have been preprocessed and sessions have been obtained, several kinds of access pattern mining can be performed, depending on the needs of the analyst. Notably, most efforts have relied on relatively simple techniques, which can be inadequate for real user profile data because noise in the data must be tackled first. Thus, there is a need for robust methods that integrate different intelligent techniques and are free of any assumptions about the noise contamination rate.

In this paper, the problem of mining behavior patterns on the Web is studied in detail, and different approaches to solving it are analyzed. An algorithm is given to calculate frequent access patterns. The algorithm is based on a model structure called the WPC-Tree, which stores in each node relevant information about pages, making it possible to apply data mining techniques to obtain useful patterns.
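The abstract does not spell out what the WPC-Tree stores in each node, so the following is only a minimal Python sketch of the general pipeline it describes (sessionize the log, build a tree of navigation paths, extract frequent access patterns), under stated assumptions: log records are reduced to hypothetical `(user, timestamp, page)` tuples, sessions are split with a 30-minute inactivity timeout (a common heuristic, not taken from the chapter), and a plain prefix tree with one session counter per node stands in for the WPC-Tree. All function and class names are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Assumption: a session ends after 30 minutes of inactivity (in seconds).
SESSION_TIMEOUT = 30 * 60


def sessionize(records, timeout=SESSION_TIMEOUT):
    """Split (user, timestamp, page) records into per-user page sequences.

    A new session starts when the gap between two consecutive requests
    of the same user exceeds `timeout` seconds.
    """
    by_user = defaultdict(list)
    for user, ts, page in records:
        by_user[user].append((ts, page))
    sessions = []
    for user in sorted(by_user):
        current, last_ts = [], None
        for ts, page in sorted(by_user[user]):
            if last_ts is not None and ts - last_ts > timeout:
                sessions.append(current)
                current = []
            current.append(page)
            last_ts = ts
        if current:
            sessions.append(current)
    return sessions


@dataclass
class Node:
    page: str
    count: int = 0  # number of sessions that start with the root-to-node path
    children: dict = field(default_factory=dict)


def build_tree(sessions):
    """Insert every session into a prefix tree of pages (a simplified
    stand-in for the WPC-Tree, not the chapter's structure)."""
    root = Node(page="")
    for session in sessions:
        node = root
        for page in session:
            node = node.children.setdefault(page, Node(page))
            node.count += 1
    return root


def frequent_paths(root, min_support):
    """Return sorted (path, count) pairs for every root-anchored path that
    occurs in at least `min_support` sessions."""
    result = []
    stack = [(child, (child.page,)) for child in root.children.values()]
    while stack:
        node, path = stack.pop()
        if node.count < min_support:
            continue  # prune: no descendant can have a higher count
        result.append((path, node.count))
        for child in node.children.values():
            stack.append((child, path + (child.page,)))
    return sorted(result)


# Usage with a tiny hypothetical log.
records = [
    ("u1", 0, "/index"), ("u1", 60, "/products"), ("u1", 120, "/cart"),
    ("u2", 10, "/index"), ("u2", 70, "/products"),
    ("u1", 5000, "/index"),  # gap > 30 min: starts a new u1 session
]
sessions = sessionize(records)
root = build_tree(sessions)
print(frequent_paths(root, min_support=2))
# [(('/index',), 3), (('/index', '/products'), 2)]
```

Because a node's count can never exceed its parent's, the traversal can discard a whole subtree as soon as a node falls below the support threshold. The WPC-Tree described in the chapter keeps its own per-node page information, which this sketch does not attempt to reproduce; noise handling, which the abstract stresses, is also left out.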

