Understanding Instructional Support Needs of Emerging Internet Users for Web-based Information Seeking
Naman K. Gupta
Language Technologies Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh PA, 15213
Carolyn Penstein Rosé
Language Technologies Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh PA, 15213
As the wealth of information available on the Web increases, Web-based information seeking becomes an increasingly important skill for supporting both formal education and lifelong learning. However, Web-based information access poses hurdles that must be overcome by certain student populations, such as low English competency users, low literacy users, or what we will refer to as emerging Internet users, since the bulk of information available on the Web is provided in a small number of high profile languages such as English, Korean, and Chinese. These issues continue to be problematic despite research in cross-linguistic information retrieval and machine translation, since these technologies are still too brittle for extensive use by these user populations for the purpose of bridging the language gulf. In this paper, we propose a mixed-methods approach to addressing these issues specifically in connection with emerging Internet users, with data mining as a key component. The emerging Internet users targeted in our investigation are rural children who have recently become part of a technical university student population in the Indian state of Andhra Pradesh. As Internet penetration increases in the developing world and at the same time populations shift from rural to urban life, such populations of emerging Internet users will be an important target for scaffolding and educational support. In this context, in addition to using the Internet for their own personal information needs, students are expected to be able to receive assignments in English and use the Web to meet information needs specified in their assignments. Thus, we began our investigation with a small, qualitative study in which we investigated in detail the problems faced by these students when responding to search tasks that were given to them in English.
We first present a qualitative analysis of the result write-up of an information-seeking task along with some observations about the corresponding search behavior. This analysis reveals difficulties posed by the strategies students were observed to employ to compensate for difficulties understanding the search task statement. Based on these specific observations, we ran an extensive controlled study in which we manipulated both characteristics of the search task and the manner in which it was presented (i.e., in English only, in the native language of Telugu only, or in both English and the native language). One important contribution of this work is a dataset from roughly 2,000 users including their pre-search response to the task statement, a log of their click behavior during search, and their post-search write-up. A data mining methodology is presented that allows us to understand more broadly the difficulties faced by this student population as well as how the experimental manipulation affected their search behavior. Results suggest that using machine translation for the limited task of translating information seeking task statements, which is more feasible than translating queries or large-scale translation of search results, may be beneficial for these users depending on the type of task. The data mining methodology itself, which can be applied as an assessment technique for evaluating search behavior in subsequent research, is a second contribution. Finally, the findings from statistical analysis of the study results and data mining are a third contribution of the work.
Key Words and Phrases: Personalization, Emerging Internet Users, Nonnative English speakers, Web-log Analysis, Information Seeking Task, Search strategies.
Process mining during inquiry based learning has been an area of interest within the Educational Data Mining community [Howard et al., 2010; Montalvo et al., 2010; Bachmann et al., 2010; Jeong et al., 2010]. While many special purpose inquiry learning environments exist and have been evaluated in this literature using these techniques, naturalistic inquiry learning typically happens on the Web. In this article we expand the focus of this work to include process mining of inquiry learning on the Web. With the rise of the Internet, both in terms of opportunities for on-line social interactions and learning oriented discussions as well as web-based information seeking, more and more learning, both formal and informal, is taking place on-line. Many institutions of higher learning, including that of the authors, require undergraduate students to participate in formal instruction related to information literacy in general, and to Web based information seeking specifically. Consistent with this vision, web based information seeking is one of the skills discussed as part of the agenda presented in the Roadmap for Education Technology [Woolf et al., 2010, pp 12-13]:
“Students need support … in using a variety of exploratory and inquiry tools. This can be accomplished through agents, simulations and artificial intelligence methods that … scaffold learners and support student exploration. Students should be supported to search for a wide variety of information, connect to the real world, gather and analyze data, and communicate through a variety of social channels.”
The Web provides an immense resource for adult learning, which is frequently motivated by real life needs, as well as for classroom learning, which is frequently initiated by formal assignments. The specific student population investigated in this article is being trained to use web-based information seeking technology in an Information Technology course that all first-year students at their institution are required to take. In that context, learning to use this technology is both a formal learning objective and a tool to support both formal and informal learning. However, it is not yet clear who the successful Web information searchers are or what makes them successful, and it is even less clear how to support students in becoming good information seekers. There is much evidence that struggles with poor information seeking and information management skills persist into adulthood [Rice et al., 2001; Grassian & Kaplowitz, 2001; Neely, 2006], and are exacerbated within low-literacy and less educated populations. Because of this, information literacy education is increasingly a topic of focus within the education community. In this paper we present a methodology for investigating this phenomenon, which makes heavy use of large scale data analysis and data mining. We present an example application of our methodology to a unique user population, which we refer to as emerging Internet users, and offer new insights into this timely problem as well as a broader vision for applying these findings towards technology for supporting students in becoming more effective information seekers.
So far, work related to web based information seeking has taken place primarily within the developed world, as evidenced by the rarity of papers targeting the developing world at information retrieval conferences such as SIGIR. However, as Internet penetration begins to make its way into less developed regions, different support needs will become salient. In order for the ideal of lifelong learning to be realized in the developing world, support for information seeking that targets the unique needs of students within these communities of emerging Internet users is needed.
The users in our study are unique among the populations examined in previously published studies of search behavior [Agichtein et al., 2006; Duggan and Payne 2008; White et al., 2009]. They are 11th grade students from the Indian state of Andhra Pradesh who have come to study at a university developed as an outreach to the rural youth of that state. Students at this university were selected because they were the top ranking students in the village schools they came from. Many of these students had never seen or used a computer before coming to the campus. Furthermore, although they had studied English for 10 years prior to coming to the campus, more than half of them had done their schooling primarily in their mother tongue, Telugu. Each student who comes to study at this university is given a laptop, and most of the instruction is delivered in a computer supported fashion. All of its courses are conducted purely with English as the medium of instruction. Thus, the students who come to the campus are faced with two major challenges. First, they must adapt to the computer-based infrastructure, and second, they must adapt to English-based instruction. At the time of the study, the students were about to complete their first year at the university. During this year, they were provided Internet access for a short period of 1 month. As a result, most students had only minimal experience with searching on the Web.
As universities in the developing world begin to reach further into rural communities and provide opportunities for students from those communities to obtain a quality education and move to a more developed part of the country, the specific needs of these types of students will need to be addressed.
In the remainder of the paper we first present the theoretical foundation for this research, including a review of prior work in the area and a broad vision for how the findings from this research can be applied to instructional interventions. We then present the details of a small scale qualitative study we conducted in order to investigate up close the difficulties this student population has with information seeking tasks presented in English. From this we develop hypotheses about possible supportive technologies. We test these hypotheses in a large scale experimental study, which provides a large corpus for our data mining experiments. We present the mixed methods approach, with data mining as a key component, as one contribution of the work, and the data set itself as another. Finally, we provide an in depth analysis of the data, which provides additional insight into the difficulties this user population has with Web based information access as well as the potential impact of a proposed form of support. We conclude with a discussion of the limitations of this work and directions for continued research.
2. Theoretical Background
Assessments of information literacy skills at the college level show that even at top tier universities adults continue to struggle with information literacy skills, which include the ability to find information to meet specific needs [Rice et al., 2001; Grassian & Kaplowitz, 2001; Neely, 2006]. At the other end of the spectrum are individuals who have not had the benefit either of training with information technology or of sufficient grounding in literacy skills. For example, individuals from rural regions in developing countries who are learning to use information technology struggle much more [Hölscher & Strube, 2000; Brandt & Uden, 2003; Jenkins, Corritore, & Wiedenbeck, 2003]. Only relatively recently has substantial effort been invested in design issues related to information access for the emerging market of users in developing regions who are transitioning from rural life to urban life. In this article, we build on this recent work as well as on work on web based information seeking more generally and current work in the field of information retrieval. Beyond that, we explain why we believe this problem is ripe for educational data mining.