Human Speechome Project
The Human Speechome Project (pronounced "speech-ome", rhyming with "genome") is being conducted at the Massachusetts Institute of Technology's Media Laboratory by the Cognitive Machines Group, headed by Associate Professor Deb Roy. It is an effort to observe and model, unobtrusively and in great detail, the language acquisition of a single child at his English-speaking home over the first three years of his life. The resulting data is being used to create computational models that could yield further insight into language acquisition. (Roy, D., et al., "The Human Speechome Project", 2006, http://www.media.mit.edu/press/speechome/speechome-cogsci.pdf, accessed 2008-01-03.)
Rationale
Most studies of human speech acquisition in children have been done in laboratory settings and with sampling rates of only a couple of hours per week. The need for studies in the more natural setting of the child's home, and at a much higher sampling rate approaching the child's total experience, led to the development of this project concept.
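The gap between laboratory sampling and the child's total experience can be made concrete with a rough calculation. The "couple of hours per week" figure comes from the text above; the waking-hours estimate is an assumption added here for illustration.

```python
# Rough comparison of laboratory sampling vs. a child's total waking
# experience. The ~2 hours/week figure is from the article; the
# 12-waking-hours-per-day estimate is an assumption for illustration.
lab_hours_per_week = 2
waking_hours_per_week = 7 * 12  # assumed ~12 waking hours per day

coverage = lab_hours_per_week / waking_hours_per_week
print(f"Laboratory sampling covers roughly {coverage:.1%} of waking experience")
# → roughly 2.4%
```

Even under generous assumptions, conventional sampling captures only a few percent of the child's linguistic environment, which is the motivation for near-total in-home recording.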
Methodology
A digital network consisting of eleven video cameras, fourteen microphones, and an array of data capture hardware has been installed in the home of the subject, giving as complete, 24-hour coverage of the child's experiences as possible. The motion-activated cameras are ceiling-mounted, wide-angle, unobtrusive units providing overhead views of all primary living areas. Sensitive boundary layer microphones are located in the ceilings near the cameras. Video resolution is sufficient to capture gestures and head orientation of people, and the identity of mid-sized objects anywhere in a room, but insufficient to resolve the direction of eye gaze and similar subtle details. Audio is sampled at greater than CD quality, yielding recordings of speech that are easily transcribed. A cluster of ten computers and audio samplers with a capacity of five terabytes (Wright, S. H., "Media Lab project explores language acquisition", MIT News Office, 2006, http://web.mit.edu/newsoffice/2006/minding-baby.html, accessed 2008-01-03) is located in the basement of the house to capture the data. Data from the cluster is moved manually to the MIT campus as needed for storage in a one-million-gigabyte (one-petabyte) facility.
Privacy Issues
To give the occupants of the house control over the observation system, eight touch-activated displays have been wall-mounted throughout the house. These allow the occupants to stop and start video and/or audio recording, and also provide an "oops" capability by which they can permanently erase any number of minutes of recording from the system. Motorized "privacy shutters" move to cover the cameras when video recording is turned off, providing natural feedback about the state of the system. On most days, audio recording is turned off throughout the house at night after the child is asleep and turned back on in the morning. Audio and/or video are also often turned off at the discretion of the participants, for example during the adult dinner time.
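The "oops" behavior described above can be sketched as a minimal retroactive-erase buffer, assuming recording is held as per-minute segments. The class name and interface below are hypothetical; the actual system's design is not described in this article.

```python
class OopsBuffer:
    """Hypothetical sketch of the 'oops' erase control: recording is
    held as per-minute segments, and occupants can permanently discard
    the most recent n minutes."""

    def __init__(self):
        self._segments = []  # one entry per recorded minute

    def record_minute(self, segment):
        self._segments.append(segment)

    def oops(self, n_minutes):
        # Permanently delete the most recent n_minutes of recording.
        if n_minutes > 0:
            del self._segments[-n_minutes:]

    def minutes_stored(self):
        return len(self._segments)


buf = OopsBuffer()
for minute in range(5):
    buf.record_minute(f"audio+video for minute {minute}")
buf.oops(2)                  # occupant erases the last 2 minutes
print(buf.minutes_stored())  # → 3
```

The key design point is that erasure is destructive: the segments are removed from storage rather than merely flagged, matching the article's description of permanent deletion.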
Data Analysis Tools
Data is being gathered at an average rate of 200 gigabytes per day. This has necessitated the development of sophisticated data-mining tools to reduce analysis effort to a manageable level, including analysis of audio spectrograms. Transcription of significant speech (everything heard and produced by the child) adds a labor-intensive dimension to the study, and advanced techniques are being developed to cope with this burden. To store the project's data securely, a large storage array is being constructed at the MIT Media Lab in collaboration with Bell Microproducts, Seagate, and Zetera Corporation. ("News Announcement", 2006, http://www.media.mit.edu/press/speechome/speechome-sponsor.pdf, accessed 2008-01-03.)
Modeling Efforts
Building upon earlier efforts of the Cognitive Machines Group, researchers are advancing from simpler modeling of noun-picture relationships to address semantic grounding in terms of physical and social action, and the recognition of intentions. Semi-automated learning of behavior grammars from video data is being advanced to construct a behavior lexicon. Extensions of this work focus on developing a video parser that uses grammars constructed from acquired behavior patterns to infer the latent structure underlying movement patterns. Cross-situational learning algorithms are being developed to learn mappings from spoken words and phrases to these latent structures.
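The cross-situational learning idea mentioned above can be illustrated with a toy sketch: across many situations, each word is mapped to the referent it co-occurs with most often, so ambiguity in any single scene is resolved by accumulating evidence. The data and function below are invented for illustration and do not reflect the project's actual algorithms.

```python
from collections import defaultdict

def cross_situational_map(situations):
    """Map each word to the referent it co-occurs with most often."""
    counts = defaultdict(lambda: defaultdict(int))
    for words, referents in situations:
        for word in words:
            for referent in referents:
                counts[word][referent] += 1
    return {w: max(refs, key=refs.get) for w, refs in counts.items()}

# Each situation pairs the words heard with the referents present.
situations = [
    (["ball"], ["BALL", "DOG"]),   # ambiguous single scene
    (["ball"], ["BALL"]),
    (["dog", "ball"], ["DOG", "BALL"]),
    (["dog"], ["DOG"]),
]
print(cross_situational_map(situations))  # → {'ball': 'BALL', 'dog': 'DOG'}
```

No single situation disambiguates "ball", but the co-occurrence counts across all four situations do, which is the essence of cross-situational learning.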
See also
* MIT Media Lab
* Massachusetts Institute of Technology
External links
* [http://web.media.mit.edu/~dkroy/ Deb Roy's MIT home page]
* [http://technology.newscientist.com/article/dn9167-watch-language-grow-in-the-baby-brother-house.html Article in New Scientist]
* [http://www.wired.com/wired/archive/15.04/truman.html Article in Wired Magazine]
* [http://users.ecs.soton.ac.uk/harnad/Papers/Py104/pinker.langacq.html Language Acquisition], an article by Steven Pinker of MIT (a non-final draft version).