There are many ways to search in a scanned PDF for some text.

Let's review: the SearchResearch Challenge for this week is meant to give you an additional powerful tool for importing scanned documents and making them findable.

1. How can you transform this document ( LINK) into something that you can search within?
2. Once you've done that, can you determine how many times the authors refer to "multiple documents" in that paper? (This was my original search task: finding interesting papers about how people read multiple documents in the same reading session.)

So this Challenge is really about "tool finding" - can you figure out how to convert from a scanned document into a readable / findable / searchable one?

As we've talked about before, taking a scanned document and converting the scan into recognizable text is called "Optical Character Recognition," or OCR, so I'm going to use that in my query. I also remembered that Google Docs had some OCR capability, so my first query was:

Which led me to a lovely Help Center article about how to import a PDF file into your Google Drive, then open it with Docs.

As you can see in the above image, you can't even Control-F for the title of the document: there are zero hits for the title. That's when I noticed that much of the first page of text had NOT been recognized! Huh. IF the OCR process had been accurate, it certainly would have located the title of the paper (which is just a few lines below).
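
Once a scanned document does have a text layer, the Control-F check and the "how many times" count from the Challenge can also be done in a few lines of code. Here is a minimal sketch in Python, assuming you've already exported the OCR'd text to a plain string. The function names and the sample text are made up for illustration; they are not taken from the paper or from any Google tool.

```python
import re


def normalize(text: str) -> str:
    """Lowercase the text and smooth over common OCR line-break damage."""
    # Rejoin words split by a hyphen at a line break: "docu-\nments" -> "documents".
    # Caveat: this also glues genuinely hyphenated words that happen to break there.
    text = re.sub(r"-\s*\n\s*", "", text)
    # Collapse every run of whitespace (including newlines) into one space.
    text = re.sub(r"\s+", " ", text)
    return text.lower().strip()


def count_phrase(text: str, phrase: str) -> int:
    """Count non-overlapping, case-insensitive occurrences of phrase in text."""
    return normalize(text).count(normalize(phrase))


# Hypothetical OCR output, NOT text from the actual paper.
sample = (
    "Reading Multiple\nDocuments at once.\n"
    "Readers of multiple docu-\nments compare sources."
)

print(count_phrase(sample, "multiple documents"))  # 2
```

A count of zero for a phrase you can plainly see on the page (like the title) is a good hint that the OCR step missed that region, which is exactly what happened above.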