Semantic Shopping Monkeys

July 3rd, 2008

Yahoo! SearchMonkey, Yahoo!’s initiative to open up its search engine so that third-party developers can add more flavor to its currently text-only search results, sounded like such a great idea that we jumped in as soon as we heard about it. is the only one in the world, as far as we know, that has the Semantic Web technology to scrape, uh, I mean, extract the product image and price from any product page URL on any merchant site. So wouldn’t it make a lot of sense to use this technology to show these product images and prices right on the first page of search results? One of our SearchMonkey applications, ShoppingNotes Enhanced Result, does just that. To see it in action:

  1. Click on this link
  2. Click on Add Enhanced Results on the right to allow ShoppingNotes Enhanced Result to be shown in Yahoo! search results. You will be asked to sign in to a Yahoo! account if you haven’t already.
  3. Search for products on Yahoo! Search, especially “soft goods” (clothing, shoes, bags, etc.), for example donna karan dress, dior bag, christian louboutin sandal, giuseppe zanotti sandal, and so on.

If your search results include URLs from any of the more than 80 merchant sites (see here for the full list) that we currently support for SearchMonkey, you will see our cute little logo below those results, which means ShoppingNotes Enhanced Result has been triggered to show the product images and prices.

Due to SearchMonkey’s current limitations, the first time you do a search, ShoppingNotes Enhanced Result may be folded up, and you’ll need to click on the little blue downward triangle to pull it down. If you repeat that search, the product images and prices should show up automatically, without your having to click through the results one by one, as in the picture below:

So what are the benefits? Well, they’re obvious. This SearchMonkey application combines a search engine’s strengths in ranking, relevance, and authority with a comparison shopping engine’s richer product information on a single search results page. Product images and prices make search results more useful and attractive, and therefore encourage click-throughs. Compare these enhanced results with the same searches on Yahoo! Shopping or other comparison shopping engines, and you’ll see that comparison shopping engines are still pretty weak and incomplete for these “soft goods” searches.

Today far more people still use search engines than comparison shopping engines when they shop. Enhancements to search results like this one benefit essentially the entire Yahoo! Search user base. What’s more, merchant sites don’t need to add semantic tags (microformats, RDF, etc.) to their existing web pages or tweak their existing systems. It just works with what we have today, not with what we might have someday. The list of supported merchant sites can also be easily extended to any merchant site simply by adding more trigger URL patterns. There is no need to redo the scraping, uh, information extraction work for each new merchant site, as automatically takes care of that.
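To make the “just add more trigger URL patterns” point concrete, here is a minimal Python sketch of how a search-result URL might be checked against a list of merchant trigger patterns. The glob-style syntax, the placeholder domains, and the helper names `is_triggered` and `add_merchant` are all hypothetical; SearchMonkey’s actual trigger format is not reproduced here.

```python
from fnmatch import fnmatchcase

# Hypothetical glob-style trigger patterns, one or more per supported merchant.
# SearchMonkey's real trigger syntax may differ; these domains are placeholders.
TRIGGER_PATTERNS = [
    "http://www.example-shoes.com/*",
    "http://*.example-dresses.com/product/*",
]


def is_triggered(url: str, patterns: list[str]) -> bool:
    """True if a search-result URL matches any merchant trigger pattern."""
    return any(fnmatchcase(url, pattern) for pattern in patterns)


def add_merchant(patterns: list[str], pattern: str) -> list[str]:
    """Supporting a new merchant means one more pattern; no per-site extraction rules."""
    return patterns + [pattern]


if __name__ == "__main__":
    results = [
        "http://www.example-shoes.com/christian-louboutin-sandal",
        "http://shop.example-dresses.com/product/donna-karan-dress",
        "http://www.example-bags.com/dior-bag",
    ]
    for url in results:
        print(url, is_triggered(url, TRIGGER_PATTERNS))
    extended = add_merchant(TRIGGER_PATTERNS, "http://www.example-bags.com/*")
    print(results[2], is_triggered(results[2], extended))
```

The point of this design is that onboarding a merchant touches only the pattern list; the extraction step itself stays the same for every site.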

We also have another SearchMonkey application, ShoppingNotes Combo, which shows product images and prices plus two convenient links: one to save to My Shopping Notes and one to set price alerts. To try it out, follow the same three steps above using this link. Because this application contains links pointing to a domain different from those of the URLs in the search results, SearchMonkey requires that you click on the little blue downward triangle to pull it down.

We know the current implementation may not be perfect, due to SearchMonkey’s limitations on applications that call outside web services (as ours does). Ideally our scraper, uh, information extractor would run at the time the search engine crawler fetches a copy of each URL, but that would make the whole thing much more complicated. Nevertheless, SearchMonkey is off to a pretty good start. I enjoyed working on SearchMonkey applications, and I want to give the SearchMonkey team a big round of applause for their innovation and execution (I was told the SearchMonkey project went from idea to completion in only 6 months), especially Amit Kumar, the SearchMonkey product manager, who always responds quickly in the developer group, and Paul Tarjan, the Chief Technical Monkey, who replied to my post at around 2:00 in the morning.

With Microsoft’s just-announced acquisition of Powerset, a Semantic Web search engine, I guess we are beginning to see more and more Semantic Web technology applied to search. We at are proud of our Semantic Web technology, and we have just shown one way it can be used to improve search. I think we are in an interesting position, since everything we do is about shopping. After all, as Michael Arrington puts it, e-commerce searches are all that matters, aren’t they?

Update: Yahoo! just announced BOSS, which opens up even more of their search platform. Looks like Yahoo! is serious about this open approach. Bravo!

Manual Trackback: We just got featured first in “The Latest Cool SearchMonkey Apps” on the official Yahoo! Search blog. Thanks!


Yahoo! is launching its search developer platform, SearchMonkey (I don’t know why they named it monkey. Maybe because monkeys have some intelligence, but only enough to handle trivial tasks?!). Early responses from the blogosphere seem quite positive, so I signed up for an account and played around with it a little. A couple of thoughts came to mind:

First, it’s a really cool concept! This will surely improve search results significantly. In fact, search engines already show this type of metadata for queries about maps, stock tickers, celebrities, etc., with Google OneBox and Yahoo! Shortcut. Now SearchMonkey is taking this concept one step further to cover broader subjects and more web sites. Just imagine someday getting Google OneBox-style results for many more queries.

On the technical side, however, SearchMonkey uses a DOM-based approach (XPath) for data extraction (a.k.a. scraping). That means SearchMonkey will have the same disadvantages that all DOM-based approaches are born with. For example, it requires a person (or monkey) to write new XPath expressions for each new site added. Even for sites you have handled before, you still need to keep coming back to rewrite them whenever their HTML layouts change. With SearchMonkey, Yahoo! seems to be trying to enlist and organize an army of people (or monkeys, against Google’s army of robots, I guess) experienced in scraping, and to make their work shareable across the community. It is basically an open-source, teamwork approach.
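Here is a toy illustration of that maintenance problem, assuming a hypothetical merchant page written as XHTML so that Python’s standard `xml.etree.ElementTree` (which supports a limited XPath subset) can parse it. This is not SearchMonkey’s API; it only shows how a hand-written per-site XPath rule stops matching once the merchant renames a class or restructures its markup.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Hypothetical product page, version 1 (well-formed XHTML so ElementTree can parse it).
PAGE_V1 = """<html><body>
  <div id="product">
    <img class="main-photo" src="/img/sandal.jpg"/>
    <span class="price">$695.00</span>
  </div>
</body></html>"""

# The same hypothetical page after a redesign: classes renamed, price moved into <p><b>.
PAGE_V2 = """<html><body>
  <div id="product">
    <img class="hero" src="/img/sandal.jpg"/>
    <p class="pricing"><b>$695.00</b></p>
  </div>
</body></html>"""

# Hand-written rules for this one merchant, tuned to PAGE_V1's layout.
SITE_RULES = {
    "price": ".//span[@class='price']",
    "image": ".//img[@class='main-photo']",
}


def extract(html: str, rules: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Apply per-site XPath rules; a field is None when its rule finds no element."""
    root = ET.fromstring(html)
    price_el = root.find(rules["price"])
    image_el = root.find(rules["image"])
    return {
        "price": price_el.text if price_el is not None else None,
        "image": image_el.get("src") if image_el is not None else None,
    }


if __name__ == "__main__":
    print(extract(PAGE_V1, SITE_RULES))  # rules written for this layout match
    print(extract(PAGE_V2, SITE_RULES))  # same product, rules no longer match
```

Nothing about the product changed between the two versions, yet both rules silently come back empty on the redesigned page, and someone (or some monkey) has to rewrite them.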

DOM-based approaches have been around for years and have become the go-to choice when you want some smartness in your scraping. But in our experience with the shopping vertical, DOM-based approaches simply don’t work that elegantly.

We at use a fundamentally different approach. We look not only at the DOM structure of an HTML page but also at the semantics and many other things inside it. As a result, given any product page from any shopping site, our intelligent software is able to extract its product price and image. The process is fully automated, without any involvement of people (or monkeys). That is, no XPath expressions, templates, scripts, or anything else need to be written for any particular site.
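The post doesn’t disclose how our extractor works, so the following is not that algorithm. It is a deliberately naive Python sketch, built only on the standard library’s `html.parser` and `re`, of what “layout-independent” can mean: it ignores where elements sit in the DOM, takes the first dollar amount in the visible text as the price, and picks the image with the largest declared `width` × `height` as the product photo. The price regex, the size heuristic, and the sample pages are all assumptions made for illustration.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

# Assumption: prices look like "$695.00" or "$1,250" (US dollars only).
PRICE_RE = re.compile(r"\$\s?\d{1,3}(?:,\d{3})*(?:\.\d{2})?")


class PageScanner(HTMLParser):
    """Collect visible text and <img> attributes regardless of their position in the DOM."""

    def __init__(self) -> None:
        super().__init__()
        self.text_chunks: List[str] = []
        self.images: List[Dict[str, str]] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            self.images.append({name: value or "" for name, value in attrs})

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_chunks.append(data)


def _declared_area(img: Dict[str, str]) -> int:
    """width * height from the tag's attributes; 0 if missing or non-numeric."""
    try:
        return int(img.get("width", "0")) * int(img.get("height", "0"))
    except ValueError:
        return 0


def guess_price_and_image(html: str) -> Tuple[Optional[str], Optional[str]]:
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    match = PRICE_RE.search(" ".join(scanner.text_chunks))
    best = max(scanner.images, key=_declared_area, default=None)
    return (match.group(0) if match else None,
            best.get("src") if best else None)


# Two hypothetical pages with unrelated layouts for the same product.
PAGE_A = ('<div><img src="/logo.gif" width="80" height="20">'
          '<img src="/img/sandal.jpg" width="400" height="400">'
          '<span class="price">$695.00</span></div>')
PAGE_B = ('<table><tr><td><img src="/spacer.gif" width="1" height="1">'
          '<img src="/p/sandal_large.jpg" width="500" height="500"></td>'
          '<td><b>Our price: $695.00</b></td></tr></table>')

if __name__ == "__main__":
    print(guess_price_and_image(PAGE_A))
    print(guess_price_and_image(PAGE_B))
```

A heuristic this simple is easily fooled (for example, a “free shipping over $50” banner appearing before the real price, or images without size attributes), which is why a real extractor would need to weigh far more signals than these two.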

Sounds impossible?! That’s most people’s reaction when they first hear about it. In fact, even Wikipedia currently says this can’t be done (maybe I should try to get that page revised). Well, maybe not anymore. Head to now and see it working live for yourself! Simply enter any product page URL from any shopping site and your email address, and we’ll scrape its product image and price in real time for you. And there are no monkeys working behind the scenes when you send in your request:

With Yahoo! SearchMonkey, on the other hand, you need to deal with XML, XSLT, XPath, etc., which may disqualify many people who don’t understand these things (including me) from being SearchMonkeys:

So hopes to be the monkey for you in the shopping vertical, so that you don’t have to be one. We think our technology would be an interesting complement to the SearchMonkey platform. In fact, we’d be happy to wrap our product scraping function as a SearchMonkey Data Service. What do you guys think?! Anyway, I’m going to the SearchMonkey Launch Party on May 15, and I’d be happy to chat about this there. Any ideas on how our technology can be used are welcome.

P.S.: Our scraping algorithm already works with most shopping sites, although we are still fine-tuning it. We know it’s not perfect yet, but we are confident that we are heading in the right direction and that we will get there soon.

Update: I had a chance to meet Amit Kumar, Yahoo! Director and SearchMonkey product manager, at the Launch Party. He told me that SearchMonkey does have another interface, a Web Service one, besides the DOM-based approach I mentioned earlier. This makes wrapping our product scraping function as a SearchMonkey Data Service possible (and not difficult), so we’ll get started right away. Thanks, Amit!