Oldie but a Goldie: XPath Metadata Extraction

It's been quite some time since I've written an Alfresco blog post, but I finally decided to commit to being more active in the Alfresco community through blogs and other activities. Since I'm dusting off the old blog and starting anew, I thought it would be fitting to dust off an old Alfresco feature that can still prove to be useful. 

When thinking about a theme for the post, the first thing that came to mind was...well...hipsters...

 

Hipsters often revitalize old trends, and from the picture above we find that some trends are better than others. As a Principal Solutions Engineer, I've found one vintage Alfresco feature that has proven useful in more than one of my pre-sales opportunities with prospective customers. If we harken back to the Alfresco 2.x days, the now-defunct WCM AVM product had a cool feature to perform XPath Metadata Extraction. Essentially, this was used to map XML content to AVM Web Forms that would later be used in a presentation layer of your choice. While AVM has been laid to rest, the XPathMetadataExtracter class lives on as a core repository capability.

Now does this vintage and possibly hipster Alfresco feature deserve to be brought back into the mainstream? I say, absolutely! This is actually a very common pattern being used by lots of organizations for processing anything from financial statements to technical publications (DITA).

So let's actually dive in with an example in which we'll manage hipster or indie rock artist XML content within Alfresco. DISCLAIMER: I actually have very hipster taste in music...so don't judge too harshly ;)

The first thing we'll need is a hipster content model to manage our artists. As you'll see below, it's a pretty simple model with text and date properties, a few of which are multi-valued.

hipster-model.xml:

<?xml version="1.0" encoding="UTF-8"?>
<model name="hip:hipsterModel" xmlns="http://www.alfresco.org/model/dictionary/1.0">
  
  <description>Hipster Content Model</description>
  <author>Kyle Adams</author>
  <version>1.0</version>
  
  <imports>
    <import uri="http://www.alfresco.org/model/dictionary/1.0" prefix="d" />
    <import uri="http://www.alfresco.org/model/content/1.0" prefix="cm" />
  </imports>
  
  <namespaces>
    <namespace uri="http://www.massnerder.io/model/1.0" prefix="hip" />
  </namespaces>
  <constraints>
    <!-- Indie Genre Constraint -->
    <constraint name="hip:genreConst" type="LIST">
      <parameter name="allowedValues">
        <list>
          <value>Emo</value>
          <value>Garage Rock</value>
          <value>Hardcore</value>
          <value>Indie Americana</value>
          <value>Indie Doo-Wop</value>
          <value>Indie Folk</value>
          <value>Indie Pop</value>
          <value>Indietronica</value>
          <value>Lo-fi</value>
          <value>Nu-hula</value>
          <value>Pop Punk</value>
          <value>Post Hardcore</value>
          <value>Surf Rock</value>
        </list>
      </parameter>
    </constraint>
  </constraints>
  
  <!-- Content Types -->
  <types>
    <!-- Artist Type -->
    <type name="hip:artist">
      <title>Artist</title>
      <parent>cm:content</parent>
      <properties>
        <property name="hip:artistName">
          <title>Artist Name</title>
          <type>d:text</type>
          <index enabled="true">
            <atomic>true</atomic>
            <stored>false</stored>
            <tokenised>false</tokenised>
          </index>
        </property>
        <property name="hip:label">
          <title>Label</title>
          <type>d:text</type>
          <index enabled="true">
            <atomic>true</atomic>
            <stored>false</stored>
            <tokenised>false</tokenised>
          </index>
        </property>
        <property name="hip:origin">
          <title>Origin</title>
          <type>d:text</type>
          <index enabled="true">
            <atomic>true</atomic>
            <stored>false</stored>
            <tokenised>false</tokenised>
          </index>
        </property>
        <property name="hip:genres">
          <title>Genres</title>
          <type>d:text</type>
          <multiple>true</multiple>
          <index enabled="true">
            <atomic>true</atomic>
            <stored>false</stored>
            <tokenised>false</tokenised>
          </index>
          <constraints>
            <constraint ref="hip:genreConst"/>
          </constraints>
        </property>
        <property name="hip:members">
          <title>Members</title>
          <type>d:text</type>
          <multiple>true</multiple>
          <index enabled="true">
            <atomic>true</atomic>
            <stored>false</stored>
            <tokenised>false</tokenised>
          </index>
        </property>
        <property name="hip:formed">
          <title>Date Formed</title>
          <type>d:date</type>
        </property>
        <property name="hip:disbanded">
          <title>Date Disbanded</title>
          <type>d:date</type>
        </property>
      </properties>
    </type>
  </types>
</model>

For now we'll assume you already know that the content model XML has to be bootstrapped using a Spring context file, and we'll stick to the important files. Next up, we need a Spring context file to bootstrap our XPath Metadata Extraction configuration. The most important part of this file is the extracter.xml.HipsterModelMetadataExtracter bean definition, which loads the hipster-model-mappings.properties and hipster-model-xpath-mappings.properties files.
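
As a quick aside for anyone who hasn't bootstrapped a model before, a minimal sketch of that model context file might look like the following. The bean id and the model path are assumptions made up for illustration, so adjust them to wherever hipster-model.xml actually lives in your module. The dictionaryModelBootstrap parent bean is the one Alfresco provides for registering models.

```xml
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>

<beans>
   <!-- Registers the hipster content model with the data dictionary.
        The bean id below is hypothetical; any unique id should do. -->
   <bean id="massnerder.hipster.dictionaryBootstrap"
         parent="dictionaryModelBootstrap"
         depends-on="dictionaryBootstrap">
      <property name="models">
         <list>
            <!-- Assumed location of the model file inside the module -->
            <value>alfresco/module/massnerder-blog-xpath-metadata-extraction/model/hipster-model.xml</value>
         </list>
      </property>
   </bean>
</beans>
```

With the model registered, we can move on to the extraction configuration.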

hipster-xml-metadata-extraction-context.xml:

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>

<!-- Configurations for XmlMetadataExtracters -->
<beans>
   <!-- An extractor that operates on Alfresco models -->
   <bean id="extracter.xml.HipsterModelMetadataExtracter"
         class="org.alfresco.repo.content.metadata.xml.XPathMetadataExtracter"
         parent="baseMetadataExtracter"
         init-method="init" >
      <property name="mappingProperties">
         <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
            <property name="location">
               <value>classpath:alfresco/module/massnerder-blog-xpath-metadata-extraction/metadata/extraction/hipster-model-mappings.properties</value>
            </property>
         </bean>
      </property>
      <property name="xpathMappingProperties">
         <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
            <property name="location">
               <value>classpath:alfresco/module/massnerder-blog-xpath-metadata-extraction/metadata/extraction/hipster-model-xpath-mappings.properties</value>
            </property>
         </bean>
      </property>
   </bean>

   <!-- A selector that executes XPath statements -->
   <bean
         id="extracter.xml.selector.HipsterXPathSelector"
         class="org.alfresco.repo.content.selector.XPathContentWorkerSelector"
         init-method="init">
      <property name="workers">
         <map>
            <entry key="/*">
               <ref bean="extracter.xml.HipsterModelMetadataExtracter" />
            </entry>
         </map>
      </property>
   </bean>

   <!-- The wrapper XML metadata extracter -->
   <bean
         id="extracter.xml.HipsterXMLMetadataExtracter"
         class="org.alfresco.repo.content.metadata.xml.XmlMetadataExtracter"
         parent="baseMetadataExtracter">
      <property name="overwritePolicy">
         <value>EAGER</value>
      </property>
      <property name="selectors">
         <list>
            <ref bean="extracter.xml.selector.HipsterXPathSelector" />
         </list>
      </property>
   </bean>
</beans>

Let's have a closer look at the hipster-model-mappings.properties. The main purpose of this file is to tell the XPathMetadataExtracter class which content model and which metadata properties we will be using during the extraction.

hipster-model-mappings.properties:

# Namespaces
namespace.prefix.hip=http://www.massnerder.io/model/1.0

# Mappings
artistName=hip:artistName
label=hip:label
origin=hip:origin
genres=hip:genres
members=hip:members
formed=hip:formed
disbanded=hip:disbanded

The hipster-model-xpath-mappings.properties file is where all the magic happens. It takes the property names defined in hipster-model-mappings.properties and pairs each one with an XPath expression that extracts the matching value from the XML files we'll upload afterwards.

hipster-model-xpath-mappings.properties:

# Hipster Property XPath Mappings
artistName=/artist/@name
label=/artist/label
origin=/artist/origin
genres=/artist/genres/genre/text()
members=/artist/members/member/text()
formed=/artist/formed
disbanded=/artist/disbanded

And lastly we have a sample artist XML file with the following content that we'll upload to a Share collaboration site.

shakey-graves.xml:

<?xml version="1.0" encoding="UTF-8"?>
<artist name="Shakey Graves">
    <label>Independent</label>
    <origin>Austin, Texas, USA</origin>
    <genres>
        <genre>Indie Americana</genre>
    </genres>
    <members>
        <member>Alejandro Rose-Garcia</member>
    </members>
    <formed>2007</formed>
    <disbanded></disbanded>
</artist>
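
Since hip:genres and hip:members are multi-valued, it may help to see what a file with repeated elements could look like. The sketch below is a made-up artist: the file name, artist name, label, origin, members, and year are all placeholders rather than real data, and the genres are simply picked from the hip:genreConst list. With the XPath mappings above, /artist/genres/genre/text() matches both genre text nodes. My understanding is that XPathMetadataExtracter tries each expression as a node set first, so both values should end up in hip:genres, but treat that as an assumption to verify on your Alfresco version.

example-band.xml (hypothetical):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Placeholder data for illustration only -->
<artist name="Example Band">
    <label>Example Records</label>
    <origin>Example City, USA</origin>
    <genres>
        <!-- Each genre must be one of the hip:genreConst allowed values -->
        <genre>Indie Folk</genre>
        <genre>Lo-fi</genre>
    </genres>
    <members>
        <!-- Repeated elements map to the multi-valued hip:members property -->
        <member>First Member</member>
        <member>Second Member</member>
    </members>
    <formed>2010</formed>
    <disbanded></disbanded>
</artist>
```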

When we upload the shakey-graves.xml file to our Discography collaboration site and specialize it to the Artist content type, we get the following results:

METADATA EXTRACTION RESULTS IN ALFRESCO SHARE

 

Check out this video to see XPath Metadata Extraction working in real-time...

 

In true hipster fashion, we've taken an old feature and brought it back to life. Hipsters and hipster trends get a bad rap, but in my experience XPath Metadata Extraction has proven to be pretty useful. So get out there and grow an ironic mustache, make your own clothing, and pickle things that should never be pickled! When you've got all your hipster gear in check, grab the source from this post on GitHub here.

Also come join us at Alfresco Day in San Francisco on August 4th, 2015. The registration page and event details can be found here!

Keep Calm and Hipster On!!!