Discussion:
[basex-talk] Extract XML from zipped docx file
Dharmendra Singh
2017-06-28 13:22:03 UTC
Hi All,
I have a zipped docx file and need to extract the XML from it. How can I achieve this, and which function should I use for that?
Thanks & Regards
Dharmendra Kumar Singh
Christian Grün
2017-06-28 13:28:18 UTC
Hi Dharmendra,

In our documentation, there are some examples (e.g. [1]) that demonstrate
how to access the contents of MS Office files.

Hope this helps,
Christian

[1] http://docs.basex.org/wiki/Archive_Module#archive:update

Dharmendra Singh
2017-06-28 14:01:42 UTC
Hi Christian,
Thanks for your reply. I tried to extract the XML from the zipped docx file with the code below:
let $sourceDir := 'D:\2017\doctest\Elastic search.zip'
let $archive := file:read-binary($sourceDir)
for $entry in archive:entries($archive)
for $extract in fn:parse-xml(archive:extract-text($archive, $entry))
return $extract
But it throws an error: [experr:ARCH0004] String conversion: Invalid XML character (#3).
What am I doing wrong here?
Thanks & Regards
Dharmendra Kumar Singh
Christian Grün
2017-06-28 15:46:14 UTC
Post by Dharmendra Singh
so what am i doing wrong here
First of all, it would be helpful if you used variable names that are
not misleading. Is $sourceDir supposed to point to a directory or a
ZIP archive?

Next, it would be awesome if you could iteratively simplify your
examples until the error disappears. This will increase chances for
helpful feedback from the list.
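
As an illustration of such a simplification, the query could be reduced to a single step that only lists the entries of the archive, without extracting or parsing anything. This is a minimal sketch that reuses the path from the original post; the variable name $zipPath is chosen here for illustration. If the listed names end in .docx, the entries are archives themselves and cannot be read as text.

```xquery
(: Simplified query: only list the entry names of the ZIP archive. :)
(: $zipPath is an illustrative name; the path is the one from the original post. :)
let $zipPath := 'D:\2017\doctest\Elastic search.zip'
let $archive := file:read-binary($zipPath)
for $entry in archive:entries($archive)
return string($entry)
```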
Post by Dharmendra Singh
i tried to extract the XML from Zipped docx file by
below code
Files of type docx are archives in themselves (once again, please see the
original example). If your docx files are zipped, you additionally need to
unzip the entries that you extract from your archive.
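
The two levels of unpacking described above could be sketched roughly as follows. This is not a tested solution for the file in question: it assumes that the ZIP archive contains entries whose names end in .docx, and that only the inner entries ending in .xml are of interest (for example word/document.xml). The variable names are illustrative. The outer entries are extracted as binary data with archive:extract-binary, because calling archive:extract-text on a binary docx entry is what leads to the "Invalid XML character" error.

```xquery
(: Path taken from the original post; adjust as needed. :)
let $zipPath := 'D:\2017\doctest\Elastic search.zip'
let $zip := file:read-binary($zipPath)
(: Outer level: select the docx entries of the ZIP archive (assumed naming). :)
for $docxEntry in archive:entries($zip)[ends-with(lower-case(.), '.docx')]
(: A docx file is an archive itself, so extract it as binary, not as text. :)
let $docx := archive:extract-binary($zip, $docxEntry)
(: Inner level: select the XML parts inside the docx archive. :)
for $xmlEntry in archive:entries($docx)[ends-with(., '.xml')]
(: Only now is the entry text, so it can be parsed as XML. :)
return fn:parse-xml(archive:extract-text($docx, $xmlEntry))
```

To retrieve only the main document body, the inner loop could be replaced by a single call with the entry path 'word/document.xml', as in the documentation example referenced earlier in the thread.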