Email List: Xaustin-group-lX

Re: find + xargs

To: yyyyyyyyyyyyyy@xxxxxxxxxxxxx
Subject: Re: find + xargs
From: Paul Eggert <yyyyyy@xxxxxxxxxxx>
Date: Fri, 23 Mar 2001 11:47:42 -0800 (PST)
References: <200103231720.RAA04194@squonk.unisoft.com> <200103231811.KAA08450@ellibrocorto.com>
> From: yyy@xxxxxxxxxxx (Geoff Clare)
> Date: Fri, 23 Mar 2001 17:20:26 +0000
> 
> The other existing solution is an extension to "find" to make it do
> the argument aggregation internally, so xargs is not needed:
> 
>     find . -type f -exec some_command {} +
> 
> Here the use of "+" instead of ";" as the command terminator causes
> find to substitute sets of pathnames for "{}" instead of single
> pathnames.  I believe this works on all SVR4-derived systems.

It doesn't work on Solaris 8, at least not for me.  Also, it is not a
pure extension, since it invalidates conforming commands like this:

    find . -type f -exec echo + {} ';'
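
A rough sketch of the ambiguity, in Python.  The three parsing rules
below are assumptions made for illustration; none is taken from an
actual find implementation, and the function names are hypothetical.
The sketch only shows how the arguments of -exec could be delimited
under each rule.

```python
def exec_args_classic(argv):
    """Original rule: the command is everything up to the first ';'."""
    return argv[:argv.index(";")]


def exec_args_plus_anywhere(argv):
    """Hypothetical extension: the first ';' or '+' argument ends the command."""
    for i, arg in enumerate(argv):
        if arg in (";", "+"):
            return argv[:i]
    raise ValueError("missing terminator")


def exec_args_plus_after_braces(argv):
    """Hypothetical narrower rule: '+' ends the command only right after '{}'."""
    for i, arg in enumerate(argv):
        if arg == ";" or (arg == "+" and i > 0 and argv[i - 1] == "{}"):
            return argv[:i]
    raise ValueError("missing terminator")


# The arguments that follow -exec in the command above, after shell quoting.
argv = ["echo", "+", "{}", ";"]

print(exec_args_classic(argv))            # ['echo', '+', '{}']
print(exec_args_plus_anywhere(argv))      # ['echo']
print(exec_args_plus_after_braces(argv))  # ['echo', '+', '{}']
```

Under the "plus anywhere" assumption the command is cut short and the
remaining arguments are left to be parsed as find primaries.  The
narrower rule leaves this particular command intact.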


> Date: Fri, 23 Mar 2001 10:11:48 -0800 (PST)
> From: Eric Fischer <yyy@xxxxxxxxx>
> 
> I would like to suggest something that is less common but seems a
> little more useful to me.  A couple of years ago I submitted a patch
> to NetBSD to add a new "-printx" flag to the find command.  Like
> print0, it generates output in a format that xargs can use safely,
> but unlike print0, it works just by adding a backslash before every
> space, tab, apostrophe, quote, newline, or backslash.  The advantage
> I see to doing it this way is that the output can still be processed
> by other filters,

Not reliably, since many filters get it wrong in the presence of
multibyte characters.  Did your patch quote all backslash bytes, or
just backslash characters?  If the former, it's not correct for
Shift-JIS, where the byte 0x5C can also be the second byte of a
two-byte character.
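
A minimal sketch of the byte-versus-character distinction, in Python.
It assumes a simplified xargs-like reader that understands only
backslash quoting (no apostrophes or double quotes) and that works on
characters in a Shift-JIS locale.  The helper names are hypothetical
and are not taken from the NetBSD patch.

```python
SPECIAL = " \t\n'\"\\"


def quote_bytes(raw):
    """Byte-wise quoting: a backslash byte before every special byte."""
    special = SPECIAL.encode("ascii")
    out = bytearray()
    for b in raw:
        if b in special:
            out.append(0x5C)
        out.append(b)
    return bytes(out)


def quote_chars(name):
    """Character-wise quoting: a backslash before every special character."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in name)


def split_chars(text):
    """Simplified character-wise reader: backslash protects the next
    character; unquoted blanks and newlines separate arguments."""
    args, current = [], []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            current.append(next(chars, ""))
        elif ch in " \t\n":
            if current:
                args.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        args.append("".join(current))
    return args


# U+8868 is encoded in Shift-JIS as the two bytes 0x95 0x5C.
name = "\u8868 x"
bytewise = quote_bytes(name.encode("shift_jis")).decode("shift_jis")
charwise = quote_chars(name)

print(split_chars(bytewise) == [name])  # False: the name is split in two
print(split_chars(charwise) == [name])  # True
```

The byte-wise quoter inserts a backslash in front of the trailing 0x5C
byte, so the reader sees an escaped backslash followed by an unquoted
space.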

And even if the filter gets it right, you'll definitely have problems
if 'find' and 'xargs' run in different locales.
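
A sketch of the mixed-locale case, in Python, under the assumption
that the writer quotes by character in a Shift-JIS locale while the
reader unquotes by byte, as it might in a single-byte locale.  Both
helpers are hypothetical simplifications that handle backslash quoting
only.

```python
SPECIAL = " \t\n'\"\\"


def quote_chars(name):
    """Writer side: a backslash before every special character."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in name)


def split_bytes(data):
    """Reader side: byte-wise unquoting; a 0x5C byte protects the next
    byte, and unquoted blanks and newlines separate arguments."""
    args, current = [], bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b == 0x5C and i + 1 < len(data):
            current.append(data[i + 1])
            i += 2
            continue
        if b in b" \t\n":
            if current:
                args.append(bytes(current))
                current = bytearray()
        else:
            current.append(b)
        i += 1
    if current:
        args.append(bytes(current))
    return args


# U+8868 is encoded in Shift-JIS as the two bytes 0x95 0x5C.
name = "\u8868.txt"
original = name.encode("shift_jis")
wire = quote_chars(name).encode("shift_jis") + b"\n"

received = split_bytes(wire)
print(received == [original])  # False: the 0x5C trail byte was eaten
```

The writer sees no backslash character in the name and quotes nothing;
the reader takes the second byte of the first character as a quoting
backslash and drops it.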

Worst of all: if the file name has an encoding error, you are outside
the realm of the standard if you process it with a text-based tool;
you cannot reliably handle the name in that case.  These names
unfortunately occur in practice, for various reasons (not all of them
nefarious :-).
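
As a small illustration in Python, assuming a UTF-8 locale and a
hypothetical file name containing the byte 0xFF, which never occurs in
valid UTF-8.  What a real text tool does with such input is
unspecified; the two outcomes below are merely plausible ones.

```python
raw_name = b"report-\xff.txt"  # hypothetical name with an encoding error

# Outcome 1: a strict text tool rejects the name outright.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Outcome 2: a lenient text tool substitutes a replacement character,
# so the name it passes on no longer matches the file on disk.
lossy = raw_name.decode("utf-8", errors="replace").encode("utf-8")

print(strict_ok)          # False
print(lossy == raw_name)  # False
```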

> unlike text with NUL terminated lines, which is
> basically suitable only for handing directly to xargs.

It's also suitable for handing to "perl -0", to GNU "sort -z", etc.
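
For comparison, a sketch in Python of a filter over NUL-terminated
names, roughly in the spirit of "sort -z".  The function name is
hypothetical, and the ordering is plain byte order, which is an
assumption; a real sort would follow the locale's collation.

```python
def sort_nul_records(stream):
    """Sort NUL-terminated records by byte value and re-terminate them."""
    records = [r for r in stream.split(b"\0") if r]
    return b"".join(r + b"\0" for r in sorted(records))


# Two names, one of them containing a newline, as "find -print0" might emit.
stream = b"b\nfile\0a file\0"
print(sort_nul_records(stream))  # b'a file\x00b\nfile\x00'
```

Because NUL cannot appear in a pathname, the filter never needs to
quote or unquote anything, and it never interprets the bytes as
characters.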

It should be easy to write a little perl script that generates the
backslash form from the NUL-terminated form, though again I'd still
worry about multibyte characters particularly if you have mixed
locales.
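
A sketch of such a converter, written here in Python rather than Perl.
It quotes byte by byte, which is an assumption that holds only for
encodings where the special bytes never appear inside a multibyte
character; it would mishandle Shift-JIS as discussed above.  The
function name is hypothetical.

```python
import sys

SPECIAL = b" \t\n'\"\\"


def nul_to_backslash(stream):
    """Convert NUL-terminated names to one backslash-quoted name per line."""
    out = bytearray()
    for name in stream.split(b"\0"):
        if not name:
            continue  # skip the empty piece after the final NUL
        for b in name:
            if b in SPECIAL:
                out.append(0x5C)  # backslash before each special byte
            out.append(b)
        out.append(0x0A)  # unquoted newline ends the record
    return bytes(out)


if __name__ == "__main__":
    # Intended use: find . -print0 | python3 this_script.py | xargs ...
    sys.stdout.buffer.write(nul_to_backslash(sys.stdin.buffer.read()))
```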
